BOM - Glossary - Peasy Formats

A special Unicode character (U+FEFF) at the start of a file or stream that signals its encoding and, for UTF-16 and UTF-32, its byte order.

Technical Details

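The BOM is the Unicode character U+FEFF (ZERO WIDTH NO-BREAK SPACE) placed at the very beginning of a text stream. Each encoding serializes it to a different byte sequence, so a reader can inspect the first few bytes to infer the encoding and, for multi-byte code units, the byte order: EF BB BF for UTF-8, FE FF for UTF-16 big-endian, FF FE for UTF-16 little-endian, 00 00 FE FF for UTF-32 big-endian, and FF FE 00 00 for UTF-32 little-endian. UTF-8 has only one byte order, so a UTF-8 BOM acts purely as an encoding signature; the Unicode Standard neither requires nor recommends it, and some tools treat it as stray characters at the start of the file.

More generally, Unicode assigns a unique code point (U+0000 to U+10FFFF) to every character across all writing systems. UTF-8 uses 1 to 4 bytes per character: ASCII characters take 1 byte, while most CJK ideographs take 3. UTF-16 uses 2 or 4 bytes per character and is the internal string format of JavaScript and Java. Declaring the encoding correctly prevents mojibake (garbled text) when files cross system boundaries.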
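Because each encoding serializes U+FEFF to a different byte sequence, a reader can recognize a BOM by comparing the first bytes of a buffer against the known signatures. The following is a minimal sketch that assumes the input is already a `Uint8Array`; `detectBom` and `BOM_SIGNATURES` are hypothetical names, not a built-in API.

```javascript
// Minimal BOM sniffing sketch (hypothetical helper, not a built-in API).
// UTF-32 signatures are checked before UTF-16 because the UTF-32LE BOM
// (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
const BOM_SIGNATURES = [
  { encoding: 'UTF-32LE', bytes: [0xFF, 0xFE, 0x00, 0x00] },
  { encoding: 'UTF-32BE', bytes: [0x00, 0x00, 0xFE, 0xFF] },
  { encoding: 'UTF-8',    bytes: [0xEF, 0xBB, 0xBF] },
  { encoding: 'UTF-16LE', bytes: [0xFF, 0xFE] },
  { encoding: 'UTF-16BE', bytes: [0xFE, 0xFF] },
];

/**
 * Returns { encoding, bomLength } if the buffer starts with a known BOM,
 * or null if it does not.
 * @param {Uint8Array} bytes
 */
function detectBom(bytes) {
  for (const { encoding, bytes: sig } of BOM_SIGNATURES) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) {
      return { encoding, bomLength: sig.length };
    }
  }
  return null;
}

detectBom(new Uint8Array([0xEF, 0xBB, 0xBF, 0x48, 0x69])); // { encoding: 'UTF-8', bomLength: 3 }
detectBom(new Uint8Array([0xFF, 0xFE, 0x48, 0x00]));       // { encoding: 'UTF-16LE', bomLength: 2 }
detectBom(new TextEncoder().encode('Hi'));                 // null (TextEncoder writes no BOM)
```

Sniffing is a heuristic: a UTF-16LE file whose first character is U+0000 also begins with FF FE 00 00 and would be reported as UTF-32LE, and a missing BOM says nothing about the encoding.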

Example

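```javascript
// UTF-8 encode/decode with the standard TextEncoder/TextDecoder APIs
const encoder = new TextEncoder();          // always produces UTF-8, never writes a BOM
const decoder = new TextDecoder('utf-8');   // skips a leading BOM by default

const bytes = encoder.encode('Hello 世界');
// → Uint8Array [72, 101, ..., 228, 184, 150, 231, 149, 140]
//   (ASCII letters take 1 byte each; 世 and 界 take 3 bytes each)

decoder.decode(bytes);  // 'Hello 世界'
```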
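`TextEncoder` always emits UTF-8 without a BOM, while `TextDecoder` by default consumes a leading BOM instead of returning it. This sketch, assuming a standards-compliant runtime (a modern browser or a recent Node.js), prepends the UTF-8 BOM by hand to show both decoder behaviors; `stripBom` is a hypothetical helper, not a built-in.

```javascript
// Build a buffer that starts with the UTF-8 BOM (EF BB BF).
const utf8Bom = new Uint8Array([0xEF, 0xBB, 0xBF]);
const body = new TextEncoder().encode('Hello 世界');
const withBom = new Uint8Array(utf8Bom.length + body.length);
withBom.set(utf8Bom, 0);
withBom.set(body, utf8Bom.length);

// Default (ignoreBOM: false): the decoder skips the BOM.
const stripped = new TextDecoder('utf-8').decode(withBom);
// ignoreBOM: true keeps it as the character U+FEFF at index 0.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);

console.log(stripped === 'Hello 世界');     // true
console.log(kept.charCodeAt(0) === 0xFEFF);  // true

// Hypothetical helper: remove a leading U+FEFF from already-decoded text.
function stripBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

console.log(stripBom(kept) === stripped);    // true
```

A helper like `stripBom` matters when text arrives already decoded with the BOM intact; for example, Node.js `fs.readFileSync(path, 'utf8')` returns a leading BOM as U+FEFF rather than removing it.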

Related Tools

Word Counter, Case Converter, Line Sorter, Lorem Ipsum Generator, Slug Generator, Find and Replace, Duplicate Line Remover, Base64 Encoder/Decoder, URL Encoder/Decoder, JSON Formatter, HTML Entity Encoder/Decoder, Text Reverser, Add/Remove Line Numbers, Text Compare, Text Extractor

Related Terms

Plain Text, Rich Text, Line Ending, Word Count, Case Conversion, Slug, Whitespace, String Interpolation, Escape Character, Unicode, ASCII, Lorem Ipsum, Truncation, Stemming, Tokenization, N-gram, Readability Score, String Distance, Text Encoding, Diacritics, Ligature, Kerning, Leading, CJK, RTL, Normalization (Text), Grep, Transliteration, ROT13, Text Diff