BOM
Byte Order Mark
A special Unicode character (U+FEFF) placed at the start of a text file or stream to signal its encoding and, for multi-byte encodings such as UTF-16, its byte order.
Technical Details
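The BOM is the Unicode character U+FEFF (ZERO WIDTH NO-BREAK SPACE) written at the very beginning of a text stream. Unicode assigns a unique code point (U+0000 to U+10FFFF) to characters from the world's writing systems, and an encoding form decides how each code point becomes bytes. UTF-8 uses 1 to 4 bytes per character: ASCII characters take 1 byte, while most CJK ideographs take 3. UTF-16 uses 2 or 4 bytes per character and is the string representation exposed by JavaScript and Java. Because a UTF-16 code unit spans two bytes, the serialized BOM reveals the byte order: FE FF means big-endian and FF FE means little-endian. In UTF-8 the BOM is the byte sequence EF BB BF; byte order does not apply there, so it acts only as an encoding signature, and the Unicode Standard neither requires nor recommends it. Declaring the encoding correctly, through a BOM or through external metadata, prevents mojibake (garbled text) when files cross system boundaries.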
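As a rough sketch, a reader can sniff the first bytes of a buffer for one of the common BOM signatures: EF BB BF (UTF-8), FE FF (UTF-16 big-endian) or FF FE (UTF-16 little-endian). The helper name detectBom and its return shape are hypothetical, chosen for illustration. It deliberately ignores UTF-32, whose little-endian BOM (FF FE 00 00) starts with the same two bytes as the UTF-16 LE one and would be misreported here.

```javascript
// Hypothetical helper: looks for a BOM at the start of a byte buffer.
// Returns { encoding, length } for a recognised BOM, or null if none is found.
// Sketch only: UTF-32 BOMs are not distinguished from UTF-16 ones.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: 'utf-8', length: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: 'utf-16be', length: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: 'utf-16le', length: 2 };
  }
  return null;
}

// Usage: "Hi" encoded as UTF-16 LE, preceded by the FF FE BOM.
const sample = new Uint8Array([0xff, 0xfe, 0x48, 0x00, 0x69, 0x00]);
const bom = detectBom(sample); // { encoding: 'utf-16le', length: 2 }

// TextDecoder drops a leading BOM that matches its encoding by default.
const text = new TextDecoder(bom.encoding).decode(sample); // 'Hi'
```

The returned encoding label is passed straight to TextDecoder, which accepts 'utf-8', 'utf-16le' and 'utf-16be' as labels in the WHATWG Encoding Standard.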
Example
```javascript
// UTF-8 encode/decode
const encoder = new TextEncoder();
const decoder = new TextDecoder('utf-8');
const bytes = encoder.encode('Hello 世界');
// → Uint8Array [72, 101, ..., 228, 184, 150, 231, 149, 140]
decoder.decode(bytes); // 'Hello 世界'
```
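The same TextDecoder class controls what happens to a BOM: with the default ignoreBOM: false it consumes a leading BOM, while ignoreBOM: true keeps it in the result as the character U+FEFF. TextEncoder, by contrast, never writes a BOM. The sketch below assumes nothing beyond these built-ins; the variable names are illustrative, and prepending EF BB BF by hand is shown only for consumers that expect a signature.

```javascript
// "Hi" encoded as UTF-8, preceded by the UTF-8 BOM (EF BB BF).
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x48, 0x69]);

// Default (ignoreBOM: false): the leading BOM is skipped.
const stripped = new TextDecoder('utf-8').decode(withBom);

// ignoreBOM: true: the BOM survives as U+FEFF at index 0.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);

console.log(stripped.length); // 2
console.log(kept.length);     // 3
console.log(kept.charCodeAt(0).toString(16)); // feff

// TextEncoder emits no BOM, so add one explicitly when a consumer expects it.
const body = new TextEncoder().encode('Hi');
const output = new Uint8Array(3 + body.length);
output.set([0xef, 0xbb, 0xbf], 0);
output.set(body, 3);
```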
Related Terms
Plain Text
Rich Text
Line Ending
Word Count
Case Conversion
Slug
Whitespace
String Interpolation
Escape Character
Unicode
ASCII
Lorem Ipsum
Truncation
Stemming
Tokenization
N-gram
Readability Score
String Distance
Text Encoding
Diacritics
Ligature
Kerning
Leading
CJK
RTL
Normalization (Text)
Grep
Transliteration
ROT13
Text Diff