Unicode Table

Unicode character ranges, code points, and encoding formats. Quick reference for block ranges, planes, UTF-8/UTF-16 encoding, and common symbols.

What is Unicode?

Unicode is the universal character standard that assigns a unique number (code point) to each character it encodes, with the goal of covering every writing system. As of Unicode 15.1, it covers 149,813 characters across 161 scripts.

Code points are written as U+XXXX in hexadecimal. For example, the Latin capital letter A is U+0041, the Euro sign is U+20AC, and 😀 is U+1F600.
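As a small illustration of the U+XXXX notation, the sketch below converts between characters and code points using Python's built-in `ord` and `chr`. The helper name `format_code_point` is made up for this example.

```python
def format_code_point(ch: str) -> str:
    """Return the U+XXXX notation for a single character."""
    # ord() gives the code point as an integer; pad to at least 4 hex digits.
    return f"U+{ord(ch):04X}"


print(format_code_point("A"))   # U+0041
print(format_code_point("€"))   # U+20AC
print(format_code_point("😀"))  # U+1F600

# chr() goes the other way: from a code point back to a character.
print(chr(0x20AC))              # €
```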

Unicode defines the code points; UTF-8, UTF-16, and UTF-32 are encodings that specify how to store those code points as bytes.
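To make the distinction concrete, here is a minimal sketch that stores one code point (the Euro sign, U+20AC) under each of the three encodings. The big-endian variants are chosen only so that no byte order mark is added to the output.

```python
text = "€"  # U+20AC, Euro sign

# Same code point, three different byte sequences.
encoded = {name: text.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, data in encoded.items():
    print(f"{name:10} {len(data)} bytes  {data.hex(' ')}")

# Decoding with the matching encoding always recovers the same code point.
for name, data in encoded.items():
    assert data.decode(name) == text
```

The byte counts differ (3, 2, and 4 here), but the decoded character is identical in every case.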

Unicode Planes

Plane | Range | Name | Contents
0 | U+0000–U+FFFF | Basic Multilingual Plane (BMP) | Most common characters; most modern scripts
1 | U+10000–U+1FFFF | Supplementary Multilingual Plane (SMP) | Historic scripts, emoji, musical notation, math
2 | U+20000–U+2FFFF | Supplementary Ideographic Plane (SIP) | Rare CJK unified ideographs
3 | U+30000–U+3FFFF | Tertiary Ideographic Plane (TIP) | Rare and historic CJK unified ideographs (Extensions G and H)
4–13 | U+40000–U+DFFFF | Unassigned | Reserved for future use
14 | U+E0000–U+EFFFF | Supplementary Special-purpose Plane (SSP) | Language tags, variation selectors
15–16 | U+F0000–U+10FFFF | Supplementary Private Use Area-A and -B | Custom characters; no standard meaning
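Because each plane spans exactly 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. The sketch below shows this; `plane_of` is a hypothetical helper name, not part of any library.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that a code point belongs to."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 (65,536) code points, so drop the low 16 bits.
    return code_point >> 16


print(plane_of(ord("A")))   # 0  (BMP)
print(plane_of(ord("😀")))  # 1  (SMP)
print(plane_of(0x20000))    # 2  (SIP)
print(plane_of(0x10FFFF))   # 16 (last private use plane)
```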

Common Character Blocks

Block | Range | Count | Examples
Basic Latin (ASCII) | U+0000–U+007F | 128 | A–Z, a–z, 0–9, punctuation
Latin-1 Supplement | U+0080–U+00FF | 128 | à, é, ü, ñ, ©, ®, °
Latin Extended-A | U+0100–U+017F | 128 | Central/Eastern European Latin
Latin Extended-B | U+0180–U+024F | 208 | African, phonetic Latin extensions
Greek and Coptic | U+0370–U+03FF | 144 | α, β, γ, Ω, π
Cyrillic | U+0400–U+04FF | 256 | А, Б, В…Я (Russian, Bulgarian, etc.)
Hebrew | U+0590–U+05FF | 112 | א, ב, ג… (right-to-left)
Arabic | U+0600–U+06FF | 256 | ا, ب, ت… (right-to-left)
Devanagari | U+0900–U+097F | 128 | Hindi, Sanskrit, Marathi
Hiragana | U+3040–U+309F | 96 | あ, い, う, え, お…
Katakana | U+30A0–U+30FF | 96 | ア, イ, ウ, エ, オ…
CJK Unified Ideographs | U+4E00–U+9FFF | 20,992 | Chinese, Japanese kanji, Korean hanja
Arrows | U+2190–U+21FF | 112 | ←, →, ↑, ↓, ↔, ⇒
Mathematical Operators | U+2200–U+22FF | 256 | ∀, ∃, ∈, ∑, ∞, ≤, ≥, ≠
Miscellaneous Symbols | U+2600–U+26FF | 256 | ☀, ☁, ☂, ♠, ♥, ★, ☎
Emoticons (Emoji) | U+1F600–U+1F64F | 80 | 😀, 😂, 😍, 🙏
Misc Symbols & Pictographs | U+1F300–U+1F5FF | 768 | 🌍, 🌈, 🎉, 🏠, 🔥
Supplemental Symbols and Pictographs | U+1F900–U+1F9FF | 256 | 🤖, 🧠, 🦊, 🧩

Counts are block sizes (the number of code point positions in the range), not the number of assigned characters.
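A block lookup is just a range check. The sketch below hard-codes a small, illustrative subset of the ranges listed above (the `BLOCKS` list and both helper names are assumptions for this example, not a complete block database) and also shows that a block's size is `end - start + 1`.

```python
from typing import Optional

# Illustrative subset of blocks: (start, end, name). Not exhaustive.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x1F600, 0x1F64F, "Emoticons"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for a character, or None if not in BLOCKS."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def block_size(start: int, end: int) -> int:
    """Number of code point positions in an inclusive range."""
    return end - start + 1


print(block_of("π"))                 # Greek and Coptic
print(block_of("あ"))                # Hiragana
print(block_of("😀"))                # Emoticons
print(block_size(0x0180, 0x024F))    # 208
```

For real work, a complete and current list of ranges would come from the Unicode Character Database rather than a hand-written table.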

Encoding Formats

Encoding | Bytes per char | Notes
UTF-8 | 1–4 bytes | U+0000–U+007F = 1 byte (ASCII compatible). U+0080–U+07FF = 2 bytes. U+0800–U+FFFF = 3 bytes. U+10000–U+10FFFF = 4 bytes. Default encoding for the web.
UTF-16 | 2 or 4 bytes | BMP characters = 2 bytes. Supplementary characters (above U+FFFF) use surrogate pairs (2 × 2 bytes). Used by JavaScript strings and Windows APIs.
UTF-32 | 4 bytes (fixed) | Every code point uses exactly 4 bytes. Simple but memory-inefficient. Used for wchar_t on Linux and macOS. CPython 3.3+ picks 1, 2, or 4 bytes per character for each string, using the 4-byte layout only when the string contains code points above U+FFFF.
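The UTF-8 byte ranges above follow directly from how the bits of a code point are packed into lead and continuation bytes. The following is a minimal hand-written encoder meant only to illustrate that packing; in practice `str.encode("utf-8")` does the same job, and the function name `utf8_encode` is an assumption of this sketch.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (illustrative, not optimized)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogates and out-of-range values are not encodable scalar values.
        raise ValueError(f"cannot encode {cp:#x}")
    if cp <= 0x7F:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for ch in ("A", "é", "€", "😀"):
    data = utf8_encode(ord(ch))
    # Cross-check against Python's built-in encoder.
    assert data == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}  {len(data)} byte(s)  {data.hex(' ')}")
```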

Common Symbols Quick Reference

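Char | Code Point | HTML Entity | Name
© | U+00A9 | &copy; | Copyright Sign
® | U+00AE | &reg; | Registered Sign
™ | U+2122 | &trade; | Trade Mark Sign
° | U+00B0 | &deg; | Degree Sign
€ | U+20AC | &euro; | Euro Sign
£ | U+00A3 | &pound; | Pound Sign
¥ | U+00A5 | &yen; | Yen Sign
→ | U+2192 | &rarr; | Rightwards Arrow
← | U+2190 | &larr; | Leftwards Arrow
∞ | U+221E | &infin; | Infinity
✓ | U+2713 | &check; | Check Mark
✗ | U+2717 | &cross; | Ballot X
… | U+2026 | &hellip; | Horizontal Ellipsis
— | U+2014 | &mdash; | Em Dash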
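Named and numeric HTML entities both resolve to code points, which can be checked with Python's standard `html` module. This is a small sketch; `numeric_reference` is a hypothetical helper for producing a hexadecimal reference that works for any character, including those without a named entity.

```python
import html
from html.entities import name2codepoint

# Named entities decode to the characters in the table above.
print(html.unescape("&copy; &euro; &rarr; &hellip;"))  # © € → …

# name2codepoint maps an HTML 4 entity name to its code point.
print(f"U+{name2codepoint['euro']:04X}")               # U+20AC


def numeric_reference(ch: str) -> str:
    """Return a hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


print(numeric_reference("✓"))                           # &#x2713;
assert html.unescape(numeric_reference("✓")) == "✓"
```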

Look up any character by name, code point, or category with the Unicode Lookup tool.

Frequently Asked Questions

What is the difference between Unicode and UTF-8?

Unicode is the character standard — it assigns a unique number (code point) to every character. UTF-8 is an encoding — it's the algorithm for converting those code points to bytes for storage and transmission. Other encodings (UTF-16, UTF-32) represent the same Unicode code points differently. "UTF-8" and "Unicode" are often used interchangeably in casual speech, but technically UTF-8 is just one way to encode Unicode.

How many characters does Unicode support?

Unicode can represent up to 1,114,112 code points (U+0000 to U+10FFFF). As of Unicode 15.1 (2023), 149,813 of those positions are assigned to characters. The rest are surrogates (U+D800–U+DFFF), noncharacters, private use areas, or positions reserved for future use.
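The 1,114,112 figure is simply 17 planes of 65,536 code points each. A quick arithmetic check, using the Unicode 15.1 character count quoted above:

```python
PLANES = 17                      # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE
assert total == 0x10FFFF + 1     # U+0000 through U+10FFFF inclusive
print(f"{total:,}")              # 1,114,112

assigned = 149_813               # Unicode 15.1 character count
print(f"{assigned / total:.1%} of the code space is assigned")
```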

What is a surrogate pair in UTF-16?

UTF-16 uses one 16-bit code unit (2 bytes) per character for the Basic Multilingual Plane (U+0000–U+FFFF). For supplementary characters above U+FFFF (emoji, rare CJK, etc.), it uses two code units called a surrogate pair: a high surrogate (U+D800–U+DBFF) followed by a low surrogate (U+DC00–U+DFFF). Together they encode one supplementary code point. The surrogate range itself is reserved and never assigned to characters. This is why JavaScript's string.length can report 2 for a single emoji: it counts UTF-16 code units, not code points.
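The surrogate pair calculation can be sketched in a few lines: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The helper names below are assumptions for this example. Python's own `len` counts code points, so the UTF-16 code unit count is derived from the encoded byte length to mirror what JavaScript's string.length reports.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units in a string (2 bytes per unit)."""
    return len(text.encode("utf-16-le")) // 2


high, low = to_surrogate_pair(ord("😀"))
print(f"U+{high:04X} U+{low:04X}")   # U+D83D U+DE00

print(len("😀"))                     # 1 (code points)
print(utf16_code_units("😀"))        # 2 (UTF-16 code units)
```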