Unicode Table
Unicode character ranges, code points, and encoding formats. Quick reference for block ranges, planes, UTF-8/UTF-16 encoding, and common symbols.
What is Unicode?
Unicode is a universal character standard that assigns a unique number (code point) to each character it encodes, with the goal of covering every writing system. As of Unicode 15.1, it covers 149,813 characters across 161 scripts.
Code points are written as U+XXXX in hexadecimal. For example, the Latin capital letter A is U+0041, the Euro sign is U+20AC, and 😀 is U+1F600.
Unicode defines the code points; UTF-8, UTF-16, and UTF-32 are encodings that specify how to store those code points as bytes.
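As a minimal sketch of the U+XXXX notation, Python's built-in `ord` and `chr` convert between a character and its code point. The helper name `u_notation` is hypothetical and exists only for this illustration.

```python
def u_notation(ch: str) -> str:
    """Format one character's code point as U+XXXX (at least 4 hex digits)."""
    return f"U+{ord(ch):04X}"


for ch in ["A", "€", "😀"]:
    print(ch, u_notation(ch))

# chr() goes the other way: code point -> character.
assert chr(0x20AC) == "€"
```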
Unicode Planes
| Plane | Range | Name | Contents |
|---|---|---|---|
| 0 | U+0000–U+FFFF | Basic Multilingual Plane (BMP) | Most common characters; nearly all scripts in modern use |
| 1 | U+10000–U+1FFFF | Supplementary Multilingual Plane (SMP) | Historic scripts, emoji, musical notation, math |
| 2 | U+20000–U+2FFFF | Supplementary Ideographic Plane (SIP) | Rare CJK unified ideographs |
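| 3 | U+30000–U+3FFFF | Tertiary Ideographic Plane (TIP) | Rare and historic CJK unified ideographs (Extensions G and H) |
| 4–13 | U+40000–U+DFFFF | Unassigned | No characters assigned as of Unicode 15.1 |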
| 14 | U+E0000–U+EFFFF | Supplementary Special-Purpose Plane | Language tags, variation selectors |
| 15–16 | U+F0000–U+10FFFF | Supplementary Private Use Area-A/B (PUA) | Custom characters; no standard meaning |
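Because each plane spans exactly 65,536 (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. A small sketch, with the hypothetical helper name `plane_of`:

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains the given code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 code points, so the plane is the upper bits.
    return code_point >> 16


for ch in ["A", "😀"]:
    print(f"U+{ord(ch):04X} is in plane {plane_of(ord(ch))}")
```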
Common Character Blocks
| Block | Range | Count | Examples |
|---|---|---|---|
| Basic Latin (ASCII) | U+0000–U+007F | 128 | A–Z, a–z, 0–9, punctuation |
| Latin-1 Supplement | U+0080–U+00FF | 128 | à, é, ü, ñ, ©, ®, ° |
| Latin Extended-A | U+0100–U+017F | 128 | Central/Eastern European Latin |
| Latin Extended-B | U+0180–U+024F | 208 | African, phonetic Latin extensions |
| Greek and Coptic | U+0370–U+03FF | 144 | α, β, γ, Ω, π |
| Cyrillic | U+0400–U+04FF | 256 | А, Б, В…Я (Russian, Bulgarian, etc.) |
| Hebrew | U+0590–U+05FF | 112 | א, ב, ג… (right-to-left) |
| Arabic | U+0600–U+06FF | 256 | ا, ب, ت… (right-to-left) |
| Devanagari | U+0900–U+097F | 128 | Hindi, Sanskrit, Marathi |
| Hiragana | U+3040–U+309F | 96 | あ, い, う, え, お… |
| Katakana | U+30A0–U+30FF | 96 | ア, イ, ウ, エ, オ… |
| CJK Unified Ideographs | U+4E00–U+9FFF | 20,992 | Chinese, Japanese kanji, Korean hanja |
| Arrows | U+2190–U+21FF | 112 | ←, →, ↑, ↓, ↔, ⇒ |
| Mathematical Operators | U+2200–U+22FF | 256 | ∀, ∃, ∈, ∑, ∞, ≤, ≥, ≠ |
| Miscellaneous Symbols | U+2600–U+26FF | 256 | ☀, ☁, ☂, ♠, ♥, ★, ☑ |
| Emoticons | U+1F600–U+1F64F | 80 | 😀, 😂, 😍, 🙏 |
| Miscellaneous Symbols and Pictographs | U+1F300–U+1F5FF | 768 | 🌍, 🌈, 🎉, 🏠, 🔥 |
| Supplemental Symbols and Pictographs | U+1F900–U+1F9FF | 256 | 🤖, 🧠, 🦊, 🧩 |
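Since blocks are contiguous, non-overlapping ranges, finding a character's block is a range lookup. The sketch below uses a binary search over a handful of ranges copied from the table above; `BLOCKS`, `STARTS`, and `block_of` are hypothetical names, and the list is deliberately incomplete, so characters outside it return `None`.

```python
from bisect import bisect_right
from typing import Optional

# (start, end, name) for a few blocks from the table; NOT the full block list.
BLOCKS = sorted([
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
])
STARTS = [start for start, _, _ in BLOCKS]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for a character, or None if it is not in BLOCKS."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its end.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        start, end, name = BLOCKS[i]
        if start <= cp <= end:
            return name
    return None


for ch in ["A", "π", "→", "あ", "中", "😀"]:
    print(ch, block_of(ch))
```

The block size in the Count column is `end - start + 1`; for example, `0x9FFF - 0x4E00 + 1` gives 20,992.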
Encoding Formats
| Encoding | Bytes per char | Notes |
|---|---|---|
| UTF-8 | 1–4 bytes | U+0000–U+007F = 1 byte (ASCII compatible). U+0080–U+07FF = 2 bytes. U+0800–U+FFFF = 3 bytes. U+10000–U+10FFFF = 4 bytes. Default encoding for the web. |
| UTF-16 | 2 or 4 bytes | BMP characters = 2 bytes. Supplementary (above U+FFFF) use surrogate pairs (2 × 2 bytes). Used by JavaScript strings internally, Windows APIs. |
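| UTF-32 | 4 bytes (fixed) | Every code point uses exactly 4 bytes. Simple but memory-inefficient. CPython 3.3+ chooses 1, 2, or 4 bytes per code point for each string, so it falls back to a UTF-32 layout only when a string contains supplementary characters. On Linux and macOS, wchar_t is 32 bits wide and holds UTF-32 code units. |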
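The UTF-8 byte ranges in the table can be turned into a small hand-written encoder. This is only a sketch for illustration (the function name `utf8_encode` is hypothetical); in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode scalar value as UTF-8, following the range table."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        # Surrogate code points are reserved for UTF-16 and are not valid in UTF-8.
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0x7F:          # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Cross-check against the built-in encoder for one character per length class.
for ch in ["A", "é", "€", "😀"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(ch, utf8_encode(ord(ch)).hex(" "))
```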
Common Symbols Quick Reference
| Char | Code Point | HTML Entity | Name |
|---|---|---|---|
| © | U+00A9 | `&copy;` | Copyright Sign |
| ® | U+00AE | `&reg;` | Registered Sign |
| ™ | U+2122 | `&trade;` | Trade Mark Sign |
| ° | U+00B0 | `&deg;` | Degree Sign |
| € | U+20AC | `&euro;` | Euro Sign |
| £ | U+00A3 | `&pound;` | Pound Sign |
| ¥ | U+00A5 | `&yen;` | Yen Sign |
| → | U+2192 | `&rarr;` | Rightwards Arrow |
| ← | U+2190 | `&larr;` | Leftwards Arrow |
| ∞ | U+221E | `&infin;` | Infinity |
| ✓ | U+2713 | `&check;` | Check Mark |
| ✗ | U+2717 | `&cross;` | Ballot X |
| … | U+2026 | `&hellip;` | Horizontal Ellipsis |
| — | U+2014 | `&mdash;` | Em Dash |
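Named HTML entities are just aliases for code points, which can be checked with Python's standard `html` module. In this sketch, `entity_code_point` is a hypothetical helper, and `html.entities.name2codepoint` covers only the HTML 4 names (so it is used here with `copy`, `euro`, `rarr`, `hellip`, and `mdash`).

```python
import html
from html.entities import name2codepoint


def entity_code_point(name: str) -> str:
    """Return the U+XXXX code point for an HTML 4 entity name such as 'copy'."""
    return f"U+{name2codepoint[name]:04X}"


for name in ["copy", "euro", "rarr", "hellip", "mdash"]:
    print(f"&{name};", entity_code_point(name), chr(name2codepoint[name]))

# html.unescape() resolves both named and numeric character references.
assert html.unescape("&copy; 2024") == "© 2024"
assert html.unescape("&#x20AC;") == "€"
```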
Frequently Asked Questions
What is the difference between Unicode and UTF-8?
Unicode is the character standard — it assigns a unique number (code point) to every character. UTF-8 is an encoding — it's the algorithm for converting those code points to bytes for storage and transmission. Other encodings (UTF-16, UTF-32) represent the same Unicode code points differently. "UTF-8" and "Unicode" are often used interchangeably in casual speech, but technically UTF-8 is just one way to encode Unicode.
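To make the distinction concrete, the following sketch encodes one code point, U+20AC (€), three ways. The code point stays the same; only the bytes differ. The big-endian variants are used so that no byte order mark is prepended.

```python
ch = "€"  # one code point: U+20AC

encoded = {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

for enc, raw in encoded.items():
    print(f"{enc:10} {len(raw)} bytes: {raw.hex(' ')}")

# Decoding each byte sequence with its own encoding gives back the same character.
assert all(raw.decode(enc) == ch for enc, raw in encoded.items())
```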
How many characters does Unicode support?
The Unicode code space contains 1,114,112 code points (U+0000 to U+10FFFF). As of Unicode 15.1 (2023), 149,813 of those positions are assigned to characters. The rest are reserved for future use, designated as private use areas, set aside as surrogates for UTF-16 (U+D800–U+DFFF), or permanently defined as noncharacters.
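The 1,114,112 figure follows from 17 planes of 65,536 code points each. A quick arithmetic check in Python (the constant names are chosen for this illustration only), which also subtracts the 2,048 surrogate code points that can never stand for a character on their own:

```python
import sys

PLANES = 17                      # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE
surrogates = 0xDFFF - 0xD800 + 1       # reserved for UTF-16 surrogate pairs
scalar_values = total - surrogates     # code points that can be encoded as text

print(f"{total:,} code points, {scalar_values:,} scalar values")

# Python (3.3+) strings cover the full range up to U+10FFFF.
assert sys.maxunicode == 0x10FFFF == total - 1
```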
What is a surrogate pair in UTF-16?
UTF-16 uses 2 bytes per character for the Basic Multilingual Plane (U+0000–U+FFFF). For supplementary characters (emoji, rare CJK, etc.) above U+FFFF, it uses two 2-byte units called a surrogate pair: a high surrogate (U+D800–U+DBFF) followed by a low surrogate (U+DC00–U+DFFF). Together they encode one supplementary code point. This is why JavaScript's string.length can report 2 for a single emoji — it counts UTF-16 code units, not code points.
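The surrogate arithmetic can be sketched as follows: subtract 0x10000 from the code point, then split the remaining 20 bits into two 10-bit halves. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical helpers for this example.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(ord("😀"))
print(f"U+1F600 -> {high:04X} {low:04X}")

# Matches Python's own UTF-16 encoder, and explains the length of 2:
assert "😀".encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
assert len("😀".encode("utf-16-le")) // 2 == 2  # two UTF-16 code units
```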