Unicode Table

Unicode character ranges, code points, and encoding formats. Quick reference for block ranges, planes, UTF-8/UTF-16 encoding, and common symbols.

What is Unicode?

Unicode is the universal character standard that assigns a unique number (code point) to every character in every writing system. As of Unicode 15.1, it covers 149,813 characters across 161 scripts.

Code points are written as U+XXXX in hexadecimal. For example, the Latin capital letter A is U+0041, the Euro sign is U+20AC, and 😀 is U+1F600.
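
In Python, `ord()` returns a character's code point and `chr()` goes the other way, so the U+XXXX notation is only a formatting step. The sketch below is minimal, and the helper names `to_u_notation` and `from_u_notation` are hypothetical:

```python
def to_u_notation(ch: str) -> str:
    """Format one character's code point as U+XXXX (at least four uppercase hex digits)."""
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Parse U+XXXX notation back into the character it names."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {text!r}")
    return chr(int(text[2:], 16))


for ch in "A€😀":
    print(ch, to_u_notation(ch))
# A U+0041
# € U+20AC
# 😀 U+1F600
```

Four hex digits is only the minimum width. Code points above U+FFFF, such as U+1F600, use five or six digits.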

Unicode defines the code points; UTF-8, UTF-16, and UTF-32 are encodings that specify how to store those code points as bytes.

Unicode Planes

Plane	Range	Name	Contents
0	`U+0000–U+FFFF`	Basic Multilingual Plane (BMP)	Most common characters; almost all modern scripts
1	`U+10000–U+1FFFF`	Supplementary Multilingual Plane (SMP)	Historic scripts, emoji, musical notation, math
2	`U+20000–U+2FFFF`	Supplementary Ideographic Plane (SIP)	Rare CJK unified ideographs
3	`U+30000–U+3FFFF`	Tertiary Ideographic Plane (TIP)	Rare and historic CJK unified ideographs (Extensions G and H)
4–13	`U+40000–U+DFFFF`	(unassigned)	No characters allocated yet
14	`U+E0000–U+EFFFF`	Supplementary Special-Purpose Plane	Language tags, variation selectors
15–16	`U+F0000–U+10FFFF`	Private Use Area (PUA)	Custom characters; no standard meaning
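
Every plane holds exactly 65,536 (0x10000) code points, so a code point's plane is its value shifted right by 16 bits. The following is a Python sketch. `plane_of`, `plane_name`, and the `"unassigned"` label for planes 4 to 13 are illustrative choices and do not come from any library:

```python
# Plane names from the table above; planes 4-13 have no allocated characters.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    3: "Tertiary Ideographic Plane (TIP)",
    14: "Supplementary Special-Purpose Plane",
    15: "Private Use Area (PUA)",
    16: "Private Use Area (PUA)",
}


def plane_of(cp: int) -> int:
    """Each plane spans 0x10000 code points, so the plane number is the bits above the low 16."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {cp:#x}")
    return cp >> 16


def plane_name(cp: int) -> str:
    return PLANE_NAMES.get(plane_of(cp), "unassigned")


for cp in (0x0041, 0x20AC, 0x1F600, 0x20000, 0xE0001):
    print(f"U+{cp:04X}", plane_of(cp), plane_name(cp))
```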

Common Character Blocks

Block	Range	Size (code points)	Examples
Basic Latin (ASCII)	`U+0000–U+007F`	128	A–Z, a–z, 0–9, punctuation
Latin-1 Supplement	`U+0080–U+00FF`	128	à, é, ü, ñ, ©, ®, °
Latin Extended-A	`U+0100–U+017F`	128	Central/Eastern European Latin
Latin Extended-B	`U+0180–U+024F`	208	African, phonetic Latin extensions
Greek and Coptic	`U+0370–U+03FF`	144	α, β, γ, Ω, π
Cyrillic	`U+0400–U+04FF`	256	А, Б, В…Я (Russian, Bulgarian, etc.)
Hebrew	`U+0590–U+05FF`	112	א, ב, ג… (right-to-left)
Arabic	`U+0600–U+06FF`	256	ا, ب, ت… (right-to-left)
Devanagari	`U+0900–U+097F`	128	Hindi, Sanskrit, Marathi
Hiragana	`U+3040–U+309F`	96	あ, い, う, え, お…
Katakana	`U+30A0–U+30FF`	96	ア, イ, ウ, エ, オ…
CJK Unified Ideographs	`U+4E00–U+9FFF`	20,992	Chinese, Japanese kanji, Korean hanja
Arrows	`U+2190–U+21FF`	112	←, →, ↑, ↓, ↔, ⇒
Mathematical Operators	`U+2200–U+22FF`	256	∀, ∃, ∈, ∑, ∞, ≤, ≥, ≠
Miscellaneous Symbols	`U+2600–U+26FF`	256	☀, ☁, ☂, ♠, ♥, ★, ☺
Emoticons (Emoji)	`U+1F600–U+1F64F`	80	😀, 😂, 😍, 🙏
Miscellaneous Symbols and Pictographs	`U+1F300–U+1F5FF`	768	🌍, 🌈, 🎉, 🏠, 💡
Supplemental Symbols and Pictographs	`U+1F900–U+1F9FF`	256	🤖, 🧠, 🦊, 🧩
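
Python's standard `unicodedata` module reports a character's name and general category, but it has no function that returns a character's block. A block lookup therefore needs its own range table. The sketch below is minimal and uses binary search over a subset of the table above, copied by hand. `BLOCKS` and `block_of` are hypothetical names:

```python
import bisect
import unicodedata

# Hand-copied subset of the block table above: (first, last, name), sorted by first.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
FIRSTS = [first for first, _, _ in BLOCKS]


def block_of(ch: str) -> str:
    cp = ord(ch)
    i = bisect.bisect_right(FIRSTS, cp) - 1  # index of the last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "(not in this subset)"


for ch in "éπ→∞☀あ中🌍😀":
    # unicodedata supplies the name and general category; the block comes from BLOCKS.
    print(f"U+{ord(ch):04X}", ch, block_of(ch), unicodedata.category(ch), unicodedata.name(ch))
```

A complete version could parse `Blocks.txt` from the Unicode Character Database. That file lists every block as `first..last; name`.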

Encoding Formats

Encoding	Bytes per code point	Notes
UTF-8	1–4 bytes	U+0000–U+007F = 1 byte (ASCII compatible). U+0080–U+07FF = 2 bytes. U+0800–U+FFFF = 3 bytes. U+10000–U+10FFFF = 4 bytes. Default encoding for the web.
UTF-16	2 or 4 bytes	BMP characters = 2 bytes. Supplementary characters (above U+FFFF) = 4 bytes, stored as a surrogate pair of two 2-byte code units. Used internally by JavaScript strings and the Windows API.
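UTF-32	4 bytes (fixed)	Every code point uses exactly 4 bytes, so indexing by code point is trivial but memory use is high. wchar_t is 32 bits on Linux and macOS, so wide strings there hold UTF-32. CPython 3.3+ does not store every string as UTF-32: it uses 1, 2, or 4 bytes per code point, depending on the widest character in each string (PEP 393).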
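
The following hand-rolled UTF-8 encoder for a single code point makes the byte ranges concrete. The name `utf8_encode` is hypothetical, and the encoder is checked against Python's built-in codec. The sketch also prints the UTF-16 and UTF-32 sizes. It uses the little-endian (`-le`) codec variants so that Python does not prepend a byte order mark:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 using the byte-length ranges in the table."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable in UTF-8")
    if cp <= 0x7F:       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


for ch in ["A", "é", "€", "😀"]:
    cp = ord(ch)
    assert utf8_encode(cp) == ch.encode("utf-8")  # agrees with Python's built-in codec
    # The -le codecs skip the byte order mark that plain "utf-16"/"utf-32" would prepend.
    print(f"U+{cp:04X}", len(ch.encode("utf-8")),
          len(ch.encode("utf-16-le")), len(ch.encode("utf-32-le")))
# U+0041 1 2 4
# U+00E9 2 2 4
# U+20AC 3 2 4
# U+1F600 4 4 4
```

The leading bits of the first byte announce how many bytes follow. This is why a decoder can resynchronize after a corrupted byte.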

Common Symbols Quick Reference

Char	Code Point	HTML Entity	Name
©	`U+00A9`	`&copy;`	Copyright Sign
®	`U+00AE`	`&reg;`	Registered Sign
™	`U+2122`	`&trade;`	Trade Mark Sign
°	`U+00B0`	`&deg;`	Degree Sign
€	`U+20AC`	`&euro;`	Euro Sign
£	`U+00A3`	`&pound;`	Pound Sign
¥	`U+00A5`	`&yen;`	Yen Sign
→	`U+2192`	`&rarr;`	Rightwards Arrow
←	`U+2190`	`&larr;`	Leftwards Arrow
∞	`U+221E`	`&infin;`	Infinity
✓	`U+2713`	`&check;`	Check Mark
✗	`U+2717`	`&cross;`	Ballot X
…	`U+2026`	`&hellip;`	Horizontal Ellipsis
—	`U+2014`	`&mdash;`	Em Dash
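
Any character can be written in HTML as a numeric reference (`&#xHEX;`). Named references, by contrast, exist only for the characters listed in the HTML specification. Python's standard `html` module resolves both kinds. The sketch below looks up the table's names in `html.entities.html5`, which is the standard library's copy of the HTML5 named-reference list:

```python
import html
from html.entities import html5

# Entity names from the table above. Keys in html.entities.html5 include the trailing ';'.
names = ["copy", "reg", "trade", "deg", "euro", "pound", "yen",
         "rarr", "larr", "infin", "check", "cross", "hellip", "mdash"]

for name in names:
    ch = html5[name + ";"]
    # A numeric reference (&#xHEX;) works for every code point, named or not.
    print(f"&{name};", ch, f"U+{ord(ch):04X}", f"&#x{ord(ch):X};")

print(html.unescape("5&deg;C &rarr; 41&deg;F"))  # 5°C → 41°F
print(html.unescape("&#x1F600;"))                 # 😀
```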


Frequently Asked Questions

What is the difference between Unicode and UTF-8?

Unicode is the character standard — it assigns a unique number (code point) to every character. UTF-8 is an encoding — it's the algorithm for converting those code points to bytes for storage and transmission. Other encodings (UTF-16, UTF-32) represent the same Unicode code points differently. "UTF-8" and "Unicode" are often used interchangeably in casual speech, but technically UTF-8 is just one way to encode Unicode.
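
The sketch below shows the distinction concretely. It takes one Python string, lists the string's code points, and encodes it three ways. The code points stay the same and only the bytes change. The big-endian (`-be`) codec variants are used so that Python does not add a byte order mark:

```python
text = "Aé€😀"

# Code points belong to the text itself and do not depend on any encoding.
print([f"U+{ord(ch):04X}" for ch in text])  # ['U+0041', 'U+00E9', 'U+20AC', 'U+1F600']

# Each encoding maps the same code points to a different byte sequence.
for encoding in ["utf-8", "utf-16-be", "utf-32-be"]:
    data = text.encode(encoding)
    print(f"{encoding:<10} {len(data):>2} bytes: {data.hex(' ')}")
    # Decoding with the same encoding recovers exactly the original code points.
    assert data.decode(encoding) == text
```

The same four code points take 10 bytes in UTF-8, 10 in UTF-16, and 16 in UTF-32.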

How many characters does Unicode support?

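Unicode can represent up to 1,114,112 code points (U+0000 to U+10FFFF), which is 17 planes of 65,536 code points each. As of Unicode 15.1 (2023), 149,813 of those positions are assigned to characters. Most of the remaining code points are unassigned and reserved for future versions. The others are permanently set aside as private-use code points, UTF-16 surrogates, or noncharacters.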
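
The 1,114,112 figure is 17 planes of 65,536 code points each. The short Python sketch below reproduces that arithmetic. It also counts the permanently reserved categories from their ranges in the Unicode Standard. The ranges are hard-coded here and are not read from a library:

```python
PLANES = 17
PER_PLANE = 0x10000  # 65,536 code points in each plane

total = PLANES * PER_PLANE
assert total == 0x10FFFF + 1 == 1_114_112

# Code points reserved permanently, which will never receive standard characters.
surrogates = 0xDFFF - 0xD800 + 1                     # U+D800..U+DFFF, used only by UTF-16
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES   # U+FDD0..U+FDEF, plus the last 2 of every plane
private_use = (0xF8FF - 0xE000 + 1) + 2 * (PER_PLANE - 2)  # BMP PUA, plus planes 15-16 minus their noncharacters
print(f"{total:,} {surrogates:,} {noncharacters:,} {private_use:,}")
# 1,114,112 2,048 66 137,468

# Python's chr() enforces the same upper bound.
try:
    chr(0x110000)
except ValueError as err:
    print("rejected:", err)
```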

What is a surrogate pair in UTF-16?

UTF-16 uses 2 bytes per character for the Basic Multilingual Plane (U+0000–U+FFFF). For supplementary characters (emoji, rare CJK, etc.) above U+FFFF, it uses two 2-byte units called a surrogate pair: a high surrogate (U+D800–U+DBFF) followed by a low surrogate (U+DC00–U+DFFF). Together they encode one supplementary code point. This is why JavaScript's string.length can report 2 for a single emoji — it counts UTF-16 code units, not code points.
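
To build the pair, subtract 0x10000 from the code point, which leaves a 20-bit value. Then add the top 10 bits to 0xD800 and the bottom 10 bits to 0xDC00. The following Python sketch converts in both directions, and the function names are hypothetical. It also includes a helper that counts UTF-16 code units the way JavaScript's `length` does. Python's own `len()` counts code points instead:

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {cp:#x}")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_length(s: str) -> int:
    """Count UTF-16 code units: 2 for each code point above U+FFFF, otherwise 1."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)


high, low = to_surrogate_pair(ord("😀"))
print(f"U+{high:04X} U+{low:04X}")                            # U+D83D U+DE00
assert "😀".encode("utf-16-be") == bytes.fromhex("d83dde00")  # matches Python's codec
assert from_surrogate_pair(high, low) == 0x1F600
print(len("😀"), utf16_length("😀"))                           # 1 2
```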