Unicode & ASCII Lookup
Look up any character by typing it or by entering its code point (U+1F600), decimal value (128512), or hex value (0x41).
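As a rough illustration of how the four input forms could be told apart, here is a minimal Python sketch. The function name `parse_query` and its precedence rules (a single character is looked up as itself, so "7" means the digit seven rather than decimal 7) are assumptions made for this example, not a description of how the lookup tool itself works.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def parse_query(query: str) -> int:
    """Return the code point a lookup query refers to (hypothetical helper)."""
    # A single character is looked up as itself (assumed precedence rule).
    if len(query) == 1:
        return ord(query)

    text = query.strip()
    match = re.fullmatch(r"[Uu]\+([0-9A-Fa-f]{1,6})", text)
    if match:
        # U+XXXX notation: hex digits after the "U+" prefix.
        code_point = int(match.group(1), 16)
    elif text.lower().startswith("0x"):
        # Hex notation such as 0x41.
        code_point = int(text, 16)
    elif text.isdecimal():
        # Plain decimal such as 128512.
        code_point = int(text)
    else:
        raise ValueError(f"unrecognized query: {query!r}")

    if code_point > MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {query!r}")
    return code_point


print(chr(parse_query("U+1F600")))  # the character for that code point
print(parse_query("0x41"))
```

All three notations for the same character resolve to the same integer, which is what makes a single lookup table possible.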
Unicode is a universal character encoding standard that assigns a unique number (a code point) to every character it covers, across the world's writing systems. There are over 140,000 characters in Unicode, covering modern and historic scripts, symbols, emoji, and more. Code points are written as U+XXXX, where XXXX is the value in hexadecimal, padded to at least four digits (for example, U+0041 for 'A').
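To make the notation concrete, the following Python sketch formats a character's code point in U+XXXX form and looks up its official name with the standard `unicodedata` module. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format one character as 'U+XXXX NAME' (hypothetical helper)."""
    code_point = ord(ch)
    # Pad to at least four hex digits; longer values keep all their digits.
    notation = f"U+{code_point:04X}"
    # Some code points (e.g. control characters) have no name, so pass a default.
    name = unicodedata.name(ch, "<unnamed>")
    return f"{notation} {name}"


for ch in ["A", "©", "😀"]:
    print(describe(ch))
```

The names returned depend on the Unicode version bundled with the Python interpreter, so very recently added characters may come back as unnamed on older installations.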
UTF-8, UTF-16, and UTF-32 are different ways of encoding Unicode code points as bytes. UTF-8 is the dominant encoding on the web and is backward-compatible with ASCII for the first 128 characters.
UTF-8 uses 1–4 bytes per code point and is backward-compatible with ASCII. It's the standard for web content. UTF-16 uses 2 or 4 bytes per code point and is used internally by JavaScript, Java, and Windows. The two differ even for ASCII characters: 'A' is the single byte 0x41 in UTF-8 but the two-byte unit 0x0041 in UTF-16.
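The byte-level difference is easy to inspect with Python's built-in codecs. This is a small sketch; the helper name `encoded_hex` is an assumption for the example, and the big-endian codec variants are used so that no byte order mark is prepended to the output.

```python
def encoded_hex(ch: str, encoding: str) -> str:
    """Return the encoded bytes of ch as space-separated hex (hypothetical helper)."""
    return " ".join(f"{byte:02X}" for byte in ch.encode(encoding))


for ch in ["A", "©", "😀"]:
    print(
        ch,
        "| UTF-8:", encoded_hex(ch, "utf-8"),
        "| UTF-16:", encoded_hex(ch, "utf-16-be"),
        "| UTF-32:", encoded_hex(ch, "utf-32-be"),
    )
```

For 'A' this shows one byte in UTF-8 against two in UTF-16, while the emoji takes four bytes in all three encodings (as a surrogate pair in UTF-16).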
In JavaScript, write characters in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF) with the \uXXXX escape syntax, which takes exactly four hex digits: for example, \u00A9 for ©. For characters above U+FFFF (like emoji), use the ES6 \u{XXXXX} syntax: for example, \u{1F600} for 😀. The lookup tool provides the correct escape for each character.
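The rule for picking an escape form can be sketched in a few lines. The example below is written in Python and only generates the JavaScript escape text; the function name `js_escape` is hypothetical, and this is not necessarily how the lookup tool produces its output.

```python
def js_escape(ch: str) -> str:
    """Return a JavaScript string escape for one character (hypothetical helper)."""
    code_point = ord(ch)
    if code_point <= 0xFFFF:
        # BMP: the \uXXXX form takes exactly four hex digits.
        return f"\\u{code_point:04X}"
    # Above the BMP: the ES6 \u{...} form takes the full code point.
    return f"\\u{{{code_point:X}}}"


print(js_escape("©"))
print(js_escape("😀"))
```

The braces form is also valid for BMP characters in ES6 and later, so a generator could emit it everywhere; the four-digit form is kept here for BMP characters because it matches the convention described above.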
HTML entities are special codes for representing characters in HTML. They start with & and end with ;. You can use numeric forms like &#169; (decimal) or &#xA9; (hex) for any Unicode character, or named entities like &copy; for common symbols.
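The three entity forms can be generated and checked with Python's standard `html` module. In this sketch the helper name `html_entities` is an assumption; `html.unescape` and `html.entities.codepoint2name` are the standard-library pieces it relies on.

```python
import html
from html.entities import codepoint2name


def html_entities(ch: str) -> dict:
    """Return the decimal, hex, and (if any) named entity for ch (hypothetical helper)."""
    code_point = ord(ch)
    # Only some characters have a named entity in this table; others get None.
    name = codepoint2name.get(code_point)
    return {
        "decimal": f"&#{code_point};",
        "hex": f"&#x{code_point:X};",
        "named": f"&{name};" if name else None,
    }


forms = html_entities("©")
print(forms)

# Every form that exists should decode back to the same character.
for entity in forms.values():
    if entity is not None:
        assert html.unescape(entity) == "©"
```

Numeric forms work for any code point, which is why they are the fallback when no named entity exists.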