Unicode Encode

Convert text to Unicode escape sequences and related notations: \uXXXX, U+XXXX, HTML entities, and UTF-8 bytes.

What is Unicode Encoding?

Unicode is the universal character encoding standard. It assigns a unique code point (a number) to each character it covers, with the goal of including every writing system in the world. As of Unicode 13.0 it defines more than 140,000 characters across 154 modern and historical scripts, plus symbols and emoji, and later versions add more. In the context of this tool, Unicode encoding means converting characters to their escape sequence representations for use in source code, HTML, CSS, and other technical contexts.

Different programming languages and contexts use different Unicode escape formats. JavaScript and JSON use \uXXXX for code points up to U+FFFF and a \uXXXX\uXXXX surrogate pair for higher code points; JavaScript (ES2015 and later) also accepts \u{XXXXX}. Python uses \uXXXX (four hex digits) or \UXXXXXXXX (eight hex digits). HTML uses &#xXXXX; (hex) or &#NNNNN; (decimal). CSS uses a backslash followed by one to six hex digits, with no "u" prefix.
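As a rough illustration of how these formats relate, the Python sketch below derives several of them from a single character's code point. The helper name escape_formats is hypothetical and is not this tool's implementation. It handles one character at a time and leaves out the JavaScript surrogate-pair form for brevity.

```python
def escape_formats(ch: str) -> dict:
    """Return several common escape notations for a single character.

    Hypothetical helper for illustration only.
    """
    cp = ord(ch)  # the character's Unicode code point as an integer
    return {
        "code_point": f"U+{cp:04X}",
        # Python string literals: \uXXXX up to U+FFFF, \UXXXXXXXX above it
        "python": f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}",
        "html_hex": f"&#x{cp:X};",
        "html_decimal": f"&#{cp};",
        # CSS: backslash plus hex digits; a trailing space may be needed
        # when the next character in the stylesheet is itself a hex digit
        "css": f"\\{cp:X}",
    }


for name, value in escape_formats("©").items():
    print(f"{name}: {value}")
# code_point: U+00A9
# python: \u00A9
# html_hex: &#xA9;
# html_decimal: &#169;
# css: \A9
```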

Unicode encoding is useful when you need to represent non-ASCII characters in ASCII-only source files, embed special characters in JSON, create cross-platform compatible strings, obfuscate text in source code, or when working with systems that don't support full Unicode input.

Frequently Asked Questions

What is a Unicode code point?
A code point is a unique integer assigned to each character in the Unicode standard, written as U+XXXX. For example, the letter A is U+0041, © is U+00A9, ♥ is U+2665, and the smiley emoji 😀 is U+1F600. Code points range from U+0000 to U+10FFFF.
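These values can be checked with Python's built-in ord and chr, which convert between a character and its code point. The helper code_point_label is a hypothetical name used only for this sketch.

```python
def code_point_label(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    # ord() returns the code point; pad to at least four hex digits
    return f"U+{ord(ch):04X}"


for ch in "A©♥😀":
    print(code_point_label(ch))
# U+0041
# U+00A9
# U+2665
# U+1F600

# chr() goes the other way, from code point back to character
assert chr(0x2665) == "♥"
```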
What are surrogate pairs?
JavaScript strings are sequences of 16-bit UTF-16 code units, so a character above U+FFFF (which includes most emoji) takes two code units, called a surrogate pair. For example, 😀 (U+1F600) is written as \uD83D\uDE00 in JavaScript. This tool handles surrogate pairs when encoding emoji and other supplementary characters.
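The arithmetic behind a surrogate pair is short enough to sketch. The Python below follows the standard UTF-16 rule: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The function names are hypothetical, and this is not necessarily how the tool itself does it.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def js_escape(ch: str) -> str:
    """Return a JavaScript-style \\uXXXX escape for one character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    high, low = to_surrogate_pair(cp)
    return f"\\u{high:04X}\\u{low:04X}"


print(js_escape("♥"))   # \u2665
print(js_escape("😀"))  # \uD83D\uDE00
```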
What's the difference between Unicode and UTF-8?
Unicode is the standard that assigns code points to characters. UTF-8 is one encoding of those code points: it maps each one to actual bytes for storage and transmission. UTF-8 uses 1 to 4 bytes per code point and is backward compatible with ASCII, so ASCII text is already valid UTF-8. Most files, databases, and web content today use UTF-8 encoding.
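To see the difference in practice, the Python sketch below encodes a few characters to UTF-8 and prints the resulting bytes. The helper utf8_hex is a hypothetical name for illustration; str.encode and bytes.hex are standard library calls (the separator argument to bytes.hex needs Python 3.8 or later).

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a string as space-separated uppercase hex."""
    return ch.encode("utf-8").hex(" ").upper()


for ch in "A©♥😀":
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {utf8_hex(ch)}")
# U+0041 -> 1 byte(s): 41
# U+00A9 -> 2 byte(s): C2 A9
# U+2665 -> 3 byte(s): E2 99 A5
# U+1F600 -> 4 byte(s): F0 9F 98 80
```

The one-byte result for A is the same value it has in ASCII, which is what backward compatibility means here.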