Unicode

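Unicode, whose character repertoire is kept in sync with the ISO/IEC 10646 Universal Coded Character Set (UCS), is really just another type of character encoding: it’s still a lookup of bits -> characters. The main difference between Unicode and ASCII is the size of the space. Unicode’s widest encoding form, UTF-32, stores each character in 32 bits, which could in principle hold over 4 billion unique values. Not all of that space will ever be used, though. The code space is capped at U+10FFFF (1,114,112 code points) so that every code point can also be represented in UTF-16, and once the 2,048 surrogates and 66 noncharacters are excluded, there will only ever be room for 1,111,998 characters in Unicode.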
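The 1,111,998 figure can be checked with a little arithmetic. The sketch below assumes the standard Unicode ranges (code space U+0000 to U+10FFFF, surrogates U+D800 to U+DFFF, and 66 noncharacters); the variable names are just illustrative.

```python
# Size of the Unicode code space: U+0000 through U+10FFFF.
CODE_SPACE = 0x10FFFF + 1                          # 1,114,112 code points

# Surrogates U+D800..U+DFFF are reserved for UTF-16 and never stand for characters.
SURROGATES = 0xDFFF - 0xD800 + 1                   # 2,048

# Noncharacters: U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes.
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * 17     # 32 + 34 = 66

usable = CODE_SPACE - SURROGATES - NONCHARACTERS
print(f"{usable:,}")  # 1,111,998
```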

But with Unicode, won’t all my documents, emails and web pages take up 4x the amount of space compared with ASCII? Luckily, no. Even though Unicode is a wide character set, it comes with several mechanisms to represent, or encode, its characters. The main ones are the UTF-8 and UTF-16 encoding schemes, both of which take a really smart approach to the size problem: they are variable-width, so common characters use fewer bytes than rare ones.
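To see the size question in practice, here is a small sketch that encodes the same ASCII-only string three ways using Python’s built-in codecs. The sample string is arbitrary, and the little-endian codec variants are used only so that no byte order mark is added to the counts.

```python
text = "Hello, world!"  # 13 characters, all of them ASCII

# Encode the same text with three Unicode encoding schemes.
utf8_size = len(text.encode("utf-8"))       # 1 byte per ASCII character
utf16_size = len(text.encode("utf-16-le"))  # 2 bytes per character here
utf32_size = len(text.encode("utf-32-le"))  # always 4 bytes per character

print(utf8_size, utf16_size, utf32_size)  # 13 26 52
```

Only the fixed-width UTF-32 form pays the full 4x cost; for this text UTF-8 is exactly as compact as ASCII.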

Unicode encoding schemes like UTF-8 are more efficient in how they use their bits. With UTF-8, if a character can be represented with 1 byte, that’s all it will use: every ASCII character takes exactly 1 byte, and all other characters take 2, 3 or 4 bytes.
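As a rough illustration, the sketch below encodes four sample characters with UTF-8 and reports how many bytes each one needs. The helper name utf8_width and the choice of samples are assumptions made for this example, not anything defined by Unicode.

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# One sample for each possible UTF-8 width, written as escapes to avoid ambiguity.
samples = [
    "A",           # U+0041, plain ASCII
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, grinning face emoji
]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_width(ch)} byte(s) -> {encoded.hex(' ')}")

# U+0041: 1 byte(s) -> 41
# U+00E9: 2 byte(s) -> c3 a9
# U+20AC: 3 byte(s) -> e2 82 ac
# U+1F600: 4 byte(s) -> f0 9f 98 80
```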

Unicode Code Points