Code points and UTF

Very simply put, the main part of the Unicode standard is a giant table that assigns a number to every character, whether it is a letter, a punctuation mark, a diacritic and so on. These numbers are called code points, and a Unicode code point is normally written as U+ followed by its number in hexadecimal. For example, U+0050 refers to LATIN CAPITAL LETTER P, and U+00A9 is the COPYRIGHT SIGN.
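As a small illustration of the U+ notation, here is a sketch in Python (chosen only for the example; nothing in this tutorial depends on it). The helper name describe is hypothetical. It looks up the code point of a character with the built-in ord and its official name with the standard unicodedata module.

```python
import unicodedata


def describe(ch):
    """Return the U+ notation and the Unicode name of a single character.

    `describe` is a hypothetical helper written for this illustration.
    """
    code_point = ord(ch)  # the number assigned to the character
    # Format as at least four uppercase hexadecimal digits, as in U+0050.
    return f"U+{code_point:04X} {unicodedata.name(ch)}"


print(describe("P"))       # U+0050 LATIN CAPITAL LETTER P
print(describe("\u00a9"))  # U+00A9 COPYRIGHT SIGN
```

Going the other way, chr(0x50) returns the character for a given code point.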

By itself, however, this giant table of code points has nothing to do with programming or computers. To actually use Unicode in your programs, you have to deal with Unicode Transformation Format (UTF) encodings, i.e. the rules by which code points are translated into sequences of bits. UTF-8 is the dominant (and preferred) character encoding on the World Wide Web, and in this tutorial we'll be dealing only with this representation of the Unicode standard.
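To make the step from code point to bits concrete, here is a minimal Python sketch (again, Python is only an assumption made for the example). The helper name utf8_bits is hypothetical. It encodes one character with the standard str.encode method and prints each resulting byte in binary.

```python
def utf8_bits(ch):
    """Return the UTF-8 encoding of a character as space-separated 8-bit groups.

    `utf8_bits` is a hypothetical helper written for this illustration.
    """
    encoded = ch.encode("utf-8")  # a bytes object, one or more bytes long
    return " ".join(f"{byte:08b}" for byte in encoded)


print(utf8_bits("P"))       # 01010000
print(utf8_bits("\u00a9"))  # 11000010 10101001
```

Note that U+0050 fits into a single byte, while U+00A9 takes two: UTF-8 is a variable-length encoding, so the number of bytes depends on the code point.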