Unicode Basics

Unicode (or the "Unicode standard") is basically "just" a big list of characters. The standard assigns each character a unique number, the so-called code point.

The number of a code point is usually written in hexadecimal notation with at least four digits and prefixed with U+. For example, U+000A (padded with zeros to four digits) for code point 10, or U+1D538 (more than four digits) for 120,120.
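
As a small illustration of this notation, the following Python sketch formats a code point number in the U+ style. The helper name u_plus is made up for this example and is not part of any library.

```python
def u_plus(code_point: int) -> str:
    """Format a code point as U+XXXX, zero-padded to at least four hex digits.

    Hypothetical helper for illustration only.
    """
    return f"U+{code_point:04X}"


print(u_plus(10))        # U+000A
print(u_plus(120120))    # U+1D538
print(u_plus(ord("A")))  # U+0041
```

Going the other way, int("1D538", 16) turns the hex digits back into the number 120120.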

The characters are grouped into blocks (such as "Basic Latin" or "Greek and Coptic"). Each block belongs to a plane.

Historically there was only one plane: the Basic Multilingual Plane (BMP, plane 0). The code points 0x0000 to 0xFFFF belong to this plane. Later, additional planes (1 - 16) were added. These planes are called the Supplementary Planes (or sometimes also "Astral Planes"). Each of them again contains 0x10000 (65,536) code points.

The plane of each code point is identified by the hex digits in front of the last four, i.e. by the value of the code point's third byte (counting from the least significant byte). So, for example, all code points from plane 3 have the form U+3xxxx, while all code points from plane 16 have the form U+10xxxx. For plane 0 (BMP) the plane number is omitted (i.e. U+xxxx).
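
Numerically, the plane is simply the code point shifted right by 16 bits. A minimal sketch in Python (the function name plane_of is an assumption chosen for this example):

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) of a code point.

    Hypothetical helper: the low 16 bits are the position inside the
    plane, everything above them is the plane number.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point >> 16


print(plane_of(0x0041))    # 0  (BMP)
print(plane_of(0x1D538))   # 1
print(plane_of(0x10FFFF))  # 16
```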

The highest possible code point (as defined by the Unicode standard) is 0x10FFFF (1,114,111). However, not all numbers are in use. Some facts:

In each plane, only the code points up to xxFFFD can be assigned to characters (65,534 per plane, including xx0000). The last two code points of every plane (U+FFFE and U+FFFF in the BMP) are never used for characters. So with 17 planes, up to 1,114,078 code points can be defined.
Out of the 1,114,078 possible code points, only 246,917 (22.2%) are defined.
Out of 17 planes, only 7 are in use (0 - 3, 14 - 16).
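
The arithmetic behind these figures can be checked with a few lines of Python. The constant names are made up for this sketch, and the count of 246,917 defined code points is taken from the text above, not computed.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000   # 65,536
UNUSED_PER_PLANE = 2              # the last two code points of each plane

# Highest code point: 17 planes of 65,536 code points, counting from 0.
highest = PLANES * CODE_POINTS_PER_PLANE - 1
print(hex(highest), highest)      # 0x10ffff 1114111

# Code points that can be defined once the two unused ones per plane are excluded.
definable = PLANES * (CODE_POINTS_PER_PLANE - UNUSED_PER_PLANE)
print(definable)                  # 1114078

# Share of defined code points, using the figure quoted in the text.
defined = 246_917
print(f"{defined / definable:.1%}")  # 22.2%
```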

The terms UTF-8 and UTF-16 (among others) are often used together with Unicode. How do they relate? They are encodings for code points, i.e. each one defines an algorithm for transforming a code point into a series of bytes. Note that each of these encodings can encode every Unicode character. They just differ in the way they do this.

UTF-32 is the easiest encoding but also requires the most space. Each code point is simply stored as a 4-byte integer.
UTF-16 has the oldest roots: it extends the original fixed-width 2-byte encoding (UCS-2). It stores each code point as either one or two 2-byte integers. Code points from plane 0 (BMP) are stored with two bytes. Code points from all other planes are stored as a so-called surrogate pair with 4 bytes (2 x 2 bytes).
UTF-8 is the most complex encoding but usually requires the least space, especially for text that is mostly ASCII. Each code point is stored as a series of one to four bytes.
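
To make the size differences concrete, the sketch below encodes three characters with Python's built-in codecs and also computes a UTF-16 surrogate pair by hand. The function name surrogate_pair is an assumption for this example. The "-be" codec variants are used so that no byte order mark is added to the output.

```python
def surrogate_pair(code_point: int):
    """Split a supplementary-plane code point into its UTF-16 surrogate pair.

    Hypothetical helper: subtract 0x10000, then put the upper 10 bits
    into the high surrogate and the lower 10 bits into the low surrogate.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP have a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return high, low


# One character from the ASCII range, one from the rest of the BMP,
# and one from plane 1.
for ch in ["A", "\u20ac", "\U0001D538"]:
    sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}
    print(f"U+{ord(ch):04X}", sizes)
# U+0041 {'utf-8': 1, 'utf-16-be': 2, 'utf-32-be': 4}
# U+20AC {'utf-8': 3, 'utf-16-be': 2, 'utf-32-be': 4}
# U+1D538 {'utf-8': 4, 'utf-16-be': 4, 'utf-32-be': 4}

high, low = surrogate_pair(0x1D538)
print(f"{high:04X} {low:04X}")                 # D835 DD38
print("\U0001D538".encode("utf-16-be").hex())  # d835dd38
```

Note that U+20AC takes three bytes in UTF-8 but only two in UTF-16, so which encoding is smallest depends on the text being encoded.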