Unicode defines 1,114,112 (0x110000) code points (think of them as characters). UTF-8 is one way to transform a code point (i.e. a number) into a byte sequence. For text that consists mostly of ASCII characters it is the most compact UTF encoding, but it is also the most complex one. The other commonly used UTF formats are UTF-16 and UTF-32.
Note that each UTF format can encode/transform all code points. They just provide different representations.
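To make the "different representations" point concrete, here is a small sketch that encodes the same code point (U+20AC, the euro sign) in three UTF formats using the Java standard library. The class and helper names (`UtfFormatsDemo`, `toHex`, `encode`) are made up for this illustration, and the sketch assumes the runtime provides a `UTF-32BE` charset, which is not among the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UtfFormatsDemo {
  // Formats a byte array as space-separated uppercase hex, e.g. "E2 82 AC".
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  // Encodes a single code point with the given charset.
  static byte[] encode(int codePoint, Charset charset) {
    return new String(Character.toChars(codePoint)).getBytes(charset);
  }

  public static void main(String[] args) {
    int euro = 0x20AC;
    System.out.println("UTF-8:    " + toHex(encode(euro, StandardCharsets.UTF_8)));
    System.out.println("UTF-16BE: " + toHex(encode(euro, StandardCharsets.UTF_16BE)));
    // Assumption: the UTF-32BE charset is available on this runtime.
    System.out.println("UTF-32BE: " + toHex(encode(euro, Charset.forName("UTF-32BE"))));
  }
}
```

The same code point needs three bytes in UTF-8, two in UTF-16 and four in UTF-32.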
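Besides standard-conforming UTF-8 there are two (unofficial) variants:
- CESU-8: encodes code points above U+FFFF as two 3-byte sequences, one for each UTF-16 surrogate.
- Modified UTF-8: encodes the null character (U+0000) as the two bytes C0 80 instead of 00. Used by Java and Tcl.

Note that these variants shouldn't be used to exchange data.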
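The C0 80 encoding of the null character can be observed from Java itself: `DataOutputStream.writeUTF` writes a 2-byte length followed by the string in modified UTF-8. The following sketch compares that with standard UTF-8; the class name `ModifiedUtf8Demo` and the helper `modifiedUtf8` are hypothetical names used only here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
  // Encodes a string with DataOutputStream.writeUTF and drops the
  // 2-byte length prefix, leaving only the modified UTF-8 payload.
  static byte[] modifiedUtf8(String s) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DataOutputStream data = new DataOutputStream(out)) {
      data.writeUTF(s);
    } catch (IOException e) {
      // Not expected for an in-memory stream.
      throw new UncheckedIOException(e);
    }
    byte[] withLength = out.toByteArray();
    return Arrays.copyOfRange(withLength, 2, withLength.length);
  }

  public static void main(String[] args) {
    String nul = "\0";
    // Standard UTF-8: a single 00 byte.
    System.out.println(Arrays.toString(nul.getBytes(StandardCharsets.UTF_8)));
    // Modified UTF-8: the two bytes C0 80 (printed as signed bytes).
    System.out.println(Arrays.toString(modifiedUtf8(nul)));
  }
}
```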
UTF-8 has an optional Byte Order Mark (BOM). If used, it must be placed at the very beginning of the byte sequence.
The BOM is EF BB BF.
Note that UTF-8 is independent of endianness (i.e. little endian or big endian), so the BOM only acts as a signature that marks the data as UTF-8.
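Since the BOM is optional, code that reads UTF-8 often has to check for it and skip it. A minimal sketch of such a check follows; `BomDemo`, `startsWithBom` and `stripBom` are illustrative names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomDemo {
  // Returns true if the data begins with the UTF-8 BOM (EF BB BF).
  static boolean startsWithBom(byte[] data) {
    return data.length >= 3
        && (data[0] & 0xFF) == 0xEF
        && (data[1] & 0xFF) == 0xBB
        && (data[2] & 0xFF) == 0xBF;
  }

  // Returns the data without a leading BOM; data without a BOM is returned unchanged.
  static byte[] stripBom(byte[] data) {
    return startsWithBom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
  }

  public static void main(String[] args) {
    byte[] withBom = { (byte)0xEF, (byte)0xBB, (byte)0xBF, 'h', 'i' };
    System.out.println(new String(stripBom(withBom), StandardCharsets.UTF_8));
  }
}
```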
The design of UTF‑8 is most easily seen in the following table. The xs are replaced by the bits of the
code point:
| Bits | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
|---|---|---|---|---|---|
| 7 | U+007F (127) | 0xxxxxxx | | | |
| 11 | U+07FF (2,047) | 110xxxxx | 10xxxxxx | | |
| 16 | U+FFFF (65,535) | 1110xxxx | 10xxxxxx | 10xxxxxx | |
| 21 | U+1FFFFF (2,097,151) | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |
Explanation:
- The leading byte of a multi-byte sequence has two or more high-order 1s, while continuation bytes all have 10 in the high-order position.
- The number of high-order 1s in the leading byte of a multi-byte sequence indicates the number of bytes in the sequence (including the leading byte), so that the length of the sequence can be determined without examining the continuation bytes.
- The xs in the table above are filled with the bits of the code point, starting with the lowest-order bits in the rightmost byte.
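The rules above can be sketched as a small decoder: read the sequence length from the leading byte, then collect six bits from each continuation byte. This is only an illustration under simplifying assumptions. It does not reject overlong forms, surrogates or values above U+10FFFF, and the names `Utf8LeadByteDemo`, `sequenceLength` and `decodeAt` are invented for this example.

```java
class Utf8LeadByteDemo {
  // Returns the total number of bytes in the sequence that starts with
  // leadByte, or -1 if leadByte is not a valid leading byte.
  static int sequenceLength(byte leadByte) {
    int b = leadByte & 0xFF;
    if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
    if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
    return -1;                        // 10xxxxxx (continuation byte) or invalid
  }

  // Decodes the code point whose sequence starts at the given offset.
  static int decodeAt(byte[] bytes, int offset) {
    int length = sequenceLength(bytes[offset]);
    if (length < 1 || offset + length > bytes.length) {
      throw new IllegalArgumentException("Invalid or truncated sequence");
    }
    if (length == 1) {
      return bytes[offset];
    }
    // Payload bits of the leading byte: 5, 4 or 3 bits for lengths 2, 3 or 4.
    int codePoint = bytes[offset] & (0xFF >> (length + 1));
    for (int i = 1; i < length; i++) {
      int b = bytes[offset + i] & 0xFF;
      if ((b & 0xC0) != 0x80) {
        throw new IllegalArgumentException("Expected a continuation byte");
      }
      // Append the 6 payload bits of this continuation byte.
      codePoint = (codePoint << 6) | (b & 0x3F);
    }
    return codePoint;
  }

  public static void main(String[] args) {
    byte[] euro = { (byte)0xE2, (byte)0x82, (byte)0xAC };
    System.out.println(sequenceLength(euro[0]));            // length read from the leading byte
    System.out.println(Integer.toHexString(decodeAt(euro, 0)));
  }
}
```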
The following table shows some examples:
| Character | Code point | Binary code point | Binary UTF-8 | Hexadecimal UTF-8 |
|---|---|---|---|---|
| $ | U+0024 | 00100100 | 00100100 | 24 |
| ¢ | U+00A2 | 00000000 10100010 | 11000010 10100010 | C2 A2 |
| € | U+20AC | 00100000 10101100 | 11100010 10000010 10101100 | E2 82 AC |
| 𤭢 | U+24B62 | 00000010 01001011 01100010 | 11110000 10100100 10101101 10100010 | F0 A4 AD A2 |
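As a worked example, the € row can be reproduced by hand: a 16-bit code point is split into 4 + 6 + 6 bits, and each group gets its marker bits. The sketch below only handles the 3-byte case (U+0800 to U+FFFF); `EuroBitsDemo`, `bits8` and `encodeThreeBytes` are names chosen for this illustration.

```java
class EuroBitsDemo {
  // Returns the low 8 bits of value as a zero-padded binary string.
  static String bits8(int value) {
    String s = Integer.toBinaryString(value & 0xFF);
    return "00000000".substring(s.length()) + s;
  }

  // Splits a code point in the 3-byte range into its three UTF-8 bytes.
  static int[] encodeThreeBytes(int codePoint) {
    return new int[] {
      0xE0 | (codePoint >>> 12),         // 1110xxxx: top 4 bits
      0x80 | ((codePoint >>> 6) & 0x3F), // 10xxxxxx: middle 6 bits
      0x80 | (codePoint & 0x3F)          // 10xxxxxx: low 6 bits
    };
  }

  public static void main(String[] args) {
    for (int b : encodeThreeBytes(0x20AC)) {
      System.out.println(bits8(b) + " = " + String.format("%02X", b));
    }
  }
}
```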
Remarks:
Java code for converting a code point into UTF-8:
private static final int CONTINUATION_BYTE_MARKER =  0x80; // 10xxxxxx
private static final int SIX_BIT_MASK = 0x3F; // 00111111
// NOTE: Since most programming languages provide their own UTF-8 encoding facilities, this
//   method isn't optimized for speed. Instead its implementation focuses on being easy
//   to understand.
public static byte[] encodeUTF8(int codePoint) {
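  if (codePoint < 0) {
    // Negative values are not code points; without this check they would be
    // returned as a single (wrong) byte below.
    throw new IllegalArgumentException("Invalid code point: " + codePoint);
  }
  if (codePoint <= 127) {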
    // MSB is 0 - single byte
    return new byte[] { (byte)codePoint };
  }
  // multi byte sequence
  // NOTE: In November 2003 UTF-8 was restricted by RFC 3629 to end at U+10FFFF, in order to
  //   match the constraints of the UTF-16 character encoding. This removed all 5- and
  //   6-byte sequences.
  if (codePoint > 0x10FFFF) {
    throw new IllegalArgumentException("Invalid code point: " + codePoint);
  }
  byte[] bytes = new byte[4];
  int byteCount = 0;
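  // Largest value that fits into the payload bits of a 2-byte leading byte.
  // NOTE: This must be 0x1F, not "1 << 5" (0x20); otherwise boundary values such as
  //   U+0800 or U+10000 would be squeezed into a leading byte that is too small.
  int leadingByteMask = 0x1F; // 00011111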
  while (true) {
    // Extract the first (= low order, right most) 6 bits from the code point and create a
    // continuation byte with them.
    byte curByte = (byte)((codePoint & SIX_BIT_MASK) | CONTINUATION_BYTE_MARKER);
    bytes[byteCount] = curByte;
    // Remove the 6 bits we just encoded.
    // NOTE: Use ">>>" (shift zeros into the left most position)
    codePoint = codePoint >>> 6;
    byteCount++;
    if (codePoint <= leadingByteMask) {
      // Remaining bits fit into the leading byte
      // Calculate most significant bits:
      //  1. A "1" for each byte used (including the leading byte)
      //  2. Followed by a "0"
      int msbs;
      switch (byteCount) { // number of continuation bytes
      case 1:
        msbs = 0xC0; // 110xxxxx
        break;
      case 2:
        msbs = 0xE0; // 1110xxxx
        break;
      case 3:
        msbs = 0xF0; // 11110xxx
        break;
      default:
        // Continuation bytes are limited to 3 (see "invalid code point" exception above).
        throw new IllegalStateException();
      }
      curByte = (byte)(msbs | codePoint);
      bytes[byteCount] = curByte;
      byteCount++;
      break;
    }
    else {
      // We need another continuation byte
      leadingByteMask = leadingByteMask >>> 1;
    }
  }
  // NOTE: Bytes are in reversed order. Make it correct.
  switch (byteCount) {
  case 2:
    return new byte[] { bytes[1], bytes[0] };
  case 3:
    return new byte[] { bytes[2], bytes[1], bytes[0] };
  case 4:
    return new byte[] { bytes[3], bytes[2], bytes[1], bytes[0] };
  default:
    // Byte count is limited to 4 (see "invalid code point" exception above).
    throw new IllegalStateException();
  }
}
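For comparison, here is a sketch of the same conversion using the standard library, with one extra check. RFC 3629 also prohibits encoding the surrogate range U+D800 to U+DFFF, which the method above does not test for. The names `CodePointCheckDemo`, `isEncodable` and `encodeWithLibrary` are hypothetical and exist only in this example.

```java
import java.nio.charset.StandardCharsets;

class CodePointCheckDemo {
  // Hypothetical helper: true if codePoint lies within U+0000..U+10FFFF
  // and is not a surrogate (U+D800..U+DFFF).
  static boolean isEncodable(int codePoint) {
    return Character.isValidCodePoint(codePoint)
        && (codePoint < Character.MIN_SURROGATE || codePoint > Character.MAX_SURROGATE);
  }

  // Encodes one code point with the JDK's own UTF-8 encoder.
  static byte[] encodeWithLibrary(int codePoint) {
    if (!isEncodable(codePoint)) {
      throw new IllegalArgumentException("Invalid code point: " + codePoint);
    }
    return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    for (byte b : encodeWithLibrary(0x24B62)) {
      System.out.print(String.format("%02X ", b & 0xFF));
    }
    System.out.println();
  }
}
```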
Java code for decoding a UTF-8 byte sequence with the standard library:
import java.nio.*;
import java.nio.charset.*;
public class UnicodeTest {
  public static void main(String[] args) {
    CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
    // Code point: 120120 (mathematical double-struck capital A)
    ByteBuffer bytes = ByteBuffer.wrap(new byte[] {
        (byte)0xF0, (byte)0x9D, (byte)0x94, (byte)0xB8
      });
    String decoded;
    try {
      decoded = decoder.decode(bytes).toString();
    }
    catch (CharacterCodingException e) {
      throw new RuntimeException(e);
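    }
    // Print the decoded string together with its (single) code point.
    System.out.println(decoded + " -> " + decoded.codePointAt(0));
  }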
}
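A decoder created with `newDecoder()` reports malformed input by throwing a `CharacterCodingException`, which makes it usable as a well-formedness check. The sketch below relies on that behaviour to reject a truncated sequence and the non-standard C0 80 form; `StrictDecodeDemo` and `isWellFormed` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {
  // Returns true if the bytes form well-formed standard UTF-8.
  static boolean isWellFormed(byte[] bytes) {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        // REPORT is already the default; set explicitly for clarity.
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      decoder.decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] valid = { (byte)0xF0, (byte)0x9D, (byte)0x94, (byte)0xB8 };
    byte[] truncated = { (byte)0xE2, (byte)0x82 };
    byte[] overlongNull = { (byte)0xC0, (byte)0x80 };
    System.out.println(isWellFormed(valid));
    System.out.println(isWellFormed(truncated));
    System.out.println(isWellFormed(overlongNull));
  }
}
```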