Introduction

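Unicode defines 1,114,112 (0x110000) code points, numbered U+0000 through U+10FFFF. A code point is just a number; think of each one as a character, although the correspondence isn't always exact. UTF-32 is one way to transform a code point into a byte sequence. It's the simplest UTF encoding: every code point is stored as a single 32-bit (4-byte) integer, so the encoding is fixed-width. The other commonly used UTF formats are UTF-8 and UTF-16.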
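Because UTF-32 is fixed-width, encoding a code point amounts to writing its number as a 32-bit integer in a chosen byte order (big-endian or little-endian). The sketch below does this by hand with `ByteBuffer` and cross-checks the result against the JDK's `UTF-32LE` charset. `Utf32Sketch` and `encodeUtf32` are hypothetical names chosen for illustration, not part of any library, and the helper rejects surrogates (U+D800 to U+DFFF) because those are not valid in UTF-32.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf32Sketch {
  // Hypothetical helper: encodes one code point as UTF-32 by writing its
  // value as a 32-bit integer in the requested byte order.
  static byte[] encodeUtf32(int codePoint, ByteOrder order) {
    // Valid input: 0..0x10FFFF, excluding the surrogate range 0xD800..0xDFFF.
    if (!Character.isValidCodePoint(codePoint)
        || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
      throw new IllegalArgumentException("not encodable as UTF-32: " + codePoint);
    }
    return ByteBuffer.allocate(4).order(order).putInt(codePoint).array();
  }

  public static void main(String[] args) {
    int codePoint = 0x1D538; // mathematical double-struck capital A

    byte[] little = encodeUtf32(codePoint, ByteOrder.LITTLE_ENDIAN);
    byte[] big = encodeUtf32(codePoint, ByteOrder.BIG_ENDIAN);
    System.out.println(Arrays.toString(little)); // [56, -43, 1, 0]
    System.out.println(Arrays.toString(big));    // [0, 1, -43, 56]

    // Cross-check the hand-written encoding against the JDK's UTF-32LE encoder.
    String s = new String(Character.toChars(codePoint));
    byte[] jdk = s.getBytes(Charset.forName("UTF-32LE"));
    System.out.println(Arrays.equals(little, jdk)); // true
  }
}
```

Java prints bytes as signed values, so 0xD5 shows up as -43; the little-endian result is the same `0x38, 0xD5, 0x01, 0x00` sequence used in the example code.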

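Note that every UTF format can encode every Unicode scalar value (all code points except the surrogates U+D800 to U+DFFF, which are reserved for UTF-16's internal use). The formats differ only in how they represent those values as bytes.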
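To make "different representations" concrete, the sketch below encodes a few sample code points with UTF-8, UTF-16BE and UTF-32BE and prints how many bytes each needs; it also checks that each encoding decodes back to the original text. The class and method names are hypothetical. The big-endian UTF-16 and UTF-32 variants are used because Java's plain `UTF-16` encoder writes a byte order mark, which would inflate the counts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UtfLengths {
  // Number of bytes `text` occupies when encoded with `charset`.
  static int encodedLength(String text, Charset charset) {
    return text.getBytes(charset).length;
  }

  // True if encoding and then decoding with `charset` gives back the same text.
  static boolean roundTrips(String text, Charset charset) {
    return new String(text.getBytes(charset), charset).equals(text);
  }

  public static void main(String[] args) {
    Charset[] charsets = {
        StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, Charset.forName("UTF-32BE")
    };
    // Latin A, e with acute accent, euro sign, mathematical double-struck capital A
    int[] codePoints = { 0x41, 0xE9, 0x20AC, 0x1D538 };

    boolean allRoundTrip = true;
    for (int cp : codePoints) {
      String s = new String(Character.toChars(cp));
      StringBuilder line = new StringBuilder(String.format("U+%04X:", cp));
      for (Charset cs : charsets) {
        line.append(String.format(" %s=%d", cs.name(), encodedLength(s, cs)));
        allRoundTrip &= roundTrips(s, cs);
      }
      System.out.println(line);
    }
    System.out.println("every sample round-trips: " + allRoundTrip);
  }
}
```

The byte counts for UTF-8 grow from 1 to 4 and UTF-16 uses 2 or 4, while UTF-32 always uses 4; for example, the euro sign line reads `U+20AC: UTF-8=3 UTF-16BE=2 UTF-32BE=4`.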

Notes

Example Code

import java.nio.*;
import java.nio.charset.*;

public class UnicodeTest {
  public static void main(String[] args) {
    // Decoder for UTF-32 in little-endian byte order (least significant byte first)
    CharsetDecoder decoder = Charset.forName("UTF-32LE").newDecoder();

    // Code point: 120120/0x1D538 (mathematical double-struck capital A)
    ByteBuffer bytes = ByteBuffer.wrap(new byte[] {
        (byte)0x38, (byte)0xD5, (byte)0x01, (byte)0x00
      });
    String decoded;

    try {
      decoded = decoder.decode(bytes).toString();
    }
    catch (CharacterCodingException e) {
      throw new RuntimeException(e);
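    }

    // Java strings are UTF-16, so this one code point is stored as two chars
    // (a surrogate pair): length() is 2 but codePointCount() is 1.
    System.out.println(Integer.toHexString(decoded.codePointAt(0))); // 1d538
    System.out.println(decoded.length());                            // 2
    System.out.println(decoded.codePointCount(0, decoded.length())); // 1
  }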
}
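Since there are only 0x110000 code points, a 32-bit value of 0x110000 or more is not valid UTF-32. The sketch below, assuming the JDK's `UTF-32LE` decoder behaves as its default `CodingErrorAction.REPORT` setting suggests, shows the largest code point decoding successfully and the next value being rejected as malformed input. `Utf32Validation` and `tryDecodeUtf32LE` are hypothetical names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class Utf32Validation {
  // Hypothetical helper: decodes bytes as UTF-32LE, returning null if the
  // decoder reports them as malformed.
  static String tryDecodeUtf32LE(byte[] bytes) {
    try {
      // A fresh decoder reports malformed input by default, so invalid
      // values surface as a CharacterCodingException.
      return Charset.forName("UTF-32LE").newDecoder()
          .decode(ByteBuffer.wrap(bytes)).toString();
    } catch (CharacterCodingException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // 0x0010FFFF (little endian): the largest code point, decodes fine
    byte[] max = { (byte) 0xFF, (byte) 0xFF, 0x10, 0x00 };
    // 0x00110000 (little endian): one past the last code point
    byte[] tooBig = { 0x00, 0x00, 0x11, 0x00 };

    System.out.println(tryDecodeUtf32LE(max) != null);    // true
    System.out.println(tryDecodeUtf32LE(tooBig) == null); // true
  }
}
```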