Introduction

Unicode defines a code space of 1,114,112 (0x110000) code points (think of them as characters). UTF-32 is one way to transform a code point (i.e. a number) into a byte sequence. It's the simplest UTF encoding. The other commonly used UTF formats are:

UTF-8
UTF-16

Note that each UTF format can encode/transform all code points. They just provide different representations.
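To make the "different representations" point concrete, here is a minimal sketch that counts how many bytes the same code point occupies in each format. The class name EncodedLengths and the helper byteLength are made up for this illustration. The big endian charset variants are used so that no BOM is added to the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
  // Hypothetical helper: number of bytes the code point occupies in the given charset.
  static int byteLength(int codePoint, Charset charset) {
    String s = new String(Character.toChars(codePoint));
    return s.getBytes(charset).length;
  }

  public static void main(String[] args) {
    // UTF-32 is not part of StandardCharsets, so it is looked up by name.
    Charset utf32 = Charset.forName("UTF-32BE");
    int[] samples = { 0x41, 0x20AC, 0x1D538 };
    for (int cp : samples) {
      System.out.printf("U+%04X: UTF-8=%d UTF-16=%d UTF-32=%d%n", cp,
          byteLength(cp, StandardCharsets.UTF_8),
          byteLength(cp, StandardCharsets.UTF_16BE),
          byteLength(cp, utf32));
    }
  }
}
```

UTF-8 and UTF-16 need a varying number of bytes depending on the code point, while UTF-32 always needs four.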

Notes

UTF-32 is a fixed-length encoding. Each code point is represented by a single 32-bit value (4 bytes).
UTF-32 doesn't use any transformation. The numeric value of the code point is stored directly as a 32-bit integer.
UTF-32 can be encoded with little endian (UTF-32LE) or big endian (UTF-32BE). Little endian is more common.
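Since the code point's value is stored as is, encoding by hand amounts to writing one 32-bit integer in the chosen byte order. The following is only a sketch of that idea. The class Utf32Encode and its methods encode and hex are names invented for this example, and no validation of the code point is done.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Utf32Encode {
  // Hypothetical helper: writes the code point as one 32-bit integer in the given byte order.
  static byte[] encode(int codePoint, ByteOrder order) {
    return ByteBuffer.allocate(4).order(order).putInt(codePoint).array();
  }

  // Hypothetical helper: formats bytes as space-separated two-digit hex values.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02X", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    int codePoint = 0x1D538; // mathematical double-struck capital A
    System.out.println(hex(encode(codePoint, ByteOrder.LITTLE_ENDIAN))); // 38 D5 01 00
    System.out.println(hex(encode(codePoint, ByteOrder.BIG_ENDIAN)));    // 00 01 D5 38
  }
}
```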

To be able to distinguish these two formats, either specify the byte order explicitly or put a Byte Order Mark (BOM) at the beginning of the string.

The BOM for little endian is FF FE 00 00. For big endian it is 00 00 FE FF.
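One possible way to act on the BOM is to compare the first four bytes against the two patterns and fall back to an explicitly given byte order when neither matches. This is a sketch, not a complete detector: the class Utf32Bom, the methods detect and decode, and the fallback parameter are assumptions made for this example. Note that FF FE 00 00 could also be a UTF-16 little endian BOM followed by a NUL character, which this sketch does not try to tell apart.

```java
import java.nio.charset.Charset;

public class Utf32Bom {
  // Hypothetical helper: returns "UTF-32LE", "UTF-32BE", or null if no UTF-32 BOM is present.
  static String detect(byte[] data) {
    if (data.length < 4) {
      return null;
    }
    int b0 = data[0] & 0xFF, b1 = data[1] & 0xFF, b2 = data[2] & 0xFF, b3 = data[3] & 0xFF;
    if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) {
      return "UTF-32LE";
    }
    if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) {
      return "UTF-32BE";
    }
    return null;
  }

  // Hypothetical helper: skips the BOM if present, otherwise decodes with the fallback charset name.
  static String decode(byte[] data, String fallback) {
    String name = detect(data);
    int offset = (name == null) ? 0 : 4;
    Charset charset = Charset.forName(name == null ? fallback : name);
    return new String(data, offset, data.length - offset, charset);
  }

  public static void main(String[] args) {
    byte[] withBom = {
        (byte) 0xFF, (byte) 0xFE, 0x00, 0x00,   // little endian BOM
        0x38, (byte) 0xD5, 0x01, 0x00           // U+1D538
    };
    String text = decode(withBom, "UTF-32BE");
    System.out.printf("%s U+%X%n", detect(withBom), text.codePointAt(0)); // UTF-32LE U+1D538
  }
}
```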

Example Code

import java.nio.*;
import java.nio.charset.*;

public class UnicodeTest {
  public static void main(String[] args) {
    // little endian encoding
    CharsetDecoder decoder = Charset.forName("UTF-32LE").newDecoder();

    // Code point: 120120/0x1D538 (mathematical double-struck capital A)
    ByteBuffer bytes = ByteBuffer.wrap(new byte[] {
        (byte)0x38, (byte)0xD5, (byte)0x01, (byte)0x00
      });
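    String decoded;

    try {
      // Throws CharacterCodingException on malformed input
      decoded = decoder.decode(bytes).toString();
    }
    catch (CharacterCodingException e) {
      throw new RuntimeException(e);
    }

    // Print the decoded code point in hex; this should print U+1D538
    System.out.printf("U+%X%n", decoded.codePointAt(0));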
  }
}