Unicode characters can be categorized in many different ways. Unicode code points are logically divided into 17 planes, each with 65,536 (= 2¹⁶) code points. Planes are identified by the numbers 0 to 16 (decimal), which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position format (hhhhhh). Six of these planes also have names.
Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively mapped out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need another of the 11 spare planes for ideographic characters, other planes would remain available if previously unknown scripts with tens of thousands of characters were discovered. This 21-bit limit is therefore unlikely to be reached in the near future: 17 planes of 65,536 code points each give 1,114,112 code points in total, and encoding these as a bit array requires at least 21 bits, since 2²¹ = 2,097,152.
The terms “astral plane” and “astral characters” are sometimes used informally for the planes above the Basic Multilingual Plane (planes 1 through 16) and the characters in them.
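The plane arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than any standard API; the names `plane_of`, `offset_in_plane`, `six_digit` and `is_astral` are hypothetical. The plane number is whatever remains after dropping the low 16 bits (the first two hex digits of the six-position form), and the same constants confirm the 1,114,112 total and the 21-bit requirement.

```python
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16
PLANE_SIZE = 1 << 16        # 65,536 code points per plane

def plane_of(cp: int) -> int:
    """Return the plane number (0-16) of a code point."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Dropping the low 16 bits leaves the first two hex digits of hhhhhh.
    return cp >> 16

def offset_in_plane(cp: int) -> int:
    """Return the position inside the plane (the last four hex digits)."""
    plane_of(cp)  # validates the range
    return cp & 0xFFFF

def six_digit(cp: int) -> str:
    """Format a code point in the six-position hhhhhh form."""
    plane_of(cp)  # validates the range
    return f"{cp:06X}"

def is_astral(cp: int) -> bool:
    """True for code points above the BMP, i.e. in planes 1 through 16."""
    return plane_of(cp) >= 1

# 17 planes of 65,536 code points, and the bits needed to number them all.
TOTAL_CODE_POINTS = 17 * PLANE_SIZE
BITS_NEEDED = MAX_CODE_POINT.bit_length()

if __name__ == "__main__":
    cp = 0x1D11E  # MUSICAL SYMBOL G CLEF, a plane 1 character
    print(six_digit(cp), plane_of(cp), hex(offset_in_plane(cp)), is_astral(cp))
    print(TOTAL_CODE_POINTS, BITS_NEEDED)
```

For U+1D11E this prints `01D11E 1 0xd11e True`, followed by `1114112 21`.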
Basic Multilingual Plane
The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. A primary objective for the BMP is to unify prior character sets and to cover the characters needed for writing. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
The High Surrogates (U+D800..U+DBFF) and Low Surrogates (U+DC00..U+DFFF) are reserved for encoding non-BMP characters in UTF-16, where each such character is represented by a pair of 16-bit code units: one high surrogate followed by one low surrogate. Surrogate code points are never assigned characters of their own.
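The surrogate-pair mechanism can be sketched as follows; the function names are hypothetical. Subtracting 0x10000 from a supplementary code point leaves a 20-bit value. Its top 10 bits are added to U+D800 to form the high surrogate, and its bottom 10 bits are added to U+DC00 to form the low surrogate. Decoding reverses these steps.

```python
from typing import Tuple

def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary (non-BMP) code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is not a supplementary code point")
    v = cp - 0x10000             # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (v >> 10)    # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> U+DC00..U+DFFF
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high and a low surrogate into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

if __name__ == "__main__":
    hi, lo = to_surrogate_pair(0x1D11E)
    print(f"{hi:04X} {lo:04X}")                     # D834 DD1E
    # Cross-check against Python's built-in UTF-16 encoder.
    print("\U0001D11E".encode("utf-16-be").hex())  # d834dd1e
```

Because the high and low ranges do not overlap each other or any BMP character, a UTF-16 decoder can always tell whether a 16-bit unit starts a pair, ends one, or stands alone.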
At the time of writing, the BMP comprises the following blocks:
Supplementary Multilingual Plane
Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.
At the time of writing, the SMP comprises the following blocks:
Supplementary Ideographic Plane
Plane 2, the Supplementary Ideographic Plane (SIP), is used for Unified Han (CJK) Ideographs that were mostly not included in earlier character encoding standards.
At the time of writing, the SIP comprises the following blocks:
- CJK Unified Ideographs Extension B (20000–2A6DF)
- CJK Unified Ideographs Extension C (2A700–2B73F)
- CJK Unified Ideographs Extension D (2B740–2B81F)
- CJK Compatibility Ideographs Supplement (2F800–2FA1F)
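A minimal sketch of looking up which of the plane 2 blocks listed above contains a given code point. The table holds only the four ranges in this list, and `sip_block` is a hypothetical helper. Later Unicode versions add further blocks, so a real program would read the Unicode Character Database instead of hard-coding ranges.

```python
from typing import List, Optional, Tuple

# Plane 2 block ranges exactly as listed above (first, last, name).
SIP_BLOCKS: List[Tuple[int, int, str]] = [
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]

def sip_block(cp: int) -> Optional[str]:
    """Return the name of the listed plane 2 block containing cp, or None."""
    for first, last, name in SIP_BLOCKS:
        if first <= cp <= last:
            return name
    return None

if __name__ == "__main__":
    print(sip_block(0x20000))  # start of Extension B
    print(sip_block(0x2A6E0))  # None: falls in a gap in this listing
```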
Tertiary Ideographic Plane
At the time of writing, Plane 3, the Tertiary Ideographic Plane (TIP), does not include any blocks.
Unassigned planes
Unicode has not yet assigned any characters to Planes 4 through 13. It is not anticipated that all these planes will be needed, given the total sizes of the known writing systems left to be encoded. However, the number of possible symbol characters that could arise outside of the context of writing systems is potentially limitless.
Supplementary Special-purpose Plane
Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters. The first block contains language tag characters, for use when the language cannot be indicated through other protocols (such as the xml:lang attribute in XML). The other block contains variation selectors, which indicate an alternate glyph for a character when the glyph cannot be determined from context.
At the time of writing, the SSP comprises the following blocks:
- Tags (E0000–E007F)
- Variation Selectors Supplement (E0100–E01EF)
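Both plane 14 blocks follow simple layouts, sketched below with hypothetical helper names. In the Tags block, each printable ASCII character (0x20–0x7E) has a tag counterpart at U+E0000 plus its ASCII value, and U+E0001 LANGUAGE TAG introduces a language tag. Unicode has since deprecated U+E0001 for language tagging, so `language_tag` illustrates the mechanism rather than recommended practice. The Variation Selectors Supplement holds VS17 through VS256 at U+E0100–U+E01EF, one selector per code point.

```python
TAG_BASE = 0xE0000       # tag characters mirror ASCII at U+E0000 + ASCII value
LANGUAGE_TAG = 0xE0001   # introduces a language tag (now deprecated)

def language_tag(lang: str) -> str:
    """Spell a language identifier such as 'en' with plane 14 tag characters."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag text must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)

def variation_selector(n: int) -> str:
    """Return VSn for n in 17..256; these occupy U+E0100..U+E01EF."""
    if not 17 <= n <= 256:
        raise ValueError("the supplementary selectors are VS17 to VS256")
    return chr(0xE0100 + (n - 17))

if __name__ == "__main__":
    print([f"U+{ord(c):05X}" for c in language_tag("en")])
    print(f"U+{ord(variation_selector(17)):05X}", f"U+{ord(variation_selector(256)):05X}")
```

A variation selector is placed immediately after the base character whose glyph it selects.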
Private Use Area planes
Two planes (planes 15 and 16) have been set aside for character assignment by parties outside ISO and the Unicode Consortium. Such characters have limited interoperability: software and fonts that support Unicode will not necessarily support character assignments made by other parties. If the characters have unusual properties, such as right-to-left directionality, other implementations are especially likely to treat them inappropriately.
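A minimal sketch of testing whether a code point falls in these private-use planes. The range names follow the Unicode block names Supplementary Private Use Area-A (plane 15) and -B (plane 16), and `is_supplementary_private_use` is a hypothetical helper. The ranges stop at xxFFFD because the last two code points of every plane (xxFFFE and xxFFFF) are noncharacters rather than private-use characters.

```python
# Supplementary Private Use Area-A (plane 15) and -B (plane 16).
# xxFFFE and xxFFFF in each plane are noncharacters, so they are excluded.
SPUA_A = range(0xF0000, 0xFFFFD + 1)
SPUA_B = range(0x100000, 0x10FFFD + 1)

def is_supplementary_private_use(cp: int) -> bool:
    """True if cp is a private-use code point in plane 15 or 16."""
    # Membership tests on range objects are constant-time for ints.
    return cp in SPUA_A or cp in SPUA_B

if __name__ == "__main__":
    for cp in (0xF0000, 0xFFFFE, 0x10FFFD, 0xE0001):
        print(f"U+{cp:06X}", is_supplementary_private_use(cp))
```

Code points that pass this check carry no standard meaning, so interpreting them depends on an agreement between the parties exchanging the text.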