Private Use (Unicode)

In Unicode, Private Use is a mechanism that lets characters be defined and used by private agreement between parties (that is, without involving the Unicode Consortium), using designated code points. Such a private definition may include publishing a font that supports the definition (showing the characters) and processes that support privately defined graphic or even control effects (e.g. a clickable <do print> character). As a stability rule, the Unicode Standard guarantees that these Private Use code points will never be assigned regular characters, so Unicode will never interfere with the private agreement. The private agreement may be published, and often is.

For example, Apple Inc. has published the Apple logo () at private-use code point U+F8FF <private-use-F8FF>, and maintains this assignment in its fonts and systems.

By definition, multiple private parties may assign different characters to the same code point. As a consequence, a user whose system selects a font from a different agreement will see characters from the wrong definition set.

Definition

Unicode treats private-use code points as assigned characters (as opposed to, say, reserved code points), but it does not define their specifics, and their default properties can be overridden by the private agreement. Part of the stability of the standard is that these code points will never be assigned a regular Unicode character:

Characters in these [Private Use] areas will never be defined by the Unicode Standard. These code points can be freely used for characters of any purpose, but successful interchange requires an agreement between sender and receiver on their interpretation.[1][2]

All private-use characters have the General Category "Other, private use" (Co).
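The General Category can be checked programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `is_private_use` and the sample code points are chosen here for illustration and are not part of any standard.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch has General Category Co."""
    return unicodedata.category(ch) == "Co"


# Illustrative samples: three private-use code points and two that are not.
samples = ["\uE000", "\uF8FF", "\U000F0000", "A", "\U000FFFFE"]

for ch in samples:
    # Print the code point, its General Category, and the helper's verdict.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_private_use(ch))
```

Note that U+FFFFE, although it lies in plane 15, is a noncharacter and not a private-use code point, so the check returns False for it.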

Private Use Areas

There are three blocks of private-use code points, each called a Private Use Area (PUA). The Basic Multilingual Plane (plane 0) contains the block Private Use Area (U+E000..U+F8FF) with 6,400 code points. Planes 15 and 16 contain the blocks Supplementary Private Use Area-A (U+F0000..U+FFFFD) and Supplementary Private Use Area-B (U+100000..U+10FFFD) respectively, with 65,534 code points each. In UTF-16, code points in the two private-use planes are represented by surrogate pairs from the BMP: a high surrogate from the block High Private Use Surrogates (U+DB80..U+DBFF, 128 code points) combined with any low surrogate (U+DC00..U+DFFF, 1,024 code points). The one-to-one mapping between a surrogate pair and its supplementary code point is defined by UTF-16.
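The block sizes and the surrogate-pair mapping can be verified with a little arithmetic. The sketch below hard-codes the three block ranges and implements the standard UTF-16 surrogate calculation; the names `PUA_BLOCKS`, `to_surrogates` and `from_surrogates` are illustrative choices, not an established API.

```python
# Inclusive (first, last) code points of the three Private Use Area blocks.
PUA_BLOCKS = {
    "Private Use Area": (0xE000, 0xF8FF),
    "Supplementary Private Use Area-A": (0xF0000, 0xFFFFD),
    "Supplementary Private Use Area-B": (0x100000, 0x10FFFD),
}


def block_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


def to_surrogates(cp: int) -> tuple[int, int]:
    """Map a supplementary code point to its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits of the offset
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Inverse mapping: rebuild the code point from a surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


for name, (first, last) in PUA_BLOCKS.items():
    print(name, block_size(first, last))

high, low = to_surrogates(0xF0000)
print(f"U+F0000 -> {high:04X} {low:04X}")
```

The first code point of plane 15 maps to the first high private-use surrogate (U+DB80), and the last private-use code point of plane 16 maps to the last one (U+DBFF), which is why exactly that surrogate block covers both planes.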

Background

The concept of private use was already present in earlier encodings. East Asian systems used End User Character Definition (EUCD)[1].

The C1 control set of ISO 6429 contains two private-use codes: U+0091 <control-0091> (named Private Use One, PU1) and U+0092 <control-0092> (named Private Use Two, PU2). Although the C1 controls are incorporated in Unicode, PU1 and PU2 are not considered private-use characters by Unicode.[3]

Usage

Tentative coordination

Many individuals and institutions have published self-defined characters using the PUA. To limit unnecessary overlap, an informal organisation maintains and publishes an incomplete list of private-use publications. With this overview, publishers can aim for unused or less-used code points and so reduce overlaps. By definition, however, single use cannot be guaranteed, because every party is free to use any PUA code point.

The list is maintained by the ConScript Unicode Registry (which is not related to the Unicode Consortium).
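The kind of overlap such a list helps to avoid can be illustrated with a small check. The sketch below is hypothetical throughout: the allocation names and ranges are invented for illustration and do not reflect any actual registry entry, and `find_overlaps` is not part of any published tool.

```python
from itertools import combinations

# Hypothetical private-use allocations (invented for illustration only).
# Each range is half-open: range(first, last + 1).
allocations = {
    "script-a": range(0xE000, 0xE080),
    "script-b": range(0xE060, 0xE0A0),
    "vendor-logo": range(0xF8FF, 0xF900),
}


def find_overlaps(allocs: dict[str, range]) -> list[tuple[str, str, range]]:
    """Return every pair of allocations that share code points, with the shared range."""
    overlaps = []
    for (name_a, a), (name_b, b) in combinations(allocs.items(), 2):
        start = max(a.start, b.start)
        stop = min(a.stop, b.stop)
        if start < stop:  # a non-empty intersection means a conflict
            overlaps.append((name_a, name_b, range(start, stop)))
    return overlaps


for name_a, name_b, shared in find_overlaps(allocations):
    print(f"{name_a} and {name_b} both claim U+{shared.start:04X}..U+{shared.stop - 1:04X}")
```

A check like this can only flag conflicts among allocations that are known to it; parties that never publish their agreement remain invisible to any such list.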

Example code point U+F8FF

Unicode code point U+F8FF or  is the last code point in the Private Use Area in BMP. Its meaning and appearance vary depending on the font in use, but its usage in several fonts makes it the most notable code point in the private use area.
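Whatever glyph a font assigns to it, the code point itself is interchanged like any other character. As a minimal sketch using only Python built-ins, the following shows that U+F8FF encodes and decodes unchanged; only the rendering depends on the font or agreement in use.

```python
# U+F8FF, the last code point of the BMP Private Use Area.
text = "\uF8FF"

utf8 = text.encode("utf-8")       # three bytes in UTF-8
utf16 = text.encode("utf-16-be")  # one 16-bit code unit, no surrogates needed

print(utf8.hex(), utf16.hex())

# Decoding restores the same code point; no meaning is attached by the encoding.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-be") == text
```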

References

  1. Unicode Standard, chapter 16.5: Private-Use Characters
  2. Unicode Standard, chapter 2: General Structure
  3. ISO C1 Control Character Set of ISO 6429 (1983)