X.690
X.690 is an ITU-T standard specifying several ASN.1 encoding formats:
- Basic Encoding Rules (BER)
- Canonical Encoding Rules (CER)
- Distinguished Encoding Rules (DER)
The Basic Encoding Rules were the original rules laid out by the ASN.1 standard for encoding abstract information into a concrete data stream. The rules, collectively referred to as a transfer syntax in ASN.1 parlance, specify the exact octet sequences which are used to encode a given data item. The syntax defines such elements as: the representations for basic data types, the structure of length information, and the means for defining complex or compound types based on more primitive types. The BER syntax, along with two subsets of BER (the Canonical Encoding Rules and the Distinguished Encoding Rules), is defined by the ITU-T's X.690 standards document, which is part of the ASN.1 document series.
BER encoding
The format for Basic Encoding Rules specifies a self-describing and self-delimiting format for encoding ASN.1 data structures. Each data element is encoded as a type identifier, a length description, the actual data elements, and, where necessary, an end-of-content marker. These types of encodings are commonly called type-length-value or TLV encodings. This format allows a receiver to decode the ASN.1 information from an incomplete stream, without requiring any prior knowledge of the size, content, or semantic meaning of the data.[1]
Encoding structure
The encoding of data generally consists of four components, which appear in the following order:
Component | Role in TLV |
---|---|
Identifier octets | Type |
Length octets | Length |
Contents octets | Value |
End-of-contents octets | (indefinite length form only) |
The end-of-contents octets are optional and are only used if the indefinite length form is used. The contents octets may also be omitted if there is no content to encode, as with the NULL type.
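The following minimal sketch illustrates the four-part structure. It splits a byte string into consecutive TLVs. The helper name `split_tlvs` is hypothetical, and the sketch assumes single-octet identifiers and short-form definite lengths (0 to 127), so it ignores long-form tags, long-form lengths and end-of-contents octets. The sample is a SEQUENCE containing INTEGER 5, BOOLEAN TRUE and NULL. The NULL element shows a TLV whose contents octets are omitted.

```python
def split_tlvs(data: bytes) -> list[tuple[int, bytes]]:
    """Split a run of TLVs into (identifier octet, contents octets) pairs.

    Sketch only: assumes single-octet identifiers and short-form
    definite lengths, so no end-of-contents handling is needed.
    """
    items, i = [], 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated TLV header")
        identifier, length = data[i], data[i + 1]
        if identifier & 0x1F == 0x1F or length & 0x80:
            raise ValueError("long-form tag or length not handled here")
        end = i + 2 + length
        if end > len(data):
            raise ValueError("truncated contents octets")
        items.append((identifier, data[i + 2:end]))
        i = end
    return items

# SEQUENCE { INTEGER 5, BOOLEAN TRUE, NULL }
message = bytes([0x30, 0x08, 0x02, 0x01, 0x05, 0x01, 0x01, 0xFF, 0x05, 0x00])
[(seq_id, seq_contents)] = split_tlvs(message)     # one outer SEQUENCE TLV
print(f"{seq_id:02X}", split_tlvs(seq_contents))
# 30 [(2, b'\x05'), (1, b'\xff'), (5, b'')]
```

Because each TLV carries its own length, the receiver can walk the SEQUENCE's contents recursively with the same function.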
Identifier octets
The identifier octets encode the ASN.1 tag (class and number) of the type of the data value. Its structure is defined as follows:
Bit(s) | Field |
---|---|
8 and 7 | Class |
6 | P/C |
5 to 1 | Tag number |
Bits 8 and 7 of the identifier octet describe the class of the object. The following values are possible:
Class | bit 8 | bit 7 | description |
---|---|---|---|
Universal | 0 | 0 | The type is native to ASN.1 |
Application | 0 | 1 | The type is only valid for one specific application |
Context-specific | 1 | 0 | Meaning of this type depends on the context (such as within a sequence, set or choice) |
Private | 1 | 1 | Defined in private specifications |
Bit 6 (P/C) states whether the content is primitive, like an INTEGER, or constructed, which means it holds further TLV values, like a SET. Note that some ASN.1 types can be encoded using either a primitive or a constructed encoding at the option of the sender.
P/C | bit 6 |
---|---|
Primitive | 0 |
Constructed | 1 |
The remaining bits 5 to 1 contain the tag number, which identifies the type of the content.
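The bit layout described above can be decoded with shifts and masks. The following sketch assumes a single identifier octet (tag numbers 0 to 30). The function name `parse_identifier_octet` is hypothetical. A tag number of 31 is returned as-is and signals that the long form follows.

```python
# Class names indexed by the value of bits 8 and 7.
CLASS_NAMES = {0b00: "Universal", 0b01: "Application",
               0b10: "Context-specific", 0b11: "Private"}

def parse_identifier_octet(octet: int) -> tuple[str, bool, int]:
    """Return (class name, constructed?, tag number) for one identifier octet.

    A tag number of 31 (0b11111) means the real tag number follows in
    long form; this sketch does not read those subsequent octets.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("identifier octet must fit in 8 bits")
    tag_class = CLASS_NAMES[octet >> 6]      # bits 8 and 7
    constructed = bool(octet & 0x20)         # bit 6 (P/C)
    tag_number = octet & 0x1F                # bits 5 to 1
    return tag_class, constructed, tag_number

for octet in (0x02, 0x30, 0xA0):
    print(f"{octet:02X}", parse_identifier_octet(octet))
# 02 ('Universal', False, 2)
# 30 ('Universal', True, 16)
# A0 ('Context-specific', True, 0)
```

For example, 0x30 is the familiar SEQUENCE identifier: universal class, constructed, tag number 16.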
The following tags are native to ASN.1:
Name | P/C | Number (decimal) | Number (hexadecimal) |
---|---|---|---|
EOC (End-of-Content) | P | 0 | 0 |
BOOLEAN | P | 1 | 1 |
INTEGER | P | 2 | 2 |
BIT STRING | P/C | 3 | 3 |
OCTET STRING | P/C | 4 | 4 |
NULL | P | 5 | 5 |
OBJECT IDENTIFIER | P | 6 | 6 |
Object Descriptor | P/C | 7 | 7 |
EXTERNAL | C | 8 | 8 |
REAL (float) | P | 9 | 9 |
ENUMERATED | P | 10 | A |
EMBEDDED PDV | C | 11 | B |
UTF8String | P/C | 12 | C |
RELATIVE-OID | P | 13 | D |
(reserved) | - | 14 | E |
(reserved) | - | 15 | F |
SEQUENCE and SEQUENCE OF | C | 16 | 10 |
SET and SET OF | C | 17 | 11 |
NumericString | P/C | 18 | 12 |
PrintableString | P/C | 19 | 13 |
T61String | P/C | 20 | 14 |
VideotexString | P/C | 21 | 15 |
IA5String | P/C | 22 | 16 |
UTCTime | P/C | 23 | 17 |
GeneralizedTime | P/C | 24 | 18 |
GraphicString | P/C | 25 | 19 |
VisibleString | P/C | 26 | 1A |
GeneralString | P/C | 27 | 1B |
UniversalString | P/C | 28 | 1C |
CHARACTER STRING | P/C | 29 | 1D |
BMPString | P/C | 30 | 1E |
(use long-form) | - | 31 | 1F |
Identifier tags greater than 30
If the identifier is not universal, its tag may be a number that is greater than 30. In that case, the tag does not fit in the 5-bit tag field, and must be encoded in subsequent octets. The value 11111₂ (decimal 31) is reserved for identifying such encodings.
These long-form identifiers are encoded as follows:
- Bits 5 to 1 of the leading octet are encoded as 11111₂.
- The subsequent octets encode the tag's number:
  - bit 8 of each octet is set to 1, except in the last octet;
  - bits 7 to 1 of each octet encode an unsigned binary integer. When concatenated, most significant group first, these amount to the tag number. In each octet, bit 7 is the most significant bit;
  - bits 7 to 1 of the first subsequent octet must not be all zeroes.
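The long-form rules above amount to a base-128 encoding of the tag number, with bit 8 acting as a "more octets follow" flag. The following sketch implements both directions. The function names `encode_identifier` and `decode_tag_number` are hypothetical, and `tag_class` is assumed to be given as the numeric value 0 to 3 of bits 8 and 7.

```python
def encode_identifier(tag_class: int, constructed: bool, tag_number: int) -> bytes:
    """Encode BER identifier octets; tag_class is 0-3 (bits 8 and 7)."""
    leading = (tag_class << 6) | (0x20 if constructed else 0)
    if tag_number <= 30:
        return bytes([leading | tag_number])        # fits in bits 5 to 1
    groups = []
    while True:                                     # split into 7-bit groups
        groups.append(tag_number & 0x7F)
        tag_number >>= 7
        if tag_number == 0:
            break
    groups.reverse()                                # most significant group first
    # Bit 8 is set on every subsequent octet except the last one.
    body = [g | 0x80 for g in groups[:-1]] + [groups[-1]]
    return bytes([leading | 0x1F] + body)           # 0x1F: long form follows

def decode_tag_number(data: bytes) -> tuple[int, int]:
    """Return (tag number, identifier octets consumed) from data[0:]."""
    number = data[0] & 0x1F
    if number != 0x1F:
        return number, 1                            # short form
    if data[1] & 0x7F == 0:
        raise ValueError("bits 7 to 1 of the first subsequent octet are all zero")
    number, i = 0, 1
    while True:
        octet = data[i]
        number = (number << 7) | (octet & 0x7F)
        i += 1
        if not octet & 0x80:                        # bit 8 clear: last octet
            return number, i

encoded = encode_identifier(2, False, 201)          # context-specific [201]
print(encoded.hex(" "))                             # 9f 81 49
print(decode_tag_number(encoded))                   # (201, 3)
```

Here 201 = 1 × 128 + 73, so the subsequent octets are 0x81 (group 1 with bit 8 set) and 0x49 (group 73, last octet).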
Data (especially members of sequences, sets and choices) can be tagged with a unique tag number (shown in ASN.1 within square brackets []) to distinguish that data from other members. Such tags can be implicit (where they are encoded as the TLV tag of the value instead of using the base type as the TLV tag) or explicit (where the tag is used in a constructed TLV that wraps the base type TLV). The default tagging style is explicit, unless implicit tagging is set at the ASN.1 module level. Such tags have a default class of context-specific, but that can be overridden by using a class name in front of the tag.
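The following sketch makes the difference between implicit and explicit tagging concrete by encoding the value `[0] INTEGER` holding 5 both ways. It uses a hypothetical helper `tlv` that only handles single-octet identifiers and short-form lengths.

```python
def tlv(identifier: int, contents: bytes) -> bytes:
    """Build a TLV with a single identifier octet and a short-form length."""
    if len(contents) > 127:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([identifier, len(contents)]) + contents

integer_5 = tlv(0x02, b"\x05")        # universal INTEGER 5: 02 01 05

# [0] IMPLICIT INTEGER: the context-specific tag 0 replaces the INTEGER tag.
# INTEGER is primitive, so the P/C bit stays 0, giving identifier 0x80.
implicit = tlv(0x80, b"\x05")

# [0] EXPLICIT INTEGER: a constructed context-specific TLV (0xA0)
# wraps the complete universal INTEGER TLV.
explicit = tlv(0xA0, integer_5)

print(implicit.hex(" "))              # 80 01 05
print(explicit.hex(" "))              # a0 03 02 01 05
```

Implicit tagging saves two octets but loses the universal INTEGER tag, so a receiver must know from the ASN.1 definition that `[0]` holds an INTEGER.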
The encoding of a choice value is the same as the encoding of a value of the chosen type. The encoding may be primitive or constructed, depending on the chosen type. The tag used in the identifier octets is the tag of the chosen type, as specified in the ASN.1 definition of the chosen type.
Length octets
There are two forms of the length octets: the definite form and the indefinite form.
The definite form
The definite form must be used if the encoding is primitive. If the encoding is constructed and all of the data is immediately available, the sender may use either the definite or the indefinite form. Depending on the actual length of the content, the length octets are encoded using either a short form or a long form. Both forms store numeric data as unsigned binary integers in big-endian encoding.
In the short form, the length octets consist of a single octet in which bits 7 to 1 encode the number of octets in the contents octets (which may be zero). Bit 8 of the length octet is zero to indicate that this is the short form, so the short form can represent lengths from 0 to 127.
Example: L = 38 can be encoded as 00100110
In contrast to the short form, the long form length octets consist of an initial octet and one or more subsequent octets. According to the X.690 standard,[1] the initial length octet shall be encoded as follows:
- bit 8 shall be one;
- bits 7 to 1 shall encode the number of subsequent octets in the length octets, as an unsigned binary integer with bit 7 as the most significant bit;
- the value 11111111₂ (0xFF) shall not be used.
All bits of the subsequent octets form the encoding of an unsigned binary integer equal to the number of octets in the contents octets.
Example: L = 435 can be encoded as
10000010 // long form with two subsequent length octets
00000001 10110011 // both octets together form the binary string 0000000110110011, i.e. 435
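The following sketch puts the short and long definite forms together. It encodes a length with the fewest octets and decodes either form. The function names are hypothetical. The decoder deliberately accepts non-minimal long forms such as 81 26 for 38, which BER permits as a sender's option.

```python
def encode_definite_length(n: int) -> bytes:
    """Encode a contents length in the definite form with the fewest octets."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 127:
        return bytes([n])                                 # short form, bit 8 = 0
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")  # big-endian, no leading zero octet
    if len(body) > 126:
        raise ValueError("length needs too many octets")  # 0xFF would be required
    return bytes([0x80 | len(body)]) + body               # long form, bit 8 = 1

def decode_definite_length(data: bytes) -> tuple[int, int]:
    """Return (contents length, number of length octets) read from data[0:]."""
    first = data[0]
    if first < 0x80:
        return first, 1                                   # short form
    count = first & 0x7F
    if count == 0:
        raise ValueError("0x80 introduces the indefinite form")
    if count == 0x7F:
        raise ValueError("0xFF is reserved")
    if len(data) < 1 + count:
        raise ValueError("truncated length octets")
    return int.from_bytes(data[1:1 + count], "big"), 1 + count

print(encode_definite_length(38).hex(" "))               # 26
print(encode_definite_length(435).hex(" "))              # 82 01 b3
print(decode_definite_length(bytes([0x81, 0x26])))        # (38, 2)
```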
The indefinite form
If bit 8 of the length octet is set to one and all other bits are set to zero (i.e. 10000000₂, hex value 0x80), the indefinite form is used.
This encoding is applicable to constructed types and is typically used if not all of the content is immediately available at encoding time.
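In this case, the contents octets must be terminated by the end-of-contents octets: two zero octets (00 00), which read as a TLV with the universal tag EOC (0) and a length of zero (see Identifier octets).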
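The following sketch shows the indefinite form in practice by reassembling a constructed OCTET STRING (identifier 0x24, length octet 0x80) whose segments are terminated by the end-of-contents octets. The function name is hypothetical. For brevity, the sketch assumes that every segment is a primitive OCTET STRING (0x04) with a short-form length.

```python
def parse_constructed_octet_string(data: bytes) -> tuple[bytes, int]:
    """Reassemble a constructed OCTET STRING (0x24) using the indefinite form.

    Returns (value, octets consumed). Segments are assumed to be primitive
    OCTET STRINGs (0x04) with short-form lengths, to keep the sketch small.
    """
    if data[0] != 0x24 or data[1] != 0x80:
        raise ValueError("expected constructed OCTET STRING with indefinite length")
    value, i = bytearray(), 2
    while True:
        if data[i:i + 2] == b"\x00\x00":     # end-of-contents octets reached
            return bytes(value), i + 2
        if data[i] != 0x04 or data[i + 1] > 127:
            raise ValueError("unsupported segment")
        length = data[i + 1]
        value += data[i + 2:i + 2 + length]  # append this segment's contents
        i += 2 + length

encoded = (
    b"\x24\x80"      # OCTET STRING, constructed (P/C = 1), indefinite length
    b"\x04\x02AB"    # primitive segment "AB"
    b"\x04\x01C"     # primitive segment "C"
    b"\x00\x00"      # end-of-contents octets
)
print(parse_constructed_octet_string(encoded))   # (b'ABC', 11)
```

The sender could emit the "AB" segment before "C" was known, which is the situation the indefinite form is designed for.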
Contents octets
The contents octets encode the data value in a type-specific way, as specified in X.690.[1]
Note that the contents octets can be omitted if there is no value to convey other than the mere existence of the ASN.1 object. This is the case when transmitting an ASN.1 NULL value (e.g. for acknowledgements), which is encoded as just the two octets 05 00.
CER encoding
CER is a restricted variant of BER for producing unequivocal transfer syntax for data structures described by ASN.1. Whereas BER gives choices as to how data values may be encoded, CER (together with DER) selects just one encoding from those allowed by the basic encoding rules, eliminating the rest of the options. CER is useful when the encodings must be preserved, e.g. in security exchanges.
DER encoding
DER is a restricted variant of BER for producing unequivocal transfer syntax for data structures described by ASN.1. Like CER, DER encodings are valid BER encodings. DER is BER with all but one of the sender's options removed.
DER is a subset of BER providing for exactly one way to encode an ASN.1 value. DER is intended for situations when a unique encoding is needed, such as in cryptography, and ensures that a data structure that needs to be digitally signed produces a unique serialized representation. DER can be considered a canonical form of BER. For example, in BER a Boolean value of true can be encoded as any of 255 non-zero byte values, while in DER there is exactly one way to encode a boolean value of true: a single contents octet of 0xFF.
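The following sketch contrasts BER and DER handling of BOOLEAN, the example given above. The function names are hypothetical. The BER decoder treats any non-zero contents octet as TRUE, while the DER functions accept and produce only 0x00 and 0xFF.

```python
def decode_ber_boolean(contents: bytes) -> bool:
    """BER: the single contents octet is FALSE if 0x00, TRUE otherwise."""
    if len(contents) != 1:
        raise ValueError("BOOLEAN contents must be exactly one octet")
    return contents[0] != 0

def decode_der_boolean(contents: bytes) -> bool:
    """DER: only 0x00 (FALSE) and 0xFF (TRUE) are accepted."""
    if contents == b"\x00":
        return False
    if contents == b"\xff":
        return True
    raise ValueError("non-canonical BOOLEAN encoding")

def encode_der_boolean(value: bool) -> bytes:
    """Complete DER TLV: universal tag 1, length 1, then 0xFF or 0x00."""
    return bytes([0x01, 0x01, 0xFF if value else 0x00])

# Every one of the 255 non-zero octets decodes as TRUE under BER ...
print(sum(decode_ber_boolean(bytes([b])) for b in range(256)))   # 255
# ... but DER emits exactly one encoding of TRUE.
print(encode_der_boolean(True).hex(" "))                          # 01 01 ff
```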
The most significant DER encoding constraints are:
- Length encoding must use the definite form
- Additionally, the shortest possible length encoding must be used
- BIT STRING, OCTET STRING, and restricted character strings must use the primitive encoding
- Elements of a SET are encoded in sorted order, based on their tag value (elements of a SET OF are sorted by their encodings)
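The first two constraints in the list above can be checked mechanically when reading length octets. The following sketch is a strict length parser. It returns `None` for anything DER forbids: the indefinite form, the reserved value 0xFF, leading zero octets in the long form, and the long form for lengths that fit the short form. The name `parse_der_length` is hypothetical.

```python
from typing import Optional

def parse_der_length(data: bytes) -> Optional[tuple[int, int]]:
    """Return (contents length, length-octet count), or None if DER forbids it."""
    if not data:
        return None
    first = data[0]
    if first < 0x80:                        # short form is always minimal
        return first, 1
    count = first & 0x7F
    if count == 0 or count == 0x7F:         # indefinite form, or reserved 0xFF
        return None
    body = data[1:1 + count]
    if len(body) != count or body[0] == 0:  # truncated, or leading zero octet
        return None
    length = int.from_bytes(body, "big")
    if length < 0x80:                       # should have used the short form
        return None
    return length, 1 + count

for sample in (b"\x26", b"\x81\x26", b"\x82\x01\xb3", b"\x82\x00\xb3", b"\x80"):
    print(sample.hex(" "), parse_der_length(sample))
# 26 (38, 1)
# 81 26 None
# 82 01 b3 (435, 3)
# 82 00 b3 None
# 80 None
```

A BER decoder would accept all of these samples except the truncated or reserved cases, which is why DER-specific validation matters when a signature covers the exact octets.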
DER is widely used for digital certificates such as X.509.
BER, CER and DER compared
The key difference between the BER format and the CER or DER formats is the flexibility provided by the Basic Encoding Rules. BER, as explained above, is the basic set of encoding rules given by ITU-T X.690 for the transfer of ASN.1 data structures. It gives senders clear rules for encoding data structures they want to send, but also leaves senders some encoding choices. As stated in the X.690 standard, "Alternative encodings are permitted by the basic encoding rules as a sender's option. Receivers who claim conformance to the basic encoding rules shall support all alternatives".[1]
A receiver must be prepared to accept all legal encodings in order to legitimately claim BER-compliance. By contrast, both CER and DER restrict the available length specifications to a single option. As such, CER and DER are restricted forms of BER and serve to disambiguate the BER standard.
CER and DER differ in the set of restrictions that they place on the sender. The basic difference between CER and DER is that DER uses the definite length form, while CER uses the indefinite length form in some precisely defined cases. That is, DER always has leading length information, while CER uses the end-of-contents octets instead of providing the length of the encoded data. Because of this, CER requires less metadata for large encoded values, while DER requires less for small ones.
In order to facilitate a choice between encoding rules, the X.690 standards document provides the following guidance:
The distinguished encoding rules is more suitable than the canonical encoding rules if the encoded value is small enough to fit into the available memory and there is a need to rapidly skip over some nested values. The canonical encoding rules is more suitable than the distinguished encoding rules if there is a need to encode values that are so large that they cannot readily fit into the available memory or it is necessary to encode and transmit a part of a value before the entire value is available. The basic encoding rules is more suitable than the canonical or distinguished encoding rules if the encoding contains a set value or set-of value and there is no need for the restrictions that the canonical and distinguished encoding rules impose.
Criticisms of BER encoding
There is a common perception of BER as being "inefficient" compared to alternative encoding rules. It has been argued by some that this perception is primarily due to poor implementations, not necessarily any inherent flaw in the encoding rules.[2] These implementations rely on the flexibility that BER provides to use encoding logic that is easier to implement, but results in a larger encoded data stream than necessary. Whether this inefficiency is reality or perception, it has led to a number of alternative encoding schemes, such as the Packed Encoding Rules, which attempt to improve on BER performance and size.
Other alternative formatting rules, which still provide the flexibility of BER but use alternative encoding schemes, are also being developed. The most popular of these are XML-based alternatives, such as the XML Encoding Rules and ASN.1 SOAP.[3] In addition, there is a standard mapping to convert an XML Schema to an ASN.1 schema, which can then be encoded using BER.[4]
Usage
Despite its perceived problems, BER is a popular format for transmitting data, particularly in systems with different native data encodings.
- The SNMP and LDAP protocols specify ASN.1 with BER as their required encoding scheme.
- The EMV standard for credit and debit cards uses BER to encode data onto the card.
- The digital signature standard PKCS #7 also specifies ASN.1 with BER to encode encrypted messages and their digital signature or digital envelope.
- Many telecommunication systems, such as ISDN, toll-free call routing, and most cellular phone services use ASN.1 with BER to some degree for transmitting control messages over the network.
- GSM TAP (Transferred Account Procedures), NRTRDE (Near Real Time Roaming Data Exchange) files are encoded using BER.
By comparison, the more definite DER encoding is widely used to transfer digital certificates such as X.509.
See also
- Kerberos
- Packed Encoding Rules (PER, X.691)
- Structured Data eXchange Format (SDXF)
- Serialization
References
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
- [1] Information technology – ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER), ITU-T X.690, 07/2002.
- [2] Lin, Huai-An. "Estimation of the Optimal Performance of ASN.1/BER Transfer Syntax". ACM Computer Communication Review, July 1993, pp. 45-58.
- [3] ITU-T Rec. X.892, ISO/IEC 24824-2.
- [4] ITU-T X.694, ISO/IEC 8825-5.
External links
- ITU-T X.690, ISO/IEC 8825-1
- ITU-T X.892, ISO/IEC 24824-2
- ITU-T X.694, ISO/IEC 8825-5
- PKCS #7
- jASN1 Java ASN.1 BER encoding/decoding library at openmuc.org, LGPL-licensed
- PHPASN1 PHP ASN.1 BER encoding/decoding library at github, GPL-licensed
- ASN1js JavaScript ASN.1 BER encoding/decoding library at github, GPL-licensed
- Peter Gutmann's 'X.509 Style Guide'
- RSA's 'A Layman's Guide to a Subset of ASN.1, BER, and DER'