Comparison of data serialization formats
This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
Overview
Name | Creator/Maintainer | Based on | Standardized? | Specification | Binary? | Human-readable? | Supports references?[e] | Schema/IDL? | Standard APIs |
---|---|---|---|---|---|---|---|---|---|
Apache Avro | Apache Software Foundation | N/A | Yes | Apache Avro™ 1.7.5 Specification | Yes | No | N/A | Yes (built-in) | N/A |
ASN.1 | ISO, IEC, ITU-T | N/A | Yes | ISO/IEC 8824; X.680 series of ITU-T Recommendations | Yes (BER, DER, PER, OER, or custom via ECN) | Yes (XER, GSER, or custom via ECN) | Partial[f] | Yes (built-in) | N/A |
Bencode | Bram Cohen (creator); BitTorrent, Inc. (maintainer) | N/A | Yes | Part of BitTorrent protocol specification | Partially (numbers and delimiters are ASCII) | Partially | No | No | No |
BSON | MongoDB | JSON | Yes | BSON Specification | Yes | No | No | No | No |
Candle Markup | Henry Luo | XML, JSON, JavaFX | Yes | Candle Markup Reference | No | Yes | Yes (XPointer, XPath) | Yes (Candle Pattern Reference) | Yes (XQuery, XPath) |
Comma-separated values (CSV) | RFC author: Yakov Shafranovich | N/A | Partial (myriad informal variants used) | RFC 4180 (among others) | No | Yes | No | No | No |
D-Bus Message Protocol | freedesktop.org | N/A | Yes | D-Bus Specification | Yes | Yes (Type Signatures) | No | No | Yes (see D-Bus) |
Fast Infoset | ISO, IEC, ITU-T | XML | Yes | ITU-T X.891 and ISO/IEC 24824-1:2007 | Yes | Yes (XML) | Yes (XPointer, XPath) | Yes (XML schema) | Yes (DOM, SAX, XQuery, XPath) |
JSON | Douglas Crockford | JavaScript syntax | Yes | RFC 4627 (ancillary: RFC 6901, RFC 6902) | No, but see UBJSON, BSON | Yes | Yes (JSON Pointer (RFC 6901); alternately: JSONPath, JPath, JSPON, json:select()) | Partial (JSON Schema Proposal, Kwalify, Rx, Itemscript Schema) | Partial (Clarinet, JSONQuery, JSONPath) |
MessagePack | Sadayuki Furuhashi | JSON (loosely) | Yes | MessagePack format specification | Yes | No | No | No | No |
Netstrings | Dan Bernstein | N/A | Yes | netstrings.txt | Yes | Yes | No | No | No |
OGDL | Rolf Veen | ? | Yes | Specification | Yes (Binary Specification) | Yes | Yes (Path Specification) | Yes (Schema WD) | |
OpenDDL | Eric Lengyel | C, PHP | Yes | OpenDDL.org | No | Yes | Yes | No | Yes (OpenDDL Library) |
PHP's serialize() & unserialize() | PHP Group | N/A | Yes | No | Yes | Yes | Yes | No | Yes |
Data::Dumper format (Core Perl Module) | Gurusamy Sarathy (ActiveState developer) | Perl data types | Yes | No | ? | Yes | No | ? | Yes |
Property list | NeXT (creator), Apple (maintainer) | ? | Partial | Public DTD for XML format | Yes[a] | Yes[b] | No | ? | Cocoa, CoreFoundation, OpenStep, GNUstep |
Protocol Buffers (protobuf) | Google | N/A | Partial | Developer Guide: Encoding | Yes | Partial[d] | No | Yes (built-in) | |
ROOT | CERN & FNAL | N/A | No | N/A | Yes | Yes (optional XML output for debugging) | Yes | Yes (C++ object persistency framework) | Yes (native C++ API, bindings for Python, Ruby, and others) |
S-expressions | Internet Draft author: Ron Rivest | Lisp, Netstrings | Partial (largely de facto) | "S-Expressions" Internet Draft | Yes ("Canonical representation") | Yes ("Advanced transport representation") | No | No | |
SCaViS | jWork.ORG | N/A | Yes | N/A | Yes | Yes (XML, Java Serialization, ProtocolBuffers) | Yes | Yes (Java object persistency, XML, ProtocolBuffers) | Yes (native Java API, bindings for Jython, JRuby, Groovy, and others) |
Smile | Tatu Saloranta | JSON | Yes | Smile Format Specification | Yes | No | No | No | No |
Structured Data eXchange Formats | Max Wildgrube | N/A | Yes | RFC 3072 | Yes | No | No | No | |
Thrift | Facebook (creator), Apache (maintainer) | N/A | No | Original whitepaper | Yes | Partial[c] | No | Yes (built-in) | |
UBJSON | The Buzz Media, LLC | JSON, BSON | No | | Yes | No | No | No | No |
eXternal Data Representation (XDR) | Sun Microsystems (creator), IETF (maintainer) | N/A | Yes | RFC 4506 | Yes | No | Yes | Yes | Yes |
XML | W3C | SGML | Yes | W3C Recommendations: 1.0 (Fifth Edition), 1.1 (Second Edition) | Partial (Binary XML) | Yes | Yes (XPointer, XPath) | Yes (XML schema, RELAX NG) | Yes (DOM, SAX, XQuery, XPath) |
XML-RPC | Dave Winer[1] | XML, SOAP[1] | Yes | XML-RPC Specification | No | Yes | No | No | No |
YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki | C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[2] | Yes | Version 1.2 | No | Yes | Yes | Partial (Kwalify, Rx, built-in language type-defs) | No |
- a. The current default format is binary.
- b. The "classic" format is plain text, and an XML format is also supported.
- c. Theoretically possible due to abstraction, but no implementation is included.
- d. The primary format is binary, but a text format is available.[3]
- e. Indicates that generic tools or libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but nothing more. Custom, non-standardized referencing techniques are excluded.
- f. ASN.1 does offer OIDs, a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID (assigned to the document) and an "absolute reference" to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool or library cannot automatically encode, decode, or resolve references within a document without help from custom-written program code.
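Footnote e counts a format as supporting references only if generic tooling can dereference them. For JSON, the mechanism listed in the table is JSON Pointer (RFC 6901). The Python sketch below illustrates roughly what such tooling does. It is not a complete implementation. resolve_pointer is a hypothetical helper name, the sketch assumes the plain string form of a pointer (not the URI-fragment form), and it does not handle the "-" array index used by JSON Patch.

```python
import json
import re

# RFC 6901 array indices: "0" or a decimal number without leading zeros.
_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(doc, pointer):
    """Resolve a JSON Pointer (string form) against an already-parsed JSON value.

    Sketch only: the URI-fragment form and the "-" (past-the-end) array
    index are not handled. Missing keys raise KeyError; out-of-range
    array indices raise IndexError.
    """
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = doc
    for token in pointer[1:].split("/"):
        # Unescape in this order: "~1" -> "/", then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not _ARRAY_INDEX.fullmatch(token):
                raise KeyError(f"invalid array index: {token!r}")
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(f"cannot descend into a {type(current).__name__}")
    return current


doc = json.loads('{"42": true, "A to Z": [1, 2, 3], "a/b": {"m~n": "ok"}}')
print(resolve_pointer(doc, "/A to Z/2"))   # 3
print(resolve_pointer(doc, "/a~1b/m~0n"))  # ok
```

The unescaping order matters. Replacing "~1" before "~0" means the token "~01" decodes to the literal text "~1" rather than to "/".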
Syntax comparison of human-readable formats
Format | Null | Boolean true | Boolean false | Integer | Floating-point | String | Array | Associative array/Object |
---|---|---|---|---|---|---|---|---|
ASN.1 (XML Encoding Rules) | <foo /> | <foo>true</foo> | <foo>false</foo> | <foo>685230</foo> | <foo>6.8523015e+5</foo> | <foo>A to Z</foo> | <SeqOfUnrelatedDatatypes> <isMarried>true</isMarried> <hobby /> <velocity>-42.1e7</velocity> <bookname>A to Z</bookname> <bookname>We said, "no".</bookname> </SeqOfUnrelatedDatatypes> | An object (the key is a field name): <person> <isMarried>true</isMarried> <hobby /> <height>1.85</height> <name>Bob Peterson</name> </person>; a data mapping (the key is a data value): <competition> <measurement> <name>John</name> <height>3.14</height> </measurement> <measurement> <name>Jane</name> <height>2.718</height> </measurement> </competition> |
Candle Markup | (), "" | true | false | 685230 -685230 | 6.8523015e+5 | "A to Z" """ | (true, (), -42.1e7, "A to Z") | _{%342=true A%20to%20Z=(1, 2, 3)} or _{ _{key=42 value=true} _{key="A to Z" value=(1, 2, 3)} } |
CSV[b] | null[a] (or an empty element in the row)[a] | 1[a] true[a] | 0[a] false[a] | 685230[a] -685230[a] | 6.8523015e+5[a] | A to Z "We said, ""no""." | true,,-42.1e7,"A to Z" | 42,1 A to Z,1,2,3 |
Netstrings[c] | 0:,[a] 4:null,[a] | 1:1,[a] 4:true,[a] | 1:0,[a] 5:false,[a] | 6:685230,[a] | 9:6.8523e+5,[a] | 6:A to Z, | 29:4:true,0:,7:-42.1e7,6:A to Z,, | 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,[a] |
JSON | null | true | false | 685230 -685230 | 6.8523015e+5 | "A to Z" | [true, null, -42.1e7, "A to Z"] | {"42": true, "A to Z": [1, 2, 3]} |
OGDL | null[a] | true[a] | false[a] | 685230[a] | 6.8523015e+5[a] | "A to Z" 'A to Z' NoSpaces | true null -42.1e7 "A to Z" | 42 true "A to Z" 1 2 3 or 42 true "A to Z", (1, 2, 3) |
OpenDDL | ref {null} | bool {true} | bool {false} | int32 {685230} int32 {0xA74AE} int32 {0b10100111010010101110} | float {6.8523015e+5} | string {"A to Z"} | Homogeneous array: int32 {1, 2, 3, 4, 5}; heterogeneous array: array { bool {true} ref {null} float {-42.1e7} string {"A to Z"} } | dict { value (key = "42") {bool {true}} value (key = "A to Z") {int32 {1, 2, 3}} } |
PHP's serialize() & unserialize() | N; | b:1; | b:0; | i:685230; i:-685230; | d:685230.150000000023283064365386962890625; d:INF; d:-INF; d:NAN; | s:6:"A to Z"; | a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";} | Associative array: a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}; object: O:8:"stdClass":2:{s:4:"John";d:3.140000000000000124344978758017532527446746826171875;s:4:"Jane";d:2.717999999999999971578290569595992565155029296875;} |
Property list (plain text format)[4] | N/A | <*BY> | <*BN> | <*I685230> | <*R6.8523015e+5> | "A to Z" | ( <*BY>, <*R-42.1e7>, "A to Z" ) | { "42" = <*BY>; "A to Z" = ( <*I1>, <*I2>, <*I3> ); } |
Property list (XML format)[5][6] | N/A | <true /> | <false /> | <integer>685230</integer> | <real>6.8523015e+5</real> | <string>A to Z</string> | <array> <true /> <real>-42.1e7</real> <string>A to Z</string> </array> | <dict> <key>42</key> <true /> <key>A to Z</key> <array> <integer>1</integer> <integer>2</integer> <integer>3</integer> </array> </dict> |
S-expressions | NIL nil | T #t[e] true | NIL #f[e] false | 685230 | 6.8523015e+5 | abc "abc" #616263# 3:abc {MzphYmM=} \|YWJj\| | (T NIL -42.1e7 "A to Z") | ((42 T) ("A to Z" (1 2 3))) |
YAML | ~ null Null NULL[7] | y Y yes Yes YES on On ON true True TRUE[8] | n N no No NO off Off OFF false False FALSE[8] | 685230 +685_230 -685230 02472256 0x_0A_74_AE 0b1010_0111_0100_1010_1110 190:20:30[9] | 6.8523015e+5 685.230_15e+03 685_230.15 190:20:30.15 .inf -.inf .Inf .INF .NaN .nan .NAN[10] | A to Z "A to Z" 'A to Z' | [y, ~, -42.1e7, "A to Z"] or - y - - -42.1e7 - A to Z | {"John":3.14, "Jane":2.718} or 42: y A to Z: [1, 2, 3] |
XML[d] | <null />[a] | <boolean val="true"/>[a] | <boolean val="false"/>[a] | <integer>685230</integer>[a] | <float>6.8523015e+5</float>[a] | A to Z | <array> <element type="boolean">true</element> <element type="null"/> <element type="float">-42.1e7</element> <element type="string">A to Z</element> </array>[a] | <associative-array> <entry> <key type="integer">42</key> <value type="boolean">true</value> </entry> <entry> <key type="string">A to Z</key> <value> <array> <element type="integer" val="1"/> <element type="integer" val="2"/> <element type="integer" val="3"/> </array> </value> </entry> </associative-array>[a] |
XML-RPC | N/A | <value><boolean>1</boolean></value> | <value><boolean>0</boolean></value> | <value><int>685230</int></value> | <value><double>6.8523015e+5</double></value> | <value><string>A to Z</string></value> | <value><array> <data> <value><boolean>1</boolean></value> <value><double>-42.1e7</double></value> <value><string>A to Z</string></value> </data> </array></value> | <value><struct> <member> <name>42</name> <value><boolean>1</boolean></value> </member> <member> <name>A to Z</name> <value> <array> <data> <value><int>1</int></value> <value><int>2</int></value> <value><int>3</int></value> </data> </array> </value> </member> </struct></value> |
- a. One possible encoding; the specification document does not specifically give an encoding for this datatype.
- b. The CSV RFC (RFC 4180) only deals with delimiters, newlines, and quote characters; it does not directly address serializing programming data structures.
- c. The netstrings specification only deals with nested byte strings; anything else is outside its scope.
- d. XML in and of itself is not a data serialization language, but many data serialization formats have been derived from it; as such, there are many ways, in addition to those shown, to serialize programming data structures into XML.
- e. This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
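Because netstrings only frame byte strings (footnote c), the array and associative-array entries in the Netstrings row depend on a convention: each element is wrapped in its own netstring, and the wrapped elements are concatenated inside an outer netstring. The Python sketch below reproduces those two entries under that assumption. The names netstring, encode, and split_netstring are hypothetical helpers. Scalars are passed in as their literal text because the table itself spells them differently in different places, for example "true" in the array but "1" in the map. A map is written as a list of [key, value] pairs.

```python
from typing import Tuple


def netstring(payload: bytes) -> bytes:
    """Wrap raw bytes as a netstring: <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode(value) -> bytes:
    """Encode a str, or a (possibly nested) list of them, as nested netstrings.

    Hypothetical convention matching the table: a str becomes a netstring of
    its UTF-8 bytes; a list becomes a netstring whose payload is the
    concatenation of its encoded items; a map is a list of [key, value] pairs.
    """
    if isinstance(value, str):
        return netstring(value.encode("utf-8"))
    if isinstance(value, list):
        return netstring(b"".join(encode(item) for item in value))
    raise TypeError(f"unsupported type: {type(value).__name__}")


def split_netstring(data: bytes) -> Tuple[bytes, bytes]:
    """Parse one netstring from the front of data; return (payload, rest)."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("malformed netstring length")
    length = int(length_text)
    if rest[length:length + 1] != b",":
        raise ValueError("netstring is truncated or missing its trailing comma")
    return rest[:length], rest[length + 1:]


# The array and associative-array examples from the syntax table.
array = encode(["true", "", "-42.1e7", "A to Z"])
mapping = encode([["42", "1"], ["A to Z", ["1", "2", "3"]]])
print(array.decode())    # 29:4:true,0:,7:-42.1e7,6:A to Z,,
print(mapping.decode())  # 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,
```

Each length prefix counts the bytes of its payload, including any nested prefixes and commas. This is why the outer lengths (29 and 41) are larger than the visible text of the scalars alone.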
Comparison of binary formats
Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/Object |
---|---|---|---|---|---|---|---|
ASN.1 (BER, PER or OER encoding) | NULL type | BOOLEAN; BER: as 1 byte in binary form; PER: as 1 bit; OER: as 1 byte | INTEGER; BER: variable-length big-endian binary representation (up to 2^(2^1024) bits); PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise; PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise; OER: one, two, or four octets (either signed or unsigned) if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise | REAL; base-10 real values are represented as character strings in ISO 6093 format; binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent; the special values NaN, -INF, +INF, and negative zero are also supported | Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) | Data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) | User-definable type |
BSON[11] | Null type: 0 bytes for value | True: one byte \x01; False: \x00 | int32: 32-bit little-endian two's complement, or int64: 64-bit little-endian two's complement | double: little-endian binary64 | UTF-8 encoded and NUL-terminated, preceded by an int32 length in bytes (including the terminating NUL) | BSON embedded document with numeric keys | BSON embedded document |
Concise Binary Object Representation (CBOR)[12] | \xf6 | True: \xf5; False: \xf4 | Small non-negative integers (0..23): \x00-\x17; small negative integers (-1..-24): \x20-\x37; larger values: typecode \x18, \x19, \x1a, or \x1b (non-negative) or \x38, \x39, \x3a, or \x3b (negative), followed by a 1-, 2-, 4-, or 8-byte big-endian argument; a negative N is stored as -1-N | Typecode (one byte) + IEEE half/single/double | Typecode with length (like integer coding) and content; byte strings and UTF-8 text strings have different typecodes | Typecode with count (like integer coding) and items | Typecode with pair count (like integer coding) and pairs |
MessagePack | \xc0 | True: \xc3; False: \xc2 | Single-byte "fixnum" (values -32..127) or typecode (one byte) + big-endian (u)int8/16/32/64 | Typecode (one byte) + IEEE single/double | Typecode + up to 31 bytes, or typecode + length as uint8/16/32 + bytes; encoding is unspecified[13] | As "fixarray" (single-byte prefix + up to 15 array items) or typecode (one byte) + 2- or 4-byte length + array items | As "fixmap" (single-byte prefix + up to 15 key-value pairs) or typecode (one byte) + 2- or 4-byte length + key-value pairs |
Netstrings | 0:, | True: 1:1, False: 1:0, | | | | | |
OGDL Binary | |||||||
Property list (binary format) | | | | | | | |
Protocol Buffers[14] | | | Variable-length signed 32-bit (sint32): varint encoding of the "ZigZag"-encoded value (n << 1) XOR (n >> 31); variable-length signed 64-bit (sint64): varint encoding of the "ZigZag"-encoded value (n << 1) XOR (n >> 63) | floats: little-endian binary32; doubles: little-endian binary64 | UTF-8 encoded, preceded by the varint-encoded length of the string in bytes | Repeated value with the same tag | N/A |
Sereal | 0x25 | True: 0x3b; False: 0x3a | Single-byte POS/NEG (values -16..15), or typecode (one byte) + "varint"-encoded variable-length integer, or typecode (one byte) + "zigzag"-encoded variable-length integer | Typecode (one byte) + IEEE single/double/quad | As "SHORT_BINARY" (single-byte prefix + up to 31 raw bytes) or typecode (one byte, including a boolean UTF-8-encoding flag) + "varint"-encoded length + raw bytes | As "ARRAYREF" (single-byte prefix + up to 15 array items) or typecode (one byte) + "varint"-encoded length + array items | As "HASHREF" (single-byte prefix + up to 15 key-value pairs) or typecode (one byte) + "varint"-encoded length + key-value pairs; distinguishes hashmaps from objects/class instances |
Smile | \x21 | True: \x23; False: \x22 | Single-byte "small" (values -16..15 encoded using \xc0-\xdf), zigzag-encoded | IEEE single/double, BigDecimal | Length-prefixed "short" strings (up to 64 bytes), marker-terminated "long" strings, and (optional) back-references | Arbitrary-length heterogeneous arrays with end marker | Arbitrary-length key/value pairs with end marker |
Structured Data eXchange Formats (SDXF) | | | Big-endian signed 24-bit or 32-bit integer | Big-endian IEEE double | Either UTF-8 or ISO 8859-1 encoded | List of elements with identical ID and size, preceded by an array header with int16 length | Chunks can contain other chunks to arbitrary depth |
Thrift | |||||||
Transenc | 0x82 | True: 0x81; False: 0x80 | Single-byte integers in the range -32..127; fixed-length 8-, 16-, 32-, and 64-bit integers, encoded as two's-complement little-endian values | Little-endian IEEE single/double precision numbers | UTF-8 encoded type-length-value string | Balanced brackets with an optional array count; arrays can be nested | Balanced brackets with an optional object count; objects can be nested |
Any XML-based representation can be compressed, or generated directly, using Efficient XML Interchange (EXI), a "schema-informed" (as opposed to schema-required or schema-less) binary compression standard for XML.
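The ZigZag mapping in the Protocol Buffers row of the binary-format table interleaves negative and non-negative numbers (0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ...). Small magnitudes therefore stay small when the base-128 varint stores them. The Python sketch below illustrates that scheme under those assumptions. It is not code from the protobuf library, and zigzag32, zigzag64, and varint are hypothetical helper names. Python's >> is an arithmetic shift on negative integers, which matches the signed shift in the formula. Results are masked to 32 or 64 bits because Python integers are unbounded.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def zigzag32(n: int) -> int:
    """sint32 ZigZag mapping: (n << 1) XOR (n >> 31), kept to 32 bits."""
    return ((n << 1) ^ (n >> 31)) & MASK32


def zigzag64(n: int) -> int:
    """sint64 ZigZag mapping: (n << 1) XOR (n >> 63), kept to 64 bits."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def varint(u: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least-significant group first,
    with the high bit set on every byte except the last."""
    if u < 0:
        raise ValueError("varint takes a non-negative integer")
    out = bytearray()
    while True:
        low7 = u & 0x7F
        u >>= 7
        if u:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)


print([zigzag32(n) for n in (0, -1, 1, -2, 2)])  # [0, 1, 2, 3, 4]
print(varint(300).hex())                         # ac02
print(varint(zigzag32(-1)).hex())                # 01
```

This is the design motivation for ZigZag. A negative value in a plain int32 field is sign-extended to 64 bits before varint encoding, so it always occupies ten bytes. The ZigZag-mapped -1 needs only one byte.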
References
- [1] http://www.xml.com/pub/a/ws/2001/04/04/soap.html
- [2] Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain’t Markup Language (YAML) Version 1.2". The Official YAML Web Site. Retrieved 2012-02-10.
- [3] https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
- [4] http://www.gnustep.org/resources/documentation/Developer/Base/Reference/NSPropertyList.html
- [5] http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man5/plist.5.html
- [6] http://developer.apple.com/mac/library/documentation/CoreFoundation/Conceptual/CFPropertyLists/Articles/XMLTags.html#//apple_ref/doc/uid/20001172-CJBEJBHH
- [7] Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian (2005-01-18). "Null Language-Independent Type for YAML Version 1.1". YAML.org. Retrieved 2009-09-12.
- [8] Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian (2005-01-18). "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- [9] Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian (2005-02-11). "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- [10] Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian (2005-01-18). "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- [11] http://bsonspec.org
- [12] RFC 7049
- [13] https://github.com/msgpack/msgpack/blob/master/spec.md#formats-str
- [14] https://developers.google.com/protocol-buffers/docs/encoding
External links
- XML-QL Proposal discussing XML benefits
- When to use XML
- XmlSucks at the Portland Pattern Repository
- Daring to Do Less with XML