Comparison of data serialization formats
This article compares data serialization formats: the various ways of converting complex objects into sequences of bits. It does not include markup languages used exclusively as document file formats.
Overview
Name | Creator-maintainer | Based on | Standardized? | Specification | Binary? | Human-readable? | Supports references?e | Schema-IDL? | Standard APIs | Supports Zero-copy operations |
---|---|---|---|---|---|---|---|---|---|---|
Apache Avro | Apache Software Foundation | N/A | Yes | Apache Avro™ 1.7.5 Specification | Yes | No | N/A | Yes (built-in) | N/A | N/A |
ASN.1 | ISO, IEC, ITU-T | N/A | Yes | ISO/IEC 8824; X.680 series of ITU-T Recommendations | Yes (BER, DER, PER, OER, or custom via ECN) |
Yes (XER, GSER, or custom via ECN) |
Partialf | Yes (built-in) | N/A | N/A |
Bencode | Bram Cohen (creator) BitTorrent, Inc. (maintainer) |
N/A | Yes | Part of BitTorrent protocol specification | Partially (numbers and delimiters are ASCII) |
No | No | No | No | N/A |
Binn | Bernardo Ramos | N/A | Yes | Binn Specification | Yes | No | No | No | No | Yes |
Bond | Microsoft | N/A | No | Bond IDL Specification | Yes | Yes (JSON, XML) |
No | Yes | No | N/A |
BSON | MongoDB | JSON | Yes | BSON Specification | Yes | No | No | No | No | N/A |
Candle Markup | Henry Luo | XML, JSON, JavaFX | Yes | Candle Markup Reference | No | Yes | Yes (XPointer, XPath) |
Yes (Candle Pattern Reference) |
Yes (XQuery, XPath) |
N/A |
Cap’n Proto | Kenton Varda | N/A | No | Cap'n Proto Encoding Spec | Yes | No | Yes | Yes | No | Yes |
Comma-separated values (CSV) | RFC author: Yakov Shafranovich |
N/A | Partial (myriad informal variants used) |
RFC 4180 (among others) |
No | Yes | No | No | No | No |
D-Bus Message Protocol | freedesktop.org | N/A | Yes | D-Bus Specification | Yes | No | No | Partial (Signature strings) |
Yes (see D-Bus) |
N/A |
FlatBuffers | N/A | N/A | flatbuffers GitHub page Specification | Yes | No | N/A | Yes | C++, with support for Java, C# and Go | Yes | |
GVariant | GLib | D-Bus MP | Yes | GVariant Serialization | Yes | No | No | Yes (Type strings) |
No | N/A |
Fast Infoset | ISO, IEC, ITU-T | XML | Yes | ITU-T X.891 and ISO/IEC 24824-1:2007 | Yes | Yes (XML) |
Yes (XPointer, XPath) |
Yes (XML schema) |
Yes (DOM, SAX, XQuery, XPath) |
N/A |
HOCON | Typesafe Inc. | JSON | No | "HOCON (Human-Optimized Config Object Notation)" | No | Yes | Yes | ? | Yes (native Java API for all JVM languages) |
No |
JSON | Douglas Crockford | JavaScript syntax | Yes | RFC 7159 (ancillary: RFC 6901, RFC 6902) |
No, but see BSON, Smile, UBJSON | Yes | Yes (JSON Pointer (RFC 6901); alternately: JSONPath, JPath, JSPON, json:select()) |
Partial (JSON Schema Proposal, Kwalify, Rx, Itemscript Schema) |
Partial (Clarinet, JSONQuery, JSONPath) |
No |
KMIP | OASIS | N/A | Yes | OASIS | Yes (Tag, Type, Length, Value) | Yes | No | No | No | N/A |
MessagePack | Sadayuki Furuhashi | JSON (loosely) | Yes | MessagePack format specification | Yes | No | No | No | No | Yes |
Netstrings | Dan Bernstein | N/A | Yes | netstrings.txt | Yes | Yes | No | No | No | Yes |
OGDL | Rolf Veen | ? | Yes | Specification | Yes (Binary Specification) |
Yes | Yes (Path Specification) |
Yes (Schema WD) |
N/A | |
OpenDDL | Eric Lengyel | C, PHP | Yes | OpenDDL.org | No | Yes | Yes | No | Yes (OpenDDL Library) |
N/A |
PHP's serialize() & unserialize() |
PHP Group | N/A | Yes | No | Yes | Yes | Yes | No | Yes | N/A |
Data::Dumper format (Core Perl Module) | Gurusamy Sarathy (ActiveState developer) | Perl data types | Yes | No | ? | Yes | No | ? | Yes | N/A |
Property list | NeXT (creator) Apple (maintainer) |
? | Partial | Public DTD for XML format | Yesa | Yesb | No | ? | Cocoa, CoreFoundation, OpenStep, GnuStep | No |
Protocol Buffers (protobuf) | N/A | Yes | Developer Guide: Encoding | Yes | Partiald | No | Yes (built-in) | C++, Java, Python | No | |
ROOT | CERN & FNAL | N/A | No | N/A | Yes | Yes (optional XML output for debugging) |
Yes | Yes (C++ object persistency framework) |
Yes (Native C++ API, bindings for Python, Ruby, and others) |
N/A |
S-expressions | Internet Draft author: Ron Rivest |
Lisp, Netstrings | Partial (largely de facto) |
"S-Expressions" Internet Draft | Yes ("Canonical representation") |
Yes ("Advanced transport representation") |
No | No | N/A | |
SCaViS | jWork.ORG | N/A | Yes | N/A | Yes | Yes (XML, Java Serialization, ProtocolBuffers) |
Yes | Yes (Java object persistency, XML, ProtocolBuffers) |
Yes (Native Java API, bindings for Jython, JRuby, Groovy and others) |
N/A |
Smile | Tatu Saloranta | JSON | Yes | Smile Format Specification | Yes | No | No | Partial (JSON Schema Proposal, other JSON schemas/IDLs) |
Partial (via JSON APIs implemented with Smile backend, on Jackson, Python) |
N/A |
Structured Data eXchange Formats | Max Wildgrube | N/A | Yes | RFC 3072 | Yes | No | No | No | N/A | |
Thrift | Facebook (creator) Apache (maintainer) |
N/A | No | Original whitepaper | Yes | Partialc | No | Yes (built-in) | N/A | |
UBJSON | The Buzz Media, LLC | JSON, BSON | No | Yes | No | No | No | No | N/A | |
eXternal Data Representation (XDR) | Sun Microsystems (creator) IETF (maintainer) |
N/A | Yes | RFC 4506 | Yes | No | Yes | Yes | Yes | N/A |
XML | W3C | SGML | Yes | W3C Recommendations: 1.0 (Fifth Edition) 1.1 (Second Edition) |
Partial (Binary XML) |
Yes | Yes (XPointer, XPath) |
Yes (XML schema, RELAX_NG) |
Yes (DOM, SAX, XQuery, XPath) |
No |
XML-RPC | Dave Winer[1] | XML, SOAP[1] | Yes | XML-RPC Specification | No | Yes | No | No | No | No |
YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki |
C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[2] | Yes | Version 1.2 | No | Yes | Yes | Partial (Kwalify, Rx, built-in language type-defs) |
No | No |
- a. ^ The current default format is binary.
- b. ^ The "classic" format is plain text, and an XML format is also supported.
- c. ^ Theoretically possible due to abstraction, but no implementation is included.
- d. ^ The primary format is binary, but a text format is available.[3]
- e. ^ Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
- f. ^ ASN.1 does offer OIDs, a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID (assigned to the document) and an "absolute reference" to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool/library cannot automatically encode/decode/resolve references within a document without help from custom-written program code.
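As an illustration of what note e means by a generic tool dereferencing a reference inside a document, the following is a minimal sketch of resolving a JSON Pointer (RFC 6901, listed in the JSON row above). The function name `resolve_pointer` is hypothetical, and the sketch deliberately omits parts of the RFC (for example the `-` array token and URI fragment encoding).

```python
import json


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer such as "/a/0" against parsed JSON (simplified)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in this order: "~1" becomes "/", then "~0" becomes "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not token.isdigit():
                raise ValueError("array index must be a non-negative integer")
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Sample document taken from the JSON row of the syntax comparison.
doc = json.loads('{"42": true, "A to Z": [1, 2, 3]}')
second_item = resolve_pointer(doc, "/A to Z/1")
```

Because the pointer syntax is standardized, any conforming library can follow such a reference without format-specific code, which is the property the "Supports references?" column records.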
Syntax comparison of human-readable formats
Format | Null | Boolean true | Boolean false | Integer | Floating-point | String | Array | Associative array/Object |
---|---|---|---|---|---|---|---|---|
ASN.1 (XML Encoding Rules) |
<foo /> |
<foo>true</foo> |
<foo>false</foo> |
<foo>685230</foo> |
<foo>6.8523015e+5</foo> |
<foo>A to Z</foo> |
<SeqOfUnrelatedDatatypes>
<isMarried>true</isMarried>
<hobby />
<velocity>-42.1e7</velocity>
<bookname>A to Z</bookname>
<bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
|
An object (the key is a field name):
<person>
<isMarried>true</isMarried>
<hobby />
<height>1.85</height>
<name>Bob Peterson</name>
</person>
A data mapping (the key is a data value): <competition>
<measurement>
<name>John</name>
<height>3.14</height>
</measurement>
<measurement>
<name>Jane</name>
<height>2.718</height>
</measurement>
</competition>
|
Candle Markup | (), "" |
true |
false |
685230 -685230 |
6.8523015e+5 |
"A to Z" """ |
(true, (), -42.1e7, "A to Z") |
_{%342=true A%20to%20Z=(1, 2, 3)} or _{ _{key=42 value=true} _{key="A to Z" value=(1, 2, 3)} } |
CSVb | null a (or an empty element in the row) a |
1 a true a |
0 a false a |
685230 -685230 a |
6.8523015e+5 a |
A to Z "We said, ""no""." |
true,,-42.1e7,"A to Z" |
42,1 A to Z,1,2,3 |
Netstringsc | 0:, a 4:null, a |
1:1, a 4:true, a |
1:0, a 5:false, a |
6:685230, a |
9:6.8523e+5, a |
6:A to Z, |
29:4:true,0:,7:-42.1e7,6:A to Z,, |
41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,, a |
JSON | null |
true |
false |
685230 -685230 |
6.8523015e+5 |
"A to Z" |
[true, null, -42.1e7, "A to Z"] |
{"42": true, "A to Z": [1, 2, 3]} |
OGDL | null a |
true a |
false a |
685230 a |
6.8523015e+5 a |
"A to Z" 'A to Z' NoSpaces |
true null -42.1e7 "A to Z"
|
42 true "A to Z" 1 2 3 42 true "A to Z", (1, 2, 3) |
OpenDDL | ref {null} |
bool {true} |
bool {false} |
int32 {685230} int32 {0xA74AE} int32 {0b10100111010010101110} |
float {6.8523015e+5} |
string {"A to Z"} |
Homogeneous array:
int32 {1, 2, 3, 4, 5} Heterogeneous array: array { bool {true} ref {null} float {-42.1e7} string {"A to Z"} } |
dict { value (key = "42") {bool {true}} value (key = "A to Z") {int32 {1, 2, 3}} } |
PHP's serialize() & unserialize() |
N; |
b:1; |
b:0; |
i:685230; i:-685230; |
d:685230.150000000023283064365386962890625; d:INF; d:-INF; d:NAN; |
s:6:"A to Z"; |
a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";} |
Associative array: a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}} Object: O:8:"stdClass":2:{s:4:"John";d:3.140000000000000124344978758017532527446746826171875;s:4:"Jane";d:2.717999999999999971578290569595992565155029296875;} |
Property list (plain text format)[4] |
N/A | <*BY> |
<*BN> |
<*I685230> |
<*R6.8523015e+5> |
"A to Z" |
( <*BY>, <*R-42.1e7>, "A to Z" ) |
{ "42" = <*BY>; "A to Z" = ( <*I1>, <*I2>, <*I3> ); } |
Property list (XML format)[5][6] |
N/A | <true /> |
<false /> |
<integer>685230</integer> |
<real>6.8523015e+5</real> |
<string>A to Z</string> |
<array>
<true />
<real>-42.1e7</real>
<string>A to Z</string>
</array>
|
<dict>
<key>42</key>
<true />
<key>A to Z</key>
<array>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
</array>
</dict>
|
S-expressions | NIL nil |
T #t e true |
NIL #f e false |
685230 |
6.8523015e+5 |
abc "abc" #616263# 3:abc {MzphYmM=} |YWJj| |
(T NIL -42.1e7 "A to Z") |
((42 T) ("A to Z" (1 2 3))) |
YAML | ~ null Null NULL [7] |
y Y yes Yes YES on On ON true True TRUE [8] |
n N no No NO off Off OFF false False FALSE [8] |
685230 +685_230 -685230 02472256 0x_0A_74_AE 0b1010_0111_0100_1010_1110 190:20:30 [9] |
6.8523015e+5 685.230_15e+03 685_230.15 190:20:30.15 .inf -.inf .Inf .INF .NaN .nan .NAN [10] |
A to Z "A to Z" 'A to Z' |
[y, ~, -42.1e7, "A to Z"]
- y - - -42.1e7 - A to Z |
{"John":3.14, "Jane":2.718}
42: y A to Z: [1, 2, 3] |
XMLd | <null /> a |
<boolean val="true"/> a
|
<boolean val="false"/> a
|
<integer>685230</integer> a |
<float>6.8523015e+5</float> a |
A to Z |
a<array>
<element type="boolean">true</element>
<element type="null"/>
<element type="float">-42.1e7</element>
<element type="string">A to Z</element>
</array>
|
a<associative-array>
<entry>
<key type="integer">42</key>
<value type="boolean">true</value>
</entry>
<entry>
<key type="string">A to Z</key>
<value>
<array>
<element type="integer" val="1"/>
<element type="integer" val="2"/>
<element type="integer" val="3"/>
</array>
</value>
</entry>
</associative-array>
|
XML-RPC | <value><boolean>1</boolean></value> |
<value><boolean>0</boolean></value> |
<value><int>685230</int></value> |
<value><double>6.8523015e+5</double></value> |
<value><string>A to Z</string></value> |
<value><array>
<data>
<value><boolean>1</boolean></value>
<value><double>-42.1e7</double></value>
<value><string>A to Z</string></value>
</data>
</array></value>
|
<value><struct>
<member>
<name>42</name>
<value><boolean>1</boolean></value>
</member>
<member>
<name>A to Z</name>
<value>
<array>
<data>
<value><int>1</int></value>
<value><int>2</int></value>
<value><int>3</int></value>
</data>
</array>
</value>
</member>
</struct>
|
- a. ^ One possible encoding; the specification document does not specifically give an encoding for this datatype.
- b. ^ The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
- c. ^ The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
- d. ^ XML in and of itself is not a data serialization language, but many data serialization formats have been derived from it; as such, there are many different ways, in addition to those shown, to serialize programming data structures into XML.
- e. ^ This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
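The Netstrings entries in the table above can be reproduced with a few lines of code. The sketch below is one possible convention, not something the netstrings specification defines (see note c): a list is encoded as a netstring whose payload is the concatenation of its encoded items. The helper names `netstring` and `nest` are hypothetical. The JSON line uses Python's standard `json` module for comparison.

```python
import json


def netstring(payload: bytes) -> bytes:
    """Frame a byte string as <decimal length>:<bytes>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def nest(value) -> bytes:
    """Encode a str, or a (possibly nested) list of str, as netstrings.

    Treating a list as the concatenation of its encoded items is an
    assumption made for this sketch; the specification only covers
    byte strings.
    """
    if isinstance(value, str):
        return netstring(value.encode("utf-8"))
    return netstring(b"".join(nest(item) for item in value))


# The sample array and mapping used throughout the comparison table.
array_as_json = json.dumps([True, None, -42.1e7, "A to Z"])
array_as_netstrings = nest(["true", "", "-42.1e7", "A to Z"])
mapping_as_netstrings = nest([["42", "1"], ["A to Z", ["1", "2", "3"]]])
```

Note that the netstrings version has to pick its own textual spelling for true, null, and numbers, whereas JSON defines those literals itself; this is why the Netstrings cells are marked as only one possible encoding.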
Comparison of binary formats
Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/Object |
---|---|---|---|---|---|---|---|
ASN.1 (BER, PER or OER encoding) |
NULL type | BOOLEAN:
|
INTEGER:
|
REAL:
base-10 real values are represented as character strings in ISO 6093 format; binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent; the special values NaN, -INF, +INF, and negative zero are also supported |
Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) | data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) | user definable type |
Binn[11] | \x00 |
True: \x01 False: \x02 |
big-endian 2's complement signed and unsigned 8/16/32/64 bits | single: big-endian binary32 double: big-endian binary64 |
UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes | Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + list items | Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + key/value pairs |
BSON[12] | Null type - 0 bytes for value | True: one byte \x01 False: \x00 |
int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement | double: little-endian binary64 | UTF-8 encoded, preceded by int32 encoded string length in bytes | BSON embedded document with numeric keys | BSON embedded document |
Concise Binary Object Representation (CBOR)[13] | \xf6 |
True: \xf5 False: \xf4 |
Small positive number \x00-\x17 , small negative number \x20-\x37 (abs(N) <= 23) 8bit: positive |
Typecode (one byte) + IEEE half/single/double | Typecode with length (like integer coding) and content. Bytestring and UTF-8 have different typecode |
Typecode with count (like integer coding) and items | Typecode with pairs count (like integer coding) and pairs |
MessagePack | \xc0 |
True: \xc3 False: \xc2 |
Single byte "fixnum" (values -32..127)
or typecode (one byte) + big-endian (u)int8/16/32/64 |
Typecode (one byte) + IEEE single/double | Typecode + up to 15 bytes or typecode + length as uint8/16/32 + bytes; encoding is unspecified[14] |
As "fixarray" (single-byte prefix + up to 15 array items)
or typecode (one byte) + 2-4 bytes length + array items |
As "fixmap" (single-byte prefix + up to 15 key-value pairs)
or typecode (one byte) + 2-4 bytes length + key-value pairs |
Netstrings | 0:, |
True: 1:1,
False: |
|||||
OGDL Binary | |||||||
Property list (binary format) |
|||||||
Protocol Buffers[15] | Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)
Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded |
floats: little-endian binary32 | UTF-8 encoded, preceded by varint-encoded integer length of string in bytes | Repeated value with the same tag | N/A | ||
Sereal | 0x25 |
True: 0x3b False: 0x3a |
Single byte POS/NEG (values -16..15)
or typecode (one byte) + "varint" encoded variable length integer or typecode (one byte) + "zigzag" encoded variable length integer |
Typecode (one byte) + IEEE single/double/quad | As "SHORT_BINARY" (single-byte prefix + up to 31 raw bytes)
or typecode (one byte, including boolean UTF8-encoding flag) + "varint" encoded length + raw bytes |
As "ARRAYREF" (single-byte prefix + up to 15 array items)
or typecode (one byte) + "varint" encoded length + array items |
As "HASHREF" (single-byte prefix + up to 15 key-value pairs)
or typecode (one byte) + "varint" encoded length + key-value pairs. Distinguishes hashmaps from objects / class instances. |
Smile | \x21 |
True: \x23 False: \x22 |
Single byte "small" (values -16..15 encoded using \xc0 - \xdf ),
zigzag-encoded |
IEEE single/double, BigDecimal |
Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references | Arbitrary-length heterogeneous arrays with end-marker | Arbitrary-length key/value pairs with end-marker |
Structured Data eXchange Formats (SDXF) | big-endian signed 24bit or 32bit integer | big-endian IEEE double | either UTF-8 or ISO 8859-1 encoded | list of elements with identical ID and size, preceded by array header with int16 length | chunks can contain other chunks to arbitrary depth | ||
Thrift | |||||||
Transenc | 0x82 |
True: 0x81 False: 0x80 |
Single-byte integers in the range [-32, 127]; fixed-length 8-bit, 16-bit, 32-bit, and 64-bit integers, encoded as two's complement little-endian values. |
Little-endian IEEE single/double precision numbers. | UTF-8 encoded type-length-value string. | Balanced brackets with an optional array count. Arrays can be nested. | Balanced brackets with an optional object count. Objects can be nested. |
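To make the MessagePack row of the table concrete, here is a minimal sketch that hand-encodes null, booleans, and integers: a single-byte "fixnum" for values from -32 to 127, otherwise a one-byte typecode followed by a big-endian integer. The function name `pack_scalar` is hypothetical, and the typecodes are those of the MessagePack format specification as the editor understands it. Production code would normally use an existing MessagePack library instead.

```python
import struct

# (typecode, struct format, exclusive upper bound) for unsigned integers
_UNSIGNED = ((0xCC, ">B", 1 << 8), (0xCD, ">H", 1 << 16),
             (0xCE, ">I", 1 << 32), (0xCF, ">Q", 1 << 64))
# (typecode, struct format, inclusive lower bound) for signed integers
_SIGNED = ((0xD0, ">b", -(1 << 7)), (0xD1, ">h", -(1 << 15)),
           (0xD2, ">i", -(1 << 31)), (0xD3, ">q", -(1 << 63)))


def pack_scalar(value) -> bytes:
    """Encode None, a bool, or an int as MessagePack bytes (sketch only)."""
    if value is None:
        return b"\xc0"
    if value is True:
        return b"\xc3"
    if value is False:
        return b"\xc2"
    if isinstance(value, int):
        if -32 <= value <= 127:
            # "fixnum": the value itself is the single byte (two's complement)
            return struct.pack("b", value)
        if value >= 0:
            for code, fmt, limit in _UNSIGNED:
                if value < limit:
                    return bytes([code]) + struct.pack(fmt, value)
        else:
            for code, fmt, lowest in _SIGNED:
                if value >= lowest:
                    return bytes([code]) + struct.pack(fmt, value)
        raise OverflowError("integer does not fit in 64 bits")
    raise TypeError("only None, bool and int are handled in this sketch")


encoded = pack_scalar(685230)  # the sample integer used in the tables
```

The booleans are tested before the integer branch because Python's `bool` is a subclass of `int`; without that ordering `True` would be encoded as the fixnum 1.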
Any XML-based representation can also be compressed into, or generated directly as, EXI (Efficient XML Interchange), a "schema-informed" (as opposed to schema-required or schema-less) binary compression standard for XML.
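Returning to the Protocol Buffers row of the binary table, the "ZigZag" mapping (n << 1) XOR (n >> 31) followed by varint encoding can be sketched as below. The function names are hypothetical, and the varint layout (7-bit groups, least significant first, high bit set on every byte except the last) follows the Protocol Buffers encoding guide as the editor understands it.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one: (n << 1) XOR (n >> 31)."""
    if not -(1 << 31) <= n < (1 << 31):
        raise OverflowError("value does not fit in a signed 32-bit integer")
    # Python's >> on a negative int is an arithmetic shift, as the formula
    # requires; the mask trims the result to 32 bits.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow
        else:
            out.append(group)         # final group: high bit clear
            return bytes(out)


def encode_sint32(n: int) -> bytes:
    """ZigZag-encode, then varint-encode, a signed 32-bit integer."""
    return varint(zigzag32(n))


small_negative = encode_sint32(-1)
```

The point of the ZigZag step is that integers of small magnitude, whether positive or negative, map to small unsigned values and therefore to short varints; without it, a negative number would always occupy the maximum varint length.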
References
- [1] http://www.xml.com/pub/a/ws/2001/04/04/soap.html
- [2] Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain’t Markup Language (YAML) Version 1.2". The Official YAML Web Site. Retrieved 2012-02-10.
- [3] https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
- [4] http://www.gnustep.org/resources/documentation/Developer/Base/Reference/NSPropertyList.html
- [5] http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man5/plist.5.html
- [6] http://developer.apple.com/mac/library/documentation/CoreFoundation/Conceptual/CFPropertyLists/Articles/XMLTags.html#//apple_ref/doc/uid/20001172-CJBEJBHH
- [7] Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Null Language-Independent Type for YAML Version 1.1". YAML.org. Retrieved 2009-09-12.
- [8] Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- [9] Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-02-11). "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- [10] Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- [11] https://github.com/liteserver/binn/blob/master/spec.md
- [12] http://bsonspec.org
- [13] RFC 7049
- [14] https://github.com/msgpack/msgpack/blob/master/spec.md#formats-str
- [15] https://developers.google.com/protocol-buffers/docs/encoding
External links
- XML-QL Proposal discussing XML benefits
- When to use XML
- XmlSucks at the Portland Pattern Repository
- Daring to Do Less with XML