Comparison of data serialization formats

This is a comparison of data serialization formats: the various ways of converting complex objects into sequences of bits. It does not include markup languages used exclusively as document file formats.
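As a minimal illustration of what serialization means here, the sketch below uses Python's standard json and struct modules to turn the same small record into a human-readable byte sequence and into a fixed-layout binary one, and then back again. The record and its binary field layout are invented for this example and are not taken from any format in the tables.

```python
import json
import struct

# Hypothetical record used only for illustration.
record = {"id": 42, "height": 1.85, "married": True}

# Human-readable serialization: JSON text encoded as UTF-8 bytes.
text_bytes = json.dumps(record, sort_keys=True).encode("utf-8")

# Binary serialization: a fixed layout chosen for this example
# (little-endian int32, float64, one-byte bool); field names are not stored.
binary_bytes = struct.pack(
    "<id?", record["id"], record["height"], record["married"]
)

# Both forms can be converted back to the original values.
decoded_text = json.loads(text_bytes.decode("utf-8"))
ident, height, married = struct.unpack("<id?", binary_bytes)
decoded_binary = {"id": ident, "height": height, "married": married}
```

The text form is self-describing, while the binary form can only be read by a consumer that already knows the layout, which is roughly the trade-off the "Binary?", "Human-readable?" and "Schema-IDL?" columns describe.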

Overview

Name Creator-maintainer Based on Standardized? Specification Binary? Human-readable? Supports references? Schema-IDL? Standard APIs Supports zero-copy operations
Apache Avro Apache Software Foundation N/A Yes Apache Avro™ 1.7.5 Specification Yes No N/A Yes (built-in) N/A N/A
ASN.1 ISO, IEC, ITU-T N/A Yes ISO/IEC 8824; X.680 series of ITU-T Recommendations Yes
(BER, DER, PER, OER, or custom via ECN)
Yes
(XER, GSER, or custom via ECN)
Partial Yes (built-in) N/A N/A
Bencode Bram Cohen (creator)
BitTorrent, Inc. (maintainer)
N/A Yes Part of BitTorrent protocol specification Partially
(numbers and delimiters are ASCII)
No No No No N/A
Binn Bernardo Ramos N/A Yes Binn Specification Yes No No No No Yes
Bond Microsoft N/A No Bond IDL Specification Yes Yes
(JSON,
XML)
No Yes No N/A
BSON MongoDB JSON Yes BSON Specification Yes No No No No N/A
Candle Markup Henry Luo XML, JSON, JavaFX Yes Candle Markup Reference No Yes Yes
(XPointer, XPath)
Yes
(Candle Pattern Reference)
Yes
(XQuery, XPath)
N/A
Cap’n Proto Kenton Varda N/A No Cap'n Proto Encoding Spec Yes No Yes Yes No Yes
Comma-separated values (CSV) RFC author:
Yakov Shafranovich
N/A Partial
(myriad informal variants used)
RFC 4180
(among others)
No Yes No No No No
D-Bus Message Protocol freedesktop.org N/A Yes D-Bus Specification Yes No No Partial
(Signature strings)
Yes
(see D-Bus)
N/A
FlatBuffers Google N/A N/A flatbuffers github page Specification Yes No N/A Yes C++, with support for Java, C# and Go Yes
GVariant GLib D-Bus MP Yes GVariant Serialization Yes No No Yes
(Type strings)
No N/A
Fast Infoset ISO, IEC, ITU-T XML Yes ITU-T X.891 and ISO/IEC 24824-1:2007 Yes Yes
(XML)
Yes
(XPointer, XPath)
Yes
(XML schema)
Yes
(DOM, SAX, XQuery, XPath)
N/A
HOCON Typesafe Inc. JSON No "HOCON (Human-Optimized Config Object Notation)" No Yes Yes ? Yes
(native Java API for all JVM languages)
No
JSON Douglas Crockford JavaScript syntax Yes RFC 7159
(ancillary:
RFC 6901,
RFC 6902)
No, but see BSON, Smile, UBJSON Yes Yes
(JSON Pointer (RFC 6901);
alternately:
JSONPath, JPath, JSPON, json:select())
Partial
(JSON Schema Proposal, Kwalify, Rx, Itemscript Schema)
Partial
(Clarinet, JSONQuery, JSONPath)
No
KMIP OASIS N/A Yes OASIS Yes (Tag, Type, Length, Value) Yes No No No N/A
MessagePack Sadayuki Furuhashi JSON (loosely) Yes MessagePack format specification Yes No No No No Yes
Netstrings Dan Bernstein N/A Yes netstrings.txt Yes Yes No No No Yes
OGDL Rolf Veen ? Yes Specification Yes
(Binary Specification)
Yes Yes
(Path Specification)
Yes
(Schema WD)
N/A
OpenDDL Eric Lengyel C, PHP Yes OpenDDL.org No Yes Yes No Yes
(OpenDDL Library)
N/A
PHP's serialize() & unserialize() PHP Group N/A Yes No Yes Yes Yes No Yes N/A
Data::Dumper format (Core Perl Module) Gurusamy Sarathy (ActiveState developer) Perl data types Yes No ? Yes No ? Yes N/A
Property list NeXT (creator)
Apple (maintainer)
? Partial Public DTD for XML format Yes Yes No ? Cocoa, CoreFoundation, OpenStep, GnuStep No
Protocol Buffers (protobuf) Google N/A Yes Developer Guide: Encoding Yes Partial No Yes (built-in) C++, Java, Python No
ROOT CERN & FNAL N/A No N/A Yes Yes
(optional XML output for debugging)
Yes Yes
(C++ object persistency framework)
Yes
(Native C++ API, bindings for Python, Ruby, and others)
N/A
S-expressions Internet Draft author:
Ron Rivest
Lisp, Netstrings Partial
(largely de facto)
"S-Expressions" Internet Draft Yes
("Canonical representation")
Yes
("Advanced transport representation")
No No N/A
SCaViS jWork.ORG N/A Yes N/A Yes Yes
(XML, Java Serialization, ProtocolBuffers)
Yes Yes
(Java object persistency, XML, ProtocolBuffers)
Yes
(Native Java API, bindings for Jython, JRuby, Groovy and others)
N/A
Smile Tatu Saloranta JSON Yes Smile Format Specification Yes No No Partial
(JSON Schema Proposal, other JSON schemas/IDLs)
Partial
(via JSON APIs implemented with Smile backend, on Jackson, Python)
N/A
Structured Data eXchange Formats Max Wildgrube N/A Yes RFC 3072 Yes No No No N/A
Thrift Facebook (creator)
Apache (maintainer)
N/A No Original whitepaper Yes Partial No Yes (built-in) N/A
UBJSON The Buzz Media, LLC JSON, BSON No Yes No No No No N/A
eXternal Data Representation (XDR) Sun Microsystems (creator)
IETF (maintainer)
N/A Yes RFC 4506 Yes No Yes Yes Yes N/A
XML W3C SGML Yes W3C Recommendations:
1.0 (Fifth Edition)
1.1 (Second Edition)
Partial
(Binary XML)
Yes Yes
(XPointer, XPath)
Yes
(XML schema, RELAX_NG)
Yes
(DOM, SAX, XQuery, XPath)
No
XML-RPC Dave Winer[1] XML, SOAP[1] Yes XML-RPC Specification No Yes No No No No
YAML Clark Evans,
Ingy döt Net,
and Oren Ben-Kiki
C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[2] Yes Version 1.2 No Yes Yes Partial
(Kwalify, Rx, built-in language type-defs)
No No

Syntax comparison of human-readable formats

Format Null Boolean true Boolean false Integer Floating-point String Array Associative array/Object
ASN.1
(XML Encoding Rules)
<foo /> <foo>true</foo> <foo>false</foo> <foo>685230</foo> <foo>6.8523015e+5</foo> <foo>A to Z</foo>
<SeqOfUnrelatedDatatypes>
    <isMarried>true</isMarried>
    <hobby />
    <velocity>-42.1e7</velocity>
    <bookname>A to Z</bookname>
    <bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
An object (the key is a field name):
<person>
    <isMarried>true</isMarried>
    <hobby />
    <height>1.85</height>
    <name>Bob Peterson</name>
</person>

A data mapping (the key is a data value):

<competition>
    <measurement>
        <name>John</name>
        <height>3.14</height>
    </measurement>
    <measurement>
        <name>Jane</name>
        <height>2.718</height>
    </measurement>
</competition>

Candle Markup (), "" true false 685230
-685230
6.8523015e+5 "A to Z"
"""
A
to
Z
"""
(true, (), -42.1e7, "A to Z")
_{%342=true A%20to%20Z=(1, 2, 3)}
or
_{
  _{key=42 value=true}
  _{key="A to Z" value=(1, 2, 3)}
}
CSV null
(or an empty element in the row)
1
true
0
false
685230
-685230
6.8523015e+5 A to Z
"We said, ""no""."
true,,-42.1e7,"A to Z"
42,1
A to Z,1,2,3
Netstrings 0:,
4:null,
1:1,
4:true,
1:0,
5:false,
6:685230, 9:6.8523e+5, 6:A to Z, 29:4:true,0:,7:-42.1e7,6:A to Z,, 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,
JSON null true false 685230
-685230
6.8523015e+5 "A to Z" [true, null, -42.1e7, "A to Z"] {"42": true, "A to Z": [1, 2, 3]}
OGDL null true false 685230 6.8523015e+5 "A to Z"
'A to Z'
NoSpaces
true
null
-42.1e7
"A to Z"

(true, null, -42.1e7, "A to Z")

42
  true
"A to Z"
  1
  2
  3
42
  true
"A to Z", (1, 2, 3)
OpenDDL ref {null} bool {true} bool {false} int32 {685230}
int32 {0x74AE}
int32 {0b111010010101110}
float {6.8523015e+5} string {"A to Z"} Homogeneous array:
int32 {1, 2, 3, 4, 5}

Heterogeneous array:

array
{
    bool {true}
    ref {null}
    float {-42.1e7}
    string {"A to Z"}
}
dict
{
    value (key = "42") {bool {true}}
    value (key = "A to Z") {int32 {1, 2, 3}}
}
PHP's serialize() & unserialize() N; b:1; b:0; i:685230;
i:-685230;
d:685230.150000000023283064365386962890625;
d:INF;
d:-INF;
d:NAN;
s:6:"A to Z"; a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";} Associative array:
a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}
Object:
O:8:"stdClass":2:{s:4:"John";d:3.140000000000000124344978758017532527446746826171875;s:4:"Jane";d:2.717999999999999971578290569595992565155029296875;}
Property list
(plain text format)[4]
N/A <*BY> <*BN> <*I685230> <*R6.8523015e+5> "A to Z" ( <*BY>, <*R-42.1e7>, "A to Z" )
{
    "42" = <*BY>;
    "A to Z" = ( <*I1>, <*I2>, <*I3> );
}
Property list
(XML format)[5][6]
N/A <true /> <false /> <integer>685230</integer> <real>6.8523015e+5</real> <string>A to Z</string>
<array>
    <true />
    <real>-42.1e7</real>
    <string>A to Z</string>
</array>
<dict>
    <key>42</key>
    <true />
    <key>A to Z</key>
    <array>
        <integer>1</integer>
        <integer>2</integer>
        <integer>3</integer>
    </array>
</dict>
S-expressions NIL
nil
T
#t
true
NIL
#f
false
685230 6.8523015e+5 abc
"abc"
#616263#
3:abc
{MzphYmM=}
|YWJj|
(T NIL -42.1e7 "A to Z") ((42 T) ("A to Z" (1 2 3)))
YAML ~
null
Null
NULL[7]
y
Y
yes
Yes
YES
on
On
ON
true
True
TRUE[8]
n
N
no
No
NO
off
Off
OFF
false
False
FALSE[8]
685230
+685_230
-685230
02472256
0x_0A_74_AE
0b1010_0111_0100_1010_1110
190:20:30[9]
6.8523015e+5
685.230_15e+03
685_230.15
190:20:30.15
.inf
-.inf
.Inf
.INF
.NaN
.nan
.NAN[10]
A to Z
"A to Z"
'A to Z'
[y, ~, -42.1e7, "A to Z"]
- y
-
- -42.1e7
- A to Z
{"John":3.14, "Jane":2.718}
42: y
A to Z: [1, 2, 3]
XML <null /> <boolean val="true"/>

<true />

<boolean val="false"/>

<false />

<integer>685230</integer> <float>6.8523015e+5</float> A to Z
<array>
  <element type="boolean">true</element>
  <element type="null"/>
  <element type="float">-42.1e7</element>
  <element type="string">A to Z</element>
</array>
<associative-array>
  <entry>
    <key type="integer">42</key>
    <value type="boolean">true</value>
  </entry>
  <entry>
    <key type="string">A to Z</key>
    <value>
      <array>
        <element type="integer" val="1"/>
        <element type="integer" val="2"/>
        <element type="integer" val="3"/>
      </array>
    </value>
  </entry>
</associative-array>
XML-RPC <value><boolean>1</boolean></value> <value><boolean>0</boolean></value> <value><int>685230</int></value> <value><double>6.8523015e+5</double></value> <value><string>A to Z</string></value>
<value><array>
  <data>
  <value><boolean>1</boolean></value>
  <value><double>-42.1e7</double></value>
  <value><string>A to Z</string></value>
  </data>
  </array></value>
<value><struct>
  <member>
    <name>42</name>
    <value><boolean>1</boolean></value>
    </member>
  <member>
    <name>A to Z</name>
    <value>
      <array>
        <data>
          <value><int>1</int></value>
          <value><int>2</int></value>
          <value><int>3</int></value>
          </data>
        </array>
      </value>
    </member>
</struct>
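The Netstrings row in the table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the convention used in that row: scalars are written as ASCII text, null is an empty string, and arrays and maps are built by nesting netstrings inside an outer netstring. The helper names netstring and netstring_list are hypothetical; the netstrings format itself only defines the length-prefixed byte string.

```python
def netstring(payload: bytes) -> bytes:
    """Encode one byte string as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def netstring_list(items) -> bytes:
    """Wrap already-encoded netstrings in one outer netstring."""
    return netstring(b"".join(items))


# Array example from the table: [true, null, -42.1e7, "A to Z"]
array = netstring_list([
    netstring(b"true"),
    netstring(b""),            # empty string standing in for null
    netstring(b"-42.1e7"),
    netstring(b"A to Z"),
])

# Associative array example: {42: true (written as 1), "A to Z": [1, 2, 3]}
mapping = netstring_list([
    netstring_list([netstring(b"42"), netstring(b"1")]),
    netstring_list([
        netstring(b"A to Z"),
        netstring_list([netstring(b"1"), netstring(b"2"), netstring(b"3")]),
    ]),
])
```

Because every level carries its own byte count, a reader can skip a nested value without parsing it, at the cost of recomputing every enclosing length when an inner value changes.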

Comparison of binary formats

Format Null Booleans Integer Floating-point String Array Associative array/Object
ASN.1
(BER, PER or OER encoding)
NULL type BOOLEAN:
  • BER: as 1 byte in binary form;
  • PER: as 1 bit;
  • OER: as 1 byte
INTEGER:
  • BER: variable-length big-endian binary representation (up to 2^(2^1024) bits);
  • PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise;
  • PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise;
  • OER: one, two, or four octets (either signed or unsigned) if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise
REAL:

base-10 real values are represented as character strings in ISO 6093 format;

binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent;

the special values NaN, -INF, +INF, and negative zero are also supported

Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) user definable type
Binn[11] \x00 True: \x01
False: \x02
big-endian 2's complement signed and unsigned 8/16/32/64 bits single: big-endian binary32
double: big-endian binary64
UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + list items Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + key/value pairs
BSON[12] Null type - 0 bytes for value True: one byte \x01
False: \x00
int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement double: little-endian binary64 UTF-8 encoded, preceded by int32 encoded string length in bytes BSON embedded document with numeric keys BSON embedded document
Concise Binary Object Representation (CBOR)[13] \xf6 True: \xf5
False: \xf4
Small non-negative number (0..23): \x00-\x17; small negative number (-1..-24): \x20-\x37

8bit: positive \x18\xhh, negative \x38\xhh
16bit: positive \x19<uint16_t>, negative \x39<uint16_t>
32bit: positive \x1A<uint32_t>, negative \x3A<uint32_t>
64bit: positive \x1B<uint64_t>, negative \x3B<uint64_t>
Negative number x encoded as ~x (binary inversion) or as (-x-1)
Byte order - Big-endian

Typecode (one byte) + IEEE half/single/double Typecode with length (like integer coding) and content.

Bytestring and UTF-8 have different typecode

Typecode with count (like integer coding) and items Typecode with pairs count (like integer coding) and pairs
MessagePack \xc0 True: \xc3
False: \xc2
Single byte "fixnum" (values -32..127)

or typecode (one byte) + big-endian (u)int8/16/32/64

Typecode (one byte) + IEEE single/double Typecode + up to 31 bytes
or
typecode + length as uint8/16/32 + bytes;
encoding is unspecified[14]
As "fixarray" (single-byte prefix + up to 15 array items)

or typecode (one byte) + 2-4 bytes length + array items

As "fixmap" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + 2-4 bytes length + key-value pairs

Netstrings 0:, True: 1:1,

False: 1:0,

OGDL Binary
Property list
(binary format)
Protocol Buffers[15] Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)

Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded (n << 1) XOR (n >> 63)
Constant encoding length 32-bit: 32 bits in little-endian 2's complement
Constant encoding length 64-bit: 64 bits in little-endian 2's complement

floats: little-endian binary32

doubles: little-endian binary64

UTF-8 encoded, preceded by varint-encoded integer length of string in bytes Repeated value with the same tag N/A
Sereal 0x25 True: 0x3b
False: 0x3a
Single byte POS/NEG (values -16..15)

or typecode (one byte) + "varint" encoded variable length integer or typecode (one byte) + "zigzag" encoded variable length integer

Typecode (one byte) + IEEE single/double/quad As "SHORT_BINARY" (single-byte prefix + up to 31 raw bytes)

or typecode (one byte, including boolean UTF8-encoding flag) + "varint" encoded length + raw bytes

As "ARRAYREF" (single-byte prefix + up to 15 array items)

or typecode (one byte) + "varint" encoded length + array items

As "HASHREF" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + "varint" encoded length + key-value pairs. Distinguishes hashmaps from objects / class instances.

Smile \x21 True: \x23
False: \x22
Single byte "small" (values -16..15 encoded using \xc0 - \xdf),

zigzag-encoded varints (1 - 11 databytes), or BigInteger

IEEE single/double, BigDecimal Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references Arbitrary-length heterogeneous arrays with end-marker Arbitrary-length key/value pairs with end-marker
Structured Data eXchange Formats (SDXF) big-endian signed 24bit or 32bit integer big-endian IEEE double either UTF-8 or ISO 8859-1 encoded list of elements with identical ID and size, preceded by array header with int16 length chunks can contain other chunks to arbitrary depth
Thrift
Transenc 0x82 True: 0x81
False: 0x80
Single byte integers in the range [-32;127]

Fixed length integers for 8-bits, 16-bits, 32-bits, and 64-bits integers.

Encoded as two's complement little-endian values.

Little-endian IEEE single/double precision numbers. UTF-8 encoded type-length-value string. Balanced brackets with an optional array count. Arrays can be nested. Balanced brackets with an optional object count. Objects can be nested.
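The variable-length signed integers in the Protocol Buffers row combine two steps: a "ZigZag" mapping, (n << 1) XOR (n >> 31), that turns small negative numbers into small unsigned ones, followed by a base-128 varint. The Python sketch below illustrates the 32-bit case under that description; the function names are hypothetical and this is not the protobuf library's API.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned: (n << 1) XOR (n >> 31)."""
    # Python's >> on a negative int is an arithmetic shift, as the formula needs.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(value: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, low-order group first."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more groups follow
        else:
            out.append(group)
            return bytes(out)


def encode_sint32(n: int) -> bytes:
    """ZigZag-map a signed 32-bit integer, then varint-encode the result."""
    return varint(zigzag32(n))
```

With this mapping, -1 occupies a single byte instead of the full width that a two's complement value with its high bits set would need in a plain varint.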

Any XML-based representation can also be compressed, or generated directly, using EXI (Efficient XML Interchange), a "schema-informed" (as opposed to schema-required or schema-less) binary compression standard for XML.
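Returning to the binary comparison table, the "typecode plus length" pattern shared by several formats can be illustrated with the CBOR integer rule from RFC 7049: values up to 23 fit in the initial byte, larger ones follow as a big-endian 8, 16, 32 or 64-bit field, and a negative number x is stored as -1 - x. The following Python function is a minimal sketch of that rule only; the name cbor_int is hypothetical.

```python
import struct


def cbor_int(n: int) -> bytes:
    """Encode an integer in the range -2**64 .. 2**64 - 1 as a CBOR item."""
    if n >= 0:
        major, arg = 0, n        # major type 0: unsigned integer
    else:
        major, arg = 1, -1 - n   # major type 1: negative integer, stores -1 - n
    head = major << 5
    if arg <= 23:
        return bytes([head | arg])  # value carried in the initial byte
    if arg <= 0xFF:
        return bytes([head | 24]) + struct.pack(">B", arg)
    if arg <= 0xFFFF:
        return bytes([head | 25]) + struct.pack(">H", arg)
    if arg <= 0xFFFFFFFF:
        return bytes([head | 26]) + struct.pack(">I", arg)
    if arg <= 0xFFFFFFFFFFFFFFFF:
        return bytes([head | 27]) + struct.pack(">Q", arg)
    raise ValueError("integer out of range for CBOR major types 0 and 1")
```

The same head-byte scheme is reused by CBOR for string lengths and for array and map counts, which is why the table describes those columns as "like integer coding".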

References

  1. http://www.xml.com/pub/a/ws/2001/04/04/soap.html
  2. Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain’t Markup Language (YAML) Version 1.2". The Official YAML Web Site. Retrieved 2012-02-10.
  3. https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
  4. http://www.gnustep.org/resources/documentation/Developer/Base/Reference/NSPropertyList.html
  5. http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man5/plist.5.html
  6. http://developer.apple.com/mac/library/documentation/CoreFoundation/Conceptual/CFPropertyLists/Articles/XMLTags.html#//apple_ref/doc/uid/20001172-CJBEJBHH
  7. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Null Language-Independent Type for YAML Version 1.1". YAML.org. Retrieved 2009-09-12.
  8. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  9. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-02-11). "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  10. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  11. https://github.com/liteserver/binn/blob/master/spec.md
  12. http://bsonspec.org
  13. RFC 7049
  14. https://github.com/msgpack/msgpack/blob/master/spec.md#formats-str
  15. https://developers.google.com/protocol-buffers/docs/encoding

This article is issued from Wikipedia - version of the Thursday, December 10, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.