JSON Streaming
JSON streaming refers to communications protocols that delimit JSON objects on top of lower-level stream-oriented protocols (such as TCP), so that individual JSON objects can be recognized, provided the server and clients use the same protocol (for example, because it is implicitly coded in).
Introduction
JSON is a popular format for exchanging object data between systems. Frequently there's a need for a stream of objects to be sent over a single connection, such as a stock ticker or application log records.[1] In these cases there's a need to identify where one JSON-encoded object ends and the next begins. Technically this is known as framing.
There are two common ways to achieve this:
- Send the JSON objects formatted without newlines and use a newline as the delimiter.[2]
- Send the JSON objects concatenated with no delimiters and rely on a streaming parser to extract them.
Line delimited JSON
Line delimited JSON streaming makes use of the fact that the JSON format does not allow literal newline characters within string values (they have to be escaped as `\n`) and that most JSON formatters default to not emitting any optional whitespace, including newlines, between tokens. These features allow the newline character to be used as a delimiter.
This example shows two JSON objects (the implicit newline characters at the end of each line are not shown):
{"some":"thing\n"}
{"may":{"include":"nested","objects":["and","arrays"]}}
The use of a newline as a delimiter enables this format to work very well with traditional line-oriented UNIX tools.
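As a minimal sketch of this framing, the following Python code writes and reads the two objects from the example using only the standard `json` module. The helper names `write_json_lines` and `read_json_lines` are hypothetical, and the `io.StringIO` buffer is an assumption standing in for a real socket or log file.

```python
import io
import json

def write_json_lines(stream, values):
    """Write each value as one compact JSON document followed by a newline."""
    for value in values:
        # json.dumps emits no newlines unless `indent` is set, and it escapes
        # newlines inside strings as \n, so "\n" is safe as the delimiter.
        stream.write(json.dumps(value, separators=(",", ":")) + "\n")

def read_json_lines(stream):
    """Yield one decoded value per non-empty line."""
    for line in stream:
        if line.strip():  # skipping blank lines is a leniency chosen here
            yield json.loads(line)

values = [
    {"some": "thing\n"},
    {"may": {"include": "nested", "objects": ["and", "arrays"]}},
]
buffer = io.StringIO()  # stands in for a socket or file opened in text mode
write_json_lines(buffer, values)
buffer.seek(0)
decoded = list(read_json_lines(buffer))
```

The reader never has to look inside an object to find its end, which is why ordinary line-based tools can split such a stream as well.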
Concatenated JSON
Concatenated JSON streaming allows the sender to simply write each JSON object into the stream with no delimiters. It relies on the receiver using a streaming parser to recognize and emit each JSON object as the terminating character is parsed. Concatenated JSON isn't a new format; it's simply a name for streaming multiple JSON objects without any delimiters.
The advantage of this format is that it can handle JSON objects that have been formatted with embedded newline characters, e.g., pretty-printed for human readability. For example, these two inputs are both valid and produce the same output:
{"some":"thing\n"}{"may":{"include":"nested","objects":["and","arrays"]}}
{"some":"thing\n"}
{"may": {
"include":"nested",
"objects":[
"and","arrays"
]
}}
Implementations that rely on line-based input may require a newline character after each JSON object in order for the object to be emitted by the parser in a timely manner. (Otherwise the line may remain in the input buffer without being passed to the parser.) This is rarely recognized as an issue because terminating JSON objects with a newline character is very common.
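The receiving side can be sketched in Python with `json.JSONDecoder.raw_decode`, which decodes one value and reports where it ended. The class name `ConcatenatedJSONReader` and its `feed` method are hypothetical. The sketch assumes the top-level values are objects or arrays, and it treats every decode error as "more input needed", so it cannot tell truncated input from malformed input.

```python
import json

JSON_WHITESPACE = " \t\r\n"

class ConcatenatedJSONReader:
    """Incrementally extract JSON values written back to back."""

    def __init__(self):
        self._decoder = json.JSONDecoder()
        self._buffer = ""

    def feed(self, chunk):
        """Add received text and return the values completed by it."""
        self._buffer += chunk
        completed = []
        while True:
            # raw_decode does not skip leading whitespace, so strip it first.
            pending = self._buffer.lstrip(JSON_WHITESPACE)
            if not pending:
                self._buffer = ""
                break
            try:
                value, end = self._decoder.raw_decode(pending)
            except json.JSONDecodeError:
                # Assumed to mean "incomplete": keep the text and wait for more.
                self._buffer = pending
                break
            completed.append(value)
            self._buffer = pending[end:]
        return completed

reader = ConcatenatedJSONReader()
# The first chunk ends in the middle of the second, pretty-printed object.
first = reader.feed('{"some":"thing\\n"}{"may": {\n"include":"nested",\n')
second = reader.feed('"objects":[\n"and","arrays"\n]\n}}')
```

Each object is returned as soon as its closing brace has been fed in, whether or not a newline follows it.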
Comparison
Line delimited JSON works very well with traditional line-oriented tools.
Concatenated JSON works with pretty-printed JSON but requires more effort and complexity to parse. It doesn't work well with traditional line-oriented tools. Concatenated JSON streaming is a superset of line delimited JSON streaming.
Compatibility
Line delimited JSON can be read by a parser that can handle concatenated JSON. Concatenated JSON that contains newlines within a JSON object can't be read by a line delimited JSON parser.
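The incompatibility can be illustrated with a short Python sketch. The function name `parse_by_line` is hypothetical; it decodes each line on its own, as a line delimited reader would, and collects the lines that fail.

```python
import json

def parse_by_line(text):
    """Decode each line separately; return (values, lines that failed)."""
    values, rejected = [], []
    for line in text.splitlines():
        try:
            values.append(json.loads(line))
        except json.JSONDecodeError:
            rejected.append(line)
    return values, rejected

compact = '{"may":{"include":"nested"}}\n'
pretty = '{"may": {\n"include":"nested"\n}}\n'

compact_values, compact_rejected = parse_by_line(compact)
pretty_values, pretty_rejected = parse_by_line(pretty)

# The pretty-printed text is still one valid JSON document when read whole.
whole = json.loads(pretty)
```

With the compact input every line decodes. With the pretty-printed input no single line is a complete JSON document, so a strictly line-based reader rejects all of them.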
The terms "line delimited JSON" and "newline delimited JSON" are often used without clarifying if embedded newlines are supported.
There's also a format known as NDJ ("Newline delimited JSON")[3] which allows comments to be embedded if the first two characters of a given line are "//". This can't be used with standard JSON parsers if comments are included.
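One way to feed such input to a standard parser is to drop the comment lines first. The Python sketch below follows only the rule as described here (a line is a comment if its first two characters are "//") and has not been checked against the format's specification. The function name `read_lines_with_comments` is hypothetical.

```python
import json

def read_lines_with_comments(lines):
    """Yield decoded values, skipping lines whose first two characters are //."""
    for line in lines:
        if line.startswith("//"):
            continue  # comment line: never handed to the JSON parser
        if line.strip():
            yield json.loads(line)

sample = [
    "// records exported by a hypothetical tool\n",
    '{"a":1}\n',
    '{"b":"// inside a string, so not a comment"}\n',
]
records = list(read_lines_with_comments(sample))
```

Because the check looks only at the start of the line, a "//" that appears inside a string value is left untouched.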
Concatenated JSON can be converted into line delimited JSON by a suitable JSON utility such as jq. For example:
jq --compact-output . < concatenated.json > lines.json
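Where jq is not available, a similar conversion can be sketched in Python. This is an approximation and not a reimplementation of jq: the function name `concatenated_to_lines` is hypothetical, the whole input is assumed to fit in memory, and `json.dumps` escapes non-ASCII characters unless `ensure_ascii=False` is passed.

```python
import json

def concatenated_to_lines(text):
    """Re-emit every JSON value in `text` as one compact line."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    lines = []
    while True:
        # Skip whitespace between values; raw_decode does not do this itself.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(value, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)

sample = (
    '{"some":"thing\\n"}\n'
    '{"may": {\n"include":"nested",\n"objects":[\n"and","arrays"\n]\n}}\n'
)
converted = concatenated_to_lines(sample)
```

The result contains exactly one object per line, so it can be passed on to line-oriented tools.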
Applications and tools
Line delimited JSON
- Fluentd tries to structure data as JSON as much as possible: this allows Fluentd to unify all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations (Unified Logging Layer). The downstream data processing is much easier with JSON, since it has enough structure to be accessible while retaining flexible schemas.[4]
- Logstash includes a json_lines codec.[5]
- ldjson-stream, a module for Node.js
Concatenated JSON
- Noggit, Solr's streaming JSON parser for Java.[6]
- jq, a lightweight, flexible command-line JSON processor
- Yajl (Yet Another JSON Library), a small event-driven (SAX-style) JSON parser written in ANSI C, with a small validating JSON generator.
References
- [1] Ryan, Film Grain. "How We Built Filmgrain, Part 2 of 2". filmgrainapp.com. Retrieved 4 July 2013.
- [2] "JSON Lines".
- [3] "Newline Delimited JSON".
- [4] "What is Fluentd?". Fluentd. Retrieved 13 August 2015.
- [5] "Centralized Logging with Monolog, Logstash, and Elasticsearch".
- [6] "Noggit Streaming JSON parser".