Data Format Description Language

Data Format Description Language (DFDL, often pronounced daff-o-dil), published as an Open Grid Forum Proposed Recommendation in January 2011, is a modeling language for describing general text and binary data in a standard way. A DFDL model or schema allows any text or binary data to be read (or "parsed") from its native format and to be presented as an instance of an information set. (An information set is a logical representation of the data contents, independent of the physical format. For example, two records could be in different formats, because one has fixed-length fields and the other uses delimiters, but they could contain exactly the same data, and would both be represented by the same information set). The same DFDL schema also allows data to be taken from an instance of an information set and written out (or "serialized") to its native format.

DFDL is descriptive and not prescriptive. DFDL is not a data format, nor does it impose the use of any particular data format. Instead it provides a standard way of describing many different kinds of data format. This approach has several advantages.[1] It allows an application author to design an appropriate data representation according to their requirements while describing it in a standard way which can be shared, enabling multiple programs to directly interchange the data.

DFDL achieves this by building upon the facilities of W3C XML Schema 1.0. A subset of XML Schema is used, enough to enable the modeling of non-XML data. The motivations for this approach are to avoid inventing a completely new schema language, and to make it easy to convert general text and binary data, via a DFDL information set, into a corresponding XML document.

Educational material is available in the form of a DFDL Tutorial, videos and several hands-on DFDL labs.

History

DFDL was created in response to a need for grid APIs to be able to understand data regardless of source. A language was needed capable of modeling a wide variety of existing text and binary data formats. A working group was established at the Global Grid Forum (which later became the Open Grid Forum) in 2003 to create a specification for such a language.

A decision was made early on to base the language on a subset of W3C XML Schema, using <xs:appinfo> annotations to carry the extra information necessary to describe non-XML physical representations. This is an established approach that is already being used today in commercial systems. DFDL takes this approach and evolves it into an open standard capable of describing many text or binary data formats.

Work continued on the language, resulting in the publication of a DFDL 1.0 specification as OGF Proposed Recommendation GFD.174 in January 2011. The latest revision is GFD.207 published in November 2014 which obsoletes GFD.174 and incorporates all issues noted to date (also available as html). A summary of DFDL and its features is available at the OGF. Any issues with the specification are being tracked using Redmine issue trackers.

Implementations

Implementations of DFDL processors that can parse and serialize data using DFDL schemas are available.

A presentation is available that describes IBM DFDL and Daffodil.

A public repository for DFDL schemas that describe commercial and scientific data formats has been established on GitHub. DFDL schemas for formats like UN/EDIFACT, NACHA and ISO8583 are available for free download.

Example

Take as an example the following text data stream which gives the name, age and location of a person:

Joe Bloggs,46,Hampshire,England

The logical model for this data can be described by the following fragment of an XML Schema document. The order, names, types and cardinality of the fields are expressed by the XML schema model.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>

<xs:complexType name="person_type">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="age" type="xs:short"/>
    <xs:element name="county" type="xs:string"/>
    <xs:element name="country" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

</xs:schema>

To additionally model the physical representation of the data stream, DFDL augments the XML schema fragment with annotations on the xs:element and xs:sequence objects, as follows:

<xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>

<xs:complexType name="person_type">
  <xs:sequence>
    <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:sequence encoding="ASCII" sequenceKind="ordered" 
                       separator="," separatorType="infix" separatorPolicy="required"/>                   
    </xs:appinfo></xs:annotation>
    <xs:element name="name" type="xs:string">
      <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:element lengthKind="delimited" encoding="ASCII"/>                   
      </xs:appinfo></xs:annotation>
    </xs:element>
    <xs:element name="age" type="xs:short">
      <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:element representation="text" lengthKind="delimited" encoding="ASCII"
                      textNumberRep="standard" textNumberPattern="#0" textNumberBase="10"/>                   
      </xs:appinfo></xs:annotation>
    </xs:element>
    <xs:element name="county" type="xs:string">
      <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:element lengthKind="delimited" encoding="ASCII"/>                   
      </xs:appinfo></xs:annotation>
    </xs:element>
    <xs:element name="country" type="xs:string">
      <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:element lengthKind="delimited" encoding="ASCII"/>                   
      </xs:appinfo></xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:complexType>

</xs:schema>

The property attributes on these DFDL annotations express that the data are represented in an ASCII text format with fields being of variable length and delimited by commas

An alternative, more compact syntax is also provided, where DFDL properties are carried as non-native attributes on the XML Schema objects themselves.

<xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>

<xs:complexType name="person_type">
  <xs:sequence dfdl:encoding="ASCII" dfdl:sequenceKind="ordered" 
               dfdl:separator="," dfdl:separatorType="infix" dfdl:separatorPolicy="required">
    <xs:element name="name" type="xs:string"
                dfdl:lengthKind="delimited" dfdl:encoding="ASCII"/>                   
    <xs:element name="age" type="xs:short"
                dfdl:representation="text" dfdl:lengthKind="delimited" dfdl:encoding="ASCII"
                dfdl:textNumberRep="standard" dfdl:textNumberPattern="##0" dfdl:textNumberBase="10"/>                   
    <xs:element name="county" type="xs:string"
                dfdl:lengthKind="delimited" dfdl:encoding="ASCII"/>                   
    <xs:element name="country" type="xs:string"
                dfdl:lengthKind="delimited" dfdl:encoding="ASCII"/>                   
  </xs:sequence>
</xs:complexType>

</xs:schema>

Features

The goal of DFDL is to provide a rich modeling language capable of representing any text or binary data format. The 1.0 release is a major step towards this goal. The capability includes support for:

See also

References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.