Fielded text
Fielded Text is a proposed standard that provides structure and schema definition for text files containing tables of values (for example, CSV files). The standard allows the format and structure of the data within a text file to be specified by a Meta file. This Meta file can then be used to access the data in the file in a manner similar to how data is accessed in a database.
Meta files
Meta files are XML files or streams that describe how a Fielded Text file is structured and how the data in its fields is formatted. The information they contain is analogous to the metadata of a database.
The Meta contains the following groups of information:
- Main Section, which specifies properties that apply to the whole text file.
- Field Sections, which specify the properties of each field of data used within the text file.
- Substitution Sections, which specify the substitutions used within the text file. Substitutions are similar to the escape sequences used in some CSV files (e.g. \n).
- Sequence Sections. A Fielded Text file can have lines with different sets of fields depending on the value of one or more key fields. The Sequence Sections in the Meta file specify the sequence of fields that can follow a key field.
Meta files typically have the file extension "ftm".
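Substitutions behave like escape sequences, so decoding them is a small scanning problem. The sketch below is a minimal illustration only: the token character "\" and the SUBSTITUTIONS table are hypothetical stand-ins for whatever a real Substitution Section declares, and the function name is invented for this example.

```python
# Hypothetical substitution table: TOKEN followed by a code character
# stands for the mapped value (e.g. "\n" in the file means a new line).
TOKEN = "\\"
SUBSTITUTIONS = {"n": "\n", "t": "\t", "\\": "\\"}


def apply_substitutions(field, token=TOKEN, table=SUBSTITUTIONS):
    """Decode substitutions in one raw field value, left to right."""
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == token and i + 1 < len(field) and field[i + 1] in table:
            # A known substitution: emit its value and skip both characters.
            out.append(table[field[i + 1]])
            i += 2
        else:
            # Ordinary character, unknown code, or a trailing token.
            out.append(ch)
            i += 1
    return "".join(out)
```

A single left-to-right pass is used instead of chained str.replace calls so that an escaped token followed by a code character (for example a Windows path such as C:\\new) is decoded once and not misread as a new line.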
Declared and Undeclared Fielded Text files
A Fielded Text file can be either declared or undeclared.
A Declared Fielded Text file starts with 2 special lines that reference the Meta associated with the text file. The Meta can be referenced by a URI or a file name, or embedded within the text file as comments. Declared Fielded Text files always begin with the characters "|!Fielded Text^|" (without quotes), which identify them as declared Fielded Text files; when the Meta is embedded, these characters are preceded by the comment character, as in the example below. The file extension "ftx" is often used for Declared Fielded Text files. The standard also proposes the MIME type text/fielded for identifying Fielded Text data streams.
An Undeclared Fielded Text file does not start with the 2 special lines, so it is not implicitly associated with a Fielded Text Meta file or stream. Applications need to explicitly associate a Meta file with an undeclared Fielded Text file in order to determine its structure and format. Existing CSV files, fixed-length field files and other text files containing tables of values are therefore undeclared Fielded Text files.
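As a rough illustration of the declared/undeclared distinction, the hypothetical helper below checks whether a file's first line carries the "|!Fielded Text^|" signature. Because the embedded-Meta example in this article prefixes the signature with its comment character, the sketch also accepts one leading character; that tolerance is an assumption, not a rule taken from the draft specification.

```python
SIGNATURE = "|!Fielded Text^|"


def is_declared(text):
    """Heuristically decide whether text is a declared Fielded Text file."""
    first_line = text.split("\n", 1)[0]
    # Declared files open with the signature. Assumption: also allow one
    # leading comment character, as in "~|!Fielded Text^|".
    return first_line.startswith(SIGNATURE) or first_line[1:].startswith(SIGNATURE)
```

An application that gets False here would treat the file as undeclared and must obtain the Meta some other way, for example from a separate ftm file chosen by the user.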
Basic Example
Below is a basic CSV file. It has 2 heading lines and 4 data lines. The lines contain 7 fields of various types.
"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, 12 Feb 2004, 80, True, "Dog"
"Charlie", , Gold, 5 Apr 2007, 12.3, False, "Fish"
"Molly", 2, Black, 12 Dec 2006, 25, False, "Cat"
"Gilly", , White, 10 Apr 2007, 10, False, "Guinea Pig"
The following Fielded Text Meta file specifies the structure and layout (schema) of the above text file.
<?xml version="1.0" encoding="utf-16"?>
<FieldedText HeadingLineCount="2">
  <Field Name="PetName" />
  <Field DataType="Float" Name="Age" />
  <Field Name="Color" />
  <Field DataType="DateTime" Name="DateReceived" Format="d MMM yyyy" />
  <Field DataType="Decimal" Name="Price" />
  <Field DataType="Boolean" Name="NeedsWalking" />
  <Field Name="Type" />
</FieldedText>
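To show how a Meta file lets an application read a CSV file much like typed database columns, here is a minimal Python sketch built on the standard csv and xml.etree.ElementTree modules. It is not a reference implementation: it handles only the attributes used in this example (HeadingLineCount, Name, DataType, Format), assumes an empty field is a null value, and maps the single date format d MMM yyyy to a strptime pattern through a hypothetical DATE_FORMATS table. The function names are likewise invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping from the Meta's date format to a strptime pattern.
# Only the format used in this example is covered.
DATE_FORMATS = {"d MMM yyyy": "%d %b %Y"}

META = """<?xml version="1.0" encoding="utf-16"?>
<FieldedText HeadingLineCount="2">
  <Field Name="PetName" />
  <Field DataType="Float" Name="Age" />
  <Field Name="Color" />
  <Field DataType="DateTime" Name="DateReceived" Format="d MMM yyyy" />
  <Field DataType="Decimal" Name="Price" />
  <Field DataType="Boolean" Name="NeedsWalking" />
  <Field Name="Type" />
</FieldedText>"""

CSV_TEXT = '''"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, 12 Feb 2004, 80, True, "Dog"
"Charlie", , Gold, 5 Apr 2007, 12.3, False, "Fish"
"Molly", 2, Black, 12 Dec 2006, 25, False, "Cat"
"Gilly", , White, 10 Apr 2007, 10, False, "Guinea Pig"
'''


def parse_meta(meta_text):
    """Return (heading_line_count, [(name, data_type, fmt), ...])."""
    text = meta_text.strip()
    if text.startswith("<?xml"):
        # The text is already a decoded str, so drop the declaration
        # (and its encoding="utf-16") before parsing.
        text = text.split("?>", 1)[1]
    root = ET.fromstring(text)
    heading_lines = int(root.get("HeadingLineCount", "0"))
    fields = [
        (f.get("Name"), f.get("DataType", "String"), f.get("Format"))
        for f in root.findall("Field")
    ]
    return heading_lines, fields


def convert(value, data_type, fmt):
    """Convert one raw field value according to its DataType."""
    if value == "":
        return None  # assumption: an empty field is a null value
    if data_type == "Float":
        return float(value)
    if data_type == "Decimal":
        return Decimal(value)
    if data_type == "Boolean":
        return value.lower() == "true"
    if data_type == "DateTime":
        return datetime.strptime(value, DATE_FORMATS[fmt])
    return value  # String, the default when DataType is omitted


def read_records(csv_text, meta_text):
    """Skip the heading lines and return one dict per data line."""
    heading_lines, fields = parse_meta(meta_text)
    # skipinitialspace=True absorbs the space after each comma in the sample.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    records = []
    for line_no, values in enumerate(reader):
        if line_no < heading_lines:
            continue
        records.append({
            name: convert(values[i], data_type, fmt)
            for i, (name, data_type, fmt) in enumerate(fields)
        })
    return records
```

Calling read_records(CSV_TEXT, META) yields four dictionaries keyed by the Meta field names; the Age of "Charlie" and "Gilly" comes back as None because those fields are empty. Note that %b in strptime is locale-dependent, so the English month abbreviations rely on the default C locale.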
The following Declared Fielded Text file contains the above CSV text together with its Meta embedded as comments. Lines beginning with the ~ character are comment lines, as set by the LineCommentChar attribute.
~|!Fielded Text^| Version="1.0"
~ MetaEmbedded="True"
~ <?xml version="1.0" encoding="utf-16"?>
~ <FieldedText LineCommentChar="~" HeadingLineCount="2">
~ <Field Name="PetName" />
~ <Field DataType="Float" Name="Age" />
~ <Field Name="Color" />
~ <Field DataType="DateTime" Name="DateReceived" Format="d MMM yyyy" />
~ <Field DataType="Decimal" Name="Price" />
~ <Field DataType="Boolean" Name="NeedsWalking" />
~ <Field Name="Type" />
~ </FieldedText>
"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, 12 Feb 2004, 80, True, "Dog"
"Charlie", , Gold, 5 Apr 2007, 12.3, False, "Fish"
"Molly", 2, Black, 12 Dec 2006, 25, False, "Cat"
"Gilly", , White, 10 Apr 2007, 10, False, "Guinea Pig"
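A reader of such a file has to separate the embedded Meta from the data before it can interpret the fields. The hypothetical helper below sketches that step for the layout in the example only: it assumes the first character of the signature line is the comment character, that the second line carries MetaEmbedded="True", and that the Meta occupies the contiguous comment lines after those two declaration lines. None of these rules is quoted from the draft specification.

```python
def split_embedded(text):
    """Return (meta_xml, data_text) for a declared file with embedded Meta."""
    lines = text.splitlines()
    if len(lines) < 2 or not lines[0]:
        raise ValueError("too short to be a declared Fielded Text file")
    # Assumption: the first character of the signature line is the
    # comment character ("~" in the example); "|" means no prefix at all.
    comment = lines[0][0]
    if comment == "|" or 'MetaEmbedded="True"' not in lines[1]:
        raise ValueError("Meta is not embedded as comments")
    meta_lines = []
    i = 2  # skip the two declaration lines
    while i < len(lines) and lines[i].startswith(comment):
        # Remove the comment character and the single space after it.
        meta_lines.append(lines[i][1:].removeprefix(" "))
        i += 1
    # Everything after the embedded Meta is the ordinary fielded data.
    return "\n".join(meta_lines), "\n".join(lines[i:])
```

The returned Meta text is plain XML again and can be handed to any XML parser, while the data text is an ordinary undeclared file whose structure that Meta describes. str.removeprefix requires Python 3.9 or later.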
Capabilities
The Fielded Text standard aims to provide sufficient capabilities to handle nearly all existing text files containing tables of values while keeping the schema of the Meta as simple as possible. The following list summarises the capabilities of the Fielded Text standard:
- Fields separated by a delimiter character
- Fixed Length Fields
- Mixed Fixed Length and Delimiter separated fields in a line
- Quoted Fields (Optional or Explicit)
- New Lines in Quotes
- Automatic New Line detection or specified New Line character
- Comments
- Ignoring Blank Lines
- Ignoring extra characters/fields in lines
- Handling language cultures
- Multiple Heading Lines (both delimited and fixed length)
- Substitutions (escape sequences)
- Embedded (Stuffed) Quote Characters
- Boolean, DateTime, Decimal, Float, Integer and String fields
- Field Heading Constraints
- Null fields
- Constant fields
- Specify format and styles of fields
- Lines can have different sequences of fields based on the value of “key” fields
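To make the last capability concrete, here is a minimal Python sketch in which the value of a key field selects the field sequence for the rest of the line. The key values "H" and "D", the field names and the SEQUENCES table are all hypothetical; a real Meta expresses this with Sequence Sections whose exact schema is defined in the draft specification. The sketch also assumes the key is always the first field on each line.

```python
import csv
import io

# Hypothetical Meta content: each key value maps to the fields on its line.
SEQUENCES = {
    "H": ["Key", "BatchId", "Created"],
    "D": ["Key", "PetName", "Price"],
}


def read_keyed_lines(text, sequences=SEQUENCES):
    """Return one dict per line, choosing its fields by the key value."""
    records = []
    for values in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not values:
            continue  # blank line
        # Assumption: the key field is always the first field on the line.
        fields = sequences.get(values[0])
        if fields is None:
            raise ValueError(f"no field sequence for key {values[0]!r}")
        # zip() stops at the shorter list, so extra trailing values are ignored.
        records.append(dict(zip(fields, values)))
    return records
```

Keeping the key-to-fields mapping in data rather than in code mirrors the idea of the Meta: the same reader handles any file once its sequences are declared.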
Specification
The draft specification of the Fielded Text standard can be found at the Fielded Text home page.
External links
- Fielded Text home page
- RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files