Data Interchange Format

From Wikipedia, the free encyclopedia

Data Interchange Format (.dif) is a text file format used to import and export single spreadsheets between spreadsheet programs such as Excel, Gnumeric, StarCalc, Lotus 1-2-3, FileMaker, dBase, Framework and Multiplan. One limitation is that a DIF file holds only one sheet, so the format cannot represent a workbook containing multiple spreadsheets.

Syntax

DIF stores everything in an ASCII text file to mitigate many cross-platform issues. The file is divided into two sections: header and data. Everything in DIF is represented by a 2- or 3-line chunk: header entries use 3-line chunks and data entries use 2-line chunks. A header chunk starts with a topic identifier consisting only of upper-case letters, generally no more than 32 characters long. The next line must be a pair of numbers, and the third line must be a quoted string. A data chunk, on the other hand, starts with a number pair, and the next line is a string value: either a quoted string or an unquoted indicator such as V, BOT or EOD.
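
The chunk structure described above can be illustrated with a short Python sketch. It assumes one item per line and treats every 3-line chunk as a header entry until the DATA entry has been read, after which the rest of the input is consumed in 2-line data chunks. The function name split_chunks is hypothetical, and no error recovery is attempted.

```python
def split_chunks(text):
    """Split DIF text into header chunks (3 lines) and data chunks (2 lines).

    Assumes one item per line and a header that ends with the DATA entry.
    """
    lines = text.splitlines()
    header, data = [], []
    i = 0
    # Header: topic line, "vector,value" line, quoted string line.
    while i + 2 < len(lines):
        chunk = (lines[i], lines[i + 1], lines[i + 2])
        header.append(chunk)
        i += 3
        if chunk[0] == "DATA":
            break
    # Data: "type,value" line followed by a string line.
    while i + 1 < len(lines):
        data.append((lines[i], lines[i + 1]))
        i += 2
    return header, data
```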


For example, assume we have two columns with one column header row and two data rows:

Text                            Number
hello                           1
has a double quote " in text    -3

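In a .dif file, this would be written as follows. The column header row counts as a tuple, so TUPLES is 3, and the double quote inside the last text cell is escaped by doubling it: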

TABLE
0,1
"EXCEL"
VECTORS
0,2
""
TUPLES
0,3
""
DATA
0,0
""
-1,0
BOT
1,0
"Text"
1,0
"Number"
-1,0
BOT
1,0
"hello"
0,1
V
-1,0
BOT
1,0
"has a double quote "" in text"
0,-3
V
-1,0
EOD
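
As a cross-check of the example, the following Python sketch generates the same file from a list of rows. The function names write_dif and quote are hypothetical. The sketch assumes every row has the same length, writes "\n" line endings (real programs may use other line endings), writes Python strings as string data and other values as numeric data with the V indicator (booleans use TRUE and FALSE), and doubles embedded double quotes as the example does.

```python
def quote(s):
    # Double any embedded double quote, then wrap the whole value in quotes.
    return '"' + s.replace('"', '""') + '"'


def write_dif(title, rows):
    """Return DIF text for a list of equal-length rows."""
    vectors = len(rows[0]) if rows else 0
    out = [
        "TABLE", "0,1", quote(title),
        "VECTORS", f"0,{vectors}", '""',
        "TUPLES", f"0,{len(rows)}", '""',
        "DATA", "0,0", '""',
    ]
    for row in rows:
        out += ["-1,0", "BOT"]  # each tuple starts with BOT
        for cell in row:
            if isinstance(cell, bool):  # check bool before numbers
                out += [f"0,{int(cell)}", "TRUE" if cell else "FALSE"]
            elif isinstance(cell, str):
                out += ["1,0", quote(cell)]
            else:
                out += [f"0,{cell}", "V"]
    out += ["-1,0", "EOD"]  # end of data
    return "\n".join(out) + "\n"
```

Calling write_dif("EXCEL", [["Text", "Number"], ["hello", 1], ['has a double quote " in text', -3]]) produces the 32 lines of the example.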

Specification

The DIF File Structure
DIF (Data Interchange Format) is a program-independent method of storing 
data. DIF files are ASCII text files. The format uses a short line 
length to make the files as universally compatible as possible with 
application software, languages, operating systems and computer 
hardware. 

A DIF file is oriented towards row-and-column data, such as a 
spreadsheet or database manager might produce. Because individual 
programs may "rotate" the rows and columns, DIF uses the terms vector 
and tuple. You may generally interpret vector as column and tuple as 
row. DIF files contain two sections: a file header and a data section. 


The DIF Header

There are four required entries in the DIF header, and a number of 
optional entries. The format of all header entries is 


  < topic >
  < vector # > , < numerical value >
  " < string value > "

  where

  < topic > is a "token," generally 32 characters or fewer.
  < vector # > is 0 if specifying the entire file.
  < numerical value > is 0 unless a value is specified.
  < string value > is "" (two double quotation marks with no space between) if it is not used.
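
The three-line entry format can be sketched in Python as follows. The function header_entry is hypothetical. Its topic check (upper-case letters only, at most 32 characters) follows the description in the Syntax section; the specification only says topics are "generally" that short, so a real reader might be more lenient. Doubling double quotes inside header strings is also an assumption, carried over from the data-section example.

```python
import re

# Assumed token rule: upper-case letters only, 1 to 32 characters.
TOPIC_RE = re.compile(r"[A-Z]{1,32}")


def header_entry(topic, vector=0, value=0, string=""):
    """Return the three lines of one DIF header entry."""
    if not TOPIC_RE.fullmatch(topic):
        raise ValueError(f"invalid topic token: {topic!r}")
    # An unused string value is written as "" (empty quotes).
    quoted = '"' + string.replace('"', '""') + '"'
    return [topic, f"{vector},{value}", quoted]
```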

The first required item in a DIF file is the title. For a typical 
spreadsheet, this would look like: 


  TABLE
  0, < version # >
  " < title > "

  where
  < version # > is 1.
  < title > is the title of the table.

                     
The next required item is the vector count. This specifies the number of 
vectors (columns). Its format is 


  VECTORS
  0, < count >
  ""

  where
  < count > is the number of vectors.

This entry may appear anywhere in the header, but must appear before 
any entries that specify vector numbers.


The third required item is the tuple count. This specifies the length of 
the vectors (the number of rows). Its format is 


  TUPLES
  0, < count >
  ""

  where
  < count > is the number of tuples.

The final required header item is DATA, which separates the header 
information from the data proper. DATA must be the last header item. 
Its format is:


  DATA
  0,0
  ""
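
The placement rules for the required items (DATA last, VECTORS before any entry that names a vector) can be checked mechanically. This Python sketch represents each header entry as a hypothetical (topic, vector, value, string) tuple; the function names are illustrative, and the sketch does not enforce any ordering beyond the two rules stated above.

```python
def required_header(title, vectors, tuples):
    """Return the four required header entries as (topic, vector, value, string)."""
    return [
        ("TABLE", 0, 1, title),  # version number is 1
        ("VECTORS", 0, vectors, ""),
        ("TUPLES", 0, tuples, ""),
        ("DATA", 0, 0, ""),  # must be the last header item
    ]


def check_header(entries):
    """Raise ValueError if the header breaks the placement rules described above."""
    topics = [topic for topic, _, _, _ in entries]
    for required in ("TABLE", "VECTORS", "TUPLES", "DATA"):
        if required not in topics:
            raise ValueError(f"missing required item {required}")
    if topics[-1] != "DATA":
        raise ValueError("DATA must be the last header item")
    seen_vectors = False
    for topic, vector, _value, _string in entries:
        if topic == "VECTORS":
            seen_vectors = True
        elif vector != 0 and not seen_vectors:
            # Entries naming a specific vector must follow VECTORS.
            raise ValueError(f"{topic} refers to vector {vector} before VECTORS")
```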

Optional Header Items
All other header entries are optional. The DIF Clearinghouse has defined 
a number of optional entries; some have become "standard" because 
particular software products use them. The optional header items cover 
labels, comments, field sizes, time series, significant values, and 
units of measure.


   - Labels a specific vector:

     LABEL
     < vector # > , < line # >
     " < label > "

     where
     < vector # > is the labeled vector.
     < line # > allows for labeling more than one line.
     < label > is the label string.

   - Permits enhanced description of a vector:

     COMMENT
     < vector # > , < line # >
     " < comment > "

     where
     < vector # > is the commented vector.
     < line # > may refer to more than one line.
     < comment > is the comment string.

   
   - Allocates fixed field sizes for each vector
   
     SIZE
     < vector # > , < # bytes >
   
     where
     < vector # > is the vector being sized.
     < # bytes > is the size.
   
   - Specifies the period in a time series:
   
     PERIODICITY
     < vector # > , < period >
   
     where
     < vector # > is the specified vector.
     < period > is the time period.
   
   - Indicates first year of a time series:

     MAJORSTART
     < vector # > , < start >

     where
     < vector # > is the specified vector.
     < start > is the first year of the time series.

   - Indicates first period of a time series:

     MINORSTART
     < vector # > , < start >

     where
     < vector # > is the specified vector.
     < start > is the start of the time series.
   
     
   - Indicates the portion of a vector that contains significant values:

     TRUELENGTH
     < vector # > , < length >

     where
     < vector # > is the specified vector.
     < length > is the length of that vector that contains 
                significant values.

   - Units of measure for a given vector:

     UNITS
     < vector # > ,0
     " < name > "

     where
     < vector # > is the specified vector.
     < name > is the name string of the units to be applied.

   - Units in which a given vector should be displayed:

     DISPLAYUNITS
     < vector # > ,0
     " < name > "

     where
     < vector # > is the specified vector.
     < name > is the name string of the units used to display 
              the vector. (This may be different from the units 
              used to measure the vector.)
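
A reader that honours the optional items might group them by vector number. In this Python sketch, which again uses hypothetical (topic, vector, value, string) tuples, the string-valued items (UNITS, DISPLAYUNITS) keep their string, the numeric items (SIZE, PERIODICITY, MAJORSTART, MINORSTART, TRUELENGTH) keep their numerical value, and LABEL and COMMENT entries are collected as (line number, text) pairs because they may span several lines. Ignoring unknown topics is an assumption of the sketch, not a rule from the specification.

```python
from collections import defaultdict

LIST_ITEMS = {"LABEL", "COMMENT"}  # numerical value is a line number
STRING_ITEMS = {"UNITS", "DISPLAYUNITS"}  # meaning is in the string line
NUMERIC_ITEMS = {"SIZE", "PERIODICITY", "MAJORSTART", "MINORSTART", "TRUELENGTH"}


def vector_metadata(entries):
    """Group optional header items by vector number.

    entries: iterable of (topic, vector, value, string) tuples.
    Required items and unknown topics are skipped.
    """
    meta = defaultdict(dict)
    for topic, vector, value, string in entries:
        if topic in LIST_ITEMS:
            meta[vector].setdefault(topic, []).append((value, string))
        elif topic in STRING_ITEMS:
            meta[vector][topic] = string
        elif topic in NUMERIC_ITEMS:
            meta[vector][topic] = value
    return dict(meta)
```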
     
DIF Data Section
The data section is organized as a series of tuples. Data within each 
tuple is organized in vector sequence. Using a spreadsheet as the data 
model, this means one data entry per cell, ordered by ascending column 
within each row, with rows in ascending order.


There are two "special data values," BOT (Beginning of Tuple) and EOD 
(End of Data). BOT marks the start of each tuple. EOD terminates the DIF 
file. 
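
The role of BOT and EOD can be shown by grouping raw data entries into rows. This Python sketch takes hypothetical (type-and-value line, string line) pairs, starts a new row at each BOT, stops at EOD, and leaves the cell entries undecoded; the function name group_tuples is illustrative.

```python
def group_tuples(entries):
    """Group raw (head, string) data entries into rows, using BOT and EOD."""
    rows = []
    for head, string in entries:
        string = string.strip()
        if head.split(",")[0].strip() == "-1":  # special data value
            if string == "BOT":
                rows.append([])  # beginning of a new tuple
            elif string == "EOD":
                break  # end of data: ignore anything after it
            continue
        if not rows:
            raise ValueError("data entry before the first BOT")
        rows[-1].append((head, string))
    return rows
```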

Each data entry is organized in the following manner:

  < type indicator > , < numerical value >
  < string value >

  where
  < type indicator > is one of three different indicators:

        -1       special data value
                 < numerical value > is 0
                 < string value > is BOT or EOD
         0       numeric data (signed decimal number)
                 < numerical value > is the numeric data
                 < string value > is one of the value indicators
                 (see below)
         1       string data
                 < numerical value > is 0
                 < string value > is the string data, in double quotes
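
Decoding a single entry by its type indicator might look like the following Python sketch. The function name decode_entry and the (kind, value) return shape are hypothetical. Numeric data is returned together with its value indicator, and quoted strings are unquoted with doubled double quotes collapsed, as in the Syntax example.

```python
def decode_entry(head, string):
    """Decode one two-line data entry into a (kind, value) pair.

    head: the "type indicator,numerical value" line; string: the next line.
    """
    type_part, num_part = head.split(",", 1)
    kind = int(type_part)
    if kind == -1:  # special data value: BOT or EOD
        return ("special", string.strip())
    if kind == 0:  # numeric data plus its value indicator
        return ("numeric", (float(num_part), string.strip()))
    if kind == 1:  # string data, normally in double quotes
        s = string.strip()
        if len(s) >= 2 and s[0] == s[-1] == '"':
            s = s[1:-1].replace('""', '"')
        return ("string", s)
    raise ValueError(f"unknown type indicator: {kind}")
```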

Value Indicators

There are five value indicators to use as the < string value > when the
< type indicator > is 0:

            V       value

            NA      not available
                    < numerical value > must be 0

            ERROR   error condition
                    < numerical value > must be 0

            TRUE    < numerical value > is 1

            FALSE   < numerical value > is 0
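
The value indicators map naturally onto native values. In this Python sketch, V yields the number itself, NA becomes None, TRUE and FALSE become booleans, and ERROR is represented by a hypothetical DifError marker class; the names numeric_cell and DifError are illustrative only.

```python
class DifError:
    """Hypothetical marker for a cell whose value indicator is ERROR."""

    def __repr__(self):
        return "DifError()"


def numeric_cell(number, indicator):
    """Map a type-0 entry's numerical value and value indicator to a Python value."""
    indicator = indicator.strip()
    if indicator == "V":
        return number
    if indicator == "NA":
        return None  # not available
    if indicator == "ERROR":
        return DifError()  # error condition
    if indicator == "TRUE":
        return True
    if indicator == "FALSE":
        return False
    raise ValueError(f"unknown value indicator: {indicator!r}")
```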
