Flat file

From Wikipedia, the free encyclopedia

A flat file is a computer file that is usually read or written sequentially and does not have indexes that are separate from the individual records. It consists of one or more records. Each record contains one or more field instances. Each field instance can contain a data value or be omitted. Some definitions state that all records must be of the same type. This restriction is usual when discussing a flat file database. However, most usages allow a flat file to have more than one record type.

Flat files may have variable length records or fixed length records. A flat file with variable length records must be read sequentially. A flat file with fixed length records may be accessed randomly, because the position of any record can be computed from its record number. However, the lack of indexes makes this approach less desirable than, for instance, using a database table for quickly finding a specific record.
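The random access described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the 10-byte record length, the sample data, and the function name `read_record` are all assumptions made for the example.

```python
import io

RECORD_LENGTH = 10  # hypothetical record size in bytes, chosen for this example


def read_record(f, index, record_length=RECORD_LENGTH):
    """Return record number `index` (0-based) from a fixed-length-record file."""
    # The byte offset of a record is simply its number times the record length.
    f.seek(index * record_length)
    data = f.read(record_length)
    if len(data) != record_length:
        raise IndexError(f"no complete record at index {index}")
    return data


# Three 10-byte records held in memory; a file opened in "rb" mode behaves the same way.
sample = io.BytesIO(b"0001ALICE 0002BOB   0003CAROL ")
third = read_record(sample, 2)  # fetched without reading the first two records
```

Note that this only locates a record by its position. Finding a record by the value of one of its fields still requires a scan, which is the limitation that indexes address.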

Flat files date back to the earliest days of computer processing. Originally flat files were stored on punch cards, paper tape, or magnetic tape. These are inherently sequential. Flat files are still widely used, even for files stored on a disk. One reason is that sequential access is faster than indexed access (also known as random access or direct access) when most of the records in a file have to be processed. Flat files are often used to transmit data between batch processing systems, especially on mainframes.

Flat files are often described using a COBOL copybook which defines the type, length, and other properties of the fields and records[citation needed].

Often each field has a fixed width. In the common case when all the fields and all the records are fixed width, the flat file can be called a fixed width file. In a fixed width file there typically is no field delimiter and no record delimiter, and field instances are never omitted. An empty field is indicated using a special filler value, e.g. spaces or zeroes. Fixed width records often contain 80 bytes, the width of a punch card.
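As an illustration, a fixed width record can be split into fields by slicing at known offsets. The sketch below assumes a hypothetical three-field layout (`part_no`, `description`, `quantity`) inside an 80-character record. In practice the layout would come from a record description such as a copybook, and the names and widths here are invented for the example.

```python
# Hypothetical layout: (field name, start offset, length). Not taken from any real copybook.
LAYOUT = [
    ("part_no", 0, 6),
    ("description", 6, 20),
    ("quantity", 26, 5),
]
RECORD_WIDTH = 80  # the punch-card width mentioned above


def parse_fixed_width(record):
    """Split one fixed width record into a dict of field values."""
    if len(record) != RECORD_WIDTH:
        raise ValueError(f"expected {RECORD_WIDTH} characters, got {len(record)}")
    fields = {}
    for name, start, length in LAYOUT:
        raw = record[start:start + length]
        # A field filled entirely with spaces is treated as empty (None).
        fields[name] = raw.strip() or None
    return fields


# Build one 80-character record: 6 + 20 + 5 characters of data, then 49 spaces of filler.
line = "A10023" + "Hex bolt".ljust(20) + "00042" + " " * 49
parsed = parse_fixed_width(line)
```

The quantity is left as the string `"00042"` on purpose. Whether zero filler means "empty" or the number zero depends on the file's own conventions, so the sketch does not guess.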

In a variable width record the fields are separated using a special character such as the tab character, the comma, or the pipe character. Sometimes field values are enclosed in quotation marks, and any internal quotation marks are doubled. The most common record delimiter is the newline. See CSV file for a more detailed description of this kind of file.
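A delimited file of this kind can be read with Python's standard `csv` module, which handles quoted fields and doubled quotation marks. The following is a small sketch. The pipe-delimited sample data is made up for the example.

```python
import csv
import io

# Sample pipe-delimited data. The third field of the second line is quoted because
# it contains both the delimiter and a quotation mark (written doubled).
data = 'id|name|note\n1|Widget|"5"" pipe | fitting"\n2|Gadget|\n'

reader = csv.reader(io.StringIO(data), delimiter="|", quotechar='"', doublequote=True)
rows = list(reader)

# Writing applies the same rules in reverse: only fields that need quoting are quoted.
out = io.StringIO()
writer = csv.writer(out, delimiter="|", quotechar='"', lineterminator="\n")
writer.writerow(rows[1])
written = out.getvalue()
```

Here the quoted field is read back as `5" pipe | fitting`, with the delimiter and quotation mark preserved as data, and the empty third field of the last record is read as an empty string.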

There can be records of many different types in the same flat file. A typical approach is for a file to have zero or more header records, one or more detail records, zero or more summary records, and zero or more trailer records.
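One way to process such a file is to dispatch on a record type code. The sketch below assumes a hypothetical convention in which the first character of each line identifies the record type (`H` for header, `D` for detail, `T` for trailer) and the trailer carries the number of detail records. Real files define their own codes and checks.

```python
def read_batch(lines):
    """Parse header, detail, and trailer records from an iterable of lines.

    Assumed (hypothetical) convention: the first character is the record type,
    and the trailer body is the count of detail records.
    """
    header = None
    details = []
    trailer_count = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        kind, body = line[0], line[1:]
        if kind == "H":
            header = body
        elif kind == "D":
            details.append(body)
        elif kind == "T":
            trailer_count = int(body)
        else:
            raise ValueError(f"unknown record type {kind!r}")
    # Use the trailer, when present, to detect a truncated or corrupted file.
    if trailer_count is not None and trailer_count != len(details):
        raise ValueError(f"trailer says {trailer_count} details, found {len(details)}")
    return header, details


header, details = read_batch(["H2024-01-31\n", "DA10023 42\n", "DB20011 7\n", "T2\n"])
```

Checking the trailer count is a common reason for having a trailer record at all: the receiver can tell whether the whole file arrived.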

A flat file does not have any indexes, and does not have any internal pointers. An ISAM file or a VSAM file is not a flat file, because these file types support indexed access in addition to the sequential access method.

Flat files are still widely used for data transmission because they are compact and support high performance operations. Transmitting the same data using a relational approach would require many tables, one for each different record type. Another difference between flat files and relational tables is that in a flat file the order of the records can matter. Yet another difference is that in a flat file a field can occur more than once in a record. An ETL system will generally sort the input file before submitting it to the database's bulk loader, in order to reduce total elapsed time. Long before there were any databases, a master file was "joined" to a detail file by sorting them both on a common key, e.g. part number, and then doing a merge.
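The sort-and-merge technique mentioned above can be sketched as follows. This is an illustrative version only: it assumes each master key is unique, holds the records in memory as `(key, value)` tuples, and uses invented part numbers. Historically the same idea was applied to files on tape, reading one record at a time.

```python
def merge_join(master, detail):
    """Join detail records to master records on a common key by sort and merge.

    Both inputs are lists of (key, value) tuples. Master keys are assumed unique.
    Detail records with no matching master record are skipped.
    """
    master = sorted(master, key=lambda rec: rec[0])
    detail = sorted(detail, key=lambda rec: rec[0])
    result = []
    i = 0  # current position in the sorted master list; it only moves forward
    for key, value in detail:
        # Advance the master position until its key is no longer behind the detail key.
        while i < len(master) and master[i][0] < key:
            i += 1
        if i < len(master) and master[i][0] == key:
            result.append((key, master[i][1], value))
    return result


master = [("P2", "Nut"), ("P1", "Bolt"), ("P3", "Washer")]
detail = [("P3", 5), ("P1", 10), ("P1", 2), ("P4", 7)]
joined = merge_join(master, detail)
```

Because both inputs are sorted on the key, each is read once from start to finish, which is why the method suits sequential media.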

Like a flat file, an XML file can contain many different types of data. There are many possible ways to represent the information in a flat file using XML. For example, each field and each record could be an XML element. One advantage of using XML would be that each field is named. A disadvantage is that the file would be larger. A file containing XML is not generally called a flat file, even though it satisfies the definition. It usually is called an XML file.
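The trade-off can be seen in a small sketch that converts one delimited record to XML, with each field as an element. The field names and the sample record are assumptions made for the example, and this is only one of the many possible XML representations.

```python
import xml.etree.ElementTree as ET


def record_to_xml(field_names, values):
    """Represent one record as an XML element with one child element per field."""
    record = ET.Element("record")
    for name, value in zip(field_names, values):
        ET.SubElement(record, name).text = value
    return ET.tostring(record, encoding="unicode")


flat = "A10023|Hex bolt|42"  # hypothetical pipe-delimited record
xml_text = record_to_xml(["part_no", "description", "quantity"], flat.split("|"))
# xml_text names every field, at the cost of being several times longer than `flat`.
```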

Flat files are also called feed files or batch files. They are often transmitted over a network using the File Transfer Protocol (FTP) or a newer secure alternative, e.g. SFTP. Flat files are also used in Electronic Data Interchange (EDI).

The Jargon File and The New Hacker's Dictionary (edited by Eric S. Raymond, 1991) contain this definition for flat-file:

A flattened representation of some database or tree or network structure, as a single file from which the structure could implicitly be rebuilt, esp. one in flat-ASCII form.


See also