Filename extension | .csv or .txt |
---|---|
Internet media type | text/csv |
Type of format | multiplatform, serial data streams |
Container for | database information organized as field separated lists |
Standard(s) | RFC 4180 |
A comma-separated values (CSV) file stores tabular data (numbers and text) in plain-text form. As a result, such a file is easily human-readable (e.g., in a text editor).
CSV is a simple file format that is widely supported by consumer, business, and scientific applications. Among its most common uses is to move tabular data between programs that naturally operate on a more efficient or complete proprietary format. For example: a CSV file might be used to transfer information from a database program to a spreadsheet.
Contents |
The name indicates the use of the comma to separate data fields, but "CSV" is often applied to files using other delimiters. Different CSV-like implementations frequently arise as the format is modified to handle richer table content such as allowing a different field separator character (necessary if numeric fields are written with a comma instead of a decimal point) or extensions to allow numbers, the separator character, or newline characters in text fields.
This has led to the assertion that there is no “CSV standard”:[1][2] only the understanding that plain text is delimited by a symbol. In common usage almost any delimiter-separated text data may be referred to as a CSV file. Traditionally, however, lines in the text file represent rows in a table, and commas separate the columns. This traditional understanding is embodied in RFC 4180, which is the best known effort to formalize a CSV standard.
Example of a USA/UK CSV file (where the decimal separator is a period/full stop and the value separator is a comma):
Year,Make,Model,Length 1997,Ford,E350,2.34 2000,Mercury,Cougar,2.38
Example of an analogous German and Dutch CSV/DSV file (where the decimal separator is a comma and the value separator is a semicolon):
Year;Make;Model;Length 1997;Ford;E350;2,34 2000;Mercury;Cougar;2,38
The latter format is not RFC 4180 compliant. Compliance could be achieved by the use of a comma instead of a semicolon as a separator and either the international notation for the representation of the decimal mark or the practice of quoting all numbers that have a decimal mark.
The format dates back to the early days of business computing and is widely used to pass data between computers with different internal word sizes, data formatting needs, and so forth. For this reason, CSV files are common on all computer platforms.
CSV is a delimited text file that uses a comma to separate values (many implementations of CSV import/export tools allow other separators to be used). Simple CSV implementations will not allow field values that contain a comma or other special characters such as newlines. More sophisticated CSV implementations permit commas and other special characters in a field value. Many implementations use " (double quote) characters around values that contain reserved characters (such as commas, double quotes, or newlines); embedded double quote characters may be represented by a pair of consecutive double quotes. (Creativyst 2010) Some CSV implementations may use an escape character such as a backslash to encode reserved characters as an escape sequence, such as Sybase Central.
In computer science terms, a CSV file is a "flat file".
Comma-separated values are old technology and pre-date personal computers by more than a decade: the IBM Fortran (level G) compiler under OS/360 supported them in 1967. Comma-separated value lists were often easier to type into punched cards than fixed-column-aligned data, and were less prone to producing incorrect results if a value was punched one column off from its intended location.
The comma separated list (CSL) is a data format originally known as comma-separated values (CSV) in the oldest days of simple computers. In the industry of personal computers (then more commonly known as "Home Computers"), the most common use was small businesses generating solicitations using boilerplate form letters and mailing lists.
Some early software applications, such as word processors, allowed a stream of "variable data" to be merged between two files: a form letter, and a CSL of names, addresses, and other data fields. Many applications still do, simply because tasks requiring human input (such as constructing a list) are natural and easy using comma delimiters. CSL/CSVs were also used for simple databases.
Comma separated lists date from before the earliest personal computers, but were widely used in the earliest pre-IBM PC era personal computers for tape storage backup and interchange of database information from machines of two different architectures. In that day, affordable hard drives did not exist, and many small businesses tried to achieve the benefits of computing using floppy disk based software.
No general standard specification for CSV exists. Variations between CSV implementations in different programs are quite common and can lead to interoperation difficulties. For Internet communication of CSV files, an Informational IETF document (RFC 4180 from October 2005) describes the format for the "text/csv" MIME type registered with the IANA. (Shafranovich 2005) Another relevant specification is provided by Fielded Text which also covers the CSV format.
Many informal documents exist that describe the CSV format. Creativyst (2010) provides an overview of the CSV format in the most widely used applications and explains how it can best be used and supported.
The basic rules from a lot of these specifications are as follows:
CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines. Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes. If a line contains a single entry which is the empty string, it may be enclosed in double quotes. If a field's value contains a double quote character it is escaped by placing another double quote character next to it. The CSV file format does not require a specific character encoding, byte order, or line terminator format.
1997,Ford,E350
1997, Ford , E350 not same as 1997,Ford,E350
1997,Ford,E350,"Super, luxurious truck"
1997,Ford,E350,"Super, ""luxurious"" truck"
1997,Ford,E350,"Go get one now they are going fast"
1997,Ford,E350," Super luxurious truck "
"1997","Ford","E350"
Year,Make,Model 1997,Ford,E350 2000,Mercury,Cougar
Year | Make | Model | Description | Price |
---|---|---|---|---|
1997 | Ford | E350 | ac, abs, moon | 3000.00 |
1999 | Chevy | Venture "Extended Edition" | 4900.00 | |
1999 | Chevy | Venture "Extended Edition, Very Large" | 5000.00 | |
1996 | Jeep | Grand Cherokee | MUST SELL! air, moon roof, loaded |
4799.00 |
The above table of data may be represented in CSV format as follows:
Year,Make,Model,Description,Price 1997,Ford,E350,"ac, abs, moon",3000.00 1999,Chevy,"Venture ""Extended Edition""","",4900.00 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00 1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00
This CSV example illustrates that:
The CSV file format is very simple and supported by almost all spreadsheets and database management systems. Many programming languages have libraries available that support CSV files. Even modern software applications support CSV imports and/or exports because the format is so widely recognized. In fact, many applications allow .csv-named files to use any delimiter character.
Microsoft Excel will open .csv files, but depending on the system's regional settings, it may expect a semicolon as a separator instead of a comma, since in some languages the comma is used as the decimal separator. Also, many regional versions of Excel will not be able to deal with Unicode in CSV. One simple solution when encountering such difficulties is to change the filename extension from .csv to .txt; then opening the file from an already running Excel with the "Open" command.
When pasting text data into Excel, the tab character is used as a separator: If you copy "hello<tab>goodbye" into the clipboard and paste it into Excel, it goes into two cells. "hello,goodbye" pasted into Excel goes into one cell, including the comma.
OpenOffice.org Calc and LibreOffice Calc handle CSV files and pasted text with a Text Import dialog asking the user to manually specify the delimiters, encoding, format of columns, etc.