User:Omphaloscope/CSV

From Wikipedia, the free encyclopedia

Comma-separated values
File extension: .csv
MIME type: text/csv

The comma-separated values (or CSV) file format is a delimited data format commonly used for storing tabular data, such as the contents of a spreadsheet. Data in CSV format typically looks like this, although variants exist:

"ID","Last name","First name","Email"
"42","Adams","Douglas","douglas.adams@wikipedia.org"

Each line contains a number of fields separated (delimited) by comma characters. Rows are separated by line breaks (newlines). Fields that contain a comma, newline, or double quotation mark character, or that start or end with whitespace, must be enclosed in double quotation marks. Furthermore, if a line consists of a single field that is the empty string, that field must be enclosed in double quotation marks. A double quotation mark character inside a field's value is escaped by placing another double quotation mark character immediately before it. The CSV file format does not require a specific character encoding, byte order, or line terminator format.
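The quoting rules in the paragraph above can be sketched in a few lines of Python. This is only an illustration of those rules, not a reference implementation; the function names `quote_field` and `format_row` are made up for this example.

```python
def quote_field(value: str) -> str:
    """Quote a single CSV field only when the rules above require it."""
    needs_quotes = (
        any(ch in value for ch in ',"\n\r')  # comma, double quote, or line break
        or value != value.strip()            # leading or trailing whitespace
    )
    if not needs_quotes:
        return value
    # Escape embedded double quotes by doubling them, then wrap in quotes.
    return '"' + value.replace('"', '""') + '"'


def format_row(fields: list[str]) -> str:
    """Join fields with commas; a lone empty field is written as ""."""
    if fields == [""]:
        return '""'
    return ",".join(quote_field(f) for f in fields)


print(format_row(["1999", "Chevy", 'Venture "Extended Edition"', "", "4900.00"]))
# prints: 1999,Chevy,"Venture ""Extended Edition""",,4900.00
```

In practice a library CSV writer would normally be used instead; the sketch only makes the individual rules visible.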

Specification

While no formal specification for CSV exists, RFC 4180 describes a common format and establishes "text/csv" as the MIME type registered with the IANA. Many informal documents also describe the CSV format. One of them, "How To: The Comma Separated Value (CSV) File Format", gives an overview of the CSV format as used in the most widely used applications and explains how it can best be used and supported.

Example

1997 | Ford  | E350                       | ac, abs, moon          | 3000.00
1999 | Chevy | Venture "Extended Edition" |                        | 4900.00
1996 | Jeep  | Grand Cherokee             | MUST SELL!             | 4799.00
     |       |                            | air, moon roof, loaded |

The above table of data may be represented in CSV format as follows:

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

This CSV example illustrates that:

  • fields that contain commas, double quotes, or line breaks must be quoted,
  • a quote within a field must be escaped by placing an additional quote immediately before it,
  • spaces before and after delimiter commas may be trimmed, and
  • a line break within a field must be preserved.
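As a quick check of the points above, the example can be fed to a CSV parser. The following sketch uses the `csv` module from Python's standard library (listed in the language table further down); the variable names are illustrative only.

```python
import csv
import io

# The example data from this section, including the embedded line break.
data = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""",,4900.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\n'
    'air, moon roof, loaded",4799.00\n'
)

# csv.reader yields one list of fields per record, not per physical line.
rows = list(csv.reader(io.StringIO(data)))

for row in rows:
    print(row)
```

The four physical lines of the last two records parse into three records of five fields each: the doubled quotes come back as single quotes, the empty field is an empty string, and the line break inside the last description is preserved.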

Application support

The CSV file format is a simple data file format supported by almost all spreadsheet software, such as Excel (although some localized versions use semicolons instead of commas), Calc, and Gnumeric. Any programming language that has input/output and string processing functionality can read and write CSV files.
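Semicolon-delimited variants can usually be handled by telling the parser which delimiter to expect. The sketch below uses Python's standard `csv` module; the sample data is hypothetical and does not come from any particular spreadsheet application.

```python
import csv
import io

# Hypothetical sample in the semicolon-delimited style mentioned above.
sample = 'ID;Name;Price\n42;"Adams; Douglas";3000,00\n'

# Read the records, treating ";" as the field delimiter.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(rows)

# Write the same records back out with commas; the writer adds quotes
# only around fields that now need them.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue(), end="")
```

After conversion, the field `3000,00` is quoted because it contains the new delimiter, while `Adams; Douglas` no longer needs quotes.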

CSV files are as ubiquitous for tabular data as ASCII files are for text data.

Programming language tools

Programming language support for CSV files

Language | Tool | Notes
BASIC | none required | supported internally
C/C++ | Free tools: CSV module, bcsv, CSV reading and manipulation | No comments in code. separated documentation. Well documented, includes a CSV BNF grammar.
Haskell | |
Java | Several free CSV tools exist: CSVReader/Writer CSVFile [1] [2] [3] and commercial tools: Ricebridge Java CSV Component. There are also JDBC drivers available: [4] [5] [6] [7] and an ODBC driver: [8] |
LISP | fare-csv, csv-parser | fare-csv is an ASDF package, csv-parser is a .lisp file
Mathematica | |
MATLAB | csvread, dlmread | In the standard library.
.Net | FileHelpers - An Automatic File Import/Export Framework by Marcos Meli (LGPL); Fast CSV Reader by Sébastien Lorion, open source class (MIT licence); CSV Reader; GemBox.Spreadsheet by GemBox Software for CSV <==> XLS conversion |
OCaml | OCaml CSV; Col: conversion between lists of records and CSV files with header (Camlp4 syntax extension) |
Perl | Text::CSV_XS, Text::CSV_PP, or using a Perl DBI interface: DBD::CSV, DBD::AnyData; csvdiff - compare two csv-files | from CPAN
PHP | fgetcsv() function | In the standard library. Does not support newlines within element.
Python | Python CSV module | In the standard library.
R | read.csv | In the standard library.
Ruby | Ruby CSV module, or FasterCSV by James Gray | In the standard library.
Scheme | Chicken Scheme CSV module |

Utilities

The csvprint utility will reformat CSV input based on a format string. This can be useful for reordering fields or generating source code or tables as illustrated in the following example:

 $ csvprint data.csv "\t{ %0, %1, %2, \"%3\" },\n"
         { 0xC0000008, 0x00060001, NT_STATUS_INVALID_HANDLE, "The handle is invalid." },
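The idea of a format string applied to every record can be sketched in Python as follows. This is not how csvprint itself is implemented; the function name `csv_format` is made up, the reading of `%N` as a zero-based field index is inferred from the example above, and the input record is an assumption reconstructed from the output shown.

```python
import csv
import io
import re


def csv_format(text: str, template: str) -> str:
    """Render each CSV record through a template where %N stands for field N (zero-based)."""
    out = []
    for row in csv.reader(io.StringIO(text)):
        # Replace each %N with the matching field; out-of-range indexes become "".
        out.append(re.sub(
            r"%(\d+)",
            lambda m: row[int(m.group(1))] if int(m.group(1)) < len(row) else "",
            template,
        ))
    return "".join(out)


# Assumed content of data.csv, reconstructed from the output shown above.
data = "0xC0000008,0x00060001,NT_STATUS_INVALID_HANDLE,The handle is invalid.\n"
print(csv_format(data, '\t{ %0, %1, %2, "%3" },\n'), end="")
```

Reordering fields is then just a matter of changing the order of the `%N` placeholders in the template.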

csvdiff is a Perl script that compares (diffs) two comma-separated files. Unlike standard diff, it reports the number of the record where a difference occurs and the field (column) that differs. The separator can be set to any value, not just a comma. A third file can also be provided that contains the column names on a single line, separated by the same separator; if it is provided, the column names are shown when a difference is found. Example:

$ perl csvdiff.pl -a act.csv -e exp.csv -s ";" -c col_names.csv -k "2" -t -i
Record with key "200100500" is different:
 Actual   line 006 > 200100500;200100500;6;;;;;;000;0;2005-12-20;55 <
 Expected line 008 > 200100500;200100500;6;;;;;;000;0;2005-12-19;55 <
  Difference in field no.: 11 - field name: Dat_Rueckgabe
   Actual   > 2005-12-20 <
   Expected > 2005-12-19 <
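A field-level comparison of this kind can be sketched in Python. This is not csvdiff's actual logic; the function `diff_records`, its parameters, the zero-based `key` index, and the placeholder column names other than Dat_Rueckgabe are assumptions made for the illustration.

```python
import csv
import io


def diff_records(actual: str, expected: str, key: int, names: list[str], sep: str = ";"):
    """Return (key, field number, field name, actual, expected) for each differing field."""
    def index(text):
        # Map each record's key field to the full record.
        return {row[key]: row for row in csv.reader(io.StringIO(text), delimiter=sep) if row}

    act, exp = index(actual), index(expected)
    diffs = []
    for k, a_row in act.items():
        e_row = exp.get(k)
        if e_row is None:
            continue  # records missing from one side are ignored in this sketch
        # Field numbers are reported 1-based, as in the output shown above.
        for i, (a, e) in enumerate(zip(a_row, e_row), start=1):
            if a != e:
                name = names[i - 1] if i - 1 < len(names) else ""
                diffs.append((k, i, name, a, e))
    return diffs


actual = "200100500;200100500;6;;;;;;000;0;2005-12-20;55\n"
expected = "200100500;200100500;6;;;;;;000;0;2005-12-19;55\n"
# Placeholder column names; only Dat_Rueckgabe appears in the example above.
names = [f"col{n}" for n in range(1, 11)] + ["Dat_Rueckgabe", "col12"]

for d in diff_records(actual, expected, key=1, names=names):
    print(d)
# prints: ('200100500', 11, 'Dat_Rueckgabe', '2005-12-20', '2005-12-19')
```

Matching records by key rather than by line position is what allows the two files to list the same record on different lines, as in the example output (line 006 versus line 008).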

External links