Talk:Comma-separated values

Comma vs Character

  • The C in CSV usually refers to comma, (anonymous insertion: Actually this entire article is wrong, I have it on good authority that it actually stands for "Common", as in Commonly separated value) but I prefer character as in character separated values. The principle is the same and it is more universal in application. Where commas can be embedded in the values, the presence or absence of quotes is critical. The use of uncommon characters for delimiters (such as vertical bar |) reduces the need for quotes and is much safer to use in practice. [This is wrong - delimiting with a common-use character such as comma helps ensure correct use of escaping and encoding, thereby assuring safety. Use of an uncommon character like bar or caret is leaving a time bomb in the system, triggered when the uncommon character turns up in the business data at a later time. Use of uncommon characters for delimiters should always be avoided where possible. --Jaymax 16:50, 2 April 2007 (UTC)]
  • Most file format conversion programs allow a variety of delimiter characters to be used in place of comma. Tabs are common delimiters.
  • The character delimiter is widely known to experienced users, but novices are easily confounded when delimiters are not commas. [and really experienced users have dealt with trying to fix an interface where someone years ago decided that | or ¦ or ^ or similar would be 'safe' and therefore never noticed that the escaping logic was broken.]
  • When CSV is used for search, disambiguation is warranted for comma vs character separated values.

john 07:21, 26 Sep 2004 (UTC)
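
The point about escaping can be illustrated with a minimal sketch, assuming Python's standard csv module (not something the posts above mention). The helper name round_trip is hypothetical. Whichever delimiter is chosen, a value containing that delimiter only survives if the writer quotes it and the reader honours the quotes:

```python
import csv
import io

def round_trip(row, delimiter):
    """Write one row with the given delimiter, then read it back."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes only the fields that contain the delimiter,
    # a quote character, or a line break.
    writer = csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(row)
    text = buffer.getvalue()
    parsed = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return text, parsed

# One value contains a comma, one contains a bar, one contains quotes.
row = ["ac, abs", "a|b", 'say "hi"']
for delimiter in (",", "|"):
    text, parsed = round_trip(row, delimiter)
    print(repr(text), parsed == row)
```

An "uncommon" delimiter does not remove the need for quoting; it only makes the quoting path run less often, which is the risk described in the bracketed replies.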

Making guidelines more formal

I think that the general guidelines of a CSV should be explained in the Formal Specifications section rather than in the Example section (see note 1 below), stating clearly that those are not the standard, but perhaps the most widely used ones. I propose that the Creativyst guidelines be used (already linked from the mentioned section). It would also be good to note the differences between those guidelines and RFC 4180. Juan Loman. 00:44, 13 December 2005 (PST)

Note 1: The guidelines in the "Example" section are good for an example, but should not be the only ones in the entire document.

Example

Does anyone else think that the format shown in the example CSV data is a poor choice to show other people, with spaces between the comma and the text qualifier?

I don't think it's even valid CSV -- I'm not sure how the comma-space-quote should be parsed. Richard W.M. Jones 09:53, 3 November 2005 (UTC)
I agree. A good example would exercise the full range of valid input (note that spaces after the commas are valid input though). How about:
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996, Jeep, Grand Cherokee, "air, moon roof, loaded
MUST SELL!", 4799.00
Yes, this is a much better example. Perhaps also a table showing how it should be interpreted in a spreadsheet? Particularly the multi-line "air ... SELL!" cell which would be very well demonstrated by the table. Richard W.M. Jones 21:29, 3 November 2005 (UTC)
Small problem is that in both the text and the table, the CR looks like natural wrapping. A bit of tweaking could put the CR such that the first line is shorter than the first in the table, and the CR in the text truncates the line early. Also, would it be appropriate to include field labels at the top of the CSV text only, showing the optional convention of the first line being a comma delimited list of fieldnames? --195.92.40.49 14:57, 2 March 2006 (UTC)
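
For anyone who wants to check how the proposed example is interpreted, here is a minimal sketch using Python's standard csv module. The choice of parser and the skipinitialspace=True setting are assumptions, not part of the proposal above; that setting drops the space after each comma so that the quote on the third record is treated as a text qualifier.

```python
import csv
import io

# The three records proposed above; the third spans two physical lines.
sample = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""",,4900.00\n'
    '1996, Jeep, Grand Cherokee, "air, moon roof, loaded\n'
    'MUST SELL!", 4799.00\n'
)

# newline="" keeps the embedded line break exactly as written.
source = io.StringIO(sample, newline="")
rows = list(csv.reader(source, skipinitialspace=True))

for row in rows:
    print(row)
```

Under these assumptions the parser yields three records of five fields each: the doubled quotes collapse to single quotes, the empty field stays empty, and the line break remains inside the "air ... SELL!" field.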

Programming language tools

There was a huge unwieldy section which basically was a list of everyone (me included) promoting their own little CSV tool. I've changed this into a slightly less unwieldy table. Richard W.M. Jones 15:59, 21 October 2005 (UTC)

Man that table is ugly. The sectionated arrangement was much easier to understand. The Java section is just a run-on sentence of confusion. Is there a special way to roll back that one update without disturbing the two legitimate updates that followed? Mike
It's a lot better than it was before. If anything needs to be rolled back it's the decision to have a huge list of implementations, which are basically adverts for various products. It adds nothing at all to the article or to the understanding of the CSV format. Richard W.M. Jones 09:07, 29 October 2005 (UTC)
I disagree. After looking at it again I can see you stripped a lot of useful information about each tool. Unfortunately I can't figure out how to get to the original content. Also that SiMX thing should be moved into the Programming Tools section. You wouldn't happen to be affiliated with that product would you?
No. As you could find out with minimal research, I did the OCaml CSV tool. Before changing this list you should clearly understand Wikipedia:What_Wikipedia_is_not. Richard W.M. Jones 21:27, 1 November 2005 (UTC)
Then by your reasoning we would have to remove the whole section and leave nothing but the conceptual description, examples, and history. But I think that would be a mistake because people interested in CSV are invariably working with such files and need developer tools. So either take it out or leave it in. Don't leave some half baked chain of links. How about reformatting it without sub-sections so that it doesn't appear in the TOC. Just make one section with a list of bullets like the below? We MUST leave the descriptive text in.
As I said above, I think it needs to be removed. The change I made to a table was better than before when this section took up 2/3rds of the article; but the ideal would be to remove it altogether. This is an article about the CSV format, and not a list of links or promotions. If people want to find Java-based CSV implementations, they can easily do this on Google. If creators of these tools wish to advertise them, they can advertise them through paid search on Google. Richard W.M. Jones 09:21, 2 November 2005 (UTC)
Fine. Go ahead then. Remove it.
I'm actually looking for a suitable place to host the list. For example Wikibooks. If it was located there (or a suitable place) then we could just link to that list ([http://example.com/csv_implementations A list of CSV implementations]). Best of both worlds. Richard W.M. Jones 21:33, 3 November 2005 (UTC)

I would suggest condensing the application support, programming tools and utilities sections into a single paragraph that explains how CSV is an extremely widely supported and implemented file format. You can then point to another article for that section and title it something like "Comma-separated values (implementations)". That way, the information can stay in the encyclopedia, the list can grow and it stays out of the main article where most people probably don't want it anyway. I think this article could use a lot of cleanup. --MattWright (talk) 23:47, 28 January 2006 (UTC)

OK, I moved it to another article as no one objected for over 9 months. :) --MattWright (talk) 14:52, 19 October 2006 (UTC)

Limit on the number of records

Is there any maximum limit on the number of records that a CSV file can have? Does the maximum for Excel, which is around 65536, also apply to the CSV file? Please post a reply. Thanks

Not in the basic CSV file, no. However applications which process the CSV file can certainly have such limits. Excel and OpenOffice.org are examples of this. Richard W.M. Jones 23:54, 25 January 2006 (UTC)
Excel is limited to 65K rows, but I've worked with CSV files that were gigabytes in size. A lot of (most?) legacy apps can export data from their own proprietary format to CSV, including some huge transactional systems, like a Lucent (telecoms) switch. The only real limit is storage space. But then you would use something like SQL Server or Oracle to load/analyze the data, instead of Excel. ForrestCroce 02:52, 20 December 2006 (UTC)
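
To illustrate that the row limit belongs to the application and not to the format, here is a minimal sketch, assuming Python's standard csv module. The function name count_records and the generated demo data are hypothetical. It reads records one at a time, so memory use does not grow with the file size:

```python
import csv
import io

def count_records(lines):
    """Count CSV records from any iterable of lines, one record at a time."""
    return sum(1 for _ in csv.reader(lines))

# In practice `lines` would be an open file, e.g. open(path, newline="").
# A generated in-memory sample with more rows than the Excel limit
# mentioned above stands in for a large file here.
demo = io.StringIO("".join(f"{i},row {i}\n" for i in range(70000)), newline="")
total = count_records(demo)
print(total)
```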

Infobox

Comma-separated values
File name extension .csv

Maybe add an infobox?

Confusing?

"CSV file format is a delimited data format that has fields separated by the comma character and records separated by newlines." Maybe it would be clearer to say a CSV file is a "flat file" that contains tabular data? Probably flat file isn't recognized outside the relational database world, but everyone is familiar with the idea of tabular data. (Maybe some text along the lines of "... like an Excel table.") I'm not saying the article shouldn't explain how CSVs are structured, but the introduction shouldn't be intimidating, even to someone from outside the IT world. ForrestCroce 01:45, 20 December 2006 (UTC)

This Article is Getting Messy

This article used to be pretty simple and clear but it's not so simple or clear anymore.

First, someone pandering Delimiter-separated_values has taken an interest in this page. I've never heard of delimiter separated values and searching the web for it comes up with basically nothing. I think someone is trying to organize concepts at the expense of history. We know "comma separated values" is a misnomer but the fact is that's what people have been calling this format forever. You can't invent things on Wikipedia.

Second, the examples with bullets of notes are odd. They are in the Specification section and start with "The basic rules are". That whole sequence of examples with bullets of notes should be in the Example section. The Specification section should only cite specifications. I think the old simple example and the "The basic rules are" sequence should be merged. Specifically, the old example should be a quick and simple example at the top of the Example section. The bullets from the old example should be replaced with the "The basic rules are" sequence.

--Miallen 02:39, 17 May 2007 (UTC)

I propose merging this article with Delimiter-separated_values and redirecting Comma-separated values to it. Maybe Delimited file format would be a better title for the article. MFNickster 05:15, 7 July 2007 (UTC)

There are several problems here:

  • 1) The name "comma separated values" (hereinafter CSV) is well-recognized and well-established;
  • 2) The specification for CSV is not well-recognized, and there is no "authoritative specification" [1];
  • 3) The use of CSV is well-recognized, although there are variations because of the lack of a well-established specification;
  • 4) The use of "delimiter separated values" (or whatever you want to call it) is well-recognized *and* well-established, but there is no singular "well-recognized" name: in fact there are several "names" (See e.g., [2], [3], [4], [5])
  • 5) The fact that topic relates to something "well-used" but not "well-named" does not mean that a WP article on that topic is inappropriate, nor does that make it an "invention" established by WP. (See e.g., User:Dcoetzee/Named topic bias). Similarly, something that is not "well-specified" may still warrant independent treatment in WP if it is "well-used" and "well-named". Many people will come specifically looking for information on CSV, regardless of whether that name was poorly chosen or a misnomer. Since that is the case, there should be an article describing what CSV is.

Therefore, merging CSV and Delimiter separated values (or whatever you want to call it) sounds like a bad idea because the potential for confusion is already high, given that these articles talk about concepts with either poorly-chosen (but well-established) names, or no well-established name at all.

This article still could use some cleanup, no doubt about that, but that does not warrant a merge of loosely-related articles, especially since the potential for confusion is high. dr.ef.tymac 14:51, 7 July 2007 (UTC)

I concur with dr.ef.tymac's remarks. --Crath 21:05, 7 July 2007 (UTC)

Negative MFNickster. Please do not redirect to "Delimiter separated values". CSV is the term people recognise. Let's not get carried away with semantic details please. --Miallen 21:23, 26 September 2007 (UTC)

There's something wrong with either this article or MS Excel and OOo Calc

I am creating a .csv file and one of my fields has a comma in it. Naturally I put enclosing quotes around it so that the comma does not mess anything up. But it is messing it up, and the second part of the data in that field ends up in another field. Here is one line of the file:

Filename, Line Number, Developer, Date, Length, Version Added
attrs.c, 336, jsmiff, "Feb 17, 2006", 146, 1.1

It's messing up on the "Feb 17, 2006" data entry. Does anybody know how to deal with this? I can't understand :( Luxophage 17:14, 7 June 2007 (UTC)

Yes, remove the space before "Feb 17, 2006". The double quote must follow the comma immediately to be considered as an escape character. Just follow the RFC mentioned in the Article. —Preceding unsigned comment added by 84.56.171.65 (talk) 14:54, 21 September 2007 (UTC)
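
The effect of that space can be reproduced with a short sketch, assuming Python's standard csv module as the parser (Excel and OOo Calc are not involved here, so this only illustrates the same rule in another implementation). With default settings the quote that follows a space is treated as ordinary data; the skipinitialspace option is one way to tolerate the space instead of removing it:

```python
import csv

# The data line quoted in the question above.
line = 'attrs.c, 336, jsmiff, "Feb 17, 2006", 146, 1.1'

# Default dialect: the quote is not the first character of the field,
# so it does not open a quoted field and the inner comma splits the date.
default_fields = next(csv.reader([line]))

# skipinitialspace=True ignores spaces right after each delimiter,
# so the quote is seen at the start of the field.
trimmed_fields = next(csv.reader([line], skipinitialspace=True))

print(len(default_fields), default_fields)
print(len(trimmed_fields), trimmed_fields)
```

In this sketch the default parse produces seven fields and the lenient parse produces six, with "Feb 17, 2006" kept together.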

Trailing whitespaces

Our article says: "Leading and trailing spaces or tabs, adjacent to commas, are trimmed" opposed to RFC 4180 stating: "Spaces are considered part of a field and should not be ignored". This leads imho to the following question: What is the purpose of this article? To define 'CSV according to WP', 'CSV according to the creativyst article', 'CSV according to RFC' or -what I favor- information on all (notable) styles? Tierlieb 10:53, 12 June 2007 (UTC)
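
The two conventions quoted above can be put side by side in a minimal sketch, assuming Python's standard csv module. The helper parse_line and its trim flag are hypothetical names used only for illustration:

```python
import csv

def parse_line(line, trim=False):
    """Parse one CSV line; optionally strip spaces and tabs around each field."""
    fields = next(csv.reader([line]))
    if trim:
        # Trimming convention: drop leading and trailing spaces or tabs.
        fields = [field.strip(" \t") for field in fields]
    return fields

line = "Ford , E350 ,3000.00"
print(parse_line(line))             # spaces kept, as RFC 4180 describes
print(parse_line(line, trim=True))  # spaces trimmed
```

Note that this simple trim is applied after parsing, so it would also strip spaces that were deliberately placed inside a quoted field. A style that trims only unquoted whitespace needs a parser that tracks quoting.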

I agree, the WP article should document the various (notable) CSV styles in use. May I suggest the addition of a section to the article that documents the effect various major applications have had upon CSV; e.g., Excel's dominance as a spreadsheet has caused many to understand the CSV format only as Excel understands it. --Crath 21:11, 7 July 2007 (UTC)

Note that RFCs are only informational and must be evaluated for relevance. Unfortunately, in the case of RFC 4180 it sounds like it is not terribly relevant. The "specification" for CSV is defined by Microsoft Excel's CSV import / export code. I suspect that might be an unpopular idea to some but AFAIC any CSV emitted by an application MUST be completely compatible with Excel because of that application's long continuing history of support for CSV. If someone wants to do an RFC that's fine but I think it should go as far as to actually state that it is simply formalizing observed behavior of the Microsoft Excel spreadsheet application. --Miallen 21:43, 26 September 2007 (UTC)

US bias: I know that in Germany Excel uses ; rather than , by default. This may be true for more localized editions, since a lot of countries use , as a decimal separator; IIRC there is also an international agreement on that. The RFC is not very relevant for CSV, as CSV has existed much longer. This notable separator issue was removed here: http://en.wikipedia.org/w/index.php?title=Comma-separated_values&diff=54321682&oldid=54291230 UnLoCode (talk) 14:31, 2 April 2008 (UTC)

Image

With all these discussions about allowing use of fair-use images or not, how to avoid costly lawsuits... WHY does anyone put a copyrighted image in the article (referring to screenshot of import window of MS Access) if there is so much free software around? --Ben T/C 08:49, 27 September 2007 (UTC)

How to open CSV in OOo

IIRC OpenOffice Base 2.3 does _not_ support CSV import. I have not tested 2.4, but I am tired of testing this with every release. UnLoCode (talk) 14:34, 2 April 2008 (UTC)

Of course it does. Do not look for "import" option, just try to create a new base as follows:
  • select "Connect to existing base", select "Text" from the pull-down list, click "Next"
  • select path for existing CSV files, mark CSV format, choose separators, click "Create"
  • name your new base, click "Save"
  • select "Tables" from the sidebar, choose any CSV table and double-click to browse and edit that table.
I do not know exact names in English as I just installed OOo in Polish version. --Andrzej P. Wozniak (talk) 16:23, 2 April 2008 (UTC)
But what if I already have an odb and want to add a new table and import CSV data? Is that finally possible without the sometimes-described workaround via .ods? The thing is, I have to manage several tables, so I really need importing ... and exporting. And I need UTF-8. It would be great if OOo finally supported that. Export should please not need too many clicks. :-) Thanks for your help anyway. UnLoCode (talk) 22:58, 2 April 2008 (UTC)
This is not the right place for discussion about OOo or databases. Choose one of the support sites recommended by OOo. Here are just two hints:
  • You do not need to use import/export/convert explicitly as those operations are done on-the-fly if you specify proper parameters for any table you create or connect to.
  • You do not need to use database operations if you only want to convert text files from some codepage to UTF-8, there are many text codepage/charset converters. --Andrzej P. Wozniak (talk) 09:14, 3 April 2008 (UTC)
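
As an illustration of the second hint, converting a text file's character set needs no database at all. The following is a minimal sketch in Python; the function name transcode_to_utf8 is hypothetical, and cp1250 is only an assumed source codepage for the demonstration:

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode raw file bytes from a legacy codepage to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

# Demo bytes standing in for the contents of a CSV file saved as cp1250.
legacy = "Łódź;1,50\n".encode("cp1250")
converted = transcode_to_utf8(legacy, "cp1250")
print(converted.decode("utf-8"))
```

The source encoding has to be known in advance; decoding with the wrong codepage either raises an error or silently produces the wrong characters.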

Comma versus Semicolon

I have noticed that, in Windows at least, if you have the comma set as your decimal separator (which many countries use) then Excel will export a "comma delimited" CSV file with semicolons instead of commas. This is something to watch out for if you are importing or exporting data from your own applications with the intention that people will be able to load it into Excel. I ran into this problem with a colleague in Europe. Does anybody know if this is common in other applications or if it's just an Excel thing? Also, are there any common alternatives other than comma and semicolon? Wjousts (talk) 19:36, 3 April 2008 (UTC)
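
For applications that must accept both variants, one option is to guess the delimiter before parsing. The following is a minimal sketch, assuming Python's standard csv module. The helper guess_delimiter, its candidate list and the sample data are all hypothetical. It picks the first candidate that splits every line into the same number of fields:

```python
import csv

def guess_delimiter(sample, candidates=(",", ";", "\t")):
    """Return the first candidate giving a consistent field count above one."""
    lines = sample.splitlines()
    for delimiter in candidates:
        widths = {len(row) for row in csv.reader(lines, delimiter=delimiter)}
        if len(widths) == 1 and widths.pop() > 1:
            return delimiter
    return None  # no candidate splits the sample consistently

# Made-up samples: one with decimal commas, one with decimal points.
semicolon_export = "Artikel;Preis\nWidget;1,50\nGadget;2,75\n"
comma_export = "item,price\nwidget,1.50\ngadget,2.75\n"

print(guess_delimiter(semicolon_export))
print(guess_delimiter(comma_export))
```

This heuristic is deliberately simple and can be fooled by small or irregular samples. Python's csv.Sniffer class offers a more elaborate heuristic for the same purpose.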

Today's edit: Values to Volume

Does anyone besides me think that the edit applied to this entry today, where "values" was replaced with "volume", is inappropriate? I've been working in and around computers for 25 years and I have never before heard CSV referred to as Comma Separated Volume. Rather than simply revert the edit, I thought I'd see what others think of the edit. Christopher Rath (talk) 19:34, 17 April 2008 (UTC)

Yeah, Google doesn't even turn up any real references to that name either, so it shouldn't be listed in the article without some sort of reference or usage basis... --MattWright (talk) 07:31, 18 April 2008 (UTC)