Talk:Newline


Please discuss merging this article with End-of-line in the "Merge?" section.


1958 version of ASCII

This requires further explanation, because of an apparent contradiction with "1960 Originated what have become the ASCII and ISO character codes." on http://www.unt.edu/isrc/Faculty/FacultyFellows/bemer.htm, and with the material on http://www.bobbemer.com. Also, http://homepages.cwi.nl/~dik/english/codes/stand.html says that the first standardised version dates from 1963.

AIX

AIX is a Unix variant after all. Does it really follow OS/360?? Yaron 21:47, Aug 2, 2004 (UTC)

Unclear text

While doing a thorough copyedit of this article, I marked a few places where the text was unclear to me with HTML comments. Please see the page source. - dcljr 08:10, 17 Sep 2004 (UTC)

You write "what is X'15'??" I believe it's simply what a C programmer would call 0x15 -- it's a convention for hexadecimal I've seen in a few places. (Given that it appears in the EBCDIC section, perhaps it's an IBM-specific notation?) JTN 20:47, 2004 Oct 4 (UTC)
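
As an illustration of the reading suggested above (X'15' as another spelling of 0x15), here is a minimal Python sketch. The helper name `parse_ibm_hex` is hypothetical, and using Python's `cp037` codec as a stand-in for "EBCDIC" is an assumption, since EBCDIC has several variants.

```python
def parse_ibm_hex(literal: str) -> int:
    """Parse a literal written as X'15' into an integer (hypothetical helper)."""
    well_formed = literal[:2].upper() == "X'" and literal.endswith("'")
    if not well_formed or len(literal) < 4:
        raise ValueError(f"not an X'..' hex literal: {literal!r}")
    # The digits between the quotes are ordinary hexadecimal.
    return int(literal[2:-1], 16)


value = parse_ibm_hex("X'15'")
print(value, hex(value))

# Assumption: cp037 is the EBCDIC variant of interest. Decoding the byte
# shows which Unicode character that codec assigns to X'15'.
print(repr(bytes([value]).decode("cp037")))
```

If the notation is read this way, X'15' and 0x15 name the same byte value; what character that byte stands for depends on the code page in use.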

Solutions

Would be nice if this article included or linked to methods for converting the various formats since this is still occasionally problematic. I doubt my long-standing method of running a sed command from vi involving CTRL-V CTRL-M is optimal... --16:13, 17 May 2005 (UTC)

I just added a Perl program and a couple of other conversion hints, including one for emacs. Hope that helps? It's the best I know. Grem 10:43, 13 September 2005 (UTC)
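
For readers looking for a conversion method, one possible approach is sketched below in Python. This is not the Perl program mentioned above; the function names `to_lf` and `to_crlf` are made up for illustration, and the sketch assumes the data uses only CR, LF, or CRLF as line endings.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert any mix of CR, LF and CRLF line endings to CRLF."""
    # Normalising to LF first avoids turning an existing CRLF into CR CR LF.
    return to_lf(data).replace(b"\n", b"\r\n")


sample = b"first\r\nsecond\rthird\n"
print(to_lf(sample))
print(to_crlf(sample))
```

When applying this to files, reading and writing in binary mode (`open(path, "rb")` and `open(path, "wb")`) keeps the runtime from translating line endings on its own.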

Merge?

This has been a problem for some time, and it's only getting worse. The articles Line feed, CRLF, Carriage Return, and Newline all contain pretty much the same information, just phrased differently -- I propose we merge and redirect all other articles to this one, as it is the most complete and has a platform-neutral title. Comments? 63.188.144.35 19:16, 11 Jun 2005 (UTC)

I'm not entirely convinced that line feed and carriage return shouldn't retain their own articles, as their distinct semantics don't quite sit right under newline. I agree that all the stuff about line-ending conventions, including the whole of CRLF, should come to newline, though, with prominent pointers from the individual articles. -- JTN 22:48, 2005 Jun 11 (UTC)
I agree. It was a mistake to have separate articles for such closely related topics in the first place. Today, line feed and carriage return simply denote a newline, and the historical background can easily be put in each corresponding section. -- Taku 22:53, Jun 11, 2005 (UTC)

Well, it seems there's consensus for CRLF at the very least, so I'll go ahead and merge it in. As for line feed and carriage return, I realize they have historical significance, but it's not exactly a practical usage nowadays. I think the historical implications of the terms can be best described in the Typewriter article; it already explains that the terms have been adopted amongst computer users. 63.188.168.95 17:48, 12 Jun 2005 (UTC)

This article should be merged with End-of-line, which has a {{mergeto|newline}} posted in it. --Mareklug talk 18:20, 5 September 2006 (UTC)

  • Yes, merge CWC(talk) 06:44, 6 September 2006 (UTC)
  • Yes - end-of-line could remain as a pointer to the Newline article, but I think the content fits best in newline: \js 17:04, 7 November 2006 (UTC)

"Line anchors"

I haven't found any reference to "line anchors" on the web, except in the context of regular expressions, where the term seems to be used for the '^' and maybe the '$' zero-width assertions when in multi-line mode. While these constructs obviously work with newlines, they are not newlines, so I think the statement in the intro that newlines are sometimes called line anchors is wrong. If there is some other use of the term line anchor that I'm not aware of, could somebody please cite a source? --K. Sperling 10:38, July 25, 2005 (UTC)

I thought I had heard the term "line anchor" somewhere, though I don't remember where. I will restore the mention if I find a reference. -- Taku 23:18, July 25, 2005 (UTC)

Entering Special Characters / Editor Treatment and Conversion

I'm going to remove these two sections, because the content is inaccurate and partially also irrelevant. I wanted to give a more detailed explanation here first, though:

  • ctrl-j / ctrl-m: These aren't codes for LF and CR. The ctrl-? / C^? / \c? notation / keyboard translation / escape sequence is just a way of referring to a character with a certain number. J is the 10th letter in the alphabet (counting from A=1), so ctrl-J is simply the character with number 10, i.e. 0x0A. Similarly, M is the 13th letter, so Ctrl-M is 0x0D. Saying that these are "usually" LF and CR is wrong, unless you assume that computers "usually" use ASCII.
  • Entering them in vi/emacs/etc: I don't think this article is about teaching people how to enter control characters into various shells and editors.
  • The Common Problems section already says that modern text editors generally recognize all flavours of CR / LF newlines; this obviously includes the mentioned vi, emacs and eclipse (even though some people might not consider vi modern ;-). I don't think the article needs to mention how to perform the conversion in emacs specifically, though.
  • The perl one-liner will actually work on UNIX. Whether or not it works on Cygwin depends on the Cygwin configuration: If Cygwin is configured to use DOS/Windows newlines, it won't work, because the script won't see any CRs on input and they will be re-added on output (Perl uses the same text/binary IO modes as C does, and files are in text mode by default).
  • Neither GNU make nor bash ignores a final unterminated line in the versions I have tested (GNU Make 3.80, GNU bash 2.05b.0(1)-release). The only program I can think of off-hand that still has this bug is cron.
  • And last but not least, questions should go on the talk page, never in the article!

--K. Sperling (talk) 23:53, 13 September 2005 (UTC)
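
The ctrl-letter numbering described in the first bullet above can be sketched in Python as follows. The function name `ctrl_code` is hypothetical, and the masking rule shown is the ASCII-terminal convention, which is an assumption about the environment rather than something every system does.

```python
def ctrl_code(letter: str) -> int:
    """Return the character number conventionally written as Ctrl-<letter>."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError(f"expected a single ASCII letter, got {letter!r}")
    # Keeping only the low five bits of the letter's ASCII code gives its
    # position in the alphabet: A -> 1, J -> 10, M -> 13, Z -> 26.
    return ord(letter.upper()) & 0x1F


for key in ("J", "M"):
    print(f"Ctrl-{key} -> {ctrl_code(key)} (0x{ctrl_code(key):02X})")
```

Whether character number 10 or 13 then means LF or CR depends on the character set in use, which is the point made in the bullet.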



  • ctrl-j / ctrl-m: These aren't codes for LF and CR. The ctrl-? / C^? / \c? notation / keyboard translation / escape sequence is just a way of referring to a character with a certain number. J is the 10th letter in the alphabet (counting from A=1), so ctrl-J is simply the character with number 10, i.e. 0x0A. Similarly, M is the 13th letter, so Ctrl-M is 0x0D. Saying that these are "usually" LF and CR is wrong, unless you assume that computers "usually" use ASCII.

Most computers DO usually use ASCII.

  • Entering them in vi/emacs/etc: I don't think this article is about teaching people how to enter control characters into various shells and editors.

This is one of the most common questions I get. I would like to be able to point users at Wikipedia to help solve it. Should I start a new article? Can you suggest a title?

  • The Common Problems section already says that modern text editors generally recognize all flavours of CR / LF newlines; this obviously includes the mentioned vi, emacs and eclipse (even though some people might not consider vi modern ;-). I don't think the article needs to mention how to perform the conversion in emacs specifically, though.

Again, conversion is a very common question, and it is not addressed elsewhere. Instead of just removing useful information, you might instead move it to a better place? Why is it wrong for the conversion section?

  • The perl one-liner will actually work on UNIX. Whether or not it works on Cygwin depends on the Cygwin configuration: If Cygwin is configured to use DOS/Windows newlines, it won't work, because the script won't see any CRs on input and they will be re-added on output (Perl uses the same text/binary IO modes as C does, and files are in text mode by default).

Good point. I would be happy to see the one-liner in Conversion, with a note that it works under UNIX. Any reason why not?

  • Neither GNU make nor bash ignores a final unterminated line in the versions I have tested (GNU Make 3.80, GNU bash 2.05b.0(1)-release). The only program I can think of off-hand that still has this bug is cron.

Who said GNU? The fact is that not everyone is using the latest version of all-GNU products. In any case, the caveat is useful even without the particular examples.

  • And last but not least, questions should go on the talk page, never in the article!

I don't really understand this. As a reader, seeing a question in the article points me to relevant information where I am encouraged to edit and contribute. Isn't that the point of Wikipedia? There was some discussion of "publishable" versions, and I understand that questions look unprofessional in such a text. Which is more important?

Grem 11:35, 15 September 2005 (UTC)


I'm well aware that many computers use ASCII, but it's also a fact that many don't. Especially seeing that there is a fair bit of confusion even among programmers (e.g. many people don't realize that CR and LF exist in other codes besides ASCII and have different numerical representations there), it's important not to gloss over these details. A statement like "A line feed is usually typed ctrl-j" is simply too imprecise in this context -- not only because it ignores the issue of character sets other than ASCII; in a GUI-based application (e.g. on Windows) pressing ctrl-j will often not produce any character at all.
Bash is GNU Bash, and 2.05 is not the latest version; they're up to 3.something. GNU make is probably one of the more widely used make implementations, too. You can't just go claiming that "Some unix programs (like make and bash) will silently ignore the last line if there is no newline at its end.", listing bash and make (without naming any versions or specific make implementations) as examples when the problem doesn't exist in very widely used versions. It really wouldn't have hurt if you'd tried to verify these claims before adding them to the article. (Incidentally, the introduction already says that some programs have problems if the last line isn't NL terminated, without restricting it to Unix.)
About editors (and conversion utilities), if you include Emacs and vi, why not also include pico, nano, Scite, KWrite, UltraEdit, ...? This article is about newlines, not about how to use one editor or another. Also see Wikipedia:What_Wikipedia_is_not, particularly "wikipedia is not instructive". I realize that viewing text files from other platforms is a somewhat common problem for end-users, so I think it's OK to have a few hints for the most commonly used platforms, and I've added one way to do it for Windows and listed two methods for Unix, but generally this is not what this article (or probably any article of an encyclopedia) is about. I don't see much reason for including a Perl version; there's already the comfortable dos2unix one mentioned, and tr (which is part of the POSIX standard, and available on practically every Unix platform) for where dos2unix isn't available. It could also be done with sed, awk, or even in plain bash; I don't see what merits inclusion of the Perl version. If you get asked about this often, and want to point people somewhere, why not point them to the manual of whatever editor they're using?
As for the questions to prompt contributions, I don't have a link handy, but I'm fairly certain it's mentioned on some policy or guide. It's just not done, and the fact that you edited without being prompted by a question also proves that it's not necessary ;-) --K. Sperling (talk) 13:38, 15 September 2005 (UTC)
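
The point above that CR and LF exist in codes other than ASCII can be checked with a short Python sketch. It assumes Python's `cp037` codec as one representative EBCDIC code page (other variants may differ), and the helper name `code_in` is made up for illustration.

```python
def code_in(char: str, codec: str) -> int:
    """Return the single byte value that `codec` assigns to `char`."""
    encoded = char.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {codec}")
    return encoded[0]


# Print the byte value of each control character in both codes.
for name, char in (("LF", "\n"), ("CR", "\r")):
    ascii_value = code_in(char, "ascii")
    ebcdic_value = code_in(char, "cp037")
    print(f"{name}: ASCII 0x{ascii_value:02X}, cp037 0x{ebcdic_value:02X}")
```

In this particular codec the line feed lands on a different number than in ASCII, while the carriage return happens to coincide, so the details depend on the character and the code page compared.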


Using the diff Program with Different Line Endings

When you use programs like diff to compare the text of two files that use different line endings, there is some ambiguity. Unlike most modern text editors, the original Unix diff program and GNU diff seem to treat the files as different even when the content, apart from the line endings, is the same. This makes porting between, for example, GNU/Linux and MS Windows more difficult. The http://www.GnuWin32.org/ port of the GNU utilities has changed this behaviour so that diff does not care whether CR/LF or just LF is used as the line ending. This seems useful to me, and I think that, like the text editors, diff ought to treat line endings as just line endings and not care about their actual format when comparing two files.
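
The behaviour wished for above (a comparison that does not care which line ending is used) can be sketched in Python with the standard `difflib` module. This is not how GNU diff or the GnuWin32 port is implemented; the function name `diff_ignoring_eol` is hypothetical, and the sketch assumes only CR, LF and CRLF count as line endings.

```python
import difflib


def diff_ignoring_eol(old: str, new: str) -> list[str]:
    """Return a unified diff of two texts, treating CR, LF and CRLF alike."""

    def lines(text: str) -> list[str]:
        # Normalise every line ending to LF before splitting into lines.
        return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")

    return list(
        difflib.unified_diff(
            lines(old), lines(new), fromfile="old", tofile="new", lineterm=""
        )
    )


unix_text = "alpha\nbeta\n"
dos_text = "alpha\r\nbeta\r\n"
print(diff_ignoring_eol(unix_text, dos_text))          # same content
print(diff_ignoring_eol(unix_text, "alpha\r\ngamma\r\n"))  # real change
```

When reading real files for such a comparison, opening them with `newline=""` keeps Python from translating the line endings before the function sees them.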


Conversion using Windows Notepad/Edit?

It's true Notepad doesn't understand LF as "new line". But instead of using edit (and advising Windows users to use an old text-mode DOS program) I'd recommend Wordpad. It opens files greater than 64 KiB (Notepad can't do this) and easily converts LF to CR LF with just open/save. Wordpad is part of the standard Windows distribution and is certainly friendlier to Notepad users than any DOS tool. I'd change the page myself, but, as can be seen in this text, my English skills won't suffice :)

Hm, I don't have wordpad installed on my windows xp... maybe I manually de-selected it during the installation, I don't really remember. Maybe we should just mention both Wordpad and EDIT then. --K. Sperling (talk) 13:20, 1 October 2005 (UTC)

Is the DOA mnemonic Original Research?

I'm troubled by one of my own inclusions on this page. I have used the DOA mnemonic, which I added to the Newline in programming languages section, to help myself remember the hexadecimal equivalent of CR-LF in assembly language using the debug program on Windows. It is short, simple, and (I think) useful to programmers; but in the interest of fairness, my conscience requires I state:

  • I have not seen this mnemonic mentioned in any book or website, and so it could be argued that it is unverifiable/OR (not suitable for Wikipedia); this was particularly a problem with my original phrasing, which seemed to indicate that it was widely-used (I've rephrased it).
  • However, it is so simple that it could be thought of by anyone familiar with the hex code for CR-LF. In other words, it might have been thought of before and I am simply unaware of it.
  • Short pieces of code and such are regularly contributed to technical articles (including this one) without any source, and some are obviously at least semi-original. This is a form of "mental code", if you will, to aid memory. We do not hunt down every minutely original thing, do we? (If we did, how could we do anything but copy exact phrasings of others... and wouldn't that violate copyright?)

I'm really up in the air about this one. Any thoughts of anyone else would be appreciated. BlueGuy213 04:52, 30 January 2007 (UTC)

Any information which is not mentioned in any primary or secondary source (such as a textbook or research paper) is not appropriate for Wikipedia. That's covered by WP:OR. Also relevant: Wikipedia is not for things made up in school one day — which in the intro paragraph actually mentions "original mnemonics" as something that Wikipedia Is Not for. (I didn't know that that was there until just now, either.) WP:NOT also mentions that Wikipedia is not a "how-to" guide, so if an article contains a mnemonic, it should be because the mnemonic is encyclopedic (e.g., popular), not just "here's how to remember XYZ..."
That particular mnemonic is a tad morbid, don't you think? I'm not surprised it's not used in any textbooks!
I'm going to remove the "dead on arrival" mnemonic from the article, now that we've established that it's original research, and thus no citation can be provided. (Console yourself with the thought that since it's so simple, anyone who needs it will immediately think of it on their own, without needing help from Wikipedia. :) --Quuxplusone 08:17, 30 January 2007 (UTC)
I agree that it is somewhat morbid, but it has been useful to me also, so I was hoping it could be kept. I guess if policy specifically says no original mnemonics, then I've got no legs to stand on. Oh well, I'll move on to other things... but maybe someday I'll write a computer book using it (and then maybe somebody else will add it back)! 75.5.199.76 09:59, 30 January 2007 (UTC)
The previous post was mine (forgot to sign in). BlueGuy213 10:01, 30 January 2007 (UTC)