Whitespace character
In computer science, white space is any character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page. For example, the common whitespace symbol U+0020 space (HTML: &#32;), also ASCII 32, represents a blank space, used as a word divider in Western scripts.
The term "whitespace" is based on the resulting appearance on ordinary paper.
Definition and ambiguity
The most common whitespace characters may be typed via the space bar or the tab key. Depending on context, a line break generated by the return or enter key may be considered whitespace as well.
Unicode
In Unicode (Unicode Character Database) the following 25 characters are defined as whitespace characters:
Whitespace[a] (Unicode character property WSpace=Y)
Code point | Name | Script | General category | Remark |
---|---|---|---|---|
U+0009 | | Common | Other, control | HT, Horizontal Tab |
U+000A | | Common | Other, control | LF, Line feed |
U+000B | | Common | Other, control | VT, Vertical Tab |
U+000C | | Common | Other, control | FF, Form feed |
U+000D | | Common | Other, control | CR, Carriage return |
U+0020 | space | Common | Separator, space | |
U+0085 | | Common | Other, control | NEL, Next line |
U+00A0 | no-break space | Common | Separator, space | |
U+1680 | ogham space mark | Ogham | Separator, space | |
U+2000 | en quad | Common | Separator, space | |
U+2001 | em quad | Common | Separator, space | |
U+2002 | en space | Common | Separator, space | |
U+2003 | em space | Common | Separator, space | |
U+2004 | three-per-em space | Common | Separator, space | |
U+2005 | four-per-em space | Common | Separator, space | |
U+2006 | six-per-em space | Common | Separator, space | |
U+2007 | figure space | Common | Separator, space | |
U+2008 | punctuation space | Common | Separator, space | |
U+2009 | thin space | Common | Separator, space | |
U+200A | hair space | Common | Separator, space | |
U+2028 | line separator | Common | Separator, line | |
U+2029 | paragraph separator | Common | Separator, paragraph | |
U+202F | narrow no-break space | Common | Separator, space | |
U+205F | medium mathematical space | Common | Separator, space | |
U+3000 | ideographic space | Common | Separator, space | |
a. ^ Unicode 6.3 property list
Within the algorithm for bidirectional writing, Unicode uses another definition of "whitespace" (Bidirectional Character Type=WS). These Bidi-WS characters (17 out of the 25 listed in the table here) are "neutral": they follow the writing direction of neighboring characters rather than determining their own. The eight other characters listed here are also "neutral", but have a different bidi-type.
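The split between the two definitions can be inspected programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module, whose data may reflect a newer Unicode version than the 6.3 list cited here; the names `WHITE_SPACE` and `bidi_classes` are illustrative and not part of any standard.

```python
import unicodedata

# The 25 code points from the table above (White_Space=Y as of Unicode 6.3).
WHITE_SPACE = (
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))  # U+2000 through U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def bidi_classes(code_points):
    """Map each code point to the bidirectional class reported by unicodedata."""
    return {cp: unicodedata.bidirectional(chr(cp)) for cp in code_points}


classes = bidi_classes(WHITE_SPACE)

# Characters whose bidirectional class is WS.
bidi_ws = [cp for cp, cls in classes.items() if cls == "WS"]

# The remaining characters, keyed by code point, with their actual class.
others = {f"U+{cp:04X}": cls for cp, cls in classes.items() if cls != "WS"}

if __name__ == "__main__":
    print(len(WHITE_SPACE), "whitespace characters,", len(bidi_ws), "with bidi class WS")
    for name, cls in others.items():
        print(name, cls)
```

Running the script lists each character that is whitespace by property but carries a bidirectional class other than WS.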
Usage
Computer languages
Runs of white space (beyond a first whitespace character) occurring within source code written in computer programming languages are generally ignored; such languages are free-form. However, in several languages, such as Haskell and Python, white space and indentation are used for syntactical purposes. In the language called Whitespace, white spaces are the only valid characters for programming, while any other characters are ignored.
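As an illustration of indentation carrying syntactic meaning, the following sketch runs two Python snippets that differ only in the indentation of their last line. The snippet contents and the helper `run` are hypothetical and chosen purely for demonstration.

```python
# Last statement is indented, so it belongs to the loop body.
SNIPPET_INSIDE = (
    "total = 0\n"
    "for i in range(3):\n"
    "    total += i\n"
    "    total *= 2\n"
)

# Same text, but the last statement is not indented, so it runs once after the loop.
SNIPPET_OUTSIDE = (
    "total = 0\n"
    "for i in range(3):\n"
    "    total += i\n"
    "total *= 2\n"
)


def run(source):
    """Execute the snippet in a fresh namespace and return its 'total' variable."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


if __name__ == "__main__":
    print(run(SNIPPET_INSIDE))   # doubling happens on every iteration
    print(run(SNIPPET_OUTSIDE))  # doubling happens once, after the loop
```

By contrast, extra spaces inside a line (for example around the `=` sign) do not change the result, which matches the free-form treatment of runs of white space described above.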
Still, for most programming languages, excessive use of white space, especially trailing white space at the end of lines, is considered a nuisance. However, correct use of white space can make the code easier to read and help group related logic. In interpreted languages, parsing of unnecessary white space may affect the speed of execution. In markup languages like HTML, unnecessary white space increases the file size, and may thus affect the speed of transfer over a network. On the other hand, unnecessary white space can also inconspicuously mark code, similar to, but less obvious than, comments in code. This can be desirable to prove an infringement of license or copyright that was committed by copying and pasting.
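Trailing white space is commonly removed by editors or simple scripts. The following is a minimal sketch of such a cleanup, assuming that only spaces and tabs before a line ending count as trailing white space; the function name `strip_trailing_whitespace` is illustrative.

```python
def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line endings."""
    cleaned = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")   # line content without its CR/LF ending
        ending = line[len(body):]    # the original line ending, if any
        cleaned.append(body.rstrip(" \t") + ending)
    return "".join(cleaned)


if __name__ == "__main__":
    sample = "def f():  \n    return 1\t\n"
    print(repr(strip_trailing_whitespace(sample)))
```

Leading indentation is left untouched, so the sketch is safe for indentation-sensitive languages.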
The C language defines white space to be "... space, horizontal tab, new-line, vertical tab, and form-feed".[1] The HTTP network protocol requires different types of white space to be used in different parts of the protocol, such as only the space character in the status line, CRLF at the end of a line, and "linear white space" in header values.[2]
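The five characters named in the C definition quoted above can be written down as a small set. The sketch below uses that set to split text into tokens; it is only an illustration of a fixed whitespace set, not how any C compiler is implemented, and the names `C_WHITE_SPACE`, `is_c_white_space` and `split_on_c_white_space` are hypothetical.

```python
# Space, horizontal tab, new-line, vertical tab and form-feed, per the quoted definition.
C_WHITE_SPACE = frozenset(" \t\n\v\f")


def is_c_white_space(ch):
    """Return True if ch is one of the five characters in the quoted definition."""
    return ch in C_WHITE_SPACE


def split_on_c_white_space(source):
    """Split source into runs of non-whitespace characters."""
    tokens = []
    current = []
    for ch in source:
        if is_c_white_space(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(split_on_c_white_space("int\tx =\n 1;"))
```

Under this narrow definition a no-break space (U+00A0) is not white space, although Unicode classifies it as such.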
Visible symbol
Sometimes the visible symbol ␣ (Unicode U+2423, decimal 9251, open box) is used to indicate a space. This symbol is used in a textbook on the Modula-2 computer language published ca. 1985 by Springer-Verlag, where it is necessary to explicitly indicate a space code. The symbol is also used in the keypad silkscreening of TI-8x series graphing calculators from Texas Instruments.[3]
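A minimal sketch of how text can be displayed with the open box symbol in place of each space; the constant `OPEN_BOX` and the function `show_spaces` are illustrative names.

```python
OPEN_BOX = "\u2423"  # the open box symbol, U+2423


def show_spaces(text):
    """Replace every U+0020 space with the visible open box symbol."""
    return text.replace(" ", OPEN_BOX)


if __name__ == "__main__":
    print(show_spaces("two  spaces here"))
```

Only U+0020 is replaced; tabs and other whitespace characters are left as they are.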
File names
Such usage is similar to multiword file names written for operating systems and applications that are confused by embedded space codes—such file names instead use an underscore (_) as a word separator, as_in_this_phrase.
Another such symbol was U+2422 ␢ blank symbol. This was used in the early years of computer programming when writing on coding forms. Keypunch operators immediately recognized the symbol as an "explicit space".[citation needed]
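The underscore convention for file names mentioned in this section can be sketched as follows. The function name `underscore_file_name` is hypothetical, and the sketch assumes that any run of white space should become a single underscore.

```python
def underscore_file_name(name):
    """Join the whitespace-separated words of name with underscores."""
    return "_".join(name.split())


if __name__ == "__main__":
    print(underscore_file_name("as in this phrase"))
```

Because `str.split()` without arguments discards leading and trailing white space, the result never begins or ends with an underscore introduced by the conversion.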
See also
- Programming style
- Whitespace (programming language)
- Indent style
- Space (punctuation)
- Trimming (computer programming)
- Regular expression#POSIX character classes: the white space character class