Control character
From Wikipedia, the free encyclopedia
This article does not cite any references or sources. (September 2007) Please help improve this article by adding citations to reliable sources. Unverifiable material may be challenged and removed. |
In computing and telecommunication, a control character or non-printing character is a code point (a number) in a character set that does not in itself represent a written symbol. All entries in the ASCII table below code 32 (technically the C0 control code set) and 127 are of this kind, including BEL (which is intended to cause an audible signal in the receiving terminal), SYN (which is a synchronization signal), and ENQ (a signal that is intended to trigger a response at the receiving end, to see if it is still present). The EBCDIC character set contains 65 control codes, including all of the ASCII control codes as well as additional codes which are mostly used to control IBM peripherals. The Unicode standard has added many new non-printing characters, for example the Zero-width non-joiner. The remainder of this article covers control codes in general and some codes that are in common use. For detailed tables of the C0 and C1 control codes used in ASCII and ISO-8859-n, please see their respective articles.
Other characters are printing or printable characters, except perhaps for the "space" character (see ASCII printable characters).
[1] | 0x00 | 0x10 |
---|---|---|
0x00 | NUL | DLE |
0x01 | SOH | DC1 |
0x02 | STX | DC2 |
0x03 | ETX | DC3 |
0x04 | EOT | DC4 |
0x05 | ENQ | NAK |
0x06 | ACK | SYN |
0x07 | BEL | ETB |
0x08 | BS | CAN |
0x09 | TAB | EM |
0x0A | LF | SUB |
0x0B | VT | ESC |
0x0C | FF | FS |
0x0D | CR | GS |
0x0E | SO | RS |
0x0F | SI | US |
Contents |
[edit] In ASCII
The control characters in ASCII still in common use include:
- 7 (bell,
\a
), which may cause the device receiving it to emit a warning of some kind (usually audible). - 8 (backspace,
\b
), used either to erase the last character printed or to overprint it. - 9 (horizontal tab,
\t
), moves the printing position some spaces to the right. - 10 (line feed,
\n
), used as the end_of_line marker in most UNIX systems and variants. - 12 (form feed,
\f
), to cause a printer to eject paper to the top of the next page, or a video terminal to clear the screen. - 13 (carriage return,
\r
), used as the end_of_line marker in Mac OS, OS-9, FLEX (and variants). A carriage return/line feed pair is used by CP/M-80 and its derivatives including DOS and Windows, and by application layer protocols such as HTTP. - 27 (escape,
\e
).
Occasionally one might encounter modern uses of other codes, such as code 4 (End of transmission), used to end a Unix shell session or PostScript printer transmission. For the full list of control characters, see ASCII.
Even though many control characters are rarely used, the concept of sending device-control information intermixed with printable characters is so useful that device makers found a way to send hundreds of device instructions. Specifically, they used ASCII code 27 (escape), followed by a series of characters called a "control sequence" or "escape sequence". The mechanism was invented by Bob Bemer, the father of ASCII.
Typically, code 27 was sent first in such a sequence to alert the device that the following characters were to be interpreted as a control sequence rather than as plain characters, then one or more characters would follow to specify some detailed action, after which the device would go back to interpreting characters normally. For example, the sequence of code 27, followed by the printable characters "[2;10H", would cause a DEC VT-102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. Several standards exist for these sequences, notably ANSI X3.64 (1979), which was based on the behavior of VT-100 series terminals. But the number of non-standard variations in use is large, especially among printers, where technology has advanced far faster than any standards body can possibly keep up with.
[edit] How control characters map to keyboards
ASCII-based keyboards have a key labelled "Control" or "Ctrl" (sometimes referred to as "Cntl") which is used much like a shift key, being depressed in combination with another letter or symbol key. In this way, the control key generates the code 64 places below the code for the (generally) uppercase letter it is pressed in combination with (i.e., subtract 64 from ASCII code value in decimal of the (generally) uppercase letter), producing one of the 32 ASCII control codes.
So, the octet code produced by a control key combination is the bit pattern generated when the control key is not pressed, but with bits 5 and 6 forced to zero. For example, pressing "control" and the letter "g" or "G" (code 103 or 71 in base 10, which is 01000111 in binary, produces the code 7 (Bell, 7 in base 10, or 00000111 in binary). A key press combination that produces a code with 0 in the 64th place (bit 6) is generally unaffected if the control key is held down as well.
This correspondence is used to represent control characters in printable form in what is called caret notation; for example, ^G represents code 7.
Keyboards also typically have a few single keys which produce control character codes. For example, the key labelled "Backspace" typically produces code 8, "Tab" code 9, "Enter" or "Return" code 13 (though some keyboards might produce code 10 for "Enter").
Modern keyboards have many keys that do not correspond to any ASCII printable or control character, for example cursor control arrows and word processing functions. These keyboards communicate these keys to the attached computer by one of three methods: appropriating some otherwise unused control character for the new use; using some encoding other than ASCII; or using multi-character control sequences. Keyboards attached to stand-alone personal computers typically use one (or both) of the first two methods. "Dumb" computer terminals typically use control sequences.
[edit] The design purpose
The control characters were designed to fall into a few groups: printing and display control, data structuring, transmission control, and miscellaneous.
[edit] Printing and Display control
Printing control characters were first used to control the physical mechanism of printers, the earliest output device. An early implementation of this idea was the out-of-band ASA carriage control characters. Later, control characters were integrated into the stream of data to be printed. The carriage return character (CR), when sent to such a device, causes it to put the character at the edge of the paper at which writing begins (it may, or may not, also move the printing position to the next line). The line feed character (LF/NL) causes the device to put the printing position on the next line. It may (or may not), depending on the device and its configuration, also move the printing position to the start of the next line (whichever direction is first -- left in Western languages and right in Hebrew and Arabic). The vertical and horizontal tab characters (VT and HT/TAB) cause the output device to move the printing position to the next tab stop in the direction of reading. The form feed character (FF/NP) starts a new sheet of paper, and may or may not move to the start of the first line. The backspace character (BS) moves the printing position one character space backwards. On printers, this is most often used so the printer can overprint characters to make other, not normally available, characters. On terminals and other electronic output devices, there are often software (or hardware) configuration choices which will allow a destruct backspace (ie, a BS, SP, BS sequence) which erases, or a non-destructive one which does not. The shift in and shift out characters (SO and SI) selected alternate character sets, fonts, underlining or other printing modes. Escape sequences were often used to do the same thing.
With the advent of computer terminals that did not physically print on paper and so offered more flexibility regarding screen placement, erasure, and so forth, printing control codes were adapted. Form feeds, for example, usually cleared the screen, there being no new paper page to move to. More complex escape sequences were developed to take advantage of the flexibility of the new terminals, and indeed of newer printers. The concept of a control character had always been somewhat limiting, and was extremely so when used with new, much more flexible, hardware. Control sequences (sometimes implemented as escape sequences) could match the new flexibility and power and became the standard method. However, there were, and remain, a large variety of standard sequences to choose from.
[edit] Data structuring
The separators (File, Group, Record, and Unit: FS, GS, RS and US) were made to structure data, usually on a tape, in order to simulate punch cards. End of medium (EM) warns that the tape (or whatever) is ending. While many systems use CR/LF and TAB for structuring data, it is possible to encounter the separator control characters in data that needs to be structured. The separator control characters are not overloaded; there is no general use of them except to separate data into structured groupings. Their numeric values are contiguous with the space character, which can be considered a member of the group, as a word separator.
[edit] Transmission control
The transmission control characters were intended to structure a data stream, and to manage re-transmission or graceful failure, as needed, in the face of transmission errors.
The start of heading (SOH) character was to mark a non-data section of a data stream -- the part of a stream containing addresses and other housekeeping data. The start of text character (STX) marked the end of the header, and the start of the textual part of a stream. The end of text character (ETX) marked the end of the data of a message. A widely used convention is to make the two characters preceding ETX a checksum or CRC for error-detection purposes. The end of transmission block character (ETB) was used to indicate the end of a block of data, where data was divided into such blocks for transmission purposes.
The escape character (ESC) can be used in software user-interfaces to exit from a screen, menu, or mode, or in device-control protocols (e.g., printers and terminals) to signal that what follows is a special command sequence rather than normal data.
The substitute character (SUB) was intended to request a translation of the next character from a printable character to another value, usually by setting bit 5 to zero. This is handy because some media (such as sheets of paper produced by typewriters) can transmit only printable characters. However, on MS-DOS systems with files opened in text mode, "end of text" or "end of file" is marked by this Ctrl-Z character, instead of the Ctrl-C or Ctrl-D which are common on other operating systems.
The cancel character (CAN) signalled that the previous element should be discarded. The negative acknowledge character (NAK) is a definite flag for, usually, noting that reception was a problem, and, often, that the current element should be sent again. The acknowledge character (ACK) is normally used as a flag to indicate no problem detected with current element.
When a transmission medium is half duplex (that is, it can transmit in only one direction at a time), there is usually a master station that can transmit at any time, and one or more slave stations that transmit when they have permission. The enquire character (ENQ) is generally used by a master station to ask a slave station to send its next message. A slave station indicates that it has completed its transmission by sending the end of transmission character (EOT).
The device control codes (DC1 to DC4) were originally generic, to be implemented as necessary by each device. However, a universal need in data transmission is to request the sender to stop transmitting when a receiver can't take more data right now. Digital Equipment Corporation invented a convention which used 19, (the device control 3 character (DC3), also known as control-S, or XOFF) to "S"top transmission, and 17, (the device control 1 character (DC1), aka control-Q, or XON) to start transmission. It has become so widely used that most don't realize it is not part of official ASCII. This technique, however implemented, avoids additional wires in the data cable devoted only to transmission management, which saves money. A sensible protocol for the use of such transmission flow control signals must be used, to avoid potential deadlock conditions, however.
The data link escape character (DLE) was intended to be a signal to the other end of a data link to cause the following code to be interpreted as raw data, not a control code.
[edit] Miscellaneous
Code 7 (BEL) is intended to cause an audible signal in the receiving terminal.
Many of the ASCII control characters were designed for devices of the time that are not often seen today. For example, code 22, "synchronous idle" (SYN), was originally sent by synchronous modems (which have to send data constantly) when there was no actual data to send. (Modern systems typically use a start bit to announce the beginning of a transmitted word.)
Code 0 (ASCII code name NUL) is a special case. In paper tape, it is the case when there are no holes. It is convenient to treat this as a fill character without meaning otherwise.
Code 127 (DEL, a.k.a. "rubout") is likewise a special case. Its code is all-bits-on in binary, which essentially erased a character cell on a paper tape when overpunched. Paper tape was a common storage medium when ASCII was developed, with a computing history dating back to WWII code breaking equipment at Bletchley Park. Paper tape became obsolete in the 1970s, so this clever aspect of ASCII rarely saw any use. Some systems (such as the original Apples) converted it to a backspace. But because its code is in the range occupied by other printable characters, and because it had no official assigned glyph, many computer equipment vendors used it as an additional printable character (often an all-black "box" character useful for erasing text by overprinting with ink).
Many file systems do not allow control characters in the file names, as they may have reserved functions.
[edit] External links
- ISO/IEC 6429:1992 (E), Information Technology - Control functions for coded character sets
- ISO IR 1 C0 Set of ISO 646 (PDF)
[edit] References
- ^ MS-DOS QBasic v1.1 Documentation. Microsoft 1987-1991.