Null character
From Wikipedia, the free encyclopedia
The null character (also null terminator) is a character with the value zero, present in the ASCII and Unicode character sets, and available in nearly all mainstream programming languages. The original meaning of this character was like NOP: when sent to a printer or a terminal, it does nothing (some terminals, however, incorrectly display it as a space). On punched tapes, this character is represented with no holes at all, so a new unpunched tape is initially filled with null characters.
On many computer and data terminal keyboards, it was possible to type a null character by holding down the Control key and pressing "@" (which usually required also holding Shift and pressing another key such as "2" or "P"). Consequently, in some contexts, the null character is represented visually as "^@". In other contexts, it is represented as a subscript, single-em-width "NUL". In Unicode, there is a character for the visual representation of the null character, "symbol for null", U+2400 (␀), not to be confused with the actual null character, U+0000.
The character has special significance in C and its derivatives, where it serves as a reserved character used to signify the end of strings. The null character is often represented as "\0" in source code (in reality an octal escape sequence). Strings ending in a null character are said to be null-terminated.
This differs from certain other languages (such as Pascal) which store a string as an array preceded by a string length. The main advantage of using a null character is that strings can be of any length, and only one character of additional storage is required. Null-terminated strings can also have efficiency benefits, since operations that traverse a string don't need to keep track of how many characters have been seen, and operations which modify the string's length do not need to update the stored length. Cache performance can also be better.
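The one-character storage overhead can be checked directly. The following is a minimal sketch in C; the names `greeting`, `stored_bytes`, `visible_length` and `terminator_is_zero` are illustrative only and do not come from any standard or library.

```c
#include <stddef.h>
#include <string.h>

/* The compiler appends a null character to every string literal, so this
   array holds 'h','e','l','l','o','\0'. */
const char greeting[] = "hello";

/* sizeof counts the terminating null character. */
size_t stored_bytes(void)
{
    return sizeof greeting;
}

/* strlen stops at the terminator and does not count it. */
size_t visible_length(void)
{
    return strlen(greeting);
}

/* The last stored element is the null character, whose value is zero. */
int terminator_is_zero(void)
{
    return greeting[sizeof greeting - 1] == '\0' && '\0' == 0;
}
```

Here `stored_bytes()` is one greater than `visible_length()`; the difference is the single terminating character.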
Conversely, the advantage of storing the string's length is that it is always immediately available in constant time; a program using null-terminated strings must count every character in a string to find the string's length, which requires linear or O(n) time. Also, storing the length allows strings to contain null characters, which can simplify data processing by eliminating exceptions. In null-terminated strings, the first occurring null character is interpreted as the end of the string.
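Both points can be illustrated with a hand-written length function. This is a sketch of the general technique, not the implementation used by any particular C library; `count_until_null` and `embedded` are hypothetical names.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the first null character.
   Every character before the terminator is visited once, so the running
   time grows linearly with the length of the string. */
size_t count_until_null(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Eight bytes are stored: 'a','b','c','\0','x','y','z' plus the terminator
   that the compiler appends to the literal. */
const char embedded[] = "abc\0xyz";
```

Calling `count_until_null(embedded)` reports a length of 3, because the first null character is taken as the end of the string and the remaining bytes are never examined.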
However, the datatype used to store the length of a string is also important; if the length is stored as a single byte, as in Pascal, strings may only be up to 255 characters long. Larger datatypes, on the other hand, take up more space than a null character (a 16-bit number occupies two bytes, and a 32-bit number takes four). In the 1970s, when C was designed, space considerations were much more important than they are at present, which greatly influenced the choice of null-terminated strings.
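A length-prefixed string with a one-byte length can be sketched in C as follows. The type `short_string` and its helper functions are hypothetical, written only to illustrate the trade-off; they do not reproduce the layout of any specific Pascal implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical short string: one length byte followed by the characters.
   A single byte can only hold values 0..255, which caps the length. */
struct short_string {
    unsigned char length;
    char data[255];
};

/* Stores n bytes from src. Returns 0 on success, or -1 (leaving dst
   unchanged) if n cannot be represented in the length byte. */
int short_string_set(struct short_string *dst, const char *src, size_t n)
{
    if (n > 255)
        return -1;
    memcpy(dst->data, src, n);
    dst->length = (unsigned char)n;
    return 0;
}

/* The length is read from the stored byte, with no scan of the data. */
size_t short_string_length(const struct short_string *s)
{
    return s->length;
}
```

Because the length is stored explicitly, a null character inside `data` is ordinary content: storing the seven bytes of `"abc\0xyz"` yields a length of 7. A request to store 256 bytes is rejected, which is the 255-character limit in practice.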
- A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string literal.
- - ANSI/ISO 9899:1990 (the ANSI C standard), section 5.2.1
- A string is a contiguous sequence of characters terminated by and including the first null character.
- - ANSI/ISO 9899:1990 (the ANSI C standard), section 7.1.1
- A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character).
- - ISO/IEC 14882 (the ISO C++ standard), section 17.3.2.1.3.1
Security exploit: Poison NULL byte
The term `Poison NULL byte` was originally used by Olaf Kirch in a Bugtraq post in October 1998. It was further explored by Rain Forest Puppy in Phrack Issue 55, article 7.
`Poison NULL byte` exploits the (unexpected) behaviour of many programming interfaces, which often terminate string handling upon the NUL character. Some examples of Poison NULL byte usage include:
- Terminating a file name string, such as removing a mandatory file extension.
- Terminating/commenting a SQL statement when executing code dynamically, such as Oracle EXECUTE IMMEDIATE.
Typically, `Poison NULL byte` is combined with another type of exploit, such as directory traversal or SQL injection; it is often used to simplify or enhance other attacks.
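The file name case can be sketched in C. The scenario is an assumption made for illustration: a length-aware layer checks a counted buffer for a mandatory extension and then hands the same bytes to an interface that expects a null-terminated string. The functions `ends_with`, `length_seen_by_c_api` and `has_embedded_null` are hypothetical names, not part of any real API.

```c
#include <stddef.h>
#include <string.h>

/* Check as a length-aware layer might perform it: does the counted buffer
   end with the required extension? Embedded nulls are treated as data. */
int ends_with(const char *buf, size_t len, const char *ext)
{
    size_t ext_len = strlen(ext);
    return len >= ext_len && memcmp(buf + len - ext_len, ext, ext_len) == 0;
}

/* What an interface expecting a null-terminated string sees: everything
   after the first null character is ignored. */
size_t length_seen_by_c_api(const char *buf)
{
    return strlen(buf);
}

/* One possible defence: reject any counted buffer that contains a null. */
int has_embedded_null(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

For the 15-byte input `"secret.txt\0.jpg"`, `ends_with` accepts the `.jpg` extension, while `length_seen_by_c_api` reports 10, so a null-terminated interface would operate on `secret.txt`. Checking `has_embedded_null` before passing the name on is one way to close that gap.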