Empty string

From Wikipedia, the free encyclopedia

In formal language theory, the empty string (or null string)[1] is the unique string of length zero.

Formal theory

Formally, a string is a finite sequence of symbols such as letters or digits. The empty string is the extreme case where the sequence has length zero, so there are no symbols in the string. There is only one empty string, because two strings differ only if they have different lengths or different sequences of symbols. In formal treatments,[2] the empty string is denoted by ε or sometimes by Λ or λ.

The empty string should not be confused with the empty language ∅, which is a formal language (i.e. a set of strings) that contains no strings, not even the empty string.
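The distinction can be illustrated with a small sketch in Python. Modelling a formal language as a finite Python set of strings is an assumption made only for this illustration, and the names EPSILON, empty_language and epsilon_language are hypothetical.

```python
# Assumption: a formal language is modelled as a (finite) Python set of strings.
EPSILON = ""                      # the empty string

empty_language = set()            # the empty language: contains no strings at all
epsilon_language = {EPSILON}      # a language whose only member is the empty string

print(len(empty_language))        # number of strings in the empty language
print(len(epsilon_language))      # number of strings in the language {EPSILON}
print(EPSILON in empty_language)  # membership test against the empty language
print(EPSILON in epsilon_language)
```

The two sets are different objects of study: the first has no members, while the second has exactly one member, which happens to be a string of length zero.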

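The empty string has several properties: its length is zero (|ε| = 0); it is the identity element of concatenation (ε ⋅ s = s ⋅ ε = s for every string s); reversing it yields the empty string again; and it precedes every other string in lexicographical order, because it is the shortest of all strings.[3]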
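Some standard properties of the empty string (length zero, identity for concatenation, invariance under reversal, and sorting before every non-empty string) can be checked with a short Python sketch. The sample strings are arbitrary, and Python's built-in string comparison is assumed here as a stand-in for lexicographical order.

```python
EPSILON = ""
samples = ["a", "ab", "0", " ", "zebra"]   # arbitrary non-empty strings

# Length zero.
length_is_zero = len(EPSILON) == 0

# Identity element for concatenation, on both sides.
is_identity = all(EPSILON + s == s and s + EPSILON == s for s in samples)

# Reversal leaves the empty string unchanged.
reversal_fixed = EPSILON[::-1] == EPSILON

# The empty string compares less than every non-empty string.
sorts_first = all(EPSILON < s for s in samples)

print(length_is_zero, is_identity, reversal_fixed, sorts_first)
```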

Use in programming languages

In most programming languages, strings are a data type. Individual strings are typically stored in consecutive memory locations, so the same string (for example the empty string) could be stored in two different places in memory. (Even a string of length zero can require memory to store it, depending on the format being used.) There could therefore be multiple empty strings in memory, in contrast with formal theory, where there is exactly one empty string. However, a string comparison function would indicate that all of these empty strings are equal to each other.
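As a rough illustration in Python, empty strings produced in different ways compare equal regardless of where each one is stored. Whether they actually occupy distinct memory is an implementation detail (an interpreter may reuse a single object), so the sketch below checks only equality and makes no claim about storage.

```python
# Empty strings obtained by different routes at run time.
from_literal = ""
from_slice = "abc"[:0]        # slicing zero characters from a longer string
from_join = "".join([])       # joining an empty list of pieces
from_constructor = str()      # calling the string constructor with no argument

candidates = [from_literal, from_slice, from_join, from_constructor]

# A string comparison treats every pair as equal.
all_equal = all(x == y for x in candidates for y in candidates)
print(all_equal)
```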

The empty string is distinct from a null reference (or null pointer), because a null reference does not point to any string at all, not even the empty string. A null reference is likely to cause an error if one tries to perform any operation on it, whereas the empty string is a legitimate string on which most string operations should work. Some languages treat some or all of the following values alike, which can lessen this danger: empty strings, null references, the integer 0, the floating-point number 0, the Boolean value false, the ASCII character NUL, or other such values.
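The difference can be sketched in Python, where None plays the role of a null reference. The helper try_upper is hypothetical and exists only for this illustration. The second half shows which of the listed values Python itself treats as false in a Boolean context; other languages draw this line differently.

```python
def try_upper(value):
    """Attempt a string operation; report whether it worked."""
    try:
        return ("ok", value.upper())
    except AttributeError:
        # None has no string methods, so the operation fails.
        return ("error", None)

print(try_upper(""))     # the empty string is a legitimate string
print(try_upper(None))   # the null reference is not a string at all

# Values from the list above, tested for truthiness in Python.
values = ["", None, 0, 0.0, False, "\0"]
falsy = [v for v in values if not v]
truthy = [v for v in values if v]
print(len(falsy), truthy)
```

In Python the one-character string containing NUL is not empty, so it counts as true, unlike the other values in the list.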

The empty string is usually represented in the same way as other strings. In implementations that use a string-terminating character (null-terminated strings or plain text lines), the empty string is indicated by the immediate occurrence of this terminating character.
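A minimal sketch of the null-terminated convention, written in Python over byte strings, is given below. The helpers to_c_string and from_c_string are hypothetical, and ASCII-only text is assumed to keep the example short.

```python
TERMINATOR = b"\0"   # the terminating character of a null-terminated string

def to_c_string(text: str) -> bytes:
    """Encode text followed by the terminator (ASCII assumed)."""
    return text.encode("ascii") + TERMINATOR

def from_c_string(buffer: bytes) -> str:
    """Read characters up to, but not including, the first terminator."""
    end = buffer.index(TERMINATOR)
    return buffer[:end].decode("ascii")

print(to_c_string("hi"))               # two characters, then the terminator
print(to_c_string(""))                 # the terminator comes immediately
print(repr(from_c_string(b"\0junk")))  # bytes after the terminator are ignored
```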

λ representation      Programming languages
""                    C, C++, Perl, Python, C#, Go, PHP, Visual Basic .NET, Java, Turing, JavaScript, Haskell, Objective-C (as a C string), OCaml, Standard ML, Scala, Seed7, Tcl
''                    Perl, PHP, Python, JavaScript, Delphi, Pascal, Matlab
{'\0'}                C, C++, Objective-C (as a C string)
std::string()         C++
@""                   Objective-C (as a constant NSString object)
[NSString string]     Objective-C (as a new NSString object)
q(), qq()             Perl
%{}                   Ruby
"""""", str()         Python
string.Empty          C#, Visual Basic .NET
String.make 0 '-'     OCaml
{}                    Tcl
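For Python, the spellings listed in the table can be compared directly. The short sketch below assumes nothing beyond the notations shown above and confirms that they all denote the same value.

```python
# The Python notations from the table: double quotes, single quotes,
# an empty triple-quoted literal, and the constructor call.
spellings = ["", '', """""", str()]

# Every spelling yields a string of length zero, and all are equal.
lengths = [len(s) for s in spellings]
print(lengths)
print(len(set(spellings)))   # number of distinct values among the spellings
```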

Examples of empty strings

The empty string is a syntactically valid representation of zero in positional notation (in any base) that contains no leading zeros. Because the empty string has no standard visual representation outside formal language theory, the number zero is traditionally written as the single decimal digit 0 instead.
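The arithmetic behind this claim can be sketched in Python. The functions parse_digits and to_digits are hypothetical helpers written for this illustration: the first evaluates a digit string by the usual positional rule, and the second produces the representation without leading zeros. Neither checks that each digit is smaller than the base. Python's built-in int() follows the traditional convention instead and rejects the empty string.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def parse_digits(digits: str, base: int = 10) -> int:
    """Positional evaluation: start from 0 and fold in one digit at a time."""
    value = 0
    for ch in digits:
        value = value * base + DIGITS.index(ch)
    return value

def to_digits(value: int, base: int = 10) -> str:
    """Representation of a non-negative integer with no leading zeros."""
    out = ""
    while value > 0:
        value, remainder = divmod(value, base)
        out = DIGITS[remainder] + out
    return out

print(parse_digits(""))       # no digits to fold in, so the value stays 0
print(repr(to_digits(0)))     # zero needs no digits under this rule
print(to_digits(255, 16))     # an ordinary case for comparison
```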

A zero-filled memory area, interpreted as a null-terminated string, is an empty string.
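This can be observed from Python with the standard ctypes module, whose create_string_buffer function allocates a buffer initialised to NUL bytes. The buffer size of 16 is an arbitrary choice for the sketch.

```python
import ctypes

# A 16-byte buffer; ctypes initialises it to NUL bytes.
buf = ctypes.create_string_buffer(16)

zero_filled = buf.raw == bytes(16)   # raw exposes all 16 bytes of the buffer
as_c_string = buf.value              # value reads up to the first NUL byte

print(zero_filled)
print(as_c_string)
```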

An empty line of text represents the empty string. Empty lines arise from two consecutive end-of-line (EOL) markers, which are common in text files, and are sometimes used in text processing to separate paragraphs, e.g. in MediaWiki.
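A brief Python sketch shows how empty strings appear when text is split at end-of-line markers, and how they can be used as paragraph separators. The sample text and the paragraphs helper are hypothetical, and "\n" is assumed as the EOL marker.

```python
text = "first paragraph\nstill first\n\nsecond paragraph\n"

# Splitting at EOL markers: consecutive markers produce an empty string.
lines = text.split("\n")
print(lines)

def paragraphs(source: str) -> list:
    """Group lines into paragraphs, treating empty lines as separators."""
    result, current = [], []
    for line in source.split("\n"):
        if line == "":
            if current:                      # an empty line closes a paragraph
                result.append(" ".join(current))
                current = []
        else:
            current.append(line)
    if current:                              # flush a final unterminated paragraph
        result.append(" ".join(current))
    return result

print(paragraphs(text))
```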

References

  1. Kernighan and Ritchie, C, p. 38
  2. John Corcoran, William Frank, and Michael Maloney, String theory, Journal of Symbolic Logic, vol. 39 (1974), pp. 625–637
  3. CSE1002 Lecture Notes - Lexicographic
This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.