Formal language

From Wikipedia, the free encyclopedia

In mathematics, logic, and computer science, a formal language $\boldsymbol{L}$ consists of a set $\boldsymbol{F}$ of finite-length sequences of elements drawn from a specified finite set $\boldsymbol{A}$ of symbols. Mathematically, it is an unordered pair $\boldsymbol{L}=\{\boldsymbol{A},\boldsymbol{F}\}.$ In applications, a formal language is most commonly viewed as being analogous to

a collection of words

a collection of sentences

In the first case, the set $\boldsymbol{A}$ is called the alphabet of $\boldsymbol{L}$ , and the elements of $\boldsymbol{F}$ are called words. In the second case, the set $\boldsymbol{A}$ is called the lexicon or the vocabulary of $\boldsymbol{L}$ , while the elements of $\boldsymbol{F}$ are then called sentences. The mathematical theory that treats formal languages in general is known as formal language theory.

Although it is common to hear the term formal language used in other contexts to refer to a mode of expression that is more disciplined or more precise than everyday speech, the sense of formal language discussed in this article is restricted to its meaning in formal language theory.

As an example, an alphabet might be $\left \{ a , b \right \}$ , and a string over that alphabet might be $ababba\,$ .

A typical language over that alphabet, containing that string, would be the set of all strings which contain the same number of symbols $a\,$ and $b\,$ .
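
The membership condition for this example language can be checked directly. The following is a minimal sketch in Python; the function name `in_equal_ab` is made up for illustration and does not come from the article.

```python
def in_equal_ab(word: str) -> bool:
    """Return True if `word` is a string over {a, b} with as many a's as b's.

    The name and interface are illustrative assumptions.
    """
    # Reject anything that is not a string over the alphabet {a, b}.
    if any(symbol not in "ab" for symbol in word):
        return False
    return word.count("a") == word.count("b")


print(in_equal_ab("ababba"))  # True: three a's and three b's
print(in_equal_ab("aab"))     # False: two a's, one b
print(in_equal_ab(""))        # True: the empty word has zero of each
```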

The empty word (that is, length-zero string) is allowed and is often denoted by $e\,$ , $\epsilon\,$ or $\Lambda\,$ . While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings (because the length of words belonging to it may be unbounded).

A question often asked about formal languages is "how difficult is it to decide whether a given word belongs to a particular language?" This is the domain of computability theory and complexity theory.

1 Examples
2 Specification
3 Operations
4 See also
5 Further reading
6 External links

Examples

Some examples of formal languages:

the set of all words over $\{a, b\}\,$
the set $\left \{ a^{n}\right\}$ , where $n\,$ is a natural number and $a^n\,$ means $a\,$ repeated $n\,$ times
finite languages, such as $\{\{a,b\},\{a, aa, bba\}\}\,$
the set of syntactically correct programs in a given programming language
the set of inputs upon which a certain Turing machine halts.
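
The first two examples in the list above can be sketched in Python. Because the set of all words over $\{a, b\}$ is infinite, the sketch enumerates it lazily, shortest words first. The function names are illustrative assumptions, and the sketch assumes that $0$ counts as a natural number, so that the empty word belongs to $\{a^n\}$.

```python
from itertools import count, islice, product
from typing import Iterator


def all_words(alphabet: str) -> Iterator[str]:
    """Lazily yield every word over `alphabet`, ordered by increasing length."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def is_power_of_a(word: str) -> bool:
    """Membership test for {a^n}, assuming n = 0 is allowed."""
    return all(symbol == "a" for symbol in word)


# The language is infinite, so only a finite prefix of it can be listed.
print(list(islice(all_words("ab"), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

print(is_power_of_a("aaa"))  # True
print(is_power_of_a("aba"))  # False
```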

Specification

A formal language can be specified in a great variety of ways, such as:

Strings produced by some formal grammar (see Chomsky hierarchy);
Strings described or matched by a regular expression;
Strings accepted by some automaton, such as a Turing machine or finite state automaton;
Strings for which a decision procedure (an algorithm that asks a sequence of related YES/NO questions) produces the answer YES.
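
To illustrate that one language can be specified in several of these ways, the sketch below describes the same language three times: with a regular expression, with a finite state automaton, and with a direct decision procedure. The language chosen (strings over $\{a, b\}$ containing an even number of $a$) is an assumption made for the example and is not taken from the article, as are all names in the code. Python's `re` module supports more than classical regular expressions, but the pattern here uses only concatenation, alternation-free grouping, and the star.

```python
import re
from itertools import product

# Specification 1: a regular expression. Each repetition of the group
# consumes exactly two a's, so the total number of a's is even.
EVEN_A = re.compile(r"(?:b*ab*a)*b*")


def accepted_by_regex(word: str) -> bool:
    return EVEN_A.fullmatch(word) is not None


# Specification 2: a deterministic finite automaton with two states that
# track the parity of the a's read so far. "even" is both start and accept.
DELTA = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def accepted_by_dfa(word: str) -> bool:
    # Assumes `word` is over {a, b}; other symbols raise KeyError.
    state = "even"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "even"


# Specification 3: a decision procedure that answers YES or NO directly.
def accepted_by_decider(word: str) -> bool:
    return word.count("a") % 2 == 0


# The three specifications agree on every word of length at most 6.
words = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
agree = all(
    accepted_by_regex(w) == accepted_by_dfa(w) == accepted_by_decider(w)
    for w in words
)
print(agree)  # True
```

Checking agreement on short words is only a sanity test, not a proof that the three specifications define the same language.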

Operations

Several operations can be used to produce new languages from given ones. Suppose $\boldsymbol{L}_{1}$ and $\boldsymbol{L}_{2}$ are languages over some common alphabet.

The concatenation $\boldsymbol{L}_{1}\boldsymbol{L}_{2}\,$ consists of all strings of the form $vw\,$ where $v\,$ is a string from $\boldsymbol{L}_{1}\,$ and $w\,$ is a string from $\boldsymbol{L}_{2}\,$ .
The intersection $\boldsymbol{L}_1 \cap \boldsymbol{L}_2$ of $\boldsymbol{L}_{1}\,$ and $\boldsymbol{L}_{2}\,$ consists of all strings which are contained in $\boldsymbol{L}_{1}\,$ and also in $\boldsymbol{L}_{2}\,$ .
The union $\boldsymbol{L}_1 \cup \boldsymbol{L}_2$ of $\boldsymbol{L}_{1}\,$ and $\boldsymbol{L}_{2}\,$ consists of all strings which are contained in $\boldsymbol{L}_{1}\,$ or in $\boldsymbol{L}_{2}\,$ .
The complement $\complement \boldsymbol{L}_{1}\,$ of the language $\boldsymbol{L}_{1}\,$ consists of all strings over the alphabet which are not contained in $\boldsymbol{L}_{1}\,$ .
The right quotient $\boldsymbol{L}_{1}/\boldsymbol{L}_{2}\,$ of $\boldsymbol{L}_{1}\,$ by $\boldsymbol{L}_{2}\,$ consists of all strings $v\,$ for which there exists a string $w\,$ in $\boldsymbol{L}_{2}\,$ such that $vw\,$ is in $\boldsymbol{L}_{1}$ .
The Kleene star $\boldsymbol{L}_{1}^{*}$ consists of all strings which can be written in the form $w_{1}w_{2}\dots w_{n}\,$ with strings $w_{i}\,$ in $\boldsymbol{L}_{1}\,$ and $n \ge 0$ . This includes the empty string $\epsilon\,$ because $n = 0\,$ is allowed.
The reverse $\boldsymbol{L}_{1}^{R}\,$ contains the reversed versions of all the strings in $\boldsymbol{L}_{1}\,$ .
The shuffle of $\boldsymbol{L}_{1}\,$ and $\boldsymbol{L}_{2}\,$ consists of all strings which can be written in the form $v_{1}w_{1}v_{2}w_{2}\dots v_{n}w_{n}$ where $n \ge 1$ and $v_{1},\dots,v_{n}\,$ are strings such that the concatenation $v_{1}\dots v_{n}$ is in $\boldsymbol{L}_{1}\,$ and $w_{1},\dots,w_{n}$ are strings such that $w_{1}\dots w_{n}$ is in $\boldsymbol{L}_{2}$ .
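
For finite languages, the operations above can be tried out with Python sets of strings. This is a sketch under stated assumptions: all function names are hypothetical, the Kleene star and the complement are infinite in general and are therefore truncated by an explicit bound, and the shuffle is computed as the set of all interleavings of a word from each language, which matches the definition above when the pieces $v_{i}$ and $w_{i}$ are allowed to be empty. Union and intersection are the built-in set operators `|` and `&`.

```python
from itertools import product


def concat(l1: set, l2: set) -> set:
    """All strings vw with v in l1 and w in l2."""
    return {v + w for v in l1 for w in l2}


def right_quotient(l1: set, l2: set) -> set:
    """All strings v such that vw is in l1 for some w in l2."""
    return {u[: len(u) - len(w)] for u in l1 for w in l2 if u.endswith(w)}


def reverse(lang: set) -> set:
    return {w[::-1] for w in lang}


def star_up_to(lang: set, max_factors: int) -> set:
    """Finite part of the Kleene star: products of at most max_factors words."""
    result, layer = {""}, {""}
    for _ in range(max_factors):
        layer = concat(layer, lang)
        result |= layer
    return result


def complement_up_to(lang: set, alphabet: str, max_len: int) -> set:
    """Strings over `alphabet` of length <= max_len that are not in lang."""
    universe = {"".join(p) for n in range(max_len + 1)
                for p in product(alphabet, repeat=n)}
    return universe - lang


def shuffle_words(v: str, w: str) -> set:
    """All interleavings of v and w that keep each word's symbol order."""
    if not v:
        return {w}
    if not w:
        return {v}
    return ({v[0] + s for s in shuffle_words(v[1:], w)}
            | {w[0] + s for s in shuffle_words(v, w[1:])})


def shuffle(l1: set, l2: set) -> set:
    return {s for v in l1 for w in l2 for s in shuffle_words(v, w)}


L1, L2 = {"a", "ab"}, {"b", "ab"}
print(sorted(concat(L1, L2)))            # ['aab', 'ab', 'abab', 'abb']
print(sorted(L1 & L2))                   # ['ab']
print(sorted(L1 | L2))                   # ['a', 'ab', 'b']
print(sorted(right_quotient(L1, L2)))    # ['', 'a']
print(sorted(reverse(L1)))               # ['a', 'ba']
print(sorted(star_up_to({"a"}, 3)))      # ['', 'a', 'aa', 'aaa']
print(sorted(shuffle({"ab"}, {"c"})))    # ['abc', 'acb', 'cab']
```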

See also

Language for languages in general
Syntax for the form of a language in general
Semantics for the meanings in a language
Natural language for languages that are not formal
Computer language for application of formal languages in computing
Programming language for the application of formal languages to program computers

Further reading

Hopcroft, J. & Ullman, J. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X.
Rasiowa, H. & Sikorski, R. (1970). The Mathematics of Metamathematics, 3rd ed. PWN. Chapter 6, Algebra of formalized languages.
Rozenberg, G. & Salomaa, A. (eds.) (1997). Handbook of Formal Languages, Volumes I-III. Springer. ISBN 978-3-540-61486-9.

External links

http://icalp06.dsi.unive.it/ ICALP 2006 33rd International Colloquium on Automata, Languages and Programming.

http://www.cs.auckland.ac.nz/CDMTCS/conferences/dlt/DLTConfSeries.html International Conferences on Developments in Language Theory

Automata theory: formal languages and formal grammars
Chomsky hierarchy	Grammars	Languages	Minimal automaton
Type-0	Unrestricted	Recursively enumerable	Turing machine
n/a	(no common name)	Recursive	Decider
Type-1	Context-sensitive	Context-sensitive	Linear-bounded
Type-2	Context-free	Context-free	Pushdown
Type-3	Regular	Regular	Finite
Each category of languages or grammars is a proper subset of the category directly above it.