Mildly context-sensitive language

From Wikipedia, the free encyclopedia

In formal language theory, a class of languages is mildly context-sensitive if it contains all context-free languages, can describe cross-serial dependencies, contains only languages that can be parsed in polynomial time, and contains only languages of constant growth.[1] The concept was introduced by Aravind Joshi in 1985 as a characterization of the type of grammar formalism needed for dealing with natural languages. Mild context-sensitivity occupies a middle ground between context-freeness, which is too limited to describe all phenomena present in natural languages, and full context-sensitivity, which is too general to reveal anything about the class of natural languages in particular. A variety of formalisms are known to generate mildly context-sensitive language classes.

Definition

Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if

  1. it contains all context-free languages,
  2. it admits limited cross-serial dependencies,
  3. all the languages are parsable in polynomial time, and
  4. all the languages have constant growth; this means that when the strings of a language are ordered by length, the gap between consecutive lengths is bounded by a constant, so string lengths grow linearly rather than supralinearly. This is often guaranteed by proving a pumping lemma for the set of languages in question.
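
The constant-growth condition can be illustrated on finite samples of string lengths. The following is a minimal sketch, not a proof technique: the helper name `length_gaps` and the two sample languages are assumptions chosen for illustration. It compares the gaps between consecutive string lengths in {a^n b^n c^n | n ≥ 0}, where they stay fixed, with those in {a^(2^n) | n ≥ 0}, where they keep growing.

```python
def length_gaps(lengths):
    """Return the gaps between consecutive distinct string lengths."""
    distinct = sorted(set(lengths))
    return [b - a for a, b in zip(distinct, distinct[1:])]


# Lengths of the strings in {a^n b^n c^n | n >= 0} for n = 0..9: 0, 3, 6, ...
linear = [3 * n for n in range(10)]

# Lengths of the strings in {a^(2^n) | n >= 0} for n = 0..9: 1, 2, 4, 8, ...
exponential = [2 ** n for n in range(10)]

print(max(length_gaps(linear)))       # 3: the gap never exceeds a constant
print(max(length_gaps(exponential)))  # 256: the gap doubles at every step
```

A bounded gap on a finite sample is only suggestive; establishing constant growth for a whole language class requires an argument such as a pumping lemma.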

Formalisms

The notion of mild context-sensitivity does not designate a single class of languages, but applies to any language class meeting the criteria in the definition. Two such classes are notable, each being generated by several equivalent formalisms. The smaller of the two classes is a proper subset of the larger class.[2]

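The smaller language class is generated by the following formalisms:

  • Tree-adjoining grammars (TAG)
  • Linear indexed grammars (LIG)
  • Combinatory categorial grammars (CCG)
  • Head grammars (HG)
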
The larger language class is generated by the following formalisms:

  • Linear context-free rewriting systems (LCFRS), developed by D. J. Weir
  • Minimalist grammars (MG) of Edward P. Stabler, Alain Lecomte, Christian Retoré, and others
  • Multicomponent tree-adjoining grammars (MCTAG), defined by Joshi et al. (1991)[3]
  • Multiple context-free grammars (MCFG), developed by Kasami et al. (1988)[4]
  • Simple range concatenation grammars (SRCG), developed by Boullier (2000)

The larger class is a subset of the class of languages generated by thread automata, but whether this inclusion is proper is not known.[5]

Control Language Hierarchy

A more precisely defined hierarchy of language classes corresponding to mild context-sensitivity was given by David J. Weir. Building on the work of Nabil A. Khabbaz, Weir's Control Language Hierarchy is a containment hierarchy of countably many language classes in which Level-1 is the class of context-free languages and Level-2 is the class of languages generated by tree-adjoining grammars and the three other formalisms equivalent to them.

Some properties of the Level-k languages in the hierarchy are:

  • Level-k languages are properly contained in the Level-(k + 1) language class
  • Level-k languages can be parsed in $O(n^{3 \cdot 2^{k-1}})$ time
  • Level-k contains the language $\{a_1^n \dotsm a_{2^k}^n \mid n \geq 0\}$, but not $\{a_1^n \dotsm a_{2^{k+1}}^n \mid n \geq 0\}$
  • Level-k contains the language $\{w^{2^{k-1}} \mid w \in \{a,b\}^*\}$, but not $\{w^{2^{k-1}+1} \mid w \in \{a,b\}^*\}$
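
To make the properties above concrete, the sketch below computes the parsing-time exponent for a given level and tests membership in the two families of witness languages. The function names `parse_exponent`, `in_counting_language`, and `in_copy_language` are hypothetical, and the membership tests are plain string checks rather than grammars or parsers of the hierarchy itself.

```python
def parse_exponent(k):
    """Exponent e in the O(n^e) parsing bound for Level-k, e = 3 * 2^(k-1)."""
    return 3 * 2 ** (k - 1)


def in_counting_language(s, symbols):
    """Check whether s has the form a_1^n a_2^n ... a_m^n for some n >= 0,
    where symbols lists the m distinct one-character symbols a_1, ..., a_m."""
    m = len(symbols)
    if len(s) % m != 0:
        return False
    n = len(s) // m
    return s == "".join(sym * n for sym in symbols)


def in_copy_language(s, copies, alphabet="ab"):
    """Check whether s is w repeated `copies` times for some w over alphabet."""
    if len(s) % copies != 0 or any(ch not in alphabet for ch in s):
        return False
    w = s[: len(s) // copies]
    return s == w * copies


# Level-1 (context-free) and Level-2 (tree-adjoining) parsing exponents.
print(parse_exponent(1), parse_exponent(2))  # 3 6

# Level-2 witnesses: a^n b^n c^n d^n (2^2 blocks) and ww (2^(2-1) copies).
print(in_counting_language("aabbccdd", "abcd"))  # True
print(in_copy_language("abab", 2))               # True
print(in_copy_language("abaaba", 3))             # False
```

For k = 1 and k = 2 the exponents are 3 and 6, and the Level-2 witnesses are the four-block counting language and the copy language ww.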

These properties correspond well (at least for small k > 1) to the conditions that Joshi imposed on mildly context-sensitive languages; as k gets bigger, the language class becomes, in a sense, less mildly context-sensitive.

Notes

  1. Kallmeyer 2010, p. 23.
  2. Kallmeyer 2010, pp. 215-216.
  3. Joshi et al. 1991.
  4. Kasami, T.; Seki, M.; Fuji, H. (1988). "Generalized context-free grammars, multiple context-free grammars, and head grammars". Technical Report, Department of Information and Computer Science. Osaka, Japan: Osaka University.
  5. Kallmeyer 2010, p. 216.

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.