Grammar induction

From Wikipedia, the free encyclopedia

This article or section does not adequately cite its references or sources.
Please help improve this article by adding citations to reliable sources. (help, get involved!)
This article has been tagged since January 2007.

This article or section is in need of attention from an expert on the subject.
Please help recruit one or improve this article yourself. See the talk page for details.
Please consider using {{Expert-subject}} to associate this request with a WikiProject

Grammatical Induction (using evolutionary algorithms) is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammars can easily be represented as a tree structure of production rules that can be subjected to evolutionary operators. Algorithms of this sort stem from the genetic programming paradigm pioneered by John Koza. Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF language made trees a more flexible approach.

[edit] Representations

Koza represented Lisp programs as trees. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions of the lisp code. Similar analogues between the tree structured lisp representation and the represenation of grammars as trees, made the application of genetic programming techniques possible for grammar induction.

In the case of Grammar Induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language.

In a tree representation of a grammar, a terminal symbol (e.g. a noun or verb or some other part of speech) of a production rule corresponds to a leaf node of the tree. Its parent nodes corresponds to a non-terminal symbol (e.g. a noun phrase or a verb phrase) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal.

[edit] Other applications

The principle of grammar induction has been applied to other aspects of natural language processing, and have been applied (among many other problems) to morpheme analysis, and even place name derivations.

Retrieved from "http://en.wikipedia.org../../../g/r/a/Grammar_induction.html"

Categories: Articles lacking sources from January 2007 | All articles lacking sources | Pages needing expert attention | Genetic programming | Natural language processing