Katz's back-off model
From Wikipedia, the free encyclopedia
Katz back-off is a generative n-gram language model that estimates the conditional probability of a word given its history in the n-gram. It accomplishes this estimation by "backing-off" to models with smaller histories under certain conditions. By doing so, the model with the most reliable information about a given history is used to provide the better results.
Contents[hide] |
[edit] The method
The equation for Katz's back-off model is deceptively simple: [1]
where,
- C(x) = number of times x appears in training
- wi = ith word in the given context
Essentially, this means that if the n-gram has been seen k or more times in training, the conditional probability of a word given its history is proportional to the maximum likelihood estimate of that n-gram. Otherwise, the conditional probability is equal to the back-off conditional probability of the "(n-1)-gram".
The more difficult part is determining the values for k, d and α.
[edit] Computing the parameters
k is the least important of all the parameters. It is usually chosen to be 0. However, empirical testing may find more optimal values for k.
d is typically the amount of discounting found by Good-Turing estimation. In other words, if Good-Turing estimates C as C * , then
To compute α, it is useful to first define a quantity β, which is the left-over probability mass for the (n-1)-gram:
Then the back-off weight, α, is computed as follows:
[edit] Discussion
This model generally works well in practice, but fails in some circumstances. For example, suppose that the bigram "a b" the unigram "c" are very common, but the trigram "a b c" is rarely seen (less than k times). Instead of assigning a more appropriate value of 0, the method will back off to the bigram and estimate P(c | b).[2]
[edit] References
- ^ Katz, S. M. (1987). Estimation of probabilities from sparse data for the language model component of a speech recogniser. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3), 400–401.
- ^ Manning and Shütze, Foundations of Statistical Natural Language Programming, MIT Press (1999), ISBN 978-0262133609.