Pseudocount

A pseudocount is an amount added to the observed count of an event so that the event's probability in a model of the data, which is known not to be zero, is estimated as small rather than exactly zero.

In any observed data set or sample it is possible, especially with low-probability events or small data sets, that an event which could occur is never observed. Its observed frequency is then 0, which taken at face value implies a probability of 0. This is an oversimplification and is often unhelpful, particularly in probability-based machine learning techniques such as artificial neural networks and hidden Markov models, because a single zero probability makes any product of probabilities that includes it zero as well. Artificially adjusting the probabilities of rare (but not impossible) events so that they are not exactly zero avoids this zero-frequency problem.

The simplest approach is to add 1 to the observed count of every event, including events whose count is zero. This is sometimes called "Laplace's rule" (more formally, Laplace's rule of succession).
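The add-one approach can be sketched in a few lines of Python. This is only an illustration: the function name add_one_probabilities, the outcome labels, and the sample data are invented for the example. It assumes the full set of possible outcomes is known in advance, since an outcome that was never observed cannot be discovered from the data alone.

```python
from collections import Counter

def add_one_probabilities(observations, outcomes):
    """Estimate probabilities after adding a pseudocount of 1 to every outcome."""
    counts = Counter(observations)
    # Denominator: observed counts of the listed outcomes plus one pseudocount each.
    total = sum(counts[o] for o in outcomes) + len(outcomes)
    return {o: (counts[o] + 1) / total for o in outcomes}

# Hypothetical sample in which outcome "C" is possible but never observed.
observed = ["A", "A", "B", "A", "B", "A"]
probs = add_one_probabilities(observed, ["A", "B", "C"])

for outcome, p in probs.items():
    print(outcome, round(p, 3))
```

With these six observations the raw frequency of "C" is 0, while the adjusted estimate is (0 + 1) / (6 + 3) = 1/9. The estimates still sum to 1 because the denominator grows by one for each possible outcome.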

A more complex approach is to estimate the probability of each event from other factors and adjust the counts accordingly.
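The text does not say what the "other factors" are. One possible reading, assumed here purely for illustration, is that a prior estimate of each event's probability is available from some outside source, and that the pseudocount given to each event is proportional to that prior. The function name prior_weighted_probabilities, the strength parameter, and all the numbers below are hypothetical.

```python
def prior_weighted_probabilities(counts, prior, strength=1.0):
    """Blend observed counts with a prior distribution.

    Each outcome receives a pseudocount of strength * prior[outcome].
    Assumes prior covers every possible outcome and sums to 1.
    """
    total = sum(counts.get(o, 0) for o in prior) + strength
    return {o: (counts.get(o, 0) + strength * prior[o]) / total for o in prior}

# Hypothetical data: "C" was never observed, but the assumed prior gives it 20%.
counts = {"A": 4, "B": 2}
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
smoothed = prior_weighted_probabilities(counts, prior, strength=2.0)

for outcome, p in smoothed.items():
    print(outcome, round(p, 3))
```

Here "C" receives a pseudocount of 2.0 × 0.2 = 0.4, giving an estimate of 0.4 / 8 = 0.05. A larger strength pulls the estimates toward the prior, and a smaller one leaves them closer to the raw frequencies. Adding 1 to every count is the special case in which the prior is uniform and the strength equals the number of outcomes.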