Latent class model

From Wikipedia, the free encyclopedia

In statistics, a latent class model (LCM) relates a set of observed discrete multivariate variables to a set of latent variables. It is a type of latent variable model. It is called a latent class model because the latent variable is discrete. A class is characterized by a pattern of conditional probabilities that give the chance that each variable takes on a certain value.

For instance, the variables could be multiple-choice items of a political questionnaire. The data in this case consist of an N-way contingency table with the answers to the items for a number of respondents. In this example, the latent variable refers to political opinion and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen.

Within each latent class, the observed variables are statistically independent. This is the model's central assumption. In the population as a whole, the observed variables are usually statistically dependent. Introducing the latent variable restores independence in the sense that, within classes, the variables are independent (local independence). We then say that the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987).
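Local independence can be illustrated with a small numerical sketch. The two classes, the two binary variables A and B, and all probabilities below are hypothetical values chosen for illustration only. Within each class the joint probability is a plain product, yet the mixture over classes is not the product of its marginals.

```python
# Hypothetical two-class model with two binary observed variables A and B.
class_probs = [0.5, 0.5]            # P(class t)
p_a = [[0.9, 0.1], [0.2, 0.8]]      # p_a[t][i] = P(A = i | class t)
p_b = [[0.8, 0.2], [0.3, 0.7]]      # p_b[t][j] = P(B = j | class t)

# Joint distribution of (A, B): within a class the probability is the
# product p_a * p_b (local independence); the classes are then mixed.
joint = [
    [sum(class_probs[t] * p_a[t][i] * p_b[t][j] for t in range(2))
     for j in range(2)]
    for i in range(2)
]

# Marginal distributions of A and B in the whole population.
marg_a = [sum(joint[i]) for i in range(2)]
marg_b = [sum(joint[i][j] for i in range(2)) for j in range(2)]

# A non-zero gap means A and B are dependent once class is ignored.
dependence_gap = joint[0][0] - marg_a[0] * marg_b[0]
print(joint[0][0], marg_a[0] * marg_b[0], dependence_gap)
```

With these illustrative numbers the gap is clearly non-zero, so the observed association comes entirely from mixing the two classes.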

In one form the latent class model is written as

p_{i_1, i_2, \ldots, i_N} \approx \sum_t^T p_t \, \prod_n^N p^n_{i_n, t},

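where T is the number of latent classes, N is the number of observed variables, and the p_t are the so-called recruitment or unconditional probabilities, which should sum to one. The p^n_{i_n, t} are the marginal or conditional probabilities: the probability that variable n takes the value i_n given membership in class t.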
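The formula can be evaluated directly for every cell of the contingency table. The following is a minimal sketch, not a fitting procedure: the function name lcm_joint, the data layout, and the example probabilities are assumptions made for illustration.

```python
from itertools import product
from math import prod


def lcm_joint(class_probs, cond_probs):
    """Return {(i_1, ..., i_N): probability} under a latent class model.

    class_probs[t]       is p_t, the recruitment probability of class t.
    cond_probs[n][t][i]  is p^n_{i, t}, P(variable n = i | class t).
    """
    n_levels = [len(var[0]) for var in cond_probs]
    table = {}
    for cell in product(*(range(k) for k in n_levels)):
        # Sum over classes of p_t times the product over variables.
        table[cell] = sum(
            p_t * prod(cond_probs[n][t][i] for n, i in enumerate(cell))
            for t, p_t in enumerate(class_probs)
        )
    return table


# Hypothetical example: N = 3 variables (2, 2 and 3 levels), T = 2 classes.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]],
]
table = lcm_joint(class_probs, cond_probs)
print(len(table), sum(table.values()))
```

Because the p_t sum to one and each conditional distribution sums to one, the cell probabilities returned by the sketch sum to one as well.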

For a two-way latent class model the form is

p_{ij} \approx \sum_t^T p_t \, p_{it} \, p_{jt}.

This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization.
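The link to matrix factorization can be seen by writing the two-way model as a product of non-negative factors, P = A diag(p) B^T, where A holds the p_{it} and B holds the p_{jt}. The sketch below uses hypothetical values and an assumed helper name two_way_lcm; it only builds the table from given factors and does not estimate them.

```python
def two_way_lcm(class_probs, row_probs, col_probs):
    """Build P[i][j] = sum_t p_t * p_{it} * p_{jt}.

    row_probs[i][t] is p_{it} and col_probs[j][t] is p_{jt}.
    """
    n_classes = len(class_probs)
    return [
        [sum(class_probs[t] * row[t] * col[t] for t in range(n_classes))
         for col in col_probs]
        for row in row_probs
    ]


# Hypothetical 3 x 3 table generated by T = 2 latent classes.
class_probs = [0.5, 0.5]
row_probs = [[0.6, 0.1], [0.3, 0.2], [0.1, 0.7]]   # each column sums to 1
col_probs = [[0.5, 0.2], [0.4, 0.3], [0.1, 0.5]]   # each column sums to 1
P = two_way_lcm(class_probs, row_probs, col_probs)
for row in P:
    print(row)
```

Every entry of P is non-negative and the table has rank at most T, which is the structure that non-negative matrix factorization looks for in an observed table.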
