Talk:Latent Dirichlet allocation


This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

Is this related to Latent semantic analysis? Thadk 06:26, 3 November 2006 (UTC)

Yes and no. Yes in the sense that it stems from PLSI, which is the probabilistic successor to Latent Semantic Analysis. No, because LDA is a hierarchical Bayesian model, whereas LSA is based on singular value decomposition. Artod 19:08, 17 May 2007 (UTC)

A recent edit asserts that function words can be "filtered out"; given that these are high-probability words, if their probability is low in every topic multinomial, this will lead to a lower overall likelihood for the data. Can someone explain what is meant by "filtered out"? I didn't remove it, as it would probably be best to keep the point, though perhaps expressed more clearly. Ezubaric 01:56, 18 May 2007 (UTC)

I guess that filtering out can be done by examining the words whose variational parameters phi have a flat distribution instead of a spiked one. Not an automated task, though. Please refer to Blei et al.'s paper for what the variational parameters gamma and phi are. Artod 14:25, 4 July 2007 (UTC)
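To make the flat-versus-spiked distinction above concrete, one could score each word's topic distribution by its entropy: a function word spread evenly across topics has near-maximal entropy, while a topical word concentrated on one topic has low entropy. The sketch below is illustrative only; the example distributions and the 0.9 threshold are invented for demonstration and are not taken from Blei et al.

```python
import math

def topic_entropy(phi_w):
    """Shannon entropy (in nats) of one word's topic-assignment distribution."""
    return -sum(p * math.log(p) for p in phi_w if p > 0)

# Hypothetical per-word topic distributions over 4 topics.
phi = {
    "the":  [0.25, 0.25, 0.25, 0.25],  # flat: spread evenly across all topics
    "gene": [0.94, 0.02, 0.02, 0.02],  # spiked: concentrated on a single topic
}

# A uniform distribution over K topics has the maximum entropy, log(K).
max_entropy = math.log(4)

# Flag words whose entropy is close to maximal as function-word candidates.
candidates = [w for w, p in phi.items()
              if topic_entropy(p) > 0.9 * max_entropy]
```

Here `candidates` would contain only "the", matching the intuition that a word with a flat phi distribution carries little topical information. As noted above, choosing the cutoff is a judgment call rather than an automated task.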