Talk:Latent Dirichlet allocation
From Wikipedia, the free encyclopedia
Is this related to Latent semantic analysis? Thadk 06:26, 3 November 2006 (UTC)
Yes and no. Yes in the sense that it stems from PLSI, which is the probabilistic successor to Latent Semantic Analysis. No, because LDA is a hierarchical Bayesian model, whereas LSA is based on singular value decomposition. Artod 19:08, 17 May 2007 (UTC)
A recent edit asserts that function words can be "filtered out;" given that these are high-probability words, if their probability is low in every topic multinomial, this will lead to a lower overall likelihood for the data. Can someone explain what is meant by "filtered out?" I didn't remove it, as it would probably be best to keep the point, though perhaps expressed more clearly. Ezubaric 01:56, 18 May 2007 (UTC)
I guess that filtering out can be done by examining the words whose variational parameters phi have a flat distribution over topics instead of a spiked one. Not an automated task, though. Please refer to Blei et al.'s paper for the definitions of the variational parameters gamma and phi. Artod 14:25, 4 July 2007 (UTC)
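The flat-versus-spiked check above could be made semi-automatic by scoring each word's topic distribution with normalized entropy: function words spread their mass across topics (entropy near the maximum), while topical words concentrate on a few. A minimal sketch, assuming `phi` is given as per-word distributions over topics and using a hypothetical entropy threshold:

```python
import math

def flag_flat_words(phi, vocab, threshold=0.9):
    """Flag words whose topic distribution is flat (high entropy).

    phi: list of per-word distributions over K topics (rows sum to 1).
    vocab: the word corresponding to each row of phi.
    threshold: hypothetical cutoff on normalized entropy in [0, 1];
               1.0 means a perfectly uniform (flat) distribution.
    """
    flagged = []
    for word, dist in zip(vocab, phi):
        k = len(dist)
        # Shannon entropy, normalized by its maximum log(k)
        entropy = -sum(p * math.log(p) for p in dist if p > 0)
        if entropy / math.log(k) > threshold:
            flagged.append(word)
    return flagged

# toy example: "the" is spread evenly over 3 topics, "genome" is spiked
phi = [[0.34, 0.33, 0.33],   # flat -> likely function word
       [0.98, 0.01, 0.01]]   # spiked -> topical word
print(flag_flat_words(phi, ["the", "genome"]))  # -> ['the']
```

As the comment above notes, picking the threshold is still a manual judgment call, which is why this is a heuristic rather than a fully automated filter.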