Talk:Cross entropy
This article uses the notation KL(p, q) and also D_KL(p || m) when talking about Kullback–Leibler divergence. Are these notations two ways of expressing the same idea? If so, the article may want to indicate this equivalence.
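For reference, a short sketch of the standard definitions, assuming both notations are meant to denote the Kullback–Leibler divergence of the second argument from the first:

D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}, \qquad H(p, q) = -\sum_x p(x) \log q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q).

Under that reading, KL(p, q) and D_KL(p || q) would be two ways of writing the same quantity, and the cross-entropy H(p, q) differs from it only by the entropy term H(p).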
The log-likelihood of the training data under a multinomial model is, up to sign, the cross-entropy of the data (Elements of Statistical Learning, page 32):
L(θ) = Σ_k I(G = k) log Pr(G = k | X = x), where the sum runs over all classes k.
I guess I(G = k) plays the role of p and Pr(G = k | X = x) the role of q here.
Could somebody in the know please add this? Thanks!
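As a sanity check on the identification above, here is a minimal numeric sketch (Python/NumPy, with made-up class probabilities); it only illustrates that the one-hot log-likelihood is the negative of the cross-entropy, which is why maximizing one is the same as minimizing the other:

import numpy as np

# Hypothetical single observation whose true class is the second of three classes.
p = np.array([0.0, 1.0, 0.0])   # I(G = k): one-hot empirical distribution
q = np.array([0.2, 0.7, 0.1])   # Pr(G = k | X = x): model probabilities (sum to 1)

log_likelihood = np.sum(p * np.log(q))    # sum_k I(G = k) log Pr(G = k | X = x)
cross_entropy = -np.sum(p * np.log(q))    # H(p, q) = -sum_k p_k log q_k

print(log_likelihood)  # about -0.3567 (= log 0.7)
print(cross_entropy)   # about  0.3567, i.e. exactly the negated log-likelihood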