Talk:Cross entropy


This article uses the notation KL(p, q) and also D_KL(p || m) when talking about the Kullback–Leibler divergence. Are these two ways of expressing the same idea? If so, the article should point out the equivalence.
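
For what it is worth, both notations conventionally denote the same quantity; the standard definition (written out here for reference, not quoted from the article) is

    D_{\mathrm{KL}}(p \,\|\, q) = \mathrm{KL}(p, q) = \sum_x p(x) \log \frac{p(x)}{q(x)},

so if that is what the article means, stating explicitly that the two notations are interchangeable would help.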


The log-likelihood of the training data for a multinomial model is, up to sign, the same as the cross-entropy of the data (Elements of Statistical Learning, page 32).

L(θ) = Σ_k I(G = k) log Pr(G = k | X = x)   (sum over all classes k)

I guess I(G = k) plays the role of p and Pr(G = k | X = x) the role of q here (a small numerical check is sketched below).

Could somebody in the know please add this? Thanks!
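
To make the correspondence above concrete, here is a minimal numerical check in plain Python with numpy (the three classes and the probabilities are made up for illustration, not taken from ESL):

    import numpy as np

    # Hypothetical example: 3 classes, the true class has index 1.
    true_class = 1
    p = np.eye(3)[true_class]        # one-hot indicators I(G = k), the "p" above
    q = np.array([0.2, 0.7, 0.1])    # model probabilities Pr(G = k | X = x), the "q" above

    cross_entropy = -np.sum(p * np.log(q))    # H(p, q) = -sum_k p_k log q_k
    log_likelihood = np.log(q[true_class])    # log Pr(G = true class | X = x)

    print(cross_entropy)     # ~0.3567
    print(-log_likelihood)   # ~0.3567, the same value

So minimizing the cross-entropy H(p, q) over the model parameters is the same as maximizing the log-likelihood.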