Talk:Information gain in decision trees

Drawbacks: if we don't want the credit card number to show up in the decision tree, we would simply not include it among the input attributes. So I don't think this is a good example. Nulli 08:33, 13 March 2006 (UTC)

I don't see why a credit card number would be used to "describe" a customer in the first place. It is useful for identifying customers but of no use in describing them. If we were building a decision tree from a long list of customers and their attributes, including credit card numbers would be analogous to including the list ID number, which would obviously be pointless. I think this may be what the author of that example is trying to say, i.e. that care must be taken over which attributes to include.
But I agree it is not a good example at all. There are also several other disadvantages of decision trees which are not included. I'll try to improve the article. Canderra 20:54, 21 May 2006 (UTC)
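For anyone following this thread, the effect being discussed can be reproduced numerically. The sketch below is only an illustration with made-up data: the customer records, the attribute names (`card_number`, `income`, `buys`) and the helper functions are all hypothetical and are not taken from the article. It shows that an attribute with a unique value per row splits the data into single-row subsets, so its information gain equals the full entropy of the target, which is the largest value any attribute can reach.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


# Hypothetical toy data: card_number is unique per customer, income is not.
customers = [
    {"card_number": "0001", "income": "high", "buys": "yes"},
    {"card_number": "0002", "income": "high", "buys": "yes"},
    {"card_number": "0003", "income": "low", "buys": "no"},
    {"card_number": "0004", "income": "low", "buys": "yes"},
]

target_entropy = entropy([r["buys"] for r in customers])
gain_card = information_gain(customers, "card_number", "buys")
gain_income = information_gain(customers, "income", "buys")

print(f"H(buys)           = {target_entropy:.4f}")
print(f"gain(card_number) = {gain_card:.4f}")
print(f"gain(income)      = {gain_income:.4f}")
```

On this toy data the identifier-like attribute scores higher than `income`, even though it says nothing about customers who are not already in the list.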

I'm confused about this. Information gain and relative entropy/KL divergence are not the same thing, assuming the common version of information gain used in decision trees. That information gain is the mutual information, which is a special case of KL divergence. Both this page and the KL divergence page appear to make this mistake. Is there a reason for this, or should I fix it? nparikh 21:57, 21 October 2006 (UTC)

Historically, the term "information gain" was introduced by Rényi as a more intuitive synonym for KL divergence. Information gain can be used in connection with any conditioning step that causes you to move from a distribution Q to a better distribution P. If the conditioning happens to be based on learning the value of a particular variable, then, as you say, the information gain (averaged over the values of that variable) is equal to the mutual information. But the term "information gain" is not restricted to this case. Jheald 10:14, 23 October 2006 (UTC)
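The relationship discussed in this thread can be checked numerically. The following is a minimal sketch on made-up data: the `pairs` sample and all function names are hypothetical. It computes three quantities for an attribute X and a class Y: the entropy reduction H(Y) - H(Y|X) used in decision trees, the mutual information I(X;Y), and the KL divergence from the prior P(Y) to the conditioned distribution P(Y|X=x), averaged over x. Under the usual definitions the three should agree up to floating-point error.

```python
from collections import Counter
from math import log2


def distribution(values):
    """Empirical probability distribution of a list of hashable values."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q is nonzero wherever p is nonzero."""
    return sum(pv * log2(pv / q[v]) for v, pv in p.items() if pv > 0)


# Hypothetical (attribute value, class label) observations.
pairs = [
    ("sunny", "no"), ("sunny", "no"), ("sunny", "yes"),
    ("rain", "yes"), ("rain", "yes"), ("rain", "no"),
    ("overcast", "yes"), ("overcast", "yes"),
]

p_x = distribution([x for x, _ in pairs])
p_y = distribution([y for _, y in pairs])
p_xy = distribution(pairs)
# Conditional distributions P(Y | X = x), one per attribute value.
p_y_given_x = {x: distribution([y for xi, y in pairs if xi == x]) for x in p_x}

# Decision-tree information gain: H(Y) - H(Y | X).
entropy_reduction = entropy(p_y) - sum(p_x[x] * entropy(p_y_given_x[x]) for x in p_x)

# Mutual information I(X; Y) computed from the joint distribution.
mutual_information = sum(
    p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items()
)

# KL divergence from the prior P(Y) to each P(Y | X = x), averaged over x.
expected_kl = sum(p_x[x] * kl_divergence(p_y_given_x[x], p_y) for x in p_x)

print(entropy_reduction, mutual_information, expected_kl)
```

For a single observed value x, `kl_divergence(p_y_given_x[x], p_y)` is the information gain in the broader sense of moving from one distribution to another; only its average over x coincides with the mutual information.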

Definition

\{x\in Ex \wedge value(x,a)=v\} and \{x\in Ex|value(x,a)=v\} describe the same set, don't they? If so, they should be written identically; otherwise it might confuse people. 84.57.82.107 08:49, 5 April 2007 (UTC)