Pruning (algorithm)

From Wikipedia, the free encyclopedia

Pruning is a term in mathematics and informatics which describes a method of enumeration, which allows to cut parts of a decision tree. Pruned parts of the tree are no longer considered because the algorithm knows based on already collected data (e.g. through sampling) that these subtrees do not contain the searched object. The decision tree is simplified by removing some decisions.

Contents

[edit] Introduction

Decision trees are built with the available information and they are supporting tools in a decision making process. However, sometimes the given set of data may be irrelevant, erroneous or unnecessary, in which case the decision tree formed may not be correct; this phenomenon is called overfitting. Also, too much information may misguide the decision policy. When this happens, it becomes necessary to remove the nodes from the decision tree. The process of making the decision tree smaller by removing nodes that are not helpful is called Pruning.

[edit] Illustration

Pruning is carried out as a step by step process. In general, it is performed as

    1.     Remove the interior node in the tree, a non-leaf node.
    2.  Evaluate the performance without the pruned part.
    3.  Repeat until no improvement is obtained from pruning
    4.  Retain the best performed tree as the decision tree.

Consider the decision tree as given below,

Sample Decision Tree

When a decision process is related to the selection of a flower, then the branch with the shape is irrelevant and not related to the flower branch. Hence that particular branch can be pruned to improve performance.

Decision Tree Pruning

The above illustration is a simple example of pruning.

[edit] Further Study

Pessimistic Decision tree pruning based on Tree size [1] MDL based decision tree pruning Decision tree pruning using backpropagation neural networks

[edit] See also

[edit] References

  1. ^ "[1]"

[edit] External links

Languages