In machine learning, multi-label classification is a variant of the classification problem where multiple target labels may be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into precisely one of more than two classes.
There are two main methods for tackling the multi-label classification problem[1]: problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into a set of binary classification problems, which can then be handled with standard single-label classifiers. Algorithm adaptation methods adapt existing single-label algorithms to directly perform multi-label classification.
Several problem transformation methods exist for multi-label classification. A common one is binary relevance (BR), in which one independent binary classifier is trained per label. Another is the label combination (LC) transformation, also known as label powerset (LP), which treats every label combination observed in the training set as a single class and trains one multi-class classifier over these combined classes. Other transformation methods include RAkEL[2] and classifier chains (CC)[3], in which the binary classifiers are linked in a chain and each receives the predictions of the preceding classifiers as additional features. Algorithm adaptation methods have also been developed, such as ML-kNN[4], a multi-label variant of the k-nearest neighbors lazy classifier.
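As a minimal sketch of how two of these transformations can be realized, the example below uses scikit-learn (an assumption; the article itself only mentions Mulan and Meka) to fit binary relevance via MultiOutputClassifier and a classifier chain via ClassifierChain on synthetic multi-label data. It is an illustration of the transformations, not a reference implementation of RAkEL or the cited methods.

```python
# Sketch: binary relevance and classifier chains with scikit-learn (assumed available).
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic multi-label data: each row of Y is a 0/1 indicator vector of labels.
X, Y = make_multilabel_classification(n_samples=500, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Binary relevance: one independent binary classifier per label.
br = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)

# Classifier chain: each classifier also receives the previous labels as features,
# allowing correlations between labels to be exploited.
cc = ClassifierChain(LogisticRegression(max_iter=1000), order="random",
                     random_state=0).fit(X_train, Y_train)

Y_pred_br = br.predict(X_test)   # indicator matrix of predicted label sets
Y_pred_cc = cc.predict(X_test)
```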
Evaluation metrics for multi-label classification differ from those used in multi-class (or binary) classification, because a prediction may be partially correct: some of an instance's labels can be predicted correctly while others are missed. Commonly used metrics include the Hamming loss, the exact-match ratio (subset accuracy), precision, recall and the F1 score computed per label and then micro- or macro-averaged, and the example-based Jaccard index.
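The following sketch shows how these metrics can be computed with scikit-learn (again an assumption, not a tool named by the article), given ground-truth and predicted label sets represented as 0/1 indicator matrices; the small example arrays are purely illustrative.

```python
# Sketch: common multi-label metrics on 0/1 indicator matrices of shape (n_samples, n_labels).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss, jaccard_score

Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

print(hamming_loss(Y_true, Y_pred))                       # fraction of misclassified labels
print(accuracy_score(Y_true, Y_pred))                     # exact-match ratio (subset accuracy)
print(f1_score(Y_true, Y_pred, average="micro"))          # micro-averaged F1 over all labels
print(f1_score(Y_true, Y_pred, average="macro"))          # macro-averaged F1 (mean per-label F1)
print(jaccard_score(Y_true, Y_pred, average="samples"))   # example-based Jaccard index
```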
Implementations of multi-label algorithms are available in the Mulan and Meka software packages, both based on Weka.
A list of commonly used multi-label datasets is available at the Mulan website.