Boosting methods for object categorization

Given images containing various known objects in the world, a classifier can be learned from them to automatically categorize the objects in future images. Simple classifiers built based on some image feature of the object tend to be weak in categorization performance. Using boosting methods for object categorization, then, is a way to unify the weak classifiers in a special way to boost the overall ability of categorization.

Contents

Problem of object categorization

Object categorization is a typical task of computer vision which involves determining whether or not an image contains some specific category of object. The idea is closely related with recognition, identification, and detection. Appearance based object categorization typically contains feature extraction, learning a classifier, and applying the classifier to new examples. There are many ways to represent a category of objects, e.g. from shape analysis, bag of words models, or local descriptors such as SIFT, etc. Examples of supervised classifiers are Naive Bayes classifier, SVM, mixtures of Gaussians, neural network, etc. However, recent research has shown that object categories and their locations in images can be discovered in an unsupervised manner as well.[1]

Status quo for object categorization

The recognition of object categories in images is a challenging problem in computer vision, especially when the number of categories is large. This is due to high intra class variability and the need for generalization across variations of objects within the same category. Objects within one category may look quite different. Even the same object may appear unalike under different viewpoint, scale, and illumination. Background clutter and partial occlusion add difficulties to recognition as well.[2] Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few, e.g., human face, car, simple objects, etc.[3] Research has been very active on dealing with more categories and enabling incremental additions of new categories, and although the general problem remains unsolved, several multi-category objects detectors (number of categories around 20) for clustered scenes have been developed. One means is by feature sharing and boosting.

Boosting methods in machine learning

Boosting is a general method for improving the accuracy of any given learning algorithm.[4] A typical application of AdaBoost as one of the popular boosting algorithms is fast face detection by P. Viola and M. Jones, Viola–Jones object detection framework. There AdaBoost is used both to select good features (very simple rectangles) and to turn weak learners into a final strong classifier.

Using boosting methods for object categorization

Boosting for binary categorization

We use Adaboost for face detection as an example of binary categorization. The two categories are faces versus background. The general algorithm is as follows:

  1. Form a large set of simple features
  2. Initialize weights for training images
  3. for T rounds
    1. Normalize the weights
    2. For available features from the set, train a classifier using a single feature and evaluate the training error
    3. Choose the classifier with the lowest error
    4. Update the weights of the training images: increase if classified wrongly by this classifier, decrease if correctly
  4. Form the final strong classifier as the linear combination of the T classifiers (coefficient larger if training error is small)

After boosting, a classifier constructed from 200 features could yield a 95% detection rate under a 10^{-5} false positive rate.[5]

Another application of boosting for binary categorization is a system which detects pedestrians using patterns of motion and appearance[6]. This work is the first to combine both motion information and appearance information as features to detect a walking person. It takes a similar approach as the face detection work of Viola and Jones.

Boosting for multi-class categorization

Compared with binary categorization, multi-class categorization looks for common features that can be shared across the categories at the same time. They turn to be more generic edge like features. During learning, the detectors for each category can be trained jointly. Compared with training separately, it generalizes better, needs less training data, and requires less number of features to achieve same performance.

The main flow of the algorithm is similar to the binary case. What is different is that a measure of the joint training error shall be defined in advance. During each iteration the algorithm chooses a classifier of a single feature (features which can be shared by more categories shall be encouraged). This can be done via converting multi-class classification into a binary one (a set of categories versus the rest) [7], or by introducing a penalty error from the categories which do not have the feature of the classifier.[8]

In the paper "Sharing visual features for multiclass and multiview object detection", A. Torralba et al. used GentleBoost for boosting and showed that when training data is limited, learning via sharing features does a much better job than no sharing, given same boosting rounds. Also, for a given performance level, the total number of features required (and therefore the run time cost of the classifier) for the feature sharing detectors, is observed to scale approximately logarithmically with the number of class, i.e., slower than linear growth in the non-sharing case. Similar results are shown in the paper "Incremental learning of object detectors using a visual shape alphabet", yet the authors used AdaBoost for boosting.

References

  1. ^ Sivic, Russell, Efros, Freeman & Zisserman, "Discovering objects and their location in images", ICCV 2005
  2. ^ A. Opelt, A. Pinz, et al., "Generic Object Recognition with Boosting", IEEE Transactions on PAMI 2006
  3. ^ M. Marszalek, "Semantic Hierarchies for Visual Object Recognition", 2007
  4. ^ Y. Freund R. E. Schapire, "A Short Introduction to Boosting", 1999
  5. ^ P. Viola, M. Jones, "Robust Real-time Object Detection", 2001
  6. ^ P. Viola, et al., "Detecting Pedestrians Using Patterns of Motion and Appearance", ICCV 2003
  7. ^ A. Torralba, K. P. Murphy, et al., "Sharing visual features for multiclass and multiview object detection", IEEE Transactions on PAMI 2006
  8. ^ A. Opelt, et al., "Incremental learning of object detectors using a visual shape alphabet", CVPR 2006