Caltech 101

Caltech 101 is a data set of digital images created in September 2003 and compiled by Fei-Fei Li, Marco Andreetto, Marc 'Aurelio Ranzato and Pietro Perona at the California Institute of Technology. It is intended to facilitate Computer Vision research and techniques and is most applicable to techniques involving image recognition classification and categorization. Caltech 101 contains a total of 9,146 images, split between 101 distinct object categories (faces, watches, ants, pianos, etc.) and a background category. Provided with the images are a set of annotations describing the outlines of each image, along with a Matlab script for viewing.

Purpose

Most Computer Vision and Machine Learning algorithms function by training on example inputs. They require a large and varied set of training data to work effectively. For example, the real-time face detection method used by Paul Viola and Michael J. Jones was trained on 4,916 hand-labeled faces.[1]

Cropping, re-sizing and hand-marking points of interest is tedious and time-consuming.

Historically, most data sets used in computer vision research have been tailored to the specific needs of the project being worked on.A large problem in comparing computer vision techniques is the fact that most groups use their own data sets. Each set may have different properties that make reported results from different methods harder to compare directly. For example, differences in image size, image quality, relative location of objects within the images and level of occlusion and clutter present can lead to varying results.[2]

The Caltech 101 data set aims at alleviating many of these common problems.

However, a recent study [3] demonstrates that tests based on uncontrolled natural images (like the Caltech 101 data set) can be seriously misleading, potentially guiding progress in the wrong direction.

Data set

Images

The Caltech 101 data set consists of a total of 9,146 images, split between 101 different object categories, as well as an additional background/clutter category.

Each object category contains between 40 and 800 images. Common and popular categories such as faces tend to have a larger number of images than others.

Each image is about 300x200 pixels. Images of oriented objects such as airplanes and motorcycles were mirrored to be left to right aligned and vertically oriented structures such as buildings were rotated to be off axis.

Annotations

A set of annotations is provided for each image. Each set of annotations contains two pieces of information: the general bounding box in which the object is located and a detailed human-specified outline enclosing the object.

A Matlab script is provided with the annotations. It loads an image and its corresponding annotation file and displays them as a Matlab figure.

Uses

The Caltech 101 data set was used to train and test several computer vision recognition and classification algorithms. The first paper to use Caltech 101 was an incremental Bayesian approach to one shot learning,[4] an attempt to classify an object using only a few examples, by building on prior knowledge of other classes.

The Caltech 101 images, along with the annotations, were used for another one shot learning paper at Caltech. [5]

Other Computer Vision papers that report using the Caltech 101 data set include:

Analysis and comparison

Advantages

Caltech 101 has several advantages over other similar data sets:

Weaknesses

Weaknesses to the Caltech 101 data set[3][14] may be conscious trade-offs, but others are limitations of the data set. Papers that rely solely on Caltech 101 are frequently rejected.

Weaknesses include:

Other data sets

See also

References

  1. P. Viola and M. J. Jones, Robust Real-Time Object Detection, , IJCV 2004
  2. Oertel, C., Colder, B., Colombe, J., High, J., Ingram, M., Sallee, P., Current Challenges in Automating Visual Perception. Proceedings of IEEE Advanced Imagery Pattern Recognition Workshop 2008
  3. 3.0 3.1 3.2 Why is Real-World Visual Object Recognition Hard? Pinto N, Cox DD, DiCarlo JJ PLoS Computational Biology Vol. 4, No. 1, e27 doi:10.1371/journal.pcbi.0040027
  4. L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE. CVPR 2004, Workshop on Generative-Model Based Vision. 2004
  5. L. Fei-Fei, R. Fergus and P. Perona. One-Shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol28(4), 594 - 611, 2006.
  6. The Pyramid Match Kernel:Discriminative Classification with Sets of Image Features. K. Grauman and T. Darrell. International Conference on Computer Vision (ICCV), 2005
  7. Combining Generative Models and Fisher Kernels for Object Class Recognition. Holub, AD. Welling, M. Perona, P. International Conference on Computer Vision (ICCV), 2005
  8. Object Recognition with Features Inspired by Visual Cortex. T. Serre, L. Wolf and T. Poggio. Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), IEEE Computer Society Press, San Diego, June 2005
  9. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. Hao Zhang, Alex Berg, Michael Maire, Jitendra Malik. CVPR, 2006
  10. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. CVPR, 2006
  11. Empirical study of multi-scale filter banks for object categorization, M.J. Mar韓-Jim閚ez, and N. P閞ez de la Blanca. December 2005
  12. Multiclass Object Recognition with Sparse, Localized Features, Jim Mutch and David G. Lowe. , pg. 11-18, CVPR 2006, IEEE Computer Society Press, New York, June 2006
  13. Using Dependent Regions or Object Categorization in a Generative Framework, G. Wang, Y. Zhang, and L. Fei-Fei. IEEE Comp. Vis. Patt. Recog. 2006
  14. Dataset Issues in Object Recognition. J. Ponce, T. L. Berg, M. Everingham, D. A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid, B. C. Russell, A. Torralba, C. K. I. Williams, J. Zhang, and A. Zisserman. Toward Category-Level Object Recognition, Springer-Verlag Lecture Notes in Computer Science. J. Ponce, M. Hebert, C. Schmid, and A. Zisserman (eds.), 2006
  15. F. Tanner, B. Colder, C. Pullen, D. Heagy, C. Oertel, & P. Sallee, Overhead Imagery Research Data Set (OIRDS) – an annotated data library and tools to aid in the development of computer vision algorithms, June 2009, <http://sourceforge.net/apps/mediawiki/oirds/index.php?title=Documentation> (28 December 2009)
  16. L. Ballan, M. Bertini, A. Del Bimbo, A.M. Serain, G. Serra, B.F. Zaccone. Combining Generative and Discriminative Models for Classifying Social Images from 101 Object Categories. Int. Conference on Pattern Recognition (ICPR), 2012.

External links