Scale-invariant feature transform
From Wikipedia, the free encyclopedia
Scale-invariant feature transform (or SIFT) is a computer vision algorithm for extracting distinctive features from images, used in tasks such as matching different views of an object or scene (e.g. for stereo vision) and object recognition. The features are invariant to image scale and rotation, and partially invariant (i.e. robust) to changes in viewpoint and illumination. The name scale-invariant feature transform was chosen because the algorithm transforms image data into scale-invariant coordinates relative to local features. Other scale-invariant image descriptors also exist in the computer vision literature.
The algorithm was devised by David Lowe (2004) at the University of British Columbia, who holds a US patent on it.
First, the original image is progressively Gaussian blurred (see scale-space implementation) with σ in a band from 1 to 2, resulting in a series of Gaussian-blurred images (a scale space produced by cascade filtering). Adjacent images in this series (by σ) are then subtracted to produce a new series of difference-of-Gaussian images, which approximate the Laplacian of the Gaussian.
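This cascade of blurring and subtraction can be sketched in a few lines of NumPy. This is a minimal illustration, not Lowe's implementation: the function names, the separable-convolution details, and the particular σ values between 1 and 2 are our own choices.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at ~3 sigma, normalized to sum to 1."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: two 1-D convolutions with reflected borders."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    conv = lambda v: np.convolve(np.pad(v, pad, mode='reflect'), k, mode='valid')
    tmp = np.apply_along_axis(conv, 1, img)   # blur along rows
    return np.apply_along_axis(conv, 0, tmp)  # then along columns

def dog_pyramid(img, sigmas=(1.0, 1.26, 1.59, 2.0)):
    """Gaussian-blurred series and the differences of adjacent images (DoG)."""
    blurred = [blur(img, s) for s in sigmas]
    dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
    return blurred, dogs
```

Each difference image approximates a Laplacian-of-Gaussian response at its scale, which is what the extrema detection below operates on.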
The major steps in the computation of the image features are
- Scale-space extrema detection - a specific type of blob detection in which each pixel in a difference-of-Gaussian image is compared to its 8 neighbors at the same scale and to the 9 corresponding pixels (the corresponding pixel plus its 8 neighbors) in each of the two adjacent images in the series.
- Keypoint localization - keypoints are chosen from the extrema in scale space.
- Orientation assignment - each keypoint is assigned a dominant orientation from a histogram of local gradient directions, making the descriptor invariant to rotation.
- Keypoint descriptor - histograms of gradient directions are computed (using bilinear interpolation) in a 16x16 window around the keypoint and concatenated into a 128-dimensional vector.
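The 26-neighbor comparison of the first step can be sketched as follows. This is a minimal NumPy illustration of the comparison itself, assuming the DoG images from the previous stage; the brute-force scan and the function names are our own, not Lowe's implementation.

```python
import numpy as np

def is_extremum(dogs, s, y, x):
    """True if pixel (y, x) of DoG level s is strictly larger or strictly
    smaller than all 26 neighbors: 8 in its own level plus 9 each in the
    levels directly above and below."""
    cube = np.stack([dogs[s - 1][y-1:y+2, x-1:x+2],
                     dogs[s    ][y-1:y+2, x-1:x+2],
                     dogs[s + 1][y-1:y+2, x-1:x+2]])
    centre = dogs[s][y, x]
    others = np.delete(cube.ravel(), 13)  # index 13 is the centre pixel itself
    return bool((centre > others).all() or (centre < others).all())

def detect_extrema(dogs):
    """Scan interior pixels of the interior DoG levels for scale-space extrema."""
    h, w = dogs[0].shape
    points = []
    for s in range(1, len(dogs) - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                if is_extremum(dogs, s, y, x):
                    points.append((s, y, x))
    return points
```

A real implementation would follow this with the localization and stability checks of the second step; this sketch stops at the raw extrema.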
The first two stages in the computation of image features are similar to the approach for blob detection with automatic scale selection proposed by Lindeberg (1998). For the application of SIFT keypoints in matching and object recognition, Lowe used a nearest neighbor algorithm, followed by a Hough transform for object recognition (as described in Lowe, 2004).
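Nearest-neighbor matching of the 128-dimensional descriptors can be sketched as below. The distance-ratio criterion (rejecting a match whose nearest neighbor is not clearly closer than the second-nearest) follows Lowe (2004); the function name, the 0.8 threshold, and the brute-force search are illustrative choices, not a reference implementation.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each row of desc_a (n x 128) to its Euclidean nearest neighbor
    among the rows of desc_b (m x 128, m >= 2). A match is kept only if the
    nearest distance is below `ratio` times the second-nearest distance.
    Returns a list of (index_in_a, index_in_b) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

In Lowe's pipeline the surviving matches are then clustered with a Hough transform to vote for consistent object poses; that stage is omitted here.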
The feature representations found by SIFT are thought to be analogous to those of neurons in inferior temporal cortex, a region used for object recognition in primate vision.
The invariance of SIFT to image transformations such as rotation, scale changes (zoom), and out-of-plane rotations made it the most widely used detection/description scheme, and many attempts have been made to outperform it, with varying success. Besides a detailed comparative study of different state-of-the-art matching schemes, Krystian Mikolajczyk et al. presented a SIFT variant called GLOH (Gradient Location and Orientation Histogram). It is a SIFT-like descriptor that considers more spatial regions for the histograms; the higher dimensionality of the descriptor is reduced to 64 through principal component analysis (PCA). A different matching scheme called SURF (Speeded Up Robust Features) was presented by Herbert Bay et al. The standard version of SURF is faster than SIFT and has proved more robust against various image transformations; it is based on sums of 2D Haar wavelet responses and makes efficient use of integral images. Despite this competition, SIFT remains the most widely used matching scheme, even though its patent somewhat limits its application.
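The integral image (summed-area table) that makes SURF fast can be illustrated briefly: after one pass over the image, the sum over any axis-aligned rectangle costs four table lookups, which is what makes box-filter and Haar-wavelet responses cheap at every scale. This sketch of the general technique uses our own function names, not SURF's actual implementation.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = img[:y, :x].sum(),
    so box sums need no boundary checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) via four lookups in the table."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

The cost of `box_sum` is independent of the rectangle's size, so enlarging the filter (i.e. changing scale) costs nothing extra, in contrast to convolving with ever-larger Gaussian kernels.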
See also
- scale space
- blob detection
- ART-SIFT
- SURF
External links
- libsift: C# implementation of SIFT
- autopano-SIFT: a tool for generating panoramas from multiple images, using SIFT to match features between images
- MATLAB/C implementation of SIFT
- more detailed explanation of the keypoint extraction by the SIFT algorithm
- U.S. Patent 6,711,293: Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image - David Lowe's patent for the SIFT algorithm
- Website of SURF: Speeded Up Robust Features
References
- Lowe, D. G., "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60, 2, pp. 91–110, 2004.
- Lindeberg, Tony, "Feature detection with automatic scale selection", International Journal of Computer Vision, 30, 2, pp. 77–116, 1998. (Original reference on scale-space extrema detection and the scale-invariant properties it leads to, including the use of scale-adapted Laplacian and determinant-of-the-Hessian operators for blob detection.)
- Krystian Mikolajczyk and Cordelia Schmid, "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 10, pp. 1615–1630, 2005.
- Herbert Bay, Tinne Tuytelaars and Luc Van Gool, "SURF: Speeded Up Robust Features", Proceedings of the 9th European Conference on Computer Vision, Springer LNCS volume 3951, part 1, pp. 404–417, 2006.