Scale space

From Wikipedia, the free encyclopedia

Scale space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities. It is a formal theory for handling image structures at different scales in such a way that fine-scale features can be successively suppressed and a scale parameter t can be associated with each level in the scale-space representation.

The notion of scale-space is general and applies in arbitrary dimensions. For simplicity of presentation, however, we here describe the most commonly used case with two-dimensional images. For a given image f(x,y), its linear scale-space representation is a family of derived signals L(x,y,t) defined by convolution of f(x,y) with the Gaussian kernel

g(x, y, t) = \frac{1}{2\pi t}\, e^{-(x^2+y^2)/(2t)},

such that

L(x, y, t) = g(x, y, t) * f(x, y),

where t = σ² is the variance of the Gaussian. Equivalently, the scale-space family can be generated from the solution of the heat equation,

\partial_t L = \frac{1}{2} \nabla^2 L,

with initial condition L(x,y,0) = f(x,y). Several derivations have been presented showing that this is the canonical way to generate a linear scale-space, based on the essential requirement that new structures must not be created when going from a fine scale to any coarser scale.[1][2][3][4][5][6] Conditions, referred to as scale-space axioms, that have been used for deriving the uniqueness of the Gaussian kernel include linearity, shift invariance, semi-group structure, non-enhancement of local extrema, scale invariance and rotational invariance.
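The definition above can be illustrated with a short sketch (assuming NumPy and SciPy are available; the function name `scale_space` is introduced here purely for illustration). Since t = σ², a scale-space level is obtained by Gaussian smoothing with standard deviation σ = √t:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(f, t):
    """Linear scale-space level L(., ., t): convolution of f with a
    Gaussian of variance t, i.e. standard deviation sigma = sqrt(t)."""
    return gaussian_filter(f.astype(float), sigma=np.sqrt(t))

# A toy image with a single bright point: as t increases the point
# spreads out, mirroring diffusion under the heat equation.
f = np.zeros((65, 65))
f[32, 32] = 1.0
L = scale_space(f, t=4.0)
```

Because Gaussian smoothing preserves the total image mass while flattening extrema, the peak stays at the original point but its value decreases with increasing t.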

The motivation for generating a scale-space representation of a given data set originates from the basic fact that real-world objects are composed of different structures at different scales. This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation. For example, the concept of a "tree" is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. For a machine vision system analysing an unknown scene, there is no way to know a priori what scales are appropriate for describing the data. Hence, the only reasonable approach is to consider descriptions at all scales simultaneously.

From the scale-space representation, a large variety of image processing and computer vision operations can be expressed, such as feature detection, feature classification, image segmentation, image matching, motion estimation and computation of shape cues, based on (possibly non-linear) combinations of Gaussian derivatives at multiple scales

L_{x^m y^n}(x, y, t) = \partial_{x^m y^n} \left( g(x, y, t) * f(x, y) \right).
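Since differentiation commutes with convolution, these Gaussian derivatives can be computed by convolving the image with derivatives of the Gaussian kernel. A minimal sketch (assuming SciPy; `gaussian_derivative` is an illustrative name, and the mapping of `order` to axes follows SciPy's row/column convention):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_derivative(f, t, m, n):
    """Scale-space derivative L_{x^m y^n} at scale t, computed by
    convolving f with the corresponding Gaussian derivative kernel."""
    # scipy applies a Gaussian derivative of the given order along each
    # axis; axis 0 is y (rows), axis 1 is x (columns).
    return gaussian_filter(f.astype(float), sigma=np.sqrt(t), order=(n, m))

# A vertical step edge: the edge varies along x only, so L_x responds
# at the edge while L_y is essentially zero everywhere.
f = np.zeros((32, 32))
f[:, 16:] = 1.0
Lx = gaussian_derivative(f, t=2.0, m=1, n=0)
Ly = gaussian_derivative(f, t=2.0, m=0, n=1)
```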

A highly useful property of the scale-space representation is that image representations can be made invariant to scale, in order to handle size variations that arise from objects of different size or from varying distances between the object and the camera. Scale invariance can be achieved by performing scale selection[7][8] based on local maxima (or minima) over scales of normalized derivatives

L_{\xi^m \eta^n}(x, y, t) = t^{(m+n) \gamma/2} L_{x^m y^n}(x, y, t)

where \gamma \in [0,1] is a parameter that is related to the dimensionality of the image feature. Following this approach of gamma-normalization, different types of scale-adaptive and scale-invariant feature detectors can be expressed for tasks such as blob detection, corner detection, ridge detection and edge detection. In particular, the scale levels obtained from automatic scale selection can be used for determining regions of interest for subsequent affine shape adaptation[9] to obtain affine-invariant interest points,[10][11] or for determining scale levels for computing associated image descriptors, such as locally scale-adapted N-jets. More complex operations, such as scale-invariant object recognition, can also be performed in this way, by computing local image descriptors (N-jets or local histograms of gradient directions) at scale-adapted interest points obtained from scale-space maxima of the normalized Laplacian operator (see also scale-invariant feature transform[12]).
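The scale-selection mechanism can be sketched for blob detection with the normalized Laplacian, γ = 1 (a simplified illustration assuming SciPy; the function name and the candidate scale list are chosen here for demonstration). For a bright disc of radius r, the magnitude of t·(L_xx + L_yy) at the disc centre is maximal near t = r²/2, so the selected scale reflects the blob's size:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_laplacian(f, t):
    """gamma-normalized Laplacian t * (L_xx + L_yy) at scale t (gamma = 1)."""
    Lxx = gaussian_filter(f, sigma=np.sqrt(t), order=(0, 2))
    Lyy = gaussian_filter(f, sigma=np.sqrt(t), order=(2, 0))
    return t * (Lxx + Lyy)

# Synthetic blob: a bright disc of radius 8 centred in the image.
y, x = np.mgrid[0:129, 0:129]
r = 8.0
f = ((x - 64) ** 2 + (y - 64) ** 2 <= r ** 2).astype(float)

# Track the response magnitude at the blob centre over candidate scales;
# analytically the maximum lies at t = r^2 / 2 = 32.
scales = [4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
responses = [abs(normalized_laplacian(f, t)[64, 64]) for t in scales]
t_selected = scales[int(np.argmax(responses))]
```

Without the factor t, the response of the Laplacian would decay monotonically with scale and no maximum over scales would exist; this is the role of the γ-normalization above.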

The notion of scale-space representation has also frequently been used to express coarse-to-fine methods, in particular for tasks such as image matching and multi-scale image segmentation. For technical details on implementing scale-space smoothing in practice, see the article on scale-space implementation.

Pyramid representation is a predecessor to scale-space representation, constructed by simultaneously smoothing and subsampling a given signal.[13][14] In this way, computationally highly efficient algorithms can be obtained. In a pyramid, however, it is usually algorithmically harder to relate structures at different scales, due to the discrete nature of the scale levels. In a scale-space representation, the existence of a continuous scale parameter makes it conceptually much easier to express this so-called deep structure. For features defined as zero-crossings of differential invariants, the implicit function theorem directly defines trajectories across scales, and at those scales where bifurcations occur, the local behaviour can be modelled by singularity theory.
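The smooth-and-subsample construction behind pyramid representation can be sketched as follows (a minimal illustration assuming SciPy; the function name and the per-level smoothing amount are choices made here, not a reference implementation). Each level halves the resolution, which is what makes the scale levels discrete:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(f, levels):
    """Build a simple Gaussian pyramid by repeatedly smoothing and
    subsampling by a factor of two in each dimension."""
    pyramid = [f.astype(float)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(smoothed[::2, ::2])  # keep every other pixel
    return pyramid

f = np.random.default_rng(0).random((64, 64))
pyr = gaussian_pyramid(f, levels=4)
```

The computational saving is apparent: each successive level contains a quarter of the pixels of the previous one, but relating a structure at one level to its counterpart at the next requires reasoning across the resolution jump, in contrast to the continuous scale parameter of scale-space.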

There are interesting relations between scale-space representation and biological vision. Neurophysiological studies have shown that there are receptive field profiles in the mammalian retina and visual cortex, which can be well modelled by linear or non-linear scale-space operators.[15][16]

Extensions of linear scale-space theory concern the formulation of non-linear scale-space concepts more committed to specific purposes.[17][18] There are strong relations between scale-space theory and wavelet theory, although these two notions of multi-scale representation have been developed from somewhat different premises. There has also been work on other multi-scale approaches, such as pyramids and a variety of other kernels, that do not satisfy the same requirements as true scale-space representations.


Scale-space axioms

The scale-space representation satisfies a number of special properties that single out the Gaussian kernel as a unique choice of smoothing kernels. These properties, which include linearity, shift invariance, semi-group property, non-creation/non-enhancement of local extrema and scale invariance are listed in the separate article on scale-space axioms.

Multi-scale feature detection

One of the main applications of scale-space representation is to perform multi-scale feature detection in terms of Gaussian derivatives and differential invariants. This topic is treated in more detail in the articles on blob detection, corner detection, ridge detection and edge detection. Gaussian derivative operators are also frequently used as main features for object recognition, see the article on scale-invariant feature transform for a few examples.

Multi-scale segmentation

Another common application of the scale-space methodology is multi-scale image segmentation, where methods often proceed in a coarse-to-fine fashion; see the article on scale-space segmentation for an overview of various approaches.

Non-symmetric Gaussian smoothing

When computing image descriptors subject to perspective transformations, invariance to local affine deformations can be achieved by complementing the regular scale-space concept, based on rotationally symmetric Gaussian kernels, with affine Gaussian kernels whose shapes are determined by the local image structure. See the article on affine shape adaptation for theory and algorithms.

Implementation issues

When implementing scale-space smoothing in practice, a number of different approaches can be taken: continuous or discrete Gaussian smoothing, implementation in the Fourier domain, pyramids based on binomial filters that approximate the Gaussian, or recursive filters. More details are given in a separate article on scale-space implementation.
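Of the approaches above, the binomial-filter approximation is simple to sketch (an illustration in plain NumPy; `binomial_smooth` and the choice of kernel are introduced here for demonstration). Repeated filtering with the kernel [1, 2, 1]/4, which has variance 1/2, approximates Gaussian smoothing: n passes along each axis approximate a Gaussian of variance n/2 per axis:

```python
import numpy as np

def binomial_smooth(f, passes):
    """Approximate Gaussian smoothing by repeated separable filtering
    with the binomial kernel [1, 2, 1] / 4 along both image axes."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    g = f.astype(float)
    for _ in range(passes):
        for axis in (0, 1):
            g = np.apply_along_axis(np.convolve, axis, g, kernel, mode="same")
    return g

# Smoothing a unit impulse: the result is a discrete approximation of
# a Gaussian of variance passes/2 per axis.
f = np.zeros((21, 21))
f[10, 10] = 1.0
g = binomial_smooth(f, passes=4)
```

Since the kernel is integer-based up to the normalization, such filters are attractive on hardware where multiplications are expensive, at the cost of only approximating the true Gaussian.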

References

  1. Witkin, A. P. "Scale-space filtering", Proc. 8th Int. Joint Conf. Artificial Intelligence, Karlsruhe, Germany, pp. 1019–1022, 1983.
  2. Koenderink, Jan "The structure of images", Biological Cybernetics, 50:363–370, 1984.
  3. Lindeberg, Tony, Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, 1994, ISBN 0-7923-9418-6.
  4. Florack, Luc, Image Structure, Kluwer Academic Publishers, 1997.
  5. Sporring, Jon et al. (Eds.), Gaussian Scale-Space Theory, Kluwer Academic Publishers, 1997.
  6. Romeny, Bart ter Haar, Front-End Vision and Multi-Scale Image Analysis, Kluwer Academic Publishers, 2003.
  7. Lindeberg, Tony "Feature detection with automatic scale selection", International Journal of Computer Vision, 30, 2, pp. 77–116, 1998.
  8. Lindeberg, Tony "Edge detection and ridge detection with automatic scale selection", International Journal of Computer Vision, 30, 2, pp. 117–154, 1998.
  9. Lindeberg, T. and Garding, J.: "Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure", Image and Vision Computing, 15, pp. 415–434, 1997.
  10. Baumberg, A.: "Reliable feature matching across widely separated views", Proc. Computer Vision and Pattern Recognition, I:1774–1781, 2000.
  11. Mikolajczyk, K. and Schmid, C.: "Scale and affine invariant interest point detectors", International Journal of Computer Vision, 60:1, pp. 63–86, 2004.
  12. Lowe, D. G., "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60, 2, pp. 91–110, 2004.
  13. Burt, Peter and Adelson, Ted, "The Laplacian Pyramid as a Compact Image Code", IEEE Trans. Communications, 31:4, pp. 532–540, 1983.
  14. Crowley, J. L. and Sanderson, A. C. "Multiple resolution representation and probabilistic matching of 2-D gray-scale shape", IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1), pp. 113–121, 1987.
  15. Young, R. A. "The Gaussian derivative model for spatial vision: Retinal mechanisms", Spatial Vision, 2:273–293, 1987.
  16. DeAngelis, G. C., Ohzawa, I., and Freeman, R. D., "Receptive-field dynamics in the central visual pathways", Trends in Neurosciences, 18:451–458, 1995.
  17. Romeny, Bart ter Haar (Ed.), Geometry-Driven Diffusion in Computer Vision, Kluwer Academic Publishers, 1994.
  18. Weickert, Joachim, Anisotropic Diffusion in Image Processing, Teubner Verlag, Stuttgart, 1998.
