Structure tensor

From Wikipedia, the free encyclopedia

Structure tensors (or second-moment matrices) are matrix representations of partial derivatives. In image processing and computer vision, they are typically used to represent gradients, edges or similar information. Through their coherence measure, structure tensors also provide a more powerful description of local patterns than the directional derivative. Their other advantages are detailed in the structure tensor section of this article. This article is written largely from the point of view of image processing applications; however, it should also be useful to readers with a more general mathematical background who want to learn about this representation.

Background

Gradient

Gradient information serves several purposes. It can convey the structure of objects in an image, identify features of interest for recognition or classification directly, or provide the basis of further processing for various computer vision tasks. For example, "edges," marked by regions of high gradient magnitude, are central to the task of identifying defects in circuitry. A sample "edge-detected" image created using the 'Image Processing Toolbox' for MATLAB is shown, where the locations marked in white are points of high gradient magnitude, which can also be described as regions of high pixel contrast.

Directional derivative

When representing gradient information derived from an image, there are several options from which to choose. One such choice is the directional derivative that provides a vector representation. Its magnitude reflects the maximum change in pixel values while the phase is directed along the orientation corresponding to the maximum change. These two components are calculated as per Equations (1) and (2) respectively:

|\nabla I| = \sqrt{I_x^2 + I_y^2} \qquad (1)

\theta = \arctan\left(\frac{I_y}{I_x}\right) \qquad (2)

where Ix denotes the partial derivative of image I along the x-axis and Iy the partial derivative along the y-axis. An example is shown against a horizontal step-edge image with the directional derivative overlaid as a red arrow.
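As an illustration, the magnitude and phase can be computed for a small synthetic step-edge image. This is a minimal sketch, assuming NumPy's central-difference `np.gradient` as the derivative estimator and a hypothetical 8x8 test image; `np.arctan2` is used in place of a plain arctangent so that the quadrant of the phase is preserved.

```python
import numpy as np

# Hypothetical 8x8 test image: black (0) upper half, white (1) lower half,
# i.e. a horizontal step edge.
image = np.zeros((8, 8))
image[4:, :] = 1.0

# np.gradient returns the derivative along axis 0 (rows, y) first,
# then along axis 1 (columns, x).
Iy, Ix = np.gradient(image)

magnitude = np.sqrt(Ix**2 + Iy**2)   # gradient magnitude
phase = np.arctan2(Iy, Ix)           # gradient phase, quadrant-aware

row, col = 4, 3                      # a pixel lying on the edge
print(magnitude[row, col], phase[row, col])
```

At the chosen pixel the phase is π/2: the gradient points along the row axis, from the black region toward the white region, which is normal to the horizontal edge.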

The direction of the gradient in this case is based purely on the ordering of the pixel values, i.e. it points from black to white. If these colors were reversed, the gradient magnitude would remain the same, but the direction would be reversed so that it again pointed from the black region to the white region. In applications where the gradient is used to denote contours within the scene, such as object outlines, the polarity is of little use. Nevertheless, this orientation reveals the "normal" to the contour in question. "Orientation," in this context, implies π-periodic behavior, as in the term "horizontal orientation," i.e. orientation ranges over [0, π). To reflect this periodicity, the directional derivative, when applied to the extraction of gradient-based structures, should also have an arrow pointed in the opposite direction.
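The π-periodic orientation can be obtained from the gradient phase by folding it into [0, π). The helper below is a hypothetical illustration, not part of any particular library; it shows that reversing the polarity of an edge leaves the orientation unchanged.

```python
import numpy as np

def orientation_from_phase(phase):
    """Fold a gradient phase in radians into the pi-periodic range [0, pi)."""
    return np.mod(phase, np.pi)

# The same edge seen with opposite polarities (black/white swapped):
phase_black_to_white = np.pi / 2
phase_white_to_black = -np.pi / 2

o1 = orientation_from_phase(phase_black_to_white)
o2 = orientation_from_phase(phase_white_to_black)
print(o1, o2)  # both describe the same orientation
```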

Although the directional derivative is relatively inexpensive to compute, it does possess a weakness. This is best illustrated with an example. Given an isotropic structure, where there is no preferred direction of gradient, the directional derivative formula results in a zero magnitude. An example of such an isotropic structure is a black circle on a white background. There is clearly gradient information; however, since there is no preferred phase, the opposing contributions cancel.

The same output is reached if the original input is a uniformly colored region. Again, the directional derivative magnitude is zero, as there is no gradient information from which to calculate it. What is needed is a representation capable of discerning between these two examples: one that properly reflects the presence of gradient information in the first case and its absence in the second.

There are several ways to calculate the partial derivatives of the image. For example, simple differencing between neighboring pixels is one option, although the gradient will then only be representative of a 2x1 region of interest. The derivative of Gaussian (DoG) is a common technique that convolves the image with a specialized mask to calculate the partial derivative value over a larger region of interest.
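The cancellation can be reproduced numerically. The sketch below assumes that the derivative is aggregated over a region containing the whole structure, modelled here by summing the per-pixel partial derivatives from `np.gradient`; the function name and the test images are hypothetical.

```python
import numpy as np

def region_gradient(image):
    """Return (magnitude of the summed gradient, sum of per-pixel magnitudes)."""
    Iy, Ix = np.gradient(image.astype(float))
    net = np.hypot(Ix.sum(), Iy.sum())        # opposing gradients cancel here
    total = np.sqrt(Ix**2 + Iy**2).sum()      # no cancellation here
    return net, total

# Black disk of radius 10 on a white 33x33 background (isotropic structure).
yy, xx = np.mgrid[-16:17, -16:17]
disk = np.where(xx**2 + yy**2 <= 10**2, 0.0, 1.0)

# Uniformly white region (no structure at all).
uniform = np.ones((33, 33))

disk_net, disk_total = region_gradient(disk)
uniform_net, uniform_total = region_gradient(uniform)
print(disk_net, disk_total)
print(uniform_net, uniform_total)
```

The net magnitude is zero in both cases, although only the disk image actually contains gradient information, which is the ambiguity described above.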

Structure tensors

Overview

A different method of representing gradient information is the "structure tensor."[1][2] It also makes use of the Ix and Iy values, but in matrix form. The term tensor refers to a representation of an array of data. A significant difference between a tensor and a matrix, which is also an array, is that a tensor represents a physical quantity whose measurement depends on the coordinates used to observe it only in a way that can be fully accounted for. Tensors can be represented by matrices, whereas not all matrices are tensors. The "rank" of a tensor denotes the number of indices needed to describe it. For example, a scalar x is a single value and hence requires no indices. This means that a scalar is a tensor of rank zero. A vector \vec v, often represented as vi = {v1, v2, ..., vn}, uses a single index, i. This implies that a vector is a tensor of rank one. A two-dimensional matrix Mij is a tensor of rank two, and so forth.
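The notion of rank as a count of indices maps directly onto the number of dimensions of an array. The short sketch below uses NumPy's `ndim` attribute purely to count indices; it says nothing about whether the stored data transforms as a tensor.

```python
import numpy as np

scalar = np.array(3.0)               # no index needed
vector = np.array([1.0, 2.0, 3.0])   # one index: v[i]
matrix = np.eye(2)                   # two indices: M[i, j]

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
```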

2D structure tensor

The structure tensor matrix (also referred to as the "second-moment matrix") is formed as:

S = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
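A minimal sketch of forming this matrix for an image patch follows. It assumes the common form whose entries are the products Ix², IxIy and Iy², summed over the whole patch as a simple uniform window; the function name, the windowing choice and the test image are assumptions made for illustration.

```python
import numpy as np

def structure_tensor_2d(patch):
    """2x2 structure tensor of a patch, entries summed over all pixels."""
    Iy, Ix = np.gradient(patch.astype(float))   # axis 0 is y, axis 1 is x
    Sxx = (Ix * Ix).sum()
    Sxy = (Ix * Iy).sum()
    Syy = (Iy * Iy).sum()
    return np.array([[Sxx, Sxy],
                     [Sxy, Syy]])

# Hypothetical horizontal step edge: black upper half, white lower half.
step = np.zeros((8, 8))
step[4:, :] = 1.0

S = structure_tensor_2d(step)
print(S)
```

For this patch only the Iy² entry is non-zero, reflecting that all of the gradient energy lies along the y-axis.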

Eigen-decomposition is then applied to the structure tensor matrix S to obtain the eigenvalues (λ1, λ2) and eigenvectors (\vec e_1, \vec e_2), ordered so that λ1 ≥ λ2. These new gradient features allow a more precise description of the local gradient characteristics. For example, \vec e_1 is a unit vector directed normal to the gradient edge, while \vec e_2 is tangent to it. The eigenvalues indicate the underlying certainty of the gradient structure along their associated eigenvector directions. As noted by several researchers, the "coherence" is obtained as a function of the eigenvalues.[3][4][5] This value is capable of distinguishing between the isotropic and uniform cases. The coherence is calculated as per the following equation:

Image:CoherenceEqn.png
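The eigen-decomposition and a coherence measure can be sketched as follows. The sketch assumes one commonly used normalized form, ((λ1 − λ2)/(λ1 + λ2))², which may differ from the equation intended above. The helper names, the uniform summation window and the explicit handling of the zero-gradient case (returning `None`) are conventions chosen here for illustration.

```python
import numpy as np

def structure_tensor_2d(patch):
    """2x2 structure tensor of a patch, entries summed over all pixels."""
    Iy, Ix = np.gradient(patch.astype(float))
    return np.array([[(Ix * Ix).sum(), (Ix * Iy).sum()],
                     [(Ix * Iy).sum(), (Iy * Iy).sum()]])

def eigen_features(S):
    """Eigenvalues in descending order with matching eigenvector columns."""
    vals, vecs = np.linalg.eigh(S)        # eigh returns ascending order
    return vals[::-1], vecs[:, ::-1]

def coherence(S, eps=1e-12):
    """Assumed form ((l1 - l2) / (l1 + l2))**2; None if there is no gradient."""
    (l1, l2), _ = eigen_features(S)
    if l1 + l2 < eps:
        return None                        # uniform region: no gradient energy
    return ((l1 - l2) / (l1 + l2)) ** 2

step = np.zeros((8, 8))
step[4:, :] = 1.0                          # horizontal step edge
yy, xx = np.mgrid[-16:17, -16:17]
disk = np.where(xx**2 + yy**2 <= 10**2, 0.0, 1.0)   # isotropic structure
uniform = np.ones((8, 8))                  # no structure

c_step = coherence(structure_tensor_2d(step))
c_disk = coherence(structure_tensor_2d(disk))
c_uniform = coherence(structure_tensor_2d(uniform))
vals_step, vecs_step = eigen_features(structure_tensor_2d(step))
print(c_step, c_disk, c_uniform)
```

Under these assumptions the step edge is fully coherent, the disk has equal eigenvalues and therefore no preferred orientation, and the uniform region is flagged separately because its eigenvalues are both zero.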

If a different method is used to obtain the partial derivatives, for example Horn's approach, which uses the average absolute pixel difference, the coherence is one for the isotropic region and zero for the uniform case, the opposite of the values obtained above. The key point is that, whichever partial derivative method is used, the coherence, a feature of the structure tensor, is able to distinguish between the two cases.

Many corner detection and salient point location algorithms[6][7][8][9] make use of the eigenvalues derived from the structure tensor to quantify the certainty in the measurements.[10]
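As one example of such a measure, the corner response of Harris and Stephens[7] can be written in terms of the determinant and trace of the structure tensor, which equal λ1λ2 and λ1 + λ2, so no explicit eigen-decomposition is needed. The sketch below is illustrative only: the function name is hypothetical, the tensors are hand-built, and the constant k = 0.04 is an assumed, empirically chosen value.

```python
import numpy as np

def harris_response(S, k=0.04):
    """Corner response det(S) - k * trace(S)**2 for a 2x2 structure tensor."""
    det = S[0, 0] * S[1, 1] - S[0, 1] * S[1, 0]   # lambda1 * lambda2
    trace = S[0, 0] + S[1, 1]                     # lambda1 + lambda2
    return det - k * trace**2

flat = np.zeros((2, 2))                 # both eigenvalues zero
edge = np.diag([4.0, 0.0])              # one dominant eigenvalue
corner = np.diag([4.0, 4.0])            # two large eigenvalues

r_flat = harris_response(flat)
r_edge = harris_response(edge)
r_corner = harris_response(corner)
print(r_flat, r_edge, r_corner)
```

The response is positive when both eigenvalues are large (a corner), negative when only one is (an edge), and zero for a flat region.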

There are other advantages to using the structure tensor representation as well. For example, local shifting of edge locations is minimized when a Gaussian smoothing operator is applied element-by-element to the structure tensor.[11] Furthermore, gradients of opposing polarity do not cancel when structure tensors are summed.[12]
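The absence of cancellation follows from the tensor being built from products of gradient components, so that a gradient and its negative contribute the same matrix. A minimal sketch with two hand-picked, opposing gradient vectors:

```python
import numpy as np

g = np.array([0.0, 1.0])    # gradient pointing one way across a contour
h = -g                      # same contour, opposite polarity

vector_sum = g + h                              # the vectors cancel
tensor_sum = np.outer(g, g) + np.outer(h, h)    # the tensors reinforce

print(vector_sum)
print(tensor_sum)
```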

For scale-space and other theoretical properties of the structure tensor (or second-moment matrix) when a Gaussian window function is used for integration, see Lindeberg[13] and Garding and Lindeberg.[14] For applications of the structure tensor to motion estimation and to the estimation of affine image deformations, see the articles on the Lucas-Kanade method and affine shape adaptation. For the application of the structure tensor to non-linear fingerprint enhancement, see Almansa and Lindeberg.[15]

3D structure tensors

Overview

Elliptical representation of structure tensor

This form of gradient representation serves especially well in inferring structure from sparse and noisy data.[4] The structure tensor is also applicable to higher-dimensional data. For example, given three-dimensional data, such as a spatio-temporal volume (x, y, time) or medical imaging input (x, y, z), the structure tensor is represented as follows:

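S = \begin{bmatrix} I_x^2 & I_x I_y & I_x I_z \\ I_x I_y & I_y^2 & I_y I_z \\ I_x I_z & I_y I_z & I_z^2 \end{bmatrix}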

The eigen-decomposition of this rank-two tensor yields the eigenvalues (λ1, λ2, λ3) and eigenvectors (\vec e_1, \vec e_2, \vec e_3). These components can be visualized as a 3D ellipsoid whose radii are equal to the eigenvalues, in descending order, and are directed along the corresponding eigenvectors.

3D structure types

The differences between the eigenvalues indicate underlying structure as well. For example, if (λ1 − λ2) ≫ 0, this depicts a "surfel" (surface element), where \vec e_1 is the normal to the surface.[11]
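The surfel case can be illustrated with a synthetic volume containing a single planar boundary. This is a sketch under stated assumptions: the 3x3 tensor is taken to hold the pairwise products of Ix, Iy and Iz summed over the whole volume, the array axes are ordered (z, y, x), and the function name and test volume are hypothetical.

```python
import numpy as np

def structure_tensor_3d(volume):
    """3x3 structure tensor, ordered (x, y, z), summed over the volume."""
    Iz, Iy, Ix = np.gradient(volume.astype(float))   # axes are (z, y, x)
    grads = np.stack([Ix.ravel(), Iy.ravel(), Iz.ravel()])   # shape 3 x N
    return grads @ grads.T    # sums of pairwise products of the derivatives

# Hypothetical volume: dark for z < 4, bright for z >= 4 (a planar surface).
volume = np.zeros((8, 8, 8))
volume[4:, :, :] = 1.0

S = structure_tensor_3d(volume)
vals, vecs = np.linalg.eigh(S)            # ascending eigenvalues
vals, vecs = vals[::-1], vecs[:, ::-1]    # reorder to descending

surface_saliency = vals[0] - vals[1]      # (lambda1 - lambda2)
normal = vecs[:, 0]                       # e1, up to sign
print(vals, surface_saliency, normal)
```

Here λ1 is much larger than λ2 and λ3, and \vec e_1 is aligned with the z-axis, the normal of the planar boundary.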

3D results

Applied against actual data

Tensor addition

Tensor addition of sphere and step-edge case

Another desirable property of the structure tensor form is that tensor addition is equivalent to adding the elliptical forms. For example, if the structure tensors for the sphere case and the step-edge case are added, the resulting structure tensor is an ellipse elongated along the direction of the step-edge case.
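A small numerical sketch of this addition follows, using hand-built 2x2 tensors as stand-ins: an identity matrix for the isotropic (sphere) case and a tensor with a single non-zero eigenvalue for the step-edge case. Both matrices are assumptions chosen for illustration.

```python
import numpy as np

isotropic = np.eye(2)                   # equal eigenvalues: a circle
step_edge = np.array([[0.0, 0.0],
                      [0.0, 1.0]])      # single direction: a line along y

total = isotropic + step_edge           # element-wise tensor addition

vals, vecs = np.linalg.eigh(total)      # ascending eigenvalues
minor_radius, major_radius = vals
major_axis = vecs[:, 1]                 # direction of the larger eigenvalue
print(minor_radius, major_radius, major_axis)
```

The sum has eigenvalues 1 and 2, so its elliptical form is stretched along the direction contributed by the step-edge tensor.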

Applications

Although structure tensors are applicable to many N-dimensional domains, they are of considerable interest in image processing and computer vision. Using gradient-based structure tensors, local patterns of contours and surfaces may be inferred through a diffusion process.[16] Diffusion helps enforce local structural estimates, which serve applications such as defect detection in lumber, occlusion detection in image sequences and biometric identification of fingerprints.[17]

References

  1. J. Bigun and G. Granlund (1987). "Optimal Orientation Detection of Linear Symmetry". First Int. Conf. on Computer Vision, ICCV (London): 433–438. Piscataway: IEEE Computer Society Press. (Tech. Report LiTH-ISY-I-0828, Computer Vision Laboratory, Linkoping University, Sweden, 1986; Thesis Report, Linkoping Studies in Science and Technology No. 85, 1986.)
  2. H. Knutsson (1989). "Representing local structure using tensors". Proceedings 6th Scandinavian Conf. on Image Analysis: 244–251. Oulu: Oulu University.
  3. B. Jahne (1993). Spatio-Temporal Image Processing: Theory and Scientific Applications 751. Berlin: Springer-Verlag.
  4. G. Medioni, M. Lee and C. Tang (March 2000). A Computational Framework for Feature Extraction and Segmentation. Elsevier Science.
  5. D. Tschumperle and Deriche (September 2002). "Diffusion PDE's on Vector-Valued Images": 16–25.
  6. W. Forstner (1986). "A Feature Based Correspondence Algorithm for Image Processing" 26: 150–166.
  7. C. Harris and M. Stephens (1988). "A Combined Corner and Edge Detector". Proc. of the 4th ALVEY Vision Conference: 147–151.
  8. K. Rohr (1997). "On 3D Differential Operators for Detecting Point Landmarks" 15 (3): 219–233.
  9. B. Triggs (2004). "Detecting Keypoints with Stable Position, Orientation, and Scale under Illumination Changes". Proc. European Conference on Computer Vision 4: 100–113.
  10. C. Kenney, M. Zuliani and B. Manjunath (2005). "An Axiomatic Approach to Corner Detection". Proc. IEEE Computer Vision and Pattern Recognition: 191–197.
  11. M. Nicolescu and G. Medioni (2003). "Motion Segmentation with Accurate Boundaries - A Tensor Voting Approach". Proc. IEEE Computer Vision and Pattern Recognition 1: 382–389.
  12. T. Brox, J. Weickert, B. Burgeth and P. Mrazek (2004). "Nonlinear Structure Tensors" (113): 1–32.
  13. T. Lindeberg (1994). Scale-Space Theory in Computer Vision. Dordrecht, Netherlands: Kluwer Academic Publishers.
  14. J. Garding and T. Lindeberg (1996). "Direct computation of shape cues using scale-adapted spatial derivative operators". International Journal of Computer Vision 17 (2): 163–191.
  15. A. Almansa and T. Lindeberg (2000). "Enhancement of fingerprint images using shape-adapted scale-space operators". IEEE Transactions on Image Processing 9 (12): 2027–2042.
  16. S. Arseneau and J. Cooperstock (September 2006). "An Asymmetrical Diffusion Framework for Junction Analysis". British Machine Vision Conference 2: 689–698.
  17. S. Arseneau and J. Cooperstock (November 2006). "An Improved Representation of Junctions through Asymmetric Tensor Diffusion". International Symposium on Visual Computing.
