Corner detection

From Wikipedia, the free encyclopedia

Corner detection or the more general terminology interest point detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, tracking, image mosaicing, panorama stitching, 3D modelling and object recognition.

Output of a typical Corner detection algorithm

Contents

[edit] Formalization

A corner can be defined as the intersection of two edges. A corner can also be defined as a point for which there are two dominant and different edge directions in a local neighbourhood of the point. An interest point is a point in an image which has a well-defined position and can be robustly detected. This means that an interest point can be a corner but it can also be, for example, an isolated point of local intensity maximum or minimum, line endings, or a point on a curve where the curvature is locally maximal. In practice, most so-called corner detection methods detect interest points in general rather than corners in particular. As a consequence, if only corners are to be detected it is necessary to do a local analysis of detected interest points to determine which of these are real corners. Unfortunately, in the literature, "corner", "interest point" and "feature" are used somewhat interchangeable, which rather clouds the issue. Specifically, there are several blob detectors that can be referred to as "interest point operators", but which are sometimes erroneously referred to as "corner detectors". Moreover, there exists a notion of ridge detection to capture the presence of elongated objects.

Corner detectors are not usually very robust and often require expert supervision or large redundancies introduced to prevent the effect of individual errors from dominating the recognition task. The quality of a corner detector is often judged based on its ability to detect the same corner in multiple images, which are similar but not identical, for example having different lighting, translation, rotation and other transforms. A simple approach to corner detection in images is using correlation, but this gets very computationally expensive and suboptimal. An alternative approach used frequently is based on a method proposed by Harris and Stephens (below), which in turn is an improvement of a method by Moravec.

[edit] The Moravec corner detection algorithm

This is one of the earliest corner detection algorithms and defines a corner to be a point with low self similarity. The algorithm tests each pixel in the image to see if a corner is present, by considering how similar a patch centred on the pixel is to nearby, largely overlapping patches. The similarity is measured by taking the sum of squared differences (SSD) between the two patches. A lower number indicated more similarity.

If the pixel is in a region of uniform intensity, then the nearby patches will look similar. If the pixel is on an edge, then nearby patches in a direction orthogonal to the edge will look quite different, but nearby patches in a direction normal to the edge will be similar. If the pixel is on a feature with variation in all directions, then none of the nearby patches will look similar.

The corner strength is defined as the smallest SSD between the patch and its neighbours (horizontal, vertical and on the two diagonals). If this number is locally maximal, then a feature of interest is present.

As pointed out by Moravec, one of the main problems with this operator is that it is not isotropic: if an edge is present that is not in the direction of the neighbours, then it will be detected as an interest point.

[edit] The Harris & Stephens / Plessey corner detection algorithm

Harris and Stephens improved upon Moravec's corner detector by considering the differential of the corner score with respect to direction directly, instead of using shifted patches. It should be noted that this corner score is often referred to as autocorrelation, since the term is used in the paper in which this detector is described. However, the mathematics in the paper clearly indicate that the SSD is used.

Without loss of generality, we will assume a grayscale 2-dimensional image is used. Let this image be given by I. Consider taking an image patch over the area (u,v) and shifting it by (x,y). The SSD between these two patches, S is given by:

S = \sum_u \sum_v \left( I(u,v) - I(u-x,v-y) \right)^2

The Harris matrix (denotes A) is found by taking the second derivative (the Hessian) of S around (x,y) = (0,0). A is given by:

A =  \begin{bmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle\\ \langle I_x I_y\rangle & \langle I_y^2 \rangle \end{bmatrix},

where angle brackets denote averaging (summation over (u,v)), and the typical notation for partial derivatives is used. If a circular window (or circularly weighted window, such as a Gaussian) is used, then the response will be isotropic.

The strength of the corner is determined by `how much' second derivative there is. This is done by considering the eigenvalues (λ1 and λ2) of A. Based on the magnitudes of the eigenvalues, the following inferences can be made based on this argument:

  1. If \lambda_1 \approx 0 and \lambda_2 \approx 0 then there are no features of interest at this pixel (x,y).
  2. If \lambda_1 \approx 0 and λ2 is some large positive values, then an edge is found.
  3. If λ1 and λ2 are both large, distinct positive values, then a corner is found.

Harris and Stephens note that exact computation of the eigen values is computationally expensive (since it requires a square root) and instead suggest the following function Mc, where κ is a tunable parameter which determines how 'edge-phobic' the algorithm is.

M_c = \lambda_1 \lambda_2 - \kappa \, (\lambda_1 + \lambda_2)^2
M_c = \operatorname{det}(A) - \kappa \, \operatorname{trace}^2(A)

Therefore, the algorithm does not have to actually compute the eigenvalue decomposition of the matrix A and instead it is sufficient to evaluate the determinant and trace of A to find corners, or rather interest points in general.

The value of κ has to be determined empirically, and in the literature values in the range 0.04 - 0.15 have been reported as feasible.

[edit] The multi-scale Harris operator

The computation of the second moment matrix (sometimes also referred to as "structure tensor") A in the Harris operator, requires the computation of image derivatives Ix,Ix in the image domain as well as the summation of non-linear combinations of these derivatives over local neighbourhoods. Since the computation of derivatives usually involves a stage of scale-space smoothing, an operational definition of the Harris operator requires two scale parameters: (i) a local scale for smoothing prior to the computation of image derivatives, and (ii) an integration scale for accumulating the non-linear operations on derivative operators into an integrated image descriptor.

With I denoting the original image brightness, let L denote the scale-space representation of I obtained by convolution with a Gaussian kernel

g(x, y, t) = \frac {1}{2{\pi} t}e^{-(x^2+y^2)/2t}

with local scale parameter t:

L(x, y, t)\ = g(x, y, t) * I(x, y)

and let L_x = \partial_x L and L_y = \partial_y L denote the partial derivatives of L. Moreover, introduce a Gaussian window function g(x,y,s) with integration scale parameter s. Then, the multi-scale second-moment matrix (Lindeberg and Garding 1997) can be defined as

\mu(x, y; t, s) = \int_{\xi = -\infty}^{\infty} \int_{\eta = -\infty}^{\infty} \begin{bmatrix}  L_x^2(x-\xi, y-\eta; t)                               & L_x(x-\xi, y-\eta; t) \, L_y(x-\xi, y-\eta; t) \\ L_x(x-\xi, y-\eta; t) \, L_y(x-\xi, y-\eta; t) & L_y^2(x-\xi, y-\eta; t)  \end{bmatrix} g(\xi, \eta; s) \, d\xi \, d\eta.

Then, we can compute eigenvalues of μ in a similar way as the eigenvalues of A and define the multi-scale Harris corner measure as

M_c(x, y; t, s) = \operatorname{det}(\mu(x, y; t, s)) - \kappa \, \operatorname{trace}^2(\mu(x, y; t, s)).

Concerning the choice of the local scale parameter t and the integration scale parameter s, these scale parameters are usually coupled by a relative integration scale parameter γ such that s = γ2t, where γ is usually chosen in the interval [\sqrt{2}, 2]. Thus, we can compute the multi-scale Harris corner measure Mc(x,y;t2t) at any scale t in scale-space to obtain a multi-scale corner detector, which responds to corner structures of varying sizes in the image domain (Baumberg 2000).

In practice, this multi-scale corner detector is often complemented by a scale selection step, where the scale-normalized Laplacian operator (Lindeberg 1998)

\nabla^2_{norm} L(x, y; t)\ = t \nabla^2 L(x, y, t) = t (L_{xx}(x, y, t) + L_{yy}(x, y, t))

is computed at every scale in scale-space and scale adapted corner points with automatic scale selection (the "Harris-Laplace operator") are computed from the points that are simultaneously (Mikolajczyk and Schmid 2004):

  • spatial maxima of the multi-scale corner measure Mc(x,y;t2t)
    (\hat{x}, \hat{y}; t) = \operatorname{argmaxlocal}_{(x, y)} M_c(x, y; t, \gamma^2 t)
  • local maxima or minima over scales of the scale-normalized Laplacian operator \nabla^2_{norm}(x, y, t)
    \hat{t} = \operatorname{argmaxminlocal}_{t} \nabla^2_{norm}L(\hat{x}, \hat{y}; t).

[edit] The Shi and Tomasi corner detection algorithm

Note that this is also sometimes referred to as the Kanade-Tomasi corner detector.

The corner detector is strongly based on the Harris corner detector. The authors show that for image patches undergoing affine transformations, min12) is a better measure of corner strength than Mc.

[edit] The level curve curvature approach

An earlier approach to corner detection is to detect points where the curvature of level curves and the gradient magnitude are simultaneously high. A differential way to detect such points is to compute the rescaled level curve curvature (the product of the level curve curvature and the gradient magnitude raised to the power of three)

\tilde{\kappa}(x, y;t) = L_x^2 L_{yy} + L_y^2 L_{xx} - 2 L_x L_y L_{xy}

and to detect positive maxima and negative minima of this differential expression at some scale t in the scale-space representation L of the original image. A main problem with this approach, however, is that it is sensitive to noise and to the choice of the scale level. A better method is to compute the γ-normalized rescaled level curve curvature

\tilde{\kappa_{norm}}(x, y;t) = t^{2 \gamma} (L_x^2 L_{yy} + L_y^2 L_{xx} - 2 L_x L_y L_{xy})

with γ = 7 / 8 and to detect signed scale-space maxima of this expression, that are points and scales that are positive maxima and negative minima with respect to both space and scale

(\hat{x}, \hat{y}; \hat{t}) = \operatorname{argmaxlocal}_{(x, y; t)} \tilde{\kappa}_{norm}(x, y; t)

in combination with a complementary localization step to handle the increase in localization error at coarser scales (Lindeberg 1998). In this way, larger scale values will be associated with rounded corners of large spatial extent while smaller scale values will be associated with sharp corners with small spatial extent. This approach is the first corner detector with automatic scale selection (prior to the "Harris-Laplace operator" above) and has been used by (Bretzner and Lindeberg 1998) for tracking corners under large scale variations in the image domain.

[edit] LoG, DoG and DoH feature detection

LoG is an acronym standing for Laplacian of Gaussian, DoG is an acronym standing for Difference of Gaussians (DoG is an approximation of LoG) and DoH is an acronym standing for Determinant of the Hessian.

These detectors are more completely described in blob detection, however the LoG and DoG blobs do not necessarily make highly selective features, since these operators may also respond to edges. To improve the corner detection ability of the DoG detector, the feature detector used in the SIFT system uses an additional post-processing stage, where the eigenvalues of the hessian of the image at the detection scale are examined in a similar way as in the Harris operator. If the ratio of the eigenvalues is too high, then the local image is regarded as too edge-like, so the feature is rejected. The DoH operator on the other hand only responds when there are significant grey-level variations in two directions.

[edit] The Wang and Brady corner detection algorithm

The Wang and Brady detector considers the image to be a surface, and looks for places where there is large curvature along an image edge. In other words, the algorithm looks for places where the edge changes direction rapidly. The corner score, C is given by:

C = \nabla^2I - c|\nabla I|^2,

where c determines how edge-phobic the detector is. The authors also note that smoothing (Gaussian is suggested) is required to reduce noise. In this case, the first term of C becomes the Laplacian (single-scale) blob detector.

Smoothing also causes displacement of corners, so the authors derive an expression for the displacement of a 90 degree corner, and apply this as a correction factor to the detected corners.

[edit] The SUSAN corner detector

SUSAN as an acronym standing for Smallest Univalue Segment Assymilating Nucleus.

For feature detection, SUSAN places a circular mask over the pixel to be tested (the nucleus). The region of the mask is M, and a pixel in this mask is represented by \vec{m} \in M. The nucleus is at \vec{m}_0. Every pixel is compared to the nucleus using the comparison function:

c(\vec{m}) = e^{\frac{(I(\vec{m}) - I(\vec{m}_0))^6}{t}}

where t determines the width, and the power of the exponent has been determined empirically. This function has the appearance of a smoothed top-hat or rectangular function. The area of the USAN is given by:

n(M) = \sum_{\vec{m}\in M} c(\vec{m})

If c is the rectangular function, then n is the number of pixels in the mask which are within t of the nucleus. The response of the USAN operator is given by:

R(M) =     \begin{cases}                g - n(M) & \mbox{if}\ n(M) < g\\                0        & \mbox{otherwise,}            \end{cases}

where g is named the `geometric threshold'. In other words the USAN operator only has a positive score if the area is small enough. The smallest USAN locally can be found using non-maximal suppression, and this is the complete SUSAN operator.

The value t determines how similar points have to be to the nucleus before they are considered to be part of the univalue segment. The value of g determines the minimum size of the univalue segment. If g is large enough, then this becomes an edge detector.

For corner detection, two further steps are used. Firstly, the centroid of the USAN if found. A proper corner will have the centroid far from the nucleus. The second step insists that all points on the line from the nucleus through the centroid out to the edge of the mask are in the USAN.

This technique is patented with UK patent 2272285.

[edit] The Trajkovic and Hedley corner detector

In a manner similar to SUSAN, this detector directly tests whether a patch under a pixel is self similar by examining nearby pixels. \vec{c} is the pixel to be considered, and \vec{p} \in P is point on a circle P centered around \vec{c}. The point \vec{p'} is the point opposite to \vec{p} along the diameter.

The response function is defined as:

r(\vec{c}) = \min_{\vec{p} \in P}\quad (I(\vec{p}) - I(\vec{c}))^2 + (I(\vec{p'}) - I(\vec{c})) ^2

This will be large when there is no direction in which the centre pixel is similar to two nearby pixels along a diameter. P is a descretized circle (a Bresenham circle), so interpolation is used for intermediate diameters to give a more isotropic response. Since any computation gives an upper bound on \lim, the horizontal and vertical directions are checked to see if it is worth proceeding with the complete computation of c.

[edit] The FAST feature detector

FAST is an acronym standing for Features from Accelerated Segment Test.

The feature detector considers pixels in a Bresenham circle of radius r around the candidate point. If n contiguous pixels are all brighter than the nucleus by at least t or all darker than the nucleus by t, then the pixel under the nucleus is considered to be a feature. Although r can in principle take any value, only a value of 3 is used (corresponding to a circle of 16 pixels circumference), and tests show that the best value of n is 9. This value of n is the lowest one at which edges are not detected. The resulting detector is reported to produce very stable features. Aditionally, the ID3 algorithm is used to optimize the order in which pixels are tested, resulting in the most computationally efficient feature detector available.

Confusingly, the name of the detector is somewhat similar to the name of the paper describing Trajkovic and Hedley's detector.

[edit] Affine-adapted interest point operators

The interest points obtained from the multi-scale Harris operator with automatic scale selection are invariant to translations, rotations and uniform rescalings in the spatial domain. The images that constitute the input to a computer vision system are, however, also subject to perspective distortions. To obtain an interest point operator that is more robust to perspective transformations, a natural approach is to devise a feature detector that is invariant to affine transformations. In practice, affine invariant interest points can be obtained by applying affine shape adaptation where the shape of the smoothing kernel is iteratively warped to match the local image structure around the interest point or equivalently a local image patch is iteratively warped while the shape of the smoothing kernel remains rotationallt symmetric (Lindeberg and Garding 1997; Baumberg 2000; Mikolajczyk and Schmid 2004). Hence, besides the commonly used multi-scale Harris operator, affine shape adaptation can be applied to other corner detectors as listed in this article as well as to differential blob detectors such as the Laplacian/Difference of Gaussian operator, the determinant of the Hessian and the Hessian-Laplace operator.

[edit] References

[edit] Reference Implementations

This section provides external links to reference implementations of some of the detectors described above. These reference implementations are provided by the authors of the paper in which the detector is first described. These may contain details not present or explicit in the papers describing the features.

[edit] See also

In other languages