Segmentation-based object categorization

The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation. Segmentation-based object categorization can be viewed as a specific case of spectral clustering applied to image segmentation.

Applications of image segmentation

Segmentation using normalized cuts

Graph theoretic formulation

The set of points in an arbitrary feature space can be represented as a weighted undirected complete graph G = (V, E), where the nodes of the graph are the points in the feature space. The weight of an edge is a function of the similarity between the nodes and . In this context, we can formulate the image segmentation problem as a graph partitioning problem that asks for a partition of the vertex set , where, according to some measure, the vertices in any set have high similarity, and the vertices in two different sets have low similarity.

Normalized cuts

Let G = (V, E, w) be a weighted graph. Let and be two subsets of vertices.

Let:

In the normalized cuts approach,[1] for any cut in , measures the similarity between different parts, and measures the total similarity of vertices in the same part.

Since , a cut that minimizes also maximizes .

Computing a cut that minimizes is an NP-hard problem. However, we can find in polynomial time a cut of small normalized weight using spectral techniques.

The ncut algorithm

Let:

Also, let D be an diagonal matrix with on the diagonal, and let be an symmetric matrix with .

After some algebraic manipulations, we get:

subject to the constraints:

Minimizing subject to the constraints above is NP-hard. To make the problem tractable, we relax the constraints on , and allow it to take real values. The relaxed problem can be solved by solving the generalized eigenvalue problem for the second smallest generalized eigenvalue.

The partitioning algorithm:

  1. Given a set of features, set up a weighted graph , compute the weight of each edge, and summarize the information in and .
  2. Solve for eigenvectors with the second smallest eigenvalues.
  3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph (e.g. grouping according to sign).
  4. Decide if the current partition should be subdivided.
  5. Recursively partition the segmented parts, if necessary.

Computational Complexity

Solving a standard eigenvalue problem for all eigenvectors (using the QR algorithm, for instance) takes time. This is impractical for image segmentation applications where is the number of pixels in the image.

Since only one eigenvector, corresponding to the second smallest generalized eigenvalue, is used by the ncut algorithm, efficiency can be dramatically improved if the solve of the corresponding eigenvalue problem is performed in a matrix-free fashion, i.e., without explicitly manipulating with or even computing the matrix W, as, e.g., in the Lanczos algorithm. Matrix-free methods require only a function that performs a matrix-vector product for a given vector, on every iteration. For image segmentation, the matrix W is typically sparse, with a number of nonzero entries , so such a matrix-vector product takes time.

For high-resolution images, the second eigenvalue is often ill-conditioned, leading to slow convergence of iterative eigenvalue solvers, such as the Lanczos algorithm. Preconditioning is a key technology accelerating the convergence, e.g., in the matrix-free LOBPCG method. Computing the eigenvector using an optimally preconditioned matrix-free method takes time, which is the optimal complexity, since the eigenvector has components.

OBJ CUT

OBJ CUT[2] is an efficient method that automatically segments an object. The OBJ CUT method is a generic method, and therefore it is applicable to any object category model. Given an image D containing an instance of a known object category, e.g. cows, the OBJ CUT algorithm computes a segmentation of the object, that is, it infers a set of labels m.

Let m be a set of binary labels, and let be a shape parameter( is a shape prior on the labels from a layered pictorial structure (LPS) model). An energy function is defined as follows.

(1)

The term is called a unary term, and the term is called a pairwise term. A unary term consists of the likelihood based on color, and the unary potential based on the distance from . A pairwise term consists of a prior and a contrast term .

The best labeling minimizes , where is the weight of the parameter .

(2)

Algorithm

  1. Given an image D, an object category is chosen, e.g. cows or horses.
  2. The corresponding LPS model is matched to D to obtain the samples
  3. The objective function given by equation (2) is determined by computing and using
  4. The objective function is minimized using a single MINCUT operation to obtain the segmentation m.

Other approaches

References

  1. Jianbo Shi and Jitendra Malik (1997): "Normalized Cuts and Image Segmentation", IEEE Conference on Computer Vision and Pattern Recognition, pp 731–737
  2. M. P. Kumar, P. H. S. Torr, and A. Zisserman. Obj cut. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pages 18–25, 2005.
  3. E. Borenstein, S. Ullman: Class-specic, top-down segmentation. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages 109–124, 2002.
  4. Z. Tu, X. Chen, A. L. Yuille, S. C. Zhu: Image Parsing: Unifying Segmentation, Detection, and Recognition. Toward Category-Level Object Recognition 2006: 545–576
  5. B. Leibe, A. Leonardis, B. Schiele: An Implicit Shape Model for Combined Object Categorization and Segmentation. Toward Category-Level Object Recognition 2006: 508–524
  6. J. Winn, N. Joijic. Locus: Learning object classes with unsupervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, 2005.
  7. J. M. Winn, J. Shotton: The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR (1) 2006: 37–44
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.