Connected Component Labeling
From Wikipedia, the free encyclopedia
Connected component labeling (alternatively connected component analysis) is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected component labeling is not to be confused with segmentation.
Connected component labeling is used in computer vision to detect unconnected regions in binary digital images, although color images and data with higher-dimensionality can also be processed.[1][2] When integrated into an image recognition system or human-computer interaction interface, connected component labeling can operate on a variety of information.[3][4]
Contents |
[edit] Overview
A graph, containing vertices and connecting edges, is constructed from relevant input data. The vertices contain information required by the comparison heuristic, while the edges indicate connected 'neighbors'. An algorithm traverses the graph, labeling the vertices based on the connectivity and relative values of their neighbors. Connectivity is determined by the medium; image graphs, for example, can be 4-connected or 8-connected.[5]
Following the labeling stage, the graph may be partitioned into subsets, after which the original information can be recovered and processed.
[edit] Algorithms
The algorithms discussed can be generalized to arbitrary dimensions, albeit with increased time and space complexity.
[edit] Two-pass
Relatively simple to implement and understand, the two-pass algorithm[6] iterates through 2-dimensional, binary data. The first pass uniquely labels non-background regions, and a second pass merges labels on any regions that are spatially connected.
The input data can be modified in situ (which carries the risk of data corruption), or labeling information can be maintained in an additional data structure.
On the first pass:
- Iterate through each element of the data by column, then by row
- If the element is not the background
- Get the neighboring elements with the current element's label
- If there are no neighbors, uniquely label the current element and continue
- Otherwise, find the neighbors' smallest label and assign it to the current element
- Store the equivalence between neighboring labels
On the second pass:
- Iterate through each element of the data by column, then by row
- If the element is not the background
- Relabel the element with the lowest equivalent label
Here, the background is a classification, specific to the data, used to distinguish salient elements from the foreground. If the background variable is omitted, then the two-pass algorithm will treat the background as another region.
The pseudocode is as follows:
algorithm TwoPass(data) linked = [] labels = structure with dimensions of data, initialized with the value of Background First pass for row in data: for column in row: if data[row][col] is not Background neighbours = connected elements with the current element's label if neighbours is empty linked[NextLabel] = set containing NextLabel labels[row][column] = NextLabel NextLabel += 1 else Find the smallest label L = neighbours' labels labels[row][column] = min(L) for label in L linked[label] = union(linked[label], L) Second pass for row in data for column in row if labels[row][column] is not Background labels[row][col] = find(labels[row][col]) return labels
The find and union algorithms are implemented as described in union find.
[edit] Others
Some of the steps present in the two-pass algorithm can be merged for efficiency, allowing for a single sweep through the image. Multi-pass algorithms also exist, some of which run in linear time relative to the number of image pixels.[7]
In the early 1990s, there was considerable interest in parallelizing connected component algorithms in image analysis applications, due to the bottleneck of sequentially processing each pixel.[8].
[edit] See also
- Image analysis
- Computer vision
- Feature extraction
- Blob extraction
- Connected component (graph theory)
- union find
- flood fill
[edit] References
- ^ H. Samet and M. Tamminen (1988). Efficient Component Labeling of Images of Arbitrary Dimension Represented by Linear Bintrees. TIEEE Trans. Pattern Anal. Mach. Intell..
- ^ Michael B. Dillencourt and Hannan Samet and Markku Tamminen (1992). A general approach to connected-component labeling for arbitrary image representations}. J. ACM.
- ^ Weijie Chen, Maryellen L. Giger and Ulrich Bick (2006). A Fuzzy C-Means (FCM)-Based Approach for Computerized Segmentation of Breast Lesions in Dynamic Contrast-Enhanced MR Images. Academic Radiology.
- ^ Kesheng Wu, Wendy Koegler, Jacqueline Chen and Arie Shoshani (2003). Using Bitmap Index for Interactive Exploration of Large Datasets. SSDBM.
- ^ R. Fisher, S. Perkins, A. Walker and E. Wolfart (2003). Connected Component Labeling.
- ^ Shapiro, L., and Stockman, G. (2002). Computer Vision. Prentice Hall, 69-73.
- ^ Kenji Suzuki and Isao Horiba and Noboru Sugie (2003). Linear-time connected-component labeling based on sequential local operations. Comput. Vis. Image Underst..
- ^ Yujie Han and Robert A. Wagner (1990). An efficient and fast parallel-connected component algorithm. J. ACM.