Pedestrian detection is an essential task in any intelligent video surveillance system, as it provides the fundamental information needed for semantic understanding of the video footage.
Despite the challenges, pedestrian detection remains an active research area in computer vision, and numerous approaches have been proposed in recent years.
In holistic detection, detectors are trained to search for pedestrians by scanning the whole video frame. The detector “fires” when the image features inside the local search window meet certain criteria. Some methods employ global features such as an edge template [1]; others use local features such as histogram of oriented gradients (HOG) descriptors [2]. The drawback of this approach is that its performance is easily degraded by background clutter and occlusions.
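The scanning procedure described above can be sketched as a sliding-window loop. This is a minimal illustration, not the method of [1] or [2]: the frame is assumed to be a grayscale image stored as nested Python lists, and `vertical_edge_energy` is a hypothetical stand-in for a trained classifier (a real detector would score, for example, HOG descriptors with a learned model). The window size, stride, and threshold are illustrative parameters.

```python
from typing import Callable, List, Tuple

# A grayscale frame as a list of rows of pixel intensities (assumed representation).
Image = List[List[float]]


def sliding_window_detect(
    frame: Image,
    score_window: Callable[[Image], float],
    win_h: int,
    win_w: int,
    stride: int,
    threshold: float,
) -> List[Tuple[int, int, float]]:
    """Scan the whole frame with a fixed-size window and return (top, left, score)
    for every window whose score exceeds the threshold, i.e. where the detector fires."""
    detections = []
    rows, cols = len(frame), len(frame[0])
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            window = [row[left:left + win_w] for row in frame[top:top + win_h]]
            score = score_window(window)
            if score > threshold:
                detections.append((top, left, score))
    return detections


def vertical_edge_energy(window: Image) -> float:
    """Hypothetical stand-in for a learned scorer: mean absolute horizontal gradient."""
    total, count = 0.0, 0
    for row in window:
        for a, b in zip(row, row[1:]):
            total += abs(b - a)
            count += 1
    return total / count if count else 0.0


# Usage: a 6x8 frame with a bright vertical bar in columns 3 and 4.
frame = [[0.0] * 8 for _ in range(6)]
for r in range(6):
    frame[r][3] = frame[r][4] = 10.0
dets = sliding_window_detect(frame, vertical_edge_energy, win_h=6, win_w=4, stride=2, threshold=5.0)
print(dets)
```

In practice the window is also scanned over several image scales so that pedestrians of different sizes can be found.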
In part-based detection, pedestrians are modeled as collections of parts. Part hypotheses are first generated by detectors learned on local features, such as edgelet features [3] and orientation features [4]. These part hypotheses are then combined to form the best assembly of pedestrian hypotheses. Although this approach is attractive, part detection is itself a difficult task.
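The assembly step can be illustrated with a simple exhaustive search. This sketch assumes three hypothetical parts (head, torso, legs), each with a list of candidate detections, and scores an assembly as the sum of the part scores minus a quadratic penalty for deviating from hypothetical expected offsets relative to the torso. It is a generic pictorial-structure-style formulation chosen for illustration, not the specific combination scheme used in [3] or [4].

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[float, float, float]  # (x, y, part detector score)

# Hypothetical geometric model: expected offset (dx, dy) of each part from the torso centre.
EXPECTED_OFFSET = {"head": (0.0, -30.0), "legs": (0.0, 40.0)}


def assemble(
    parts: Dict[str, List[Candidate]], deform_weight: float = 0.01
) -> Tuple[float, Optional[Dict[str, Candidate]]]:
    """Return the highest-scoring (head, torso, legs) assembly and its score."""
    best_score, best = float("-inf"), None
    for head, torso, legs in product(parts["head"], parts["torso"], parts["legs"]):
        score = head[2] + torso[2] + legs[2]
        # Penalise parts that sit far from where the model expects them relative to the torso.
        for name, cand in (("head", head), ("legs", legs)):
            ex, ey = EXPECTED_OFFSET[name]
            dx = cand[0] - torso[0] - ex
            dy = cand[1] - torso[1] - ey
            score -= deform_weight * (dx * dx + dy * dy)
        if score > best_score:
            best_score, best = score, {"head": head, "torso": torso, "legs": legs}
    return best_score, best


# Usage: the second head candidate scores higher on its own but is geometrically implausible.
parts = {
    "head": [(100, 70, 0.9), (300, 60, 0.95)],
    "torso": [(100, 100, 0.8)],
    "legs": [(102, 140, 0.7)],
}
score, best = assemble(parts)
print(score, best)
```

Exhaustive search grows as the product of the candidate counts per part, which is one reason practical systems rely on more efficient inference.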
Leibe et al. [5] proposed the Implicit Shape Model (ISM), an approach that combines detection and segmentation. A codebook of local appearance is learned during training. During detection, extracted local features are matched against the codebook entries, and each match casts one vote for pedestrian hypotheses. The final detections are obtained by further refining those hypotheses. The advantage of this approach is that only a small number of training images is required.
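The voting stage can be sketched as a Hough-style accumulation. In this simplified version, each codebook entry is assumed to store a descriptor together with the offsets from the feature to the object centre observed during training; each match distributes one vote across those offsets, and votes are accumulated in coarse bins, a simplification of the continuous voting space and mode search used in ISM. The descriptors, matching radius, and bin size are hypothetical, and the refinement step is not shown.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Tuple

Descriptor = Tuple[float, ...]
Offset = Tuple[float, float]  # (dx, dy) from feature position to object centre


def ism_vote(
    features: List[Tuple[float, float, Descriptor]],
    codebook: List[Tuple[Descriptor, List[Offset]]],
    match_radius: float,
    bin_size: float,
) -> Dict[Tuple[int, int], float]:
    """Accumulate votes for object centres in a binned (Hough) space."""
    votes: Dict[Tuple[int, int], float] = defaultdict(float)
    for x, y, desc in features:
        for entry_desc, offsets in codebook:
            # A feature matches an entry if its descriptor is close enough (Euclidean).
            if dist(desc, entry_desc) <= match_radius:
                weight = 1.0 / len(offsets)  # one vote per match, split across offsets
                for dx, dy in offsets:
                    key = (int((x + dx) // bin_size), int((y + dy) // bin_size))
                    votes[key] += weight
    return votes


# Usage: a hypothetical "head-like" entry votes downward, a "feet-like" entry votes upward.
codebook = [
    ((1.0, 0.0), [(0, 40)]),
    ((0.0, 1.0), [(0, -40), (10, -40)]),
]
features = [(50, 10, (0.9, 0.1)), (50, 90, (0.1, 0.95)), (200, 200, (0.5, 0.5))]
votes = ism_vote(features, codebook, match_radius=0.3, bin_size=20)
best_bin = max(votes, key=votes.get)
print(best_bin, votes[best_bin])
```

The bin that collects the most votes is the strongest pedestrian hypothesis; the third feature matches no entry and therefore casts no vote.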
Fleuret et al. [6] proposed a method that integrates multiple calibrated cameras to detect multiple pedestrians. In this approach, the ground plane is partitioned into uniform, non-overlapping grid cells, typically 25 × 25 cm in size. The detector produces a Probabilistic Occupancy Map (POM), which estimates the probability that each grid cell is occupied by a person.
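The grid partition and a simple readout of the map can be sketched as follows. The 25 cm cell size comes from the text; the map is assumed to be a nested list indexed as `pom[ix][iy]`, and reporting cells above a fixed probability threshold is a simplifying assumption for illustration. Estimating the probabilities themselves from the calibrated camera views, as done in [6], is not shown.

```python
from typing import List, Tuple

CELL_CM = 25.0  # grid cell size from the text: 25 cm x 25 cm


def cell_index(x_cm: float, y_cm: float, cell_cm: float = CELL_CM) -> Tuple[int, int]:
    """Map a ground-plane position (cm, relative to the grid origin) to its cell (ix, iy)."""
    return int(x_cm // cell_cm), int(y_cm // cell_cm)


def occupied_cells(pom: List[List[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (ix, iy) of cells whose occupancy probability exceeds the threshold."""
    return [
        (ix, iy)
        for ix, column in enumerate(pom)
        for iy, p in enumerate(column)
        if p > threshold
    ]


# Usage: a 4x4 grid covering 1 m x 1 m, with one likely occupied cell.
pom = [[0.0] * 4 for _ in range(4)]
pom[2][3] = 0.9  # e.g. a person standing near (60 cm, 80 cm)
pom[0][1] = 0.2  # weak, below-threshold evidence
print(cell_index(60.0, 80.0), occupied_cells(pom))
```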