Features (pattern recognition)

From Wikipedia, the free encyclopedia

In pattern recognition, features are the individual measurable heuristic properties of the phenomena being observed. Choosing features that are discriminating and independent is key to the success of any pattern recognition algorithm in classification.

Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition.

Although different areas of pattern recognition use different features, once the features have been chosen, the resulting feature values are classified by a much smaller set of algorithms. These include nearest neighbor classification in multiple dimensions, neural networks, and statistical techniques such as Bayesian approaches.
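As an illustration of how a general-purpose classifier operates on features once they are chosen, the following is a minimal sketch of nearest neighbor classification. The training set, the two-dimensional feature vectors, and the labels "A" and "B" are invented for the example, and Euclidean distance is assumed as the distance measure.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training vector closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption of this sketch; other metrics could be used instead.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical training data: two numeric features per example.
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((4.0, 4.2), "B"),
    ((3.8, 4.0), "B"),
]

print(nearest_neighbor(train, (1.1, 0.9)))
print(nearest_neighbor(train, (3.9, 4.1)))
```

The classifier itself knows nothing about characters, speech, or email; it only sees numeric vectors, which is why the same algorithm can be reused across application areas.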

Examples

In character recognition, features may include horizontal and vertical profiles, number of internal holes, stroke detection and many others.
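Two of these features can be sketched for a small binary image. The sketch below assumes a glyph is given as a list of rows of 0 (background) and 1 (ink), and it counts an internal hole as a 4-connected background region that does not touch the image border. The function names and the sample glyphs are hypothetical.

```python
from collections import deque

def profiles(glyph):
    """Return (horizontal, vertical) profiles: ink counts per row and per column."""
    horizontal = [sum(row) for row in glyph]
    vertical = [sum(col) for col in zip(*glyph)]
    return horizontal, vertical

def count_holes(glyph):
    """Count 4-connected background regions that do not touch the image border."""
    rows, cols = len(glyph), len(glyph[0])
    seen = [[False] * cols for _ in range(rows)]
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] == 1 or seen[r][c]:
                continue
            # Flood-fill the background region that contains (r, c).
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and glyph[ny][nx] == 0):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                holes += 1
    return holes

# Hypothetical 5x3 glyph shaped like the digit 8.
GLYPH_EIGHT = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]

# Hypothetical 3x3 glyph shaped like the letter C (open on the right).
GLYPH_C = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]

print(profiles(GLYPH_EIGHT))
print(count_holes(GLYPH_EIGHT), count_holes(GLYPH_C))
```

A real recognizer would normally first binarize and normalize the scanned image; that step is omitted here.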

In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches and many others.
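A rough sketch of two such features, the length of a sound and its relative power, is given below. It assumes the signal is a list of amplitude samples and that the segment boundaries are already known. "Relative power" is interpreted here as the mean power of the segment divided by the mean power of the whole signal, which is only one possible definition. The function names and the test signal are hypothetical.

```python
def mean_power(samples):
    """Mean of the squared sample amplitudes."""
    return sum(x * x for x in samples) / len(samples)

def segment_features(samples, sample_rate, start, end):
    """Return the duration and relative power of samples[start:end].

    Relative power is taken, as an assumption of this sketch, to be the
    segment's mean power divided by the mean power of the whole signal.
    """
    segment = samples[start:end]
    return {
        "duration_s": len(segment) / sample_rate,
        "relative_power": mean_power(segment) / mean_power(samples),
    }

# Hypothetical signal: one second of silence followed by one second of
# constant amplitude, sampled at 100 samples per second.
signal = [0.0] * 100 + [1.0] * 100

print(segment_features(signal, sample_rate=100, start=100, end=200))
```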

In spam detection algorithms, features may include whether certain email headers are present or absent, whether they are well formed, what language the email appears to be written in, the grammatical correctness of the text, Markovian frequency analysis and many others.
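The header-based features can be sketched with Python's standard `email` package. The choice of headers to test (`Date`, `Message-ID`, `From`) and the test for a well-formed sender (the parsed address contains an "@") are simplifying assumptions made for this example, not a description of any particular spam filter. The sample messages are invented.

```python
from email import message_from_string
from email.utils import parseaddr

def header_features(raw_email):
    """Extract a few boolean header features from a raw email string.

    The selected headers and the crude "@" check are assumptions of this
    sketch; a real filter would use many more, and stricter, tests.
    """
    msg = message_from_string(raw_email)
    _, from_addr = parseaddr(msg.get("From", ""))
    return {
        "has_date": "Date" in msg,
        "has_message_id": "Message-ID" in msg,
        "from_well_formed": "@" in from_addr,
    }

# Hypothetical messages used only for illustration.
ORDINARY = (
    "From: Alice <alice@example.com>\n"
    "Date: Mon, 1 Jan 2024 10:00:00 +0000\n"
    "Message-ID: <123@example.com>\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)
SUSPECT = (
    "From: undisclosed\n"
    "Subject: WIN NOW\n"
    "\n"
    "Click here.\n"
)

print(header_features(ORDINARY))
print(header_features(SUSPECT))
```

Each dictionary can be converted to a numeric vector (for example, 1 for true and 0 for false) before being passed to a classifier.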

In all these cases, and many others, extracting features that are measurable by a computer is an art. With the exception of some neural network and genetic techniques that automatically intuit "features", hand selection of good features forms the basis of almost all classification algorithms.