Feature (machine learning)

From Wikipedia, the free encyclopedia

In machine learning and pattern recognition, a feature is an individual measurable heuristic property of a phenomenon being observed. Choosing discriminating and independent features is key to the success of any pattern recognition algorithm in classification. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition.

The features of a given data instance are often grouped into a feature vector so that the instance can be treated mathematically. For example, many algorithms compute a score for classifying an instance into a particular category by linearly combining its feature vector with a vector of weights, using a linear predictor function.
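As an illustration of the linear combination described above, the following is a minimal sketch in Python. The function names, the example weights, the bias term and the decision threshold of zero are assumptions made for this example, not part of any particular algorithm.

```python
from typing import Sequence


def linear_score(features: Sequence[float],
                 weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Return the dot product of weights and features, plus a bias term."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(features: Sequence[float],
             weights: Sequence[float],
             bias: float = 0.0) -> int:
    """Assign class 1 if the score is positive, else class 0 (assumed rule)."""
    return 1 if linear_score(features, weights, bias) > 0 else 0


# Hypothetical three-dimensional feature vector and weight vector.
x = [1.0, 2.0, 0.5]
w = [0.5, -1.0, 2.0]
score = linear_score(x, w)   # 0.5*1.0 + (-1.0)*2.0 + 2.0*0.5
label = classify(x, w)
```

In practice the weights would be learned from training data rather than chosen by hand.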

The concept of "feature" is essentially the same as the concept of explanatory variable used in statistical techniques such as linear regression.

Classification

While different areas of pattern recognition have different features, once the features are decided, instances are classified by a much smaller set of algorithms. These include nearest neighbor classification in multiple dimensions, neural networks, and statistical techniques such as Bayesian approaches.
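Nearest neighbor classification, for instance, can be sketched in a few lines once instances are represented as feature vectors. The sketch below assumes Euclidean distance and a single neighbor; the function name and the toy training data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbor(query: Sequence[float],
                     training: List[Tuple[Sequence[float], str]]) -> str:
    """Return the label of the training vector closest to the query."""
    if not training:
        raise ValueError("training set must not be empty")
    best_label, best_dist = None, math.inf
    for features, label in training:
        # Euclidean distance between two points of equal dimension.
        d = math.dist(query, features)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy two-dimensional training set with made-up labels.
training_set = [([0.0, 0.0], "a"), ([1.0, 1.0], "b")]
predicted = nearest_neighbor([0.1, 0.2], training_set)
```

The same function works unchanged for feature vectors of any dimension, which is one reason a small set of algorithms can serve many application areas.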

Examples

In character recognition, features may include horizontal and vertical profiles, number of internal holes, stroke detection and many others.
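As a rough sketch of how such profiles might be computed, the example below treats a character as a small binary grid in which 1 marks ink. It assumes that the horizontal profile is the ink count of each row and the vertical profile is the ink count of each column; the function name and the sample glyph are made up for illustration.

```python
from typing import List, Tuple


def projection_profiles(image: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return (per-row ink counts, per-column ink counts) of a binary image."""
    horizontal = [sum(row) for row in image]
    vertical = [sum(column) for column in zip(*image)]
    return horizontal, vertical


# A 3x3 glyph shaped roughly like the letter "L".
glyph = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
rows, cols = projection_profiles(glyph)

# Concatenating both profiles gives one numeric feature vector.
feature_vector = rows + cols
```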

In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches and many others.

In spam detection algorithms, features may include whether certain email headers are present or absent, whether they are well formed, what language the email appears to be written in, the grammatical correctness of the text, Markovian frequency analysis and many others.
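The header-based features mentioned above could be extracted along the following lines, using Python's standard email package. The choice of headers, the crude test for a well-formed sender address and the function name are assumptions for this sketch and are not drawn from any specific spam filter.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import List


def header_features(raw_email: str) -> List[float]:
    """Map a raw email to [has Date, has Message-ID, From looks well formed]."""
    msg = message_from_string(raw_email)
    has_date = 1.0 if msg["Date"] is not None else 0.0
    has_message_id = 1.0 if msg["Message-ID"] is not None else 0.0
    # parseaddr splits a header value into (display name, address).
    _, address = parseaddr(msg.get("From", ""))
    from_ok = 1.0 if "@" in address else 0.0  # crude check, an assumption
    return [has_date, has_message_id, from_ok]


# Made-up message with a From header but no Date or Message-ID header.
raw = "From: Alice <alice@example.com>\nSubject: Hi\n\nBody text\n"
features = header_features(raw)
```

Each entry is a binary feature, so the result can be passed directly to a classifier that expects numeric feature vectors.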

In all these cases, and many others, extracting features that are measurable by a computer is an art. With the exception of some neural network and genetic techniques that automatically intuit "features", hand selection of good features forms the basis of almost all classification algorithms.

    This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.