Mel frequency cepstral coefficient

From Wikipedia, the free encyclopedia

Mel-frequency cepstral coefficients (MFCCs) are coefficients that together represent the short-term power spectrum of a sound. They are derived from a type of cepstral representation of the audio clip (a "spectrum of a spectrum"). The basic difference between the cepstrum and the MFCC is that in the MFCC, the frequency bands are equally spaced on the mel scale, which is roughly linear below about 1 kHz and roughly logarithmic above it. This approximates the human auditory system's response more closely than the linearly spaced frequency bands obtained directly from the FFT or DCT, and can allow a more compact representation of sound, for example in audio compression.
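One widely used formula for the mel scale is m = 2595 · log10(1 + f/700), with f in hertz. The minimal sketch below uses that formula (other variants exist) and shows why equal steps in mels become wider and wider in hertz as frequency rises. The 8 kHz upper limit and the choice of ten bands are illustrative assumptions, not values from this article.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Frequency in Hz -> mels, using the common 2595*log10(1 + f/700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: mels -> frequency in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Ten bands of equal width in mels, covering 0 Hz to 8 kHz (assumed range).
edges_mel = [hz_to_mel(8000.0) * i / 10 for i in range(11)]
edges_hz = [mel_to_hz(m) for m in edges_mel]
# Width of each band in Hz: these grow with frequency.
widths_hz = [upper - lower for lower, upper in zip(edges_hz, edges_hz[1:])]
print([round(w) for w in widths_hz])
```

By construction this formula maps 1000 Hz to roughly 1000 mels.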

MFCCs are commonly derived as follows:

Take the Fourier transform of (a windowed excerpt of) a signal.
Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
Take the logs of the powers at each of the mel frequencies.
Take the discrete cosine transform (DCT) of the list of mel log powers, as if it were a signal.
The MFCCs are the amplitudes of the resulting spectrum.
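The steps above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the frame length (25 ms), hop (10 ms) at an assumed 16 kHz sample rate, 512-point FFT, 26 filters and 13 coefficients are common but assumed parameter choices, and refinements some implementations add (such as pre-emphasis) are omitted. The sketch takes the logarithm of the mel filter energies before the DCT, as standard MFCC implementations do.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular, overlapping filters whose edges are equally spaced in mels."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)  # matches rfft bins
    # n_filters + 2 points: each triangle spans three consecutive points.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out outputs."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Step 1: overlapping frames, Hamming window, FFT -> power spectrum.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Step 2: map the powers onto the mel scale with triangular filters.
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Step 3: log of each mel energy (floored to avoid log(0)).
    log_mel = np.log(np.maximum(mel_energies, 1e-10))
    # Step 4: DCT of the log mel energies; keep the first n_coeffs values.
    return dct_ii(log_mel, n_coeffs)

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
print(mfcc(tone, sr).shape)  # (98, 13): frames x coefficients
```

Keeping only the first dozen or so DCT outputs discards fine spectral detail while retaining the overall spectral envelope.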

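There can be variations on this process, for example differences in the formula used for the mel scale conversion, or in the shape and spacing of the windows used to map the spectrum onto that scale.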
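As an illustration of how the mel scale conversion can differ, the sketch below compares the 2595 · log10(1 + f/700) formula with a piecewise "Slaney-style" scale that is exactly linear below 1 kHz and logarithmic above it. The function names are hypothetical; the point is that the two produce very different mel values for the same frequency, so features computed with one are not interchangeable with features computed with the other.

```python
import math

def mel_log_formula(f_hz: float) -> float:
    """Mel scale from the 2595*log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_slaney_style(f_hz: float) -> float:
    """Piecewise mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # where the scale switches to log
    min_log_mel = min_log_hz / f_sp  # = 15 mels at 1 kHz
    logstep = math.log(6.4) / 27.0   # step size in the log region
    if f_hz < min_log_hz:
        return f_hz / f_sp
    return min_log_mel + math.log(f_hz / min_log_hz) / logstep

for f in (250.0, 1000.0, 4000.0):
    print(f, round(mel_log_formula(f), 1), round(mel_slaney_style(f), 2))
```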

Applications

MFCCs are often used as features in speech recognition systems, such as systems that automatically recognise numbers spoken into a telephone.
