Pitch detection algorithm

From Wikipedia, the free encyclopedia

A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic (nearly periodic) signal, usually a digital recording of speech or of a musical note or tone. The estimate can be made in the time domain or in the frequency domain.

1 Time-domain approaches
2 Frequency-domain approaches
- 2.1 Partial-domain approaches
3 References
4 See also

Time-domain approaches

In the time domain, a PDA typically estimates the period of the quasiperiodic signal, then takes the reciprocal of that value to obtain the frequency.

One simple approach is to measure the distance between successive zero crossings of the signal (the basis of the zero-crossing rate). However, this works poorly for complex waveforms composed of multiple sine waves with differing periods, because the additional components add or shift crossings. Nevertheless, zero-crossing can be a useful measure in some cases, for example in speech applications where a single source is assumed. The algorithm's simplicity makes it cheap to implement.
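As an illustration of the zero-crossing idea, the following minimal Python sketch estimates the frequency of a clean sine tone from the average spacing of its rising zero crossings. The function name, the linear interpolation of crossing positions, and the test tone are assumptions made for this example and do not come from any particular implementation.

```python
import math

def zero_crossing_pitch(samples, sample_rate):
    """Estimate frequency from the mean spacing of rising zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation gives a sub-sample crossing position.
            crossings.append(i - 1 + (-prev) / (cur - prev))
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    # Average period in samples between the first and last crossing.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period

# Hypothetical test tone: a pure 220 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 220.0 * n / sample_rate + 0.3) for n in range(2048)]
zc_estimate = zero_crossing_pitch(tone, sample_rate)
```

For a pure tone such as this one the estimate should lie close to the true frequency; a waveform with strong overtones can cross zero more than once per period, in which case this sketch would report too high a value.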

More sophisticated approaches compare segments of the signal with other segments offset by a trial period to find a match. AMDF (average magnitude difference function), ASDF (average squared difference function), and the closely related autocorrelation all work this way. These algorithms can give quite accurate results for highly periodic signals. However, they are prone to false detections (often "octave errors"), can cope badly with noisy signals (depending on the implementation), and, in their basic implementations, do not deal with polyphonic sounds (which involve multiple musical notes of different pitches).
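A basic AMDF search can be sketched as follows: for each trial lag, the signal is compared with a copy of itself shifted by that lag, and the lag with the smallest average absolute difference is taken as the period. The function names, the search limits, and the test signal are assumptions chosen for illustration. The search range is deliberately set so that twice the true period falls outside it; a wider range would expose the octave ambiguity, since the difference is also near zero at multiples of the period.

```python
import math

def amdf(samples, lag):
    """Average magnitude difference between the signal and itself shifted by lag."""
    n = len(samples) - lag
    return sum(abs(samples[i] - samples[i + lag]) for i in range(n)) / n

def amdf_pitch(samples, sample_rate, min_hz, max_hz):
    """Return the frequency whose lag minimises the AMDF within the search range."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: amdf(samples, lag))
    return sample_rate / best_lag

# Hypothetical test signal: 200 Hz fundamental plus a second harmonic, 8 kHz sampling.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 200.0 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400.0 * n / sample_rate)
    for n in range(1024)
]
amdf_estimate = amdf_pitch(signal, sample_rate, min_hz=150.0, max_hz=500.0)
```

Because only whole-sample lags are tried, the resolution of this sketch is limited to frequencies of the form sample rate divided by an integer; practical detectors usually interpolate around the minimum.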

Current time-domain pitch detection algorithms tend to build upon the basic methods described above, with additional refinements that bring their results more in line with a human assessment of pitch. For example, the YIN algorithm and the MPM algorithm are both based upon autocorrelation.

Frequency-domain approaches

In the frequency domain, polyphonic detection is possible, usually by using the fast Fourier transform (FFT) to convert the signal to a frequency spectrum; however, the processing power required grows with the desired accuracy.

Popular frequency-domain algorithms include the harmonic product spectrum (HPS), cepstral analysis, and maximum likelihood, which attempts to match the frequency-domain characteristics to predefined frequency maps (useful for detecting the pitch of fixed-tuning instruments).
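The harmonic product spectrum can be illustrated with a short Python sketch: the magnitude spectrum is multiplied by copies of itself compressed by integer factors, so that the harmonics of a tone reinforce one another at the bin of the fundamental. To stay self-contained, the example uses a direct discrete Fourier transform in place of an FFT; the function names, the Hann window, the number of harmonics, and the test signal are all assumptions made for illustration.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Hann-windowed magnitude spectrum via a direct DFT (bins 0 .. n/2 - 1)."""
    n = len(samples)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    return [abs(sum(windowed[i] * twiddle[(k * i) % n] for i in range(n)))
            for k in range(n // 2)]

def hps_pitch(samples, sample_rate, harmonics=3):
    """Pick the bin that maximises the product of the spectrum at k, 2k, ..., harmonics*k."""
    mags = magnitude_spectrum(samples)
    limit = len(mags) // harmonics  # keeps k * harmonics inside the spectrum

    def product(k):
        result = 1.0
        for h in range(1, harmonics + 1):
            result *= mags[k * h]
        return result

    best_bin = max(range(1, limit), key=product)
    return best_bin * sample_rate / len(samples)

# Hypothetical test signal: 250 Hz tone whose second harmonic is stronger
# than its fundamental, sampled at 8 kHz.
sample_rate = 8000
weights = [0.3, 1.0, 0.6]
signal = [
    sum(w * math.sin(2 * math.pi * 250.0 * (h + 1) * t / sample_rate)
        for h, w in enumerate(weights))
    for t in range(512)
]
hps_estimate = hps_pitch(signal, sample_rate)
```

In this signal the largest spectral peak lies at 500 Hz, so simply picking the strongest bin would report the wrong octave, whereas the product favours the 250 Hz bin. The result is quantised to the bin spacing (sample rate divided by the number of samples).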

Partial-domain approaches

Whereas frequency-domain algorithms treat the spectrum as a discrete signal, partial-domain approaches consider only the sinusoidal components (partials) of a sound in order to determine its fundamental frequency. The sinusoidal components are often detected in the short-time Fourier transform (STFT) spectrum and then precisely estimated with techniques such as spectral reassignment (phase-based) or Grandke interpolation (magnitude-based).
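The magnitude-based refinement of a partial's frequency can be sketched as below. The example assumes a Hann window and uses the two-bin ratio formula commonly attributed to Grandke: if alpha is the ratio of the larger neighbouring bin to the peak bin, the fractional bin offset is (2 * alpha - 1) / (alpha + 1). The function names, the direct DFT (used in place of an FFT to keep the sketch self-contained), and the test tone are assumptions made for illustration, and only the estimation of a single partial is shown, not the subsequent determination of the fundamental.

```python
import cmath
import math

def hann_magnitudes(samples):
    """Hann-windowed magnitude spectrum via a direct DFT (bins 0 .. n/2 - 1)."""
    n = len(samples)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    return [abs(sum(windowed[i] * twiddle[(k * i) % n] for i in range(n)))
            for k in range(n // 2)]

def interpolated_peak_hz(samples, sample_rate):
    """Refine the strongest spectral peak using the ratio of two adjacent bins."""
    mags = hann_magnitudes(samples)
    k = max(range(1, len(mags) - 1), key=lambda b: mags[b])
    # Interpolate towards whichever neighbour is larger.
    direction = 1 if mags[k + 1] >= mags[k - 1] else -1
    alpha = mags[k + direction] / mags[k]
    delta = (2.0 * alpha - 1.0) / (alpha + 1.0)  # valid for a Hann window
    return (k + direction * delta) * sample_rate / len(samples)

# Hypothetical test tone: 440 Hz sine, 256 samples at 8 kHz (bin spacing 31.25 Hz).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
partial_estimate = interpolated_peak_hz(tone, sample_rate)
```

Without interpolation the nearest bin centre would be 437.5 Hz; the two-bin ratio moves the estimate much closer to the true frequency of the partial.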

A state-of-the-art algorithm in this category may be found in Mitre et al. (2006).

References

Mitre, Adriano; Queiroz, Marcelo; Faria, Régis (2006). Accurate and Efficient Fundamental Frequency Determination from Precise Partial Estimates. Proceedings of the 4th AES Brazil Conference, pp. 113-118.