Harmonic pitch class profiles

Harmonic pitch class profiles (HPCP) are a group of features that a computer program extracts from an audio signal. They are based on the pitch class profile, a descriptor proposed in the context of a chord recognition system.[1] HPCP is an enhanced pitch distribution feature. It is a sequence of feature vectors that, to a certain extent, describe tonality by measuring the relative intensity of each of the 12 pitch classes of the equal-tempered scale within an analysis frame. The twelve pitch spelling attributes are often referred to as chroma, and HPCP features are closely related to what are called chroma features or chromagrams.

By processing musical signals, software can extract HPCP features and use them to estimate the key of a piece,[2] to measure similarity between two musical pieces (cover version identification),[3] to perform content-based audio retrieval (audio matching),[4] to extract the musical structure (audio structure analysis),[5] and to classify music by composer, genre or mood. Computing HPCP relies on time-frequency analysis of the signal. In general, chroma features are fairly robust to noise (e.g., ambient noise or percussive sounds) and largely independent of timbre, instrumentation, loudness and dynamics.
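As an illustration of the first application, key estimation, a global (song-level) 12-bin HPCP can be compared against a tonal profile for every major and minor key, choosing the key with the highest correlation. The sketch below is a minimal example under stated assumptions: bin 0 is taken to be A (matching a 440 Hz reference), the profiles are the Krumhansl–Kessler probe-tone ratings rather than necessarily those used in [2], and the names `estimate_key`, `MAJOR`, `MINOR` and `NAMES` are hypothetical.

```python
import numpy as np

# Krumhansl-Kessler key profiles (assumed here); index 0 is the tonic.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
# Assumed bin layout: bin 0 is the pitch class of the 440 Hz reference (A).
NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def estimate_key(global_hpcp):
    """Return (tonic, mode, correlation) for the best-matching key profile."""
    h = np.asarray(global_hpcp, dtype=float)
    best = None
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        for tonic in range(12):
            # np.roll moves the profile's tonic weight to pitch class `tonic`.
            score = np.corrcoef(h, np.roll(profile, tonic))[0, 1]
            if best is None or score > best[2]:
                best = (NAMES[tonic], mode, score)
    return best
```

For example, an HPCP shaped exactly like the major profile shifted to bin 3 is reported as C major with a correlation of 1. Correlation rather than a plain dot product is used so that the overall level of the HPCP does not affect the decision.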

HPCPs are tuning independent, so the reference frequency can differ from the standard A 440 Hz, and they take the presence of harmonic frequencies into account. The result of HPCP computation is an octave-independent histogram of 12, 24, or 36 bins, depending on the desired resolution, representing the relative intensity in each semitone, half semitone, or third of a semitone across the 12 semitones of the equal-tempered scale.
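The bin layout can be made concrete with a small helper that folds a frequency into one octave relative to a reference frequency and quantizes it to the chosen resolution. This is a minimal sketch: `hpcp_bin` is a hypothetical name, bin 0 is assumed to be the pitch class of the reference frequency, and the frequency is simply rounded to the nearest bin, whereas an actual HPCP spreads each contribution over neighbouring bins with a weighting window.

```python
import math


def hpcp_bin(freq_hz, size=36, ref_hz=440.0):
    """Map a frequency to an HPCP bin index in 0..size-1 (bin 0 = ref pitch class)."""
    if size % 12 != 0:
        raise ValueError("size must be a multiple of 12, e.g. 12, 24 or 36")
    # Distance from the reference in octaves, scaled to bins and folded into one octave.
    return round(size * math.log2(freq_hz / ref_hz)) % size
```

With `size=12`, middle C (about 261.63 Hz) falls in bin 3, three semitones above A; with `size=36` it falls in bin 9. Passing a different `ref_hz` (for example 442.0) is how the tuning independence shows up in this sketch.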

General HPCP feature extraction procedure

Fig.1 General HPCP feature extraction block diagram

The block diagram of the procedure is shown in Fig.1,[3] and the procedure is described in more detail by Gomez.[6]

The general HPCP feature extraction procedure is summarized as follows:

  1. Input the musical signal.
  2. Perform spectral analysis to obtain the frequency components of the signal.
  3. Use the Fourier transform, applied frame by frame as a short-time Fourier transform, to convert the signal into a spectrogram. (The short-time Fourier transform is a type of time-frequency analysis.)
  4. Apply frequency filtering: only frequencies between 100 and 5000 Hz are used.
  5. Perform peak detection: only the local maxima of the spectrum are considered.
  6. Compute the reference frequency by estimating its deviation from 440 Hz.
  7. Map the peaks to pitch classes with respect to the estimated reference frequency. This step determines the pitch class value from the frequency values using a weighting scheme based on a cosine function. It also accounts for harmonic frequencies (harmonic summation), taking into account a total of 8 harmonics for each frequency. For a resolution of one third of a semitone, the pitch class distribution vector must have 36 bins.
  8. Normalize each frame by dividing by its maximum value, which removes the dependence on global loudness. The result is an HPCP sequence like the one in Fig.2.
Fig.2 Example of a high-resolution HPCP sequence
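The steps above can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the reference implementation described in [6]: the squared-cosine window of 4/3 semitone, the harmonic decay factor of 0.6 and the use of squared peak magnitudes are assumed values; step 6 (reference-frequency estimation) is omitted and `ref_hz` is taken as given; and `hpcp_frame` and `hpcp_sequence` are hypothetical names. Bin 0 of each vector corresponds to the pitch class of `ref_hz`.

```python
import numpy as np


def hpcp_frame(frame, sr, size=36, ref_hz=440.0, fmin=100.0, fmax=5000.0,
               n_harmonics=8, decay=0.6, window_semitones=4 / 3):
    """Compute one HPCP vector from one audio frame (defaults are assumptions)."""
    # Steps 2-3: magnitude spectrum of one Hann-windowed frame.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

    # Step 4: keep only the 100-5000 Hz band.
    in_band = (freqs >= fmin) & (freqs <= fmax)
    # Step 5: keep only local maxima of the spectrum.
    is_peak = np.zeros_like(in_band)
    is_peak[1:-1] = (spec[1:-1] > spec[:-2]) & (spec[1:-1] >= spec[2:])
    peaks = np.flatnonzero(in_band & is_peak)

    # Step 6 is not shown here: ref_hz is assumed to be known.
    # Step 7: pitch class mapping with a squared-cosine window and harmonic summation.
    bins = np.arange(size)
    half = window_semitones / 2
    hpcp = np.zeros(size)
    for k in peaks:
        for h in range(1, n_harmonics + 1):
            # Treat the peak as harmonic h of a fundamental at freqs[k] / h.
            pos = size * np.log2(freqs[k] / h / ref_hz)
            # Circular distance from every bin centre, converted to semitones.
            d = ((pos - bins + size / 2) % size - size / 2) * 12 / size
            w = np.where(np.abs(d) <= half, np.cos(np.pi / 2 * d / half) ** 2, 0.0)
            hpcp += w * decay ** (h - 1) * spec[k] ** 2

    # Step 8: divide by the frame maximum to remove loudness dependence.
    peak_value = hpcp.max()
    return hpcp / peak_value if peak_value > 0 else hpcp


def hpcp_sequence(signal, sr, frame_len=4096, hop=2048, **kwargs):
    """Split the signal into frames and stack one HPCP vector per frame."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([hpcp_frame(signal[s:s + frame_len], sr, **kwargs) for s in starts])
```

For one second of a 440 Hz sine sampled at 44.1 kHz, `hpcp_sequence` returns a 20 × 36 array whose maximum in every frame is 1 and lies in bin 0. Because each peak also contributes to the pitch classes of its subharmonics, a single tone produces smaller secondary values (for example around D, the pitch class of the third subharmonic), which is the expected side effect of harmonic summation.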

System of measuring similarity between two songs

Fig.3 System of measuring similarity between two songs

Once the HPCP feature has been extracted, the pitch-class content of the signal in each time frame is known. The HPCP feature has been used in many research papers to compute the similarity between two songs, and one such system is shown in Fig.3. First, time-frequency analysis is used to extract the HPCP features of both songs. Next, a global HPCP is computed for each song and used to bring the two songs to a common key, so that they are compared on the same basis. The two HPCP sequences are then used to construct a binary similarity matrix. The Smith–Waterman algorithm, a dynamic-programming local alignment method, is applied to this matrix to build a local alignment matrix H. Finally, after post-processing, the distance between the two songs is computed.
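The comparison stage can be sketched as follows, assuming both songs are already available as HPCP sequences (NumPy arrays of shape frames × bins). This is a minimal illustration, not the exact method of [3]: the global HPCP is taken as the mean frame; the transposition is the circular shift that maximizes the dot product of the two global HPCPs; a pair of frames is marked similar when no shift matches them better than the zero shift (the exact rule in [3] may differ); the Smith–Waterman scores (match +1, mismatch −0.9, gap 0.7) are illustrative values; and post-processing is replaced by simply taking the largest value of H. All function names are hypothetical.

```python
import numpy as np


def global_hpcp(seq):
    """Average HPCP over all frames, normalized to a maximum of 1."""
    g = seq.mean(axis=0)
    return g / g.max()


def oti(a, b):
    """Circular shift of b (in bins) that best matches a, by dot product."""
    return int(np.argmax([np.dot(a, np.roll(b, k)) for k in range(len(a))]))


def binary_similarity(seq_a, seq_b):
    """S[i, j] = 1 when frame j of B needs no shift to best match frame i of A."""
    S = np.zeros((len(seq_a), len(seq_b)), dtype=int)
    for i, a in enumerate(seq_a):
        for j, b in enumerate(seq_b):
            S[i, j] = int(oti(a, b) == 0)
    return S


def smith_waterman(S, match=1.0, mismatch=-0.9, gap=0.7):
    """Local alignment matrix H computed over a binary similarity matrix S."""
    H = np.zeros((S.shape[0] + 1, S.shape[1] + 1))
    for i in range(1, H.shape[0]):
        for j in range(1, H.shape[1]):
            diag = H[i - 1, j - 1] + (match if S[i - 1, j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            H[i, j] = max(0.0, diag, H[i - 1, j] - gap, H[i, j - 1] - gap)
    return H


def song_similarity(seq_a, seq_b):
    """Transpose B to A's key, then return the best local alignment score."""
    shift = oti(global_hpcp(seq_a), global_hpcp(seq_b))
    seq_b = np.roll(seq_b, shift, axis=1)
    return smith_waterman(binary_similarity(seq_a, seq_b)).max()
```

If song B is an exact copy of a 20-frame song A transposed by five bins, the global-HPCP shift undoes the transposition (a shift of 7 in the 12-bin case) and the score reaches its maximum of 20, one point per aligned frame. A higher score therefore indicates a longer, cleaner shared passage, and a distance can be derived from it in post-processing.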


References

  1. Fujishima, T. (1999). "Realtime chord recognition of musical sound: a system using Common Lisp Music". Proceedings of the International Computer Music Conference (ICMC), Beijing, China, pp. 464–467.
  2. Gomez, E.; Herrera, P. (2004). "Estimating the tonality of polyphonic audio files: cognitive versus machine learning modelling strategies". ISMIR 2004 – 5th International Conference on Music Information Retrieval.
  3. Serra, J.; Gomez, E.; Herrera, P.; Serra, X. (August 2008). "Chroma binary similarity and local alignment applied to cover song identification".
  4. Müller, M.; Kurth, F.; Clausen, M. (2005). "Audio matching via chroma-based statistical features". Proceedings of the International Conference on Music Information Retrieval, pp. 288–295.
  5. Paulus, J.; Müller, M.; Klapuri, A. (2010). "Audio-based music structure analysis". Proceedings of the International Conference on Music Information Retrieval, pp. 625–636.
  6. Gomez, E. (2004). "Tonal description of polyphonic audio for music content processing". INFORMS Journal on Computing, Special Cluster on Music Computing, E. Chew (guest editor).
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.