Psychoacoustics
Psychoacoustics is the study of the subjective human perception of sound. Alternatively, it can be described as the study of the psychology of acoustical perception.
Background
In many applications of acoustics and audio signal processing it is necessary to know what humans actually hear. Sound, which consists of air pressure waves, can be accurately measured with sophisticated equipment. However, understanding how these waves are received and mapped into thoughts in the human brain is not trivial.
Recognizing the features that matter to perception lets scientists and engineers concentrate on audible features and ignore less important ones. What humans hear is not only a physiological question about the ear but also, to a large degree, a psychological one.
Limits of perception
The human ear can nominally hear sounds in the range 20 Hz to 20,000 Hz (20 kHz). This upper limit tends to decrease with age, most adults being unable to hear above 16 kHz. The ear itself does not respond to frequencies below 20 Hz, but these can be perceived via the body's sense of touch.
The frequency resolution of the ear in the middle range is about 2 Hz; that is, changes in pitch larger than 2 Hz can be perceived. Even smaller pitch differences can be perceived through other means. For example, when two tones of slightly different frequency sound together, their interference can often be heard as a slow periodic variation in loudness at the difference frequency. This effect, caused by the continually changing phase relationship between the two tones, is known as 'beating'.
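The beating effect can be illustrated with a short Python sketch. It assumes two pure sine tones of equal (unit) amplitude, an idealisation not stated in the text, and the function names are hypothetical. It relies on the identity sin a + sin b = 2 sin((a+b)/2) cos((a-b)/2), whose cosine factor is the slowly varying envelope.

```python
import math

def beat_frequency(f1_hz, f2_hz):
    """Rate (in Hz) at which the loudness of two summed tones rises and falls."""
    return abs(f1_hz - f2_hz)

def envelope(f1_hz, f2_hz, t_s):
    """Amplitude envelope of sin(2*pi*f1*t) + sin(2*pi*f2*t) at time t_s (seconds).

    Assumes both tones have unit amplitude and start in phase.
    """
    return abs(2 * math.cos(math.pi * (f1_hz - f2_hz) * t_s))

# Two tones 1 Hz apart: loudest at t = 0, cancelling half a second later.
print(beat_frequency(440, 441))
print(envelope(440, 441, 0.0), envelope(440, 441, 0.5))
```

With tones at 440 Hz and 441 Hz the envelope completes one loud-quiet cycle per second, which is how a 1 Hz difference becomes audible even though it is below the 2 Hz resolution quoted above.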
However, the ear's response to frequency is logarithmic. In other words, perceived pitch is approximately proportional to the logarithm of frequency, so equal steps in pitch correspond to equal frequency ratios rather than equal frequency differences. The 12-tone musical scale is an example of this; it evolved due to the way tones are perceived. When the fundamental frequency of a note or tone is multiplied by approximately $2^{1/12} \approx 1.0595$ (this factor is true on average, but varies slightly depending on the tuning), the result is the frequency of the next higher semitone. Going 12 notes higher, an octave, is the same as multiplying the frequency by $(2^{1/12})^{12} = 2$, which is the same as doubling the frequency.
As a consequence, the frequency resolution of the ear is best expressed in semitones, or in 'cents', where one cent is 1/100 of a semitone.
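The relationship between frequency, semitones and cents can be sketched in Python as follows. This assumes equal temperament, in which every semitone is exactly a ratio of 2^(1/12); other tunings deviate slightly. The function names are illustrative only.

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)  # assumed: equal temperament

def semitones_up(freq_hz, n):
    """Frequency reached by going n equal-tempered semitones above freq_hz."""
    return freq_hz * SEMITONE_RATIO ** n

def cents_between(f1_hz, f2_hz):
    """Interval from f1_hz to f2_hz in cents (1200 cents per octave)."""
    return 1200 * math.log2(f2_hz / f1_hz)

print(semitones_up(440.0, 12))        # one octave above 440 Hz
print(cents_between(1000.0, 1002.0))  # a 2 Hz step at 1 kHz, in cents
print(cents_between(100.0, 102.0))    # the same 2 Hz step at 100 Hz
```

The last two lines show why cents are the more natural unit: the same 2 Hz difference is a much larger musical interval at 100 Hz than at 1 kHz.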
The intensity range of audible sounds is enormous. The ear drum responds only to variations in sound pressure. The lower limit of audibility is defined as 0 dB, but the upper limit is not as clearly defined: it is better described as the level at which the ear is physically harmed or hearing may be permanently impaired. This limit also depends on the duration of exposure. The ear can tolerate short periods in excess of 120 dB without permanent harm, but long-term exposure to sound levels over 80 dB can cause permanent hearing loss.
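To give a sense of how wide this range is, the following Python sketch converts between sound pressure and decibels. It assumes the conventional reference pressure of 20 micropascals for 0 dB in air, a value the text does not state, and the function names are hypothetical.

```python
import math

P_REF_PA = 20e-6  # assumed reference pressure for 0 dB SPL, in pascals

def spl_db(pressure_pa):
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P_REF_PA)

def pressure_from_spl(level_db):
    """RMS sound pressure in pascals for a level given in dB SPL."""
    return P_REF_PA * 10 ** (level_db / 20)

# Ratio of sound pressures between 120 dB and the 0 dB limit of audibility.
print(pressure_from_spl(120) / pressure_from_spl(0))
```

Under this assumption, 120 dB corresponds to a sound pressure a million times larger than that at 0 dB.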
A more rigorous exploration of the lower limits of audibility shows that the minimum threshold at which a sound can be heard is frequency dependent. By measuring this minimum intensity for test tones of various frequencies, a frequency-dependent Absolute Threshold of Hearing (ATH) curve may be derived. Typically, the ear shows a peak of sensitivity (i.e., its lowest ATH) between 1 kHz and 5 kHz, though the threshold changes with age, with older ears showing decreased sensitivity above 2 kHz.
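The shape of an ATH curve can be sketched with a closed-form approximation. The formula below is a widely used fit attributed to Terhardt; it is an assumption brought in for illustration, not the measured data described above, and it represents a young listener with good hearing.

```python
import math

def ath_db_spl(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt-style fit).

    Illustrative only: real thresholds vary between listeners and with age.
    """
    f_khz = freq_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

# Scan the audible range in 100 Hz steps and find the most sensitive frequency.
most_sensitive_hz = min(range(100, 20001, 100), key=ath_db_spl)
print(most_sensitive_hz)

for f in (100, 1000, 4000, 16000):
    print(f, round(ath_db_spl(f), 1))
```

The approximation reproduces the qualitative behaviour in the text: the threshold is lowest between 1 kHz and 5 kHz and rises steeply toward both ends of the audible range.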
The ATH is the lowest of the equal-loudness contours. Equal-loudness contours indicate the sound pressure level (dB), over the range of audible frequencies, which are perceived as being of equal loudness. Equal-loudness contours were first measured by Fletcher and Munson at Bell Labs in 1933 using pure tones reproduced via headphones, and the data they collected are called Fletcher-Munson curves. Because subjective loudness was difficult to measure, the Fletcher-Munson curves were averaged over many subjects.
Robinson and Dadson refined the process in 1956 to obtain a new set of equal-loudness curves for a frontal sound source measured in an anechoic chamber. The Robinson-Dadson curves were standardized as ISO 226 in 1986. In 2003, ISO 226 was revised using data collected from 12 international studies.
Interpretation of sound
To a first approximation, human hearing works like a spectrum analyzer: the ear resolves the spectral content of the pressure wave without regard to the phase of the signal. In practice, though, some phase information can be perceived. Inter-aural phase difference, that is, the difference in phase of a sound as it arrives at the two ears, is a notable exception and provides a significant part of the directional sensation of sound. The filtering effects of head-related transfer functions provide another important directional cue.
Masking effects
In some situations an otherwise clearly audible sound can be masked by another sound. For example, conversation at a bus stop can become impossible while a loud bus is driving past. This phenomenon is called masking. A weaker sound is masked if it is made inaudible in the presence of a louder sound. Masking occurs because any loud sound distorts the Absolute Threshold of Hearing, making quieter, otherwise perceptible sounds inaudible.
If two sounds occur simultaneously and one is masked by the other, this is referred to as simultaneous masking. Simultaneous masking is also sometimes called frequency masking. The tonality of a sound partially determines its ability to mask other sounds. A sinusoidal masker, for example, requires a higher intensity to mask a noise-like maskee than a loud noise-like masker does to mask a sinusoid. Computer models which calculate the masking caused by sounds must therefore classify their individual spectral peaks according to their tonality.
Similarly, a weak sound emitted soon after the end of a louder sound is masked by the louder sound. Even a weak sound just before a louder sound can be masked by the louder sound. These two effects are called forward and backward temporal masking, respectively.
'Phantom' fundamentals
At the lower end of the ear's response, low notes can sometimes be heard when there is no sound at that frequency. This is due to the brain synthesising the low frequency sound from the differences of audible harmonics that are present. This effect is used in some commercial sound systems to give the impression of extended low frequency response when the system itself cannot reproduce that frequency adequately.
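A very simplified way to picture the phantom fundamental is sketched below in Python. It assumes the harmonics are exact integer frequencies in Hz and models the implied pitch as their greatest common divisor, which for consecutive harmonics equals the spacing between them. This is a toy model with hypothetical function names; actual pitch perception is considerably more complex.

```python
from functools import reduce
from math import gcd

def adjacent_differences(harmonics_hz):
    """Frequency differences between neighbouring harmonics, in Hz."""
    ordered = sorted(harmonics_hz)
    return [hi - lo for lo, hi in zip(ordered, ordered[1:])]

def implied_fundamental_hz(harmonics_hz):
    """Toy estimate of the phantom fundamental: GCD of integer harmonic frequencies."""
    return reduce(gcd, harmonics_hz)

# A small loudspeaker reproduces 200, 300 and 400 Hz but nothing at 100 Hz.
harmonics = [200, 300, 400]
print(adjacent_differences(harmonics))
print(implied_fundamental_hz(harmonics))
```

In this example no energy is present at 100 Hz, yet both the harmonic spacing and the common divisor point to a 100 Hz fundamental.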
Psychoacoustics in software
The psychoacoustic model provides for high quality lossy signal compression by describing which parts of a given digital audio signal can be removed (or aggressively compressed) safely, that is, without significant loss in the perceived quality of the sound.
It can explain how a sharp clap of the hands might seem painfully loud in a quiet library, but is hardly noticeable after a car backfires on a busy, urban street. This provides great benefit to the overall compression ratio, and psychoacoustic analysis routinely leads to compressed music files that are 1/10 to 1/12 the size of high quality original masters with very little discernible loss in quality. Such compression is a feature of nearly all modern audio compression formats. Some of these formats include MP3, Ogg Vorbis, WMA, Musicam (used for digital audio broadcasting in several countries) and ATRAC, the compression used in MiniDisc.
Psychoacoustics is based heavily on human anatomy, especially the ear's limitations in perceiving sound as outlined previously. To summarize, these limitations are:
- High frequency limit
- Absolute threshold of hearing
- Absolute threshold of pain
- Temporal masking
- Simultaneous masking
Given that the ear will not be at peak perceptive capacity when dealing with these limitations, a compression algorithm can assign a lower priority to sounds outside the range of human hearing. By carefully shifting bits away from the unimportant components and toward the important ones, the algorithm ensures that the sounds a listener can hear most clearly are of the highest quality.
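The idea of shifting bits toward audible components can be sketched in Python as below. This is a toy allocation scheme written for illustration, not the method of any particular codec: the function name, the per-band inputs and the proportional rule are all assumptions. Each band is given its signal level and an audibility threshold (from the absolute threshold of hearing or from masking), both in dB and supplied by the caller.

```python
def allocate_bits(levels_db, thresholds_db, total_bits):
    """Split total_bits across bands in proportion to how far each band's
    level exceeds its audibility threshold. Inaudible bands get no bits.

    Shares are rounded down, so a few bits may remain unassigned.
    """
    margins = [max(0.0, level - threshold)
               for level, threshold in zip(levels_db, thresholds_db)]
    total_margin = sum(margins)
    if total_margin == 0:
        # Nothing is audible: spend no bits at all.
        return [0] * len(margins)
    return [int(total_bits * margin / total_margin) for margin in margins]

# Three bands: the middle one lies below its threshold and is discarded.
print(allocate_bits([60, 10, 40], [20, 30, 20], 120))
```

Real encoders use more elaborate rules, but the principle is the same: bands that the listener cannot hear receive nothing, and the bit budget goes to the bands that stand out most clearly above their thresholds.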
Psychoacoustics and music
Psychoacoustics covers many topics and produces findings that are relevant to music, its composition and its performance. Some musicians, such as Benjamin Boretz, consider some or all of the results of psychoacoustics to be meaningful only in a musical context.
Applied psychoacoustics
Psychoacoustics is presently applied in many fields: in software development, where developers map proven and experimental mathematical patterns; in defense, where scientists have the capability to create new acoustic weapons, some of which emit frequencies that may impair, harm or kill (with very limited success[1]); and in music, where musicians and artists continue to create new sonic experiences that break with the traditional perception of sonic reality. Yet another application is to give listeners of small loudspeakers the impression that they hear low notes (see References).
See also
- Bark scale, Equivalent rectangular bandwidth (ERB), Mel scale and other scales
- Loudness, that is, perceived volume, Bel, sone
- Perception of non-existent sounds, such as missing fundamental frequency and other auditory illusions. Compare to telephone which transmits 300 Hz to 3400 Hz
- Auditory Scene Analysis incl. 3D-sound perception, localisation
- equal-loudness contour
- auditory illusions
- audio compression
- noise health effects
- speech recognition
- sound localization
- source separation
- musical tuning
- timbre
- rate-distortion theory
- Haas effect
- Sound Masking
References
- E. Larsen and R.M. Aarts (2004), Audio Bandwidth Extension: Application of Psychoacoustics, Signal Processing and Loudspeaker Design, J. Wiley.
- E. Larsen and R.M. Aarts (2002), Reproducing low-pitched signals through small loudspeakers, J. Audio Eng. Soc., March, 50 (3), pp. 147-164.
External links
- The musical ear - Perception of sound
- Applied psychoacoustics project - Perception of whole-body and hand-arm-vibration
- Applied psychoacoustics in space flight - Simulation of free field hearing by head phones
- GPSYCHO - an open source psycho-acoustic and noise shaping model for ISO based MP3 encoders.
- How audio codecs work - Psycoacoustics