Psychoacoustics is the study of subjective human perception of sound. Alternatively, it can be described as the study of the psychological correlates of the physical parameters of acoustics.
Hearing is not a purely mechanical phenomenon of wave propagation, but is also a sensory and perceptual event. When a person hears something, that something arrives at the ear as a mechanical sound wave traveling through the air, but within the ear it is transformed into neural action potentials. These nerve pulses then travel to the brain where they are perceived. Hence, in many problems in acoustics, such as for audio processing, it is advantageous to take into account not just the mechanics of the environment, but also the fact that both the ear and the brain are involved in a person’s listening experience.
The inner ear, for example, does significant signal processing in converting sound waveforms into neural stimuli, so certain differences between waveforms may be imperceptible.[1] MP3 and other audio compression techniques exploit this fact.[2] In addition, the ear has a nonlinear response to sounds of different loudness levels. Telephone networks and audio noise reduction systems exploit this by nonlinearly compressing data samples before transmission and then expanding them for playback.[3] Another consequence of the ear's nonlinear response is that sounds close in frequency produce phantom beat notes, or intermodulation distortion products.[4]
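As an illustrative sketch of such nonlinear companding, the following implements μ-law compression of the kind used in North American telephone networks (the μ = 255 value is the common convention; this is a simplified continuous form, not the exact quantized codec):

```python
import math

MU = 255  # mu-law parameter conventionally used in North American telephony

def mu_law_compress(x, mu=MU):
    """Nonlinearly compress a sample in [-1, 1] before transmission,
    giving quiet sounds more resolution, matching the ear's nonlinear
    response to loudness."""
    return math.copysign(math.log(1 + mu * abs(x)) / math.log(1 + mu), x)

def mu_law_expand(y, mu=MU):
    """Invert the compression for playback."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)
```

Note how a quiet sample such as 0.01 is mapped to roughly 0.23 of full scale, so it survives coarse quantization far better than it would under linear coding.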
There are also true psychoacoustic effects introduced by the brain. For example, a person listening to crackly, hiss-filled vinyl records soon stops noticing the background noise and simply enjoys the music. A habitual listener appears to forget about the noise altogether and may be unable to say afterwards whether noise was present. This effect is called psychoacoustic masking. The brain's ability to perform such masking has been important for the adoption of a number of technologies, though in the age of digital signaling and high-fidelity playback the effect is typically used to hide compression losses rather than to cover up analog noise.

As another example of a psychoacoustic effect, the brain appears to use a correlative process for pattern recognition, much as electronic circuits do when searching for signal patterns. When the threshold for accepting a correlative match is very low, a person may perceive a sought-after pattern in pure noise or in sounds that only partly suggest it, as the brain fills in the rest of the pattern. This is a psychoacoustic phantom effect. For example, a radio operator straining to hear a weak Morse code signal in a noisy background often perceives the pitch of faint dots and dashes even when they are not present. In general, psychoacoustic phantom effects play an important role in any environment where people have heightened perceptions, such as when danger may be near. (There is an analogous visual effect experienced by people standing watch in very dark places.) The psychoacoustic phantom effect is conceptually distinct from hallucination, in which the brain generates perceptions on its own, and it is also distinct from the physiological-acoustic phantom effect.
The human ear can nominally hear sounds in the range 20 Hz to 20,000 Hz (20 kHz). This upper limit tends to decrease with age; most adults are unable to hear above 16 kHz. The ear itself does not respond to frequencies below 20 Hz, but these can be perceived via the body's sense of touch. Some recent research has also demonstrated a hypersonic effect: although sounds above 20 kHz cannot be consciously heard, they can still have an effect on the listener.
Frequency resolution of the ear is, in the middle range, about 2 Hz; that is, changes in pitch larger than 2 Hz can be perceived. However, even smaller pitch differences can be perceived through other means. For example, the interference of two tones close in frequency can be heard as a periodic fluctuation in loudness at the difference frequency. This effect is known as 'beating'.
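Beating can be checked numerically with a small sketch (the 440/442 Hz pair is an arbitrary illustration): by a trigonometric identity, the sum of two close tones equals a single tone at the mean frequency whose amplitude is modulated at half the difference frequency, so loudness fluctuates at |f1 − f2| beats per second:

```python
import math

f1, f2 = 440.0, 442.0  # two tones 2 Hz apart (illustrative frequencies)

def two_tone(t):
    """The physical signal: a plain sum of two sinusoids."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def product_form(t):
    """Equivalent form: a (f1+f2)/2 carrier tone whose amplitude is
    modulated by a slow cosine envelope, perceived as beating."""
    return 2 * math.cos(math.pi * (f1 - f2) * t) * math.sin(math.pi * (f1 + f2) * t)

beat_rate = abs(f1 - f2)  # loudness fluctuations per second: 2 Hz
```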
The semitone scale used in Western musical notation is not a linear frequency scale but logarithmic. Other scales have been derived directly from experiments on human hearing perception, such as the mel scale and Bark scale (these are used in studying perception, but not usually in musical composition), and these are approximately logarithmic in frequency at the high-frequency end, but nearly linear at the low-frequency end.
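Several analytic forms of the mel scale have been published; the sketch below uses one common variant (the 2595·log₁₀ formula), which is roughly linear below about 1 kHz and logarithmic above it, matching the behavior described above:

```python
import math

def hz_to_mel(f):
    """One common form of the mel scale: approximately linear at low
    frequencies, logarithmic at high frequencies."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, back from mels to hertz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction, 1000 Hz comes out close to 1000 mels, the scale's conventional anchor point.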
The "intensity" range of audible sounds is enormous. Our eardrums are sensitive only to variations in sound pressure, but can detect pressure changes as small as 2×10⁻¹⁰ atm and as great as, or greater than, 1 atm. For this reason, sound pressure level is also measured logarithmically, with all pressures referenced to 1.97385×10⁻¹⁰ atm (20 µPa). The lower limit of audibility is therefore defined as 0 dB, but the upper limit is not as clearly defined. While 1 atm (191 dB) is the largest pressure variation an undistorted sound wave can have in Earth's atmosphere, larger sound waves can be present in other atmospheres, or on Earth in the form of shock waves. The upper limit is more a question of the level at which the ear will be physically harmed or put at risk of a hearing disability, and this limit also depends on the duration of exposure. The ear can be exposed to short periods in excess of 120 dB without permanent harm, albeit with discomfort and possibly pain, but long-term exposure to sound levels over 80 dB can cause permanent hearing loss.
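The logarithmic scale can be sketched directly; the 20 µPa reference below is the same quantity as the 1.97385×10⁻¹⁰ atm figure above:

```python
import math

P_REF = 2e-5       # reference pressure: 20 micropascals, the nominal
                   # threshold of hearing at 1 kHz
P_ATM = 101325.0   # 1 standard atmosphere, in pascals

def spl_db(pressure_pa):
    """Sound pressure level in dB relative to the threshold of hearing."""
    return 20.0 * math.log10(pressure_pa / P_REF)
```

An undistorted sine wave peaking at 1 atm has an RMS pressure of 1 atm/√2, which this formula places near the 191 dB figure quoted above.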
A more rigorous exploration of the lower limits of audibility determines that the minimum threshold at which a sound can be heard is frequency dependent. By measuring this minimum intensity for testing tones of various frequencies, a frequency dependent Absolute Threshold of Hearing (ATH) curve may be derived. Typically, the ear shows a peak of sensitivity (i.e., its lowest ATH) between 1 kHz and 5 kHz, though the threshold changes with age, with older ears showing decreased sensitivity above 2 kHz.
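One widely cited analytic approximation to the ATH curve is Terhardt's formula, used in early perceptual coders; the sketch below reproduces it (frequency in Hz, result in dB SPL), showing the dip in the 1–5 kHz region of peak sensitivity:

```python
import math

def ath_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing.
    The curve is lowest near the ear's most sensitive region (roughly
    1-5 kHz) and rises steeply toward both ends of the audible range."""
    f = f_hz / 1000.0  # work in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)
```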
The ATH is the lowest of the equal-loudness contours. Equal-loudness contours indicate the sound pressure level (dB), over the range of audible frequencies, which are perceived as being of equal loudness. Equal-loudness contours were first measured by Fletcher and Munson at Bell Labs in 1933 using pure tones reproduced via headphones, and the data they collected are called Fletcher-Munson curves. Because subjective loudness was difficult to measure, the Fletcher-Munson curves were averaged over many subjects.
Robinson and Dadson refined the process in 1956 to obtain a new set of equal-loudness curves for a frontal sound source measured in an anechoic chamber. The Robinson-Dadson curves were standardized as ISO 226 in 1986. In 2003, ISO 226 was revised using equal-loudness data collected from twelve international studies.
In some situations an otherwise clearly audible sound can be masked by another sound. For example, conversation at a bus stop can be completely impossible while a loud bus drives past. This phenomenon is called masking: a weaker sound is masked if it is made inaudible in the presence of a louder sound. Masking occurs because a loud sound raises the effective threshold of hearing in its vicinity, making quieter, otherwise perceptible sounds inaudible.
If two sounds occur simultaneously and one is masked by the other, this is referred to as simultaneous masking. Simultaneous masking is also sometimes called frequency masking. The tonality of a sound partially determines its ability to mask other sounds. A sinusoidal masker, for example, requires a higher intensity to mask a noise-like maskee than a loud noise-like masker does to mask a sinusoid. Computer models which calculate the masking caused by sounds must therefore classify their individual spectral peaks according to their tonality.
Similarly, a weak sound emitted soon after the end of a louder sound is masked by the louder sound. Even a weak sound just before a louder sound can be masked by the louder sound. These two effects are called forward and backward temporal masking, respectively.
Low pitches can sometimes be heard when there is no apparent source or component of that frequency. This perception is due to the brain interpreting repetition patterns determined by the differences of audible harmonics that are present.[5] A harmonic series of pitches related as 2×f, 3×f, 4×f, 5×f, etc., gives human hearing the psychoacoustic impression that the pitch 1×f is present. This phenomenon is used by some pro audio manufacturers to allow sound systems to seem to produce notes that are lower in pitch than they are capable of reproducing.[6][7]
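The repetition-rate explanation can be checked numerically: a signal containing only the harmonics 2f through 5f, with no energy at f itself, still repeats with period 1/f, the repetition rate the brain interprets as the missing fundamental (the 110 Hz value here is an arbitrary illustration):

```python
import math

F0 = 110.0  # the "missing" fundamental; it never appears in the signal

def harmonics_only(t):
    """Sum of harmonics 2*F0 .. 5*F0 only; no component at F0."""
    return sum(math.sin(2 * math.pi * k * F0 * t) for k in range(2, 6))

# Every harmonic k*F0 completes a whole number of cycles in 1/F0 seconds,
# so the combined waveform is periodic with period 1/F0.
```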
The psychoacoustic model provides for high-quality lossy signal compression by describing which parts of a given digital audio signal can be removed (or aggressively compressed) safely, that is, without significant loss in the (consciously) perceived quality of the sound.
It can explain how a sharp clap of the hands might seem painfully loud in a quiet library, but is hardly noticeable after a car backfires on a busy, urban street. This provides great benefit to the overall compression ratio, and psychoacoustic analysis routinely leads to compressed music files that are 1/10 to 1/12 the size of high quality original masters with very little discernible loss in quality. Such compression is a feature of nearly all modern audio compression formats. Some of these formats include MP3, Ogg Vorbis, AAC, WMA, MPEG-1 Layer II (used for digital audio broadcasting in several countries) and ATRAC, the compression used in MiniDisc and Walkman.
Psychoacoustics is based heavily on human anatomy, especially the ear's limitations in perceiving sound as outlined previously. To summarize, these limitations are:

- The ear has a limited frequency range, nominally 20 Hz to 20 kHz.
- Sounds quieter than the absolute threshold of hearing at a given frequency are inaudible.
- Quieter sounds can be masked, simultaneously or temporally, by louder sounds.
Given that the ear will not be at peak perceptive capacity when dealing with these limitations, a compression algorithm can assign a lower priority to sounds outside the range of human hearing. By carefully shifting bits away from the unimportant components and toward the important ones, the algorithm ensures that the sounds a listener is most likely to perceive are of the highest quality.
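This bit-shifting idea can be sketched as a toy greedy allocator (the 6 dB-per-bit rule and the per-band signal-to-mask ratios below are illustrative assumptions, not any real codec's tables): each bit goes to the band where quantization noise would currently be most audible.

```python
def allocate_bits(smr_db, total_bits):
    """Greedy perceptual bit allocation sketch.

    smr_db: per-band signal-to-mask ratio in dB; positive values mean
    quantization noise would poke above the masking threshold.
    Each extra bit is assumed to lower noise by ~6 dB."""
    need = list(smr_db)        # remaining audible-noise estimate per band
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        worst = max(range(len(need)), key=lambda i: need[i])
        if need[worst] <= 0:   # all noise already below the mask: stop
            break
        bits[worst] += 1
        need[worst] -= 6.0     # ~6 dB noise reduction per added bit
    return bits
```

With bands at SMRs of 12, 3, and −5 dB and a budget of 4 bits, the allocator spends two bits on the first band, one on the second, and none on the already-masked third, leaving one budget bit unused.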
Psychoacoustics include topics and studies which are relevant to music psychology. Theorists such as Benjamin Boretz consider some of the results of psychoacoustics to be meaningful only in a musical context.
Psychoacoustics is presently applied within many fields. In software development, developers map proven and experimental mathematical patterns. In digital signal processing, many audio compression codecs such as MP3 use a psychoacoustic model to increase compression ratios. It informs the design of (high-end) audio systems for accurate reproduction of music in theatres and homes, as well as defense research, where scientists have experimented, with limited success, with acoustic weapons that emit frequencies intended to impair, harm, or kill (see [1]). It is also applied today within music, where musicians and artists continue to create new auditory experiences by masking unwanted frequencies of instruments so that other frequencies are enhanced. Yet another application is the design of small or lower-quality loudspeakers, which use the missing-fundamental phenomenon to give the effect of low-frequency bass notes that the system, due to frequency limitations, cannot actually reproduce (see references).