Sound localization
Sound localization refers to a listener's ability to identify the location or origin of a detected sound in direction and distance. It may also refer to the methods in acoustical engineering to simulate the placement of an auditory cue in a virtual 3D space (see binaural recording, wave field synthesis).
The sound localization mechanisms of the mammalian auditory system have been extensively studied. The auditory system uses several cues for sound source localization, including time- and level-differences between both ears, spectral information, timing analysis, correlation analysis, and pattern matching.
These cues are also used by other animals, but there may be differences in usage, and there are also localization cues which are absent in the human auditory system, such as the effects of ear movements. Animals with the ability to localize sound have a clear evolutionary advantage.
How sound reaches the brain
Sound is the perceptual result of mechanical vibrations traveling through a medium such as air or water. Through the mechanisms of compression and rarefaction, sound waves travel through the air, bounce off the pinna and concha of the exterior ear, and enter the ear canal. The sound waves vibrate the tympanic membrane (ear drum), causing the three bones of the middle ear to vibrate, which then sends the energy through the oval window and into the cochlea where it is changed into a chemical signal by hair cells in the organ of corti, which synapse onto spiral ganglion fibers that travel through the cochlear nerve into the brain.
Neural interactions
In vertebrates, inter-aural time differences are known to be calculated in the superior olivary nucleus of the brainstem. According to Jeffress,[1] this calculation relies on delay lines: neurons in the superior olive which accept innervation from each ear with different connecting axon lengths. Some cells are more directly connected to one ear than the other, thus they are specific for a particular inter-aural time difference. This theory is equivalent to the mathematical procedure of cross-correlation. However, because Jeffress' theory is unable to account for the precedence effect, in which only the first of multiple identical sounds is used to determine the sounds' location (thus avoiding confusion caused by echoes), it cannot be entirely used to explain the response. Furthermore, a number of recent physiological observations made in the midbrain and brainstem of small mammals have shed considerable doubt on the validity of Jeffress' original ideas [2]
Neurons sensitive to inter-aural level differences (ILDs) are excited by stimulation of one ear and inhibited by stimulation of the other ear, such that the response magnitude of the cell depends on the relative strengths of the two inputs, which in turn, depends on the sound intensities at the ears.
In the auditory midbrain nucleus, the inferior colliculus (IC), many ILD sensitive neurons have response functions that decline steeply from maximum to zero spikes as a function of ILD. However, there are also many neurons with much more shallow response functions that do not decline to zero spikes.
The cone of confusion
Most mammals are adept at resolving the location of a sound source using interaural time differences and interaural level differences. However, no such time or level differences exist for sounds originating along the circumference of circular conical slices, where the cone's axis lies along the line between the two ears.
Consequently, sound waves originating at any point along a given circumference slant height will have ambiguous perceptual coordinates. That is to say, the listener will be incapable of determining whether the sound originated from the back, front, top, bottom or anywhere else along the circumference at the base of a cone at any given distance from the ear. Of course, the importance of these ambiguities are vanishingly small for sound sources very close to or very far away from the subject, but it is these intermediate distances that are most important in terms of fitness.
These ambiguities can be removed by tilting the head, which can introduce a shift in both the amplitude and phase of sound waves arriving at each ear. This translates the vertical orientation of the interaural axis horizontally, thereby leveraging the mechanism of localization on the horizontal plane. Moreover, even with no alternation in the angle of the interaural axis (i.e. without tilting one's head) the hearing system can capitalize on interference patterns generated by pinnae, the torso, and even the temporary re-purposing of a hand as extension of the pinna (e.g., cupping one's hand around the ear).
As with other sensory stimuli, perceptual disambiguation is also accomplished through integration of multiple sensory inputs, especially visual cues. Having localized a sound within the circumference of a circle at some perceived distance, visual cues serve to fix the location of the sound. Moreover, prior knowledge of the location of the sound generating agent will assist in resolving its current location.
Sound localization by the human auditory system
Sound localization is the process of determining the location of a sound source. The brain utilizes subtle differences in intensity, spectral, and timing cues to allow us to localize sound sources.[3][4] Localization can be described in terms of three-dimensional position: the azimuth or horizontal angle, the elevation or vertical angle, and the distance (for static sounds) or velocity (for moving sounds).[5] The azimuth of a sound is signaled by the difference in arrival times between the ears, by the relative amplitude of high-frequency sounds (the shadow effect), and by the asymmetrical spectral reflections from various parts of our bodies, including torso, shoulders, and pinnae.[5] The distance cues are the loss of amplitude, the loss of high frequencies, and the ratio of the direct signal to the reverberated signal.[5] Depending on where the source is located, our head acts as a barrier to change the timbre, intensity, and spectral qualities of the sound, helping the brain orient where the sound emanated from.[4] These minute differences between the two ears are known as interaural cues.[4] Lower frequencies, with longer wavelengths, diffract the sound around the head forcing the brain to focus only on the phasing cues from the source.[4] Helmut Haas discovered that we can discern the sound source despite additional reflections at 10 decibels louder than the original wave front, using the earliest arriving wave front.[4] This principle is known as the Haas effect, a specific version of the precedence effect.[4] Haas measured down to even a 1 millisecond difference in timing between the original sound and reflected sound increased the spaciousness, allowing the brain to discern the true location of the original sound. The nervous system combines all early reflections into a single perceptual whole allowing the brain to process multiple different sounds at once.[6] The nervous system will combine reflections that are within about 35 milliseconds of each other and that have a similar intensity.[6]
Lateral information (left, ahead, right)
For determining the lateral input direction (left, front, right) the auditory system analyzes the following ear signal information:
- Interaural time differences (ITD)
Sound from the right side reaches the right ear earlier than the left ear. The auditory system evaluates interaural time differences from- Phase delays at low frequencies
- group delays at high frequencies
- Interaural level differences (ILD)
Sound from the right side has a higher level at the right ear than at the left ear, because the head shadows the left ear. These level differences are highly frequency dependent and they increase with increasing frequency.
For frequencies below 800 Hz, mainly interaural time differences are evaluated (phase delays), for frequencies above 1600 Hz mainly interaural level differences are evaluated. Between 800 Hz and 1600 Hz there is a transition zone, where both mechanisms play a role.
Localization accuracy is 1 degree for sources in front of the listener and 15 degrees for sources to the sides. Humans can discern interaural time differences of 10 microseconds or less.[7][8]
Evaluation for low frequencies
For frequencies below 800 Hz, the dimensions of the head (ear distance 21.5 cm, corresponding to an interaural time delay of 625 µs), are smaller than the half wavelength of the sound waves. So the auditory system can determine phase delays between both ears without confusion. Interaural level differences are very low in this frequency range, especially below about 200 Hz, so a precise evaluation of the input direction is nearly impossible on the basis of level differences alone. As the frequency drops below 80 Hz it becomes difficult or impossible to use either time difference or level difference to determine a sound's lateral source, because the phase difference between the ears becomes too small for a directional evaluation.
Evaluation for high frequencies
For frequencies above 1600 Hz the dimensions of the head are greater than the length of the sound waves. An unambiguous determination of the input direction based on interaural phase alone is not possible at these frequencies. However, the interaural level differences become larger, and these level differences are evaluated by the auditory system. Also, group delays between the ears can be evaluated, and is more pronounced at higher frequencies; that is, if there is a sound onset, the delay of this onset between the ears can be used to determine the input direction of the corresponding sound source. This mechanism becomes especially important in reverberant environments. After a sound onset there is a short time frame where the direct sound reaches the ears, but not yet the reflected sound. The auditory system uses this short time frame for evaluating the sound source direction, and keeps this detected direction as long as reflections and reverberation prevent an unambiguous direction estimation.[9]
The mechanisms described above cannot be used to differentiate between a sound source ahead of the hearer or behind the hearer; therefore additional cues have to be evaluated.[10]
Sound localization in the median plane (front, above, back, below)
Monaural cues
The human outer ear, i.e. the structures of the pinna and the external ear canal, form direction-selective filters. Depending on the sound input direction in the median plane, different filter resonances become active. These resonances implant direction-specific patterns into the frequency responses of the ears, which can be evaluated by the auditory system (directional bands) for vertical sound localization. Together with other direction-selective reflections at the head, shoulders and torso, they form the outer ear transfer functions.
These patterns in the ear's frequency responses are highly individual, depending on the shape and size of the outer ear. If sound is presented through headphones, and has been recorded via another head with different-shaped outer ear surfaces, the directional patterns differ from the listener's own, and problems will appear when trying to evaluate directions in the median plane with these foreign ears. As a consequence, front–back permutations or inside-the-head-localization can appear when listening to dummy head recordings, or otherwise referred to as binaural recordings.
It has been shown that human subjects can monaurally localize high frequency sound but not low frequency sound. Binaural localization, however, was possible with lower frequencies. This is likely due to the pinna being small enough to only interact with sound waves of high frequency.[11] It seems that people can only accurately localize the elevation of sounds that are complex and include frequencies above 7,000 Hz, and a pinna must be present.[12]
Dynamic binaural cues
When the head is stationary, the binaural cues for lateral sound localization (interaural time difference and interaural level difference) do not give information about the location of a sound in the median plane. Identical ITDs and ILDs can be produced by sounds at eye level or at any elevation, as long as the lateral direction is constant. However, if the head is rotated, the ITD and ILD change dynamically, and those changes are different for sounds at different elevations. For example, if an eye-level sound source is straight ahead and the head turns to the left, the sound becomes louder (and arrives sooner) at the right ear than at the left. But if the sound source is directly overhead, there will be no change in the ITD and ILD as the head turns. Intermediate elevations will produce intermediate degrees of change, and if the presentation of binaural cues to the two ears during head movement is reversed, the sound will be heard behind the listener.[10][13]
Hans Wallach[14] artificially altered a sound’s binaural cues during movements of the head. Although the sound was objectively placed at eye level, the dynamic changes to ITD and ILD as the head rotated were those that would be produced if the sound source had been elevated. In this situation, the sound was heard at the synthesized elevation. The fact that the sound sources objectively remained at eye level prevented monaural cues from specifying the elevation, showing that it was the dynamic change in the binaural cues during head movement that allowed the sound to be correctly localized in the vertical dimension. The head movements need not be actively produced; accurate vertical localization occurred in a similar setup when the head rotation was produced passively, by seating the blindfolded subject in a rotating chair. As long as the dynamic changes in binaural cues accompanied a perceived head rotation, the synthesized elevation was perceived.[10]
Distance of the sound source
The human auditory system has only limited possibilities to determine the distance of a sound source. In the close-up-range there are some indications for distance determination, such as extreme level differences (e.g. when whispering into one ear) or specific pinna (the visible part of the ear) resonances in the close-up range.
The auditory system uses these clues to estimate the distance to a sound source:
- Direct/ Reflection ratio: In enclosed rooms, two types of sound are arriving at a listener: The direct sound arrives at the listener's ears without being reflected at a wall. Reflected sound has been reflected at least one time at a wall before arriving at the listener. The ratio between direct sound and reflected sound can give an indication about the distance of the sound source.
- Loudness: Distant sound sources have a lower loudness than close ones. This aspect can be evaluated especially for well-known sound sources.
- Sound spectrum : High frequencies are more quickly damped by the air than low frequencies. Therefore a distant sound source sounds more muffled than a close one, because the high frequencies are attenuated. For sound with a known spectrum (e.g. speech) the distance can be estimated roughly with the help of the perceived sound.
- ITDG: The Initial Time Delay Gap describes the time difference between arrival of the direct wave and first strong reflection at the listener. Nearby sources create a relatively large ITDG, with the first reflections having a longer path to take, possibly many times longer. When the source is far away, the direct and the reflected sound waves have similar path lengths.
- Movement: Similar to the visual system there is also the phenomenon of motion parallax in acoustical perception. For a moving listener nearby sound sources are passing faster than distant sound sources.
- Level Difference: Very close sound sources cause a different level between the ears.
Signal processing
Sound processing of the human auditory system is performed in so-called critical bands. The hearing range is segmented into 24 critical bands, each with a width of 1 Bark or 100 Mel. For a directional analysis the signals inside the critical band are analyzed together.
The auditory system can extract the sound of a desired sound source out of interfering noise. So the auditory system can concentrate on only one speaker if other speakers are also talking (the cocktail party effect). With the help of the cocktail party effect sound from interfering directions is perceived attenuated compared to the sound from the desired direction. The auditory system can increase the signal-to-noise ratio by up to 15 dB, which means that interfering sound is perceived to be attenuated to half (or less) of its actual loudness.
Localization in enclosed rooms
In enclosed rooms not only the direct sound from a sound source is arriving at the listener's ears, but also sound which has been reflected at the walls. The auditory system analyses only the direct sound,[9] which is arriving first, for sound localization, but not the reflected sound, which is arriving later (law of the first wave front). So sound localization remains possible even in an echoic environment. This echo cancellation occurs in the Dorsal Nucleus of the Lateral Lemniscus (DNLL).
In order to determine the time periods, where the direct sound prevails and which can be used for directional evaluation, the auditory system analyzes loudness changes in different critical bands and also the stability of the perceived direction. If there is a strong attack of the loudness in several critical bands and if the perceived direction is stable, this attack is in all probability caused by the direct sound of a sound source, which is entering newly or which is changing its signal characteristics. This short time period is used by the auditory system for directional and loudness analysis of this sound. When reflections arrive a little bit later, they do not enhance the loudness inside the critical bands in such a strong way, but the directional cues become unstable, because there is a mix of sound of several reflection directions. As a result no new directional analysis is triggered by the auditory system.
This first detected direction from the direct sound is taken as the found sound source direction, until other strong loudness attacks, combined with stable directional information, indicate that a new directional analysis is possible. (see Franssen effect)
Animals
Since most animals have two ears, many of the effects of the human auditory system can also be found in other animals. Therefore interaural time differences (interaural phase differences) and interaural level differences play a role for the hearing of many animals. But the influences on localization of these effects are dependent on head sizes, ear distances, the ear positions and the orientation of the ears.
Lateral information (left, ahead, right)
If the ears are located at the side of the head, similar lateral localization cues as for the human auditory system can be used. This means: evaluation of interaural time differences (interaural phase differences) for lower frequencies and evaluation of interaural level differences for higher frequencies. The evaluation of interaural phase differences is useful, as long as it gives unambiguous results. This is the case, as long as ear distance is smaller than half the length (maximal one wavelength) of the sound waves. For animals with a larger head than humans the evaluation range for interaural phase differences is shifted towards lower frequencies, for animals with a smaller head, this range is shifted towards higher frequencies.
The lowest frequency which can be localized depends on the ear distance. Animals with a greater ear distance can localize lower frequencies than humans can. For animals with a smaller ear distance the lowest localizable frequency is higher than for humans.
If the ears are located at the side of the head, interaural level differences appear for higher frequencies and can be evaluated for localization tasks. For animals with ears at the top of the head, no shadowing by the head will appear and therefore there will be much less interaural level differences, which could be evaluated. Many of these animals can move their ears, and these ear movements can be used as a lateral localization cue.
Sound localization by odontocetes
Dolphins (and other odontocetes) rely on echolocation to aid in detecting, identifying, localizing, and capturing prey. Dolphin sonar signals are well suited for localizing multiple, small targets in a three‐dimensional aquatic environment by utilizing highly directional (3 dB beamwidth of about 10 deg), broadband (3 dB bandwidth typically of about 40 kHz; peak frequencies between 40 kHz and 120 kHz), short duration clicks (about 40 μs). Dolphins can localize sounds both passively and actively (echolocation) with a resolution of about 1 deg. Cross‐modal matching (between vision and echolocation) suggests dolphins perceive the spatial structure of complex objects interrogated through echolocation, a feat that likely requires spatially resolving individual object features and integration into a holistic representation of object shape. Although dolphins are sensitive to small, binaural intensity and time differences, mounting evidence suggests dolphins employ position‐dependent spectral cues derived from well developed head‐related transfer functions, for sound localization in both the horizontal and vertical planes. A very small temporal integration time (264 μs) allows localization of multiple targets at varying distances. Localization adaptations include pronounced asymmetry of the skull, nasal sacks, and specialized lipid structures in the forehead and jaws, as well as acoustically isolated middle and inner ears.
Sound localization in the median plane (front, above, back, below)
For many mammals there are also pronounced structures in the pinna near the entry of the ear canal. As a consequence, direction-dependent resonances can appear, which could be used as an additional localization cue, similar to the localization in the median plane in the human auditory system. There are additional localization cues which are also used by animals.
Head tilting
For sound localization in the median plane (elevation of the sound) also two detectors can be used, which are positioned at different heights. In animals, however, rough elevation information is gained simply by tilting the head, provided that the sound lasts long enough to complete the movement. This explains the innate behavior of cocking the head to one side when trying to localize a sound precisely. To get instantaneous localization in more than two dimensions from time-difference or amplitude-difference cues requires more than two detectors.
Localization with coupled ears (flies)
The tiny parasitic fly Ormia ochracea has become a model organism in sound localization experiments because of its unique ear. The animal is too small for the time difference of sound arriving at the two ears to be calculated in the usual way, yet it can determine the direction of sound sources with exquisite precision. The tympanic membranes of opposite ears are directly connected mechanically, allowing resolution of sub-microsecond time differences[15][16] and requiring a new neural coding strategy.[17] Ho[18] showed that the coupled-eardrum system in frogs can produce increased interaural vibration disparities when only small arrival time and sound level differences were available to the animal’s head. Efforts to build directional microphones based on the coupled-eardrum structure are underway.
Bi-coordinate sound localization (owls)
Most owls are nocturnal or crepuscular birds of prey. Because they hunt at night, they must rely on non-visual senses. Experiments by Roger Payne[19] have shown that owls are sensitive to the sounds made by their prey, not the heat or the smell. In fact, the sound cues are both necessary and sufficient for localization of mice from a distant location where they are perched. For this to work, the owls must be able to accurately localize both the azimuth and the elevation of the sound source.
See also
- Acoustic location
- Animal echolocation
- Binaural fusion
- Coincidence detection in neurobiology
- Human echolocation
- Perceptual-based 3D sound localization
- Psychoacoustics
- Spatial hearing loss
References
- ↑ Jeffress L.A. (1948). "A place theory of sound localization". Journal of Comparative and Physiological Psychology 41: 35–39. doi:10.1037/h0061495.
- ↑ Schnupp J., Nelken I & King A.J., 2011. Auditory Neuroscience, MIT Press, chapter 5.
- ↑ Blauert, J.: Spatial hearing: the psychophysics of human sound localization; MIT Press; Cambridge, Massachusetts (1983)
- 1 2 3 4 5 6 Thompson, Daniel M. Understanding Audio: Getting the Most out of Your Project or Professional Recording Studio. Boston, MA: Berklee, 2005. Print.
- 1 2 3 Roads, Curtis. The Computer Music Tutorial. Cambridge, MA: MIT, 2007. Print.
- 1 2 Benade, Arthur H. Fundamentals of Musical Acoustics. New York: Oxford UP, 1976. Print.
- ↑ Ian Pitt. "Auditory Perception". Archived from the original on 2010-04-10.
- ↑ DeLiang Wang; Guy J. Brown (2006). Computational auditory scene analysis: principles, algorithms and applications. Wiley interscience. ISBN 9780471741091.
For sinusoidal signals presented on the horizontal plane, spatial resolution is highest for sounds coming from the median plane (directly in front of the listener) with about 1 degree MAA, and it deteriorates markedly when stimuli are moved to the side – e.g., the MAA is about 7 degrees for sounds originating at 75 degrees to the side.
- 1 2 Wallach,, H; Newman, E.B.; Rosenzweig, M.R. (July 1949). "The precedence effect in sound localization". American Journal of Psychology 62 (3): 315–336. doi:10.2307/1418275.
- 1 2 3 Wallach, Hans (October 1940). "The role of head movements and vestibular and visual cues in sound localization". Journal of Experimental Psychology 27 (4): 339–368. doi:10.1037/h0054629.
- ↑ Robert A. BUTLER, Richard A. HUMANSKI (1992). "Localization of sound in the vertical plane with and without high-frequency spectral cues" (PDF). Perception & Psychophysics 51 (2): 182–186. doi:10.3758/bf03212242.
- ↑ Roffler Suzanne K., Butler Robert A. (1968). "Factors That Influence the Localization of Sound in the Vertical Plane". J. Acoust. Soc. Am. 43 (6): 1255–1259. doi:10.1121/1.1910976.
- ↑ Thurlow, W.R. "Audition" in Kling, J.W. & Riggs, L.A., Experimental Psychology, 3rd edition, Holt Rinehart & Winston, 1971, pp. 267–268.
- ↑ Wallach, H (1939). "On sound localization". Journal of the Acoustical Society of America 10 (4): 270–274. doi:10.1121/1.1915985.
- ↑ Miles RN, Robert D, Hoy RR (Dec 1995). "Mechanically coupled ears for directional hearing in the parasitoid fly Ormia ochracea". J Acoust Soc Am. 98 (6): 3059–70. doi:10.1121/1.413830. PMID 8550933.
- ↑ Robert D, Miles RN, Hoy RR (1996). "Directional hearing by mechanical coupling in the parasitoid fly Ormia ochracea". J Comp Physiol [A] 179 (1): 29–44. doi:10.1007/BF00193432. PMID 8965258.
- ↑ Mason AC, Oshinsky ML, Hoy RR (Apr 2001). "Hyperacute directional hearing in a microscale auditory system". Nature 410 (6829): 686–90. doi:10.1038/35070564. PMID 11287954.
- ↑ Ho CC, Narins PM (Apr 2006). "Directionality of the pressure-difference receiver ears in the northern leopard frog, Rana pipiens pipiens". J Comp Physiol [A] 192 (4): 417–29. doi:10.1007/s00359-005-0080-7.
- ↑ Payne, Roger S., 1962. How the Barn Owl Locates Prey by Hearing. The Living Bird, First Annual of the Cornell Laboratory of Ornithology, 151-159
External links
- auditoryneuroscience.com: Collection of multimedia files and flash demonstrations related to spatial hearing
- Collection of references about sound localization
- Scientific articles about the sound localization abilities of different species of mammals
- Interaural Intensity Difference Processing in Auditory Midbrain Neurons: Effects of a Transient Early Inhibitory Input
- Online learning center - Hearing and Listening
- HearCom:Hearing in the Communication Society, an EU research project
- Research on "Non-line-of-sight (NLOS) Localisation for Indoor Environments" by CMR at UNSW
- An introduction to sound localization
- Sound and Room
- An introduction to acoustic holography
- An introduction to acoustic beamforming
|