Auditory masking occurs when the perception of one sound is affected by the presence of another sound.[1]
Simultaneous masking is when a sound is made inaudible by a "masker", a noise or unwanted sound of the same duration as the original sound.[2]
If two sounds of two different frequencies (pitches) are played at the same time, two separate sounds can often be heard rather than a combination tone. This ability is known as frequency resolution or frequency selectivity. It is thought to occur because of filtering within the cochlea, the hearing organ in the inner ear. A complex sound is split into different frequency components, and each component causes a peak in the pattern of vibration at a specific place along the basilar membrane within the cochlea. These components are then coded independently on the auditory nerve, which transmits sound information to the brain. This individual coding only occurs if the frequency components are different enough in frequency; otherwise they are coded at the same place and are perceived as one sound instead of two.[3]
The filters that distinguish one sound from another are called auditory filters, listening channels or critical bandwidths. It is thought that they line up along the basilar membrane, and that when a sound wave excites the cilia, the sound is filtered into the appropriate critical band depending on whether it is a high, mid or low frequency. Frequency resolution occurs on the basilar membrane as the listener chooses a filter centred on the frequency they expect to hear, the signal frequency. A sharply tuned filter has good frequency resolution because it passes frequencies near its centre frequency but not other frequencies (Pickles 1982). Damage to the cochlea, and to the outer hair cells in the cochlea, can impair the ability to tell sounds apart (Moore 1986). This explains why someone with a hearing loss due to cochlear damage would have more difficulty than a person with normal hearing in distinguishing between different consonants in speech.[4]
Masking illustrates the limits of frequency selectivity. If a signal is masked by a masker with a different frequency to the signal, then the auditory system is unable to distinguish between the two frequencies. By experimenting with conditions where one sound can mask a previously heard signal, the frequency selectivity of the auditory system can be tested.[5]
How effective the masker is at raising the threshold of the signal depends on the frequency of the signal and the frequency of the masker. The graphs in figure B are a series of masking patterns, also known as masking audiograms. Each graph shows the amount of masking produced by a masker at the frequency shown in its top corner: 250, 500, 1000 or 2000 Hz. For example, in the first graph the masker is presented at a frequency of 250 Hz at the same time as the signal. The amount by which the masker increases the threshold of the signal is plotted, and this is repeated for different signal frequencies, shown on the X axis. The frequency of the masker is kept constant. The masking effect is shown in each graph at various masker sound levels.
Figure B shows along the Y axis the amount of masking. The greatest masking is when the masker and the signal are the same frequency and this decreases as the signal frequency moves further away from the masker frequency.[1] This phenomenon is called on-frequency masking and occurs because the masker and signal are within the same auditory filter (figure C). This means that the listener cannot distinguish between them and they are perceived as one sound with the quieter sound masked by the louder one (figure D).
The amount by which the masker raises the threshold of the signal is much less in off-frequency masking, but it does have some masking effect because some of the masker overlaps into the auditory filter of the signal (figure E).[5]
Off-frequency masking requires the level of the masker to be greater in order to have a masking effect; this is shown in figure F. This is because only part of the masker overlaps into the auditory filter of the signal, so more masker is needed to cover the signal.[5]
The masking pattern changes depending on the frequency of the masker and the intensity (figure B). For low levels on the 1000 Hz graph, such as the 20–40 dB range, the curves are relatively parallel. As the masker intensity increases the curves separate, especially for signals at a frequency higher than the masker.[1] This shows that there is a spread of the masking effect upward in frequency as the intensity of the masker is increased. The curve is much shallower in the high frequencies than in the low frequencies. This flattening is called upward spread of masking and is why an interfering sound masks high frequency signals much better than low frequency signals.[1]
Figure B also shows that as the masker frequency increases, the masking patterns become increasingly compressed. This demonstrates that high frequency maskers are only effective over a narrow range of frequencies, close to the masker frequency. Low frequency maskers on the other hand are effective over a wide frequency range.[1]
Fletcher carried out an experiment to discover how much of a band of noise contributes to the masking of a tone. In the experiment, a fixed tone signal had various bandwidths of noise centred on it. The masked threshold was recorded for each bandwidth. His research showed that there is a critical bandwidth of noise which causes the maximum masking effect, and that energy outside that band does not affect the masking. This can be explained by the auditory system having an auditory filter which is centred over the frequency of the tone. The part of the masker that falls within this auditory filter effectively masks the tone, but the part of the masker outside the filter has no effect (figure G).
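The band-widening result can be illustrated with a minimal sketch. It assumes an idealised rectangular auditory filter and a flat noise spectrum, neither of which is stated in the text above. The function name, the 160 Hz critical bandwidth and the 40 dB spectrum level are hypothetical values chosen only for illustration.

```python
import math

def masked_threshold_db(noise_bandwidth_hz,
                        critical_bandwidth_hz=160.0,  # assumed, illustrative
                        spectrum_level_db=40.0,       # assumed noise level per Hz
                        k_db=0.0):                    # assumed detection offset
    """Masked threshold of a tone centred in a band of flat noise.

    Only the noise power that falls inside the (rectangular) auditory
    filter is counted, so widening the noise beyond the critical
    bandwidth adds no further masking.
    """
    effective_bw_hz = min(noise_bandwidth_hz, critical_bandwidth_hz)
    noise_power_db = spectrum_level_db + 10.0 * math.log10(effective_bw_hz)
    return noise_power_db + k_db

# Threshold rises with bandwidth, then stays flat past the critical bandwidth.
for bandwidth_hz in (20, 40, 80, 160, 320, 640):
    print(bandwidth_hz, round(masked_threshold_db(bandwidth_hz), 1))
```

In this sketch each doubling of the noise bandwidth raises the threshold by about 3 dB until the assumed critical bandwidth is reached, after which the threshold no longer changes.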
The critical band concept is used in MP3 files to reduce the size of audio files. Parts of the signals which are outside the critical bandwidth are cut out, leaving only the parts of the signals which are perceived by the listener.[6]
Another application of auditory masking in everyday situations is the cocktail party effect.
Varying intensity levels can also have an effect on masking. The lower end of the filter becomes flatter with increasing decibel level, whereas the higher end becomes slightly steeper (Moore 1998). Changes in slope of the high frequency side of the filter with intensity are less consistent than they are at low frequencies. At the medium frequencies (1–4 kHz) the slope increases as intensity increases, but at the low frequencies there is no clear trend with level, and the filters at high centre frequencies show a small decrease in slope with increasing level.[5] The sharpness of the filter depends on the input level and not the output level of the filter. The lower side of the auditory filter also broadens with increasing level.[5] These observations are illustrated in figure H.
Ipsilateral ("same side") masking is not the only condition where masking takes place. Another is contralateral ("other side") simultaneous masking, in which a signal that would be audible in one ear is deliberately made inaudible by applying a masker to the other ear.
The last situation where masking occurs is called central masking. This refers to the case where a masker causes a threshold elevation. This can be in the absence of, or in addition to, another effect and is due to interactions within the central nervous system between the separate neural inputs obtained from the masker and the signal.[1]
Experiments have been carried out to see the different masking effects when using a masker which is either in the form of a narrow band noise or a sinusoidal tone.
When a sinusoidal signal and a sinusoidal masker (tone) are presented simultaneously, the envelope of the combined stimulus fluctuates in a regular pattern described as beats. The rate at which the fluctuations occur equals the difference between the frequencies of the two sounds. If the frequency difference is small, the sound is perceived as a periodic change in the loudness of a single tone. If the beats are fast, they can be described as a sensation of roughness. When there is a large frequency separation, the two components are heard as separate tones without roughness or beats. Beats can be a cue to the presence of a signal even when the signal itself is not audible. The influence of beats can be reduced by using a narrowband noise rather than a sinusoidal tone for either signal or masker.[3]
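The relation between frequency difference and beat rate can be checked with a short sketch. It assumes two equal-amplitude sinusoids, for which sin(a) + sin(b) = 2 sin((a + b)/2) cos((a − b)/2), so the slowly varying cosine term forms the envelope. The function names and the example frequencies are illustrative and not taken from the cited sources.

```python
import math

def beat_rate_hz(f_signal_hz, f_masker_hz):
    """Number of envelope fluctuations per second for two simultaneous tones."""
    return abs(f_signal_hz - f_masker_hz)

def envelope(t_s, f_signal_hz, f_masker_hz):
    """Envelope at time t_s of the sum of two equal-amplitude sinusoids.

    The magnitude of the cosine term repeats every 1 / |f_signal - f_masker|
    seconds, which is why the beat rate equals the frequency difference.
    """
    return abs(2.0 * math.cos(math.pi * (f_signal_hz - f_masker_hz) * t_s))

# A 1000 Hz signal with a 1004 Hz masker: the loudness swells 4 times a second.
print(beat_rate_hz(1000.0, 1004.0))      # 4.0
print(envelope(0.0, 1000.0, 1004.0))     # 2.0, a maximum of the envelope
print(envelope(0.125, 1000.0, 1004.0))   # close to 0, a minimum of the envelope
```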
There are many different mechanisms of masking, one being suppression. This is when there is a reduction of a response to a signal due to the presence of another. This happens because the original neural activity caused by the first signal is reduced by the neural activity of the other sound.[7]
Addition occurs when several maskers combine to produce a final masked threshold greater than that of the original maskers individually (Lincoln 1998).
Combination tones are products of one or more signals and one or more maskers. They arise when the two sounds interact to produce a new sound, which can be more audible than the original signal. This is caused by the nonlinear distortion that happens in the ear.[5]
For example, the combination tone of two maskers can be a better masker than the two original maskers alone.[5]
The sounds interact in many ways depending on the difference in frequency between the two sounds. The most important two are cubic difference tones and quadratic difference tones.[5]
Cubic difference tones are given by
2F1 – F2
(F1 being the first frequency, F2 the second). These are audible most of the time, and especially when the level of the original tone is low. Hence they have a greater effect on psychoacoustic tuning curves than quadratic difference tones.
Quadratic difference tones are given by
F2 – F1
These occur at relatively high levels, and hence have a lesser effect on psychoacoustic tuning curves.[5]
Because combination tones are stimulus-like, behaving in a similar way to the original primary tones, they can interact with the primary tones to produce secondary combination tones. An example of this is
3F1 – 2F2
Secondary combination tones are again similar to the combination tones of the primary tone.[5]
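The combination-tone frequencies can be worked through numerically. The sketch below uses the standard definition of the cubic difference tone, 2F1 – F2, together with F2 – F1 and 3F1 – 2F2 from the text, and assumes F1 is the lower of the two primary frequencies. The function names and the 1000 Hz and 1200 Hz primaries are hypothetical choices for illustration.

```python
def cubic_difference_tone(f1_hz, f2_hz):
    """Cubic difference tone, 2*F1 - F2 (assumes f1_hz < f2_hz)."""
    return 2 * f1_hz - f2_hz

def quadratic_difference_tone(f1_hz, f2_hz):
    """Quadratic difference tone, F2 - F1 (assumes f1_hz < f2_hz)."""
    return f2_hz - f1_hz

def secondary_combination_tone(f1_hz, f2_hz):
    """The secondary combination tone given as an example, 3*F1 - 2*F2."""
    return 3 * f1_hz - 2 * f2_hz

f1_hz, f2_hz = 1000, 1200  # illustrative primary tones
print(cubic_difference_tone(f1_hz, f2_hz))       # 800
print(quadratic_difference_tone(f1_hz, f2_hz))   # 200
print(secondary_combination_tone(f1_hz, f2_hz))  # 600
```

With these primaries all three products fall below F1, so a listener or a masking experiment could be affected by energy at frequencies where neither primary tone is present.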
Off-frequency listening is when a listener chooses a filter centred just below the signal frequency to improve their auditory performance. This “off-frequency” filter reduces the level of the masker more than the level of the signal at the output of the filter, which means the listener can hear the signal more clearly, improving auditory performance.[2]
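The benefit of shifting the filter can be illustrated with a rough sketch. It assumes a rounded-exponential, roex(p), filter shape, which is one common model of the auditory filter and is not specified in the text above. It also assumes a tonal masker above the signal frequency at the same input level. The function names, the slope parameter p = 25 and all frequencies are illustrative assumptions.

```python
import math

def roex_weight(freq_hz, centre_hz, p=25.0):
    """Power weighting applied by an assumed roex(p) filter centred on centre_hz."""
    g = abs(freq_hz - centre_hz) / centre_hz  # normalised distance from the centre
    return (1.0 + p * g) * math.exp(-p * g)

def signal_to_masker_ratio(signal_hz, masker_hz, centre_hz, p=25.0):
    """Signal-to-masker power ratio at the filter output, for equal input levels."""
    return roex_weight(signal_hz, centre_hz, p) / roex_weight(masker_hz, centre_hz, p)

# 1000 Hz signal, 1100 Hz masker (assumed values).
on_frequency = signal_to_masker_ratio(1000.0, 1100.0, centre_hz=1000.0)
off_frequency = signal_to_masker_ratio(1000.0, 1100.0, centre_hz=950.0)
print(f"filter centred on the signal:    {on_frequency:.2f}")
print(f"filter centred below the signal: {off_frequency:.2f}")
```

Under these assumptions the filter centred at 950 Hz attenuates the signal slightly but attenuates the masker much more, so the signal-to-masker ratio at its output is higher than for the filter centred on the signal.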
Temporal masking or non-simultaneous masking is when the signal and masker are not presented at the same time. This can be split into forward masking and backward masking. Forward masking is when the masker is presented first and the signal follows it. Backward masking is when the signal precedes the masker.[5]
The effect of auditory masking is used in sound masking systems. These are audio systems that broadcast white noise for the purpose of hiding an unwanted sound. The unwanted noise may be intermittent sounds from machinery, people or other sources. Usually, the broadcast noise is filtered to provide the best effect of hiding the unwanted noise.
Spectral masking is a frequency-domain version of temporal masking, and tends to occur in sounds with similar frequencies: a powerful spike at 1 kHz will tend to mask out a lower-level tone at 1.1 kHz. This, too, can be exploited by a psychoacoustic model.