'The term echo cancellation' is used in telephony to describe the process of removing echo from a voice communication in order to improve voice quality on a telephone call. In addition to improving subjective quality, this process increases the capacity achieved through silence suppression by preventing echo from traveling across a network.
Two sources of echo have primary relevance in telephony: acoustic echo and hybrid echo.
Echo cancellation involves first recognizing the originally transmitted signal that re-appears, with some delay, in the transmitted or received signal. Once the echo is recognized, it can be removed by 'subtracting' it from the transmitted or received signal. This technique is generally implemented using a digital signal processor (DSP), but can also be implemented in software. Echo cancellation is done using either echo suppressors or echo cancellers, or in some cases both.
Contents |
In telephony, "echo" is very much like what one would experience yelling in a canyon. Echo is the reflected copy of the voice heard some time later and a delayed version of the original. On a telephone, if the delay is fairly significant (more than a few hundred milliseconds), it is considered annoying. If the delay is very small (10's of milliseconds or less), the phenomenon is called sidetone and while not objectionable to humans, can interfere with the communication between data modems.
In the earlier days of telecommunications, echo suppression was used to reduce the objectionable nature of echos to human users. In essence these devices rely upon the fact that most telephone conversations are half-duplex. That is one person speaks while the other listens. An echo suppressor attempts to determine which is the primary direction and allows that channel to go forward. In the reverse channel, it places attenuation to block or "suppress" any signal on the assumption that the signal is echo. Naturally, such a device is not perfect. There are cases where both ends are active, and other cases where one end replies faster than an echo suppressor can switch directions to keep the echo attenuated but allow the remote talker to reply without attenuation.
Echo cancellers are the replacement for earlier echo suppressors that were initially developed in the 1950s to control echo caused by the long delay on satellite telecommunications circuits. Initial echo canceller theory was developed at AT&T Bell Labs in the 1960s,[1] but the first commercial echo cancellers were not deployed until the late 1970s owing to the limited capability of the electronics of the era. The concept of an echo canceller is to synthesize an estimate of the echo from the talker's signal, and subtract that synthesis from the return path instead of switching attenuation into/out of the path. This technique requires adaptive signal processing to generate a signal accurate enough to effectively cancel the echo, where the echo can differ from the original due to various kinds of degradation along the way.
Rapid advances in the implementation of digital signal processing allowed echo cancellers to be made smaller and more cost-effective. In the 1990s, echo cancellers were implemented within voice switches for the first time (in the Northern Telecom DMS-250) rather than as standalone devices. The integration of echo cancellation directly into the switch meant that echo cancellers could be reliably turned on or off on a call-by-call basis, removing the need for separate trunk groups for voice and data calls. Today's telephony technology often employs echo cancellers in small or handheld communications devices via a software voice engine, which provides cancellation of either acoustic echo or the residual echo introduced by a far-end PSTN gateway system; such systems typically cancel echo reflections with up to 64 milliseconds delay.
Voice messaging and voice response systems which accept speech for caller input use echo cancellation while speech prompts are played to prevent the systems own speech recognition from falsely recognizing the echoed prompts.
Acoustic echo arises when sound from a loudspeaker—for example, the earpiece of a telephone handset—is picked up by the microphone in the same room—for example, the mic in the very same handset. The problem exists in any communications scenario where there is a speaker and a microphone. Examples of acoustic echo are found in everyday surroundings such as:
In most of these cases, direct sound from the loudspeaker (not the person at the far end, otherwise referred to as the Talker) enters the microphone almost unaltered. This is called direct acoustic path echo. The difficulties in cancelling acoustic echo stem from the alteration of the original sound by the ambient space. This colours the sound that re-enters the microphone. These changes can include certain frequencies being absorbed by soft furnishings, and reflection of different frequencies at varying strength. These secondary reflections are not strictly referred to as echo, but rather are "reverb".
Acoustic echo is heard by the far end talkers in a conversation. So if a person in Room A talks, they will hear their voice bounce around in Room B. This sound needs to be cancelled, or it will get sent back to its origin. Due to the slight round-trip transmission delay, this acoustic echo is very distracting.
Since invention at AT&T Bell Labs[1] echo cancellation algorithms have been improved and honed. Like all echo cancelling processes, these first algorithms were designed to anticipate the signal which would inevitably re-enter the transmission path, and cancel it out.
The acoustic echo cancellation (AEC) process works as follows:
The primary challenge for an echo canceller is determining the nature of the filtering to be applied to the far-end signal such that it resembles the resultant near-end signal. The filter is essentially a model of the speaker, microphone and the room's acoustical attributes.
To configure the filter, early echo cancellation systems required training with impulse or pink noise, and some used this as the only model of the acoustic space. Later systems used this training only as a basis to start from, and the canceller then adapted from that point on. By using the far-end signal as the stimulus, modern systems use an adaptive filter and can 'converge' from nothing to 55 dB of cancellation in around 200 ms.
Until recently echo cancellation only needed to apply to the voice bandwidth of telephone circuits. PSTN calls transmit frequencies between 300 Hz and 3 kHz, the range required for human speech intelligibility. Videoconferencing is one area where full bandwidth audio is transceived. In this case, specialised products are employed to perform echo cancellation.
Hybrid echo is generated by the public switched telephone network (PSTN) through the reflection of electrical energy by a device called a hybrid (hence the term hybrid echo). Most telephone local loops are two-wire circuits while transmission facilities are four-wire circuits. Each hybrid produces echoes in both directions, though the far end echo is usually a greater problem for voiceband.
Echo suppression may have the side-effect of removing valid signals from the transmission. This can cause audible signal loss that is called "clipping" in telephony, but the effect is more like a "squelch" than amplitude clipping. In an ideal situation then, echo cancellation alone will be used. However this is insufficient in many applications, notably software phones on networks with long delay and meager throughput. Here, echo cancellation and suppression can work in conjunction to achieve acceptable performance.
Echo control on voice-frequency data calls that use dial-up modems may cause data corruption. Some telephone devices disable echo suppression or echo cancellation when they detect the 2100 or 2225 Hz "answer" tones associated with such calls, in accordance with ITU-T recommendation G.164 or G.165.
In the 1990s most echo cancellation was done inside modems of type v.32 and later. In voiceband modems this allowed using the same frequencies in both directions simultaneously, greatly increasing the data rate. As part of connection negotiation, each modem sent line probe signals, measured the echoes, and set up its delay lines. Echoes in this case did not include long echoes caused by acoustic coupling, but did include short echoes caused by impedance mismatches in the 2-wire local loop to the telephone exchange.
After the turn of the century, DSL modems also made extensive use of automated echo cancellation. Though they used separate incoming and outgoing frequencies, these frequencies were beyond the voiceband for which the cables were designed, and often suffered attenuation distortion due to bridge taps and incomplete impedance matching. Deep, narrow frequency gaps often resulted, that could not be made usable by echo cancellation. These were detected and mapped out during connection negotiation.