Echo cancellation
From Wikipedia, the free encyclopedia
The term echo cancellation is used in telephony to describe the process of removing echo from a voice communication in order to improve voice quality on a telephone call. In addition to improving quality, this process improves bandwidth savings achieved through silence suppression by preventing echo from traveling across a network.
There are two types of echo of relevance in telephony: acoustic echo and hybrid echo. Speech compression techniques and digital processing delay often make these echoes more severe in telephone networks.
Echo cancellation involves first recognizing the originally transmitted signal that re-appears, with some delay, in the transmitted or received signal. Once the echo is recognized, it can be removed by 'subtracting' it from the transmitted or received signal. This technique is generally implemented using a digital signal processor (DSP), but can also be implemented in software. Echo is controlled using either echo suppressors or echo cancellers, and in some cases both.
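The recognise-then-subtract idea can be sketched in a few lines of Python. This is a minimal illustration, not a production canceller: it assumes the echo is a single delayed, attenuated copy of the reference, and the function names, the test signal, and the values `TRUE_DELAY` and `TRUE_GAIN` are hypothetical choices made for the example.

```python
import random

def find_echo_delay(reference, received, max_delay):
    """Return the lag (in samples) at which `received` best correlates with `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(reference[n - lag] * received[n] for n in range(lag, len(received)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def estimate_gain(reference, received, lag):
    """Least-squares estimate of the echo's amplitude at the given lag."""
    num = sum(reference[n - lag] * received[n] for n in range(lag, len(received)))
    den = sum(reference[n - lag] ** 2 for n in range(lag, len(received)))
    return num / den if den else 0.0

def subtract_echo(reference, received, lag, gain):
    """Remove the estimated echo (delayed, scaled reference) from the received signal."""
    return [
        received[n] - (gain * reference[n - lag] if n >= lag else 0.0)
        for n in range(len(received))
    ]

# Hypothetical test signal: white noise echoed back 37 samples later at half amplitude.
rng = random.Random(0)
reference = [rng.uniform(-1, 1) for _ in range(2000)]
TRUE_DELAY, TRUE_GAIN = 37, 0.5
received = [
    TRUE_GAIN * reference[n - TRUE_DELAY] if n >= TRUE_DELAY else 0.0
    for n in range(len(reference))
]

lag = find_echo_delay(reference, received, max_delay=100)
gain = estimate_gain(reference, received, lag)
cleaned = subtract_echo(reference, received, lag, gain)
```

Real echo paths contain many delayed copies at different strengths, which is why practical cancellers estimate a whole filter rather than one delay and one gain.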
Acoustic echo
Acoustic echo arises when sound from a loudspeaker (for example, the earpiece of a telephone handset) is picked up by a microphone in the same room (for example, the microphone in the very same handset). The problem exists in any communications scenario where there is a speaker and a microphone. Examples of acoustic echo are found in everyday surroundings such as:
° Hands-free car phone systems
° A standard telephone in speakerphone or hands-free mode
° Conference phones such as Polycom's Soundstation
° Installed room systems which use ceiling speakers and microphones on the table
° Physical coupling (vibrations of the loudspeaker transfer to the microphone via the handset casing)
In most of these cases, direct sound from the loudspeaker (as opposed to sound from the person at the far end, otherwise referred to as the talker) enters the microphone almost unaltered. This is called direct acoustic path echo. The difficulties in cancelling acoustic echo stem from the alteration of the original sound by the ambient space, which colours the sound that re-enters the microphone. These changes can include the absorption of certain frequencies by soft furnishings and the reflection of different frequencies at varying strengths. These secondary reflections are not strictly echo, but reverberation.
Acoustic echo is heard by the far-end talkers in a conversation. If a person in Room A talks, they will hear their own voice bounce around in Room B. This sound needs to be cancelled, or it will be sent back to its origin. Because of the slight round-trip transmission delay, this acoustic echo is very distracting.
Acoustic Echo Cancellation
Since their invention at AT&T Bell Labs (1950), echo cancellation algorithms have been improved and honed. Like all echo cancelling processes, these first algorithms were designed to anticipate the signal which would inevitably re-enter the transmission path, and cancel it out.
The Acoustic Echo Cancellation (AEC) process is as follows:
Received sound is digitally sampled to form a reference signal →
This sound is then produced by the speaker →
The microphone picks up the resulting direct path sound, and consequent reverberant sound →
This is again digitally sampled →
The reference signal and echo signal are compared. In an ideal system these two are exactly the same →
The reference signal is summed with the echo signal at 180° out of phase. Again, in an ideal system this results in a perfect null →
This process continues for every sample.
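The ideal case described in the steps above, and the way it breaks down once the room alters the sound, can be illustrated with a short Python sketch. The 440 Hz tone, the 8 kHz sample rate, and the 3-sample delay and 0.6 attenuation of the "coloured" path are assumptions picked only for the demonstration.

```python
import math

def cancel(reference, mic):
    """Sum the microphone signal with the phase-inverted reference, sample by sample."""
    return [m + (-r) for r, m in zip(reference, mic)]

# Hypothetical reference: 20 ms of a 440 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
reference = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(160)]

# Ideal system: the microphone picks up exactly what was played.
ideal_mic = list(reference)
ideal_residual = cancel(reference, ideal_mic)

# Non-ideal system: the acoustic path delays the sound by 3 samples and attenuates it.
coloured_mic = [0.0] * 3 + [0.6 * r for r in reference[:-3]]
coloured_residual = cancel(reference, coloured_mic)

print(max(abs(x) for x in ideal_residual))     # perfect null in the ideal case
print(max(abs(x) for x in coloured_residual))  # clearly non-zero once the path alters the sound
```

The second result shows why a plain inverted copy of the reference is not enough in practice: the canceller has to subtract an estimate of what the room did to the signal, not the signal itself.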
Challenges for AEC (Acoustic Echo Cancellation)
There are two main issues that echo cancellers must deal with. The first is the changes and additions to the original signal caused by imperfections of the loudspeaker, microphone, reverberant space and physical coupling. The second is the changing nature of those changes. See below.
Signal Colouration
The first problem is dealt with by modelling the acoustic space in the time and frequency domains. AEC (Acoustic Echo Cancellation) algorithms approximate the result of the next sample by comparing the difference between the current and previous samples. To simplify: a sound is sampled before the speaker and after the microphone, and the two samples are then compared for differences in frequency content and for frequencies that persist longer than they did in the original sample. This can be visualised with a Fourier transform. The resulting information is used to predict how the next sound will be altered by the acoustic path. The model of the acoustic space is therefore continually updated. Updates are not applied instantly, but occur over half a second or so.
Older echo cancellation systems required training with an impulse or pink noise, and some used this as the only model of the acoustic space. Later systems used this training only as a starting point, and the canceller then adapted from that point on. Modern systems can 'converge' from nothing to 55 dB of cancellation in around 200 ms.
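One common way to keep such a model continually updated is an adaptive filter. The sketch below uses the normalised least-mean-squares (NLMS) algorithm, a standard technique chosen here for illustration; the text above does not say which algorithm any particular product uses. The tap count, step size, and the `ROOM` impulse response are hypothetical values.

```python
import random

def nlms_cancel(reference, mic, taps=8, step=0.5, eps=1e-8):
    """Adaptively model the echo path and return (residual signal, final filter weights)."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for r, m in zip(reference, mic):
        history = [r] + history[:-1]
        predicted_echo = sum(w * x for w, x in zip(weights, history))
        error = m - predicted_echo                    # what is left after cancellation
        energy = sum(x * x for x in history) + eps    # normalisation term
        weights = [w + step * error * x / energy for w, x in zip(weights, history)]
        residual.append(error)
    return residual, weights

# Hypothetical echo path: a few reflections of different strength and sign.
ROOM = [0.0, 0.6, 0.0, -0.3, 0.1]
rng = random.Random(1)
reference = [rng.uniform(-1, 1) for _ in range(4000)]
mic = [
    sum(h * reference[n - k] for k, h in enumerate(ROOM) if n - k >= 0)
    for n in range(len(reference))
]

residual, weights = nlms_cancel(reference, mic)
early = sum(e * e for e in residual[:500])
late = sum(e * e for e in residual[-500:])
```

The `step` parameter controls how quickly the model follows changes: a smaller value spreads each update over more samples, which corresponds to the gradual adaptation described above.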
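Figures such as 55 dB and 200 ms can be made concrete with a small calculation. The helper names below are hypothetical, and the 8 kHz sample rate is an assumption (a typical narrowband telephony rate), not something stated above.

```python
import math

def cancellation_db(echo, residual):
    """Echo power before cancellation relative to the power left afterwards, in dB."""
    echo_power = sum(x * x for x in echo) / len(echo)
    residual_power = sum(x * x for x in residual) / len(residual)
    if residual_power == 0:
        return math.inf
    return 10 * math.log10(echo_power / residual_power)

def samples_in(duration_ms, sample_rate_hz):
    """Number of samples the canceller sees in the given time."""
    return round(sample_rate_hz * duration_ms / 1000)

# A residual whose amplitude is about 1/562 of the echo corresponds to 55 dB.
echo = [1.0, -1.0] * 100
residual = [x * 10 ** (-55 / 20) for x in echo]
depth = cancellation_db(echo, residual)

# At an assumed 8 kHz sample rate, 200 ms is 1600 samples of adaptation time.
budget = samples_in(200, 8000)
```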
Changes in Colouration
The changing nature of the sampled signal is mainly due to changes in the acoustic environment, not to the characteristics of the loudspeaker, microphone or physical coupling. These changes come from objects moving in the environment, and from movement of the microphone within that environment. When a door is closed or opened, a chair is pulled in closer to the table, or drapes are opened or closed, the reverberation of the sound in the space changes. For this reason, the cancellation algorithm also has a degree of aggressive adaptation called Non-Linear Processing. This allows the algorithm to make changes to the model of the acoustic path that are suggested, but not yet confirmed, by comparison of the two signals.
NLP (Non-Linear Processing)
Due to the pre-emptive characteristic of NLP (Non-Linear Processing) algorithms, they are quickly overused. As more NLP is applied, the chance of overcancelling rises, resulting in echo. This may seem counterintuitive, but if the initial sample is +35, and the echo cancellation applied is -42, the result is not a perfect cancellation of 0, it is -7. This resulting echo may be small and out of phase, but it is still echo. Under/over compensation also occurs independent of NLP, but is less audible when it does.
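The arithmetic in this example can be checked directly. The sketch below uses the same sample values as the text; the function names are hypothetical.

```python
import math

def residual_after_cancellation(echo_sample, cancellation_sample):
    """What remains when the cancellation signal is summed with the echo."""
    return echo_sample + cancellation_sample

def residual_level_db(echo_sample, residual_sample):
    """Level of the residual relative to the original echo, in dB."""
    return 20 * math.log10(abs(residual_sample) / abs(echo_sample))

perfect = residual_after_cancellation(35, -35)   # 0: exact cancellation
over = residual_after_cancellation(35, -42)      # -7: overcancelled, inverted residual
under = residual_after_cancellation(35, -28)     # 7: undercancelled residual

# The -7 residual is one fifth of the original echo, roughly 14 dB below it.
over_level = residual_level_db(35, over)
```

Over- and under-cancelling by the same amount leave residuals of equal size; only the polarity differs.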
Full Bandwidth Cancellation
Fortunately for engineers and telcos, until recently echo cancellation only needed to apply to the voice bandwidth of telephone circuits. PSTN calls carry frequencies from approximately 300 Hz to 3.4 kHz, because this is enough for human speech to be intelligible.
Videoconferencing is one area where full bandwidth audio is transmitted and received. In this case, specialised products such as ClearOne XAP or Polycom Vortex units are employed to perform echo cancellation.
Hybrid echo
Hybrid echo is generated by the public switched telephone network (PSTN) through the reflection of electrical energy by a device called a hybrid (hence the term hybrid echo). Most telephone lines are two-wire circuits while transmission facilities are four-wire; the hybrid converts between the two, and any impedance mismatch at that point reflects part of the signal back towards its source. Each hybrid produces echoes in both directions, though the far-end echo is usually a greater problem for voiceband.
Drawbacks
Echo suppression may have the side-effect of removing valid signals from the transmission. This can cause audible signal loss that is called "clipping." Ideally, then, echo cancellation alone would be used. However, this is insufficient in many applications, notably software phones. Here, echo cancellation and suppression can work in conjunction to achieve acceptable performance.
Modems
Echo control on voice-frequency data calls that use dial-up modems may cause data corruption. Some telephone devices disable echo suppression or echo cancellation when they detect the 2100 or 2225 Hz "answer" tones associated with such calls, in accordance with ITU-T Recommendation G.164 or G.165.
In the 1990s most echo cancellation was done inside modems of type V.32 and later. In voiceband modems this allowed the same frequencies to be used in both directions simultaneously, greatly increasing the data rate. As part of connection negotiation, each modem sent line probe signals, measured the echoes, and set up its delay lines. Echoes in this case did not include long echoes caused by acoustic coupling, but did include short echoes caused by impedance mismatches in the two-wire connection to the telephone exchange.
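Detecting a single tone of known frequency is commonly done with the Goertzel algorithm, and the sketch below shows how a device might look for the 2100 or 2225 Hz tones. It is only an energy check under assumed parameters (8 kHz sampling, a 50 ms window, a 0.5 threshold); it does not implement the timing or other detection requirements of G.164 or G.165, and the function names are hypothetical.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the component at `freq`, via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def is_answer_tone(samples, sample_rate=8000, tones=(2100.0, 2225.0), threshold=0.5):
    """True if most of the window's energy sits at one of the answer-tone frequencies."""
    total_energy = sum(x * x for x in samples)
    if total_energy == 0:
        return False
    n = len(samples)
    # For a pure tone at `freq`, this ratio is close to 1; for other signals it is near 0.
    return any(
        2 * goertzel_power(samples, sample_rate, f) / (n * total_energy) >= threshold
        for f in tones
    )

def tone(freq, n=400, sample_rate=8000):
    """Hypothetical test signal: n samples of a sine wave at `freq`."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

disable_echo_control = is_answer_tone(tone(2100))   # answer tone present
keep_echo_control = not is_answer_tone(tone(1000))  # ordinary in-band tone
```

A detector along these lines would let echo control be bypassed for the duration of a data call, which is the behaviour the paragraph above describes.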
After the turn of the century DSL modems also made extensive use of echo cancellation, though they used separate incoming and outgoing frequencies. Frequencies beyond voiceband were often damaged by impedance mismatch, including narrow frequency gaps that were unusable. These were detected and mapped out during connection negotiation.