Vocoder

A vocoder (a portmanteau of vox/voc (voice) and encoder) is a speech analyzer and synthesizer. It was originally developed as a speech coder for telecommunications applications in the 1930s, the idea being to code speech for transmission. Its primary use in this fashion is for secure radio communication, where voice has to be digitized, encrypted and then transmitted on a narrow, voice-bandwidth channel. The vocoder has also been used extensively as an electronic musical instrument.

The vocoder is related to, but essentially different from, the computer algorithm known as the "phase vocoder".

Whereas the vocoder analyzes speech, transforms it into electronically transmitted information, and recreates it, the voder (from Voice Operating Demonstrator) generates synthesized speech by means of a console with fifteen touch-sensitive keys and a foot pedal. It basically consists of the "second half" of the vocoder, but with manual filter controls, and needs a highly trained operator.[1]

Early 1970s vocoder, custom built by electronic music band Kraftwerk

How a vocoder works

Vocoder theory

The human voice consists of sounds generated by the opening and closing of the glottis by the vocal cords, which produces a periodic waveform with many harmonics. This basic sound is then filtered by the nose and throat (a complicated resonant piping system) to produce differences in harmonic content (formants) in a controlled way, creating the wide variety of sounds used in speech. There is another set of sounds, known as the unvoiced and plosive sounds, which are not modified by the mouth in the same fashion.

The vocoder examines speech by finding this basic carrier wave, which is at the fundamental frequency, and measuring how its spectral characteristics are changed over time by recording someone speaking. This results in a series of numbers representing these modified frequencies at any particular time as the user speaks. In doing so, the vocoder dramatically reduces the amount of information needed to store speech, from a complete recording to a series of numbers. To recreate speech, the vocoder simply reverses the process, creating the fundamental frequency in an oscillator, then passing it through a stage that filters the frequency content based on the originally recorded series of numbers.
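Finding the fundamental frequency is the first step of the analysis described above. The following is a minimal sketch of one common way to do it, autocorrelation over a single frame. The function name `estimate_f0`, the 60 to 400 Hz search range and the synthetic test frame are assumptions made for illustration and are not taken from any particular vocoder design.

```python
import math

def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation."""
    lag_min = int(sample_rate / f_max)   # shortest period considered
    lag_max = int(sample_rate / f_min)   # longest period considered
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Hypothetical test frame: a 125 Hz tone with two harmonics, sampled at 8 kHz.
rate = 8000
frame = [
    math.sin(2 * math.pi * 125 * n / rate)
    + 0.5 * math.sin(2 * math.pi * 250 * n / rate)
    + 0.25 * math.sin(2 * math.pi * 375 * n / rate)
    for n in range(800)
]
f0 = estimate_f0(frame, rate)
print(f0)
```

The lag with the strongest self-similarity is taken as the pitch period. Real speech is noisier than this test frame, so practical systems usually add a voiced/unvoiced decision and some smoothing across frames.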

Early vocoders

Most analog vocoder systems use a number of frequency channels, each tuned to a different frequency with a band-pass filter. The values measured in these channels are stored not as raw numbers, which all depend on the original fundamental frequency, but as a series of modifications needed to turn that fundamental into the signal seen at the output of each filter. During playback these settings are sent back into the filters and the filter outputs are added together, modified with the knowledge that speech typically varies between these frequencies in a fairly linear way. The result is recognizable speech, although somewhat "mechanical" sounding. Vocoders also often include a second system for generating unvoiced sounds, which uses a noise generator instead of the fundamental frequency.
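The filter-bank idea can be illustrated digitally. The sketch below is an assumption-laden toy, not a model of any historical device: it uses a standard second-order band-pass section per channel, a rectifier with a one-pole smoother as the envelope follower, and a pulse train standing in for the oscillator at the fundamental frequency. The function names, the four channel frequencies and the filter Q are all hypothetical choices.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=4.0):
    """Second-order band-pass filter with unity gain at the center frequency."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this filter
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, cutoff_hz=30.0):
    """Rectify, then smooth with a one-pole low-pass to track band energy."""
    k = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, level = [], 0.0
    for x in signal:
        level += k * (abs(x) - level)
        out.append(level)
    return out

def channel_vocoder(modulator, carrier, sample_rate, centers):
    """Impose the modulator's per-band envelopes on the carrier and sum the bands."""
    output = [0.0] * len(carrier)
    for fc in centers:
        env = envelope(bandpass(modulator, fc, sample_rate), sample_rate)
        band = bandpass(carrier, fc, sample_rate)
        for i, (e, c) in enumerate(zip(env, band)):
            output[i] += e * c
    return output

rate = 8000
n = 4000
# Stand-in for speech: a steady 500 Hz tone (illustrative only).
modulator = [math.sin(2 * math.pi * 500 * i / rate) for i in range(n)]
# Stand-in for the pitch oscillator: a 100 Hz pulse train.
carrier = [1.0 if i % 80 == 0 else 0.0 for i in range(n)]
centers = [250, 500, 1000, 2000]
vocoded = channel_vocoder(modulator, carrier, rate, centers)
```

Only the slowly varying envelopes would need to be transmitted or stored, which is where the data reduction comes from. Replacing the pulse train with random noise gives a rough equivalent of the unvoiced-sound path.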

The first experiments with a vocoder were conducted in 1928 by Bell Labs engineer Homer Dudley, who eventually patented it in 1935. Dudley's vocoder was used in the SIGSALY system, which was built by Bell Labs engineers (Alan Turing was briefly involved) in 1943. The SIGSALY system was used for encrypted high-level communications during World War II. Later work in this field has been conducted by James Flanagan.

Linear prediction-based vocoders

Since the late 1970s, most non-musical vocoders have been implemented using linear prediction, whereby the target signal's spectral envelope (its formant structure) is estimated by an all-pole IIR filter. In linear prediction coding, the all-pole filter replaces the bandpass filter bank of its predecessor: its inverse is used at the encoder to whiten the signal (i.e., flatten the spectrum), and the all-pole filter itself is used at the decoder to re-apply the spectral shape of the target speech signal. In contrast with vocoders realized using bandpass filter banks, the location of the linear predictor's spectral peaks is entirely determined by the target signal and need not be harmonic, i.e., a whole-number multiple of the basic frequency.
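The coefficients of the all-pole model are commonly estimated from the signal's autocorrelation with the Levinson-Durbin recursion. The following is a minimal sketch of that approach, not the procedure of any specific standard. The function names and the second-order test signal are assumptions chosen so that the result can be checked by hand.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    return [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    The prediction is x_hat[n] = sum over k of a[k] * x[n - k].
    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        reflection = acc / error
        new_a = a[:]
        new_a[m] = reflection
        for k in range(1, m):
            new_a[k] = a[k] - reflection * a[m - k]
        a = new_a
        error *= 1 - reflection * reflection
    return a[1:], error

# Hypothetical test signal: impulse response of a known two-pole resonator,
# x[n] = 1.2 * x[n-1] - 0.81 * x[n-2] + impulse.
signal = []
for n in range(400):
    x = 1.0 if n == 0 else 0.0
    if n >= 1:
        x += 1.2 * signal[n - 1]
    if n >= 2:
        x += -0.81 * signal[n - 2]
    signal.append(x)

coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
print(coeffs)    # close to [1.2, -0.81]
```

The recovered coefficients define the all-pole synthesis filter used at the decoder. Filtering the input with the inverse of that filter leaves a spectrally flat residual, which is the whitening step at the encoder.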

Modern vocoder implementations

Even with the need to record several frequencies, and the additional unvoiced sounds, the compression of the vocoder system is impressive. Standard speech recording systems capture a frequency band from about 300 Hz to 3400 Hz, where most of the frequencies used in speech lie, and typically sample at 8 kHz (slightly above the Nyquist rate) with 8 bits per sample, which requires 64 kbit/s. However, a vocoder can provide a reasonably good simulation with as little as 2400 bit/s of data rate, roughly a 27× improvement.
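The data rates quoted above can be checked with a little arithmetic. The sketch below assumes that the 64 kbit/s figure comes from 8,000 samples per second at 8 bits per sample, and that the 2400 bit/s figure corresponds to the commonly cited LPC-10 framing of 54 bits per 180-sample frame. Both are stated here as assumptions, since the text gives only the totals.

```python
SAMPLE_RATE_HZ = 8000        # assumed telephone-quality sampling rate
BITS_PER_SAMPLE = 8          # assumed companded PCM word size

pcm_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(pcm_bit_rate)          # 64000

vocoder_bit_rate = 2400
ratio = pcm_bit_rate / vocoder_bit_rate
print(round(ratio, 1))       # 26.7

# Assumed LPC-10 framing: 54 bits describe each 180-sample (22.5 ms) frame.
BITS_PER_FRAME = 54
SAMPLES_PER_FRAME = 180
lpc10_bit_rate = BITS_PER_FRAME * SAMPLE_RATE_HZ / SAMPLES_PER_FRAME
print(lpc10_bit_rate)        # 2400.0
```

The ratio of about 26.7 is close to the improvement quoted above. The saving comes from sending a few dozen parameter bits per frame instead of 1,440 waveform bits.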

Several vocoder systems are used in NSA encryption systems:

  • LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
  • Code Excited Linear Prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016, used in STU-III
  • Continuously Variable Slope Delta-modulation (CVSD), 16 kbit/s, used in wide band encryptors such as the KY-57
  • Mixed Excitation Linear Prediction (MELP), MIL STD 3005, 2400 bit/s, used in the Future Narrowband Digital Terminal (FNBDT), NSA's 21st century secure telephone
  • Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32 kbit/s, used in the STE secure telephone

(ADPCM is not a proper vocoder but rather a waveform codec. ITU has gathered G.721 along with some other ADPCM codecs into G.726.)

Vocoders are also used in psychophysics, linguistics, computational neuroscience and cochlear implant research.

Musical applications

For musical applications, a source of musical sounds is used as the carrier, instead of extracting the fundamental frequency. For instance, one could use the sound of a synthesizer as the input to the filter bank, a technique that became popular in the 1970s.

Musical history

In 1970, electronic music pioneers Wendy Carlos and Robert Moog developed one of the first truly musical vocoders. A 10-band device inspired by the vocoder designs of Homer Dudley, it was originally called a spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier signal came from a Moog modular synthesizer, and the modulator from a microphone input. The output of the 10-band vocoder was fairly intelligible, but relied on specially articulated speech. Later improved vocoders use a high-pass filter to let some sibilance through from the microphone; this ruins the device for its original speech-coding application, but it makes the "talking synthesizer" effect much more intelligible.

Carlos' and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the vocal part of Beethoven's Ninth Symphony. Also featured in the soundtrack was a piece called "Timesteps," which featured the vocoder in two sections. Originally, "Timesteps" was intended as merely an introduction to vocoders for the "timid listener", but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos.

One of the first rock songs to feature a vocoder was "The Raven" by progressive rock band The Alan Parsons Project on their 1976 album Tales of Mystery and Imagination; the vocoder was also used on later albums such as I Robot. Following Alan Parsons' example, vocoders began to appear in pop music in the late 1970s, for example on disco recordings. Jeff Lynne of Electric Light Orchestra used the vocoder on several albums such as Time. Pink Floyd made extensive use of the vocoder on the album Animals, even going so far as to put the sound of a barking dog through the device. Another example is Giorgio Moroder's 1977 album From Here to Eternity. Vocoders are often used to create the sound of a robot talking, as in the Styx song "Mr. Roboto". A vocoder was also used for the introduction to the Main Street Electrical Parade at Disneyland. The late R&B singer Roger Troutman used the vocoder extensively in his songs from the 1970s until the late 1980s.

Vocoders have appeared on pop recordings from time to time ever since, most often simply as a special effect rather than a featured aspect of the work. However, some experimental electronic artists of the New Age music genre use the vocoder more comprehensively in specific works, such as Jean Michel Jarre (on Zoolook, 1984) and Mike Oldfield (on Five Miles Out, 1982). There are also artists who have made vocoders an essential part of their music, overall or during an extended phase. Examples include the German synthpop group Kraftwerk, jazz/fusion keyboardist Herbie Hancock during his late 1970s disco period, French jazz organist Emmanuel Bex, Patrick Cowley's later recordings and, more recently, avant-garde pop groups Trans Am, Black Moth Super Rainbow and Daft Punk, the Christian synthpop band Norway, and metal bands such as At All Cost and The Devil Wears Prada.

More recently, vocoders combined with pitch-correction software can be heard in the 2007 single "Sensual Seduction" by Snoop Dogg. Clay Aiken's "Falling", on the album On My Way Here, also employs a vocoder effect in the bridge.

Other voice effects

"Robot voices" became a recurring element in popular music during the late twentieth century. Several methods of producing variations on this effect have arisen, of which the vocoder remains the best known and most widely used. The following other pieces of music technology are often confused with the vocoder:

The Talk box (Sonovox), Autotuner, Linear predictive coding, Ring modulation, Speech synthesis, and Comb filter.

Cher's "Believe" is an example of a vocal sound that is often mistaken for a vocoder. Mark Taylor, producer of "Believe", applied the effect sparingly, at the points where it would be most striking. For example, in the lyric "Do you believe in life after love?", the effect was applied to "Do you believe in life after lo-" rather than to the whole word "love", which added the dimension Taylor was looking for. Because Taylor wanted to keep the true source of the effect a trade secret, he originally attributed it to the Digitech Talker [1] [2], which he claimed produced the synthesized sound after the vocals had first been filtered through the Drawmer DS404 [3]. However, the vocal effect, later known as the Cher effect, was in fact created by deliberate use of Auto-Tune.[4]

The sub-page Robotic voice effects includes more detailed comparisons.

Television, film and game applications

Vocoders have also been used in television, film and games, usually for robots or talking computers:

  • The current Klasky Csupo closing logo, "Robot", has a vocoder voiceover at the beginning: as soon as the paint splashes on screen and a hand places a paper with a mouth on it, the vocoded voice says, "Klasky Csupo!"
  • One of the earliest film applications of vocoding can be heard in the flashback preludes of the 1949 movie A Letter to Three Wives.
  • In Grand Theft Auto: San Andreas, a vocoder is used to disguise Mike Toreno's voice in a phone call to CJ. The phone call is received after completing the "Yay Ka Boom Boom" mission in San Fierro.
  • In the episode of Hollyoaks broadcast on Channel 4 on Friday 7 December 2007, Elliot and John Paul, posing as pirate broadcasters, used a vocoder which hacked into Kris' radio broadcast.
  • In several of the Transformers TV series (and 1986 animated film), some of the vocal effects (those for Soundwave being the most prominent example) were created with vocoders.
  • The Cylons from Battlestar Galactica used the EMS Vocoder 5000 and a ring modulator to create their duo-tone voice effects.
  • A vocoder was used by Wendy Carlos for the soundtrack to Stanley Kubrick's A Clockwork Orange, particularly the choirs in "An die Freude".
  • In the film Sgt. Pepper's Lonely Hearts Club Band, the robotic singing of the Computerettes in the song "Mean Mr. Mustard" was achieved by using a vocoder.
  • In the game Half-Life 2 and its episodes, the main enemy, the Combine, speak with a distorted sound: Civil Protection units have vocoders in their masks, while transhuman soldiers and elites have vocoders surgically implanted in their necks.
  • The 1980 version of the Doctor Who theme has a section generated by a Roland SVC-350 Vocoder. It first becomes obvious about 15 seconds into the theme.
  • In the early 1980s British sitcom Metal Mickey, a vocoder was used for the voice of Mickey, the robotic character.
  • Vocoders were used extensively by the Electric Light Orchestra, particularly on songs such as "Mr. Blue Sky" and "Sweet Talking Woman", both from Out of the Blue (1977), an album that uses the EMS Vocoder 2000W MkI throughout; on the prologue of the Time album (Roland VP-330 Plus MkI); and on "The Diary of Horace Wimp" (also an EMS Vocoder 2000, W or B, MkI or II), to name just a few.

More recently, a well-known song composed entirely with a vocoder is Imogen Heap's "Hide and Seek", in which Heap plays full chords through her vocal without any other accompaniment.

Example of vocoder

Demonstration of the "robotic voice" effect found in film and television.

"Mr. Blue Sky" by the Electric Light Orchestra (1977)

Classic example of a singing vocoded voice.

Vocoder Models

  • Electrix Warp Factory
  • Electro Harmonix Vocoder
  • EMS Vocoder 2000
  • EMS Vocoder 5000
  • Roland SVC-350
  • Roland VP-330
  • Roland VP-550
  • Sennheiser VSM-201
  • Korg Microkorg
  • Korg VC-10
  • Korg DVP-1
