Quantization (sound processing)
In signal processing, quantization is the process of approximating a continuous range of values (or a very large set of possible discrete values) by a relatively small set of discrete symbols or integer values. This article describes aspects of quantization related to sound signals.
After sampling, each sample of a speech signal is represented by one of a fixed set of values, a process known as pulse-code modulation (PCM). Some specific issues related to the quantization of audio signals follow.
Audio quantization
Telephony applications frequently use 8-bit quantization. That is, the full range of the analogue waveform is mapped onto just 256 distinct values, each represented as an 8-bit binary number. This crude quantization introduces substantial quantization noise into the signal, but the result is still more than adequate to represent human speech.
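A minimal sketch of this mapping (assuming NumPy and a nominal full-scale range of ±1.0, neither of which is specified in the article) might look like:

```python
import numpy as np

def quantize_uniform(x, bits=8):
    """Round samples in [-1.0, 1.0] to the nearest of 2**bits uniform levels."""
    levels = 2 ** bits                 # 256 levels for 8-bit
    step = 2.0 / levels                # quantization step size (one LSB)
    # Round to the nearest level, then clip to the representable range
    q = np.round(x / step) * step
    return np.clip(q, -1.0, 1.0 - step)

# Example: quantize one cycle of a sine wave
t = np.linspace(0, 1, 1000, endpoint=False)
x = 0.5 * np.sin(2 * np.pi * t)
x8 = quantize_uniform(x, bits=8)
print("max quantization error:", np.max(np.abs(x - x8)))  # about step/2
```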
By comparison, compact discs use a 16-bit representation, allowing 65,536 distinct voltage levels. This is far better than telephone quantization, but low-level signals on CD would still sound noticeably 'granular' because of quantizing noise, were it not for a small amount of noise added to the signal before digitisation. This deliberately added noise is known as dither. Adding dither eliminates the granularity and gives very low distortion, at the expense of a small increase in noise level. Measured using ITU-R 468 noise weighting, this noise is about 66 dB below alignment level, or 84 dB below FS (full scale) digital, which is somewhat lower than the microphone noise level on most recordings, and hence of no consequence (see Programme levels for more on this).
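As a rule of thumb, an undithered uniform N-bit quantizer driven by a full-scale sine wave gives a signal-to-quantization-noise ratio of approximately 6.02·N + 1.76 dB, i.e. roughly 50 dB for 8-bit telephony and 98 dB for 16-bit CD audio (unweighted figures; the ITU-R 468-weighted levels quoted above are measured differently and are not directly comparable).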
Optimising dither waveforms
In a seminal paper published in the AES Journal, Lipshitz and Vanderkooy pointed out that different noise types with different probability density functions (PDFs) behave differently when used as dither signals, and suggested optimal levels of dither signal for audio. Gaussian noise requires a higher level for full elimination of distortion than rectangular-PDF or triangular-PDF noise. Triangular-PDF noise has the advantage of requiring a lower level of added noise to eliminate distortion while also minimising 'noise modulation': audible changes in the residual noise on low-level music that draw attention to the noise.
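Triangular-PDF dither is conventionally generated as the sum of two independent rectangular-PDF sources. A sketch (under the same assumed ±1.0 full-scale convention as before, not the procedure from the paper itself):

```python
import numpy as np

rng = np.random.default_rng(0)

def dithered_quantize(x, bits=16):
    """Quantize samples in [-1.0, 1.0] with TPDF dither of +/-1 LSB peak."""
    step = 2.0 / (2 ** bits)           # size of one LSB
    # TPDF dither: sum of two independent uniform sources of +/-0.5 LSB each
    dither = (rng.uniform(-0.5, 0.5, x.shape) +
              rng.uniform(-0.5, 0.5, x.shape)) * step
    return np.round((x + dither) / step) * step

x = 0.25 * np.sin(2 * np.pi * np.linspace(0, 1, 1000, endpoint=False))
y = dithered_quantize(x)   # distortion products are replaced by benign noise
```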
Noise shaping for lower audibility
An alternative to dither is noise shaping, a feedback process in which the final digitised signal is compared with the original, and the accumulated error from successive past samples is used to determine whether the next sample is rounded up or down. This smooths out the errors in a way that alters the spectral content of the noise. By inserting a weighting filter in the feedback path, the noise spectrum can be shifted towards the regions of the equal-loudness contours where the human ear is least sensitive, producing a lower subjective noise level (typically −68 to −70 dB, ITU-R 468-weighted).
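A first-order error-feedback quantizer is the simplest illustration of this idea (a sketch only; practical systems use higher-order, psychoacoustically weighted feedback filters):

```python
import numpy as np

def noise_shaped_quantize(x, bits=16):
    """First-order noise shaping: feed each sample's rounding error back
    into the next sample, pushing noise energy toward high frequencies
    where the ear is less sensitive."""
    step = 2.0 / (2 ** bits)
    y = np.empty_like(x)
    error = 0.0
    for n in range(len(x)):
        target = x[n] + error               # add back the previous error
        y[n] = np.round(target / step) * step
        error = target - y[n]               # error fed into the next sample
    return y
```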
24-bit quantization
24-bit audio is sometimes used undithered, because for most audio equipment and situations the noise floor of the digital converter can be higher than the level of any dither that might be applied.
There is some disagreement over the recent trend towards higher bit-depth audio. Some argue that the dynamic range offered by 16 bits is sufficient to store the dynamic range present in almost all music. In terms of pure data storage this is often true: a high-end system can extract extremely good sound from the 16 bits stored on a well-mastered CD. However, audio with very loud and very quiet sections can require the dithering techniques described above to fit into 16 bits. This is not a problem for most recently produced popular music, which is often mastered so that it sits constantly close to the maximum signal level (see loudness war). Nevertheless, higher-resolution audio formats are already in use, especially for applications such as film soundtracks, where there is often a very wide dynamic range between whispered conversation and explosions.
For most situations, the advantages of resolutions higher than 16 bits mainly concern processing the audio. No digital filter is perfect, but if the audio is upsampled and processed at 24 bits or more, the distortion introduced by filtering is much quieter, because the errors always creep into the least significant bits. A well-designed filter can also weight the distortion towards higher, inaudible frequencies, though this requires a sample rate above 48 kHz so that those inaudible frequencies are available for soaking up errors.
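One crude way to illustrate this (a sketch, not a measurement of any real filter chain) is to compare re-rounding to 16 bits after every processing step with keeping full precision and rounding once at the end:

```python
import numpy as np

def requantize(x, bits):
    step = 2.0 / (2 ** bits)
    return np.round(x / step) * step

x = 0.5 * np.sin(2 * np.pi * np.linspace(0, 1, 48000, endpoint=False))

# Re-round to 16 bits after every gain stage (error accumulates)
y16 = x.copy()
for gain in (0.7, 1.3, 0.9, 1.1):
    y16 = requantize(y16 * gain, bits=16)

# Keep full precision through the chain, round once at the end
y64 = x.copy()
for gain in (0.7, 1.3, 0.9, 1.1):
    y64 = y64 * gain
y64 = requantize(y64, bits=16)

ref = x * 0.7 * 1.3 * 0.9 * 1.1
for name, y in (("stepwise 16-bit", y16), ("single rounding", y64)):
    err = np.sqrt(np.mean((y - ref) ** 2))
    rms = np.sqrt(np.mean(ref ** 2))
    print(f"{name}: error {20 * np.log10(err / rms):.1f} dB below signal")
```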
There is also a good case for 24-bit (or higher) recording in the live studio, because it allows greater headroom (often 24 dB or more, rather than 18 dB) to be left on the recording without encountering quantization errors at low volumes. Brief peaks are then not harshly clipped, but can be compressed or soft-limited later to suit the final medium.
Environments where large amounts of signal processing are required, such as mastering or synthesis, can require even more than 24 bits. Some modern audio editors convert incoming audio to 32-bit, both for the increased dynamic range that reduces clipping and to minimise noise in intermediate stages of filtering, and some DAW environments (such as recent versions of SONAR) use 64-bit audio for their underlying engine.
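A typical pattern, sketched here as an assumed workflow rather than any particular editor's implementation, is to convert integer PCM to floating point on input, process with the extra headroom, and convert back only on output:

```python
import numpy as np

def int16_to_float32(pcm):
    return pcm.astype(np.float32) / 32768.0

def float32_to_int16(x):
    # Clip only at the final conversion back to integer PCM
    return np.clip(np.round(x * 32768.0), -32768, 32767).astype(np.int16)

pcm = np.random.default_rng(1).integers(-20000, 20000, 1024).astype(np.int16)
x = int16_to_float32(pcm)
x *= 2.5            # would clip in int16; harmless in floating point
x *= 0.4            # bring the level back down before output
out = float32_to_int16(x)
```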