Noise shaping

From Wikipedia, the free encyclopedia

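Like dither, noise shaping is a technique used during bit-depth reduction to make quantization error less objectionable. Rather than lowering the total error, it changes how the error is distributed across frequency. Noise shaping is used in many areas of digital signal processing, including digital audio and digital video.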

Dither reduces the distortion caused by quantization by adding noise before the quantization process, which turns signal-correlated error into noise; see the dither page for a more complete explanation of how this works. One concern about dither is that it adds white noise (or sometimes colored noise) to the signal, producing a noise floor at a fixed level below full scale that depends on the bit depth (each bit contributes roughly 6 dB). Where the receiver (in the case of digital audio, the human ear) is more sensitive to some frequencies than to others, noise shaping can be used to "re-shape" the frequency contour of the noise.
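The "roughly 6 dB per bit" figure follows from the fact that each added bit doubles the number of amplitude steps. The short sketch below works through that arithmetic; the function names are illustrative only and are not taken from any library.

```python
import math

def db_per_bit():
    """Level ratio of one extra bit (a factor of 2 in amplitude), in dB."""
    return 20 * math.log10(2)

def range_db(bits):
    """Approximate ratio of full scale to one quantization step, in dB."""
    return bits * db_per_bit()

if __name__ == "__main__":
    for bits in (1, 16, 24):
        print(bits, "bits ->", round(range_db(bits), 1), "dB")
```

Under this simple model a 16-bit signal spans a little over 96 dB and a 1-bit signal about 6 dB, which matches the figures quoted elsewhere in this article.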

How noise shaping works

Noise shaping works by putting the quantization error in a feedback loop. Any feedback loop functions as a filter, so by creating a feedback loop for the error itself, the error can be filtered as desired. The simplest example would be:

y(n) = x(n) + E(x(n − 1))

where y is the outbound sample value, x is the inbound sample value, n is the sample number, and E(x) is the error between the original and quantized values. In words: the outbound sample is equal to the inbound sample plus the error from the previous inbound sample.

Essentially, when a sample's bit depth is reduced, the quantization error between the rounded (or truncated) value and the original value is measured and stored. That error value is then added to the next sample before it is quantized. The effect is that the quantization error itself (and not the signal) is put into a feedback loop. This simple example gives a first-order filter, whose response slopes at about 6 dB per octave. The cutoff frequency of the filter can be controlled by the amount of the previous sample's error that is fed back. For example, changing the value of A in the formula:

y(n) = x(n)+A \cdot E(x(n-1))

will change the frequency at which the feedback loop is centered.
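The first-order loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the quantizer rounds to the nearest multiple of `step`, the error is taken as the pre-quantization value minus the quantized value, and the names `quantize`, `noise_shape_first_order`, `step` and `a` are hypothetical.

```python
import math

def quantize(value, step):
    """Round value to the nearest multiple of step."""
    return step * math.floor(value / step + 0.5)

def noise_shape_first_order(samples, step, a=1.0):
    """Apply y(n) = x(n) + a * E(n-1), then quantize y(n)."""
    out = []
    err = 0.0  # error carried over from the previous sample
    for x in samples:
        y = x + a * err      # add the fed-back error before quantizing
        q = quantize(y, step)
        err = y - q          # assumption: error = original minus quantized
        out.append(q)
    return out

if __name__ == "__main__":
    # A constant input of 0.3 cannot be represented with step 1,
    # but the long-run average of the output still tracks it.
    result = noise_shape_first_order([0.3] * 1000, step=1.0)
    print(sum(result) / len(result))
```

With `a = 1.0` the accumulated output never drifts from the accumulated input by more than half a step, because each sample's error is handed on to the next sample instead of being discarded.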

More complex algorithms feed back the errors of several previous samples in order to create more complex curves. The following formula:

y(n) = x(n) + A \cdot E(x(n-1)) + B \cdot E(x(n-2)) + C \cdot E(x(n-3)) + D \cdot E(x(n-4)) + E \cdot E(x(n-5)) + F \cdot E(x(n-6)) + G \cdot E(x(n-7)) + H \cdot E(x(n-8)) + I \cdot E(x(n-9))

is that of a ninth-order noise shaper, and allows very complex noise shaping. (Here the coefficient E is distinct from the error function E(x).)
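A higher-order shaper of this form can be sketched by keeping a short history of past errors and weighting each one. The coefficient values used in the example are arbitrary placeholders chosen for illustration; they are not the coefficients of any published algorithm, and the function name `noise_shape` is hypothetical.

```python
import math
from collections import deque

def noise_shape(samples, step, coeffs):
    """Error feedback with len(coeffs) taps.

    coeffs[0] weights E(n-1), coeffs[1] weights E(n-2), and so on,
    mirroring A, B, C, ... in the formula above.
    """
    # Most recent error is kept at the left end of the history.
    history = deque([0.0] * len(coeffs), maxlen=len(coeffs))
    out = []
    for x in samples:
        y = x + sum(c * e for c, e in zip(coeffs, history))
        q = step * math.floor(y / step + 0.5)  # round to nearest step
        history.appendleft(y - q)              # oldest error drops off the right
        out.append(q)
    return out

if __name__ == "__main__":
    # Two taps (placeholder values) give a second-order shaper.
    print(noise_shape([0.3] * 12, step=1.0, coeffs=[2.0, -1.0]))
```

Passing nine coefficients gives the ninth-order structure of the formula; passing an empty list reduces the function to plain rounding.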

Noise shaping must also always involve an appropriate amount of dither within the process itself, to prevent errors that are deterministic and correlated with the signal. If dither is not used, noise shaping effectively functions merely as distortion shaping: it pushes the distortion energy around to different frequency bands, but it is still distortion. If dither is added to the process as follows:

y(n) = x(n)+A \cdot E(x(n-1))+ dither

then the quantization error truly becomes noise, and the process indeed yields noise shaping.
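The sketch below adds dither inside the first-order loop. Several details are assumptions made for the example rather than anything stated in this article: the dither is triangular (the difference of two uniform random numbers, spanning two steps peak to peak), it is added just before rounding, and the fed-back error is measured against the value before the dither was added. The function name and the `seed` parameter are hypothetical.

```python
import math
import random

def dithered_noise_shape(samples, step, a=1.0, seed=0):
    """First-order error feedback with dither added before quantization."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    out = []
    err = 0.0
    for x in samples:
        v = x + a * err                              # fed-back error
        d = (rng.random() - rng.random()) * step     # triangular dither
        q = step * math.floor((v + d) / step + 0.5)  # round to nearest step
        err = v - q                                  # error relative to undithered value
        out.append(q)
    return out

if __name__ == "__main__":
    result = dithered_noise_shape([0.3] * 1000, step=1.0)
    print(sum(result) / len(result))
```

Without the dither term a constant input produces a repeating output pattern, which is the kind of signal-correlated error the paragraph above describes; with dither the pattern is randomized while the long-run average is still preserved.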

Noise shaping in digital audio

Noise shaping in audio is most commonly done as part of a bit-reduction scheme. The quantization error from straight dither is flat, white noise. The ear, however, is less sensitive to some frequencies than to others at low levels (see Fletcher-Munson curves). Noise shaping redistributes the quantization error so that more of it falls at frequencies that we hear poorly and less of it falls at frequencies that we hear well. The result is that the quantization error can be reduced greatly where the ear is most sensitive, at the cost of much more noise where the ear is less sensitive.
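To see how error feedback redistributes noise across frequency, one can evaluate the filter that the first-order loop applies to the error. The sketch assumes the sign convention in which the error is the original value minus the quantized value, so that the error is filtered by 1 - A z^-1; the function name and the 44.1 kHz sample rate in the example are illustrative choices.

```python
import cmath
import math

def shaper_gain_db(freq_hz, sample_rate_hz, a=1.0):
    """Gain applied to the quantization error at freq_hz, in dB,
    for the first-order error filter 1 - a * z^-1."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    mag = abs(1 - a * cmath.exp(-1j * w))
    return 20 * math.log10(mag) if mag > 0 else float("-inf")

if __name__ == "__main__":
    fs = 44100.0
    for f in (100.0, 1000.0, 4000.0, 12000.0, 22050.0):
        print(f, "Hz:", round(shaper_gain_db(f, fs), 1), "dB")
```

A first-order filter like this simply tilts the noise toward high frequencies. Matching the ear's sensitivity curve more closely, with less noise in the midrange and more at both extremes, requires the higher-order filters discussed above.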

One well-known noise shaping algorithm is POW-R, designed by the POW-R Consortium. It uses a ninth-order noise shaper to reduce 24-bit signals to 16 bits. Such a steep noise shaper can push much of the noise away from the critical 1 kHz to 4 kHz band and into the 20 Hz to 60 Hz band and the band above 12 kHz, where the ear's threshold of hearing is much higher. In this way the algorithm is said to preserve 24-bit accuracy as far as the ear is concerned, providing as much as 150 dB of dynamic range where the ear needs it, even though the broadband noise (the cumulative noise from all frequencies) is no lower than the roughly −96 dBFS of a 16-bit signal.

Not all algorithms that reduce bit depth by spreading the noise around are noise shapers. UV-22 and UV-22HR by Apogee, for example, are 24-bit to 16-bit dither algorithms that merely use colored (filtered) dither. This involves neither a feedback loop nor filtering of the quantization error, only pre-filtering of the dither noise.
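The distinction can be made concrete with a sketch of quantization using colored dither and no error feedback. This is a generic illustration and does not represent UV-22 or any other commercial algorithm: the dither here is uniform noise passed through a simple first difference (a crude high-pass filter), and the function name is hypothetical.

```python
import math
import random

def quantize_with_colored_dither(samples, step, seed=0):
    """Quantize with high-pass-filtered dither; the error is never fed back."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    prev = rng.random() - 0.5
    out = []
    for x in samples:
        cur = rng.random() - 0.5
        d = (cur - prev) * step  # first difference of white noise
        prev = cur
        out.append(step * math.floor((x + d) / step + 0.5))
    return out

if __name__ == "__main__":
    print(quantize_with_colored_dither([0.3] * 12, step=1.0))
```

Note that the loop carries only the previous dither value from one sample to the next, never the quantization error, which is what separates this approach from the error feedback structures shown earlier.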

Noise shaping and 1 bit converters

Since around 1989, 1-bit delta-sigma modulators have been used in analog-to-digital converters. This involves sampling the audio at a very high rate (2.8224 MS/s, for example) but using only 1 bit per sample. Because only 1 bit is used, the converter has only about 6 dB of broadband dynamic range. The noise floor, however, is spread throughout the entire "legal" frequency range below the Nyquist frequency of 1.4112 MHz. Noise shaping is used to lower the noise in the audible range (20 Hz to 20 kHz) and raise it above the audible range. The dynamic range is therefore not uniform across frequency bands: in the lowest frequencies (the audible range) it is much greater, over 100 dB. Noise shaping is inherently built into delta-sigma modulators.
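A first-order 1-bit delta-sigma modulator can be sketched as an integrator followed by a sign decision, with the previous output bit subtracted from the input. This is a textbook-style simplification that assumes input samples between -1 and 1 and already at the high sample rate; practical converters use higher-order loops, and the function name is hypothetical.

```python
def delta_sigma_1bit(samples):
    """Convert samples in [-1, 1] to a stream of +1.0 / -1.0 values."""
    integrator = 0.0
    feedback = 0.0  # previous output bit
    bits = []
    for x in samples:
        integrator += x - feedback  # accumulate input minus last output
        bit = 1.0 if integrator >= 0 else -1.0
        feedback = bit
        bits.append(bit)
    return bits

if __name__ == "__main__":
    stream = delta_sigma_1bit([0.25] * 6400)
    print(sum(stream) / len(stream))  # close to the input level
```

Each individual output carries only 1 bit of information, yet the density of +1 values tracks the input level. The error is pushed toward high frequencies, where it can later be filtered out.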

The 1-bit converter is the basis of the DSD format by Sony. An inherent flaw of the 1-bit converter (and thus of the DSD system) is that, because only 1 bit is used in both the signal and the feedback loop, adequate amounts of dither cannot be used in the feedback loop, and distortion can be heard under some conditions. Most A/D converters made since 2000 use multi-bit or multi-level delta-sigma modulators that yield more than 1 bit of output, so that proper dither can be added in the feedback loop. For traditional PCM sampling, the signal is then decimated to 44.1 kS/s or another appropriate sample rate.
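The decimation step can be illustrated with the example rates used above: 2.8224 MS/s is exactly 64 times 44.1 kS/s. The sketch below averages each block of 64 modulator outputs into one PCM sample. Block averaging is only a crude low-pass filter, chosen here for brevity; real converters use much better multi-stage filters, and the names are hypothetical.

```python
MODULATOR_RATE = 2_822_400  # samples per second, from the example above
PCM_RATE = 44_100           # samples per second
FACTOR = MODULATOR_RATE // PCM_RATE  # oversampling ratio

def decimate_by_averaging(stream, factor=FACTOR):
    """Average each complete block of `factor` samples into one output sample."""
    last_start = len(stream) - factor + 1
    return [sum(stream[i:i + factor]) / factor
            for i in range(0, last_start, factor)]

if __name__ == "__main__":
    # A repeating pattern with three +1 values for every -1 averages to 0.5.
    stream = [1.0, 1.0, 1.0, -1.0] * 32
    print(FACTOR, decimate_by_averaging(stream))
```

Any trailing samples that do not fill a complete block are dropped.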
