Multichannel code

With the popularization of digital audio, there is a growing demand for audio compression and audio transmission techniques. Perceptual coding (based on a psychoacoustic model) has been a major breakthrough in digital audio compression. On its own, however, it does not solve the problem of multichannel audio encoding.

Multichannel audio technology has evolved gradually. It began with stereo, moved to 5.1 surround sound, and then to systems with 10 or more channels. These systems are no longer limited to cinemas and recording studios. They are now also used in what is called home cinema.

These "home cinema" systems use Dolby 5.1, which has five channels plus a sixth channel for low frequencies. However, newer applications use many more channels.

It is important to emphasize, however, that not all multichannel signals behave the same way. Two categories can be distinguished:

Category 1 includes film soundtracks intended for reproduction in 5.1 home cinema systems or in movie theaters. Cross-correlation between channels tends to be high for symmetrical pairs (L-Ls, R-Rs, C-R, C-L) but low among the remaining channels.

Category 2 includes live signals captured with multiple microphones to record the acoustic properties of a room. Signals intended to reproduce sound fields, captured with a linear microphone array, also belong to this group. Such signals have very high cross-correlation between all channels.
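The difference between the two categories comes down to how strongly the channels are correlated. The sketch below measures this with a zero-lag normalized cross-correlation (the Pearson coefficient) between two channels. The function name is hypothetical. A real analysis would probably also consider time lags and individual frequency bands.

```python
import math

def normalized_cross_correlation(x, y):
    """Zero-lag normalized cross-correlation (Pearson coefficient) of two channels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0

FS = 48_000  # assumed sample rate
x = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(480)]  # 10 full periods
y = [0.8 * v for v in x]                                         # same shape, lower level
z = [math.cos(2 * math.pi * 1000 * n / FS) for n in range(480)]  # 90 degrees out of phase

print(round(normalized_cross_correlation(x, y), 6))  # 1.0: fully correlated
print(normalized_cross_correlation(x, z))            # approximately 0: uncorrelated
```

A category 2 recording should give values near 1 for every channel pair. A category 1 soundtrack should give high values only for the symmetrical pairs.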

Class 1 signals

For class 1 signals, the most commonly used system today is Dolby Digital, which is found both in cinemas and in home cinema. Its compression system is Dolby AC-3. The Dolby 5.1 speaker layout consists of three front channels (left, right and center), two surround channels (left and right), and one channel dedicated to reinforcing low-frequency (bass) effects. This last channel is strictly limited to a band of 20 to 120 Hz, while the other five have a frequency response of 20 Hz to 20 kHz. Hence the name "5.1 channels".

The digital audio encoding used on the Compact Disc (16-bit PCM) achieves a dynamic range of 96 dB, at the cost of sampling at 44.1 kHz with 16-bit samples. This is a large amount of data to store or transmit cost-effectively, particularly in multichannel systems, which is why compression algorithms are needed. Dolby AC-3 achieves compression ratios of 10:1. It also supports different bit rates depending on the number of channels encoded and the required quality.
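The arithmetic behind these figures can be checked directly. In the sketch below, the helper names are hypothetical. For simplicity it assumes that all six channels are stored at CD quality. In practice the low-frequency channel needs far less bandwidth, so this is an upper bound.

```python
import math

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) linear PCM bit rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

def pcm_dynamic_range_db(bits_per_sample):
    """Theoretical dynamic range of linear PCM: 20*log10(2**N), about 6.02 dB per bit."""
    return 20.0 * math.log10(2 ** bits_per_sample)

cd_stereo = pcm_bit_rate(44_100, 16, 2)      # 1411.2 kbit/s
six_channels = pcm_bit_rate(44_100, 16, 6)   # 4233.6 kbit/s (all six channels at CD quality)
after_10_to_1 = six_channels / 10            # about 423 kbit/s after 10:1 compression

print(f"{pcm_dynamic_range_db(16):.1f} dB")  # 96.3 dB
print(cd_stereo, six_channels, round(after_10_to_1, 1))
```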

Dolby AC-3 has been designed to take full advantage of the time and frequency masking characteristics of human hearing. Each signal to be encoded passes through a filter bank. Bits are then allocated to the spectral components of the different bands according to the spectral characteristics of the encoded signal. An internal model that simulates frequency and temporal masking lets the encoder vary its time-frequency resolution according to the nature of the sound. Each band is therefore described with the minimum number of bits, while the quantization noise stays completely masked. The model also ensures that spectral components masked by other components are not encoded at all. In addition, AC-3 distributes bits among the channels to keep the bit rate stable, allocating more bits to channels with richer frequency content.
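The core idea of masking-based allocation can be illustrated with a toy model. This is not AC-3's actual bit-allocation procedure. The masking thresholds are taken as inputs because computing them requires a full psychoacoustic model. The only rule used is the standard one: each extra quantizer bit lowers quantization noise by about 6.02 dB. The band levels and thresholds in the example are hypothetical.

```python
import math

def allocate_bits(band_levels_db, mask_levels_db, max_bits=16):
    """Toy perceptual bit allocation, one entry per frequency band.

    A band needs enough bits to push its quantization noise below the masking
    threshold (about 6.02 dB of noise reduction per bit). Bands that are already
    below the threshold are inaudible and get 0 bits, so they are not encoded.
    """
    bits = []
    for level, mask in zip(band_levels_db, mask_levels_db):
        smr = level - mask                      # signal-to-mask ratio in dB
        if smr <= 0:
            bits.append(0)                      # masked: not encoded
        else:
            bits.append(min(max_bits, math.ceil(smr / 6.02)))
    return bits

levels = [80, 72, 40, 65, 30]   # hypothetical band levels (dB)
masks = [50, 55, 45, 50, 35]    # hypothetical masking thresholds (dB)
print(allocate_bits(levels, masks))  # [5, 3, 0, 3, 0]
```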

AC-3 treats the six channels as a single entity and packs them into a single frame. This yields a lower bit rate than carrying each channel in a separate frame.

The main blocks of this algorithm are:

Input buffer
Filtering
Transient detector
Carrier precombination

Input buffer

AC-3 is a block-structured encoder. For each input channel, one or more blocks of time-domain samples are stored in the buffer before processing begins. Blocks usually consist of 512 samples.
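Buffering each channel into fixed-size blocks can be sketched as follows. The function names are hypothetical. The final partial block is zero-padded. The `hop` parameter defaults to the block size, which gives non-overlapping blocks. Transform coders often use overlapping blocks (`hop` smaller than the block size), so it is left configurable rather than fixed to any particular AC-3 value.

```python
def split_into_blocks(samples, block_size=512, hop=None):
    """Cut one channel into fixed-size blocks, zero-padding the last one.

    hop defaults to block_size (non-overlapping blocks); hop < block_size
    produces overlapping blocks.
    """
    if hop is None:
        hop = block_size
    blocks = []
    start = 0
    while start < len(samples):
        block = list(samples[start:start + block_size])
        block += [0.0] * (block_size - len(block))  # pad the final partial block
        blocks.append(block)
        start += hop
    return blocks

def buffer_channels(channels, block_size=512):
    """Buffer every input channel independently before any further processing."""
    return [split_into_blocks(ch, block_size) for ch in channels]

buffered = buffer_channels([[0.0] * 1200, [1.0] * 600])
print([len(ch_blocks) for ch_blocks in buffered])  # [3, 2]
```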

Filtering

Each input signal is individually high-pass filtered at 3 Hz to remove the DC component. The bass (low-frequency effects) channel is also sharply low-pass filtered at 120 Hz.
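Both filtering steps are sketched below with first-order RC filters. This is a deliberate simplification: a first-order low-pass filter is far from the sharp filter the text describes, and a real encoder would use steeper designs. The sample rate of 48 kHz and the function names are assumptions.

```python
import math

def highpass_first_order(x, cutoff_hz, fs):
    """First-order RC high-pass (DC blocker): y[n] = a*(y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    a = rc / (rc + dt)
    y, prev_x, prev_y = [], 0.0, 0.0
    for sample in x:
        prev_y = a * (prev_y + sample - prev_x)
        prev_x = sample
        y.append(prev_y)
    return y

def lowpass_first_order(x, cutoff_hz, fs):
    """First-order RC low-pass: y[n] = y[n-1] + alpha*(x[n] - y[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = dt / (rc + dt)
    y, prev_y = [], 0.0
    for sample in x:
        prev_y = prev_y + alpha * (sample - prev_y)
        y.append(prev_y)
    return y

FS = 48_000                                    # assumed sample rate
dc = [1.0] * FS                                # 1 s of pure DC
print(abs(highpass_first_order(dc, 3.0, FS)[-1]) < 1e-6)  # True: DC removed
```

With a 120 Hz cutoff, `lowpass_first_order` passes a 50 Hz tone almost unchanged but attenuates a 1 kHz tone to roughly a tenth of its amplitude.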

Transient detector

A band-pass filter centred at high frequencies is applied to detect the presence of transients.

A rapidly varying signal, such as a cymbal attack, needs good temporal resolution, which implies poorer spectral resolution. The coding block size must therefore be small. The quantization noise associated with the signal then stays confined in time close to the signal itself, where the signal can mask it through the temporal masking of the human ear.
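A minimal transient detector with block-size switching is sketched below. Instead of the high-frequency band-pass filter described above, it uses a first difference as a crude high-frequency emphasis. The segment count, the energy-jump threshold of 10 and the 512/256 block sizes are all hypothetical parameters chosen for illustration.

```python
import math

def hf_energy(segment):
    """Energy of the first difference: a crude stand-in for a high-frequency filter."""
    return sum((segment[i] - segment[i - 1]) ** 2 for i in range(1, len(segment)))

def has_transient(block, n_segments=8, ratio=10.0, floor=1e-9):
    """Flag a transient when one segment's HF energy jumps by more than `ratio`."""
    seg_len = len(block) // n_segments
    energies = [hf_energy(block[k * seg_len:(k + 1) * seg_len]) for k in range(n_segments)]
    return any(cur > ratio * max(prev, floor) for prev, cur in zip(energies, energies[1:]))

def choose_block_size(block, long_size=512, short_size=256):
    """Short blocks for transients (better time resolution), long blocks otherwise."""
    return short_size if has_transient(block) else long_size

FS = 48_000
steady = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(512)]
attack = [0.0] * 256 + steady[:256]          # silence, then a sudden onset
print(choose_block_size(steady), choose_block_size(attack))  # 512 256
```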

The possible variations in block size are restricted to simplify processing: eight different combinations of four window types are allowed. Each of the eight combinations is identified by a table ID. The decoder must know at all times which table ID was used to analyze the signal, so this information is multiplexed together with the coefficients that describe the signal. The table ID information, together with its error protection, takes up 1% of the total bit rate.

Carrier precombination

In general, the average bit rate of a multichannel system is proportional to the square root of the number of channels. If 128 kbit/s are used to encode a single channel, 5.1 channels will require 128 × √5.1 ≈ 289 kbit/s. This fits comfortably within the typical AC-3 operating rate of 320 kbit/s. That is why, most of the time, the bit-allocation algorithm alone provides sufficient compression. When greater compression is needed, however, carrier precombination is also used.
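The square-root rule of thumb from the paragraph above is easy to reproduce. The function name is hypothetical. The rule itself is the empirical approximation stated in the text, not an exact law.

```python
import math

def multichannel_rate_kbps(single_channel_kbps, n_channels):
    """Rule of thumb: total bit rate grows with the square root of the channel count."""
    return single_channel_kbps * math.sqrt(n_channels)

rate = multichannel_rate_kbps(128, 5.1)
print(round(rate))   # 289
print(rate <= 320)   # True: fits within a typical 320 kbit/s AC-3 stream
```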

This technique, known in the AC-3 specification as channel coupling, removes redundant high-frequency (HF) information. It relies on a psychoacoustic phenomenon: at high frequencies, human hearing is more sensitive to the envelope of a sound than to the signal itself.

AC-3 exploits this behavior by separating high-frequency signals into a carrier and an envelope. The envelope information is then encoded more accurately than the carrier.

The auditory impact is minimal, because the localization of the sound is carried by the envelope. The ear recombines the sound and perceives an effect equivalent to the original.
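The carrier/envelope idea can be sketched for a single high-frequency band. Several channels are summed into one shared carrier, and only a per-channel scale factor is kept. This scale factor acts as a coarse envelope that preserves each channel's energy in the band. All names are hypothetical, and the sketch is much simpler than real AC-3 coupling. That scheme works over several sub-bands and quantizes its coordinates, and this sketch loses any sign or phase difference between channels.

```python
import math

def couple_band(channels):
    """Combine the HF coefficients of several channels into one shared carrier.

    Returns the carrier plus one scale factor per channel (a coarse envelope)
    that preserves that channel's energy in this band.
    """
    n = len(channels[0])
    carrier = [sum(ch[i] for ch in channels) for i in range(n)]
    carrier_energy = sum(c * c for c in carrier)
    envelopes = []
    for ch in channels:
        energy = sum(x * x for x in ch)
        envelopes.append(math.sqrt(energy / carrier_energy) if carrier_energy > 0 else 0.0)
    return carrier, envelopes

def decouple_band(carrier, envelopes):
    """Rebuild each channel as its own envelope factor times the shared carrier."""
    return [[e * c for c in carrier] for e in envelopes]

band = [0.3, -0.1, 0.25, 0.05]                        # hypothetical HF coefficients
channels = [band, [0.5 * x for x in band], [2.0 * x for x in band]]
carrier, envelopes = couple_band(channels)
rebuilt = decouple_band(carrier, envelopes)
```

In this example, 4 carrier coefficients plus 3 scale factors replace 12 coefficients. Because the three channels share the same waveform shape, the reconstruction is exact.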

In addition, the high correlation between symmetrical channels is exploited through sum and difference encoding, which saves further bits. Because symmetrical channels are quite similar, only one signal needs to be encoded in full, together with the difference between it and the other channel.
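Sum and difference encoding of a symmetrical pair can be sketched as follows. The halved sum/difference convention and the function names are assumptions; some schemes use the unscaled sum and difference instead. The transform is exactly invertible. When the two channels are similar, the difference signal is small and cheap to encode.

```python
def sum_difference_encode(left, right):
    """Encode a symmetric channel pair as halved sum (mid) and difference (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def sum_difference_decode(mid, side):
    """Exact inverse: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.50, 0.42, -0.10, 0.33]
right = [0.48, 0.40, -0.12, 0.35]    # nearly identical to left (highly correlated)
mid, side = sum_difference_encode(left, right)
print(max(abs(s) for s in side))     # about 0.01: the difference is tiny
```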

Class 2 signals

Karhunen-Loeve theorem

Class 2 multichannel signals are encoded by exploiting the properties of the Karhunen-Loeve transform (KLT). The transform is a matrix product of the form M·V = U. V is the matrix containing the multichannel signal to encode, with one channel per row. M is the matrix whose rows are the eigenvectors of the covariance matrix of V. U is the output matrix holding the encoded signal: a set of uncorrelated signals called eigenchannels.

KLT properties

What makes this operation interesting are its properties. First, to restore the original signals, we only need to multiply U by the transpose of M (V = M^T·U). This works because the eigenvectors of a symmetric covariance matrix are orthonormal, so the inverse of M equals its transpose. This greatly simplifies decoding.
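The transform and its inverse can be sketched with NumPy, using `numpy.linalg.eigh` on the channel covariance matrix. To keep the sketch short, it assumes zero-mean channels, so that V·V^T/N is the covariance matrix. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def klt_encode(V):
    """KLT of a (channels x samples) signal V, assuming zero-mean channels.

    Returns M (rows = eigenvectors of the covariance matrix, sorted by
    decreasing eigenvalue) and the eigenchannels U = M @ V.
    """
    cov = V @ V.T / V.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order, eigenvectors in columns
    order = np.argsort(eigvals)[::-1]        # highest-energy eigenchannel first
    M = eigvecs[:, order].T
    U = M @ V
    return M, U

def klt_decode(M, U):
    """M is orthogonal, so its inverse is its transpose: V = M.T @ U."""
    return M.T @ U

rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
# Four highly correlated channels: one common source plus a little independent noise.
V = np.vstack([source + 0.1 * rng.standard_normal(4000) for _ in range(4)])
M, U = klt_encode(V)
print(np.allclose(klt_decode(M, U), V))   # True
print(np.round(U @ U.T / U.shape[1], 3))  # approximately diagonal: eigenchannels are uncorrelated
```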

Second, the channels of U are ordered from highest to lowest energy, because the eigenvectors are sorted by decreasing eigenvalue and each eigenvalue equals the energy of the corresponding eigenchannel. This is very useful for bit allocation: encoding the lower-energy channels requires a much lower bit rate than encoding the higher-energy ones.
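One way to turn the energy ordering into a bit allocation is a greedy marginal-return rule, a standard transform-coding technique swapped in here for illustration. It is not necessarily what a KLT-based codec of this kind uses. Each extra bit cuts a channel's quantization noise power by roughly a factor of 4 (6.02 dB), so each bit goes to the channel whose remaining noise is largest. The eigenvalues and the bit budget in the example are hypothetical.

```python
import heapq

def allocate_bits_by_energy(energies, total_bits):
    """Greedy integer bit allocation across eigenchannels.

    A channel's current noise is modelled as energy / 4**bits; each bit goes to
    the channel whose current noise is largest, so high-energy channels get more.
    """
    bits = [0] * len(energies)
    heap = [(-e, i) for i, e in enumerate(energies)]   # max-heap via negated noise
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_noise, i = heapq.heappop(heap)
        bits[i] += 1
        heapq.heappush(heap, (neg_noise / 4.0, i))      # one more bit: noise / 4
    return bits

eigenvalues = [9.0, 1.0, 0.2, 0.01]   # hypothetical eigenchannel energies, descending
print(allocate_bits_by_energy(eigenvalues, 12))  # [5, 4, 3, 0]
```

The lowest-energy eigenchannel receives no bits at all here. This matches the observation above that low-energy eigenchannels need a much lower bit rate.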

Third, the signals obtained from the KLT retain the spectral and perceptual characteristics of the original audio signals, so perceptual coding can also be applied to them.

Application of the properties for class 2 signals

Based on these properties, the bit rate can be reduced. First, the matrix M has its largest values on the diagonal, because the diagonal corresponds to the correlation of each channel with itself. The values far from the diagonal can therefore be encoded with fewer bits. Second, the last channels (those with lower energy) are encoded with fewer bits, which does not greatly affect quality. Finally, the third property is exploited by applying perceptual coding to the eigenchannels, for example with the Advanced Audio Coding (AAC) algorithm. This technique can achieve very high compression ratios. Its only requirement is that all channels be highly correlated with one another.