Source coding

From Wikipedia, the free encyclopedia

Source coding is a mapping from (a sequence of) symbols from an information source to a sequence of alphabet symbols (usually bits) such that the source symbols can be exactly recovered from the encoded bits (lossless source coding) or recovered within some distortion (lossy source coding). This is the concept behind data compression.

In information theory, the source coding theorem (Shannon 1948) informally states that:

"N i.i.d. random variables each with entropy H(X) can be compressed into more than NH(X) bits with negligible risk of information loss, as N tends to infinity; but conversely, if they are compressed into fewer than NH(X) bits it is virtually certain that information will be lost." (MacKay 2003).

Source Coding Theorem for i.i.d. sources

Fixed Rate ε-lossless source coding for discrete-time i.i.d. sources

Given X is an i.i.d. source, its time series X1, ..., Xn is i.i.d. with entropy H(X) in the discrete-valued case (and differential entropy in the continuous-valued case). The source coding theorem states that for any ε > 0, i.e. for any rate H(X) + ε larger than the entropy of the source, there is a large enough n and an encoder that takes n i.i.d. repetitions of the source, X1:n, and maps them to n(H(X) + ε) binary bits such that the source symbols X1:n are recoverable from those bits with probability at least 1 − ε.

Proof of achievability

Fix ε > 0. The typical set, A_n^\epsilon, is defined as follows:

A_n^\epsilon = \left\{x_1^n : \left|-\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) - H(X)\right| < \epsilon \right\}

The AEP shows that for large enough n, the probability that a sequence generated by the source lies in the typical set A_n^\epsilon approaches one. In particular, for large enough n, P(A_n^\epsilon) > 1-\epsilon (see AEP for a proof).

The definition of typical sets implies that those sequences that lie in the typical set satisfy:

2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, ..., x_n) \leq 2^{-n(H(X)-\epsilon)}

Note that:

  • The probability of a sequence drawn from X lying in A_n^\epsilon is greater than 1 − ε.
  • \left| A_n^\epsilon \right| \leq 2^{n(H(X)+\epsilon)}, since the probability of the whole set A_n^\epsilon is at most one.
  • \left| A_n^\epsilon \right| \geq (1-\epsilon)2^{n(H(X)-\epsilon)}. For the proof, use the upper bound on the probability of each sequence in the typical set and the lower bound on the probability of the whole set A_n^\epsilon (see the derivation sketched below).
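Both cardinality bounds follow from a short counting argument; a sketch, combining the probability bounds on individual typical sequences with P(A_n^\epsilon) \leq 1 and, for large enough n, P(A_n^\epsilon) > 1-\epsilon:

1 \geq P(A_n^\epsilon) \geq \left| A_n^\epsilon \right| \, 2^{-n(H(X)+\epsilon)} \quad\Rightarrow\quad \left| A_n^\epsilon \right| \leq 2^{n(H(X)+\epsilon)}

1-\epsilon \leq P(A_n^\epsilon) \leq \left| A_n^\epsilon \right| \, 2^{-n(H(X)-\epsilon)} \quad\Rightarrow\quad \left| A_n^\epsilon \right| \geq (1-\epsilon)\, 2^{n(H(X)-\epsilon)}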

Since \left| A_n^\epsilon \right| \leq 2^{n(H(X)+\epsilon)}, n(H(X)+\epsilon) bits are enough to index any sequence in this set.

The encoding algorithm: the encoder checks whether the input sequence lies within the typical set; if so, it outputs the index of the input sequence within the typical set; if not, it outputs an arbitrary n(H(X) + ε)-digit number. As long as the input sequence lies within the typical set (which happens with probability at least 1 − ε), the encoder makes no error. So, the probability of error of the encoder is bounded above by ε.
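A minimal Python sketch of this encoding algorithm for a Bernoulli(0.2) source is given below. The source choice, function names and parameter values are illustrative assumptions; the brute-force enumeration of the typical set is exponential in n and is meant only to make the argument concrete, not to serve as a practical compressor.

    import itertools
    import math

    def binary_entropy(p):
        """Entropy H(X), in bits, of a Bernoulli(p) source."""
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def typical_set(n, p, eps):
        """Enumerate A_n^eps for a Bernoulli(p) source: all length-n sequences whose
        per-symbol information content is within eps of H(X)."""
        H = binary_entropy(p)
        result = []
        for x in itertools.product((0, 1), repeat=n):
            ones = sum(x)
            info = -(ones * math.log2(p) + (n - ones) * math.log2(1 - p)) / n
            if abs(info - H) < eps:
                result.append(x)
        return result

    def encode(x, typical, n, p, eps):
        """Return the index of x in the typical set as a fixed-width bit string of
        ceil(n*(H(X)+eps)) bits; atypical inputs get an arbitrary index (errors allowed)."""
        width = math.ceil(n * (binary_entropy(p) + eps))
        idx = typical.index(x) if x in typical else 0
        return format(idx, "b").zfill(width)

    n, p, eps = 12, 0.2, 0.2
    A = typical_set(n, p, eps)
    x = (0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)   # a plausible output of the source
    print(len(A), encode(x, A, n, p, eps))

For this small n the codeword length is not yet shorter than n; the compression guarantee of the theorem is asymptotic in n.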

Proof of converse

The converse is proved by showing that any set whose size is smaller than that of A_n^\epsilon (in the sense of the exponent) would cover only a set of probability bounded away from 1.

Variable length perfectly lossless source coding

A variable length source code maps source symbols to a variable number of bits. This allows sources to be compressed and decompressed with zero error. It is possible to compress an i.i.d. source arbitrarily close to its entropy with variable length codes.

Some examples of well-known variable length coding strategies are Huffman coding, Lempel-Ziv coding and Arithmetic coding.
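As an illustration of the first of these strategies, the sketch below builds a Huffman code in Python by repeatedly merging the two least probable subtrees. This is a standard textbook construction; the function name and the example probabilities are assumptions, not taken from the article.

    import heapq

    def huffman_code(probs):
        """Build a prefix-free (Huffman) code from a dict {symbol: probability}.
        Repeatedly merges the two least probable subtrees; each merge prepends
        '0' to the codewords on one side and '1' on the other."""
        # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p0, _, code0 = heapq.heappop(heap)
            p1, _, code1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in code0.items()}
            merged.update({s: "1" + c for s, c in code1.items()})
            heapq.heappush(heap, (p0 + p1, counter, merged))
            counter += 1
        return heap[0][2]

    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    code = huffman_code(probs)
    print(code)  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}, up to relabeling
    print(sum(probs[s] * len(code[s]) for s in probs), "bits/symbol expected")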

Variable length codes can be strictly nested in order of decreasing generality as non-singular, uniquely decodable and instantaneous (prefix-free). Instantaneous codes are always uniquely decodable, and uniquely decodable codes are in turn always non-singular.

A code is non-singular if each source symbol is mapped to a different string, i.e. the mapping from source symbols to bit strings is one-to-one.

The extension of a code is the mapping from finite-length source sequences to finite-length bit strings that is induced by the original code by concatenating the codewords of each symbol. A code is uniquely decodable if its extension is non-singular.

A code is 'instantaneous' if no codeword is a prefix of another codeword. This means that each symbol can be decoded as soon as its entire codeword is received, without waiting for future codewords.
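The two simpler properties in this hierarchy can be checked mechanically; the Python sketch below (with illustrative helper names and example codes) tests non-singularity and the prefix condition for a code given as a dictionary from symbols to bit strings. Checking unique decodability in general requires the Sardinas–Patterson algorithm and is not shown.

    def is_non_singular(code):
        """Non-singular: distinct source symbols get distinct codewords."""
        return len(set(code.values())) == len(code)

    def is_prefix_free(code):
        """Instantaneous (prefix-free): no codeword is a prefix of another."""
        words = list(code.values())
        return all(not w2.startswith(w1)
                   for i, w1 in enumerate(words)
                   for j, w2 in enumerate(words) if i != j)

    print(is_prefix_free({"a": "0", "b": "10", "c": "110", "d": "111"}))  # True
    print(is_prefix_free({"a": "0", "b": "01", "c": "011", "d": "111"}))  # False: "0" is a prefix of "01"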

An example of an instantaneous variable length code is shown below.

Image:Variablelengthcodeexample.JPG

The source is one of four symbols: a,b,c or d. The binary codeword for each symbol is given. The encoding and decoding of a portion of an example source sequence is given below.

Image:Variablelengthcodeexample2.JPG

The advantage of a variable length code is that unlikely source symbols can be assigned longer codewords and likely source symbols can be assigned shorter codewords, thus giving a low expected codeword length. For the above example, if the probabilities of a, b, c and d were \left(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\right), the expected number of bits used to represent a source symbol would be \frac{7}{4}. The entropy of this source is 1.75 bits per symbol, so this code compresses the source as much as possible while still allowing it to be recovered with zero error.
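The arithmetic in this example can be verified directly. The Python sketch below assumes codeword lengths of 1, 2, 3 and 3 bits for a, b, c and d respectively (any prefix-free code with these lengths gives the quoted expected length of 7/4) and compares the expected codeword length with the source entropy.

    import math

    probs   = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}
    lengths = {"a": 1,   "b": 2,   "c": 3,   "d": 3}   # assumed prefix-free codeword lengths

    expected_length = sum(probs[s] * lengths[s] for s in probs)     # 1.75 bits/symbol
    entropy = -sum(p * math.log2(p) for p in probs.values())        # 1.75 bits/symbol

    print(expected_length, entropy)  # the code meets the entropy lower bound exactly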

Fixed rate lossy source coding for discrete-time i.i.d. sources

Source Coding Theorem for non-stationary independent sources

Fixed Rate lossless source coding for discrete-time non-stationary independent sources

Define the typical set A_n^\epsilon as:

A_n^\epsilon = \left\{x_1^n : \left|-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) - \bar{H}_n(X)\right| < \epsilon \right\}

where \bar{H}_n(X) = \frac{1}{n}\sum_{i=1}^{n} H(X_i) is the average entropy of the first n (independent, but not necessarily identically distributed) source symbols. Then, for any given δ > 0 and n large enough, \Pr(A_n^\epsilon) > 1-\delta. Now we simply encode the sequences in the typical set, and the usual arguments in source coding show that the cardinality of this set is smaller than 2^{n(\bar{H}_n(X)+\epsilon)}. Thus, on average, \bar{H}_n(X)+\epsilon bits per symbol suffice for encoding with probability greater than 1 − δ, where ε and δ can be made arbitrarily small by making n larger.

See also