Entropy encoding

In information theory, an entropy encoding is a lossless data compression scheme that assigns codes to symbols so that code lengths match the probabilities of the symbols. Typically, entropy encoders compress data by replacing fixed-length codes with variable-length codes in which the length of each codeword is approximately proportional to the negative logarithm of the symbol's probability. The most common symbols therefore receive the shortest codes.

According to Shannon's source coding theorem, the optimal code length for a symbol is −log_b(P), where b is the number of symbols used to make output codes (b = 2 for binary output) and P is the probability of the input symbol.
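
As an illustration of the formula, the following Python sketch estimates P from the symbol frequencies of a sample message and evaluates −log_b(P) for each symbol. The function names and the sample string are illustrative assumptions, not part of any standard library; real encoders may estimate probabilities differently.

```python
import math
from collections import Counter


def optimal_code_lengths(message, base=2):
    """Return {symbol: -log_b(P)}, with P estimated from symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    return {sym: -math.log(n / total, base) for sym, n in counts.items()}


def entropy(message, base=2):
    """Average optimal code length per symbol (the Shannon entropy)."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log(n / total, base) for n in counts.values())


if __name__ == "__main__":
    sample = "aaaabbcd"  # hypothetical input: P(a)=1/2, P(b)=1/4, P(c)=P(d)=1/8
    for sym, bits in sorted(optimal_code_lengths(sample).items()):
        print(sym, round(bits, 3))
    print("entropy:", round(entropy(sample), 3))
```

For this sample the optimal lengths come out to about 1, 2, 3 and 3 bits, so the most frequent symbol gets the shortest code and the average is about 1.75 bits per symbol, compared with 2 bits for a fixed-length code over four symbols.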

Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code such as unary coding, Elias gamma coding, Fibonacci coding, Golomb coding, or Rice coding may be useful.
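
To make the two families concrete, here is a minimal Python sketch of a Huffman code built from symbol frequencies, alongside Elias gamma coding as an example of a static code that needs no frequency table. The function names are illustrative assumptions, and the Huffman tie-breaking used here is only one of several valid choices, so other implementations may produce different (equally optimal) codewords.

```python
import heapq
from collections import Counter


def huffman_code(message):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    counts = Counter(message)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codewords.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def elias_gamma(n):
    """Elias gamma code of an integer n >= 1: leading zeros, then binary n."""
    if n < 1:
        raise ValueError("Elias gamma coding is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


if __name__ == "__main__":
    code = huffman_code("aaaabbcd")  # hypothetical sample input
    print(code)
    print("".join(code[s] for s in "aaaabbcd"))
    print([elias_gamma(n) for n in range(1, 6)])
```

The Huffman code adapts to the measured frequencies of the input, whereas the Elias gamma code is fixed in advance and is only efficient when small integers really are the most probable values.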

Entropy as a measure of similarity

Besides compressing (and losslessly recovering) digital data, an entropy encoder can also be used to measure the similarity between streams of data. This is done by building an entropy coder/compressor for each class of data; unknown data is then classified by feeding it, uncompressed, to each compressor and seeing which one yields the highest compression. The coder that compresses best is probably the one trained on the data most similar to the unknown data.
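
The classification procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each class is modelled by an order-0 byte frequency table with add-one smoothing, and instead of running an actual coder the sketch computes the ideal code length in bits that an entropy coder driven by that model would approach. The class name ByteModel, the function classify, and the training strings are all hypothetical.

```python
import math
from collections import Counter


class ByteModel:
    """Order-0 byte model with add-one smoothing (an assumed, simplified model)."""

    def __init__(self, training: bytes):
        self.counts = Counter(training)
        # Add one pseudo-count per possible byte value so no byte has P = 0.
        self.total = len(training) + 256

    def code_length_bits(self, data: bytes) -> float:
        """Ideal entropy-coded size of data under this model, in bits."""
        return sum(-math.log2((self.counts[b] + 1) / self.total) for b in data)


def classify(data: bytes, models: dict) -> str:
    """Return the name of the model that would compress data best."""
    return min(models, key=lambda name: models[name].code_length_bits(data))


if __name__ == "__main__":
    models = {
        "text": ByteModel(b"the quick brown fox jumps over the lazy dog " * 20),
        "digits": ByteModel(b"3141592653589793238462643383279 " * 20),
    }
    for sample in (b"hello world", b"2718281828"):
        print(sample, "->", classify(sample, models))
```

A smaller code length means higher compression, so picking the minimum corresponds to picking the compressor that compresses best. Practical systems typically use richer models than single-byte frequencies.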

External links

On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay - gives an accessible introduction to Shannon theory and data compression, including Huffman coding and arithmetic coding.
Spam Filtering using Statistical Data Compression Models by Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, Thomas R. Lynam and Blaz Zupan, Journal of Machine Learning Research, Vol 7(Dec), 2006.
Anatomy of Range Encoder

An earlier (open content) version of the above article was posted on PlanetMath.
