Talk:Entropy encoding

Isn't "entropy coding" the same thing as "lossless compression"?

Remark: This article contains some problems that appear not worth correcting, because the article seems largely redundant with the article on lossless compression. Also, "entropy coding" would be a better subject title than "entropy encoding". -- comments by 4.65.146.242 moved from article

Entropy encoding is only a subset of lossless compression. LZ77 compression, for instance, is an important compression technique that isn't any form of entropy encoding. -- Antaeus Feldspar 20:43, 4 Jun 2005 (UTC)
Actually, I don't know where Feldspar gets his definitions, but I mostly agree with the prior comment by 4.65.146.242. In the usage I have experienced, the terms "entropy coding" and "lossless compression" are synonymous, and both terms apply to such things as Lempel-Ziv coding. I have never previously heard anyone assert that LZ coding or other such things are not entropy coding. The content of this page also seems overly simplistic. For example, I don't think it gives a very accurate description of what happens in arithmetic coding. I also agree that "entropy coding" is better than "entropy encoding". Pawnbroker 05:28, 5 Jun 2005 (UTC)
Then I don't know where you get your definitions, Pawn. People might use "entropy coding" as if it were a synonym of "lossless compression" (just as there are some major RFCs that incorrectly use "Huffman code" as if it were a synonym of "prefix code", instead of denoting only those prefix codes created by a Huffman algorithm), but that's definitely not correct usage. Entropy encoding is encoding where each symbol is assigned a pattern whose length/cost corresponds to its entropy (hence the name). While entropy encoding is quite often used with LZ77 compression, as the two techniques complement each other, LZ77 is not an example of entropy encoding. -- Antaeus Feldspar 17:03, 5 Jun 2005 (UTC)
Can someone point to any textbook on information theory or any similar such authoritative source that draws a distinction between the two terms in this manner? Pawnbroker 05:46, 12 Jun 2005 (UTC)
Well, the JPEG standard defines "entropy encoding" as "A lossless procedure which converts a sequence of input symbols into a sequence of bits such that the average number of bits per symbol approaches the entropy of the input symbols", a description which does not apply to dictionary coding. However, if you weren't prepared before to believe that entropy coding might have something to do with entropy, I have no idea what sort of source you're prepared to accept as "authoritative". -- Antaeus Feldspar 16:08, 12 Jun 2005 (UTC)
Two responses: 1) Actually, Lempel-Ziv coding does fit into that definition pretty well, 2) the JPEG standard is a document that defines terms for its own purposes - it is not meant to be a textbook on information theory (it has a status perhaps roughly similar to those major RFCs that confuse Huffman coding with prefix coding - note for example that JPEG defines lossless coding in a way that includes only the kinds of lossless coding described in the JPEG document - clearly there are others). If you could point to a single textbook on information theory that draws such a distinction, that would suffice to satisfy me. Even a single paper in the IEEE Transactions on Information Theory would be something (or a digital communication textbook, or one or more papers in some other IEEE Transactions, like the Transactions on Signal Processing or the Transactions on Communications). Right now the assertion that there's a difference between the two terms is completely unsupported by published sources. Pawnbroker 04:48, 14 Jun 2005 (UTC)
Further elaborating on point 1 above - the quoted JPEG definition of "entropy encoding" is actually not bad. (It is certainly better than the one found here at Wikipedia today.) Note, in particular, that it does not refer to a symbol-by-symbol mapping of each input symbol to a particular codeword length. It only says that the average number of bits per symbol approaches the entropy of the input symbols. There are many ways to make the average approach the entropy (or at least be substantially less than the average number of bits used to represent the input to the entropy encoding process). Those ways include techniques like LZW coding and other dictionary-based methods. Many of those ways do not involve a mapping of individual input symbols to specific coded lengths. In fact, when the input symbols are not probabilistically independent of each other (e.g., when there is some statistical dependency between input symbol values, as there is with text), any simple one-to-one mapping like a Huffman code will be inferior to methods that take into account the inter-symbol dependencies. The JPEG document is therefore evidence against the Feldsparian interpretation, not for it. Pawnbroker 23:29, 14 Jun 2005 (UTC)
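To show what a dictionary-based method looks like, here is a minimal textbook-style LZW encoder in Python (added purely as an illustration; it is not taken from the JPEG document or any source quoted above, and the sample string is made up). The encoder emits indices of progressively longer dictionary entries, so there is no per-symbol codeword length anywhere in the scheme.

    def lzw_encode(data):
        """Minimal LZW encoder: returns a list of dictionary indices."""
        # Dictionary starts with all single-byte strings.
        dictionary = {bytes([i]): i for i in range(256)}
        w = b""
        codes = []
        for byte in data:
            wc = w + bytes([byte])
            if wc in dictionary:
                w = wc
            else:
                codes.append(dictionary[w])       # emit index of longest known prefix
                dictionary[wc] = len(dictionary)  # grow the dictionary
                w = bytes([byte])
        if w:
            codes.append(dictionary[w])
        return codes

    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_encode(sample)
    print(len(sample), "input symbols ->", len(codes), "dictionary indices")

As the dictionary grows, each emitted index stands for a longer and longer substring of the input, so the average number of bits per input symbol falls on sources with repeated patterns, without any mapping from individual symbols to codeword lengths.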
Note also that the entropy of a symbol is not the same thing as the negative log of the probability of the symbol value. Entropy includes the concept of averaging over all possible values of the input symbol. Entropy is the expected value of the negative log of the probability, not the actual value of the negative log of the probability associated with a particular symbol's value. The negative log of the probability of the sample value (assuming independent source symbol statistics) is the "information" conveyed by that sample value. The average amount of information per symbol is the entropy. Thus the above Feldspar quote that "each symbol is assigned a pattern whose length/cost corresponds to its entropy" is not correct in that sense. In a Huffman-style scheme, each symbol is assigned a pattern whose length corresponds (approximately) to the amount of information conveyed by its value (not the entropy conveyed by its value). Entropy coding is therefore conceptually focused on good average behavior on typical data, not on what happens from the micro perspective of individual symbols and individual short message lengths. Pawnbroker 23:52, 14 Jun 2005 (UTC)
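To make the information-versus-entropy distinction concrete, here is a minimal Python sketch (the four-symbol distribution is a made-up example added here for illustration, not from any source quoted above). It computes the information content -log2 p(x) of each symbol value and the entropy as the probability-weighted average of those values.

    import math

    # Hypothetical toy source: symbol probabilities chosen for illustration only.
    probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}

    # Information content of each individual symbol value: -log2 p(x).
    info = {sym: -math.log2(p) for sym, p in probs.items()}

    # Entropy: the expected (probability-weighted average) information content.
    entropy = sum(p * info[sym] for sym, p in probs.items())

    for sym, p in probs.items():
        print(f"symbol {sym}: p = {p}, information = {info[sym]:.3f} bits")
    print(f"entropy (average information) = {entropy:.3f} bits/symbol")

For this distribution the individual symbol values carry 1, 2, 3 and 3 bits of information, and the entropy is 1.75 bits/symbol; a Huffman code here assigns codeword lengths matching each symbol's information content, and its average length equals the entropy only because the probabilities happen to be exact powers of 1/2.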
A little thought makes it clear that Lempel-Ziv does not fit the quoted definition, since it is entirely the existence of repeated multi-symbol patterns that enables compression under this technique, which has no connection with the entropy of the input symbols except insofar as repeated multi-symbol patterns require repetition of the component symbols. A simple example (a rough sketch of this comparison appears below): Create two texts, each of which uses the characters A-Z twice. The first text is the sequence A-Z, followed by A-Z again. The second text is the sequence A-Z followed by the reverse of that sequence, Z-A. Assuming a fairly normal LZ77 compression scheme, the first text will probably be compressed to just a little over 50% of its size, and the second text will not be compressed at all -- very different results, even though the entropy of the input symbols is the same for the two texts.
Right now the assertion that there's no difference between the two terms is completely unsupported by published sources. -- Antaeus Feldspar 00:22, 15 Jun 2005 (UTC)
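As a rough way to try the two-text experiment described above, here is a small Python sketch using zlib (added here for illustration). Note that zlib's DEFLATE is LZ77 matching followed by Huffman coding, so it is only a stand-in for a plain LZ77 scheme, and header/checksum overhead is significant on 52-byte inputs, so the exact output sizes will differ from the idealized figures above.

    import string
    import zlib

    alphabet = string.ascii_uppercase             # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    text1 = (alphabet + alphabet).encode()        # A-Z followed by A-Z again
    text2 = (alphabet + alphabet[::-1]).encode()  # A-Z followed by Z-A

    # Both texts use each letter exactly twice, so their zero-order symbol
    # statistics are identical; only text1 contains a long repeated pattern
    # for the LZ77 matching stage to exploit.
    for name, text in (("A..Z A..Z", text1), ("A..Z Z..A", text2)):
        compressed = zlib.compress(text, 9)
        print(f"{name}: {len(text)} bytes -> {len(compressed)} bytes")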
Those are artificially-constructed example sequences - they are not generated by a probabilistic source model, so it is impossible to talk about their entropy. 52 letters is also way too short a message to try to invoke probabilistic behaviour arguments. If you assume a probabilistic source model that, with equal probability, generates either the sequence A to Z or the sequence Z to A in each stage of its operation, then it can easily be shown that either LZ coding or any other good entropy coder will approach the underlying entropy after a long run of input data (much more than two 26-letter sequences), which is a rate of one bit per 26 letters. Actually, the existence of repeated patterns is very much coupled with the concept of entropy. A probabilistic source will tend to create repeated patterns of output. Certain patterns will be "typical" and others will be highly atypical. Some proofs of Shannon's lossless coding theorem involve segmenting the source output in this fashion (something known as the "asymptotic equipartition property"). Much of Lempel and Ziv's original paper is devoted to proofs of how their encoding method approaches the source entropy (under specific assumptions). Pawnbroker 01:28, 15 Jun 2005 (UTC)

Those are artificially-constructed example sequences - they are not generated by a probabilistic source model, so it is impossible to talk about their entropy. Yes, but it *is* possible to run those 52-letter files through a data compression algorithm, such as an "entropy coder" and/or LZ77. Also, it is theoretically possible (although highly unlikely) that exactly these sequences might be generated by a base-26 memoryless source.

In the papers I've read, "entropy coding" usually refers to assuming a simple 0-order entropy model. This leads to "entropy coders" such as Huffman or arithmetic coding. They generate compressed files roughly equal in length to the "0-order entropy" of the input file, even when consecutive letters in the input file are highly correlated (not at all memoryless). On such input files, coders such as LZ77 and bzip can get much smaller output files by exploiting the correlation. Mathematically, I suppose this has something to do with "first-order entropy" or "nth-order entropy" ...
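A small Python sketch of that distinction (added for illustration; the empirical estimators and the "AB" test string are made up here, not taken from the cited papers): a string whose zero-order (symbol-frequency) entropy is 1 bit per symbol can still be perfectly predictable from the previous symbol, which is the redundancy that LZ-style coders can exploit but a per-symbol Huffman code cannot.

    import math
    from collections import Counter

    def zero_order_entropy(s):
        """Empirical zero-order entropy in bits/symbol (symbol frequencies only)."""
        counts = Counter(s)
        n = len(s)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def first_order_entropy(s):
        """Empirical conditional entropy H(X_k | X_{k-1}) in bits/symbol."""
        pair_counts = Counter(zip(s, s[1:]))
        prev_counts = Counter(s[:-1])
        n = len(s) - 1
        h = 0.0
        for (a, b), count in pair_counts.items():
            h -= (count / n) * math.log2(count / prev_counts[a])
        return h

    # A highly correlated string: each symbol is determined by the previous one.
    s = "AB" * 1000
    print(zero_order_entropy(s))   # 1.0 bit/symbol -- what a per-symbol code sees
    print(first_order_entropy(s))  # 0.0 bits/symbol -- redundancy LZ-style coders exploit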

Two examples of published sources:

"Length-limited variable-to-variable length codes for High–Performance Entropy Coding." Joshua Senecal. Mark Duchaineau. IEEE Transactions on Information Theory, 47(4):1533–1537, May 2001. http://graphics.cs.ucdavis.edu/~jgseneca/Papers/DCC_2004.pdf "Given a binary memoryless source, each bit b_i in the source has a probability of being either a 0 or a 1, denoted respectively by the pair (p_i, q_i). ... For a binary memoryless source with probabilities (p, q), the entropy H of the source is defined as H(p,q) = − (plog2(p) + qlog2(q)) ..."

IEEE Signal Processing Magazine 2001-09 http://www.rle.mit.edu/stir/documents/Goyal_SigProcMag2001.pdf "Entropy codes are used for lossless coding of discrete random variables. Consider the discrete random variable z with alphabet I. An entropy code y assigns a unique binary string, called a codeword, to each i \in I. ... The entropy code y is called optimal if it is a prefix code that minimizes L(y). Huffman codes ... are examples of optimal codes. The performance of an optimal code is bounded by H(z) <= L(y) < H(z)+1 where H(z) = -\sum_{i \in I} p_z (i) \log_2 p_z (i) is the entropy of z."
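To illustrate the quoted bound H(z) <= L(y) < H(z)+1 for an optimal code, here is a minimal Python sketch (added for illustration; the distribution is hypothetical and the helper is a generic textbook-style construction, not taken from the Goyal paper) that builds Huffman codeword lengths and compares the average length L with the entropy H.

    import heapq
    import math

    def huffman_code_lengths(probs):
        """Return Huffman codeword lengths for a dict {symbol: probability}."""
        # Heap entries: (subtree probability, tiebreak index, symbols in subtree).
        heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        lengths = {sym: 0 for sym in probs}
        while len(heap) > 1:
            p1, _, syms1 = heapq.heappop(heap)
            p2, i2, syms2 = heapq.heappop(heap)
            for sym in syms1 + syms2:
                lengths[sym] += 1          # each merge adds one bit to these symbols
            heapq.heappush(heap, (p1 + p2, i2, syms1 + syms2))
        return lengths

    # Hypothetical distribution, for illustration only.
    probs = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
    lengths = huffman_code_lengths(probs)
    H = -sum(p * math.log2(p) for p in probs.values())
    L = sum(probs[s] * lengths[s] for s in probs)
    print(lengths, f"H = {H:.3f} bits", f"L = {L:.3f} bits")  # expect H <= L < H + 1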

Both of these give definitions of "entropy" in terms of memoryless (uncorrelated) symbol statistics, also known as "0-order entropy". The second one (identical to one of the equations in the entropy article) can be applied to any file, including the 52-byte files mentioned above (a sketch of that calculation appears after this comment).

--DavidCary 05:05, 15 Jun 2005 (UTC)
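For what it's worth, here is a minimal Python sketch (added for illustration only) that applies the quoted zero-order entropy formula, with p(i) taken from empirical symbol frequencies, to the two 52-character texts from the earlier example. Both come out at log2(26), about 4.70 bits per symbol, even though an LZ77-style coder treats the two texts very differently.

    import math
    import string
    from collections import Counter

    def zero_order_entropy(data):
        """Empirical H = -sum p(i) * log2 p(i), with p(i) from symbol frequencies."""
        counts = Counter(data)
        n = len(data)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    alphabet = string.ascii_uppercase
    text1 = alphabet + alphabet        # A-Z followed by A-Z again
    text2 = alphabet + alphabet[::-1]  # A-Z followed by Z-A

    # Every letter occurs exactly twice in both texts, so both give
    # H = log2(26), roughly 4.70 bits/symbol, despite compressing very differently.
    print(zero_order_entropy(text1), zero_order_entropy(text2))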

Those seem like good citations. It's also nice to see someone else enter the conversation. To me, the Goyal reference speaks the most directly on the subject of our discussion. (The other one doesn't really supply a definition of "entropy coding" -- it just discusses a design problem for a source that is assumed to have memoryless and stationary statistics during the development of the method.) So it appears from Goyal that the phrase "entropy coding" (or at least "entropy code") is sometimes used in appropriate literature in a manner that restricts the scope of the concept to the handling of "zero-order" statistics. I think this justifies a different approach to this page than what I previously advocated, although I think there should be some acknowledgement that not everyone who uses the term "entropy coding" (e.g., as defined in JPEG) would include that assumption.
I would express some caution about the arithmetic coding example for two reasons: 1) in real applications, arithmetic coding is (almost) always used in a way that is both adaptive and context-based - these things don't quite fit into the zero-order model; 2) in arithmetic coding, the output bits are not just dependent on the current input symbol -- they are also affected by the entire sequence of preceding input symbols and by some subsequent input symbols (due to rounding effects).
Note that the Goyal definition of "entropy code" does not include arithmetic coding, as it is restricted to methods that assign "a unique binary string" to each input symbol value.
-Pawnbroker 23:07, 20 Jun 2005 (UTC)

Repetition of content

Much of this content seems to be saying that the lengths of the codes are assigned proportionally to the negative logarithm of the probability. Couldn't this just be said once? --Berrinam 1 Jun 2005

Quite possibly this repeated material shouldn't even be there once, per the above conversation. Pawnbroker 00:42, 15 Jun 2005 (UTC)

Range coding and arithmetic coding

I've deleted range encoding from the list of most common techniques, as it's actually the same as arithmetic coding. See those articles (and their talk pages) for more on the issue (if you're interested).

--Simon G Best 19:44, 21 July 2006 (UTC)