Universal data compression

From Wikipedia, the free encyclopedia

Universal data compression is impossible, though some claim to have found a way to do it. Any data compression algorithm necessarily leaves some input data uncompressed. A simple counting argument illustrates this.

Suppose there is an algorithm A which compresses all data sequences of length b (in bits) to shorter length data sequences. Let c be the length of the longest compressed sequence. Since A compresses all data sequences it must be the case that c < b. Now, there are 2b sequences of length b, but only 2c sequences of length c. Since 2c < 2b, it must be the case that at least two sequences, say U1 and U2 compress to the same value, say C. As a result, when uncompressing C it is not possible to determine whether U1 or U2 should be produced. Therefore, there is no way to undo algorithm A, and A is not correct.