Golomb coding

From Wikipedia, the free encyclopedia

Golomb coding is a form of entropy encoding invented by Solomon W. Golomb that is optimal for alphabets following geometric distributions, that is, when small values are vastly more common than large values. If negative values are present in the input stream then a overlap and interleave scheme is used, where all non-negative inputs are mapped to even numbers ( $x^\prime=2|x|$ ), and all negative numbers are mapped to odd numbers ( $x^\prime=2|x|+1$ ).

Rice coding is a special case of Golomb coding first described by, and independently invented by, Robert F. Rice. It is equivalent to Golomb coding where the tunable parameter is a power of two. This makes it extremely efficient for use on computers, since the division operation becomes a bitshift operation and the remainder operation becomes a bitmask operation.

The FLAC audio codec uses Rice coding to represent linear prediction residues.

1 Overview
2 Simple Algorithm
3 Example
4 Example code
5 Applications
6 References

[edit] Overview

Golomb coding uses a tunable parameter M to divide an input value into two parts: $q$ , the result of a division by M, and $r$ , the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding. When $M = 1$ Golomb coding is equivalent to unary coding.

Golomb-Rice codes can be thought of as codes that indicate a number by the position of the bin (q), and the offset within the bin (r). The above figure shows the position q, and offset r for the encoding of integer N using Golomb-Rice parameter M.

Formally, the two parts are given by the following expression, where $x$ is the number being encoded: $q = \left \lfloor \frac{\left (x-1 \right )}{M} \right \rfloor$ and $r = x - q M - 1$ The final result looks like: $\left (q+1 \right ) r$

Note that $r$ can be of a varying number of bits, and is specifically only b bits for Rice code, and switches between b-1 and b bits for Golomb code (i.e M is not a power of 2): Let $b = \lceil\log_2(M)\rceil$ . If $0 \leq r < 2^b-M$ , then use b-1 bits to encode r. If $2^b-M \leq r < M$ , then use b bits to encode r. Clearly, $b = log 2 (M)$ if M is a power of 2 and we can encode all values of r with b bits.

The parameter M is a function of the corresponding Bernoulli process, which is parameterized by $p = P (X = 0)$ the probability of success in a given Bernoulli Trial. $M$ and $p$ are related by these inequalities:

$(1-p)^M + (1-p)^{M+1} \leq 1 < (1-p)^{M-1} + (1-p)^M$

The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute the Huffman code.

[edit] Simple Algorithm

Fix the parameter M to an integer value.
For N, the number to be encoded, find
1. quotient = q = int[N/M]
2. remainder = r = N%M
Generate Codeword
1. The Code format : <Quotient Code><Remainder Code>, where
2. Quotient Code
  1. Write quotient in unary coding
3. Remainder Code
  1. If M is power of 2, code remainder as binary format. So $log 2 (M)$ bits are needed. (Rice code)
  2. If M is not a power of 2, set
    1. Code the first $2 b - M$ values of r as plain binary using b-1 bits.
    2. Code the rest of the values of r by coding the number $r + 2 b - M$ in plain binary representation using b bits.

[edit] Example

Set M = 10. Thus $b = \lceil\log_2(10)\rceil = 4$ . The cutoff is $2 b - M = 16 - 10 = 6$

r	0	1	2	3	4	5	6	7	8	9
code	000	001	010	011	100	101	1100	1101	1110	1111

q	0	1	2	3	...
code	0	10	110	1110	...

42 would be encoded as qcode(q),rcode(r) = qcode(4),rcode(2) = 11110,010 (the comma isn't necessary, because the 0 at the end of the q code is enough to say when q ends and r begins).

[edit] Example code

Encode

void golombEncode(char* source, char* dest, int M)
{
     IntReader intreader(source);
     BitWriter bitwriter(dest);
     int num;
     int c = 0;
     while(intreader.hasLeft())
     {
         num = intreader.getInt();
         int q = num / M;
         for (int i = 0 ; i < q; i++)
             bitwriter.putBit(true); //write q ones
         bitwriter.putBit(false); //write one zero
         int v  = 1; 
         for (int i = 0 ; i < log2(M); i++)         
         {
             bitwriter.putBit( v & num );
             v = v <<1;
         }
     }
     bitwriter.close();
     intreader.close();
}

Decode

void golombDecode(char* source, char* dest, int M)
{
     BitReader bitreader(source);
     BitWriter bitwriter(dest);
     int q = 0;
     int nr = 0;
     while (bitreader.hasLeft())
     {
           while(bitreader.getBit()) q++;//potentially dangerous with malformed files.
           for (int a = 0; a < log2(M); a++)//read out the sequential log2(M) bits
               if (bitreader.getBit())
                  nr += 1 << a;
           nr += q*M; //add the bits and the multiple of M 
           
           //now write out the nr.
           for (int a = 0; a < 32; a++) //assuming that an int has 32 bits.
           {
               if (nr & 1 << a)
                  {
                      bitwriter.putBit(true);
                  }else
                  {
                       bitwriter.putBit(false);
                  }
           }
     }
     bitreader.close();
     bitwriter.close();     
}

[edit] Applications

The FLAC audio codec uses Rice coding to represent linear prediction residues.
Apple's ALAC audio codec uses modfied Rice coding after its Adaptive FIR filter.

The Golomb-Rice coder is used in the entropy coding stage of Rice Algorithm based lossless image codecs. One such experiment yields a compression ratio graph given below. See other entries in this category at the bottom of this page.

[edit] References

Golomb, S.W. (1966). , Run-length encodings. IEEE Transactions on Information Theory, IT--12(3):399--401
R. F. Rice, "Some Practical Universal Noiseless Coding Techniques, " Jet Propulsion Laboratory, Pasadena, California, JPL Publication 79--22, Mar. 1979.
Witten, Ian Moffat, Alistair Bell, Timothy. "Managing Gigabytes: Compressing and Indexing Documents and Images." Second Edition. Morgan Kaufmann Publishers, San Francisco CA. 1999 ISBN 1-55860-570-3
David Salomon. "Data Compression",ISBN 0-387-95045-1.
http://ese.wustl.edu/class/fl06/ese578/GolombCodingNotes.pdf

Retrieved from "http://en.wikipedia.org../../../g/o/l/Golomb_coding.html"

Category: Lossless compression algorithms