Unary coding

From Wikipedia, the free encyclopedia

Unary coding is an entropy encoding that represents a natural number n with n − 1 ones followed by a zero. For example, 5 is represented as 11110. Some representations instead use n − 1 zeros followed by a one, as in the table below. The ones and zeros are interchangeable without loss of generality.

n	Unary code (n − 1 zeros followed by a one)
1	1
2	01
3	001
4	0001
5	00001
6	000001
7	0000001
8	00000001
9	000000001
10	0000000001
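A minimal Python sketch of both conventions follows. The function names `unary_encode` and `unary_decode` and the `one`/`zero` parameters are illustrative choices, not a standard API. Swapping the two symbols switches between the ones-then-zero form described above and the zeros-then-one form used in the table.

```python
def unary_encode(n: int, one: str = "1", zero: str = "0") -> str:
    """Encode n >= 1 as (n - 1) copies of `one` followed by a single `zero`."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    return one * (n - 1) + zero


def unary_decode(code: str, one: str = "1", zero: str = "0") -> int:
    """Decode one codeword: it must be `one`* then `zero`, and its total length is n."""
    if not code.endswith(zero) or set(code[:-1]) - {one}:
        raise ValueError(f"not a valid unary codeword: {code!r}")
    return len(code)


# Ones-then-zero convention from the text: prints 11110.
print(unary_encode(5))

# Zeros-then-one convention, reproducing the table rows for n = 1..10.
for n in range(1, 11):
    print(n, unary_encode(n, one="0", zero="1"))
```

Because the codeword for n always has exactly n symbols, decoding reduces to checking the shape of the codeword and returning its length.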

Unary coding is an optimally efficient encoding for the following discrete probability distribution

$\operatorname{P}(n) = 2^{-n}\,$

for $n = 1, 2, 3, \ldots$.
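A short worked check of this claim, using the fact that the codeword for n has length ℓ(n) = n. Under this distribution each length equals the self-information $-\log_2 \operatorname{P}(n) = n$, so the expected code length equals the entropy:

$\operatorname{E}[\ell] = \sum_{n=1}^{\infty} n \cdot 2^{-n} = 2\,$

$\operatorname{H}(\operatorname{P}) = \sum_{n=1}^{\infty} 2^{-n} \log_2 2^{n} = \sum_{n=1}^{\infty} n \cdot 2^{-n} = 2\,$

Both values are in bits per symbol. The sum uses the standard identity $\sum_{n \ge 1} n x^{n} = x/(1-x)^{2}$ with $x = 1/2$.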

In symbol-by-symbol coding, it is optimal for any geometric distribution

$\operatorname{P}(n) = (k-1)k^{-n}\,$

for which k ≥ φ = 1.6180339887… (the golden ratio), or, more generally, for any discrete distribution for which

$\operatorname{P}(n) \ge \operatorname{P}(n+1) + \operatorname{P}(n+2)\,$

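for $n = 1, 2, 3, \ldots$. For the geometric distribution above, substituting $\operatorname{P}(n) = (k-1)k^{-n}$ into this condition and dividing both sides by $(k-1)k^{-n-2}$ gives $k^2 \ge k + 1$, which for $k > 1$ holds exactly when $k \ge \varphi$; this is where the golden-ratio threshold comes from. Although unary coding is the optimal symbol-by-symbol code for such distributions, its optimality, like that of Huffman coding, can be overstated. For the last two distributions mentioned above, arithmetic coding can compress better (except in the special case $\operatorname{P}(n) = 2^{-n}$, where the unary code lengths already equal the entropy), because it does not code input symbols independently but instead implicitly groups them.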
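To make that gap concrete, the Python sketch below computes, for the geometric distribution $\operatorname{P}(n) = (k-1)k^{-n}$, the expected length of the unary code and the entropy, which is the rate an ideal arithmetic coder can approach. It does not implement arithmetic coding itself. The function name `unary_vs_entropy` and the truncation of the infinite sums at a fixed number of terms are assumptions of this sketch.

```python
import math


def unary_vs_entropy(k: float, terms: int = 500) -> tuple[float, float]:
    """Return (expected unary code length, entropy) in bits per symbol
    for P(n) = (k - 1) * k**(-n), n = 1, 2, 3, ..., truncating both sums."""
    if k <= 1:
        raise ValueError("k must exceed 1 for (k - 1) k^-n to be a distribution")
    expected_length = 0.0
    entropy = 0.0
    p = (k - 1) / k  # P(1)
    for n in range(1, terms + 1):
        if p == 0.0:
            break  # remaining terms have underflowed to zero
        expected_length += n * p  # the unary codeword for n has n symbols
        entropy -= p * math.log2(p)
        p /= k  # P(n + 1) = P(n) / k
    return expected_length, entropy


for k in (2.0, 3.0):
    length, h = unary_vs_entropy(k)
    print(f"k = {k}: unary {length:.4f} bits/symbol, entropy {h:.4f} bits/symbol")
```

For k = 2 both quantities equal 2 bits per symbol. For k = 3 the expected unary length is k/(k − 1) = 1.5 bits per symbol, while the entropy is about 1.377 bits, so a coder that groups symbols has room to do better.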

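A modified unary encoding is used in UTF-8: the lead byte of a multi-byte sequence begins with as many one bits as the sequence has bytes, followed by a zero. Unary codes are also used in split-index schemes such as the Golomb–Rice code. Unary coding is prefix-free and can therefore be uniquely decoded.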
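The Python sketch below illustrates both uses under stated assumptions; all function names are hypothetical. The Rice code here uses divisor 2^k. It writes the quotient q as q ones followed by a zero (the ones-then-zero unary code of q + 1) and the remainder as k plain binary bits. Other texts use the complementary bit convention. Because every codeword is prefix-free, a concatenated stream can be split without separators. `utf8_sequence_length` only reads the unary length prefix of a lead byte and does not validate the rest of the sequence.

```python
def rice_encode(n: int, k: int) -> str:
    """Rice code with divisor 2**k for n >= 0: unary quotient, then k-bit remainder."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q, r = n >> k, n & ((1 << k) - 1)
    remainder_bits = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder_bits


def rice_decode_stream(bits: str, k: int) -> list[int]:
    """Split a concatenation of Rice codewords; possible because the code is prefix-free."""
    values = []
    i = 0
    while i < len(bits):
        q = 0
        while i < len(bits) and bits[i] == "1":  # unary part: count the ones
            q += 1
            i += 1
        if i + 1 + k > len(bits):
            raise ValueError("truncated codeword")
        i += 1  # skip the zero that terminates the unary quotient
        r = int(bits[i:i + k], 2) if k > 0 else 0
        i += k
        values.append((q << k) | r)
    return values


def utf8_sequence_length(lead_byte: int) -> int:
    """Sequence length read from the run of leading one bits of a UTF-8 lead byte.

    0xxxxxxx -> 1 byte; 110xxxxx -> 2; 1110xxxx -> 3; 11110xxx -> 4.
    Only the length prefix is checked, not full UTF-8 validity.
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("expected a byte value")
    ones = 0
    mask = 0x80
    while mask and lead_byte & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1  # ASCII byte: no unary prefix
    if ones == 1 or ones > 4:
        raise ValueError(f"0x{lead_byte:02X} cannot start a UTF-8 sequence")
    return ones


stream = "".join(rice_encode(n, 2) for n in (9, 3, 4))
print(stream)  # 110010111000
print(rice_decode_stream(stream, 2))  # [9, 3, 4]
print(utf8_sequence_length("\u20ac".encode("utf-8")[0]))  # 3 (euro sign)
```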

