Unary coding

From Wikipedia, the free encyclopedia

Unary coding is an entropy encoding that represents a natural number, n, with n − 1 ones followed by a zero. For example 5 is represented as 11110. Some representations use n ones followed by a zero. The ones and zeros are interchangeable without loss of generality.

Unary coding is an optimally efficient encoding for the following discrete probability distribution

\operatorname{P}(n) = 2^{-n}\,

for n = 1,2,3,.... It is in fact optimal for any geometric distribution

\operatorname{P}(n) = (k-1)k^{-n}\,

for which k ≥ φ = 1.61803398879…, the golden ratio, or, more generally, for any discrete distribution for which

\operatorname{P}(n) \ge \operatorname{P}(n+1) + \operatorname{P}(n+2)\,

for n = 1,2,3,....

A modified unary encoding is used in UTF-8. Unary codes are also used in split-index schemes like the Golomb Rice code. Unary coding is prefix-free, and can be uniquely decoded.

[edit] References

  • Khalid Sayood, Data Compression, 3rd ed, Morgan Kaufmann.
  • Professor K.R Rao, EE5359:Principles of Digital Video Coding.
In other languages