Talk:Canonical Huffman code

From Wikipedia, the free encyclopedia

This article probably needs the examples rewriting as Canonical Huffman coding starts with all-zeros for the shortest code and ends with all-ones for the longest code:

A   0
B  10
C 110
D 111

This is opposite to the examples as given. The method for construction is then:

code = 0
while more symbols:
    print symbol, code
    code = code + 1
    if next bit length:
        code = code << 1

Sladen 13:54, 15 December 2006 (UTC)

Done this and attempted to clean up. Please check over that it actually makes sense and is correct. Sladen 02:25, 20 December 2006 (UTC)

3 March 2007

The reason the longest codes typically have all-zeros is that this version of canonical Huffman coding guarantees that all non-zero bits will appear in the final ceil(log2(alphabetsize)) bits of the pattern. This allows the pattern to be stored in a fixed-size variable since the leading 0s don't need to be stored.

21 August 2007

The pseudo code in this article seems not working, perapse it's just me but... I get an invalid huffman tree if I follow it. But the one on this page is working perfectly. Somebody who know this better than me must check the validity of the pseudo code given in this article. (bad-self-learned-english, sorry...) —The preceding unsigned comment was added by 83.77.85.224 (talk) 09:28, August 21, 2007 (UTC)

[edit] Better examples

Two good example words might be headache and beachhead. Both of these words are at least 8 letters long, contain repeated letters and only use the first eight letters in the alphabet. cabbage and baggage are seven letters a piece.

[edit] headache

symbol weight Huffman bits code canonical
a      2      00      2    0    00
b      0              0
c      1      110     3    6    110
d      1      111     3    7    111
e      2      01      2    1    01
f      0              0
g      0              0
h      2      10      2    2    10
         8,-
   4,-           4,-
2,a   2,e   2,h      2,-
                  c,1   d,1
00    01    10    110   111
2,0,3,3,2,0,0,2

Could be combined with ace headache, abba had a bad headache.

symbol weight Huffman bits code canonical
a      7        00    2    0     00
b      3        10    2    1     01
c      1       111    3    4    100
d      3       110    3    5    101
e      2       101    3    6    110
f      0              0
g      0              0
h      2       110    3    7    111
                    19,-
         10,-                9,-
      7,a    3,b       6,-         3,-
                    3,d   2,e   2,h   1,c
      00     01     100   101   110   111
2,2,3,3,3,0,0,3

[edit] beachhead

symbol weight huffman? code bits
a      2        00      0   2
b      1       110      6   3
c      1      1110     14   4
d      1      1111     15   4
e      2        01      1   2
f      0                -   0
g      0                -   0
h      2        10      2   2

Could be combined with faded beachhead cafe.