LZWL

LZWL is a syllable-based variant of the character-based LZW compression algorithm.

LZWL can work with syllables produced by any syllable decomposition algorithm. It can also be applied to words instead of syllables.

Syllables

According to the Compact Oxford English Dictionary, a syllable is defined as: ‘A unit of pronunciation having one vowel sound, with or without surrounding consonants, and forming all or part of a word.’

Because the decomposition into syllables is used here only for data compression, words do not always have to be split into linguistically correct syllables. It is enough that concatenating the pieces reproduces the original word.
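To illustrate that an imperfect decomposition is still usable, here is a minimal sketch of a deliberately naive splitter. The rule it uses (a run of non-vowels followed by a run of vowels) and the name `naive_syllables` are assumptions made for this example, not a decomposition algorithm taken from the LZWL literature.

```python
import re

# Hypothetical, deliberately naive rule: each piece is a run of non-vowels
# followed by a run of vowels; leftover consonants join the last piece.
_VOWELS = "aeiouyAEIOUY"
_PIECE = re.compile(rf"[^{_VOWELS}]*[{_VOWELS}]+")


def naive_syllables(word: str) -> list[str]:
    """Split a word into pieces whose concatenation is the original word."""
    pieces = _PIECE.findall(word)
    consumed = sum(len(p) for p in pieces)
    tail = word[consumed:]          # trailing characters with no vowel
    if tail:
        if pieces:
            pieces[-1] += tail
        else:
            pieces.append(tail)
    return pieces


print(naive_syllables("compression"))  # ['co', 'mpre', 'ssion']
```

The split of "compression" is linguistically wrong (a dictionary would give com-pres-sion), but joining the pieces restores the word, which is all a compressor needs.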

Algorithm

The character-based LZW algorithm works as follows. In the initialization step, the dictionary is filled with all characters of the alphabet. Each subsequent step searches for the longest string S that is in the dictionary and matches a prefix of the not yet coded part of the input. The number of phrase S is sent to the output. A new phrase is then added to the dictionary; it is created by concatenating S with the character that follows S in the file. The current input position is moved forward by the length of S.

Decoding has only one special case to handle: the decoder can receive the number of a phrase that is not yet in its dictionary. In this case the phrase is created by concatenating the previously decoded phrase with its own first character.
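The character-based scheme described above can be sketched as follows. This is a minimal illustration, assuming a byte alphabet of 256 symbols and an unbounded dictionary; the function names `lzw_encode` and `lzw_decode` are chosen for this example only.

```python
def lzw_encode(data: bytes) -> list[int]:
    # Initialization: one phrase per symbol of the alphabet (all byte values).
    dictionary = {bytes([i]): i for i in range(256)}
    codes = []
    s = b""
    for value in data:
        candidate = s + bytes([value])
        if candidate in dictionary:
            s = candidate                            # keep extending the match
        else:
            codes.append(dictionary[s])              # output the number of S
            dictionary[candidate] = len(dictionary)  # add S + next character
            s = bytes([value])
    if s:
        codes.append(dictionary[s])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    phrases = [bytes([i]) for i in range(256)]
    if not codes:
        return b""
    previous = phrases[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(phrases):
            current = phrases[code]
        elif code == len(phrases):
            # Special case: the phrase is not in the dictionary yet, so it
            # must be the previous phrase followed by its own first character.
            current = previous + previous[:1]
        else:
            raise ValueError("invalid phrase number")
        phrases.append(previous + current[:1])
        out.append(current)
        previous = current
    return b"".join(out)


codes = lzw_encode(b"abababa")
print(codes)              # [97, 98, 256, 258]
print(lzw_decode(codes))  # b'abababa'
```

In this run the last code, 258, reaches the decoder before phrase 258 exists in its dictionary, so the example exercises the special case.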

The syllable-based version works over an alphabet of syllables. In the initialization step, the dictionary is filled with the empty syllable and with short syllables from a database of frequent syllables. Finding the string S and coding its number is analogous to the character-based version, except that S is a string of syllables. The number of phrase S is written to the output. S may be the empty syllable.

If S is the empty syllable, one syllable K is read from the file and encoded by the methods for coding new syllables, and K is added to the dictionary. The position in the file is then moved forward by the length of S or, when S is the empty syllable, by the length of K.

Adding a phrase to the dictionary differs from the character-based version. Let S1 denote the phrase from the previous step. If S and S1 are both non-empty, a new phrase is added to the dictionary; it is created by concatenating S1 with the first syllable of S. This solution has two advantages. First, no strings are created from syllables that appear only once. Second, the decoder can never receive the number of a phrase that is not in its dictionary.
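The syllable-based steps can be sketched as follows. This is an illustration under several assumptions, not a reference implementation: the input is already split into non-empty syllables; the new phrase is the earlier of two consecutive phrases followed by the first syllable of the later one; a phrase that already exists is not added again; and a new syllable K is written to the output as a raw string, standing in for the unspecified "methods for coding new syllables". The names `lzwl_encode`, `lzwl_decode` and `frequent` are hypothetical.

```python
EMPTY = ()  # the empty syllable, always phrase number 0


def _initial_phrases(frequent):
    # Empty syllable first, then the frequent syllables, without duplicates.
    return list(dict.fromkeys([EMPTY] + [(s,) for s in frequent]))


def lzwl_encode(syllables, frequent=()):
    index = {p: i for i, p in enumerate(_initial_phrases(frequent))}
    out = []
    previous = EMPTY
    pos = 0
    while pos < len(syllables):
        # Longest dictionary phrase S matching the not yet coded input.
        end = pos
        while end < len(syllables) and tuple(syllables[pos:end + 1]) in index:
            end += 1
        s = tuple(syllables[pos:end])
        out.append(index[s])
        if s == EMPTY:
            k = syllables[pos]
            out.append(k)                 # placeholder coding of new syllable K
            index[(k,)] = len(index)
            pos += 1
        else:
            if previous != EMPTY:
                new = previous + s[:1]    # earlier phrase + first syllable of S
                if new not in index:      # assumption: skip existing phrases
                    index[new] = len(index)
            pos += len(s)
        previous = s
    return out


def lzwl_decode(tokens, frequent=()):
    phrases = _initial_phrases(frequent)
    known = set(phrases)
    out = []
    previous = EMPTY
    stream = iter(tokens)
    for code in stream:
        current = phrases[code]           # always present, no special case
        if current == EMPTY:
            k = next(stream)              # raw new syllable follows code 0
            phrases.append((k,))
            known.add((k,))
            out.append(k)
        else:
            if previous != EMPTY:
                new = previous + current[:1]
                if new not in known:
                    phrases.append(new)
                    known.add(new)
            out.extend(current)
        previous = current
    return out


text = ["ba", "na", "na", "ba", "na", "na", "ba", "na"]
tokens = lzwl_encode(text, frequent=("ba",))
assert lzwl_decode(tokens, frequent=("ba",)) == text
```

Because a phrase is added only after the following phrase has been chosen, the decoder can look up every number it receives directly, and a syllable seen for the first time (coded through the empty syllable) never becomes part of a longer phrase at that point.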

External Links