Implementating Huffman algorithm is a good practice to progress in algorithms and is a good opportunity to train on working methods such as TDD.

Huffman compression method can be decomposed in 3 steps:

• computing a dictionary with frequency of occurrences
• building a tree representing frequencies and associated path
• storage of tree and path of each occurrence bit per bit

In next examples, we will work on the string “aaabbffppppeee kk aa”

## Computing frequencies

At first, we can imagine a dictionary of char * int containing each char frequency. It don’t like to work with chars when I implement binary data traitment algorithms. So, I created an empty function getting (byte * int) list

Then I wrote a test I executed each time I could compile:

A simple function to work with bytes list could be:

I did want to try compression on real files, so I couldn’t load entire content in a byte list. Next function is computing frequencies from a seekable stream.

## Building the tree

My tree model is simply:

The cost is is representing total occurencies count contained by child branches or leaf. The swith method helps to browse sub branches of a tree node.

When the model is written, we can write tests:

After tests writting, we can implement a function populating the tree from frequencies:

Calculated paths are:

occurrency path option binary frequency
a Some({[True;False]}) 10 5
p Some({[False; True]}) 01 4
e Some({[True; True; False]}) 110 3
b Some({[False; False; False]}) 000 2
f Some({[False; False; True]}) 001 2
k Some({[True; True; True; True]}) 1111 2
’ ‘ Some({[True; True; True; False]}) 1110 2

This table is demonstrating that implementation is correct. Occurrences whose frequencies are higher have the shortest paths.

A diagram summarizing this table could be like the following: ## Test online to compute a Huffman tree

Try huffman implementation compiled with fable.

occurrency binary frequency
a 10 5

## Storage

The occurrency ‘a’ will be coded in 2 bits. We can not write less than 8 bits in a stream. (with the WriteByte method) So I wrote a tiny BitWritter:

The buffer is a simple byte. The write method increases the len and shift bits of buffer. When len is equal to 8 bits, we write the byte in the stream.

To read bit per bit in a stream, I use:

Try this implementation on real files using gist