What is Huffman Coding?

Huffman coding algorithm is a compression algorithm that uses the Greedy Choice property to compress data in a lossless form.

How does it work?

  • Huffman Coding is Based on lengths of assigned codes that are based on frequencies.
  • Variable Length Codes are known as Prefix Codes.
    • We utilize tree-traversal path codes (where each node is represented by a number) to represent a character such as ”A” with a code, such as “000”. This practice of using a tree to encode characters is called a Code Tree.

Algorithm Summary:

  1. Identify the characters with the 2 lowest frequencies.
  2. Make a 2-leaf node tree for the 2 lowest frequency characters, their parent is a node of the sum of their frequencies.
  3. Add the next lowest frequency character as a child of a new parent node created with the value of the sum of the previous frequencies, and the child’s frequency.
    1. Repeat step 2 until we have a sum of frequencies that is greater than the next lowest frequency character in the array.
    2. If the next lowest frequency character is greater than the next lowest frequency character in the array then we start from step 2.

How is greedy choice a property?

Greedy choice is a property of this algorithm as this algorithm attempts to optimize the tree-traversal-path based upon the highest frequency appearing character.

Technical implementation: