What is Huffman Coding?
Huffman coding algorithm is a compression algorithm that uses the Greedy Choice property to compress data in a lossless form.
How does it work?
- Huffman Coding is Based on lengths of assigned codes that are based on frequencies.
- Variable Length Codes are known as Prefix Codes.
- We utilize tree-traversal path codes (where each node is represented by a number) to represent a character such as ”A” with a code, such as “000”. This practice of using a tree to encode characters is called a Code Tree.
Algorithm Summary:
- Identify the characters with the 2 lowest frequencies.
- Make a 2-leaf node tree for the 2 lowest frequency characters, their parent is a node of the sum of their frequencies.
- Add the next lowest frequency character as a child of a new parent node created with the value of the sum of the previous frequencies, and the child’s frequency.
- Repeat step 2 until we have a sum of frequencies that is greater than the next lowest frequency character in the array.
- If the next lowest frequency character is greater than the next lowest frequency character in the array then we start from step 2.
How is greedy choice a property?
Greedy choice is a property of this algorithm as this algorithm attempts to optimize the tree-traversal-path based upon the highest frequency appearing character.