Applying priority queues for lossless data compression.
Huffman coding is a classic greedy algorithm for lossless data compression. The core idea is to assign variable-length codes to input characters based on their frequencies: frequent characters get shorter codes and infrequent characters get longer ones, reducing the total number of bits required to represent the data.

The algorithm uses a min-priority queue whose elements are tree nodes keyed on frequency. Initially, the queue holds one leaf node per character, with that character's frequency as its priority. The algorithm then repeatedly extracts the two nodes with the lowest frequencies, creates a new internal node whose frequency is the sum of the two extracted nodes' frequencies, makes the extracted nodes its left and right children, and inserts the new node back into the queue. This process continues until only one node, the root of the Huffman tree, remains.

The path from the root to a character's leaf node defines its binary code (e.g., '0' for a left branch, '1' for a right branch). The result is an optimal prefix code: no character's code is a prefix of another's, which makes decompression unambiguous, and among all prefix codes it minimizes the expected length of the encoded data.
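The procedure above can be sketched in Python, using the standard library's heapq module as the min-priority queue. This is a minimal illustration, not a production encoder; the function name huffman_codes and the tuple-based tree representation are assumptions made here for brevity.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table for the characters of `text`.

    A sketch of the greedy algorithm: a min-heap of tree nodes keyed
    on frequency; the two lowest-frequency nodes are repeatedly merged
    until a single root remains.
    """
    freq = Counter(text)
    # Each heap entry is (frequency, tie-breaker, subtree). A leaf is
    # a character; an internal node is a (left, right) pair. The
    # tie-breaker keeps comparisons well-defined on equal frequencies.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest
        # New internal node: frequency is the sum of its children's.
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    # Walk the tree: '0' on a left branch, '1' on a right branch.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"  # single-symbol edge case
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes
```

For example, huffman_codes("aaabbc") assigns the most frequent character 'a' a one-bit code and the rarer 'b' and 'c' two-bit codes, and no code is a prefix of another.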