What is Data Compression?

Data compression is the process of encoding information using fewer bits than the original representation. The component that performs the compression is called an encoder, and the component that reverses it is called a decoder.

Compression reduces the resources needed to transmit and store data, at the cost of the computation spent on compressing and decompressing it.

Lossless Compression

Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost, so the original data can be reconstructed exactly.
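As a rough illustration of a lossless round trip, the sketch below uses Python's standard `zlib` module (a DEFLATE implementation, chosen here only as a convenient example). The sample input is made up; the point is that the decoder returns exactly the bytes the encoder received.

```python
import zlib

# Illustrative input: a short pattern repeated many times (highly redundant).
original = b"abcabcabc" * 1000

# Encoder: zlib replaces the repeated patterns with a much shorter description.
compressed = zlib.compress(original)

# Decoder: reverses the encoding and recovers the input byte for byte.
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("exact round trip:", restored == original)
```

The compressed size depends on the input and the zlib version, so no specific byte count is claimed here; only the exact round trip is guaranteed.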

Lossy Compression

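Lossy compression reduces bits by removing unnecessary or less important information. The discarded information cannot be recovered, so the decoder produces only an approximation of the original.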
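One simple way to picture lossy compression is quantization: storing each value with fewer bits and accepting a small error. The sketch below is a toy example, not a real codec; the function names `quantize` and `dequantize` and the sample values are hypothetical.

```python
def quantize(samples, bits_dropped=4):
    """Lossy encoder: discard the low-order bits of each 8-bit sample."""
    return [s >> bits_dropped for s in samples]


def dequantize(codes, bits_dropped=4):
    """Decoder: scale the codes back up; the discarded bits stay lost."""
    return [c << bits_dropped for c in codes]


samples = [12, 13, 200, 203, 255]   # made-up 8-bit values
codes = quantize(samples)           # each code now fits in 4 bits
approx = dequantize(codes)          # close to, but not equal to, the input

print(codes)    # [0, 0, 12, 12, 15]
print(approx)   # [0, 0, 192, 192, 240]
```

Each sample now needs half as many bits, and the reconstruction is off by at most 15 per value. Real lossy formats choose what to discard far more carefully, typically based on what a human is least likely to notice.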

Entropy in Data Compression

Quote

Shannon observed that a sentence full of random words resembles a pot of boiling water: it is chaotic and unordered, and humans find it difficult to predict what the next word in the sentence will be. 1

Entropy measures the randomness of the data fed into a compression algorithm. The higher the entropy of an input, the lower the compression ratio that can be achieved: the closer data is to random, the less it can be compressed.
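To make the relationship concrete, the sketch below estimates Shannon entropy from byte frequencies, H = Σ p · log2(1/p) in bits per byte, and compares how well `zlib` compresses a repetitive input and a random one. The helper name `shannon_entropy` and both sample inputs are assumptions made for illustration. Note that this estimate looks only at how often each byte occurs, not at the order of the bytes, so it is a rough indicator rather than a strict bound on what a real compressor achieves.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Empirical entropy in bits per byte, based on byte frequencies only."""
    counts = Counter(data)
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


low = b"a" * 10_000          # one symbol repeated: no uncertainty
high = os.urandom(10_000)    # random bytes: close to 8 bits per byte

for name, data in (("repetitive", low), ("random", high)):
    ratio = len(data) / len(zlib.compress(data))
    print(f"{name}: entropy={shannon_entropy(data):.2f} bits/byte, "
          f"compression ratio={ratio:.2f}")
```

The repetitive input has zero entropy and shrinks dramatically, while the random input has entropy near the 8 bits per byte maximum and does not shrink at all.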

Algorithms:

Sources & Related Readings:

Footnotes

  1. Entropy, Compression, and Information Content