Compression: lossy and lossless
Compression reduces the size of a file so it takes less storage space and transfers faster. OCR J277 tests the two types of compression, when each is appropriate, and how run-length encoding (a lossless algorithm) works.
Why compress files?
- Large files take longer to transmit over networks (email, web, streaming).
- Storage is not infinite — reducing file size saves space.
- Streaming audio/video in real time requires data to arrive fast enough — compression makes this possible.
- Example: an uncompressed 30-minute 4K TV episode (24-bit colour, 30 frames per second) is roughly 1.3 TB; compressed for streaming, it is typically only a few GB.
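The size of uncompressed video is just multiplication: pixels per frame × bytes per pixel × frames per second × seconds. The sketch below works this out. It assumes 24-bit colour, 30 frames per second and a hypothetical streaming bitrate of 15 Mbit/s. Real services vary, so treat the compressed figure as illustrative only.

```python
# Rough size estimate for a 30-minute 4K episode.
# Assumptions (for illustration only): 24-bit colour, 30 frames per
# second, and a hypothetical streaming bitrate of 15 Mbit/s.
WIDTH, HEIGHT = 3840, 2160           # 4K UHD resolution in pixels
BYTES_PER_PIXEL = 3                  # 24-bit colour = 3 bytes
FPS = 30                             # frames per second
SECONDS = 30 * 60                    # 30 minutes
STREAM_BITS_PER_SECOND = 15_000_000  # assumed compressed bitrate

def uncompressed_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Bytes = pixels per frame x bytes per pixel x frames per second x seconds."""
    return width * height * bytes_per_pixel * fps * seconds

raw = uncompressed_video_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL, FPS, SECONDS)
streamed = STREAM_BITS_PER_SECOND * SECONDS // 8  # convert bits to bytes

print(f"Uncompressed: {raw / 10**12:.2f} TB")     # Uncompressed: 1.34 TB
print(f"Compressed:   {streamed / 10**9:.1f} GB") # Compressed:   3.4 GB
print(f"Ratio about {raw / streamed:.0f}:1")       # Ratio about 398:1
```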
Lossy compression
- Permanently removes some data from the file to reduce its size.
- The removed data cannot be recovered — the decompressed file is not identical to the original.
- The loss is designed to be imperceptible (or barely noticeable) to the human eye or ear.
- Achieves much higher compression ratios than lossless.
How lossy compression works (concept)
- In images: removes fine detail and colour information the eye is less sensitive to (e.g. merging slightly different shades into one, because the eye notices changes in brightness more than small changes in colour).
- In audio: removes sounds outside the range of human hearing or masked by louder sounds (psychoacoustic model).
- Re-compressing an already lossy file removes more data each time, so quality drops further with every generation (generation loss).
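The idea that lossy data "cannot be recovered" can be shown with a toy example. This is a deliberately simplified sketch and not how JPEG actually works. It keeps only the top 4 bits of each 8-bit pixel value, which halves the storage per pixel. Similar shades collapse into the same stored value, so the originals cannot be worked out again.

```python
# Toy lossy compression (NOT the real JPEG algorithm): keep only the
# 4 most significant bits of each 8-bit pixel value (0-255).

def lossy_compress(pixels):
    """Drop the lower 4 bits: each value becomes a level from 0 to 15."""
    return [p >> 4 for p in pixels]

def decompress(levels):
    """Scale each level back to 0-255, using the middle of its band."""
    return [(level << 4) + 8 for level in levels]

original = [200, 201, 203, 198, 37, 40]
compressed = lossy_compress(original)  # half the bits per pixel
restored = decompress(compressed)

print(compressed)            # [12, 12, 12, 12, 2, 2]
print(restored)              # [200, 200, 200, 200, 40, 40]
print(restored == original)  # False: the fine differences are gone
```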
Examples
- Images: JPEG — lossy image compression; great for photographs.
- Audio: MP3, AAC remove inaudible or masked sounds, typically making files around 10 times smaller (10:1) than uncompressed CD-quality audio.
- Video: H.264, H.265 — used for streaming (Netflix, YouTube).
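The 10:1 figure for audio can be checked with some arithmetic. CD-quality audio uses 44,100 samples per second, 16 bits per sample and 2 channels. The MP3 bitrate of 128 kbit/s below is an assumed, commonly used setting. Other bitrates give other ratios.

```python
# Compare the bitrate of uncompressed CD audio with a typical MP3.
SAMPLE_RATE = 44_100    # samples per second (CD quality)
BIT_DEPTH = 16          # bits per sample
CHANNELS = 2            # stereo
MP3_BITRATE = 128_000   # bits per second (assumed common MP3 setting)

cd_bitrate = SAMPLE_RATE * BIT_DEPTH * CHANNELS  # bits per second
ratio = cd_bitrate / MP3_BITRATE

print(cd_bitrate)        # 1411200
print(f"{ratio:.1f}:1")  # 11.0:1
```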
When to use lossy
- When quality loss is acceptable and unnoticeable (web images, music streaming).
- When very small file size is a priority.
- When the file will not need to be edited further (editing a JPEG degrades quality each save).
Lossless compression
- All original data is preserved — the decompressed file is byte-for-byte identical to the original.
- Achieves lower compression ratios than lossy.
- Decompression is always perfect.
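Python's standard `zlib` module implements DEFLATE, the lossless algorithm commonly used inside ZIP, gzip and PNG files. It can show the "byte-for-byte identical" property directly. This is a minimal sketch. The sample text is made up and deliberately repetitive so that it compresses well.

```python
import zlib

# Repetitive sample text (made up for this example) compresses well.
original = ("The quick brown fox. " * 50).encode("utf-8")

compressed = zlib.compress(original)    # lossless DEFLATE compression
restored = zlib.decompress(compressed)  # undo the compression

print(len(original), "bytes ->", len(compressed), "bytes")
print(restored == original)             # True: identical to the original
```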
Examples
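- Images: PNG and GIF are lossless; ideal for logos, screenshots and graphics with sharp edges (GIF is limited to 256 colours, so converting a full-colour photo to GIF still loses colours).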
- Audio: FLAC, ALAC. (WAV is normally uncompressed: nothing is removed, so it is lossless, but it is not a compressed format.)
- General: ZIP, gzip: general-purpose lossless compression that works on any file type.
When to use lossless
- When the original data must be fully recoverable (medical images, legal documents, software archives).
- When the file will be edited repeatedly.
- For text files, code, executables — any loss would corrupt the data.
Run-length encoding (RLE) — a lossless algorithm
RLE replaces repeated runs of the same value with a count + value pair.
Example (image pixels)
Original row: WWWWWWBBBWWWWWWWWWW (19 pixels)
RLE encoded: 6W 3B 10W (3 pairs instead of 19 individual values)
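The same steps can be sketched in Python. This is one possible way to write RLE, not the only one. It scans the data once and either extends the current run or starts a new one. The function names `rle_encode` and `rle_decode` are illustrative.

```python
def rle_encode(data):
    """Return a list of (count, value) pairs, one per run."""
    runs = []                        # each run is [count, value]
    for value in data:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1         # same as previous value: extend the run
        else:
            runs.append([1, value])  # different value: start a new run
    return [(count, value) for count, value in runs]

def rle_decode(pairs):
    """Rebuild the original row by repeating each value count times."""
    return "".join(value * count for count, value in pairs)

row = "W" * 6 + "B" * 3 + "W" * 10   # the 19-pixel row
pairs = rle_encode(row)

print(" ".join(f"{count}{value}" for count, value in pairs))  # 6W 3B 10W
print(rle_decode(pairs) == row)      # True: RLE is lossless
```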
Binary example
Original: 11111110000011111111
RLE: (7,1) (5,0) (8,1), i.e. 3 (count, value) pairs instead of 20 individual bits.
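Whether RLE actually saves space depends on how many bits each pair takes. The sketch below assumes that each pair is stored as a 4-bit count plus a 1-bit value. Under that assumption, a count cannot exceed 15, so a longer run would need to be split into several pairs.

```python
from itertools import groupby

bits = "1" * 7 + "0" * 5 + "1" * 8  # the 20-bit example

# One (count, value) pair per run of identical bits.
pairs = [(len(list(run)), int(value)) for value, run in groupby(bits)]

# Assumption: 4 bits for each count, 1 bit for each value.
COUNT_BITS, VALUE_BITS = 4, 1
assert all(count < 2 ** COUNT_BITS for count, _ in pairs)  # counts fit in 4 bits
encoded_size = len(pairs) * (COUNT_BITS + VALUE_BITS)

print(pairs)                                         # [(7, 1), (5, 0), (8, 1)]
print(len(bits), "bits ->", encoded_size, "bits")    # 20 bits -> 15 bits
```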
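RLE is very effective for images with large areas of the same colour (logos, diagrams, sky). It is ineffective for photographs with highly varied pixel values and can even make the file larger, because a run of just one pixel still needs both a count and a value.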
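The difference between uniform and varied data can be measured by counting how many values RLE has to store (one count plus one value per run). Both rows below are made-up examples. The "varied" row changes colour at every pixel, so RLE doubles its size instead of shrinking it.

```python
from itertools import groupby

def rle_encode(data):
    """Return one (count, value) pair for each run of identical values."""
    return [(len(list(run)), value) for value, run in groupby(data)]

def stored_values(data):
    """RLE stores two values (count and value) for every run."""
    return 2 * len(rle_encode(data))

uniform = "W" * 16   # e.g. a strip of plain background
varied = "WBGR" * 4  # hypothetical pixels that change every step

for name, row in [("uniform", uniform), ("varied", varied)]:
    print(name, len(row), "pixels ->", stored_values(row), "values")
# uniform 16 pixels -> 2 values
# varied 16 pixels -> 32 values
```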
Comparison table
| | Lossy | Lossless |
|---|---|---|
| Data preserved? | No — some data lost permanently | Yes — original fully recoverable |
| File size reduction | Very high (10:1 or more) | Moderate |
| Quality | Degraded (ideally imperceptibly) | Perfect |
| Example formats | JPEG, MP3, H.264 | PNG, FLAC, ZIP |
| Best for | Photos, audio, video (streaming) | Medical images, legal documents, text, code |
Common OCR exam mistakes
- Saying a losslessly compressed file is "the same size" as the original. The compressed file is smaller; only after decompression is the file identical to, and the same size as, the original.
- Saying lossy compression "degrades quality every time you play/view it" — quality is fixed after the first compression. Further degradation only occurs when you re-compress an already-compressed file.
- Saying PNG is lossy — PNG is lossless. JPEG is lossy.
- Forgetting to state that lossy is "permanent" data loss — the removed data cannot be recovered.
AI-generated · claude-opus-4-7 · v3-ocr-computer-science