Notes

Data compression

Files take up storage and bandwidth, and both are finite. Compression shrinks file size by removing redundancy (and, in lossy compression, detail that people are unlikely to notice). The two main approaches at GCSE are lossless and lossy compression.

Why compress?

Storage — fit more in the same space.
Bandwidth — faster downloads / streaming.
Cost — less hosting and traffic.

Lossless compression

Decompressing the file gives back data exactly identical to the original: no bits are lost. Used when every bit matters: text, source code, executables, lossless image formats such as PNG, and ZIP archives.

How it works (in spirit):

Run-length encoding (RLE). Replace runs of repeated values with the value and a count. AAAAAA → 6A. Excellent for images with large flat areas.
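Dictionary encoding (e.g. LZ77, LZW). Build a dictionary of patterns that repeat, and replace each repeat with a short reference to its dictionary entry. ZIP, GZIP and PNG use DEFLATE, which combines LZ77 with Huffman coding.
Huffman coding. Common characters get short binary codes; rare characters get longer codes, so the total number of bits falls.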
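The dictionary idea can be sketched with LZW, a dictionary method used in GIF images. This is a minimal illustration, not how ZIP or PNG work internally (they use DEFLATE). Assumptions: the input only contains characters with codes 0-255, the output is a list of integer codes rather than packed bits, and the function names are hypothetical.

```python
def lzw_compress(text: str) -> list[int]:
    """Replace repeated substrings with dictionary codes (LZW)."""
    # Start with one entry per single character (assumes codes 0-255 only).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output: list[int] = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending a pattern we already know
        else:
            output.append(dictionary[current])  # emit code for longest known pattern
            dictionary[candidate] = next_code   # learn the new, longer pattern
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the same dictionary while decoding, so no dictionary is stored."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code created in this very step: it must be previous + its first char.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)


message = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(len(message), "characters ->", len(codes), "codes")  # 24 characters -> 16 codes
assert lzw_decompress(codes) == message  # lossless: exact round trip
```

Once the dictionary passes 256 entries each code needs at least 9 bits, so the real saving is smaller than "24 to 16" suggests: about 144 bits instead of 192 here, if every code is stored in 9 bits.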

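Typical ratio: roughly 30-70% size reduction for text; little or none for data that already looks random (such as files that are already compressed), which may even grow slightly.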
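Python's standard-library `zlib` module (an implementation of DEFLATE) can demonstrate both claims: text comes back byte-for-byte identical and shrinks, while random bytes barely change. The sample text is an assumption and is deliberately repetitive, so it shrinks by more than typical prose would.

```python
import os
import zlib


def compression_report(label: str, data: bytes) -> bytes:
    """Compress losslessly, check the exact round trip, and print the saving."""
    packed = zlib.compress(data, level=9)
    assert zlib.decompress(packed) == data  # lossless: identical bytes come back
    saving = 100 * (1 - len(packed) / len(data))
    print(f"{label}: {len(data)} -> {len(packed)} bytes ({saving:.0f}% saved)")
    return packed


# Assumed sample data: repetitive text, and the same number of random bytes.
sample_text = ("Lossless compression removes redundancy from data. " * 40).encode("utf-8")
random_bytes = os.urandom(len(sample_text))  # no patterns for DEFLATE to exploit

compression_report("repetitive text", sample_text)
compression_report("random bytes", random_bytes)
```

The random bytes typically come out a few bytes larger, because the compressed format adds headers and a checksum even when it finds nothing to remove. Already-compressed files such as JPEGs and MP3s behave in much the same way.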

Lossy compression

Discards data the human eye/ear can't easily detect. Smaller files at the cost of permanent quality loss — once decompressed, the original is gone.

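JPEG for photos: discards fine detail and some colour information that the eye is less sensitive to. The image is processed in 8 × 8 pixel blocks, which can show up as "blocky" artefacts at high compression.
MP3, AAC, Opus for audio: removes sounds that are hard to hear (e.g. very high frequencies, above about 16 kHz) and quiet sounds masked by louder ones.
H.264, H.265 for video (usually stored in MP4 files alongside lossy audio such as AAC): compresses each frame like a lossy image and uses motion prediction, so mostly the changes between frames are stored.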

Typical ratio: often around 90% size reduction for photos and audio; the exact figure depends on the quality setting.
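The roughly 90% figure for audio can be checked with arithmetic. CD-quality audio uses 44,100 samples per second, 16 bits per sample and 2 channels; the 128 kbit/s MP3 bit rate and the 3-minute track length are assumptions chosen as common examples.

```python
SAMPLE_RATE_HZ = 44_100   # CD audio: samples per second per channel
BIT_DEPTH = 16            # CD audio: bits per sample
CHANNELS = 2              # stereo
MP3_BITRATE = 128_000     # assumed MP3 bit rate, in bits per second
TRACK_SECONDS = 3 * 60    # assumed 3-minute track

cd_bitrate = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS  # 1,411,200 bits per second
cd_bytes = cd_bitrate * TRACK_SECONDS // 8           # uncompressed size in bytes
mp3_bytes = MP3_BITRATE * TRACK_SECONDS // 8         # lossy size in bytes
reduction = 1 - mp3_bytes / cd_bytes

print(f"Uncompressed: {cd_bytes / 1_000_000:.1f} MB")   # Uncompressed: 31.8 MB
print(f"MP3:          {mp3_bytes / 1_000_000:.1f} MB")  # MP3:          2.9 MB
print(f"Reduction:    {reduction:.1%}")                  # Reduction:    90.9%
```

Lower bit rates give bigger reductions and more audible loss; higher bit rates give the opposite.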

Which to use?

Scenario	Best choice	Why
Source code, legal documents	Lossless	Every bit must be exact
Photo album for the web	Lossy	Small files; quality loss is hard to see
CT scan medical images	Lossless	Detail must survive
Music streaming	Lossy	Bandwidth matters more than perfection
Logo / icon	Lossless	Sharp edges and colours preserved
Camera raw → social post	Lossy	Acceptable quality loss for size

✦ Worked example: RLE

Compress AAAABBBCCDAA using RLE. Result: 4A3B2C1D2A, so 12 characters become 10: only a small saving, because the runs are short.

Now: AAAAAAAAAAAAAAAA (16 A's) → 16A — 3 characters from 16. Excellent.

RLE works well only when there are long runs. Text with few repeats actually grows under RLE: ABCD becomes 1A1B1C1D, doubling in length.
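The encoding used in this worked example (count, then character) can be sketched in Python. Assumptions: the input never contains digits, since digits are used for the counts, and the function names are hypothetical.

```python
def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with its length and the character."""
    if not text:
        return ""
    pieces = []
    run_char, run_len = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1  # run continues
        else:
            pieces.append(f"{run_len}{run_char}")  # run ended: write count + char
            run_char, run_len = ch, 1
    pieces.append(f"{run_len}{run_char}")  # write the final run
    return "".join(pieces)


def rle_decode(encoded: str) -> str:
    """Expand count+character pairs; counts may have several digits (e.g. 16A)."""
    pieces, digits = [], ""
    for ch in encoded:
        if ch.isdigit():
            digits += ch
        else:
            pieces.append(ch * int(digits))
            digits = ""
    return "".join(pieces)


for s in ["AAAABBBCCDAA", "A" * 16, "ABCD"]:
    packed = rle_encode(s)
    assert rle_decode(packed) == s  # RLE is lossless
    print(f"{s} ({len(s)}) -> {packed} ({len(packed)})")
# AAAABBBCCDAA (12) -> 4A3B2C1D2A (10)
# AAAAAAAAAAAAAAAA (16) -> 16A (3)
# ABCD (4) -> 1A1B1C1D (8)
```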

✦ Worked example: Huffman idea

Imagine a text where 'E' appears often, 'Z' rarely. Huffman might assign:

'E' → 10 (2 bits)
'Z' → 1110011 (7 bits)

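Compared with a fixed 8 bits per character (as in ASCII), the average number of bits per character drops, even though rare characters get longer codes than common ones.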
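A minimal sketch of building Huffman codes with Python's `heapq` and `collections.Counter`: repeatedly merge the two least frequent groups, putting 0 in front of one side's codes and 1 in front of the other's. The sample text is an assumption; when frequencies tie, a different but equally valid set of codes is possible.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code: frequent characters get shorter bit strings."""
    freq = Counter(text)
    if len(freq) == 1:
        # Only one distinct character: give it a 1-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {character: code built so far}).
    # The tie-breaker stops Python ever comparing two dicts.
    heap = [(count, i, {ch: ""}) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups; one side gets 0, the other 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """No code is a prefix of another, so the first match is always correct."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


text = "EEEEEEEEEEAAAATTZ"  # assumed sample: E common, Z rare
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
print(codes)  # {'Z': '000', 'T': '001', 'A': '01', 'E': '1'}
print(len(bits), "bits vs", 8 * len(text), "bits in 8-bit ASCII")  # 27 bits vs 136 bits in 8-bit ASCII
assert huffman_decode(bits, codes) == text
```

In a real file the code table (or the frequencies) must be stored too, so very short files gain little.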

⚠ Common mistakes: pitfalls

Confusing lossy and lossless. Lossless = identical reconstruction; lossy = irreversible discard.
Believing "compressed = smaller always". Already-compressed data (a JPEG, an MP3) doesn't compress further with general-purpose tools.
Re-compressing lossy files. Each lossy re-save discards more detail, so quality drops a little further every time (generation loss).
Using lossy for the wrong content. A scanned legal document needs every pixel.
Confusing compression with encryption. Compression makes files smaller; encryption makes them unreadable without a key.

Visual: lossy is one-way

ORIGINAL ─→ compress ─→ SMALLER LOSSY FILE ─→ decompress ─→ APPROXIMATION

You can't recover the original from the compressed version.
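A toy version of lossy compression is quantisation: store each value at lower precision. This is not how JPEG or MP3 work in detail; it only illustrates why the original cannot be recovered. The step size of 10, the sample values and the function names are assumptions.

```python
STEP = 10  # assumed quantisation step: bigger step = smaller data, more loss


def lossy_compress(samples: list[int]) -> list[int]:
    """Keep only which band of width STEP each value falls in."""
    return [s // STEP for s in samples]


def lossy_decompress(bands: list[int]) -> list[int]:
    """Rebuild each value as the middle of its band: an approximation."""
    return [b * STEP + STEP // 2 for b in bands]


original = [3, 7, 12, 14, 200, 203]
restored = lossy_decompress(lossy_compress(original))
print(restored)  # [5, 5, 15, 15, 205, 205]

# Two different originals give the same compressed data,
# so no decompressor could tell which one it came from.
print(lossy_compress([3, 12]) == lossy_compress([4, 13]))  # True
```

The saving comes from the smaller range: values from 0 to 255 need 8 bits each, but their bands run only from 0 to 25 and fit in 5 bits.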

✦ Worked example: choose wisely

A web designer needs to put a 12 MP photo on a webpage. Should they use PNG (lossless) or JPEG (lossy)?

For a photo on a webpage:

File size matters (faster page load, less data for visitors).
Slight quality loss is invisible at typical viewing distances.
Choose JPEG for an order-of-magnitude smaller file.
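The file-size argument can be made concrete with arithmetic. The 12 MP photo is assumed here to be 4000 × 3000 pixels with 24-bit colour (3 bytes per pixel), and the 90% reduction is the rough figure given for lossy photo compression, not a measured JPEG result.

```python
WIDTH, HEIGHT = 4000, 3000  # assumed dimensions of a 12 MP photo
BYTES_PER_PIXEL = 3         # 24-bit colour: 1 byte each for red, green, blue
LOSSY_REDUCTION = 0.90      # rough figure, not a measured JPEG result

pixels = WIDTH * HEIGHT
raw_bytes = pixels * BYTES_PER_PIXEL
lossy_bytes = raw_bytes * (1 - LOSSY_REDUCTION)

print(f"{pixels / 1_000_000:.0f} MP photo")                       # 12 MP photo
print(f"Uncompressed: {raw_bytes / 1_000_000:.0f} MB")             # Uncompressed: 36 MB
print(f"After ~90% reduction: {lossy_bytes / 1_000_000:.1f} MB")   # After ~90% reduction: 3.6 MB
```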

For a logo with sharp edges and few colours:

Lossy compression creates ugly artefacts around edges.
The file is small anyway: few colours and large flat areas compress very well losslessly.
Choose PNG for crisp output.

➜ Try this: quick check

State whether each is lossless or lossy:

ZIP archive — lossless.
JPEG — lossy.
MP3 — lossy.
PNG — lossless.
MP4 video — lossy.
FLAC audio — lossless.

AI-generated · claude-opus-4-7 · v3-deep-computer-science

Practice questions

Try each before peeking at the worked solution.

Question 1 (4 marks)
Lossless vs lossy
Define lossless and lossy compression and state how they differ in their effect on the original data.
Question 2 (3 marks)
Run-length encoding
Use run-length encoding to compress the string AAAAAABBBCCCCCCCC. State the compressed form and the original/compressed lengths.
Question 3 (3 marks)
Choose compression — photo
A photographer wants to email a 4000 × 3000 photo. Recommend a suitable compression type and justify.
Question 4 (3 marks)
Choose compression — code
A programmer wants to back up source code. Recommend a compression type and justify.
Question 5 (3 marks)
When RLE fails
Explain why RLE may increase the size of some files.
Question 6 (3 marks)
Re-compress and quality
A pupil opens a JPEG photo, edits it, and saves it as a JPEG repeatedly across many sessions. Explain why the image quality steadily decreases.
Question 7 (2 marks)
Compression vs encryption
State two differences between data compression and data encryption.

Flashcards

CS3.8 — Data compression — lossy and lossless

12-card SR deck for AQA GCSE Computer Science topic CS3.8

12 cards · spaced repetition (SM-2)

CS3.8 Data compression: lossy vs lossless; benefits, drawbacks and choosing the right compression for a use case
