Notes

Character encoding

Computers store everything as numbers, including text. A character set is an agreed mapping between characters and integer codes. The two you must know are ASCII and Unicode.

ASCII (American Standard Code for Information Interchange)

ASCII uses 7 bits, allowing 2⁷ = 128 distinct characters. The 8th bit is typically padded with 0 to fit a byte. Codes 0-31 are control characters (newline, tab, escape); 32-126 are printable; 127 is DEL.
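The ranges above can be checked with a short sketch. The helper name `classify_ascii` is hypothetical, chosen only for illustration; `ord` is Python's built-in function for looking up a character's code.

```python
def classify_ascii(code: int) -> str:
    """Return which ASCII range a code falls in (hypothetical helper)."""
    if not 0 <= code <= 127:
        raise ValueError("standard ASCII codes run from 0 to 127")
    if code <= 31:
        return "control"
    if code == 127:
        return "DEL"
    return "printable"  # codes 32 to 126


print(2 ** 7)                     # number of distinct 7-bit patterns
print(classify_ascii(ord("\n")))  # newline is code 10
print(classify_ascii(ord("A")))   # 'A' is code 65
print(classify_ascii(127))
```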

Memorise these milestones:

Character	ASCII
Space	32
'0'	48
'9'	57
'A'	65
'Z'	90
'a'	97
'z'	122

The gap of 32 between an uppercase letter and its lowercase partner is exam gold: subtract 32 to go from lowercase to uppercase, or add 32 to go the other way.
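As a minimal sketch of the case trick, assuming single ASCII letters as input, the conversion can be written with Python's built-in `ord` and `chr`. The names `CASE_GAP`, `to_upper` and `to_lower` are made up for this example.

```python
CASE_GAP = ord("a") - ord("A")  # 97 - 65 = 32


def to_upper(ch: str) -> str:
    """Convert one lowercase ASCII letter to uppercase by subtracting 32."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - CASE_GAP)
    return ch  # anything else is left unchanged


def to_lower(ch: str) -> str:
    """Convert one uppercase ASCII letter to lowercase by adding 32."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + CASE_GAP)
    return ch


print(to_upper("h"), to_lower("H"))
```

The range checks matter: subtracting 32 from a character that is not a lowercase letter would give an unrelated symbol.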

Extended ASCII

Some systems use the 8th bit too, giving 256 codes. The extra 128 codes were assigned differently in different "code pages" (Western European, Cyrillic, etc.), so the same byte could mean different characters on different systems. Unicode has replaced these code pages for any serious internationalisation.
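To see why code pages were incompatible, here is a small sketch that decodes one byte under two of them. It assumes the Windows code pages `cp1252` (Western European) and `cp1251` (Cyrillic), both of which ship with Python's standard codecs; they are used here only as convenient examples of "extended ASCII".

```python
raw = bytes([0xE9])  # one byte with the 8th bit set (decimal 233)

western = raw.decode("cp1252")   # Windows Western European code page
cyrillic = raw.decode("cp1251")  # Windows Cyrillic code page

print(western, cyrillic)  # same byte, two different characters
```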

Unicode

Unicode is a much bigger character set that aims to cover every writing system in use: Latin, Greek, Cyrillic, Arabic, CJK (Chinese, Japanese, Korean), plus emoji and mathematical symbols. The most common encoding is UTF-8:

1 byte for ASCII characters (compatible).
2-4 bytes for other characters.

Other Unicode encodings exist (UTF-16, UTF-32) but UTF-8 dominates the web.
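The variable-length behaviour of UTF-8 can be observed directly. This sketch picks four sample characters (an arbitrary choice for illustration) and counts the bytes each one needs.

```python
# Sample characters written as escapes so the code points are unambiguous:
# 'A' (U+0041), e-acute (U+00E9), a Chinese character (U+4E2D), an emoji (U+1F600).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(ch, f"U+{ord(ch):04X}", len(encoded), "byte(s)")

byte_counts = [len(ch.encode("utf-8")) for ch in samples]
```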

For GCSE you mainly need the principles:

Unicode supports many more characters than ASCII (over a million code points in theory).
Each character may take more than one byte to store.
More storage is the trade-off for international support.

Comparing ASCII and Unicode

Property	ASCII	Unicode (UTF-8)
Bits per character	7 (8 with pad)	Variable: 8 to 32
Distinct characters	128	~1.1 million
Languages	English/US	All major writing systems
Storage cost (English text)	1 B/char	1 B/char
Storage cost (Chinese, emoji)	impossible	3-4 B/char

Worked example: ASCII arithmetic

Convert the lowercase letter 'h' to uppercase using ASCII codes. ASC('h') = ASC('a') + 7 = 97 + 7 = 104. Subtract 32: 104 - 32 = 72. CHR(72) = 'H'. ✓

Worked example: encoding a word

Encode the word "Hi" in ASCII.

'H' → 72 → 01001000
'i' → 105 → 01101001

Concatenated: 01001000 01101001 (2 bytes total).
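The same encoding can be reproduced in a few lines. This is a sketch only: the function name `to_ascii_bits` is hypothetical, and each code is padded to 8 bits to match the byte-sized patterns in the worked example.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Return each character's ASCII code as an 8-bit binary string."""
    codes = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    return [format(code, "08b") for code in codes]


print(to_ascii_bits("Hi"))
```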

Worked example: Unicode storage

A 100-character English document in UTF-8 takes ≈ 100 bytes. A document of 100 Chinese characters takes ≈ 100 × 3 = 300 bytes, because common Chinese characters need 3 bytes each in UTF-8. ASCII could store the English version equally cheaply but could not store the Chinese version at all.
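The storage figures can be checked with a short sketch. The two test strings are invented for illustration (a single repeated letter and a single repeated Chinese character), so real documents may differ slightly.

```python
english = "a" * 100       # 100 ASCII letters
chinese = "\u4e2d" * 100  # 100 copies of one Chinese character (U+4E2D)

print(len(english.encode("utf-8")))  # bytes needed for the English text
print(len(chinese.encode("utf-8")))  # bytes needed for the Chinese text

try:
    chinese.encode("ascii")
    ascii_can_store_chinese = True
except UnicodeEncodeError:
    ascii_can_store_chinese = False  # ASCII has no code for this character
```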

Common mistakes: pitfalls

Forgetting that the case difference is 32. Critical for case-conversion questions.
Confusing characters with digits. ASC('5') = 53, not 5. Subtract 48 to convert digit characters to integers.
Treating ASCII as 8-bit. It's 7-bit. The 8th bit is a pad or used for parity in some systems.
Assuming one Unicode character = one byte. UTF-8 uses variable-length encoding.
Mixing up character and code value. 'A' is the character; 65 is the code.
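The digit pitfall is easy to demonstrate. In this sketch the helper name `digit_value` is hypothetical; it subtracts the code of '0' (48) from the code of the digit character.

```python
def digit_value(ch: str) -> int:
    """Convert one digit character '0'..'9' to its integer value."""
    if not "0" <= ch <= "9":
        raise ValueError("expected a single digit character")
    return ord(ch) - ord("0")  # ord('0') is 48


print(ord("5"), digit_value("5"))  # the code, then the number it stands for
```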

Why does ASCII still matter?

It's the foundation. The first 128 Unicode code points are identical to ASCII, and UTF-8 stores each of them as the same single byte. So plain English text is interpreted identically by any modern system. You can almost always assume 'A' = 65 and '0' = 48 without checking.
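A quick way to convince yourself of this overlap is to decode every 7-bit code twice, once as ASCII and once as UTF-8, and compare. This is a sketch; the variable names are arbitrary.

```python
all_ascii = bytes(range(128))  # every 7-bit code, 0 to 127

as_ascii = all_ascii.decode("ascii")
as_utf8 = all_ascii.decode("utf-8")

print(as_ascii == as_utf8)  # do both decodings agree?
print(ord("A"), ord("0"))   # the two milestone codes
```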

Try this: quick check

What is the ASCII code for 'M'? Hint: 'A' = 65 and M is the 13th letter (12 places after A). 65 + 12 = 77.

What is the binary representation of 'M' as a 7-bit code? 77 = 64 + 8 + 4 + 1 = 1001101.
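To check answers like these, the offset-and-convert steps can be sketched with Python's built-in `format`, where the format code `"07b"` means binary, zero-padded to 7 digits.

```python
code = ord("A") + 12        # 'M' is 12 places after 'A'
bits = format(code, "07b")  # binary, zero-padded to 7 digits

print(chr(code), code, bits)
```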

AI-generated · claude-opus-4-7 · v3-deep-computer-science

Practice questions

Try each before peeking at the worked solution.

Question 1 (3 marks)
ASCII basics
State (a) the number of bits per character in standard ASCII, and (b) the number of distinct characters it can represent.
Question 2 (4 marks)
ASCII for letters
Given ASC('A') = 65 and ASC('a') = 97, calculate (a) ASC('D') and (b) ASC('y').
Question 3 (4 marks)
Case conversion
Explain how you can convert the character 'h' to 'H' using arithmetic on its ASCII code, and state the resulting code.
Question 4 (4 marks)
Digit character to integer
A program reads the character '7' from a file. Explain what's wrong with treating its ASCII code as the number 7, and how to convert it correctly.
Question 5 (2 marks)
ASCII vs Unicode
Give two advantages of using Unicode rather than ASCII.
Question 6 (2 marks)
Storage cost trade-off
Explain one disadvantage of using Unicode (UTF-8) instead of ASCII.
Question 7 (5 marks)
Encode a word
Encode the 4-character word "BYTE" in 7-bit ASCII, writing each character's ASCII code in decimal. You are given ASC('A') = 65.

Flashcards

CS3.5 — Character encoding — ASCII and Unicode

12-card SR deck for AQA GCSE Computer Science topic CS3.5

12 cards · spaced repetition (SM-2)

CS3.5 Character encoding: ASCII (7-bit) and Unicode; representing text as a sequence of integer codes; comparing range and storage cost
