TopMyGrade


CS3.5 Character encoding: ASCII (7-bit) and Unicode; representing text as a sequence of integer codes; comparing range and storage cost

Notes

Character encoding

Computers store everything as numbers, including text. A character set is an agreed mapping between characters and integer codes. The two you must know are ASCII and Unicode.

ASCII (American Standard Code for Information Interchange)

ASCII uses 7 bits, allowing 2⁷ = 128 distinct characters. The 8th bit is typically padded with 0 to fit a byte. Codes 0-31 are control characters (newline, tab, escape); 32-126 are printable; 127 is DEL.
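The ranges above can be sketched in Python. This is only an illustration: the helper name `classify_ascii` is made up for these notes and is not something you need for the exam.

```python
def classify_ascii(code):
    """Say which ASCII range a code belongs to (hypothetical helper)."""
    if not 0 <= code <= 127:
        return "not ASCII"      # outside the 7-bit range
    if code <= 31:
        return "control"        # e.g. tab (9), newline (10), escape (27)
    if code == 127:
        return "DEL"
    return "printable"          # 32 to 126, including the space

print(2 ** 7)                   # how many codes 7 bits can give
print(classify_ascii(10))       # newline
print(classify_ascii(65))       # 'A'
print(classify_ascii(127))
```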

Memorise these milestones:

Character   ASCII
Space       32
'0'         48
'9'         57
'A'         65
'Z'         90
'a'         97
'z'         122
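If you have Python to hand, you can check these milestones yourself. In this sketch, Python's built-in `ord()` plays the role that ASC plays in these notes; the dictionary name `milestones` is just chosen for the example.

```python
# The milestone codes from the table, keyed by character.
milestones = {" ": 32, "0": 48, "9": 57, "A": 65, "Z": 90, "a": 97, "z": 122}

for char, expected in milestones.items():
    code = ord(char)                       # character -> integer code
    print(repr(char), code, code == expected)
```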

The gap of 32 between an uppercase letter and its lowercase form is exam gold: subtract 32 to convert lowercase to uppercase, or add 32 to go the other way.
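A minimal sketch of that arithmetic in Python, where `ord()` and `chr()` stand in for ASC and CHR. The function names `to_upper` and `to_lower` are made up for illustration; real programs would normally use the built-in string methods instead.

```python
def to_upper(ch):
    """Convert one lowercase letter to uppercase by subtracting 32."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch                      # leave anything else unchanged

def to_lower(ch):
    """Convert one uppercase letter to lowercase by adding 32."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch

print(to_upper("h"))
print(to_lower("H"))
```

The range checks matter: subtracting 32 from a digit or punctuation mark would produce an unrelated character.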

Extended ASCII

Some systems use the 8th bit too, giving 256 codes. The extra 128 codes were assigned differently in different "code pages" (Western European, Cyrillic, etc.), so the same byte could display as different characters on different systems. Unicode has since replaced code pages for any serious internationalisation.
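To see why code pages were a problem, the sketch below decodes the same single byte using two of Python's built-in codecs. The choice of byte (233) and of the two code pages (`latin-1` for Western European, `cp1251` for Cyrillic) is just an example.

```python
raw = bytes([0xE9])                 # one byte with the 8th bit set (233)

western = raw.decode("latin-1")     # Western European code page
cyrillic = raw.decode("cp1251")     # Cyrillic code page

print(western, cyrillic)            # same byte, two different letters
print(western == cyrillic)
```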

Unicode

Unicode is a much larger character set that aims to cover every writing system in use: Latin, Greek, Cyrillic, Arabic, CJK (Chinese, Japanese, Korean), plus emoji and mathematical symbols. The most common encoding is UTF-8:

  • 1 byte for ASCII characters (compatible).
  • 2-4 bytes for other characters.

Other Unicode encodings exist (UTF-16, UTF-32) but UTF-8 dominates the web.
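As a quick illustration of UTF-8's variable length, this sketch encodes four sample characters and counts the bytes each one needs. The sample characters are arbitrary choices.

```python
samples = ["A", "é", "中", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")            # text -> bytes
    print(ch, "code point", ord(ch), "uses", len(encoded), "byte(s)")

byte_counts = [len(ch.encode("utf-8")) for ch in samples]
print(byte_counts)
```

The ASCII letter needs one byte, while the others need two, three, and four bytes respectively.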

For GCSE you mainly need the principles:

  • Unicode supports many more characters than ASCII (over a million code points in theory).
  • Each character may take more than one byte to store.
  • More storage is the trade-off for international support.

Comparing ASCII and Unicode

Property                        ASCII            Unicode (UTF-8)
Bits per character              7 (8 with pad)   Variable: 8 to 32
Distinct characters             128              ~1.1 million
Languages                       English/US       All major writing systems
Storage cost (English text)     1 B/char         1 B/char
Storage cost (Chinese, emoji)   impossible       3-4 B/char

Worked example: ASCII arithmetic

Convert the lowercase letter 'h' to uppercase using ASCII codes. ASC('h') = ASC('a') + 7 = 97 + 7 = 104. Subtract 32: 104 - 32 = 72. CHR(72) = 'H'. ✓

Worked example: encoding a word

Encode the word "Hi" in ASCII.

  • 'H' → 72 → 01001000
  • 'i' → 105 → 01101001

Concatenated: 01001000 01101001 (2 bytes total).
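The same encoding can be sketched in Python. The function name `encode_ascii_bits` is invented for this example; it pads each code to 8 bits so that every character fills one byte.

```python
def encode_ascii_bits(text):
    """Return the 8-bit binary pattern for each character in text."""
    # format(n, "08b") writes n in binary, padded with leading zeros to 8 digits.
    return [format(ord(ch), "08b") for ch in text]

bits = encode_ascii_bits("Hi")
print(" ".join(bits))
print(len(bits), "bytes")
```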

Worked example: Unicode storage

A 100-character English document in UTF-8 takes about 100 bytes. A document of 100 Chinese characters takes about 100 × 3 = 300 bytes. ASCII could store the English version equally cheaply but could not store the Chinese version at all.
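You can check these figures with a short Python sketch. The two "documents" here are just one character repeated 100 times, which is an assumption made to keep the example small.

```python
english = "a" * 100          # 100 ASCII characters
chinese = "中" * 100          # 100 Chinese characters

english_bytes = len(english.encode("utf-8"))
chinese_bytes = len(chinese.encode("utf-8"))
print(english_bytes, chinese_bytes)

# ASCII has no code for the Chinese character, so encoding fails.
try:
    chinese.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("Can ASCII store it?", ascii_ok)
```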

Common mistakes

  1. Forgetting that the case difference is 32. Critical for case-conversion questions.
  2. Confusing characters with digits. ASC('5') = 53, not 5. Subtract 48 to convert digit characters to integers.
  3. Treating ASCII as 8-bit. It's 7-bit. The 8th bit is a pad or used for parity in some systems.
  4. Assuming one Unicode character = one byte. UTF-8 uses variable-length encoding.
  5. Mixing up character and code value. 'A' is the character; 65 is the code.
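Mistake 2 is easy to demonstrate. In this sketch the helper name `digit_value` is hypothetical; it subtracts 48 (the code for '0') to turn a digit character into the integer it represents.

```python
def digit_value(ch):
    """Convert a digit character such as '5' into the integer 5."""
    if not "0" <= ch <= "9":
        raise ValueError("not a digit character: " + repr(ch))
    return ord(ch) - 48          # 48 is the ASCII code for '0'

print(ord("5"))                  # the code, not the value
print(digit_value("5"))          # the value
```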

Why does ASCII still matter?

It's the foundation. The first 128 Unicode code points are identical to ASCII. So plain English text is interpreted identically by any modern system. You can almost always assume A=65, '0'=48 without checking.
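One way to see this overlap, as a rough sketch: decode all 128 ASCII byte values once as ASCII and once as UTF-8, then compare the results. The variable names are chosen just for this example.

```python
all_ascii = bytes(range(128))        # every 7-bit code, 0 to 127

# Decoding as ASCII or as UTF-8 should give exactly the same text.
same = all_ascii.decode("ascii") == all_ascii.decode("utf-8")
print(same)

codes = list("A0".encode("utf-8"))   # the byte values stored for 'A' and '0'
print(codes)
```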

Quick check

What is the ASCII code for 'M'? Hint: 'A' = 65 and M is the 13th letter (12 places after A). 65 + 12 = 77.

What is the binary representation of 'M' as a 7-bit code? 77 = 64 + 8 + 4 + 1 = 1001101.

AI-generated · claude-opus-4-7 · v3-deep-computer-science

Practice questions

Try each before peeking at the worked solution.

  1. Question 1 (3 marks)

    ASCII basics

    State (a) the number of bits per character in standard ASCII, and (b) the number of distinct characters it can represent.


  2. Question 2 (4 marks)

    ASCII for letters

    Given ASC('A') = 65 and ASC('a') = 97, calculate (a) ASC('D') and (b) ASC('y').


  3. Question 3 (4 marks)

    Case conversion

    Explain how you can convert the character 'h' to 'H' using arithmetic on its ASCII code, and state the resulting code.


  4. Question 4 (4 marks)

    Digit character to integer

    A program reads the character '7' from a file. Explain what's wrong with treating its ASCII code as the number 7, and how to convert it correctly.


  5. Question 5 (2 marks)

    ASCII vs Unicode

    Give two advantages of using Unicode rather than ASCII.


  6. Question 6 (2 marks)

    Storage cost trade-off

    Explain one disadvantage of using Unicode (UTF-8) instead of ASCII.


  7. Question 7 (5 marks)

    Encode a word

    Encode the 4-character word "BYTE" in 7-bit ASCII (write each character's ASCII code in decimal). You are given that ASC('A') = 65.

