TopMyGrade

GCSE/Mathematics/AQA

S1: Infer properties of populations from samples; sampling limitations

Notes

Sampling — populations, samples and bias

Statistics is the science of saying useful things about a population (everyone or everything you care about) using a sample (a subset you actually measure).

Population vs sample

  • Population: the entire group you want to learn about (e.g. all GCSE students in the UK).
  • Sample: the actual subset you collect data from (e.g. 200 students from one school).

You compute statistics (mean, median, etc.) on the sample and use them to estimate the corresponding parameters of the population.

Why sample at all?

Studying the entire population is usually too expensive, too slow or simply impossible (think nationwide surveys, processes that keep producing data without end, or destructive testing, where measuring an item uses it up). Sampling lets you make defensible claims using a fraction of the data.

What makes a good sample?

A good sample is:

  1. Random — every member of the population has a known, non-zero chance of being chosen.
  2. Representative — reflects the structure of the population.
  3. Large enough — bigger samples give more reliable estimates (P5 / Law of Large Numbers).

Sampling methods

  • Simple random sampling: every member has equal probability of selection (e.g. names from a hat, random number generator).
  • Systematic sampling: pick every k-th member from an ordered list, starting from a randomly chosen position among the first k (e.g. every 10th name on the register).
  • Stratified sampling: divide the population into strata (e.g. year groups) and take a random sample from each, in proportion to its size.
  • Cluster sampling: pick whole clusters (classes, schools) at random and sample everyone in them.
  • Convenience sampling: ask whoever happens to be around — quick but typically biased.
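
The first three methods above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the register of 600 pupil IDs, the year-group sizes and the function names are all made up for the example, and cluster and convenience sampling are left out.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sampling: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Systematic sampling: a random start, then every k-th member of the list."""
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random position among the first k
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=None):
    """Stratified sampling: a random sample from each stratum, sized in proportion."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = []
    for members in strata.values():
        # Rounding works here because the shares are whole numbers;
        # awkward sizes would need the counts adjusting so they total n.
        share = round(n * len(members) / total)
        chosen.extend(rng.sample(members, share))
    return chosen

# Hypothetical register: 600 pupil IDs split into five made-up year groups.
year_sizes = {"Y7": 150, "Y8": 150, "Y9": 120, "Y10": 100, "Y11": 80}
strata, next_id = {}, 0
for year, size in year_sizes.items():
    strata[year] = list(range(next_id, next_id + size))
    next_id += size
register = [pupil for members in strata.values() for pupil in members]

srs = simple_random_sample(register, 60, seed=1)
sys_sample = systematic_sample(register, 60, seed=1)
strat = stratified_sample(strata, 60, seed=1)
print(len(srs), len(sys_sample), len(strat))
```

With 600 names and a sample of 60, the systematic interval is k = 10, matching the "every 10th name" example. The stratified sample takes 15, 15, 12, 10 and 8 pupils from the five hypothetical year groups.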

Bias — when samples mislead

A biased sample systematically over- or under-represents some part of the population. Common sources:

  • Selection bias: only certain people are reachable (e.g. online survey misses people without internet).
  • Self-selection bias: only people who care strongly respond (e.g. complaint surveys).
  • Survivorship bias: only "survivors" are visible (e.g. studying successful start-ups).
  • Non-response bias: people who refuse differ from those who participate.

A biased sample can give a confident but wrong answer no matter how big it is.
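
A small simulation can make this concrete. The sketch below invents a school of 600 pupils whose shoe sizes depend on age (all numbers are hypothetical), then compares a large sample drawn only from older pupils with a much smaller random sample.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical school: 300 younger pupils (sizes near 5), 300 older pupils (sizes near 8).
younger = [rng.gauss(5, 1) for _ in range(300)]
older = [rng.gauss(8, 1) for _ in range(300)]
population = younger + older

true_mean = mean(population)
biased_mean = mean(older[:250])                 # big sample, but older pupils only
random_mean = mean(rng.sample(population, 40))  # small sample, chosen at random

biased_error = abs(biased_mean - true_mean)
random_error = abs(random_mean - true_mean)
print(f"true mean {true_mean:.2f}")
print(f"biased sample of 250: {biased_mean:.2f} (error {biased_error:.2f})")
print(f"random sample of 40:  {random_mean:.2f} (error {random_error:.2f})")
```

In this made-up setting the sample of 250 misses the true mean by far more than the random sample of 40, because adding more pupils from the same over-represented group does nothing to remove the bias.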

Worked example: comparing methods

To estimate the mean shoe size of a school of 600 students:

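  • Convenience: ask the football team. The team is not typical of the whole school and is likely to over-represent larger sizes, so the sample is biased.
  • Stratified: pick pupils at random from each year group, in numbers proportional to the year-group sizes. This reflects the school's structure and is usually the best choice.
  • Simple random: pick 60 names at random from the register. This is fine, but there is a small chance of an unrepresentative draw (for example, too many pupils from one year).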

Sample size

Bigger is better, but not without limits.

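  • For a yes/no proportion to within about ±5 percentage points (at the usual 95% confidence level), a random sample of about 400 is usually plenty.
  • Trade-offs: cost, time, response rate, and diminishing returns. Random error shrinks in proportion to 1/√n, so doubling the sample size cuts it by only about 30% (1/√2 ≈ 0.71).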
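
The figure of 400 and the 1/√2 effect can both be checked with the standard approximation for the margin of error of a proportion, 1.96 × √(p(1 − p)/n). This formula goes beyond what the notes above state: it assumes a simple random sample, a 95% confidence level and the worst case p = 0.5, and the function name is chosen only for this sketch.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a random sample of size n."""
    return z * sqrt(p * (1 - p) / n)

# Margin of error, in percentage points, for a few sample sizes.
for n in (100, 400, 800, 1600):
    print(n, round(100 * margin_of_error(n), 1))

# Doubling the sample size multiplies the margin by 1/sqrt(2), about 0.71.
ratio = margin_of_error(800) / margin_of_error(400)
print(round(ratio, 3))
```

Going from 400 to 1600 responses (four times the cost) only halves the margin, which is the diminishing-returns trade-off described above.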

Common mistakes (examiner traps)

  1. Equating "large sample" with "representative". A million convenience-sampled responses can still be biased.
  2. Confusing sample mean with population mean. Use x̄ for the sample mean and μ for the population mean.
  3. Ignoring non-response. Reporting only respondents distorts the picture.
  4. Sampling from the wrong population. A sample of GCSE pupils cannot be relied on to tell you about A-Level students.
  5. Using a tiny pilot sample as the final answer.

Try this: quick check

A school wants to estimate the average daily commute time of its 800 pupils. Suggest a stratified-sample plan that involves 40 pupils across 4 year groups.

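Assume the four year groups are all the same size (200 each). The sampling fraction is 40/800 = 1/20, so each year group contributes 200 ÷ 20 = 10 pupils. Stratify by year: pick 10 pupils at random from each year group. Total = 40, balanced across years.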
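
The allocation in this answer can be sketched as a small Python function. It is an illustration only: the function name, the year labels and the second set of (unequal) year-group sizes are invented, and the largest-remainder rounding is one reasonable way, not the only way, to make the counts total exactly n when the proportional shares are not whole numbers.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a sample of n across strata in proportion to their sizes.

    Uses largest-remainder rounding so the counts always total n.
    """
    total = sum(stratum_sizes.values())
    # Whole-number part of each stratum's exact share, n * size / total.
    counts = {name: n * size // total for name, size in stratum_sizes.items()}
    leftover = n - sum(counts.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(
        stratum_sizes,
        key=lambda name: n * stratum_sizes[name] % total,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

# The quick check: four equal year groups of 200, sample of 40.
equal = proportional_allocation({"Y7": 200, "Y8": 200, "Y9": 200, "Y10": 200}, 40)
print(equal)

# Hypothetical unequal year groups (still 800 pupils in total).
unequal = proportional_allocation({"Y7": 240, "Y8": 215, "Y9": 185, "Y10": 160}, 40)
print(unequal)
```

With equal year groups every stratum gets 10 pupils. In the unequal case the exact shares are 12, 10.75, 9.25 and 8, so the one spare place goes to the group with the largest fractional part.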

AI-generated · claude-opus-4-7 · v3-deep-statistics

Practice questions

Try each before peeking at the worked solution.

  1. Question 1 (3 marks)

    Population vs sample

    (F1) Define "population" and "sample" in statistics, and explain why we sample.

    [Foundation tier]

  2. Question 2 (2 marks)

    Identify bias

    (F2) A radio show asks listeners to phone in to vote on whether a policy is popular. Why might the result be biased?

    [Foundation tier]

  3. Question 3 (3 marks)

    Random sampling method

    (F/H3) Describe how to choose a simple random sample of 50 students from a list of 800 in alphabetical order.

    [Crossover tier]

  4. Question 4 (4 marks)

    Stratified sample

    (H4) A school has 200 Y10, 240 Y11 and 160 Y12 pupils. A stratified sample of 75 is required. How many pupils should be chosen from each year?

    [Higher tier]

  5. Question 5 (2 marks)

    Systematic sampling

    (F/H5) A factory has 1000 items in a numbered list. Describe how to choose a systematic sample of 25.

    [Crossover tier]

  6. Question 6 (3 marks)

    Comment on a sample size

    (F/H6) A poll uses 30 people to estimate the proportion of the UK adult population who own a car. Comment on the reliability.

    [Crossover tier]

  7. Question 7 (3 marks)

    Choose the best method

    (H7) A council wants the average household income of a town. Three sampling methods are proposed:
    (a) Survey people leaving the train station at 5 p.m.
    (b) Stratified sample by postcode of randomly chosen households.
    (c) Online survey advertised on social media.

    State which is best and why.

    [Higher tier]
