TopMyGrade

GCSE/Mathematics/AQA

S6: Interpret scatter graphs; correlation and causation

Notes

Scatter graphs, correlation and causation

When you have paired numerical data (e.g. height & weight per person), a scatter graph shows the relationship between the two variables. From it you can describe the correlation and, with care, make predictions.

Plotting a scatter graph

Each data point is a single (x, y) pair. The x-variable (the "explanatory" or "input" variable) goes on the horizontal axis; the y-variable (the "response" or "output" variable) goes on the vertical axis.

Example: heights and weights of 8 students. Plot each student as one point.

Correlation — what to describe

When you "describe the correlation" you give:

  1. Direction: positive (y rises as x rises), negative (y falls as x rises), or none.
  2. Strength: strong (points cluster tightly around a line), moderate, or weak.
  3. Form: usually linear; occasionally curved.

Examples:

  • Height & weight: strong positive correlation.
  • Hours of TV & exam mark: weak to moderate negative correlation.
  • Shoe size & maths mark: no correlation.
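The notes above describe direction and strength by eye. As a rough illustration only, the sketch below puts the same idea on a numerical scale using the product-moment correlation coefficient r, which runs from -1 to +1. The function names, the cut-off values (0.1, 0.4, 0.7) and the height/weight figures are all assumptions made up for this example; they are not official thresholds or real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired data (between -1 and +1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, none_below=0.1, weak_below=0.4, moderate_below=0.7):
    """Turn r into a direction-and-strength phrase.

    The cut-offs are illustrative assumptions, not agreed standards.
    """
    size = abs(r)
    if size < none_below:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if size < weak_below:
        strength = "weak"
    elif size < moderate_below:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction} correlation"

# Made-up heights (cm) and weights (kg) for 8 students, for illustration only.
heights = [150, 155, 160, 165, 170, 175, 180, 185]
weights = [45, 50, 52, 58, 61, 66, 70, 74]

print(describe(pearson_r(heights, weights)))  # strong positive correlation
```

Note that the sign of r gives the direction and its size gives the strength, so a value near -1 is a strong negative correlation, not a weak one.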

Line of best fit

Draw a straight line that approximately balances the points above and below it. Use it to:

  • Estimate y for a given x (or vice versa).
  • Compute a rough slope/intercept (gradient = approximate change in y per unit change in x).

⚠ Don't extrapolate far beyond the data. The relationship may break down outside the observed range.
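One way to use a line of best fit numerically is to read two points off the drawn line, work out the gradient and intercept, and then substitute. The sketch below does this for a hypothetical line through (150, 45) and (190, 75) on a height (cm) against weight (kg) graph; the points and function names are assumptions chosen for illustration, not values from a real data set.

```python
def line_through(p1, p2):
    """Return (gradient, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    gradient = (y2 - y1) / (x2 - x1)   # change in y per unit change in x
    intercept = y1 - gradient * x1     # value of y where the line meets x = 0
    return gradient, intercept

def estimate_y(x, gradient, intercept):
    """Estimate y for a given x using y = gradient * x + intercept."""
    return gradient * x + intercept

def estimate_x(y, gradient, intercept):
    """Estimate x for a given y by rearranging the same equation."""
    return (y - intercept) / gradient

# Two hypothetical points read from the drawn line: (height in cm, weight in kg).
gradient, intercept = line_through((150, 45), (190, 75))

print(gradient)                               # 0.75 (kg per extra cm of height)
print(estimate_y(170, gradient, intercept))   # 60.0 (estimated weight at 170 cm)
print(estimate_x(66, gradient, intercept))    # 178.0 (estimated height at 66 kg)
```

Both estimates lie inside the 150 to 190 cm range the line was read from, so they are the kind of estimate the line can reasonably support.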

Interpolation vs extrapolation

  • Interpolation: estimating within the range of the data → usually safe.
  • Extrapolation: estimating outside → risky; the trend might not continue.

If the data covers heights 1.5–1.9 m, predicting weight at 2.5 m is extrapolation and not justified.
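The distinction can be written as a simple range check. This is a minimal sketch using the 1.5 to 1.9 m height range from the example above; the function name and the individual height values are assumptions for illustration.

```python
def classify_estimate(x, observed_xs):
    """Say whether estimating at x is interpolation or extrapolation."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x <= highest:
        return "interpolation"   # inside the range of the data
    return "extrapolation"       # outside the range: the trend may not continue

# Hypothetical observed heights in metres, spanning 1.5 to 1.9 m.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

print(classify_estimate(1.75, heights_m))  # interpolation
print(classify_estimate(2.5, heights_m))   # extrapolation
```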

Correlation ≠ causation

A correlation says two variables move together; it does NOT say one causes the other.

Reasons:

  • Common cause (third variable): ice-cream sales correlate with drowning rates because both depend on hot weather, not because ice cream causes drowning.
  • Reverse causation: A may cause B, or B may cause A — without further evidence we can't tell.
  • Coincidence: in small data sets, a correlation may be a fluke.

To establish causation you typically need a controlled experiment or strong domain knowledge.

Examiner-style correlation phrasing

"There is a strong positive correlation between hours studied and test score, indicating that students who studied more tended to score higher. However, this does not prove that studying causes higher marks — there may be other factors (e.g. interest in the subject, prior knowledge) influencing both."

Outliers in scatter

A point well away from the line of best fit may indicate:

  • A measurement or recording error.
  • A genuinely unusual case.

Comment on the outlier, but don't silently delete it.
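"Well away from the line" can be made concrete by measuring each point's vertical distance from the line of best fit. The sketch below flags points whose distance exceeds a chosen threshold and reports them rather than deleting them. The line (gradient 0.75, intercept -67.5), the data points and the threshold of 10 are all hypothetical values picked for illustration; there is no fixed rule for the threshold.

```python
def flag_outliers(points, gradient, intercept, threshold):
    """Return (x, y, vertical_distance) for points far from the line.

    vertical_distance is y minus the value the line predicts at x;
    positive means the point lies above the line.
    """
    flagged = []
    for x, y in points:
        predicted = gradient * x + intercept
        distance = y - predicted
        if abs(distance) > threshold:
            flagged.append((x, y, distance))
    return flagged

# Hypothetical (height in cm, weight in kg) points and a hypothetical line of best fit.
points = [(150, 46), (160, 52), (170, 61), (175, 85), (180, 67)]

for x, y, distance in flag_outliers(points, 0.75, -67.5, threshold=10):
    print(f"Check point ({x}, {y}): {distance} above the line")
# Check point (175, 85): 21.25 above the line
```

The function only reports the unusual point; deciding whether it is a recording error or a genuinely unusual case still needs a look at where the data came from.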

Common mistakes (examiner traps)

  1. Saying "correlation" when there's none — sometimes the answer really is "no correlation".
  2. Confusing direction with strength. Negative ≠ weak.
  3. Using the line of best fit far beyond the data.
  4. Inferring causation from correlation alone.
  5. Treating one outlier as the whole story — comment, but report the broader pattern.

Quick check

A scatter of "ice-cream sales" vs "shark attacks per beach day" shows strong positive correlation. Does eating ice cream cause shark attacks? Why or why not?

No. The correlation is real, but both variables likely depend on a third: hot weather brings more swimmers into the water and also increases ice-cream sales. This is a classic example of a confounding variable.

AI-generated · claude-opus-4-7 · v3-deep-statistics

Practice questions

Try each before peeking at the worked solution.

  1. Question 1 (2 marks)

    Describe correlation

    (F1) A scatter graph of test scores vs hours of revision shows points clustered along a line going from bottom-left to top-right. Describe the correlation.

    [Foundation tier]

  2. Question 2 (1 mark)

    Identify "no correlation"

    (F2) A scatter graph of foot length vs reading speed in adults shows points scattered randomly. State the type of correlation.

    [Foundation tier]

  3. Question 3 (2 marks)

    Use a line of best fit

    (F/H3) From a line of best fit on a scatter graph of hours studied (x) and test score (y), the line passes through (1, 35) and (5, 75). Estimate the test score for a student who studied 3 hours.

    [Crossover tier]

  4. Question 4 (2 marks)

    Interpolation vs extrapolation

    (F/H4) The same scatter graph covers 1 to 5 hours of study. Comment on whether predicting a score at 8 hours is safe.

    [Crossover tier]

  5. Question 5 (2 marks)

    Correlation vs causation

    (F/H5) A study finds a strong positive correlation between the number of fire engines at a fire and the cost of damage. Does this mean fire engines cause damage?

    [Crossover tier]

  6. Question 6 (2 marks)

    Spotting outliers

    (H6) On a scatter of 20 data points, one point sits well above the line of best fit. Comment on what this could mean and whether to discard it.

    [Higher tier]

  7. Question 7 (2 marks)

    Estimate strength of correlation

    (H7) Two scatter graphs are described:

    • Graph A: points form a tight band around a steep line.
    • Graph B: points form a wide cloud around a slightly upward-trending line.

    Which has the stronger correlation, and why?

    [Higher tier]

Flashcards

S6 — Interpret scatter graphs; correlation and causation

10-card spaced-repetition (SM-2) deck for AQA GCSE Maths topic S6