Notes

Scatter graphs, correlation and causation

When you have paired numerical data (e.g. height & weight per person), a scatter graph shows the relationship between the two variables. From it you can describe the correlation and, with care, make predictions.

Plotting a scatter graph

Each data point is a single (x, y) pair. The x-variable (the "explanatory" or "input" variable) goes on the horizontal axis; the y-variable (the "response" or "output" variable) goes on the vertical axis.

Example: heights and weights of 8 students. Plot each student as one point.

Correlation — what to describe

When you "describe the correlation" you give:

Direction: positive (y rises as x rises), negative (y falls as x rises), or none.
Strength: strong (points cluster tightly around a line), moderate, or weak.
Form: usually linear; occasionally curved.

Examples:

Height & weight: strong positive correlation.
Hours of TV & exam mark: weak to moderate negative correlation.
Shoe size & maths mark: no correlation.
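
Direction and strength can also be put on a numerical scale. The sketch below computes Pearson's correlation coefficient r, which these notes do not require and which is shown only as an illustration. The function name pearson_r and the sample values are made up for the example. A value near +1 suggests strong positive correlation, near -1 strong negative, and near 0 little or no linear correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: its sign gives the direction.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration, not real measurements.
hours_tv = [1, 2, 3, 4, 5]
exam_mark = [80, 72, 75, 60, 58]
r = pearson_r(hours_tv, exam_mark)
print(round(r, 2))  # -0.92, i.e. a negative correlation
```

In an exam answer you would still describe the correlation in words (direction, strength, form) rather than quote r.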

Line of best fit

Draw a straight line that approximately balances the points above and below it. Use it to:

Estimate y for a given x (or vice versa).
Compute a rough slope/intercept (gradient = approximate change in y per unit change in x).
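
The gradient and estimate steps above can be sketched as a short calculation. This is a minimal illustration, assuming you have read two points off your hand-drawn line; the points (2, 40) and (6, 70) and the function names are hypothetical.

```python
def line_through(p1, p2):
    """Return (gradient, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no defined gradient")
    gradient = (y2 - y1) / (x2 - x1)   # change in y per unit change in x
    intercept = y1 - gradient * x1     # value of y where the line meets x = 0
    return gradient, intercept

def estimate_y(x, gradient, intercept):
    """Read an estimate off the line y = gradient * x + intercept."""
    return gradient * x + intercept

# Hypothetical points read off a hand-drawn line of best fit.
m, c = line_through((2, 40), (6, 70))
print(m, c)                  # 7.5 25.0
print(estimate_y(4, m, c))   # 55.0
```

Here x = 4 lies between the two points used, so the estimate stays inside the range the line was drawn for.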

⚠ Don't extrapolate far beyond the data. The relationship may break down outside the observed range.

Interpolation vs extrapolation

Interpolation: estimating within the range of the data → usually safe.
Extrapolation: estimating outside → risky; the trend might not continue.

If the data covers heights 1.5–1.9 m, predicting weight at 2.5 m is extrapolation and not justified.
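
The range check above can be written as a small helper. This is a sketch only: the function name classify_estimate and the individual heights are assumptions, chosen to span the 1.5 to 1.9 m range from the example.

```python
def classify_estimate(x, observed_xs):
    """Label an estimate at x as interpolation or extrapolation."""
    lowest, highest = min(observed_xs), max(observed_xs)
    return "interpolation" if lowest <= x <= highest else "extrapolation"

# Hypothetical heights in metres, spanning 1.5 m to 1.9 m.
heights_m = [1.5, 1.62, 1.7, 1.75, 1.9]
print(classify_estimate(1.8, heights_m))  # interpolation
print(classify_estimate(2.5, heights_m))  # extrapolation
```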

Correlation ≠ causation

A correlation says two variables move together; it does NOT say one causes the other.

A correlation can appear without direct causation for several reasons:

Common cause (third variable): ice-cream sales correlate with drowning rates because both depend on hot weather, not because ice cream causes drowning.
Reverse causation: A may cause B, or B may cause A — without further evidence we can't tell.
Coincidence: in small data sets, a correlation may be a fluke.

To establish causation you typically need a controlled experiment or strong domain knowledge.

Examiner-style correlation phrasing

"There is a strong positive correlation between hours studied and test score, indicating that students who studied more tended to score higher. However, this does not prove that studying causes higher marks — there may be other factors (e.g. interest in the subject, prior knowledge) influencing both."

Outliers in scatter

A point well away from the line of best fit may indicate:

A measurement or recording error.
A genuinely unusual case.

Comment on the outlier, but don't silently delete it.
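
One way to spot candidate outliers is to measure each point's vertical distance from the line of best fit. The sketch below assumes a line y = 2x + 10 and a cut-off of 5 units; both are made-up values for illustration, and the notes give no numerical rule for what counts as "well away". The function only flags points for comment and deletes nothing.

```python
def flag_outliers(points, gradient, intercept, threshold):
    """Return points whose vertical distance from the line exceeds threshold."""
    flagged = []
    for x, y in points:
        residual = y - (gradient * x + intercept)  # how far the point is above the line
        if abs(residual) > threshold:
            flagged.append((x, y))
    return flagged

# Made-up data lying close to y = 2x + 10, plus one unusual point.
points = [(1, 12), (2, 15), (3, 16), (4, 30), (5, 19)]
print(flag_outliers(points, gradient=2, intercept=10, threshold=5))  # [(4, 30)]
```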

⚠ Common mistakes (examiner traps)

Saying "correlation" when there's none — sometimes the answer really is "no correlation".
Confusing direction with strength. Negative ≠ weak.
Using the line of best fit far beyond the data.
Inferring causation from correlation alone.
Treating one outlier as the whole story — comment, but report the broader pattern.

➜ Try this: quick check

A scatter of "ice-cream sales" vs "shark attacks per beach day" shows strong positive correlation. Does eating ice cream cause shark attacks? Why or why not?

Answer: No. The correlation is real, but both variables likely depend on a third (hot weather → more swimmers AND more ice-cream consumption). This is a classic example of a confounding variable.

AI-generated · claude-opus-4-7 · v3-deep-statistics

Practice questions

Try each before peeking at the worked solution.

Question 1 (2 marks)
Describe correlation
(F1) A scatter graph of test scores vs hours of revision shows points clustered along a line going from bottom-left to top-right. Describe the correlation.

[Foundation tier]

Question 2 (1 mark)
Identify "no correlation"
(F2) A scatter graph of foot length vs reading speed in adults shows points scattered randomly. State the type of correlation.

[Foundation tier]

Question 3 (2 marks)
Use a line of best fit
(F/H3) From a line of best fit on a scatter graph of hours studied (x) and test score (y), the line passes through (1, 35) and (5, 75). Estimate the test score for a student who studied 3 hours.

[Crossover tier]

Question 4 (2 marks)
Interpolation vs extrapolation
(F/H4) The same scatter graph covers 1 to 5 hours of study. Comment on whether predicting a score at 8 hours is safe.

[Crossover tier]

Question 5 (2 marks)
Correlation vs causation
(F/H5) A study finds a strong positive correlation between the number of fire engines at a fire and the cost of damage. Does this mean fire engines cause damage?

[Crossover tier]

Question 6 (2 marks)
Spotting outliers
(H6) On a scatter of 20 data points, one point sits well above the line of best fit. Comment on what this could mean and whether to discard it.

[Higher tier]

Question 7 (2 marks)
Estimate strength of correlation
(H7) Two scatter graphs are described:
- Graph A: points form a tight band around a steep line.
- Graph B: points form a wide cloud around a slightly upward-trending line.
Which has the stronger correlation, and why?

[Higher tier]

Flashcards

S6 — Interpret scatter graphs; correlation and causation

10-card SR deck for AQA GCSE Maths topic S6

10 cards · spaced repetition (SM-2)
