TopMyGrade

S6: Interpret scatter graphs; correlation and causation

Notes

Scatter graphs, correlation and causation

Scatter graphs can appear on any paper of OCR J560, at both Foundation and Higher tier. Questions ask students to describe correlation, draw and use lines of best fit, and distinguish correlation from causation. These are straightforward marks if the vocabulary is precise.

What is a scatter graph?

A scatter graph (scatter diagram) shows the relationship between two variables. Each data point is plotted as a coordinate pair (x, y). The pattern of points reveals whether and how the variables are associated.

Types of correlation

| Pattern | Name | Example |
|---|---|---|
| Points slope up left to right | Positive correlation | Height and shoe size |
| Points slope down left to right | Negative correlation | Hours of TV watched and exam mark |
| No clear pattern | No correlation | Hat size and IQ |

Strength of correlation:

  • Strong: points are closely clustered around a line.
  • Weak: points are more scattered.

Always describe both direction (positive or negative) and strength (strong or weak) in an OCR answer, or state that there is no correlation.
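At GCSE, direction and strength are judged by eye from the graph. As an optional check, the sketch below computes the Pearson correlation coefficient and turns it into a direction-and-strength description. The data values and the cut-offs (0.7 for "strong", 0.3 for "weak") are assumptions made for illustration and are not part of the OCR specification.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_correlation(xs, ys, strong=0.7, weak=0.3):
    """Describe direction and strength; the cut-offs are illustrative assumptions."""
    r = pearson_r(xs, ys)
    if abs(r) < weak:
        return "no correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Invented data: heights (cm) and shoe sizes for six people.
heights = [150, 155, 160, 165, 170, 175]
shoe_sizes = [4, 5, 5.5, 6.5, 7, 8]
print(describe_correlation(heights, shoe_sizes))
```

The description always names strength and direction together, which mirrors what a full-mark written answer needs.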

Line of best fit

A good line of best fit has these features:

  • Passes through the mean point (x̄, ȳ): plot the mean of the x-values against the mean of the y-values and draw the line through that point.
  • Roughly equal numbers of points above and below the line.
  • Drawn as a single straight line across the data range.
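In the exam the line is drawn by eye. The sketch below swaps in the least-squares method, a standard calculation that is not required at GCSE, to show that a fitted line passes through the mean point. The function names and the data are made up for illustration.

```python
def mean_point(xs, ys):
    """Return the mean point (x-bar, y-bar)."""
    return sum(xs) / len(xs), sum(ys) / len(ys)

def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line of best fit."""
    mean_x, mean_y = mean_point(xs, ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # Choosing the intercept this way forces the line through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Invented data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, c = least_squares_line(xs, ys)
mean_x, mean_y = mean_point(xs, ys)
print(f"mean point: ({mean_x}, {mean_y})")
print(f"line of best fit: y = {m:.2f}x + {c:.2f}")
print(f"value of the line at x-bar: {m * mean_x + c:.2f}")
```

A line drawn by eye will rarely match the least-squares line exactly, but aiming it through the mean point keeps it close.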

Interpolation: reading a value within the data range. This is generally reliable.

Extrapolation: reading a value beyond the data range. This is unreliable because the relationship may not continue outside the data.

Example: If the line of best fit is drawn for heights (x) and arm spans (y), you can estimate arm span for a height within the range of the data (interpolation). Estimating for a height far outside the range is extrapolation and unreliable.
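The distinction comes down to whether the x-value lies between the smallest and largest x-values in the data. A minimal sketch, using a hypothetical helper name and invented heights:

```python
def classify_estimate(x, xs):
    """Say whether estimating at x is interpolation or extrapolation."""
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"

# Invented heights (cm) for illustration.
heights = [150, 158, 163, 170, 176, 182]

for height in (165, 210):
    print(height, classify_estimate(height, heights))
```

An estimate labelled as extrapolation should always come with a comment that it is unreliable.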

Using the line of best fit

Example: From a scatter graph, the line of best fit has equation y = 0.8x + 5. Predict y when x = 20. y = 0.8(20) + 5 = 21.

Alternatively, read the estimate directly off the graph by going up from x to the line, then across to the y-axis.
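The substitution in the example above can be checked with a short sketch. The function name predict is an illustrative choice; the gradient 0.8 and intercept 5 come from the example.

```python
def predict(x, gradient=0.8, intercept=5):
    """Estimate y from the line y = gradient * x + intercept."""
    return gradient * x + intercept

estimate = predict(20)
print(f"When x = 20, y is about {estimate:g}")
```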

Correlation vs causation

Correlation does not imply causation.

Just because two variables are correlated does not mean one causes the other. There may be a third (confounding) variable, or the correlation may be coincidental.

Example: Ice cream sales and drowning rates are positively correlated — but ice cream does not cause drowning. Hot weather causes both.
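The sketch below illustrates the idea with invented numbers. Ice cream sales and drownings are each built from temperature plus a small fixed wobble, and neither is calculated from the other, yet the two still come out strongly correlated. All values are assumptions for illustration, not real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# The third variable: daily temperature in degrees Celsius (invented).
temperature = [12, 15, 18, 21, 24, 27, 30]

# Each variable depends only on temperature plus a small fixed wobble.
sales_wobble = [15, -10, 5, -20, 10, -5, 20]
drowning_wobble = [0.5, -0.4, 0.2, -0.6, 0.4, -0.2, 0.3]
ice_cream_sales = [20 * t + w for t, w in zip(temperature, sales_wobble)]
drownings = [0.3 * t + w for t, w in zip(temperature, drowning_wobble)]

r_sales_drownings = pearson_r(ice_cream_sales, drownings)
print(f"r between ice cream sales and drownings: {r_sales_drownings:.2f}")
```

The strong correlation appears only because both lists share temperature as an input, which is exactly what a confounding variable does.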

OCR exam questions may ask you to "explain why correlation does not mean causation". Give an alternative explanation, such as a third variable that affects both.

Outliers on scatter graphs

An outlier (anomalous data point) is a point that does not follow the general pattern. When drawing the line of best fit, outliers should be ignored (do not allow the line to be pulled toward them).
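In the exam an outlier is spotted by eye. One simple way to make that judgement concrete, assumed here for illustration, is to find the point with the largest vertical distance from the line of best fit. The points are invented and the line y = 0.8x + 5 is used only as an example.

```python
def vertical_distance(point, gradient, intercept):
    """Vertical distance from a point to the line y = gradient * x + intercept."""
    x, y = point
    return abs(y - (gradient * x + intercept))

def furthest_point(points, gradient, intercept):
    """Return the point lying furthest (vertically) from the line."""
    return max(points, key=lambda p: vertical_distance(p, gradient, intercept))

# Invented points; most lie close to y = 0.8x + 5, one does not.
points = [(5, 9), (10, 13.5), (15, 16), (20, 21.5), (25, 40), (30, 29)]

suspect = furthest_point(points, gradient=0.8, intercept=5)
print("Possible outlier:", suspect)
```

Being the furthest point does not by itself make a point an outlier; it only flags the candidate to check against the general pattern.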

Common OCR exam mistakes

  1. Drawing a line of best fit that goes from point to point (it should be one straight line through the cloud).
  2. Forcing the line through the origin — the line of best fit need not pass through (0,0).
  3. Describing correlation as only "positive" without stating strength.
  4. Claiming that correlation means causation.
  5. Extrapolating from the line of best fit without noting that the estimate is unreliable.

AI-generated · claude-opus-4-7 · v3-ocr-maths

Practice questions

Try each before peeking at the worked solution.

  1. Question 1 (3 marks)

    Describe the correlation

    A scatter graph shows the temperature (°C) on the x-axis and the number of hot drinks sold in a café on the y-axis.

    The points show a strong negative correlation.

    (a) What does "strong negative correlation" mean in this context? [2]
    (b) Give a reason why the correlation does not imply causation. [1]

  2. Question 2 (4 marks)

    Use the line of best fit

    The scatter graph shows the revision time (hours) and the test score (%) for 12 students. The line of best fit passes through (2, 45) and (8, 75).

    (a) Find the equation of the line of best fit. [3]
    (b) Estimate the test score for a student who revises for 5 hours. [1]

  3. Question 3 (3 marks)

    Identify an outlier and comment

    A scatter graph shows a strong positive correlation between age of car (years) and repair cost (£). One point lies far from the line of best fit.

    (a) What is this point called? [1]
    (b) Should this point be included when drawing the line of best fit? Explain. [2]

  4. Question 4 (2 marks)

    Interpolation vs extrapolation

    Data on the arm span (cm) and height (cm) of 15 students are displayed in a scatter graph. The heights in the data range from 145 cm to 185 cm.

    (a) A student is 165 cm tall. Explain why using the line of best fit to estimate their arm span is reliable. [1]
    (b) Another estimate is made for a student who is 210 cm tall. Explain why this estimate is less reliable. [1]
