Scatter graphs, correlation and causation
Scatter graphs can appear on any paper of OCR J560. Questions ask students to draw, describe and use lines of best fit, and to distinguish correlation from causation. These are straightforward marks if the vocabulary is precise.
What is a scatter graph?
A scatter graph (scatter diagram) shows the relationship between two variables. Each data point is plotted as a coordinate pair (x, y). The pattern of points reveals whether and how the variables are associated.
Types of correlation
| Pattern | Name | Example |
|---|---|---|
| Points slope up left to right | Positive correlation | Height and shoe size |
| Points slope down left to right | Negative correlation | Hours of TV watched and exam mark |
| No clear pattern | No correlation | Hat size and IQ |
Strength of correlation:
- Strong: points are closely clustered around a line.
- Weak: points are more scattered.
Always describe both direction (positive or negative) and strength (strong or weak) in an OCR answer. If the points show no pattern, write "no correlation".
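The idea of direction and strength can be sketched numerically. The example below goes beyond what the notes above require: it uses Pearson's correlation coefficient, and the cut-off values 0.7 and 0.3 are illustrative assumptions, not OCR definitions. The height and shoe size figures are made up for the sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, strong=0.7, weak=0.3):
    """Turn r into a direction-and-strength phrase.

    The cut-offs are assumptions chosen for illustration only.
    """
    if abs(r) < weak:
        return "no correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Made-up data: heights in cm and UK shoe sizes.
heights = [150, 155, 160, 165, 170, 175]
shoe_sizes = [4, 5, 5, 6, 7, 8]

r = pearson_r(heights, shoe_sizes)
print(describe(r))
```

In an exam the description is made by eye from the graph; the calculation is only a way to check intuition.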
Line of best fit
A good line of best fit:
- Passes through the mean point (x̄, ȳ): plot the mean of the x-values against the mean of the y-values and draw the line through it.
- Has roughly equal numbers of points above and below it.
- Is a single straight line drawn with a ruler across the data range.
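The mean point can be computed directly. The sketch below uses made-up height and arm span data, and it swaps the by-eye line for a least-squares line (an assumption; OCR expects a line drawn by eye). It then checks that this calculated line passes through the mean point.

```python
def mean_point(xs, ys):
    """Return (x-bar, y-bar) for the data."""
    return sum(xs) / len(xs), sum(ys) / len(ys)

def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line."""
    x_bar, y_bar = mean_point(xs, ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gradient = sxy / sxx
    # Choosing the intercept this way forces the line through the mean point.
    intercept = y_bar - gradient * x_bar
    return gradient, intercept

# Made-up data: heights and arm spans in cm.
heights = [150, 155, 160, 165, 170, 175]
arm_spans = [148, 156, 159, 167, 169, 176]

x_bar, y_bar = mean_point(heights, arm_spans)
gradient, intercept = least_squares_line(heights, arm_spans)

print("mean point:", (x_bar, y_bar))
print("line value at x-bar:", gradient * x_bar + intercept)
```

The two printed y-values agree, which is why plotting the mean point first is a useful anchor when drawing the line by hand.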
- Interpolation: reading a value within the data range. Generally reliable, especially when the correlation is strong.
- Extrapolation: reading a value beyond the data range. Unreliable, because the relationship may not continue.
Example: If the line of best fit is drawn for heights (x) and arm spans (y), you can estimate arm span for a height within the range of the data (interpolation). Estimating for a height far outside the range is extrapolation and unreliable.
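The distinction comes down to whether the x-value lies between the smallest and largest x-values in the data. A minimal sketch, assuming made-up heights and a hypothetical helper name `classify_estimate`:

```python
def classify_estimate(x, xs):
    """Say whether estimating at x is interpolation or extrapolation."""
    lowest, highest = min(xs), max(xs)
    if lowest <= x <= highest:
        return "interpolation"
    return "extrapolation"

# Made-up heights in cm; the data range is 150 to 175.
heights = [150, 155, 160, 165, 170, 175]

print(classify_estimate(162, heights))  # inside the range
print(classify_estimate(200, heights))  # far outside the range
```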
Using the line of best fit
Example: From a scatter graph, the line of best fit has equation y = 0.8x + 5. Predict y when x = 20. Substituting: y = 0.8(20) + 5 = 16 + 5 = 21.
Alternatively, read the estimate directly off the graph by going up from x to the line, then across to the y-axis.
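The substitution in the example can be written as a short function. The name `predict` is hypothetical; the gradient 0.8 and intercept 5 come from the example equation.

```python
def predict(x, gradient=0.8, intercept=5):
    """Estimate y from the line of best fit y = gradient * x + intercept."""
    return gradient * x + intercept

estimate = predict(20)
print(round(estimate, 2))  # matches the worked example, y = 21
```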
Correlation vs causation
Correlation does not imply causation.
Just because two variables are correlated does not mean one causes the other. There may be a third (confounding) variable, or the correlation may be coincidental.
Example: Ice cream sales and drowning rates are positively correlated — but ice cream does not cause drowning. Hot weather causes both.
OCR exam questions may ask you to "explain why correlation does not mean causation". Give an alternative explanation, such as a third variable that affects both.
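The ice cream example can be sketched with invented numbers. All figures below are made up purely for illustration. Both quantities are listed against temperature, and a correlation coefficient (a measure beyond these notes) shows that the two effects end up strongly correlated with each other even though neither causes the other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented daily figures: temperature is the third (confounding) variable.
temperature = [12, 15, 18, 21, 24, 27, 30]      # degrees C
ice_cream_sales = [20, 28, 33, 45, 50, 61, 66]  # units sold
drownings = [1, 1, 2, 3, 3, 4, 5]               # incidents

r_temp_ice = pearson_r(temperature, ice_cream_sales)
r_temp_drown = pearson_r(temperature, drownings)
r_ice_drown = pearson_r(ice_cream_sales, drownings)

print(round(r_temp_ice, 2), round(r_temp_drown, 2), round(r_ice_drown, 2))
```

All three values come out close to 1. The link between ice cream sales and drownings appears only because both rise with temperature.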
Outliers on scatter graphs
An outlier (anomalous data point) is a point that does not follow the general pattern. When drawing the line of best fit, outliers should be ignored (do not allow the line to be pulled toward them).
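On a graph an outlier is spotted by eye. As a rough numerical sketch, the point furthest (vertically) from the line can be found as below. The data points are made up, the helper name `furthest_from_line` is hypothetical, and the line y = 0.8x + 5 is borrowed from the earlier example.

```python
def furthest_from_line(points, gradient, intercept):
    """Return the point with the largest vertical distance from the line."""
    def distance(point):
        x, y = point
        return abs(y - (gradient * x + intercept))
    return max(points, key=distance)

# Made-up points: four lie on y = 0.8x + 5, one sits well above it.
points = [(10, 13), (15, 17), (20, 21), (25, 40), (30, 29)]

suspect = furthest_from_line(points, gradient=0.8, intercept=5)
print(suspect)
```

Being furthest from the line does not by itself make a point an outlier; it only identifies the candidate to look at.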
Common OCR exam mistakes
- Drawing a line of best fit that goes from point to point (it should be one straight line through the cloud).
- Forcing the line through the origin — the line of best fit need not pass through (0,0).
- Describing correlation as only "positive" without stating strength.
- Claiming that correlation means causation.
- Using the line of best fit to extrapolate without noting that the estimate is unreliable.