Describing a population using statistics
Once you've collected and summarised data, you need to communicate the population's characteristics in plain English supported by figures.
A description checklist
A good descriptive statement covers:
- Centre — where the typical value sits (mean or median).
- Spread — how variable the data is (range or IQR).
- Shape — symmetric, skewed, bimodal, peaked.
- Outliers / unusual values — comment if present.
- Context — what the data measures and the units.
Example: "The median commute time was 28 minutes (IQR 12 minutes), with most journeys lying between 20 and 40 minutes. A small number of commutes longer than 60 minutes suggests a few people travel from far outside the local area."
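As a rough illustration, the centre and spread figures in a statement like this can be computed with Python's standard `statistics` module. The commute times below are made-up values, `describe` is a hypothetical helper name, and the `inclusive` quartile method is only one of several conventions, so quartiles found by hand or with other software may differ slightly.

```python
import statistics

# Hypothetical commute times in minutes (invented purely for illustration).
commute_minutes = [12, 18, 20, 22, 25, 27, 28, 29, 31, 33, 35, 38, 40, 65, 72]


def describe(values, unit):
    """Return a one-sentence summary covering centre, spread and range."""
    ordered = sorted(values)
    median = statistics.median(ordered)
    # Quartile conventions vary; "inclusive" interpolates between data points
    # and treats the minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    return (
        f"Median {median:g} {unit} (IQR {iqr:g} {unit}); "
        f"values ranged from {ordered[0]:g} to {ordered[-1]:g} {unit}."
    )


summary = describe(commute_minutes, "minutes")
print(summary)
```

The helper deliberately returns a sentence with units rather than bare numbers, since the context and units are part of the description.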
Skewness — how to spot it
- Symmetric: mean ≈ median, both quartiles roughly equidistant from the median.
- Right-skewed (positive): long tail to the right; the mean is usually greater than the median.
- Left-skewed (negative): long tail to the left; the mean is usually less than the median.
For income data, right-skew is typical: a few very-high earners drag the mean above the median.
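The mean-versus-median comparison can be sketched in a few lines of Python. This is a rough heuristic rather than a formal test: `skew_hint` is a hypothetical helper, the `tolerance` of 0.1 IQRs is an arbitrary assumption, and the income figures are invented.

```python
import statistics


def skew_hint(values, tolerance=0.1):
    """Suggest a skew direction from the gap between mean and median.

    The gap is measured in units of the IQR so the result does not depend
    on the scale of the data. `tolerance` is an arbitrary cut-off.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return "cannot tell (IQR is zero)"
    gap = (mean - median) / iqr
    if gap > tolerance:
        return "right-skewed"
    if gap < -tolerance:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical annual incomes in thousands of pounds: one very high earner.
incomes = [18, 21, 22, 24, 25, 26, 28, 30, 34, 120]
print(statistics.mean(incomes), statistics.median(incomes), skew_hint(incomes))
```

Here the single high earner pulls the mean well above the median, which is the pattern described for income data.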
Estimating from samples — implied claims
Saying "the median commute is 28 minutes" is shorthand for "the sample median is 28; we estimate the population median is around 28."
For a defensible claim:
- The sample must be representative (S1).
- The sample must be large enough that random fluctuation is small.
- Outliers should be checked, not silently removed.
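One way to check outliers without silently removing them is to flag them for inspection and leave the data untouched. The sketch below assumes the common 1.5 × IQR fence convention, which is not stated in these notes; `flag_outliers` is a hypothetical helper and the commute times are invented.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying more than k IQRs beyond the quartiles.

    k = 1.5 is a widely used convention, assumed here for illustration.
    The input list is not modified: flagged values still need a human check.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical commute times in minutes.
commute_minutes = [12, 18, 20, 22, 25, 27, 28, 29, 31, 33, 35, 38, 40, 65, 72]
flagged = flag_outliers(commute_minutes)
print("Check these values before deciding what to do:", flagged)
```

A flagged value may be a recording error or a genuine long commute; only the second belongs in the description.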
Inferring from comparison
Often you'll compare two populations using their summary statistics. Mark schemes typically expect three things:
- A central comparison (medians).
- A spread comparison (IQRs).
- Context (what does this mean for the situation?).
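The three-part comparison can be sketched as a small Python helper that reports centre, spread and context together. The town names and commute times are invented, `compare` and `median_and_iqr` are hypothetical names, and ties between the two groups are not handled in this sketch.

```python
import statistics


def median_and_iqr(values):
    """Return (median, IQR) using the 'inclusive' quartile convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, q3 - q1


def compare(name_a, a, name_b, b, measure, unit):
    """Build a comparison sentence covering centre, spread and context."""
    med_a, iqr_a = median_and_iqr(a)
    med_b, iqr_b = median_and_iqr(b)
    # Centre: the group with the larger median is typically higher.
    higher = name_a if med_a > med_b else name_b
    # Spread: the group with the larger IQR is less consistent.
    more_varied = name_a if iqr_a > iqr_b else name_b
    return (
        f"Median {measure}: {name_a} {med_a:g} {unit}, {name_b} {med_b:g} {unit}, "
        f"so {higher} was typically higher. "
        f"IQR: {name_a} {iqr_a:g} {unit}, {name_b} {iqr_b:g} {unit}, "
        f"so {more_varied} was more variable."
    )


# Hypothetical commute times in minutes for two towns.
town_a = [20, 24, 26, 28, 30, 33, 36]
town_b = [15, 22, 30, 35, 41, 48, 60]
sentence = compare("Town A", town_a, "Town B", town_b, "commute time", "minutes")
print(sentence)
```

Passing the measure name and unit into the function keeps the comparison in context instead of producing a bare "A is higher".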
Writing in context
Bad: "The median is 28." Better: "The median commute time was 28 minutes, indicating that about half of the people in the sample took 28 minutes or less."
Plotting the description
Often a single diagram supports the description:
- Box plot: shows median, IQR and range; useful for spotting skew.
- Histogram / cumulative frequency curve: shows the shape of the distribution.
Common mistakes (examiner traps)
- Numbers without context — "the mean is 7.4" means nothing without units.
- Citing only the mean. Always include a measure of spread and a comment on shape if relevant.
- Comparing without context — "A is higher" — higher what?
- Treating sample statistics as exact population values.
- Missing skewness: unequal whiskers, or a median line sitting off-centre in the box, are an immediate clue.
Try this: Quick check
A box plot of 100 students' weights (kg): min 38, Q1 52, median 60, Q3 75, max 88. Describe the population.
- Median 60 kg, IQR 23 kg.
- Q3 − median = 15; median − Q1 = 8 → right-skewed.
- "The middle half of the students weighed between 52 and 75 kg, with a median of 60 kg. The distribution is skewed toward heavier weights: the upper half of the box (60 to 75 kg) is almost twice as wide as the lower half (52 to 60 kg). Weights ranged from 38 to 88 kg."
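The arithmetic in this quick check can be verified with a short Python sketch working directly from the five-number summary. The variable names are illustrative, and comparing the two halves of the box is a rule of thumb for skew rather than a formal test.

```python
# Five-number summary from the quick check (weights in kg).
minimum, q1, median, q3, maximum = 38, 52, 60, 75, 88

iqr = q3 - q1                  # width of the box
lower_box = median - q1        # spread of the second quarter of the data
upper_box = q3 - median        # spread of the third quarter of the data
lower_whisker = q1 - minimum   # spread of the lowest quarter
upper_whisker = maximum - q3   # spread of the highest quarter

# Rule of thumb: a wider upper half of the box suggests right skew.
if upper_box > lower_box:
    shape = "right-skewed"
elif upper_box < lower_box:
    shape = "left-skewed"
else:
    shape = "roughly symmetric"

print(f"IQR {iqr} kg; box halves {lower_box} and {upper_box} kg; "
      f"whiskers {lower_whisker} and {upper_whisker} kg; {shape}")
```

The whiskers here are almost equal in length, so the evidence for skew comes from the position of the median inside the box.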