Variance and Standard Deviation
A beginner-friendly guide to variance and standard deviation: definitions, calculation formulas, unbiased estimation, and why we square deviations.
Variance and Standard Deviation: From Formulas to Why We Square
Variance and standard deviation are the central measures for describing how spread out data are.
In this article, we’ll cover:
- Definitions and intuition
- Hand calculations and the shortcut formula
- Why sample variance divides by $n-1$ (Bessel’s correction)
- Why deviations are squared instead of absolute values
- Interactive demos to build intuition
0. Notation
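- Data points: $x_1, x_2, \dots, x_n$
- Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Population variance (population of size $N$ with mean $\mu$): $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$
- Sample variance (unbiased): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$
- Standard deviation: $\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$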
1. Intuition: What are Variance and Standard Deviation?
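Deviations from the mean always sum to zero, $\sum_{i=1}^{N}(x_i-\mu) = 0$, so their plain average says nothing about spread. Instead, we square them and average:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$$
Taking the square root returns to the original unit (meters, seconds, dollars, …):
$$\sigma = \sqrt{\sigma^2}$$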
- Variance = average squared distance from the mean
- Standard deviation = typical distance from the mean
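A minimal Python sketch of the two definitions above, treating a small hypothetical list of heights (in meters) as the whole population. It checks the hand-rolled formulas against the standard library's `statistics.pvariance` and `statistics.pstdev`, which use the same divide-by-$N$ convention; the helper names are illustrative only.

```python
import math
import statistics

# Hypothetical measurements in meters, treated as the entire population.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]

def population_variance(xs):
    """Average squared distance from the mean (divide by N)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def population_sd(xs):
    """Square root of the variance, back in the original units."""
    return math.sqrt(population_variance(xs))

var = population_variance(heights_m)  # units: square meters
sd = population_sd(heights_m)         # units: meters

# Deviations from the mean cancel out (up to floating-point rounding).
mu = sum(heights_m) / len(heights_m)
print(sum(x - mu for x in heights_m))

print(var, statistics.pvariance(heights_m))
print(sd, statistics.pstdev(heights_m))
```

For these values the variance is about 0.01 m² and the standard deviation about 0.1 m, which is why the square root is the number you can read directly against the data.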
Interactive Demo ①: Small vs. Large Variance (Histogram + ±σ bands)
This demo renders a responsive histogram (D3) for three presets (Low / Medium / High variance), computed with the sample variance ($s^2$, denominator $n-1$).
It overlays a red mean line (labelled “$\mu$” for readability; numerically equal to $\bar{x}$ here) and shaded $\pm 1\sigma$ and $\pm 2\sigma$ bands to visualize typical spread.
It also lists raw data and live stats, so you can link the shape of the histogram to the numbers.
Key Observations:
- Low variance → data clustered tightly around the mean (narrow bell)
- High variance → data spread widely from the mean (wide, flat bell)
- The smooth curve approximates the normal distribution shape
- Standard deviation bands show typical spread ranges
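The numbers behind the demo can be reproduced with a short Python sketch. The two datasets below are hypothetical (they are not the demo’s presets): both have mean 10, but one is tightly clustered and the other widely spread. `describe` is an illustrative helper that returns the sample mean, the sample SD ($n-1$ denominator, as in the demo), and the share of points inside the $\pm 1s$ band.

```python
import statistics

# Hypothetical datasets with the same mean (10) but different spreads.
low_spread = [9, 10, 10, 10, 11]
high_spread = [2, 6, 10, 14, 18]

def describe(xs):
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample SD, denominator n - 1
    # Share of points within one standard deviation of the mean.
    inside = sum(1 for x in xs if mean - s <= x <= mean + s) / len(xs)
    return mean, s, inside

for name, data in [("low", low_spread), ("high", high_spread)]:
    mean, s, inside = describe(data)
    print(f"{name}: mean={mean}, s={s:.3f}, within ±1s: {inside:.0%}")
```

Both datasets put 3 of 5 points inside their own $\pm 1s$ band; what changes is the width of that band (about 0.71 versus 6.32), which is exactly what the shaded region in the histogram shows.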
2. Shortcut Formula for Variance
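Expanding the square gives a useful computational shortcut:
$$\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2$$
Derivation (step by step):
- Expand each term: $(x_i-\bar{x})^2 = x_i^2 - 2\bar{x}\,x_i + \bar{x}^2$
- Sum both sides: $\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + n\bar{x}^2$
- Use $\sum_{i=1}^{n}x_i = n\bar{x}$ → $-2\bar{x}\cdot n\bar{x} + n\bar{x}^2 = -n\bar{x}^2$
Thus,
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)$$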
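A small Python sketch comparing the definition with the shortcut on a hypothetical four-point dataset (the function names are illustrative). Algebraically the two always agree. In floating-point arithmetic, though, the shortcut subtracts two large, nearly equal numbers when the data sit far from zero, so it can lose most of its precision; this is a general numerical caveat, which is why software usually prefers the two-pass (or a running-update) method.

```python
def two_pass_var(xs):
    """Sample variance from the definition: mean first, then squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def shortcut_var(xs):
    """Sample variance from the shortcut: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)

small = [4.0, 7.0, 13.0, 16.0]
print(two_pass_var(small), shortcut_var(small))  # both 30.0

# Same spread, shifted by a large constant: the variance is unchanged in theory.
shifted = [1e9 + x for x in small]
print(two_pass_var(shifted))  # still 30.0
print(shortcut_var(shifted))  # may be far from 30.0 due to cancellation
```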
3. Worked Example
Data: ()
- Mean:
- Sum of squared deviations:
- Sample variance:
- Standard deviation:
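The same four steps can be sketched in Python on a hypothetical dataset chosen for illustration. `fractions.Fraction` keeps the intermediate results exact, and `statistics.variance` / `statistics.stdev` (both with the $n-1$ denominator) serve as a cross-check.

```python
from fractions import Fraction
import math
import statistics

# Hypothetical data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = Fraction(sum(data), n)            # step 1: sample mean
ss = sum((x - mean) ** 2 for x in data)  # step 2: sum of squared deviations
s2 = ss / (n - 1)                        # step 3: sample variance
s = math.sqrt(s2)                        # step 4: standard deviation

print(mean, ss, s2, round(s, 4))
print(statistics.variance(data), statistics.stdev(data))  # cross-check
```

For this dataset the mean is 5, the sum of squared deviations is 32, $s^2 = 32/7 \approx 4.571$, and $s \approx 2.138$.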
4. Why Divide by $n-1$? (Unbiased Estimation)
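If we divide by $n$:
$$s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,$$
the expectation is biased low:
$$E[s_n^2] = \frac{n-1}{n}\sigma^2.$$
The reason is that deviations are measured from $\bar{x}$, which is fitted to the same data, so for independent draws $E\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = (n-1)\sigma^2$ rather than $n\sigma^2$. Instead, dividing by $n-1$ gives
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,$$
which satisfies $E[s^2] = \sigma^2$, making it the unbiased estimator.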
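To check both expectations without relying on randomness, a Python sketch can enumerate every possible sample from a tiny hypothetical population (values 1, 2, 6, drawn independently with replacement, so every ordered tuple is equally likely) and average each estimator exactly with `fractions.Fraction`. `expected_estimates` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny population; samples are drawn with replacement (i.i.d.).
population = [1, 2, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true variance: 14/3

def expected_estimates(n):
    """Exact averages of the biased (/n) and unbiased (/(n-1)) estimators."""
    biased_total = Fraction(0)
    unbiased_total = Fraction(0)
    samples = list(product(population, repeat=n))  # all equally likely samples
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)
        biased_total += ss / n
        unbiased_total += ss / (n - 1)
    count = len(samples)
    return biased_total / count, unbiased_total / count

for n in (2, 3, 4):
    biased, unbiased = expected_estimates(n)
    print(n, sigma2, biased, unbiased, Fraction(n - 1, n) * sigma2)
```

For every $n$ the unbiased average equals $\sigma^2$ exactly, and the biased average equals $\frac{n-1}{n}\sigma^2$ exactly.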
Interactive Demo ②: $n-1$ vs. $n$ (Unbiased vs. Biased)
This widget draws random samples from a fixed “population,” then shows side by side:
- variance with denominator $n$ (biased),
- variance with denominator $n-1$ (unbiased), and
- the true population variance.
You’ll see that the $n$-denominator version underestimates on average, especially for small $n$, while the $(n-1)$-denominator version lines up with the population variance on average, as theory predicts.
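A Monte Carlo sketch in the spirit of the widget, in Python. The population here is assumed to be normal with $\mu = 0$ and $\sigma^2 = 1$ (the widget’s own population may differ), and `simulate` is a hypothetical helper. Averaging each estimator over many random samples shows the $n$-denominator settling near $\frac{n-1}{n}\sigma^2$ while the $(n-1)$-denominator settles near $\sigma^2$.

```python
import random

def simulate(n, trials=20000, mu=0.0, sigma=1.0, seed=0):
    """Average the biased and unbiased variance over many random samples."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    biased_sum = 0.0
    unbiased_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        biased_sum += ss / n
        unbiased_sum += ss / (n - 1)
    return biased_sum / trials, unbiased_sum / trials

for n in (3, 5, 10, 50):
    biased, unbiased = simulate(n)
    print(f"n={n:>2}: biased≈{biased:.3f}  unbiased≈{unbiased:.3f}  "
          f"(true σ²=1, theory for biased={(n - 1) / n:.3f})")
```

Because the bias factor is $\frac{n-1}{n}$, the gap between the two columns shrinks quickly as $n$ grows.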
Key Observations:
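Why n-1? When we use the sample mean x̄ to calculate deviations, we lose one degree of freedom: the deviations must sum to zero, so once n-1 of them are known the last one is fixed. Because x̄ is fitted to the same data, deviations from it are also systematically smaller than deviations from the true mean μ, so the biased estimator (÷n) underestimates the population variance, especially for small samples.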
Bessel's Correction: Dividing by (n-1) instead of n corrects this bias. The unbiased estimator's expected value equals the true population variance: E[s²] = σ².
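Try different sample sizes: The bias is most pronounced for small samples (n = 3 to 10) and becomes negligible as n grows, because the biased estimator's expected value is (n-1)/n · σ²: two-thirds of σ² at n = 3, 90% at n = 10, and 99% at n = 100. The histogram shows how your sample compares to the true population distribution.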
5. Why Do We Square Deviations?
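- Compatible with the mean (least squares). Minimizing $\sum_{i=1}^{n}(x_i-c)^2$ over $c$ yields $c = \bar{x}$; minimizing $\sum_{i=1}^{n}|x_i-c|$ yields the median.
- Clean algebra. The shortcut $\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2$ relies on squaring.
- Smooth and differentiable. Squared loss has a derivative everywhere, which is essential for optimization and regression; the absolute value $|u|$ has a kink at $u = 0$.
- Geometric meaning. Squared deviations are squared Euclidean distances, so orthogonal decompositions (ANOVA, regression) behave like the Pythagorean theorem.
- Additivity for independent sums. If $X$ and $Y$ are independent, then $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.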
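Two of these properties are easy to check numerically. The Python sketch below (hypothetical data and helper names) grid-searches the center $c$ that minimizes each loss, then computes exact variances for fair dice, where all 36 outcomes of two independent rolls are equally likely.

```python
from fractions import Fraction
from itertools import product
import statistics

data = [1, 2, 3, 4, 100]  # hypothetical, with one large value

def sum_sq(c):
    return sum((x - c) ** 2 for x in data)

def sum_abs(c):
    return sum(abs(x - c) for x in data)

# Search a grid of candidate centers c from 0 to 110 in steps of 0.1.
grid = [Fraction(k, 10) for k in range(0, 1101)]
best_sq = min(grid, key=sum_sq)
best_abs = min(grid, key=sum_abs)
print(best_sq, statistics.mean(data))     # squared loss -> the mean (22)
print(best_abs, statistics.median(data))  # absolute loss -> the median (3)

# Additivity: exact variance of one fair die vs. the sum of two independent dice.
faces = range(1, 7)

def variance_of(values):
    values = list(values)
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / len(values)

var_one = variance_of(faces)
var_sum = variance_of(a + b for a, b in product(faces, repeat=2))
print(var_one, var_sum, var_one + var_one)  # 35/12, 35/6, 35/6
```

Note how the single large value drags the squared-loss minimizer to 22 while the absolute-loss minimizer stays at 3.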
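(Caveat: squaring makes variance sensitive to outliers, since one large deviation can dominate the sum; when that matters, use robust alternatives such as the median and the median absolute deviation (MAD).)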
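A short Python sketch of that caveat, using hypothetical data with one data-entry error. `mad` is an illustrative helper computing the unscaled median absolute deviation; some libraries multiply it by about 1.4826 so it estimates $\sigma$ for normal data, which this sketch does not do.

```python
import statistics

def mad(xs):
    """Median absolute deviation: median of |x - median(xs)| (unscaled)."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]  # hypothetical data-entry error

for name, xs in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{name:>12}: sd={statistics.stdev(xs):7.2f}  "
          f"median={statistics.median(xs)}  MAD={mad(xs)}")
```

The sample SD jumps from about 1.58 to roughly 57.5, while the median (12) and the MAD (1) do not move at all.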
6. Summary
- Variance = average of squared deviations from the mean; standard deviation = its square root, in the original units.
- The shortcut formula $\sum x_i^2 - n\bar{x}^2$ simplifies hand calculations.
- Dividing by $n-1$ (Bessel’s correction) gives an unbiased estimate of the population variance.
- Squaring offers algebraic, geometric, and optimization advantages.