Variance and Standard Deviation

A beginner-friendly guide to variance and standard deviation: definitions, calculation formulas, unbiased estimation, and why we square deviations.


Variance and Standard Deviation: From Formulas to Why We Square

Variance and standard deviation are the central measures for describing how spread out data are.
In this article, we’ll cover:

  • Definitions and intuition
  • Hand calculations and the shortcut formula
  • Why sample variance uses $n-1$ (Bessel's correction)
  • Why deviations are squared instead of absolute values
  • Interactive demos to build intuition

0. Notation

  • Data points: $x_1,\dots,x_n$
  • Sample mean: $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$
  • Population variance: $\sigma^2=\mathrm{Var}(X)=\mathbb{E}\big[(X-\mu)^2\big]$
  • Sample variance (unbiased): $s^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$
  • Standard deviation: $\sigma=\sqrt{\sigma^2}$, $s=\sqrt{s^2}$

1. Intuition: What are Variance and Standard Deviation?

Because the deviations $x_i-\bar{x}$ sum to zero, we square them before averaging:

$$s^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2.$$

Taking the square root returns the measure to the original units (meters, seconds, dollars, …):

  • Variance = average squared distance from the mean
  • Standard deviation = typical distance from the mean
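These two definitions translate directly into code. A minimal sketch (the height data below is an illustrative example, not from the demos):

```python
import math

def sample_variance(data):
    """Average squared distance from the mean, using Bessel's correction (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Square root of the variance: back in the data's original units."""
    return math.sqrt(sample_variance(data))

heights = [4.8, 5.0, 5.1, 4.9, 5.2]  # e.g. in meters
print(sample_variance(heights))  # 0.025
print(sample_std(heights))       # ≈ 0.158
```

Python's standard library offers the same pair as `statistics.variance` and `statistics.stdev`.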

Interactive Demo ①: Small vs. Large Variance (Histogram + ±σ bands)

This demo renders a responsive histogram (D3) for three presets (Low / Medium / High variance), computed with the sample variance (denominator $n-1$).
It overlays a red mean line (labelled "$\mu$" for readability; numerically equal to $\bar{x}$ here) and shaded $\pm1\sigma$, $\pm2\sigma$ bands to visualize typical spread.
It also lists the raw data and live statistics, so you can link the shape of the histogram to the numbers.

Interactive Demo: Small vs. Large Variance

Low Variance preset ($n = 18$):

Raw data: [4.6, 4.7, 4.8, 4.8, 4.9, 4.9, 4.9, 5.0, 5.0, 5.0, 5.0, 5.1, 5.1, 5.1, 5.2, 5.2, 5.3, 5.4]
Mean $\bar{x}=5.00$, variance $s^2=0.04$, standard deviation $s=0.21$.

(The distribution histogram appears in the interactive version.)
Key Observations:

  • Low variance → data clustered tightly around the mean (narrow bell)
  • High variance → data spread widely from the mean (wide, flat bell)
  • The smooth curve approximates the normal distribution shape
  • Standard deviation bands show typical spread ranges

2. Shortcut Formula for Variance

Expanding the square gives a useful computational shortcut:

$$\sum_{i=1}^n (x_i-\bar{x})^2 = \sum_{i=1}^n x_i^2 - n\,\bar{x}^{\,2}. \tag{Shortcut Formula}$$

Derivation (step by step):

  1. (xixˉ)2=xi22xixˉ+xˉ2(x_i-\bar{x})^2=x_i^2-2x_i\bar{x}+\bar{x}^2
  2. Sum both sides: (xixˉ)2=xi22xˉxi+nxˉ2\sum (x_i-\bar{x})^2=\sum x_i^2 -2\bar{x}\sum x_i + n\bar{x}^2
  3. Use xi=nxˉ\sum x_i=n\bar{x}xi2nxˉ2\sum x_i^2-n\bar{x}^2

Thus,

$$s^2=\frac{1}{n-1}\Big(\sum x_i^2-n\bar{x}^2\Big).$$
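The identity is easy to check numerically. A minimal sketch (the data here is arbitrary):

```python
def variance_direct(data):
    """s^2 via the definition: average of squared deviations from the mean."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def variance_shortcut(data):
    """s^2 via the shortcut: (sum of x_i^2 - n * xbar^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

data = [1.5, 2.0, 2.5, 4.0]
print(variance_direct(data))    # ≈ 1.1667
print(variance_shortcut(data))  # same value, one pass over sums
```

The shortcut needs only $\sum x_i$ and $\sum x_i^2$, so it avoids a second pass through the data (though for large data with a large mean, the direct form is numerically safer).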

3. Worked Example

Data: $2,\,4,\,4,\,4,\,5,\,5,\,7,\,9$ ($n=8$)

  • Mean: $\bar{x}=5$
  • Sum of squared deviations: $32$
  • Sample variance: $s^2=32/7\approx4.57$
  • Standard deviation: $s\approx2.14$
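The same example, reproduced step by step in code:

```python
# Reproduce the worked example step by step.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                      # 40 / 8 = 5.0
ssd = sum((x - mean) ** 2 for x in data)  # 9+1+1+1+0+0+4+16 = 32.0
s2 = ssd / (n - 1)                        # 32 / 7 ≈ 4.57
s = s2 ** 0.5                             # ≈ 2.14

print(mean, ssd, round(s2, 2), round(s, 2))  # 5.0 32.0 4.57 2.14
```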

4. Why Divide by $n-1$? (Unbiased Estimation)

If we divide by $n$:

$$\frac{1}{n}\sum (x_i-\bar{x})^2,$$

the expectation is biased low:

$$\mathbb{E}\!\left[\frac{1}{n}\sum (X_i-\bar{X})^2\right]=\frac{n-1}{n}\sigma^2.$$

Instead, dividing by $n-1$ gives

$$s^2=\frac{1}{n-1}\sum (x_i-\bar{x})^2,$$

which satisfies $\mathbb{E}[s^2]=\sigma^2$: the unbiased estimator.

Interactive Demo ②: $n$ vs. $n-1$ (Unbiased vs. Biased)

This widget draws random samples from a fixed “population,” then shows side by side:

  • variance with denominator $n$ (biased),
  • variance with denominator $n-1$ (unbiased), and
  • the true population variance.

You’ll see that the $n$-denominator variance underestimates on average, especially for small $n$, while the $n-1$ version lines up with the population variance, as theory predicts.

Interactive Demo: Understanding Biased vs Unbiased Variance

(Controls for the sample size and population parameters, the current sample data, a sample histogram, and step-by-step calculations appear in the interactive version.)

Key Observations:

Why n-1? When we use the sample mean x̄ to calculate deviations, we lose one degree of freedom. The biased estimator (÷n) systematically underestimates the population variance, especially for small samples.

Bessel's Correction: Dividing by (n-1) instead of n corrects this bias. The unbiased estimator's expected value equals the true population variance: E[s²] = σ².

Try different sample sizes: Notice how the bias is more pronounced with smaller samples (n=3-10) but becomes negligible as n grows large. The histogram shows how your sample compares to the true population distribution.
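The demo's behavior can be reproduced in a short simulation. A minimal sketch (the population parameters and sample size below are illustrative choices, not the widget's actual defaults):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

mu, sigma = 5.0, 1.5   # illustrative population: Normal(mu, sigma)
n, trials = 8, 20000   # small samples, many repetitions

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    ssd = sum((x - xbar) ** 2 for x in sample)
    biased_sum += ssd / n          # divide by n
    unbiased_sum += ssd / (n - 1)  # divide by n - 1 (Bessel's correction)

print("true sigma^2:        ", sigma ** 2)           # 2.25
print("avg biased   (÷n):   ", biased_sum / trials)  # ≈ (n-1)/n * 2.25 ≈ 1.97
print("avg unbiased (÷n-1): ", unbiased_sum / trials)  # ≈ 2.25
```

On average the $\div n$ estimate lands near $\frac{n-1}{n}\sigma^2$, exactly as the expectation formula in Section 4 predicts, while the $\div(n-1)$ estimate centers on $\sigma^2$.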

5. Why Do We Square Deviations?

  1. Compatible with the mean (least squares).
    Minimizing $\sum(x_i-m)^2$ yields $m=\bar{x}$; minimizing $\sum|x_i-m|$ yields the median.
  2. Clean algebra.
    $\mathrm{Var}(X)=\mathbb{E}[X^2]-(\mathbb{E}[X])^2$ relies on squaring.
  3. Smooth and differentiable.
    Essential for optimization and regression (the absolute value has a kink at $0$).
  4. Geometric meaning.
    Squared deviations are squared Euclidean distances; orthogonal decompositions (ANOVA, regression) behave like the Pythagorean theorem.
  5. Additivity for independent sums.
    If $X\perp Y$, then $\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)$.

(Caveat: squaring is sensitive to outliers; use robust alternatives like the median and MAD when needed.)
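The least-squares claim in point 1 can be checked numerically. A minimal grid-search sketch over the worked example's data:

```python
def sq_loss(data, m):
    """Sum of squared deviations from a candidate center m."""
    return sum((x - m) ** 2 for x in data)

def abs_loss(data, m):
    """Sum of absolute deviations from a candidate center m."""
    return sum(abs(x - m) for x in data)

data = [2, 4, 4, 4, 5, 5, 7, 9]
grid = [i / 100 for i in range(1001)]  # candidate centers 0.00, 0.01, ..., 10.00

best_sq = min(grid, key=lambda m: sq_loss(data, m))
best_abs = min(grid, key=lambda m: abs_loss(data, m))

print(best_sq)   # 5.0 -- the mean
print(best_abs)  # 4.0 -- absolute loss is flat on the median interval [4, 5]
                 #        for this even-length sample; min() returns the first
                 #        grid point attaining the minimum
```

The squared loss has a unique minimizer (the mean), while the absolute loss can have a whole interval of minimizers, which is one more reason the squared version is algebraically convenient.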

6. Summary

  • Variance = averaged squared deviations; standard deviation = its square root, in the original units.
  • The shortcut formula simplifies hand calculations.
  • Dividing by $n-1$ gives an unbiased estimate of the population variance.
  • Squaring offers algebraic, geometric, and optimization advantages.