Variance and Standard Deviation

August 17, 2025

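A beginner-friendly introduction to variance and standard deviation: their definitions, how to compute them, the n-1 correction (Bessel's correction), and why deviations are squared.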

Beginner · Variance · Standard Deviation

Variance and Standard Deviation: From Formulas to Intuition

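Variance and standard deviation are the basic measures of how spread out a data set is. This article covers:

Definitions and intuition
A shortcut formula for computation
Why the sample variance divides by $n-1$
Why deviations are squared rather than taken in absolute value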

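0. Notation

Data: $x_1,\dots,x_n$
Sample mean: $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$
Population variance: $\sigma^2=\mathbb{E}[(X-\mu)^2]$, where $\mu=\mathbb{E}[X]$ is the population mean
Unbiased sample variance: $s^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2$
Standard deviation: $\sigma=\sqrt{\sigma^2},\ s=\sqrt{s^2}$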

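1. Intuition

The deviations $x_i-\bar{x}$ always sum to zero, so their plain average cannot measure spread. Instead, we square each deviation and then average the squares (dividing by $n-1$ for a sample):

$$s^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$$

Variance: the average squared distance from the mean
Standard deviation: the square root of the variance, which brings the measure back to the original units of the data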
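The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `sample_mean`, `sample_variance`, and `sample_std` are hypothetical, and the three data values are made up purely to keep the arithmetic easy to follow.

```python
import math


def sample_mean(xs):
    """Arithmetic mean of the observations."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance: squared deviations summed, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


data = [2.0, 4.0, 6.0]  # made-up illustrative values, not from the article
x_bar = sample_mean(data)
deviations = [x - x_bar for x in data]

print(sum(deviations))        # 0.0  (raw deviations cancel out)
print(sample_variance(data))  # 4.0  ((4 + 0 + 4) / 2)
print(sample_std(data))       # 2.0  (back in the original units)
```

The raw deviations cancel exactly, which is why they are squared before averaging.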

Interactive Demo: Small vs. Large Variance

Low Variance (n = 18)

Raw Data: [4.6, 4.7, 4.8, 4.8, 4.9, 4.9, 4.9, 5.0, 5.0, 5.0, 5.0, 5.1, 5.1, 5.1, 5.2, 5.2, 5.3, 5.4]

Mean (x̄) = 5.00, Variance (s²) = 0.04, Std Dev (s) = 0.21

[Figure: Distribution Histogram]

Key Observations:

• Low variance → data clustered tightly around the mean (narrow bell)
• High variance → data spread widely from the mean (wide, flat bell)
• The smooth curve approximates the normal distribution shape
• Standard deviation bands show typical spread ranges
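The low-variance figures from the demo can be checked with Python's standard `statistics` module. The sketch below also builds a wider sample for contrast. That `wide` sample is a hypothetical construction (each deviation from 5 stretched by a factor of 5), not data from the demo. It is there to illustrate that stretching the deviations by a factor multiplies the standard deviation by the same factor and the variance by its square.

```python
import statistics

# Low-variance sample taken from the demo above.
low = [4.6, 4.7, 4.8, 4.8, 4.9, 4.9, 4.9, 5.0, 5.0,
       5.0, 5.0, 5.1, 5.1, 5.1, 5.2, 5.2, 5.3, 5.4]

# Hypothetical wider sample: same shape, every deviation from 5 stretched 5x.
wide = [5 + 5 * (x - 5) for x in low]

for name, xs in (("low", low), ("wide", wide)):
    mean = statistics.mean(xs)
    var = statistics.variance(xs)   # divides by n - 1
    std = statistics.stdev(xs)      # square root of the above
    print(f"{name:>4}: mean={mean:.2f}  s^2={var:.2f}  s={std:.2f}")

# Ratio of the two standard deviations (close to the stretch factor 5).
print(statistics.stdev(wide) / statistics.stdev(low))
```

Both samples share the same mean, so the difference between them shows up only in the variance and standard deviation.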

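2. Shortcut Formula

Expanding the square and using $\sum_{i=1}^n x_i = n\bar{x}$ gives

$$\sum_{i=1}^n (x_i-\bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2,$$

which simplifies the computation. Therefore

$$s^2=\frac{1}{n-1}\left(\sum_{i=1}^n x_i^2-n\bar{x}^2\right).$$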
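As a quick check, the sketch below computes the sum of squared deviations both ways on the data set used in this article's worked example. The function names are hypothetical. The second half is an added caveat rather than something the article claims: in floating-point arithmetic the shortcut subtracts two large, nearly equal numbers, so it can lose accuracy when the values are far from zero. The `1e9` offset is an arbitrary choice to make that visible.

```python
def sum_sq_two_pass(xs):
    """Definition: compute the mean first, then sum the squared deviations."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


def sum_sq_shortcut(xs):
    """Shortcut: sum of squares minus n times the squared mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum(x * x for x in xs) - n * x_bar ** 2


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sum_sq_two_pass(data), sum_sq_shortcut(data))  # 32.0 32.0

# Same data shifted far from zero: the spread is unchanged, but the shortcut
# now subtracts two numbers near 8e18 and the small difference is lost.
shifted = [x + 1e9 for x in data]
print(sum_sq_two_pass(shifted), sum_sq_shortcut(shifted))
```

For hand calculation with small numbers the shortcut is convenient. For floating-point code, the two-pass form is usually the safer default.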

3. Example

For the data $2,4,4,4,5,5,7,9$ ($n=8$):

$\bar{x}=5$
Sum of squared deviations $=32$
$s^2=32/7\approx4.57$
$s\approx2.14$
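Written out step by step, the arithmetic behind these values is:

$$
\begin{aligned}
\bar{x} &= \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5,\\
\sum_{i=1}^{8}(x_i-\bar{x})^2 &= 9+1+1+1+0+0+4+16 = 32,\\
s^2 &= \frac{32}{7} \approx 4.57,\qquad s = \sqrt{\frac{32}{7}} \approx 2.14.
\end{aligned}
$$

The shortcut formula gives the same sum: $\sum x_i^2 = 232$ and $n\bar{x}^2 = 8\cdot 25 = 200$, so $232-200=32$.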

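4. Why Divide by $n-1$?

Dividing by $n$ underestimates the population variance on average. For independent observations $X_1,\dots,X_n$ with common variance $\sigma^2$,

$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2\right]=\frac{n-1}{n}\sigma^2.$$

Dividing by $n-1$ instead gives

$$\mathbb{E}[s^2]=\sigma^2,$$

so $s^2$ is an unbiased estimator of $\sigma^2$.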
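One way to see where the factor $(n-1)/n$ comes from is the following sketch. It assumes that $X_1,\dots,X_n$ are independent with common mean $\mu$ and finite variance $\sigma^2$, so that $\operatorname{Var}(\bar{X})=\sigma^2/n$:

$$
\begin{aligned}
\sum_{i=1}^n (X_i-\bar{X})^2 &= \sum_{i=1}^n (X_i-\mu)^2 - n(\bar{X}-\mu)^2,\\
\mathbb{E}\left[\sum_{i=1}^n (X_i-\mu)^2\right] &= n\sigma^2,\qquad
\mathbb{E}\left[n(\bar{X}-\mu)^2\right] = n\cdot\frac{\sigma^2}{n} = \sigma^2,\\
\mathbb{E}\left[\sum_{i=1}^n (X_i-\bar{X})^2\right] &= n\sigma^2-\sigma^2 = (n-1)\sigma^2.
\end{aligned}
$$

The first line is the shortcut identity from Section 2 applied to the shifted values $X_i-\mu$. Measuring deviations from $\bar{X}$ instead of the unknown $\mu$ removes exactly one $\sigma^2$ from the expected sum, which is what dividing by $n-1$ compensates for.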

Interactive Demo: Understanding Biased vs Unbiased Variance

Sample Size (n) = 8, Population Mean (μ) = 5, Population Std (σ) = 1.5

[Interactive panels not reproduced here: current sample data, sample histogram, sample mean, step-by-step calculations]

Key Observations:

Why n-1? When we use the sample mean x̄ to calculate deviations, we lose one degree of freedom. The biased estimator (÷n) systematically underestimates the population variance, especially for small samples.

Bessel's Correction: Dividing by (n-1) instead of n corrects this bias. The unbiased estimator's expected value equals the true population variance: E[s²] = σ².

Try different sample sizes: Notice how the bias is more pronounced with smaller samples (n=3-10) but becomes negligible as n grows large. The histogram shows how your sample compares to the true population distribution.
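The bias can also be seen in a small simulation. The sketch below uses the demo's settings (n = 8, μ = 5, σ = 1.5) and assumes a normal population, as the demo text suggests. The function name, the number of trials, and the seed are arbitrary illustrative choices. Averaged over many samples, the divide-by-n estimate should land near (n-1)/n · σ² and the divide-by-(n-1) estimate near σ².

```python
import random


def average_variance_estimates(n=8, mu=5.0, sigma=1.5, trials=20000, seed=0):
    """Average the biased and unbiased variance estimates over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
        total_biased += ss / n          # divide by n
        total_unbiased += ss / (n - 1)  # divide by n - 1 (Bessel's correction)
    return total_biased / trials, total_unbiased / trials


biased, unbiased = average_variance_estimates()
print(f"divide by n:     {biased:.3f}  (theory: {(8 - 1) / 8 * 1.5 ** 2:.3f})")
print(f"divide by n - 1: {unbiased:.3f}  (theory: {1.5 ** 2:.3f})")
```

Calling `average_variance_estimates(n=3)` or `average_variance_estimates(n=100)` shows the gap between the two estimates widening for small samples and nearly vanishing for large ones.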

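5. Why Square the Deviations?

It fits naturally with the mean: the mean is the value that minimizes the sum of squared deviations (least squares)
It is algebraically convenient (for example, the variance formulas above)
It is differentiable everywhere, which makes optimization easy
Geometrically, it corresponds to Euclidean distance
For sums of independent random variables, variances add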
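The last point, additivity, can be checked exactly on a small example. The sketch below uses two independent fair six-sided dice, an illustrative choice that does not come from the article. It enumerates all equally likely outcomes with exact fractions and divides by the number of outcomes, so these are population quantities rather than sample estimates. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact mean over equally likely outcomes."""
    return Fraction(sum(values), len(values))


def variance(values):
    """Population variance: average squared deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mean_abs_dev(values):
    """Mean absolute deviation from the mean, for comparison."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


faces = range(1, 7)
one_die = list(faces)
# All 36 equally likely totals of two independent dice.
two_dice = [a + b for a, b in product(faces, repeat=2)]

print(variance(one_die), variance(two_dice))          # 35/12 35/6
print(mean_abs_dev(one_die), mean_abs_dev(two_dice))  # 3/2 35/18
```

The variance of the total is exactly twice the variance of one die. The mean absolute deviation does not add in the same way: 35/18 is well short of 2 × 3/2 = 3.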

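Summary

Variance and standard deviation are the most important measures for quantifying spread.
The shortcut formula makes hand calculation easier.
The $n-1$ correction is needed for unbiasedness.
Squaring has both theoretical and computational advantages.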
