Principal Component Analysis (2D)

From centering the data to deriving Var(z) = w^T S w and solving the eigenvalue problem, this article explains PCA step-by-step using the maximum variance approach.

Tags: PCA, Linear Algebra, Eigenvalues, Eigenvectors

1. Introduction

Principal Component Analysis (PCA) can be understood as finding the direction in which the data has the largest variance.
Here we explain PCA in 2D, starting from raw data and ending at the eigenvalue problem — with a step-by-step derivation of the key formula:

\mathrm{Var}(z) = \mathbf{w}^\top \mathbf{S} \mathbf{w}.

2. Step 0: Data Centering

Given n observations of two variables X and Y, we first center each variable so that its mean is zero:

  • Column means:
    \bar{x} = \frac{1}{n} \sum_{i=1}^n X_i,\quad \bar{y} = \frac{1}{n} \sum_{i=1}^n Y_i
  • Centering:
    X_i' = X_i - \bar{x},\quad Y_i' = Y_i - \bar{y}
  • Let \mathbf{x} = (X', Y')^\top be the centered random vector.
    Then E[\mathbf{x}] = \mathbf{0}.
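A minimal numpy sketch of this centering step, using a hypothetical synthetic dataset (the numbers below are illustrative, not from the article):

```python
import numpy as np

# Hypothetical correlated 2D dataset: n observations, columns X and Y.
rng = np.random.default_rng(0)
data = rng.multivariate_normal([3.0, -1.0], [[2.0, 1.2], [1.2, 1.0]], size=200)

means = data.mean(axis=0)        # column means (xbar, ybar)
centered = data - means          # X' = X - xbar, Y' = Y - ybar
print(centered.mean(axis=0))     # approximately [0, 0] after centering
```

The later snippets in this article continue from this `centered` array.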

3. Step 1: Covariance Matrix

From the centered data, we compute the covariance matrix:

\mathbf{S} = E[\mathbf{x}\mathbf{x}^\top] = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{pmatrix}

where:

  • \sigma_{xx} = \mathrm{Var}(X')
  • \sigma_{yy} = \mathrm{Var}(Y')
  • \sigma_{xy} = \mathrm{Cov}(X', Y')
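Continuing the running example from Step 0, the covariance matrix can be formed directly from the centered columns. The 1/n normalization below matches the expectation-style definition above; `np.cov` with `bias=True` uses the same convention:

```python
# `centered` is the mean-centered (n, 2) array from the Step 0 sketch.
n = centered.shape[0]
S = centered.T @ centered / n    # 2x2 covariance matrix, 1/n normalization

# Cross-check against numpy's estimator (bias=True also divides by n).
print(np.allclose(S, np.cov(centered, rowvar=False, bias=True)))   # True
print(S)                         # [[σxx, σxy], [σxy, σyy]]
```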

4. Step 2: Defining the Maximum Variance Direction

Let \mathbf{w} \in \mathbb{R}^2 be a unit vector (\|\mathbf{w}\| = 1) representing a direction.
The projection of \mathbf{x} onto this direction is:

z = \mathbf{w}^\top \mathbf{x}

The first principal component is the direction \mathbf{w} that maximizes the variance of z:

\max_{\|\mathbf{w}\|=1} \ \mathrm{Var}(z).
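To make the objective concrete, the sketch below (still using `centered` from the running example) projects the data onto a few arbitrary unit directions and prints the variance of each projection; the direction with the largest value is the best candidate so far:

```python
# Variance of the projection z = w^T x for several candidate unit directions.
for angle in np.deg2rad([0.0, 30.0, 60.0, 90.0]):
    w = np.array([np.cos(angle), np.sin(angle)])   # unit vector, ||w|| = 1
    z = centered @ w                               # projected scalars z_i = w^T x_i
    print(f"{np.rad2deg(angle):5.1f} deg  Var(z) = {z.var():.3f}")
```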

5. Detailed Derivation: Why \mathrm{Var}(z) = \mathbf{w}^\top \mathbf{S} \mathbf{w}

5.1 Setup

  • Centered vector:
    \mathbf{x} = \begin{pmatrix} X' \\ Y' \end{pmatrix}, \quad E[\mathbf{x}] = \mathbf{0}
  • Covariance matrix:
    \mathbf{S} = E[\mathbf{x}\mathbf{x}^\top] = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{pmatrix}
  • Projection direction:
    \mathbf{w} = (w_1, w_2)^\top
  • Projected scalar:
    z = \mathbf{w}^\top \mathbf{x} = w_1 X' + w_2 Y'

5.2 Variance Definition

By definition:

\mathrm{Var}(z) = E[z^2] - (E[z])^2

Since the data is centered:

E[z] = \mathbf{w}^\top E[\mathbf{x}] = 0

Thus:

\mathrm{Var}(z) = E[z^2]

5.3 Expanding E[z^2]

First:

z^2 = (\mathbf{w}^\top \mathbf{x})^2 = (\mathbf{w}^\top \mathbf{x})(\mathbf{w}^\top \mathbf{x})

Because \mathbf{w}^\top \mathbf{x} is a scalar, it equals its own transpose \mathbf{x}^\top \mathbf{w}, so we can regroup:

(\mathbf{w}^\top \mathbf{x})(\mathbf{w}^\top \mathbf{x}) = (\mathbf{w}^\top \mathbf{x})(\mathbf{x}^\top \mathbf{w}) = \mathbf{w}^\top (\mathbf{x}\mathbf{x}^\top) \mathbf{w}

5.4 Bringing constants outside the expectation

Since w\mathbf{w} is constant with respect to the expectation:

\mathrm{Var}(z) = E[\mathbf{w}^\top \mathbf{x} \mathbf{x}^\top \mathbf{w}] = \mathbf{w}^\top E[\mathbf{x} \mathbf{x}^\top] \mathbf{w}

5.5 Recognizing the covariance matrix

By definition of S\mathbf{S}:

E[\mathbf{x} \mathbf{x}^\top] = \mathbf{S}

So we obtain:

\boxed{\mathrm{Var}(z) = \mathbf{w}^\top \mathbf{S} \mathbf{w}}

5.6 Component form for intuition

If \mathbf{w} = (w_1, w_2)^\top, then:

\mathbf{w}^\top \mathbf{S} \mathbf{w} = w_1^2\,\sigma_{xx} + 2 w_1 w_2\,\sigma_{xy} + w_2^2\,\sigma_{yy}

This shows the variance is a quadratic form combining variances and covariance, weighted by direction coefficients.
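As a quick numerical check of this identity (reusing `centered` and `S` from the running example), the empirical variance of the projection should agree with both the matrix form and the component form:

```python
# Check Var(z) = w^T S w and its component form for one arbitrary unit direction.
w = np.array([0.6, 0.8])                  # ||w||^2 = 0.36 + 0.64 = 1
z = centered @ w

quad = w @ S @ w
comp = w[0]**2 * S[0, 0] + 2 * w[0] * w[1] * S[0, 1] + w[1]**2 * S[1, 1]

print(np.allclose(z.var(), quad))         # True (both use 1/n normalization)
print(np.allclose(quad, comp))            # True
```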

6. Step 3: Solving via Lagrange Multipliers

We now solve:

\max_{\mathbf{w}} \ \mathbf{w}^\top \mathbf{S} \mathbf{w} \quad\text{s.t.}\quad \mathbf{w}^\top \mathbf{w} = 1

Lagrangian:

\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}^\top \mathbf{S} \mathbf{w} - \lambda (\mathbf{w}^\top \mathbf{w} - 1)

Differentiating with respect to \mathbf{w} and setting the gradient to zero:

\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2\mathbf{S}\mathbf{w} - 2\lambda\mathbf{w} = \mathbf{0} \quad\Rightarrow\quad \mathbf{S}\mathbf{w} = \lambda \mathbf{w}

We have reduced PCA to an eigenvalue problem.
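In code, this eigenvalue problem is solved directly; a short sketch with numpy's `eigh` (appropriate because the covariance matrix \mathbf{S} is symmetric), continuing the running example:

```python
# Solve S w = λ w for the symmetric covariance matrix S.
eigvals, eigvecs = np.linalg.eigh(S)       # ascending eigenvalues, orthonormal columns

order = np.argsort(eigvals)[::-1]          # reorder so that λ1 >= λ2
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

w1 = eigvecs[:, 0]                         # first principal component direction
print(np.allclose(S @ w1, eigvals[0] * w1))           # True: S w1 = λ1 w1
print(np.isclose(eigvals[0], (centered @ w1).var()))  # True: λ1 is the variance along w1
```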

7. Step 4: Eigenvalues and Principal Components — Meaning and Interpretation

From the Lagrange multiplier method, we obtained the eigenvalue equation:

\mathbf{S}\mathbf{w} = \lambda \mathbf{w}.

This tells us two things:

7.1 Eigenvectors = Principal Component Directions

  • Each eigenvector \mathbf{w}_k of the covariance matrix \mathbf{S} points in a direction in the data space.
  • Geometrically, if you draw an arrow in the direction of \mathbf{w}_k, it shows how you would “look” at the data to see a certain pattern of variation.
  • In PCA, these directions are orthogonal (perpendicular) to each other — they define a new coordinate system aligned with the data’s natural spread.

7.2 Eigenvalues = Variance Along Those Directions

  • The corresponding eigenvalue \lambda_k tells you how much variance the data has when projected onto \mathbf{w}_k.
  • If \lambda_k is large, it means the data is very spread out in that direction.
  • If \lambda_k is small, the data is tightly clustered along that direction.

7.3 Ordering by Variance

  • Sort the eigenvalues in descending order: \lambda_1 \ge \lambda_2 \ge \dots
  • The eigenvector \mathbf{w}_1 associated with the largest eigenvalue \lambda_1 is the first principal component: the direction of maximum variance in the data.
  • \mathbf{w}_2 (second principal component) is orthogonal to \mathbf{w}_1 and corresponds to the second-largest variance \lambda_2.
  • This continues for higher dimensions, ensuring each new axis is perpendicular to all previous ones.

7.4 Why This Matters in PCA

  • By keeping only the first few principal components (largest eigenvalues), we retain most of the variance while reducing dimensionality.
  • In 2D, the first principal component often captures the “main trend” of the data, while the second captures the orthogonal “secondary trend.”
  • This interpretation is the bridge between the geometry (rotation of coordinate axes) and the statistics (variance explained).
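A small continuation of the sketch shows the variance-explained bookkeeping and a 2D → 1D reduction that keeps only PC1 (using `eigvals`, `eigvecs`, and `centered` from above):

```python
# Share of total variance carried by each principal component.
explained = eigvals / eigvals.sum()
print(explained)                          # PC1 carries most of the variance here

# Dimensionality reduction: keep only the PC1 scores.
scores_pc1 = centered @ eigvecs[:, 0]     # shape (n,)
print(np.isclose(scores_pc1.var(), eigvals[0]))   # True
```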

8. Step-by-Step Interactive Demo

The interactive demo, “Step-by-Step PCA in 2D”, starts from a raw 2D dataset with correlated variables and animates the following stages:
  1. Raw Data — Show original scatter plot.
  2. Centering — Animate subtraction of means so the centroid is at the origin.
  3. Covariance Matrix — Display \mathbf{S} and explain its entries.
  4. Search — Find the eigenvectors that maximize the variance of the projected data.
  5. Principal Components — Display both PC1 and PC2 on the scatter, with their variances.
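For reference, the whole pipeline that the demo animates fits in a few lines; a self-contained sketch with a hypothetical synthetic dataset:

```python
import numpy as np

def pca_2d(data):
    """Minimal 2D PCA following the demo: center, covariance, eigen-decompose, sort."""
    centered = data - data.mean(axis=0)                 # centering
    S = np.cov(centered, rowvar=False, bias=True)       # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)                # eigenvalue problem S w = λ w
    order = np.argsort(eigvals)[::-1]                   # sort by variance, descending
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(1)
demo_data = rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=300)
variances, directions = pca_2d(demo_data)
print(variances)    # variance along PC1 and PC2
print(directions)   # columns are the PC1 and PC2 directions
```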

9. Key Takeaways

  • PCA can be seen as variance maximization.
  • The variance of a projection is the quadratic form \mathbf{w}^\top \mathbf{S} \mathbf{w}.
  • Solving the maximization with a unit-length constraint leads to the eigenvalue problem.
  • Eigenvectors = PC directions, eigenvalues = variances along them.