Principal Component Analysis (2D)
From centering the data to deriving Var(z) = w^T S w and solving the eigenvalue problem, this article explains PCA step-by-step using the maximum variance approach.
1. Introduction
Principal Component Analysis (PCA) can be understood as finding the direction in which the data has the largest variance.
Here we explain PCA in 2D, starting from raw data and ending at the eigenvalue problem — with a step-by-step derivation of the key formula:
2. Step 0: Data Centering
Given $n$ observations of two variables $x$ and $y$, we first center each variable so that its mean is zero:
- Column means: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
- Centering: $\tilde{x}_i = x_i - \bar{x}$, $\tilde{y}_i = y_i - \bar{y}$
- Let $\mathbf{x} = (\tilde{x}, \tilde{y})^T$ be the centered random vector.
Then $E[\mathbf{x}] = \mathbf{0}$.
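A minimal NumPy sketch of this centering step, using a small hypothetical dataset (the array `X` and its values are illustrative assumptions, not from the article):

```python
import numpy as np

# Hypothetical sample: n = 5 observations of two correlated variables (x, y).
X = np.array([[2.0, 1.0],
              [3.0, 2.5],
              [4.0, 3.0],
              [5.0, 4.5],
              [6.0, 5.0]])

means = X.mean(axis=0)   # column means (x-bar, y-bar)
Xc = X - means           # subtract each column's mean from every row

# After centering, each column mean is (numerically) zero.
print(Xc.mean(axis=0))
```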
3. Step 1: Covariance Matrix
From the centered data, we compute the covariance matrix (using the $1/n$ convention; $1/(n-1)$ would give the unbiased estimate, and nothing below depends on the choice):

$$S = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^T = \begin{pmatrix} s_{xx} & s_{xy} \\ s_{xy} & s_{yy} \end{pmatrix}$$

where:

$$s_{xx} = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_i^2, \qquad s_{yy} = \frac{1}{n} \sum_{i=1}^{n} \tilde{y}_i^2, \qquad s_{xy} = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_i \tilde{y}_i$$
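Continuing the sketch, the covariance matrix under the $1/n$ convention can be computed directly from the centered data matrix (the values are again illustrative):

```python
import numpy as np

# Hypothetical centered data (each column already has zero mean).
Xc = np.array([[-2.0, -2.2],
               [-1.0, -0.7],
               [ 0.0, -0.2],
               [ 1.0,  1.3],
               [ 2.0,  1.8]])

n = Xc.shape[0]
S = (Xc.T @ Xc) / n   # 2x2 covariance matrix, 1/n convention

# S is symmetric: both off-diagonal entries equal s_xy.
print(S)
```

`np.cov(Xc.T, bias=True)` produces the same matrix; the default `bias=False` would use the $1/(n-1)$ convention instead.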
4. Step 2: Defining the Maximum Variance Direction
Let $\mathbf{w} = (w_1, w_2)^T$ be a unit vector ($\|\mathbf{w}\| = 1$) representing a direction.
The projection of $\mathbf{x}$ onto this direction is:

$$z = \mathbf{w}^T \mathbf{x} = w_1 \tilde{x} + w_2 \tilde{y}$$

The first principal component is the direction that maximizes the variance of $z$:

$$\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1} \mathrm{Var}(z)$$
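One way to see this maximization concretely is a brute-force sweep over unit vectors $\mathbf{w}(\theta) = (\cos\theta, \sin\theta)$, measuring the variance of the projected scalar for each direction (data values are illustrative):

```python
import numpy as np

# Hypothetical centered 2D data.
Xc = np.array([[-2.0, -2.2],
               [-1.0, -0.7],
               [ 0.0, -0.2],
               [ 1.0,  1.3],
               [ 2.0,  1.8]])

# Angles from 0 to pi cover every line through the origin.
thetas = np.linspace(0.0, np.pi, 181)
variances = np.array([
    (Xc @ np.array([np.cos(t), np.sin(t)])).var()  # Var of z_i = w . x_i
    for t in thetas
])

best_theta = thetas[np.argmax(variances)]
print(f"max Var(z) = {variances.max():.3f} at theta = {best_theta:.3f} rad")
```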
5. Detailed Derivation: Why $\mathrm{Var}(z) = \mathbf{w}^T S\, \mathbf{w}$
5.1 Setup
- Centered vector: $\mathbf{x} = (\tilde{x}, \tilde{y})^T$ with $E[\mathbf{x}] = \mathbf{0}$
- Covariance matrix: $S = E[\mathbf{x}\mathbf{x}^T]$
- Projection direction: $\mathbf{w}$ with $\|\mathbf{w}\| = 1$
- Projected scalar: $z = \mathbf{w}^T \mathbf{x}$
5.2 Variance Definition
By definition:

$$\mathrm{Var}(z) = E\big[(z - E[z])^2\big]$$

Since the data is centered:

$$E[z] = E[\mathbf{w}^T \mathbf{x}] = \mathbf{w}^T E[\mathbf{x}] = 0$$

Thus:

$$\mathrm{Var}(z) = E[z^2]$$
5.3 Expanding $z^2$
First:

$$z^2 = (\mathbf{w}^T \mathbf{x})(\mathbf{w}^T \mathbf{x})$$

Because $\mathbf{w}^T \mathbf{x}$ is a scalar, it equals its own transpose $\mathbf{x}^T \mathbf{w}$, so we can reorder terms:

$$z^2 = (\mathbf{w}^T \mathbf{x})(\mathbf{x}^T \mathbf{w}) = \mathbf{w}^T \mathbf{x}\mathbf{x}^T \mathbf{w}$$
5.4 Bringing constants outside the expectation
Since $\mathbf{w}$ is constant with respect to the expectation:

$$E[z^2] = E[\mathbf{w}^T \mathbf{x}\mathbf{x}^T \mathbf{w}] = \mathbf{w}^T E[\mathbf{x}\mathbf{x}^T]\, \mathbf{w}$$
5.5 Recognizing the covariance matrix
By definition of $S$:

$$S = E[\mathbf{x}\mathbf{x}^T]$$

So we obtain:

$$\mathrm{Var}(z) = \mathbf{w}^T S\, \mathbf{w}$$
5.6 Component form for intuition
If $\mathbf{w} = (w_1, w_2)^T$, then:

$$\mathbf{w}^T S\, \mathbf{w} = w_1^2\, s_{xx} + 2 w_1 w_2\, s_{xy} + w_2^2\, s_{yy}$$
This shows the variance is a quadratic form combining variances and covariance, weighted by direction coefficients.
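The identity $\mathrm{Var}(z) = \mathbf{w}^T S\, \mathbf{w}$ and its component form can be checked numerically on any centered dataset; here is a sketch with illustrative data and an arbitrary unit vector:

```python
import numpy as np

# Hypothetical centered 2D data and its 1/n covariance matrix.
Xc = np.array([[-2.0, -2.2],
               [-1.0, -0.7],
               [ 0.0, -0.2],
               [ 1.0,  1.3],
               [ 2.0,  1.8]])
S = (Xc.T @ Xc) / Xc.shape[0]

w = np.array([0.6, 0.8])          # any unit vector: 0.36 + 0.64 = 1
z = Xc @ w                        # projected scalars z_i = w . x_i

var_direct = z.var()              # Var(z) straight from the samples (1/n)
var_quadratic = w @ S @ w         # w^T S w
var_components = (w[0]**2 * S[0, 0]             # w1^2 s_xx
                  + 2 * w[0] * w[1] * S[0, 1]   # 2 w1 w2 s_xy
                  + w[1]**2 * S[1, 1])          # w2^2 s_yy

print(var_direct, var_quadratic, var_components)  # all three agree
```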
6. Step 3: Solving via Lagrange Multipliers
We now solve:

$$\max_{\mathbf{w}} \; \mathbf{w}^T S\, \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^T \mathbf{w} = 1$$

Lagrangian:

$$\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}^T S\, \mathbf{w} - \lambda\, (\mathbf{w}^T \mathbf{w} - 1)$$

Differentiating with respect to $\mathbf{w}$ and setting to zero:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2 S \mathbf{w} - 2 \lambda \mathbf{w} = \mathbf{0} \quad\Longrightarrow\quad S \mathbf{w} = \lambda \mathbf{w}$$
We have reduced PCA to an eigenvalue problem.
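In practice this eigenvalue problem is solved with a library routine; a sketch using `np.linalg.eigh` (designed for symmetric matrices like $S$) on illustrative data:

```python
import numpy as np

# Hypothetical centered data and its covariance matrix.
Xc = np.array([[-2.0, -2.2],
               [-1.0, -0.7],
               [ 0.0, -0.2],
               [ 1.0,  1.3],
               [ 2.0,  1.8]])
S = (Xc.T @ Xc) / Xc.shape[0]

# eigh returns eigenvalues in ascending order; the columns of eigvecs
# are the corresponding unit-length eigenvectors.
eigvals, eigvecs = np.linalg.eigh(S)
lam1, w1 = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue -> PC1

print(np.allclose(S @ w1, lam1 * w1))    # the equation S w = lambda w holds
```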
7. Step 4: Eigenvalues and Principal Components — Meaning and Interpretation
From the Lagrange multiplier method, we obtained the eigenvalue equation:

$$S \mathbf{w} = \lambda \mathbf{w}$$
This tells us two things:
7.1 Eigenvectors = Principal Component Directions
- Each eigenvector $\mathbf{w}_i$ of the covariance matrix $S$ points in a direction in the data space.
- Geometrically, if you draw an arrow in the direction of $\mathbf{w}_i$, it shows how you would “look” at the data to see a certain pattern of variation.
- In PCA, these directions are orthogonal (perpendicular) to each other — because $S$ is symmetric, its eigenvectors can always be chosen mutually orthogonal — and they define a new coordinate system aligned with the data’s natural spread.
7.2 Eigenvalues = Variance Along Those Directions
- The corresponding eigenvalue $\lambda_i$ tells you how much variance the data has when projected onto $\mathbf{w}_i$: $\mathrm{Var}(\mathbf{w}_i^T \mathbf{x}) = \mathbf{w}_i^T S\, \mathbf{w}_i = \lambda_i$.
- If $\lambda_i$ is large, the data is very spread out in that direction.
- If $\lambda_i$ is small, the data is tightly clustered along that direction.
7.3 Ordering by Variance
- Sort the eigenvalues in descending order: $\lambda_1 \ge \lambda_2 \ge \cdots$
- The eigenvector $\mathbf{w}_1$ associated with the largest eigenvalue $\lambda_1$ is the first principal component: the direction of maximum variance in the data.
- $\mathbf{w}_2$ (second principal component) is orthogonal to $\mathbf{w}_1$ and corresponds to the second-largest variance $\lambda_2$.
- This continues for higher dimensions, ensuring each new axis is perpendicular to all previous ones.
7.4 Why This Matters in PCA
- By keeping only the first few principal components (largest eigenvalues), we retain most of the variance while reducing dimensionality.
- In 2D, the first principal component often captures the “main trend” of the data, while the second captures the orthogonal “secondary trend.”
- This interpretation is the bridge between the geometry (rotation of coordinate axes) and the statistics (variance explained).
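To make the "variance explained" idea concrete, a sketch (illustrative data again) that sorts the eigenpairs, projects onto PC1, and computes the fraction of total variance it retains:

```python
import numpy as np

# Hypothetical centered data and its covariance matrix.
Xc = np.array([[-2.0, -2.2],
               [-1.0, -0.7],
               [ 0.0, -0.2],
               [ 1.0,  1.3],
               [ 2.0,  1.8]])
S = (Xc.T @ Xc) / Xc.shape[0]

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]              # descending: lambda_1 >= lambda_2
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

z1 = Xc @ eigvecs[:, 0]                        # 1D representation along PC1
explained = eigvals[0] / eigvals.sum()         # total variance = trace(S)

print(f"PC1 explains {explained:.1%} of the variance")
```

For strongly correlated 2D data like this, PC1 carries nearly all the variance, which is exactly why dropping PC2 loses little information.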
8. Step-by-Step Interactive Demo
- Raw Data — Show original scatter plot.
- Centering — Animate subtraction of means so the centroid is at the origin.
- Covariance Matrix — Display $S$ and explain its entries.
- Search — Sweep candidate directions to find the one that maximizes the variance of the projected data.
- Principal Components — Display both PC1 and PC2 on the scatter, with their variances.
9. Key Takeaways
- PCA can be seen as variance maximization.
- The variance of a projection is a quadratic form: $\mathrm{Var}(z) = \mathbf{w}^T S\, \mathbf{w}$.
- Solving the maximization with a unit-length constraint leads to the eigenvalue problem $S \mathbf{w} = \lambda \mathbf{w}$.
- Eigenvectors = PC directions, eigenvalues = variances along them.