Least Squares Regression

A gentle explanation of simple linear regression, combining the mathematical derivation with a visual demo of best-fit lines.

Tags: Regression, Least Squares

What Does “Closest Line” Mean?

In simple linear regression, we fit a line of the form:

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

Here, \beta_0 is the intercept, \beta_1 is the slope, and \varepsilon_i is the error term.
Given data points (x_i, y_i), the goal of least squares is to find the line that is as close as possible to all of these points, in the sense that the sum of squared vertical distances from each point to the line is minimized.

Defining the Objective Function

We measure the quality of a candidate line by the sum of squared errors:

S(\beta_0, \beta_1) = \sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2

Our task: find the \beta_0, \beta_1 that minimize S.
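
To make the objective concrete, here is a minimal Python sketch; the data arrays and the sum_squared_errors helper are illustrative, not taken from the demo:

```python
import numpy as np

def sum_squared_errors(beta0, beta1, x, y):
    """Evaluate S(beta0, beta1): the sum of squared vertical distances."""
    residuals = y - (beta0 + beta1 * x)
    return np.sum(residuals ** 2)

# Illustrative data points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# A worse candidate line yields a larger S than a better one
print(sum_squared_errors(0.0, 1.0, x, y))  # crude guess:  5.31
print(sum_squared_errors(1.0, 1.0, x, y))  # closer fit:   0.11
```

Comparing candidate lines this way is exactly what the interactive demo below does in real time.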

Derivation: Formula for Slope and Intercept

Step 1: Partial Derivatives for Minimization

Take the partial derivatives of S with respect to \beta_0 and \beta_1, and set them to zero.

With respect to \beta_0:

\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0

With respect to \beta_1:

\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^n x_i (y_i - \beta_0 - \beta_1 x_i) = 0
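
As a sanity check, these analytic derivatives can be compared against numerical finite differences; a small sketch reusing the x, y arrays and the sum_squared_errors helper from the previous snippet:

```python
def gradient(beta0, beta1, x, y):
    """Analytic partial derivatives of S from Step 1."""
    r = y - (beta0 + beta1 * x)
    return -2 * np.sum(r), -2 * np.sum(x * r)

# Central finite-difference check at an arbitrary point
b0, b1, h = 0.5, 1.2, 1e-6
num_d0 = (sum_squared_errors(b0 + h, b1, x, y)
          - sum_squared_errors(b0 - h, b1, x, y)) / (2 * h)
num_d1 = (sum_squared_errors(b0, b1 + h, x, y)
          - sum_squared_errors(b0, b1 - h, x, y)) / (2 * h)
print(gradient(b0, b1, x, y))  # analytic
print((num_d0, num_d1))        # numerical; should agree closely
```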

Step 2: Normal Equations

From these two conditions we get:

  1. \sum y_i = n \beta_0 + \beta_1 \sum x_i
  2. \sum x_i y_i = \beta_0 \sum x_i + \beta_1 \sum x_i^2

Dividing the first equation by n gives \bar{y} = \beta_0 + \beta_1 \bar{x}; substituting this into the second and simplifying, we obtain:

\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
\beta_0 = \bar{y} - \beta_1 \bar{x}

where \bar{x} = \tfrac{1}{n} \sum x_i and \bar{y} = \tfrac{1}{n} \sum y_i.
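
These formulas translate directly into code. A minimal sketch (the helper name least_squares_fit is illustrative), reusing the data arrays from the earlier snippets and cross-checking against NumPy's polynomial fit:

```python
def least_squares_fit(x, y):
    """Closed-form OLS slope and intercept from the normal equations."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

beta0, beta1 = least_squares_fit(x, y)
print(beta0, beta1)
print(np.polyfit(x, y, deg=1))  # returns [slope, intercept]; should match
```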

Visual Demo of the Intuition

In the visualization below, you can drag the points around and see:

  • The regression line updating in real time
  • Vertical red lines showing distances from each point to the line
  • As you drag the points, notice how these red distances, and in particular their sum of squares, change

The line that minimizes the sum of these squared red distances is the least squares solution.

Interactive Least Squares Regression

[Interactive figure: draggable data points, the live regression line with its equation, red dashed vertical segments marking the errors, the mean point (x̄, ȳ), and a running sum-of-squared-errors readout. Drag the points to see how the regression line adjusts to minimize the sum of squared errors.]

Key Insights:

  • The regression line always passes through the mean point (x̄, ȳ), which follows directly from \beta_0 = \bar{y} - \beta_1 \bar{x}
  • Red dashed lines show the vertical distances from each point to the line
  • The algorithm minimizes the sum of these squared red distances
  • Try moving points to see how the line responds instantly; a static sketch of the same picture follows below
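
For readers viewing this without the interactive demo, a static version of the same picture can be sketched with matplotlib, continuing from the earlier snippets:

```python
import matplotlib.pyplot as plt

beta0, beta1 = least_squares_fit(x, y)
y_hat = beta0 + beta1 * x

fig, ax = plt.subplots()
ax.scatter(x, y, zorder=3, label="Data points")
ax.plot(x, y_hat, label="Regression line")
# Red dashed segments: vertical distances (errors) from each point to the line
for xi, yi, fi in zip(x, y, y_hat):
    ax.plot([xi, xi], [yi, fi], "r--", linewidth=1)
ax.scatter([x.mean()], [y.mean()], marker="x", s=80, color="black",
           label="Mean point (x̄, ȳ)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()
```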

Summary

Least squares regression finds the line that is as close as possible to all data points, in the sense of minimizing the sum of squared vertical distances. The derivation may be algebraic, but the underlying concept is beautifully intuitive and easy to visualize with the demo above.
