Generalized Linear Models (GLM)

July 18, 2025

A step-by-step introduction to Generalized Linear Models (GLMs) starting from basic linear regression, explaining distributions, link functions, and model construction with an interactive tool.

Introduction to Generalized Linear Models (GLMs) Starting from Simple Regression

Introduction

Linear regression is a fundamental statistical model used to predict continuous outcomes. But what if your target variable is binary (like yes/no), or a count (like number of events)? Standard linear regression may fail: it can predict values outside the feasible range, such as negative counts or probabilities above 1.

This is where Generalized Linear Models (GLMs) come in. GLMs extend linear regression by incorporating different distributions and transformations, making them applicable to a wide range of data types. In this article, we’ll build a GLM step by step, starting from the simple linear regression model you may already know.

1. A Quick Refresher: Simple Linear Regression

A typical simple linear regression model is written as:

$$Y = \beta_0 + \beta_1 X + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$$

This model assumes:

The outcome $Y$ is a continuous variable,
The error term $\varepsilon$ follows a normal distribution with mean 0 and constant variance $\sigma^2$,
The mean of $Y$ is a linear function of $X$: $\mathbb{E}[Y \mid X] = \beta_0 + \beta_1 X$.
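
As a quick illustration, the sketch below simulates data from this model and recovers the coefficients with the closed-form least-squares formulas. The true values ($\beta_0 = 2.0$, $\beta_1 = 0.5$, $\sigma = 1.0$), the sample size, and the helper name `fit_simple_ols` are assumptions chosen for this example.

```python
import random


def fit_simple_ols(xs, ys):
    """Closed-form least-squares estimates for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Simulated data (assumed values): beta_0 = 2.0, beta_1 = 0.5, sigma = 1.0
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]

b0_hat, b1_hat = fit_simple_ols(xs, ys)
print(f"estimated beta_0 = {b0_hat:.2f}, estimated beta_1 = {b1_hat:.2f}")
```

With normally distributed errors, these least-squares estimates coincide with the maximum likelihood estimates, which is the link to the GLM framework introduced later.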

This setup works well when $Y$ behaves like a normal variable. But what if it doesn’t?

2. When Linear Regression Falls Short

Real-world data often violates the assumptions of linear regression. For example:

Binary outcome: Will a customer click an ad? ($Y \in \{0, 1\}$)
Count data: How many accidents happen per day? ($Y \in \{0, 1, 2, \dots\}$)

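Applied to such outcomes, linear regression runs into several problems:

Predicted probabilities outside [0, 1],
Predicted negative counts,
Non-constant variance and non-normal residuals (a binary outcome with success probability $p$ has variance $p(1 - p)$, which changes with the mean).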
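
To see the first problem concretely, the sketch below fits an ordinary least-squares line to a small binary dataset. The click data and the helper `fit_simple_ols` are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares estimates for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: 1 = clicked, 0 = did not click
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_simple_ols(xs, ys)

# Treat the fitted line as a "probability" and look at both ends of the data
for x in (0, 4, 5, 9):
    print(f"x = {x}: predicted 'probability' = {b0 + b1 * x:.3f}")
```

With this data the fitted line falls below 0 at $x = 0$ and rises above 1 at $x = 9$, both inside the observed range, so its output cannot be read as a probability.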

To handle such cases, we need a more flexible framework: the Generalized Linear Model.

3. The Three Key Components of a GLM

A Generalized Linear Model is built from three components:

(1) Distribution of the Response Variable

GLMs assume the response variable $Y$ follows a distribution from the exponential family, such as:

Normal (for continuous data),
Bernoulli (for binary outcomes),
Poisson (for counts).
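
To make "exponential family" a little more concrete, here is how two of these distributions can be rewritten so that the outcome $y$ enters through a term of the form $y \cdot \theta$ inside an exponential. The symbols $p$ (success probability) and $\lambda$ (mean count) are introduced here for illustration.

$$
\begin{aligned}
\text{Bernoulli:}\quad & P(Y = y) = p^{y}(1 - p)^{1 - y} = \exp\left( y \log\frac{p}{1 - p} + \log(1 - p) \right), \quad y \in \{0, 1\} \\
\text{Poisson:}\quad & P(Y = y) = \frac{\lambda^{y} e^{-\lambda}}{y!} = \exp\left( y \log\lambda - \lambda - \log y! \right), \quad y \in \{0, 1, 2, \dots\}
\end{aligned}
$$

The quantities multiplying $y$, namely $\log\frac{p}{1 - p}$ and $\log\lambda$, are the natural parameters of the two distributions. They reappear below as the logit and log link functions.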

(2) Linear Predictor

Just like in linear regression, we define a linear combination of the predictors:

$$\eta = \beta_0 + \beta_1 X$$

In this article, we use this single-variable predictor for simplicity and clarity. In practice, the linear predictor can involve multiple variables and interaction terms, but we’ll stick with one variable ($X$) to make the core ideas easier to follow.

(3) Link Function

The link function connects the mean of the response variable to the linear predictor:

$$\eta = g(\mu), \quad \text{where } \mu = \mathbb{E}[Y]$$

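This function maps the mean $\mu$, which may be restricted to a range such as $(0, 1)$ or $(0, \infty)$, onto a scale where a linear model makes sense, since $\eta$ can take any real value.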

Common choices include:

Identity: $g(\mu) = \mu$ (used in linear regression),
Logit: $g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right)$ (used for binary data),
Log: $g(\mu) = \log(\mu)$ (used for counts).
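
As a small sketch, the three links and their inverses can be written as plain Python functions. The function names are chosen here for illustration. The inverse link is what turns a linear predictor $\eta$ back into a mean $\mu$.

```python
import math


# Link functions g(mu): map the mean onto the scale of the linear predictor
def identity_link(mu):
    return mu


def logit_link(mu):
    return math.log(mu / (1.0 - mu))  # requires 0 < mu < 1


def log_link(mu):
    return math.log(mu)  # requires mu > 0


# Inverse links: map the linear predictor eta back to a mean
def inv_identity(eta):
    return eta


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def inv_log(eta):
    return math.exp(eta)


for eta in (-3.0, 0.0, 3.0):
    print(
        f"eta = {eta:+.1f} | identity: {inv_identity(eta):+.3f} "
        f"| inverse logit: {inv_logit(eta):.3f} | inverse log: {inv_log(eta):.3f}"
    )
```

Whatever value $\eta$ takes, the inverse logit stays inside $(0, 1)$ and the inverse log stays positive, while the identity passes negative values straight through.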

4. Example: Logistic Regression for Binary Data

Suppose we want to model whether someone clicks an ad ($Y \in \{0, 1\}$). We can use:

Distribution: Bernoulli
Link function: Logit
Linear predictor: $\eta = \beta_0 + \beta_1 X$

Then the model becomes:

$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X$$

Solving for $p = \mathbb{P}(Y=1)$ gives:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

Because the exponential term is always positive, this formula keeps $p$ strictly between 0 and 1 for every value of $X$, something linear regression can’t guarantee.
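
Here is a minimal sketch of fitting this model to a tiny click dataset. The data are made up, the helper names `sigmoid` and `fit_logistic` are hypothetical, and Newton's method is simply one common way to maximize the Bernoulli likelihood; the article does not prescribe a fitting algorithm.

```python
import math


def sigmoid(eta):
    """Inverse logit: maps any real eta to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(xs, ys, n_iter=25):
    """Maximum likelihood for logit(p) = b0 + b1 * x via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood (score vector)
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix [[a, b], [b, c]] with weights p * (1 - p)
        ws = [p * (1.0 - p) for p in ps]
        a = sum(ws)
        b = sum(w * x for w, x in zip(ws, xs))
        c = sum(w * x * x for w, x in zip(ws, xs))
        det = a * c - b * b
        # Newton update: beta += inverse(information) @ score
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


# Hypothetical data: 1 = clicked, 0 = did not click
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
for x in (-5, 0, 4.5, 9, 15):
    print(f"x = {x}: p = {sigmoid(b0 + b1 * x):.4f}")
```

Unlike a straight line, the fitted curve flattens out toward 0 and 1 at the extremes. Note that this simple Newton loop assumes the two classes overlap in $X$; with perfectly separated data the estimates would grow without bound.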

5. How to Build a GLM: Step-by-Step

Here’s how to construct a GLM for any type of data:

Check your response variable: is it continuous, binary, or a count?
Choose a distribution: pick one from the exponential family (Normal, Bernoulli, Poisson, etc.).
Select a link function: it should map the allowed range of the mean $\mu$ (for example, $(0, 1)$ for a probability or $(0, \infty)$ for an expected count) onto the whole real line, where the linear predictor lives.
Define a linear predictor: build an expression like $\eta = \beta_0 + \beta_1 X$.
Estimate the parameters: typically by Maximum Likelihood Estimation (MLE), which software usually computes with iteratively reweighted least squares.
Evaluate the model: use metrics like AIC, deviance, or residuals to check the fit.
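
The steps above can be sketched end to end for count data. This is an illustration under stated assumptions rather than a reference implementation: the accident counts are made up, the helper names are hypothetical, and Newton's method (which coincides with Fisher scoring for the log link) stands in for whatever routine a statistics package would use.

```python
import math

# Step 1: the response is a count (hypothetical data)
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 0, 2, 1, 3, 2, 4, 6, 5, 9]


# Steps 2-4: Poisson distribution, log link, eta = b0 + b1 * x,
# so the mean is mu = exp(eta)
def poisson_mean(b0, b1, x):
    return math.exp(b0 + b1 * x)


# Step 5: maximum likelihood via Newton's method
def fit_poisson(xs, ys, n_iter=25):
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mus = [poisson_mean(b0, b1, x) for x in xs]
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        a = sum(mus)
        b = sum(m * x for m, x in zip(mus, xs))
        c = sum(m * x * x for m, x in zip(mus, xs))
        det = a * c - b * b
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


# Step 6: evaluate with deviance and AIC
def poisson_deviance(ys, mus):
    total = 0.0
    for y, m in zip(ys, mus):
        log_term = y * math.log(y / m) if y > 0 else 0.0  # 0 * log(0) is taken as 0
        total += log_term - (y - m)
    return 2.0 * total


def poisson_log_likelihood(ys, mus):
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(ys, mus))


b0, b1 = fit_poisson(xs, ys)
fitted = [poisson_mean(b0, b1, x) for x in xs]
null_fit = [sum(ys) / len(ys)] * len(ys)  # intercept-only model for comparison
aic = 2 * 2 - 2 * poisson_log_likelihood(ys, fitted)  # two estimated parameters

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"null deviance = {poisson_deviance(ys, null_fit):.2f}")
print(f"residual deviance = {poisson_deviance(ys, fitted):.2f}")
print(f"AIC = {aic:.2f}")
```

A residual deviance well below the null deviance suggests that $X$ explains part of the variation in the counts. Because of the log link, every fitted mean is positive, and each unit increase in $X$ multiplies the expected count by $e^{\beta_1}$.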

Interactive GLM Builder

Explore how GLMs work by choosing a distribution, link function, and predictor below. Adjust parameters to see how the model behaves:

The builder walks through four steps: Choose Distribution, Select Link Function, Define Linear Predictor, and View Predictions.

Step 1: Choose Distribution

Normal distribution: for continuous outcomes (e.g., height, temperature). Example: predicting house prices.
Bernoulli distribution: for binary outcomes (success/failure). Example: will a customer click an ad?
Poisson distribution: for count data (number of events). Example: number of accidents per day.

Summary

GLMs extend the familiar linear regression model by allowing for non-normal distributions and transforming the relationship between predictors and outcomes using link functions. This makes them powerful tools for modeling a wide range of data, from binary outcomes to counts and beyond.

Once you understand the GLM framework, you’ll be ready to dive into logistic regression, Poisson regression, and more specialized models.