Logistic Regression
Understand logistic regression from the sigmoid function to maximum likelihood estimation and cross-entropy loss, with an interactive demo.
From Linear to Logistic
In Simple Linear Regression, we modeled a continuous outcome $y$ as a linear function of $x$. But what if $y$ is binary — just 0 or 1?
For example: Will a patient develop a disease? Will a customer click an ad?
Linear regression can predict values outside $[0, 1]$, which makes no sense for probabilities. We need a model that always outputs a value between 0 and 1.
This is exactly what logistic regression does — and it fits naturally into the GLM framework as a special case with a Bernoulli distribution and a logit link function.
The Setup
We have:
- Features $x = (x_1, \dots, x_p)$
- A binary outcome $y \in \{0, 1\}$
- A linear predictor $z = w^\top x + b$
The key question: how do we map $z$, which can be any real number, to a probability $p \in (0, 1)$?
The Sigmoid Function
The answer is the sigmoid (logistic) function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
This function has elegant properties:
- For any real $z$, the output is always between 0 and 1: $0 < \sigma(z) < 1$
- At $z = 0$, the probability is exactly 0.5
- As $z \to +\infty$, the probability approaches 1
- As $z \to -\infty$, the probability approaches 0
The sigmoid smoothly “squashes” the entire real line into the interval $(0, 1)$, giving us a valid probability.
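To make the squashing concrete, here is a minimal NumPy sketch (the `sigmoid` helper is an illustrative name, not part of any library) that checks the properties listed above:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# The properties above, checked numerically:
print(sigmoid(0.0))     # 0.5 exactly at z = 0
print(sigmoid(10.0))    # ~0.99995, approaching 1 as z grows
print(sigmoid(-10.0))   # ~0.00005, approaching 0 as z falls
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # always strictly inside (0, 1)
```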
Maximum Likelihood Estimation
How do we find the best parameters $w$ and $b$? We use maximum likelihood estimation (MLE) — find the parameters that make the observed data most probable.
For a single data point $(x_i, y_i)$, the likelihood is:

$$P(y_i \mid x_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$$

where $p_i = \sigma(z_i)$ and $z_i = w^\top x_i + b$.
For the full dataset, the likelihood is the product over all observations:

$$L(w, b) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
Taking the log gives us the log-likelihood:

$$\ell(w, b) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$$
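As a small illustration, the log-likelihood for the single-feature case ($z_i = w x_i + b$) can be computed directly; the toy data below is made up for demonstration:

```python
import numpy as np

def log_likelihood(w, b, x, y):
    """Bernoulli log-likelihood of (w, b) given features x and labels y."""
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # p_i = sigma(w * x_i + b)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([-2.0, -1.0, 0.5, 1.5, 2.5])
y = np.array([0, 0, 1, 1, 1])
print(log_likelihood(1.0, 0.0, x, y))   # ~ -1.19; closer to 0 is better
print(log_likelihood(-1.0, 0.0, x, y))  # ~ -8.69; wrong sign on w fits badly
```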
Cross-Entropy Loss
Unlike OLS in linear regression, setting the derivatives of the log-likelihood to zero yields equations with no closed-form solution.
Instead, we minimize the cross-entropy loss, which is the negative log-likelihood (usually averaged over the $n$ observations):

$$\mathcal{L}(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$$
This loss function penalizes confident wrong predictions heavily (see the quick check after this list):
- If $y_i = 1$ but the model predicts $p_i \approx 0$, the term $-\log p_i$ becomes very large
- If $y_i = 0$ but the model predicts $p_i \approx 1$, the term $-\log(1 - p_i)$ becomes very large
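A quick numerical check of this asymmetry (the per-example values below assume $y_i = 1$, so the loss is just $-\log p_i$):

```python
import numpy as np

# Per-example cross-entropy for y = 1: -log(p).
# A confident wrong prediction costs far more than an unsure one.
print(-np.log(0.99))  # p = 0.99 -> loss ~0.01 (confident and right)
print(-np.log(0.5))   # p = 0.5  -> loss ~0.69 (unsure)
print(-np.log(0.01))  # p = 0.01 -> loss ~4.61 (confident and wrong)
```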
We optimize this using iterative methods like gradient descent, where we repeatedly update:

$$w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial \mathcal{L}}{\partial b}$$

where $\eta$ is the learning rate.
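Below is a minimal sketch of batch gradient descent for the single-feature case, using the standard gradients of the averaged loss, $\partial \mathcal{L} / \partial w = \frac{1}{n} \sum_i (p_i - y_i) x_i$ and $\partial \mathcal{L} / \partial b = \frac{1}{n} \sum_i (p_i - y_i)$; the learning rate, step count, and data are illustrative, not necessarily what the demo uses:

```python
import numpy as np

def train_logistic(x, y, lr=0.5, steps=1000):
    """Batch gradient descent on mean cross-entropy for p = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # current predicted probabilities
        grad_w = np.mean((p - y) * x)            # dL/dw for the averaged loss
        grad_b = np.mean(p - y)                  # dL/db for the averaged loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

x = np.array([-2.0, -1.0, 0.5, 1.5, 2.5])
y = np.array([0, 0, 1, 1, 1])
w, b = train_logistic(x, y)
print(w, b)  # learned parameters; decision boundary sits at x = -b/w
```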
Interactive Demo
Explore logistic regression hands-on. Adjust the weight and bias sliders to see the sigmoid curve change, or click Start Gradient Descent to watch the model learn optimal parameters automatically.
Things to Try
- Click on the chart to add new data points (top half = y=1, bottom half = y=0)
- Toggle the “Cross-Entropy Loss” tab to see the loss landscape as a heatmap
- Dark blue regions indicate low loss — the model fits the data well there
- Bright green/yellow regions indicate high loss — poor fit
- The red dot marks the current $(w, b)$ — gradient descent moves it toward the blue region
- Start gradient descent and watch the red dot navigate toward the minimum on the loss surface
- Add overlapping points (e.g., y=1 near x=−2) to see how the model handles noise
Connection to GLMs
Logistic regression is a Generalized Linear Model with:
| Component | Choice |
|---|---|
| Distribution | Bernoulli |
| Link function | Logit: $z = \log \frac{p}{1 - p}$ |
| Linear predictor | $z = w^\top x + b$ |
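A quick sanity check that the logit link and the sigmoid are inverses of each other (an illustrative NumPy sketch; `logit` here is a hand-rolled helper):

```python
import numpy as np

def logit(p):
    """Log-odds: the link function mapping a probability to the real line."""
    return np.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p = np.array([0.1, 0.5, 0.9])
print(sigmoid(logit(p)))  # recovers [0.1, 0.5, 0.9]: the two are inverses
```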
Once you classify observations, you can evaluate your model using Sensitivity, Specificity, and ROC curves.
Summary
Logistic regression transforms the unbounded linear predictor into a probability via the sigmoid function, then finds optimal parameters by minimizing cross-entropy loss through iterative optimization. It is one of the most fundamental classification models in statistics and machine learning — simple enough to interpret, yet powerful enough for real-world applications.