Logistic Regression

Understand logistic regression from the sigmoid function to maximum likelihood estimation and cross-entropy loss, with an interactive demo.

Regression · Classification · GLM

From Linear to Logistic

In Simple Linear Regression, we modeled a continuous outcome y as a linear function of x. But what if y is binary, taking only the values 0 or 1?

For example: Will a patient develop a disease? Will a customer click an ad?

Linear regression can predict values outside [0, 1], which makes no sense for probabilities. We need a model that always outputs a value between 0 and 1.

This is exactly what logistic regression does — and it fits naturally into the GLM framework as a special case with a Bernoulli distribution and a logit link function.

The Setup

We have:

  • Features x = (x_1, x_2, \ldots, x_n)
  • A binary outcome y \in \{0, 1\}
  • A linear predictor z = w^\top x + b

The key question: how do we map z \in (-\infty, \infty) to a probability p \in (0, 1)?

The Sigmoid Function

The answer is the sigmoid (logistic) function:

P(y = 1 \mid x) = \frac{e^z}{1 + e^z} = \frac{1}{1 + e^{-z}}

This function has elegant properties:

  • For any real z, the output is always between 0 and 1: 0 < \frac{e^z}{1+e^z} < 1
  • At z = 0, the probability is exactly 0.5
  • As z \to +\infty, the probability approaches 1
  • As z \to -\infty, the probability approaches 0

The sigmoid smoothly “squashes” the entire real line into the interval (0, 1), giving us a valid probability.
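To make this concrete, here is a minimal NumPy sketch of the sigmoid (an illustration, not code from the demo). The case split avoids overflow in exp for large |z|:

```python
import numpy as np

def sigmoid(z):
    """Numerically stable sigmoid: e^z / (1 + e^z) = 1 / (1 + e^(-z))."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))  # exp argument <= 0: no overflow
    ez = np.exp(z[~pos])                      # here z < 0, so e^z <= 1: safe
    out[~pos] = ez / (1.0 + ez)
    return out

print(sigmoid([-10, 0, 10]))  # approx [0.0000454, 0.5, 0.9999546]
```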

Maximum Likelihood Estimation

How do we find the best parameters ww and bb? We use maximum likelihood estimation (MLE) — find the parameters that make the observed data most probable.

For a single data point (x_i, y_i), the likelihood is:

P(y_i \mid x_i) = p_i^{\,y_i} (1 - p_i)^{1 - y_i}

where p_i = \frac{e^{z_i}}{1 + e^{z_i}} and z_i = w^\top x_i + b.

For the full dataset, the likelihood is the product over all observations:

L = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i}

Taking the log gives us the log-likelihood:

\log L = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]
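As a check on the algebra, a short sketch (the function name and clipping threshold are illustrative) that evaluates this log-likelihood for a candidate (w, b):

```python
import numpy as np

def log_likelihood(w, b, X, y, eps=1e-12):
    """Bernoulli log-likelihood: sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    z = X @ w + b                    # linear predictor z_i = w^T x_i + b
    p = 1.0 / (1.0 + np.exp(-z))     # sigmoid gives p_i = P(y_i = 1 | x_i)
    p = np.clip(p, eps, 1.0 - eps)   # keep log() away from 0
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```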

Cross-Entropy Loss

Unlike OLS in linear regression, setting the derivatives of the log-likelihood to zero does not yield a closed-form solution: the resulting equations are nonlinear in w and b.

Instead, we minimize the cross-entropy loss, which is the negative log-likelihood:

\mathcal{L} = -\sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]

This loss function penalizes confident wrong predictions heavily:

  • If y_i = 1 but the model predicts p_i \approx 0, the loss term -\log(p_i) becomes very large
  • If y_i = 0 but the model predicts p_i \approx 1, the loss term -\log(1 - p_i) becomes very large
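For instance, predicting p_i = 0.01 when y_i = 1 contributes -\log(0.01) \approx 4.6 to the loss, while a prediction of p_i = 0.9 contributes only -\log(0.9) \approx 0.11.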

We optimize this using iterative methods like gradient descent, where we repeatedly update with learning rate \alpha:

w \leftarrow w - \alpha \frac{\partial \mathcal{L}}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial \mathcal{L}}{\partial b}
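For this loss, the chain rule gives the standard gradients \frac{\partial \mathcal{L}}{\partial w} = \sum_i (p_i - y_i)\, x_i and \frac{\partial \mathcal{L}}{\partial b} = \sum_i (p_i - y_i). A minimal sketch of the update loop (the learning rate and iteration count are arbitrary choices, not values from the demo):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iters=2000):
    """Fit w, b by gradient descent on the cross-entropy loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # current predictions p_i
        grad_w = X.T @ (p - y) / n              # dL/dw, averaged over the data
        grad_b = np.mean(p - y)                 # dL/db, averaged over the data
        w -= lr * grad_w                        # averaging keeps the step size
        b -= lr * grad_b                        # independent of the sample size
    return w, b

# Toy usage: points below x = 0 are class 0, points above are class 1.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic(X, y)
print(w, b)  # w > 0: the probability of y = 1 increases with x
```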

Interactive Demo

Explore logistic regression hands-on. Adjust the weight and bias sliders to see the sigmoid curve change, or click Start Gradient Descent to watch the model learn optimal parameters automatically.

[Interactive chart: sigmoid curve P(y = 1 | x) over z = wx + b, with the decision boundary marked]

Click on the chart to add data points (top half → y=1, bottom half → y=0)

[Demo readouts: Iteration, Log Loss, Accuracy · Legend: y = 1, y = 0, Sigmoid, Decision Boundary]

Things to Try

  • Click on the chart to add new data points (top half = y=1, bottom half = y=0)
  • Toggle the “Cross-Entropy Loss” tab to see the loss landscape as a heatmap
    • Dark blue regions indicate low loss — the model fits the data well there
    • Bright green/yellow regions indicate high loss — poor fit
    • The red dot marks the current (w, b); gradient descent moves it toward the blue region
  • Start gradient descent and watch the red dot navigate toward the minimum on the loss surface
  • Add overlapping points (e.g., y=1 near x=−2) to see how the model handles noise

Connection to GLMs

Logistic regression is a Generalized Linear Model with:

Component          Choice
Distribution       Bernoulli
Link function      Logit: g(\mu) = \log\frac{\mu}{1-\mu}
Linear predictor   \eta = w^\top x + b
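Inverting the link shows why the sigmoid appears: for a Bernoulli outcome the mean is \mu = P(y = 1 \mid x), and setting \log\frac{\mu}{1-\mu} = \eta and solving for \mu gives \mu = \frac{e^\eta}{1 + e^\eta}, which is exactly the sigmoid from above.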

Once you classify observations, you can evaluate your model using Sensitivity, Specificity, and ROC curves.

Summary

Logistic regression transforms the unbounded linear predictor into a probability via the sigmoid function, then finds optimal parameters by minimizing cross-entropy loss through iterative optimization. It is one of the most fundamental classification models in statistics and machine learning — simple enough to interpret, yet powerful enough for real-world applications.
