Fisher Information

Fisher Information measures the amount of information a model provides about a parameter. Learn its precise definition, intuitive meaning, and connection to estimation theory.


Introduction

In statistical estimation, we often ask: How precisely can we estimate an unknown parameter?

Fisher Information gives us a theoretical answer to this question. It is a central concept in statistics that tells us:

  • How sensitive a probability model is to its parameter
  • How much information the model provides about the parameter
  • What the best possible estimation accuracy could be

In this article, we’ll explain Fisher Information step-by-step, from formal definition to visual intuition, and explore how it connects to estimation limits like the Cramér–Rao bound.


1. The Formal Definition

Suppose we observe data $X$ generated from a probability model $p(x \mid \theta)$ that depends on a parameter $\theta$. Then, the Fisher Information at $\theta$ is defined as:

$$\mathcal{I}(\theta) = \mathbb{E}_\theta\left[ \left( \frac{d}{d\theta} \log p(X \mid \theta) \right)^2 \right]$$

This expectation is taken under the assumption that $\theta$ is the true parameter value. So:

Fisher Information measures how much, on average, the data generated under $\theta$ “reacts” to changes in $\theta$.

It is not a property of a specific dataset, but of the model itself at a given parameter value.
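
To make the definition concrete, here is a minimal Python sketch (NumPy assumed) that approximates $\mathcal{I}(p)$ for a Bernoulli model by averaging the squared score over simulated draws, and compares the result to the known closed form $1/(p(1-p))$:

```python
import numpy as np

def bernoulli_score(x, p):
    # d/dp log p(x | p) for a single Bernoulli observation,
    # where log p(x | p) = x*log(p) + (1 - x)*log(1 - p)
    return x / p - (1 - x) / (1 - p)

rng = np.random.default_rng(0)
p_true = 0.3
x = rng.binomial(1, p_true, size=200_000)   # data generated under p_true

# Fisher Information = E[ score^2 ], with the expectation taken under p_true
I_mc = np.mean(bernoulli_score(x, p_true) ** 2)
I_exact = 1.0 / (p_true * (1 - p_true))     # known closed form for the Bernoulli model

print(f"Monte Carlo estimate       : {I_mc:.3f}")
print(f"Closed form 1/(p(1-p))     : {I_exact:.3f}")
```

The Monte Carlo average and the closed form agree closely, which is exactly what the definition promises.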


2. What Does “Information” Mean?

Here, “information” refers to the identifiability of $\theta$ from the data.

A model has high Fisher Information at $\theta$ if, when data is generated under that $\theta$, the likelihood function reacts sharply to changes in $\theta$.

In that case:

  • Small changes in $\theta$ cause big changes in likelihood
  • The likelihood function is sharply peaked
  • The model gives a strong statistical signal about where $\theta$ is
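
A small, illustrative Python sketch (NumPy assumed; the specific numbers are just an example) makes this concrete: it measures how far the log-likelihood of a normal model drops when the mean is nudged away from its best-fitting value, under small versus large noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, shift = 50, 0.1   # sample size and the small nudge applied to mu

def log_lik(data, mu, sigma):
    # Gaussian log-likelihood as a function of mu (sigma treated as known)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (data - mu)**2 / (2 * sigma**2))

for sigma in (0.5, 2.0):
    data = rng.normal(0.0, sigma, size=n)
    mle = data.mean()                      # maximum-likelihood estimate of mu
    drop = log_lik(data, mle, sigma) - log_lik(data, mle + shift, sigma)
    print(f"sigma = {sigma}: nudging mu by {shift} lowers the log-likelihood by {drop:.3f}")
```

With small noise the same nudge costs far more log-likelihood; that sharp reaction is precisely what high Fisher Information describes.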

3. Why the Variance of the Score?

The derivative of the log-likelihood is called the score function:

$$S(\theta) = \frac{d}{d\theta} \log p(X \mid \theta)$$

This measures how sensitive the log-likelihood is to changes in $\theta$ for a given data point $X$.

However, under regularity conditions, the expected score is always zero:

$$\mathbb{E}_\theta[S(\theta)] = 0$$

So we look at the variance of the score instead:

$$\mathcal{I}(\theta) = \mathrm{Var}_\theta[S(\theta)] = \mathbb{E}_\theta\left[S(\theta)^2\right]$$

This captures how wildly the score function fluctuates due to randomness in the data. In short:

The more variable the score, the more “information” the model provides about $\theta$, on average.
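
Both facts are easy to check numerically. Here is a minimal sketch (NumPy assumed) using a normal model with known noise level, where the parameter of interest is the mean:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma = 1.0, 0.8
x = rng.normal(mu_true, sigma, size=500_000)

# Score of a single N(mu, sigma^2) observation with respect to mu:
#   S(mu) = d/dmu log p(x | mu) = (x - mu) / sigma^2
score = (x - mu_true) / sigma**2

print(f"mean of the score      : {score.mean():+.4f}  (close to 0)")
print(f"variance of the score  : {score.var():.4f}")
print(f"Fisher Information     : {1 / sigma**2:.4f}  (1 / sigma^2 per observation)")
```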


4. Fisher Information Is a Function of the Parameter

A subtle but important point:

Fisher Information is a function of the parameter $\theta$, not of the observed data.

It is not something you directly observe. Instead, it tells you how much information the model provides, assuming $\theta$ is the true value.

That’s why we can use Fisher Information to derive theoretical bounds on estimation accuracy, even before collecting data.
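
As a small illustration (again with the Bernoulli model), $\mathcal{I}(p)$ can be tabulated across the whole parameter range without drawing a single observation:

```python
import numpy as np

# Fisher Information of a single Bernoulli(p) observation, as a function of p.
# No data is involved: this is a property of the model alone.
for p in np.arange(0.1, 1.0, 0.2):
    print(f"p = {p:.1f}  ->  I(p) = 1/(p(1-p)) = {1 / (p * (1 - p)):.2f}")
```

The information is smallest at $p = 0.5$ and grows toward the boundaries, so a given sample size constrains an extreme $p$ more tightly than a fair one.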


5. Estimating Fisher Information from Data

Since the true $\theta$ is unknown, we often substitute an estimate $\hat{\theta}$ and compute:

$$\mathcal{I}_{\text{obs}}(\hat{\theta}) = - \left. \frac{d^2}{d\theta^2} \log L(\theta) \right|_{\theta = \hat{\theta}}$$

This is called the observed Fisher Information. It provides a data-dependent approximation of the true information.
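
As a sketch of that computation (NumPy assumed), the following example fits an exponential model by maximum likelihood and approximates the second derivative of the log-likelihood with a central difference:

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true = 2.0
x = rng.exponential(1 / lam_true, size=1_000)   # NumPy's exponential takes scale = 1 / rate

def log_lik(lam):
    # Log-likelihood of an Exponential(rate = lam) sample
    return len(x) * np.log(lam) - lam * x.sum()

lam_hat = 1 / x.mean()        # maximum-likelihood estimate of the rate
h = 1e-4                      # step for the central-difference second derivative
curvature = (log_lik(lam_hat + h) - 2 * log_lik(lam_hat) + log_lik(lam_hat - h)) / h**2

I_obs = -curvature            # observed Fisher Information at the MLE
print(f"observed Fisher Information : {I_obs:.1f}")
print(f"analytic n / lambda_hat^2   : {len(x) / lam_hat**2:.1f}")
```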


6. Fisher Information and the Cramér–Rao Bound

Fisher Information plays a key role in determining the best possible accuracy of any unbiased estimator.

The Cramér–Rao lower bound states:

$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{\mathcal{I}(\theta)}$$

So if Fisher Information is high, we can, in theory, estimate $\theta$ very precisely. If it’s low, there’s an unavoidable limit on estimation accuracy.
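
The bound is easy to probe by simulation. In the following sketch (NumPy assumed), the sample mean of a normal model with known σ is an unbiased estimator of the mean, and its variance across many simulated datasets matches $1/\mathcal{I}$ because this particular estimator attains the bound:

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true, sigma, n = 0.0, 1.5, 20
n_datasets = 50_000

# Sampling distribution of the MLE (the sample mean) over many simulated datasets
estimates = rng.normal(mu_true, sigma, size=(n_datasets, n)).mean(axis=1)

I_total = n / sigma**2        # Fisher Information of the whole sample about the mean
print(f"variance of the estimator : {estimates.var():.4f}")
print(f"Cramér–Rao lower bound    : {1 / I_total:.4f}  (= sigma^2 / n)")
```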


7. Visual Demo: Likelihood, Score, and Information

Fisher Information: Understanding Estimation Precision

Change the standard deviation σ and observe how the sharpness of the likelihood function, the slope of the score function, and the Fisher Information change together.

(At σ = 1.0, the demo reads out: Fisher Information 5.00, estimation lower bound 0.200, score function slope −5.0.)

The demo shows three linked panels:

  • Log-Likelihood Function: small σ → sharp peak → easy to estimate
  • Score Function: small σ → steep slope → sensitive to small errors
  • Fisher Information I(σ): small σ → high Fisher Information → high precision

Key Points:

  • Small σ → Sharp likelihood, steep score slope, high Fisher Information → High precision estimation
  • Large σ → Flat likelihood, gentle score slope, low Fisher Information → Low precision estimation
  • The Fisher Information for the mean, I = n/σ², quantifies the "curvature" of the log-likelihood function
  • Higher Fisher Information → Lower estimation variance (Cramér–Rao bound: Var(θ̂) ≥ 1/I)

Try this: Sweep the σ slider between 0.3 and 2.0 and observe how, as σ gets smaller:
1) The likelihood function becomes sharper
2) The score function's slope becomes steeper
3) The Fisher Information becomes higher
This demonstrates why smaller noise leads to more precise parameter estimation.

Use this demo to visually explore:

  • How the likelihood function changes as the parameter changes
  • How the score function (the slope of log-likelihood) behaves
  • How the Fisher Information becomes larger or smaller depending on the model

Try adjusting parameters in common distributions (like the normal distribution), and observe how the likelihood curve’s shape — and the Fisher Information — respond.
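
For readers who want to reproduce the demo's numbers offline, the short sketch below assumes the demo uses n = 5 observations (which matches the readout of Fisher Information 5.00 at σ = 1.0) and evaluates I(μ) = n/σ² together with the corresponding Cramér–Rao bound across the slider's range:

```python
n = 5  # assumed sample size behind the demo's readout (sigma = 1 gives I = 5.00)
for sigma in (0.3, 0.5, 1.0, 2.0):
    I = n / sigma**2          # Fisher Information about the mean of a normal model
    print(f"sigma = {sigma:3.1f}  ->  I(mu) = {I:6.2f},  Cramer-Rao bound = {1 / I:.3f}")
```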


Conclusion

Fisher Information is not just a mathematical curiosity — it is a fundamental quantity that:

  • Measures how sensitive a model is to its parameters
  • Defines the theoretical limit of estimation accuracy
  • Guides how we design estimators, confidence intervals, and experiments

Although defined as a function of the parameter, it tells us how the entire model enables us to learn from data.
