Fisher Information

Fisher Information measures the amount of information a model provides about a parameter. Learn its precise definition, intuitive meaning, and connection to estimation theory.


Introduction

In statistical estimation, we often ask: How precisely can we estimate an unknown parameter?

Fisher Information gives us a theoretical answer to this question. It is a central concept in statistics that tells us:

  • How sensitive a probability model is to its parameter
  • How much information the model provides about the parameter
  • What the best possible estimation accuracy could be

In this article, we’ll explain Fisher Information step-by-step, from formal definition to visual intuition, and explore how it connects to estimation limits like the Cramér–Rao bound.


1. The Formal Definition

Suppose we observe data $X$ generated from a probability model $p(x \mid \theta)$ that depends on a parameter $\theta$. Then, the Fisher Information at $\theta$ is defined as:

$$\mathcal{I}(\theta) = \mathbb{E}_\theta\left[ \left( \frac{d}{d\theta} \log p(X \mid \theta) \right)^2 \right]$$

This expectation is taken under the assumption that $\theta$ is the true parameter value. So:

Fisher Information measures how much, on average, the data generated under $\theta$ “reacts” to changes in $\theta$.

It is not a property of a specific dataset, but of the model itself at a given parameter value.
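
To make the definition concrete, here is a minimal Python sketch (NumPy assumed) that approximates $\mathcal{I}(p)$ for a Bernoulli model by averaging the squared score over simulated draws, and compares the result to the known closed form $1/(p(1-p))$:

```python
import numpy as np

def bernoulli_score(x, p):
    # d/dp log p(x | p) for a single Bernoulli observation,
    # where log p(x | p) = x*log(p) + (1 - x)*log(1 - p)
    return x / p - (1 - x) / (1 - p)

rng = np.random.default_rng(0)
p_true = 0.3
x = rng.binomial(1, p_true, size=200_000)   # data generated under p_true

# Fisher Information = E[ score^2 ], with the expectation taken under p_true
I_mc = np.mean(bernoulli_score(x, p_true) ** 2)
I_exact = 1.0 / (p_true * (1 - p_true))     # known closed form for the Bernoulli model

print(f"Monte Carlo estimate       : {I_mc:.3f}")
print(f"Closed form 1/(p(1-p))     : {I_exact:.3f}")
```

The Monte Carlo average and the closed form agree closely, which is exactly what the definition promises.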


2. What Does “Information” Mean?

Here, “information” refers to the identifiability of $\theta$ from the data.

A model has high Fisher Information at $\theta$ if, when data is generated under that $\theta$, the likelihood function reacts sharply to changes in $\theta$.

In that case:

  • Small changes in $\theta$ cause big changes in likelihood
  • The likelihood function is sharply peaked
  • The model gives a strong statistical signal about where $\theta$ is
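
A small, illustrative Python sketch (NumPy assumed; the specific numbers are just an example) makes this concrete: it measures how far the log-likelihood of a normal model drops when the mean is nudged away from its best-fitting value, under small versus large noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, shift = 50, 0.1   # sample size and the small nudge applied to mu

def log_lik(data, mu, sigma):
    # Gaussian log-likelihood as a function of mu (sigma treated as known)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (data - mu)**2 / (2 * sigma**2))

for sigma in (0.5, 2.0):
    data = rng.normal(0.0, sigma, size=n)
    mle = data.mean()                      # maximum-likelihood estimate of mu
    drop = log_lik(data, mle, sigma) - log_lik(data, mle + shift, sigma)
    print(f"sigma = {sigma}: nudging mu by {shift} lowers the log-likelihood by {drop:.3f}")
```

With small noise the same nudge costs far more log-likelihood; that sharp reaction is precisely what high Fisher Information describes.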

3. Why the Variance of the Score?

The derivative of the log-likelihood is called the score function:

$$S(\theta) = \frac{d}{d\theta} \log p(X \mid \theta)$$

This measures how sensitive the log-likelihood is to changes in $\theta$ for a given data point $X$.

However, under regularity conditions, the expected score is always zero:

$$\mathbb{E}_\theta[S(\theta)] = 0$$

So we look at the variance of the score instead:

$$\mathcal{I}(\theta) = \mathrm{Var}_\theta[S(\theta)] = \mathbb{E}_\theta\left[S(\theta)^2\right]$$

This captures how wildly the score function fluctuates due to randomness in the data. In short:

The more variable the score, the more “information” the model provides about $\theta$, on average.
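
Both facts are easy to check numerically. Here is a minimal sketch (NumPy assumed) using a normal model with known noise level, where the parameter of interest is the mean:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma = 1.0, 0.8
x = rng.normal(mu_true, sigma, size=500_000)

# Score of a single N(mu, sigma^2) observation with respect to mu:
#   S(mu) = d/dmu log p(x | mu) = (x - mu) / sigma^2
score = (x - mu_true) / sigma**2

print(f"mean of the score      : {score.mean():+.4f}  (close to 0)")
print(f"variance of the score  : {score.var():.4f}")
print(f"Fisher Information     : {1 / sigma**2:.4f}  (1 / sigma^2 per observation)")
```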


4. Fisher Information Is a Function of the Parameter

A subtle but important point:

Fisher Information is a function of the parameter $\theta$, not of the observed data.

It is not something you directly observe. Instead, it tells you how much information the model provides, assuming $\theta$ is the true value.

That’s why we can use Fisher Information to derive theoretical bounds on estimation accuracy, even before collecting data.
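
As a small illustration (again with the Bernoulli model), $\mathcal{I}(p)$ can be tabulated across the whole parameter range without drawing a single observation:

```python
import numpy as np

# Fisher Information of a single Bernoulli(p) observation, as a function of p.
# No data is involved: this is a property of the model alone.
for p in np.arange(0.1, 1.0, 0.2):
    print(f"p = {p:.1f}  ->  I(p) = 1/(p(1-p)) = {1 / (p * (1 - p)):.2f}")
```

The information is smallest at $p = 0.5$ and grows toward the boundaries, so a given sample size constrains an extreme $p$ more tightly than a fair one.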


5. Estimating Fisher Information from Data

Since the true $\theta$ is unknown, we often substitute an estimate $\hat{\theta}$ and compute:

$$\mathcal{I}_{\text{obs}}(\hat{\theta}) = - \left. \frac{d^2}{d\theta^2} \log L(\theta) \right|_{\theta = \hat{\theta}}$$

This is called the observed Fisher Information. It provides a data-dependent approximation of the true information.
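
As a sketch of that computation (NumPy assumed), the following example fits an exponential model by maximum likelihood and approximates the second derivative of the log-likelihood with a central difference:

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true = 2.0
x = rng.exponential(1 / lam_true, size=1_000)   # NumPy's exponential takes scale = 1 / rate

def log_lik(lam):
    # Log-likelihood of an Exponential(rate = lam) sample
    return len(x) * np.log(lam) - lam * x.sum()

lam_hat = 1 / x.mean()        # maximum-likelihood estimate of the rate
h = 1e-4                      # step for the central-difference second derivative
curvature = (log_lik(lam_hat + h) - 2 * log_lik(lam_hat) + log_lik(lam_hat - h)) / h**2

I_obs = -curvature            # observed Fisher Information at the MLE
print(f"observed Fisher Information : {I_obs:.1f}")
print(f"analytic n / lambda_hat^2   : {len(x) / lam_hat**2:.1f}")
```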


6. Fisher Information and the Cramér–Rao Bound

Fisher Information plays a key role in determining the best possible accuracy of any unbiased estimator.

The Cramér–Rao lower bound states:

$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{\mathcal{I}(\theta)}$$

So if Fisher Information is high, we can, in theory, estimate $\theta$ very precisely. If it’s low, there’s an unavoidable limit on estimation accuracy.
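
The bound is easy to probe by simulation. In the following sketch (NumPy assumed), the sample mean of a normal model with known σ is an unbiased estimator of the mean, and its variance across many simulated datasets matches $1/\mathcal{I}$ because this particular estimator attains the bound:

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true, sigma, n = 0.0, 1.5, 20
n_datasets = 50_000

# Sampling distribution of the MLE (the sample mean) over many simulated datasets
estimates = rng.normal(mu_true, sigma, size=(n_datasets, n)).mean(axis=1)

I_total = n / sigma**2        # Fisher Information of the whole sample about the mean
print(f"variance of the estimator : {estimates.var():.4f}")
print(f"Cramér–Rao lower bound    : {1 / I_total:.4f}  (= sigma^2 / n)")
```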


7. Visual Demo: Likelihood, Score, and Information

Fisher Information: Understanding Estimation Precision

Change the standard deviation σ and observe how the sharpness of the likelihood function, the slope of the score function, and the Fisher Information change together.

(At σ = 1.0, the demo reads out: Fisher Information 5.00, estimation lower bound 0.200, score function slope −5.0.)

The demo shows three linked panels:

  • Log-Likelihood Function: small σ → sharp peak → easy to estimate
  • Score Function: small σ → steep slope → sensitive to small errors
  • Fisher Information I(σ): small σ → high Fisher Information → high precision

Key Points:

  • Small σ → Sharp likelihood, steep score slope, high Fisher Information → High precision estimation
  • Large σ → Flat likelihood, gentle score slope, low Fisher Information → Low precision estimation
  • The Fisher Information for the mean, I = n/σ², quantifies the "curvature" of the log-likelihood function
  • Higher Fisher Information → Lower estimation variance (Cramér–Rao bound: Var(θ̂) ≥ 1/I)

Try this: Sweep the σ slider between 0.3 and 2.0 and observe how, as σ gets smaller:
1) The likelihood function becomes sharper
2) The score function's slope becomes steeper
3) The Fisher Information becomes higher
This demonstrates why smaller noise leads to more precise parameter estimation.

Use this demo to visually explore:

  • How the likelihood function changes as the parameter changes
  • How the score function (the slope of log-likelihood) behaves
  • How the Fisher Information becomes larger or smaller depending on the model

Try adjusting parameters in common distributions (like the normal distribution), and observe how the likelihood curve’s shape — and the Fisher Information — respond.
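
For readers who want to reproduce the demo's numbers offline, the short sketch below assumes the demo uses n = 5 observations (which matches the readout of Fisher Information 5.00 at σ = 1.0) and evaluates I(μ) = n/σ² together with the corresponding Cramér–Rao bound across the slider's range:

```python
n = 5  # assumed sample size behind the demo's readout (sigma = 1 gives I = 5.00)
for sigma in (0.3, 0.5, 1.0, 2.0):
    I = n / sigma**2          # Fisher Information about the mean of a normal model
    print(f"sigma = {sigma:3.1f}  ->  I(mu) = {I:6.2f},  Cramer-Rao bound = {1 / I:.3f}")
```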


Conclusion

Fisher Information is not just a mathematical curiosity — it is a fundamental quantity that:

  • Measures how sensitive a model is to its parameters
  • Defines the theoretical limit of estimation accuracy
  • Guides how we design estimators, confidence intervals, and experiments

Although defined as a function of the parameter, it tells us how the entire model enables us to learn from data.
