Fisher Information
Fisher Information measures the amount of information a model provides about a parameter. Learn its precise definition, intuitive meaning, and connection to estimation theory.
Introduction
In statistical estimation, we often ask: How precisely can we estimate an unknown parameter?
Fisher Information gives us a theoretical answer to this question. It is a central concept in statistics that tells us:
- How sensitive a probability model is to its parameter
- How much information the model provides about the parameter
- What the best possible estimation accuracy could be
In this article, we’ll explain Fisher Information step-by-step, from formal definition to visual intuition, and explore how it connects to estimation limits like the Cramér–Rao bound.
1. The Formal Definition
Suppose we observe data $X$ generated from a probability model $p(x; \theta)$ that depends on a parameter $\theta$. Then, the Fisher Information at $\theta$ is defined as:

$$I(\theta) = \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta} \log p(X; \theta)\right)^{2}\right]$$

This expectation is taken under the assumption that $\theta$ is the true parameter value. So:

Fisher Information measures how much, on average, the data generated under $\theta$ “reacts” to changes in $\theta$.
It is not a property of a specific dataset, but of the model itself at a given parameter value.
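To make the expectation concrete, here is a minimal Monte Carlo sketch (using a Bernoulli($p$) model purely for illustration): it averages the squared derivative of the log-density over simulated data and compares the result to the known closed form $I(p) = 1/(p(1-p))$.

```python
import random

def score(x, p):
    # Derivative of the Bernoulli log-density log p(x; p) with respect to p
    return x / p - (1 - x) / (1 - p)

def fisher_info_mc(p, n=200_000, seed=0):
    # Monte Carlo estimate of I(p) = E[(d/dp log p(X; p))^2] under the true p
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        total += score(x, p) ** 2
    return total / n

p = 0.3
print(fisher_info_mc(p))   # Monte Carlo estimate of the expectation
print(1 / (p * (1 - p)))   # closed form for Bernoulli: I(p) = 1/(p(1-p))
```

The two printed values agree closely, which is exactly the statement that the information is a property of the model at $\theta$, not of any one dataset.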
2. What Does “Information” Mean?
Here, “information” refers to the identifiability of $\theta$ from the data.
A model has high Fisher Information at $\theta$ if, when data is generated under that $\theta$, the likelihood function reacts sharply to changes in $\theta$.
In that case:
- Small changes in $\theta$ cause big changes in likelihood
- The likelihood function is sharply peaked
- The model gives a strong statistical signal about where $\theta$ is
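As a sketch of this sharpness idea (assuming a normal model with known noise level σ, chosen only for illustration), we can measure how much the log-likelihood of a fixed sample drops when the mean parameter is shifted slightly away from its best-fitting value:

```python
import math, random

def loglik(mu, xs, sigma):
    # Log-likelihood of the sample xs under N(mu, sigma^2)
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu) ** 2 / (2 * sigma**2) for x in xs)

rng = random.Random(0)
true_mu = 1.0
for sigma in (0.3, 2.0):  # low-noise vs. high-noise model
    xs = [rng.gauss(true_mu, sigma) for _ in range(100)]
    mu_hat = sum(xs) / len(xs)  # maximum likelihood estimate of the mean
    # How far does the log-likelihood fall if we misstate mu by 0.1?
    drop = loglik(mu_hat, xs, sigma) - loglik(mu_hat + 0.1, xs, sigma)
    print(f"sigma={sigma}: log-likelihood drop for a 0.1 shift = {drop:.3f}")
```

With small σ the same 0.1 shift costs far more log-likelihood: that is the “strong statistical signal” about where the parameter is.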
3. Why the Variance of the Score?
The derivative of the log-likelihood with respect to the parameter is called the score function:

$$s(\theta; x) = \frac{\partial}{\partial \theta} \log p(x; \theta)$$

This measures how sensitive the log-likelihood is to changes in $\theta$ for a given data point $x$.
However, under regularity conditions, the expected score is always zero:

$$\mathbb{E}_\theta\big[s(\theta; X)\big] = 0$$

So we look at the variance of the score instead:

$$I(\theta) = \operatorname{Var}_\theta\big(s(\theta; X)\big) = \mathbb{E}_\theta\big[s(\theta; X)^2\big]$$
This captures how wildly the score function fluctuates due to randomness in the data. In short:
The more variable the score, the more “information” the model provides about $\theta$ — on average.
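Both properties can be checked by simulation. The sketch below (a normal model with known σ, used only for illustration) draws data under the true mean and confirms that the score averages to roughly zero while its mean square approaches $I(\mu) = 1/\sigma^2$:

```python
import random

mu, sigma, n = 2.0, 1.5, 100_000
rng = random.Random(42)
xs = [rng.gauss(mu, sigma) for _ in range(n)]

# Score for the mean of N(mu, sigma^2): d/dmu log p(x) = (x - mu) / sigma^2
scores = [(x - mu) / sigma**2 for x in xs]

mean_score = sum(scores) / n               # should be close to 0
msq_score = sum(s**2 for s in scores) / n  # equals Var(score) since the mean is 0

print(mean_score)
print(msq_score)          # close to the true Fisher Information below
print(1 / sigma**2)       # I(mu) = 1/sigma^2 for one normal observation
```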
4. Fisher Information Is a Function of the Parameter
A subtle but important point:
Fisher Information is a function of the parameter $\theta$, not of the observed data.
It is not something you directly observe. Instead, it tells you how much information the model provides, assuming $\theta$ is the true value.
That’s why we can use Fisher Information to derive theoretical bounds on estimation accuracy, even before collecting data.
5. Estimating Fisher Information from Data
Since the true $\theta$ is unknown, we often substitute an estimate $\hat\theta$ (typically the maximum likelihood estimate) and compute the negative curvature of the log-likelihood there:

$$\hat{I}(\hat\theta) = -\frac{\partial^2}{\partial \theta^2} \log L(\theta)\,\Big|_{\theta=\hat\theta}$$

This is called the observed Fisher Information. Under regularity conditions, the negative expected second derivative equals the variance of the score (the information equality), so this provides a data-dependent approximation of the true information.
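As a sketch of the idea (a normal model with known σ; the finite-difference step `h` is an arbitrary choice for this illustration), the observed information can be computed by numerically differentiating the log-likelihood twice at the MLE:

```python
import random

def loglik(mu, xs, sigma):
    # Log-likelihood in mu, dropping the mu-independent constant
    return -sum((x - mu) ** 2 for x in xs) / (2 * sigma**2)

sigma = 2.0
rng = random.Random(1)
xs = [rng.gauss(5.0, sigma) for _ in range(1000)]
mu_hat = sum(xs) / len(xs)  # MLE of the mean is the sample mean

# Observed Fisher Information: minus the curvature of the log-likelihood
# at the MLE, computed here with a central finite difference
h = 1e-2
obs_info = -(loglik(mu_hat + h, xs, sigma)
             - 2 * loglik(mu_hat, xs, sigma)
             + loglik(mu_hat - h, xs, sigma)) / h**2

print(obs_info)             # close to the expected information below
print(len(xs) / sigma**2)   # for this model: n / sigma^2 = 250
```

For this model the log-likelihood is exactly quadratic in μ, so the observed and expected information coincide; in general they only agree approximately.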
6. Fisher Information and the Cramér–Rao Bound
Fisher Information plays a key role in determining the best possible accuracy of any unbiased estimator.
The Cramér–Rao lower bound states that for any unbiased estimator $\hat\theta$:

$$\operatorname{Var}(\hat\theta) \ge \frac{1}{I(\theta)}$$

So if Fisher Information is high, we can — in theory — estimate $\theta$ very precisely. If it’s low, there’s an unavoidable limit on estimation accuracy.
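The bound can be illustrated by simulation. In the sketch below (the sample mean of a normal model, a case where the bound is actually attained), the empirical variance of the estimator across repeated experiments matches $1/I(\mu) = \sigma^2/n$:

```python
import random

mu, sigma, n, reps = 0.0, 1.0, 50, 20_000
rng = random.Random(7)

# Repeat the experiment many times and look at the spread of the estimator
estimates = []
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    estimates.append(sum(xs) / n)   # sample mean as the estimator of mu

mean_est = sum(estimates) / reps
var_est = sum((e - mean_est) ** 2 for e in estimates) / reps

crlb = sigma**2 / n   # = 1 / I(mu), since I(mu) = n / sigma^2 here
print(var_est)        # empirical variance of the estimator
print(crlb)           # Cramér–Rao lower bound: 0.02
```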
7. Visual Demo: Likelihood, Score, and Information
Fisher Information: Understanding Estimation Precision

Change the standard deviation σ and observe how the sharpness of the likelihood function, the slope of the score function, and the Fisher Information change together. The demo shows three linked panels:
- Log-Likelihood Function: small σ → sharp peak → easy to estimate
- Score Function: small σ → steep slope → sensitive to small errors
- Fisher Information I(σ): small σ → high Fisher Information → high precision

Key points:
- Small σ → sharp likelihood, steep score slope, high Fisher Information → high-precision estimation
- Large σ → flat likelihood, gentle score slope, low Fisher Information → low-precision estimation
- For n normal observations, the Fisher Information about the mean, I(σ) = n/σ², quantifies the “curvature” of the log-likelihood function
- Higher Fisher Information → lower estimation variance (Cramér–Rao bound: Var ≥ 1/I)

Try this: move the σ slider from 2.0 down to 0.3 and observe how:
1) The likelihood function becomes sharper
2) The slope of the score function becomes steeper
3) The Fisher Information becomes higher
This demonstrates why smaller noise leads to more precise parameter estimation.
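The quantities in the demo can be reproduced in a few lines (the sample size n = 20 is an arbitrary choice for this sketch):

```python
# Fisher Information about the mean of N(mu, sigma^2) from n observations,
# as a function of the noise level sigma: I(sigma) = n / sigma^2
n = 20
for sigma in (0.3, 0.5, 1.0, 2.0):
    info = n / sigma**2
    crlb = 1 / info   # Cramér–Rao bound on the variance of the mean estimate
    print(f"sigma={sigma:>4}: I={info:8.2f}  CRLB on Var={crlb:.4f}")
```

The table it prints shows the information falling, and the bound on estimation variance rising, as σ grows.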
Use this demo to visually explore:
- How the likelihood function changes as the parameter changes
- How the score function (the slope of log-likelihood) behaves
- How the Fisher Information grows or shrinks depending on the model
Try adjusting parameters in common distributions (like the normal distribution), and observe how the likelihood curve’s shape — and the Fisher Information — respond.
Conclusion
Fisher Information is not just a mathematical curiosity — it is a fundamental quantity that:
- Measures how sensitive a model is to its parameters
- Defines the theoretical limit of estimation accuracy
- Guides how we design estimators, confidence intervals, and experiments
Although defined as a function of the parameter, it tells us how the entire model enables us to learn from data.