Sensitivity, Specificity, and ROC
How to evaluate medical test performance using a 2×2 table, ROC curves, and AUC calculations.
Today’s topic: sensitivity, specificity, and the ROC curve!
Let’s start with a basic truth: no medical test is perfect.
Sometimes a test says you’re sick when you’re not — that’s a false positive.
Sometimes it says you’re healthy when you’re actually sick — that’s a false negative.
The 2×2 Table: Four Possible Test Outcomes
To understand how well a test performs, imagine a simple 2×2 table:
| | Disease Present | Disease Absent |
|---|---|---|
| Test Positive | True Positive (TP) | False Positive (FP) |
| Test Negative | False Negative (FN) | True Negative (TN) |
This leads to four key outcomes:
- True Positive (TP): You are sick, and the test detects it.
- False Positive (FP): You’re healthy, but the test says you’re sick.
- True Negative (TN): You’re healthy, and the test confirms it.
- False Negative (FN): You are sick, but the test misses it.
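As a minimal sketch of how the four cells are counted, the snippet below tallies them from paired true/false values. The function name `confusion_counts` and the eight-patient data set are made up for illustration.

```python
def confusion_counts(has_disease, test_positive):
    """Count TP, FP, TN, FN from paired booleans (one pair per patient)."""
    tp = fp = tn = fn = 0
    for sick, positive in zip(has_disease, test_positive):
        if sick and positive:
            tp += 1          # sick, and the test detects it
        elif not sick and positive:
            fp += 1          # healthy, but the test says sick
        elif not sick and not positive:
            tn += 1          # healthy, and the test confirms it
        else:
            fn += 1          # sick, but the test misses it
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical data for eight patients (not from a real study).
has_disease = [True, True, True, False, False, False, False, True]
test_positive = [True, True, False, True, False, False, False, True]

counts = confusion_counts(has_disease, test_positive)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```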
Predictive Values: What Do Test Results Really Mean?
Positive Predictive Value (PPV)
The proportion of positive test results that are true positives:
$$\text{PPV} = \frac{TP}{TP + FP}$$
Negative Predictive Value (NPV)
The proportion of negative test results that are true negatives:
$$\text{NPV} = \frac{TN}{TN + FN}$$
These help answer practical questions like:
“If I tested positive, what are the chances I actually have it?”
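To make that question concrete, here is a small sketch that computes both predictive values from the four cells of the table. The counts (1,000 hypothetical people) and the function name `predictive_values` are assumptions chosen for illustration only.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV); NaN when a row of the 2x2 table is empty."""
    positives = tp + fp   # everyone who tested positive
    negatives = tn + fn   # everyone who tested negative
    ppv = tp / positives if positives else float("nan")
    npv = tn / negatives if negatives else float("nan")
    return ppv, npv


# Hypothetical counts, not real study data.
ppv, npv = predictive_values(tp=90, fp=40, tn=860, fn=10)
print(f"PPV = {ppv:.3f}")  # 90 / 130
print(f"NPV = {npv:.3f}")  # 860 / 870
```

With these made-up numbers, a positive result means roughly a 69% chance of actually having the disease, while a negative result is right about 99% of the time.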
Sensitivity and Specificity: Focusing on the Test Mechanics
Sensitivity (True Positive Rate)
How well the test detects people who are truly sick:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity (True Negative Rate)
How well the test identifies healthy individuals:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Together with the predictive values, these numbers summarize how a test performs and what a result might mean.
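A short sketch of the same calculation in code, using hypothetical counts for 1,000 people (the function name `sensitivity_specificity` is also just illustrative):

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity); NaN when a column is empty."""
    sick = tp + fn      # everyone who truly has the disease
    healthy = tn + fp   # everyone who truly does not
    sensitivity = tp / sick if sick else float("nan")
    specificity = tn / healthy if healthy else float("nan")
    return sensitivity, specificity


# Hypothetical counts, not real study data.
sensitivity, specificity = sensitivity_specificity(tp=90, fp=40, tn=860, fn=10)
print(f"Sensitivity = {sensitivity:.3f}")  # 90 / 100
print(f"Specificity = {specificity:.3f}")  # 860 / 900
```

Note that sensitivity is computed only from the sick column and specificity only from the healthy column, which is why they describe the test itself rather than any one person's result.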
But what if a test gives too many false positives?
You might think: “Why not just raise the cutoff?”
Or if it’s missing real cases: “Why not lower it?”
And that’s the key issue—changing the cutoff shifts both sensitivity and specificity.
Improving one often comes at the cost of the other. This is the tradeoff we need to understand.
The Tradeoff: Adjusting the Test Threshold
Here’s the tricky part: sensitivity and specificity trade off against each other.
Changing the test’s cutoff threshold — the point where a result is called “positive” — shifts the balance:
- Lowering the threshold increases sensitivity but reduces specificity (assuming higher scores indicate disease).
- Raising the threshold does the opposite: higher specificity, lower sensitivity.
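The tradeoff can be sketched with a handful of test scores. Everything here is an assumption for illustration: the scores are invented, higher scores are taken to suggest disease, and a result counts as positive when the score is at or above the threshold.

```python
# Hypothetical test scores (higher = more suggestive of disease).
sick_scores = [3.1, 4.5, 5.2, 6.0, 7.4]
healthy_scores = [1.2, 2.0, 2.8, 3.5, 4.9]


def sens_spec_at(threshold, sick_scores, healthy_scores):
    """Build the 2x2 table at one threshold and return (sensitivity, specificity)."""
    tp = sum(s >= threshold for s in sick_scores)     # sick and flagged
    fn = len(sick_scores) - tp                        # sick but missed
    fp = sum(s >= threshold for s in healthy_scores)  # healthy but flagged
    tn = len(healthy_scores) - fp                     # healthy and cleared
    return tp / (tp + fn), tn / (tn + fp)


for threshold in (2.5, 4.0, 5.5):
    sens, spec = sens_spec_at(threshold, sick_scores, healthy_scores)
    print(f"threshold={threshold}: sensitivity={sens:.1f}, specificity={spec:.1f}")
# threshold=2.5: sensitivity=1.0, specificity=0.4
# threshold=4.0: sensitivity=0.8, specificity=0.8
# threshold=5.5: sensitivity=0.4, specificity=1.0
```

As the threshold rises, sensitivity falls from 1.0 to 0.4 while specificity climbs from 0.4 to 1.0.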
Each threshold setting gives a new 2×2 table. Imagine a slider that controls this threshold. You can try it with this interactive artifact!
[Interactive demo: a threshold slider with a 2×2 confusion matrix, calculated metrics, and the test score distribution. Moving the slider shows how the cutoff value changes the 2×2 table and the metrics.]
So how do we choose the best threshold?
The ROC Curve: A Visual of All Possible Tradeoffs
To see the entire range of outcomes, we use the ROC curve. For many threshold values, we plot:
- x-axis: False Positive Rate (FPR = 1 − Specificity)
- y-axis: True Positive Rate (Sensitivity)
A perfect test reaches the top-left corner (FPR = 0, TPR = 1). A random test lies along the diagonal.
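One way to generate the points of an ROC curve is to sweep the threshold across every observed score. The sketch below does that with invented scores, assuming higher scores suggest disease and a result is positive when the score is at or above the threshold; `roc_points` is a hypothetical helper, not a library function.

```python
def roc_points(sick_scores, healthy_scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    # Use each distinct observed score as a threshold, highest first.
    thresholds = sorted(set(sick_scores) | set(healthy_scores), reverse=True)
    points = [(0.0, 0.0)]  # a threshold above every score flags nobody
    for t in thresholds:
        tpr = sum(s >= t for s in sick_scores) / len(sick_scores)
        fpr = sum(s >= t for s in healthy_scores) / len(healthy_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical test scores (higher = more suggestive of disease).
sick_scores = [3.1, 4.5, 5.2, 6.0, 7.4]
healthy_scores = [1.2, 2.0, 2.8, 3.5, 4.9]

points = roc_points(sick_scores, healthy_scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```

The curve starts at (0, 0), where nothing is called positive, and ends at (1, 1), where everything is.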
You can see how an ROC curve is built with this simulation!
[Interactive demo: Live ROC Curve Builder. It shows the metrics at the current threshold, the AUC (calculated using the trapezoidal rule), and the ROC curve. A red dot marks the current threshold point, a blue line traces the complete ROC curve, and a gray dashed diagonal marks a random classifier (AUC = 0.5).]
The AUC represents the probability that the test will correctly rank a randomly chosen positive case higher than a randomly chosen negative case. A perfect test would hug the top-left corner (AUC = 1.0), while a random test follows the diagonal (AUC = 0.5).
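The ranking interpretation of AUC can be checked directly by comparing every sick patient against every healthy one. This is a sketch with invented scores, assuming higher scores suggest disease; counting ties as half a win is a common convention and is an assumption here, as is the helper name `pairwise_auc`.

```python
def pairwise_auc(sick_scores, healthy_scores):
    """Fraction of (sick, healthy) pairs where the sick patient scores higher."""
    wins = 0.0
    for s in sick_scores:
        for h in healthy_scores:
            if s > h:
                wins += 1.0   # correctly ranked pair
            elif s == h:
                wins += 0.5   # tie: counted as half (assumed convention)
    return wins / (len(sick_scores) * len(healthy_scores))


# Hypothetical test scores (higher = more suggestive of disease).
sick_scores = [3.1, 4.5, 5.2, 6.0, 7.4]
healthy_scores = [1.2, 2.0, 2.8, 3.5, 4.9]

auc = pairwise_auc(sick_scores, healthy_scores)
print(f"AUC = {auc:.2f}")  # 22 of 25 pairs ranked correctly -> 0.88
```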
AUC: Summarizing Test Performance with One Number
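The Area Under the ROC Curve (AUC) captures overall test quality in a single number: 1.0 for a perfect test and 0.5 for a random one. In practice it is computed numerically from the ROC points.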
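Trapezoidal Rule:
Approximate area by summing trapezoids between ROC points. With the points $(\text{FPR}_i, \text{TPR}_i)$, $i = 1, \dots, n$, sorted by increasing FPR:
$$\text{AUC} \approx \sum_{i=1}^{n-1} (\text{FPR}_{i+1} - \text{FPR}_i) \cdot \frac{\text{TPR}_i + \text{TPR}_{i+1}}{2}$$
Rectangular Rule (Left Riemann Sum):
Simpler but slightly less accurate, because each strip uses only the TPR at its left edge:
$$\text{AUC} \approx \sum_{i=1}^{n-1} (\text{FPR}_{i+1} - \text{FPR}_i) \cdot \text{TPR}_i$$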
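Both rules can be sketched in a few lines. The ROC points below are invented for illustration and are assumed to be sorted by increasing FPR; the function names are hypothetical.

```python
def auc_trapezoid(points):
    """Sum trapezoids between consecutive (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # width times average height
    return area


def auc_left_riemann(points):
    """Sum rectangles whose height is the TPR at each strip's left edge."""
    area = 0.0
    for (x0, y0), (x1, _) in zip(points, points[1:]):
        area += (x1 - x0) * y0             # width times left height
    return area


# Hypothetical ROC points as (FPR, TPR), sorted by increasing FPR.
points = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (0.6, 0.9), (1.0, 1.0)]

print(f"Trapezoidal:  {auc_trapezoid(points):.2f}")     # 0.79
print(f"Left Riemann: {auc_left_riemann(points):.2f}")  # 0.70
```

On a rising curve like this one the left Riemann sum underestimates the area, since each rectangle ignores the gain in TPR across its strip.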
In Summary
- Use a 2×2 table to define sensitivity, specificity, and predictive values.
- Test thresholds create a tradeoff between sensitivity and specificity.
- ROC curves visualize this tradeoff, and AUC gives a single-number summary.
Understanding these tools helps clinicians choose and interpret medical tests with clarity and confidence.