ATE, ATT, and ATC

A clear explanation of ATE, ATT, and ATC in causal inference, using a regression-based example with visual illustrations.

Causal Inference · Regression · Beginner

Introduction: Who Is the Effect For?

In causal inference, we want to understand how much impact a treatment or intervention (like attending cram school) has. But it’s crucial to clarify who the effect is being measured for.

For example, if test scores increased after attending a cram school, are we referring to:

  • All students?
  • Only the students who attended?
  • Students who didn’t attend, but might have?

These are all different questions. In this post, we’ll explain three key concepts that answer them:

  • ATE: Average Treatment Effect — the effect if everyone received the treatment.
  • ATT: Average Treatment effect on the Treated — the effect for those who actually received the treatment.
  • ATC: Average Treatment effect on the Controls — the effect for those who didn’t receive the treatment.

We’ll walk through an example using regression and clarify when and how each effect is used.

A Simple Example: Does Cram School Raise Test Scores?

Suppose we survey 100 students:

  • T (treatment): Did the student attend a cram school? (1 = yes, 0 = no)
  • Y (outcome): Final test score

Average test scores:

  • Students who attended: 85
  • Students who did not attend: 72

A naive conclusion might be: “Cram school increases scores by 13 points!” But that’s not necessarily a causal conclusion.
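To see why the raw gap can mislead, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not data from the survey above: prior achievement drives both attendance and scores, and cram school adds exactly 5 points for every student.

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 5.0  # assumed: cram school adds 5 points for everyone

# Hypothetical data-generating process: stronger students are more
# likely to attend, and they would also score higher without attending.
students = []  # (attended, observed_score)
for _ in range(10_000):
    prior = random.gauss(70, 10)
    attended = random.random() < (0.7 if prior > 70 else 0.2)
    score_without = prior + random.gauss(0, 3)
    score_with = score_without + TRUE_EFFECT
    students.append((attended, score_with if attended else score_without))

# Raw comparison of group means, as in the "13 points" claim.
naive_gap = (
    mean(y for t, y in students if t)
    - mean(y for t, y in students if not t)
)
print(f"naive gap: {naive_gap:.1f}  true effect: {TRUE_EFFECT:.1f}")
```

Under these assumptions the printed gap should land well above the true 5 points, because attendees started out stronger.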

What Are ATE, ATT, and ATC?

ATE: Average Treatment Effect

“What if everyone attended cram school versus if no one did? How much would the average score differ?”

This is useful when evaluating general policy impact.

ATT: Effect on the Treated

“For students who actually attended, how much better did they do compared to if they hadn’t attended?”

This tells us how much the treatment helped the people who received it.

ATC: Effect on the Untreated

“For students who did not attend, what would have happened if they had?”

This tells us whether the treatment would have been helpful to those who missed it.
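These three questions can be written compactly in potential-outcomes notation. The symbols are introduced here for illustration only: $Y_i(1)$ is student $i$'s score if they attend, $Y_i(0)$ is their score if they do not, and $T_i$ is 1 for attendees and 0 otherwise.

```latex
\begin{aligned}
\text{ATE} &= \mathbb{E}[\,Y_i(1) - Y_i(0)\,] \\
\text{ATT} &= \mathbb{E}[\,Y_i(1) - Y_i(0) \mid T_i = 1\,] \\
\text{ATC} &= \mathbb{E}[\,Y_i(1) - Y_i(0) \mid T_i = 0\,]
\end{aligned}
```

The individual difference $Y_i(1) - Y_i(0)$ is the same in all three lines; only the group being averaged over changes.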

Why Differentiate?

The method you use can affect which of these you’re actually estimating.

For example, if students who attend cram school are more motivated or already better-performing, the ATT may differ greatly from the ATE. Being precise about who the effect is for prevents misleading conclusions.

Estimating ATE via Simple Regression

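If attendance were assigned at random, we could estimate the ATE with a simple regression model:

$$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$$

Where:

  • $Y_i$ is student $i$'s test score
  • $T_i$ indicates cram school attendance (1 or 0)
  • $\beta_1$ is the average difference in scores between attendees and non-attendees

With a binary $T_i$, the estimate $\hat{\beta}_1$ equals the raw difference in group means (13 points in our example). It estimates the ATE only when the two groups are comparable, as under random assignment; otherwise it also absorbs pre-existing differences between them.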
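A quick numerical check of this regression, using six made-up scores chosen so the group means are 85 and 72 as in the example. The helper `ols_line` is a hypothetical name for the textbook closed-form OLS formulas, not a library function.

```python
from statistics import mean

def ols_line(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Illustrative data only: three attendees, three non-attendees.
T = [1, 1, 1, 0, 0, 0]
Y = [88, 84, 83, 75, 70, 71]

b0, b1 = ols_line(T, Y)
diff_in_means = (
    mean(y for t, y in zip(T, Y) if t == 1)
    - mean(y for t, y in zip(T, Y) if t == 0)
)
print(b0, b1, diff_in_means)  # intercept is 72, slope and gap are both 13
```

The slope reproduces the raw 13-point gap, so by itself it carries a causal reading only if attendees and non-attendees are otherwise alike.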

Estimating ATT and ATC: It’s Not So Simple

ATT and ATC are effects for specific subgroups.

For ATT, we ask: “How would students who attended have done if they hadn’t?” But we can’t directly observe that. Worse, students who choose to attend cram school often share characteristics:

  • Higher prior achievement
  • More motivation
  • Supportive home environment

These confounding variables (covariates) affect both the decision to attend and the test scores.
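A short derivation shows how such confounders distort the naive comparison. Write $Y_i(1)$ and $Y_i(0)$ for student $i$'s score with and without cram school (notation assumed here for illustration). Adding and subtracting $\mathbb{E}[Y_i(0) \mid T_i = 1]$ splits the raw difference in group means into two parts:

```latex
\begin{aligned}
\mathbb{E}[Y_i \mid T_i = 1] - \mathbb{E}[Y_i \mid T_i = 0]
  &= \mathbb{E}[Y_i(1) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0] \\
  &= \underbrace{\mathbb{E}[Y_i(1) - Y_i(0) \mid T_i = 1]}_{\text{ATT}}
   + \underbrace{\mathbb{E}[Y_i(0) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0]}_{\text{selection bias}}
\end{aligned}
```

The second term is the gap the two groups would show even if nobody attended; higher prior achievement or motivation among attendees makes it positive.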

Why Adjust for Covariates?

To compare treated and untreated students fairly, we must find untreated students who resemble the treated group in background:

  • Similar prior test scores
  • Similar family or study environments

This isn’t just about statistical fairness; it’s also about choosing which group serves as the reference population:

  • Start from the treated group and find comparable untreated students → you estimate ATT
  • Start from the untreated group and find comparable treated students → you estimate ATC

The choice of comparison group shapes the interpretation of the effect.
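One simple way to make this concrete is nearest-neighbor matching on a single covariate. The sketch below is an illustration under stated assumptions, not the method used for the numbers in this post: the data are simulated, prior score is the only confounder, the true effect is 5 points for everyone, and `matched_effect` is a hypothetical helper name.

```python
import random
from statistics import mean

random.seed(2)

# Simulated (prior_score, final_score) pairs, split by attendance.
treated, controls = [], []
for _ in range(2000):
    x = random.gauss(70, 10)
    attended = random.random() < (0.7 if x > 70 else 0.2)
    y = 20 + 0.8 * x + (5 if attended else 0) + random.gauss(0, 3)
    (treated if attended else controls).append((x, y))

def matched_effect(focus, pool, sign):
    """Pair each student in `focus` with the closest prior score in `pool`
    (with replacement) and average the signed score differences."""
    diffs = []
    for x, y in focus:
        _, y_match = min(pool, key=lambda p: abs(p[0] - x))
        diffs.append(sign * (y - y_match))
    return mean(diffs)

naive_gap = mean(y for _, y in treated) - mean(y for _, y in controls)
att_hat = matched_effect(treated, controls, +1)  # reference group: treated
atc_hat = matched_effect(controls, treated, -1)  # reference group: untreated
print(f"naive: {naive_gap:.1f}  ATT: {att_hat:.1f}  ATC: {atc_hat:.1f}")
```

The same function yields an ATT or an ATC estimate depending only on which group is passed as `focus`. Because the simulated effect is constant, both estimates should sit near 5 while the naive gap is much larger.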

Using Regression with Interaction Terms

One way to adjust for covariates is regression with interactions:

$$Y_i = \beta_0 + \beta_1 T_i + \beta_2 X_i + \beta_3 (T_i \times X_i) + \varepsilon_i$$

  • $X_i$ is a covariate (e.g., prior achievement)
  • $T_i \times X_i$ lets the treatment effect vary by background

This model captures how treatment effects differ by individual characteristics.

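In this model, the estimated effect for a student with covariate value $X_i$ is $\beta_1 + \beta_3 X_i$. Averaging that quantity over the covariates of the students who actually received treatment gives the ATT, averaging over the untreated students' covariates gives the ATC, and averaging over everyone gives the ATE.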
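The averaging step can be sketched in a few lines of standard-library Python. The data are simulated under assumptions chosen for illustration (the benefit of cram school grows with prior score), and `ols_line` and `effect_at` are hypothetical helper names. Because $T_i$ is binary and the model is fully interacted, fitting it by OLS gives the same coefficients as fitting one line per group, which is what the code does.

```python
import random
from statistics import mean

def ols_line(x, y):
    """Closed-form OLS for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

random.seed(1)

# Hypothetical simulation: true effect is 8 + 0.2 * (X - 70).
rows = []  # (T, X, Y)
for _ in range(5000):
    x = random.gauss(70, 10)
    t = 1 if random.random() < (0.7 if x > 70 else 0.2) else 0
    y = 20 + 0.8 * x + t * (8 + 0.2 * (x - 70)) + random.gauss(0, 3)
    rows.append((t, x, y))

treated_x = [x for t, x, _ in rows if t == 1]
control_x = [x for t, x, _ in rows if t == 0]

# One line per group; differences recover the interaction-model terms.
a1, b1 = ols_line(treated_x, [y for t, _, y in rows if t == 1])
a0, b0 = ols_line(control_x, [y for t, _, y in rows if t == 0])
beta1, beta3 = a1 - a0, b1 - b0

def effect_at(x):
    """Estimated treatment effect for a student with prior score x."""
    return beta1 + beta3 * x

ate = mean(effect_at(x) for _, x, _ in rows)  # average over everyone
att = mean(effect_at(x) for x in treated_x)   # average over attendees
atc = mean(effect_at(x) for x in control_x)   # average over non-attendees
print(f"ATE: {ate:.2f}  ATT: {att:.2f}  ATC: {atc:.2f}")
```

In this setup attendees have higher prior scores and benefit more, so the ATT estimate should exceed the ATE, which in turn should exceed the ATC.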

Interactive Visualization

This interactive tool visualizes the differences between ATE, ATT, and ATC using a simple simulation.


Average Treatment Effect (ATE)

Key Question:

"What if EVERYONE attended vs. NO ONE attended?"

Compares outcomes if the entire population received treatment versus if no one did.

Effect Size:

8.2 points

On average, cram school would increase test scores for the entire population.

Student Population (n=100)

Attended Cram School (40 students)

Average Score: 85 points

Did Not Attend (60 students)

Average Score: 72 points

What We're Comparing for ATE

Scenario A: Everyone attends cram school

Average score would be ~80 points

VS

Scenario B: No one attends cram school

Average score would be ~72 points

Difference: +8.2 points

Population-wide effect of the policy

Quick Comparison

  • ATE: all students in the population, +8.2 pts
  • ATT: students who attended cram school, +13 pts
  • ATC: students who didn't attend cram school, +9 pts

Summary

  • ATE, ATT, and ATC help us define who the effect is for.
  • Different analytical methods estimate different effects — so we must be clear about our target.
  • Estimating ATT and ATC requires accounting for covariates and deciding which group to compare against.
  • Regression models with interaction terms allow flexible modeling of treatment effects across subgroups.

Understanding these distinctions helps make causal analysis more honest and insightful.
