7 Instrumental Variables
7.1 Why Instrumental Variables?
Chapter 6 showed how causal effects can be identified when all confounders are observed and can be blocked by conditioning on \(X\). When some confounders are unobserved, back-door adjustment fails. This chapter develops instrumental variables (IV) as an alternative identification strategy: rather than blocking the confounding path \(T \leftarrow U \to Y\), IV exploits an external variable \(Z\) whose effect on \(T\) is free of confounding by \(U\). This chapter asks what IV identifies and under what assumptions; Chapter 13 asks how that estimand is computed and tested in practice.
7.1.1 The Endogeneity Problem
Unconfoundedness \((Y(0), Y(1)) \indep T \mid X\) requires that every variable affecting both treatment and outcome is observed and included in \(X\). In many empirical settings this is implausible: in labor economics, unobserved ability or motivation affects both schooling decisions and wages; in epidemiology, unobserved health behaviors affect both treatment uptake and outcomes. Whenever an unobserved confounder \(U\) creates a back-door path \(T \leftarrow U \to Y\), the adjustment formula fails: \[\int f(y \mid t, x)\, p(x)\, dx \;\neq\; f(y \mid \doop(T{=}t)).\] The gap is the endogeneity bias. We need a different identification strategy.
7.1.2 The IV Idea
An instrumental variable \(Z\) is an observed variable that: (1) moves treatment \(T\) (relevance); (2) does so exogenously — \(Z\) is unrelated to the unobserved confounder \(U\) (exogeneity); (3) affects \(Y\) only through \(T\) — \(Z\) has no direct path to \(Y\) (exclusion). Under these three conditions, the variation in \(T\) induced by \(Z\) is free of confounding by \(U\), so the ratio of the \(Z\)-induced variation in \(Y\) to the \(Z\)-induced variation in \(T\) becomes a meaningful target for identification.
IV does not eliminate the confounding path \(T \leftarrow U \to Y\), nor does it make treatment as-if randomly assigned for the full population. Instead, IV avoids confounding rather than controlling for it — a distinction that matters both for interpreting what is identified (the LATE for compliers, not the ATE) and for understanding which assumption does the heaviest lifting (exclusion, not unconfoundedness).
7.2 Graphical Setup and Core Assumptions
7.2.1 The IV DAG
The causal structure for a basic IV model with covariates \(X\) is the DAG \(\mathcal{G}\) with edges \(Z \to T\), \(T \to Y\), \(U \to T\), \(U \to Y\), \(X \to T\), \(X \to Y\), and \(X \to Z\), where the confounder \(U\) is unobserved.
The three IV assumptions correspond to three distinct features of this DAG: (1) Relevance: there is a directed path \(Z \to T\); (2) Exogeneity: conditional on \(X\), the DAG implies \(Z \indep U \mid X\) by d-separation; (3) Exclusion: every directed path from \(Z\) to \(Y\) in \(\mathcal{G}\) passes through \(T\) — there is no direct edge \(Z \to Y\).
The distinction between exogeneity and exclusion is fundamental. Exogeneity says the instrument is not confounded with latent causes of the outcome. Exclusion says the instrument has no causal channel to the outcome except through treatment. A randomized encouragement may be exogenous by design, yet still violate exclusion if the encouragement changes outcomes through information, motivation, or stigma apart from the treatment itself.
7.2.2 The Three Assumptions in Three Languages
The same three assumptions can be expressed in three causal languages. These are parallel formulations, not literally identical statements: each highlights a different aspect of the design. The graphical formulation is most useful for causal design; the structural formulation is most useful for deriving moment restrictions; the potential-outcomes formulation prepares the ground for the LATE framework. The assumptions are ordered relevance \(\to\) exogeneity \(\to\) exclusion, reflecting the natural sequence in which a researcher assesses them.
| Assumption | Potential outcomes | Structural / econometric | Do-calculus / DAG |
|---|---|---|---|
| Relevance | \(P(T_i(1) \neq T_i(0)) > 0\) | \(\pi \neq 0\) in \(T = \pi Z + \delta^\top X + \eta\) | \(Z \to T\) in \(\mathcal{G}\) (no d-separation) |
| Exogeneity | \(Z \indep (Y(0), Y(1), T(0), T(1)) \mid X\) | \(\E[\varepsilon \mid Z, X] = 0\) | \(Z \indep U \mid X\) |
| Exclusion | \(Y_i(t, z) = Y_i(t, z')\) for all \(z, z'\) | \(Z\) absent from structural equation for \(Y\) | \(f(y \mid x, \doop(T{=}t), z) = f(y \mid x, \doop(T{=}t))\) |
Why the do-calculus formulation is preferred. The exclusion restriction in the do-calculus column reads: \[f(y \mid x, \doop(T{=}t), z) = f(y \mid x, \doop(T{=}t)).\] This is a statement about the interventional density — the distribution of \(Y\) after we have set \(T = t\) by do-surgery. The do-operator makes it impossible to confuse this with the observational statement \(f(y \mid x, T{=}t, z) = f(y \mid x, T{=}t)\), which is a much weaker condition.
7.2.3 Relevance
Graphically, relevance means \(Z\) and \(T\) are not d-separated in \(\mathcal{G}\). The conditional covariance \(\mathrm{Cov}(Z, T \mid X)\) and the first-stage \(F\)-statistic are empirical diagnostics for the observable association; the causal claim that \(Z\) shifts \(T\) is design-based.
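For a single instrument, the first-stage \(F\)-statistic is just the squared \(t\)-statistic from regressing \(T\) on \(Z\). A minimal sketch on simulated data — the DGP here is hypothetical, with \(\pi = 0.3\) chosen to match one row of the lab in Section 7.5:

```python
import numpy as np

rng = np.random.default_rng(3)
n, pi = 500, 0.30                      # hypothetical first-stage strength
z = rng.standard_normal(n)
t = pi * z + rng.standard_normal(n)    # first stage: T = pi*Z + eta

# OLS of T on Z (intercept handled by centering); with one instrument,
# the first-stage F is the squared t-statistic on pi_hat
zc = z - z.mean()
tc = t - t.mean()
pi_hat = (zc @ tc) / (zc @ zc)
resid = tc - pi_hat * zc
se = np.sqrt(resid @ resid / (n - 2) / (zc @ zc))
F = (pi_hat / se) ** 2
print(round(F, 1))   # F concentrates near n*pi**2*Var(Z)/tau**2 + 1, about 46 here
```

The diagnostic is a sample quantity: it measures the observable \(Z\)–\(T\) association, not the causal claim that \(Z\) shifts \(T\).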
7.2.4 Exogeneity
Graphically, exogeneity requires that, conditional on \(X\), the instrument shares no common cause with the latent determinants of the outcome: \(Z \indep U \mid X\). In the structural linear model, the same requirement appears as \(\E[Z\varepsilon \mid X] = 0\), because the structural residual \(\varepsilon\) is a function of \(U\). This moment condition is a consequence of the graphical assumption, not a definition of exogeneity. A researcher who adopts the moment condition as a primitive has no guarantee that \(Z\) is free of back-door paths to the outcome.
Exogeneity is untestable: the unobserved confounder \(U\) is, by definition, unobserved.
7.2.5 Exclusion
Exclusion is a causal restriction, not a conditional-correlation restriction. Adding a direct arrow \(Z \to Y\) to the DAG is exactly the formal counterpart of exclusion failing. In the mutilated graph \(\mathcal{G}_{\overline{T}}\), the path \(Z \to Y\) would remain open. Rule 1 of the do-calculus, which would allow removing \(Z\) from the density of \(Y\) given \(\doop(T{=}t)\), no longer applies.
7.3 Identification in the Linear Homogeneous-Effect Model
7.3.1 The Linear Structural Model
Consider the linear structural model: \[Y = \alpha + \beta T + \gamma^\top X + \varepsilon, \tag{7.1}\] \[T = \pi Z + \delta^\top X + \eta, \tag{7.2}\] where \(\varepsilon\) and \(\eta\) are structural errors with \(\mathrm{Cov}(\varepsilon, \eta) = \rho\sigma\tau \neq 0\). The non-zero covariance is the source of endogeneity: OLS applied to Equation 7.1 gives a biased estimator of \(\beta\). The reduced form substitutes Equation 7.2 into Equation 7.1: \[Y = \alpha + \beta\pi Z + (\beta\delta + \gamma)^\top X + (\beta\eta + \varepsilon).\] The reduced-form coefficient on \(Z\) is \(\beta\pi\): the total effect of the instrument on the outcome. Dividing by the first-stage coefficient \(\pi\) recovers \(\beta\), provided \(\pi \neq 0\).
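The substitution step can be checked symbolically. A minimal sketch (covariates \(X\) omitted for brevity):

```python
import sympy as sp

alpha, beta, pi = sp.symbols("alpha beta pi")
Z, eps, eta = sp.symbols("Z epsilon eta")

# structural equations (7.1)-(7.2) without covariates
T = pi * Z + eta
Y = sp.expand(alpha + beta * T + eps)   # reduced form for Y

print(Y.coeff(Z))                       # beta*pi: reduced-form coefficient on Z
print(sp.simplify(Y.coeff(Z) / pi))     # beta: divide out the first stage
```

The printout confirms the ratio logic: the reduced-form coefficient is \(\beta\pi\), and dividing by the first-stage coefficient \(\pi\) isolates \(\beta\).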
7.3.2 The OLS Bias
The probability limit of the OLS estimator is: \[\mathrm{plim}\; \hat\beta_{\mathrm{OLS}} = \beta + \frac{\mathrm{Cov}(T, \varepsilon)}{\mathrm{Var}(T)} = \beta + \frac{\rho\sigma\tau}{\pi^2 \mathrm{Var}(Z) + \tau^2}.\] The bias is zero only if \(\rho = 0\) (no unobserved confounding) or if the instrument perfectly determines \(T\). The direction of the bias depends on the sign of \(\rho\).
7.3.3 Derivation of the Wald Estimand
The derivation has three ingredients: the first stage (how much \(Z\) moves \(T\)), the reduced form (how much \(Z\) moves \(Y\)), and the exclusion restriction (any effect of \(Z\) on \(Y\) must operate through \(T\)).
- Exogeneity (\(Z \indep U \mid X\)) implies \(\E[\varepsilon \mid Z, X] = 0\).
- Exclusion (no \(Z\) term in the outcome equation) combined with exogeneity yields the moment condition \(\E[\varepsilon \cdot Z \mid X] = 0\).
- Multiplying Equation 7.1 by \((Z - \E[Z \mid X])\) and applying the moment condition from the previous step: \(\mathrm{Cov}(Y, Z \mid X) = \beta \cdot \mathrm{Cov}(T, Z \mid X)\), which is the reduced-form decomposition: the \(Z\)–\(Y\) covariance is entirely attributable to the causal path \(Z \to T \to Y\).
- Relevance (\(\mathrm{Cov}(Z, T \mid X) \neq 0\)) ensures the denominator is non-zero:
\[\beta = \frac{\mathrm{Cov}(Y,\, Z \mid X)}{\mathrm{Cov}(T,\, Z \mid X)}. \tag{7.3}\]
In the binary instrument case (\(Z \in \{0,1\}\)) without covariates, this simplifies to the Wald estimand: \[\beta = \frac{\E[Y \mid Z{=}1] - \E[Y \mid Z{=}0]}{\E[T \mid Z{=}1] - \E[T \mid Z{=}0]}.\]
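The Wald ratio can be verified on simulated data with a binary instrument. A minimal sketch under a hypothetical DGP with true \(\beta = 1\) and confounded errors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.integers(0, 2, n)                       # binary instrument
eta = rng.standard_normal(n)
eps = 0.8 * eta + 0.6 * rng.standard_normal(n)  # Cov(eps, eta) = 0.8: endogeneity
t = 0.5 * z + eta                               # first stage, pi = 0.5
y = 1.0 * t + eps                               # true beta = 1

wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())
ols = np.cov(y, t)[0, 1] / np.var(t, ddof=1)
print(round(wald, 2))   # close to the true beta = 1
print(round(ols, 2))    # biased upward by Cov(T, eps)/Var(T)
```

The Wald ratio recovers \(\beta\) despite the confounded error, while OLS does not.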
7.4 Why the IV Assumptions Matter
Each assumption is load-bearing, and each failure mode produces a distinct, quantifiable distortion of the Wald estimand.
When relevance fails. If \(\pi = 0\), the Wald estimand is undefined. When \(\pi\) is small but nonzero, the instrument is weak. The estimator’s variance diverges as \(\pi \to 0\), and finite-sample bias pulls the IV estimate toward the OLS estimate at a rate proportional to \(1/F\), where \(F\) is the first-stage \(F\)-statistic.
When exclusion fails. If \(Y = \alpha + \beta T + \delta Z + \gamma^\top X + \varepsilon\) with \(\delta \neq 0\), the Wald estimand converges to \(\beta + \delta/\pi\). The bias \(\delta/\pi\) is amplified by a weak first stage: a small direct effect combined with a weak instrument can produce large bias. This is why a weak instrument with a plausible exclusion violation is not “nearly valid” — it may be severely misleading.
When exogeneity fails. If \(\mathrm{Cov}(Z, \varepsilon \mid X) \neq 0\), the Wald estimand converges to \(\beta + \mathrm{Cov}(\varepsilon, Z)/\mathrm{Cov}(T, Z)\). Again the bias is amplified by weak instruments.
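The \(\beta + \delta/\pi\) formula for an exclusion violation can be checked numerically. A sketch with hypothetical values \(\beta = 1\), \(\delta = 0.2\), \(\pi = 0.5\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000
beta, delta, pi = 1.0, 0.2, 0.5
z = rng.standard_normal(n)
eta = rng.standard_normal(n)
eps = 0.8 * eta + 0.6 * rng.standard_normal(n)
t = pi * z + eta
y = beta * t + delta * z + eps        # exclusion fails: direct effect delta of Z on Y

wald = np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]
print(round(wald, 2))                 # near beta + delta/pi = 1.4, not beta = 1
```

Halving \(\pi\) in this sketch would double the bias term \(\delta/\pi\): the same small direct effect is twice as damaging with a first stage half as strong.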
| Assumption | Directly testable? | Basis for assessment |
|---|---|---|
| Relevance | Association testable; causal claim design-based | First-stage \(F\)-statistic; the causal \(Z \to T\) link rests on the design |
| Exogeneity | No | Institutional knowledge; randomization (if available); placebo regressions on pre-determined outcomes |
| Exclusion | No in just-identified case; partially in overidentified case | Institutional argument; overidentification test (\(J\)-test, Chapter 13) checks mutual consistency but cannot confirm all instruments are valid (Kitagawa 2015) |
7.5 Lab: OLS vs. IV Across Instrument Strengths
This lab verifies the bias formulas numerically and traces the bias–variance tradeoff across the full range of instrument strength. It studies estimator behavior conditional on the IV assumptions being true; it does not address whether a proposed instrument is valid in an applied study. The data-generating process follows Equations 7.1–7.2 with \(\beta = 1\), \(\mathrm{Var}(Z) = \tau^2 = 1\), and \(\rho\sigma\tau = 0.8\); only \(\pi\) varies.
Estimators. OLS regresses \(Y\) on \(T\): \(\hat\beta_{\mathrm{OLS}} = \mathrm{Cov}(Y, T)/\mathrm{Var}(T)\), converging to \(1 + 0.8/(\pi^2+1)\). IV uses the Wald estimator: \(\hat\beta_{\mathrm{IV}} = \mathrm{Cov}(Y, Z)/\mathrm{Cov}(T, Z)\), consistent for \(\beta\) for any \(\pi \neq 0\).
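A condensed sketch of one column of the experiment, assuming the DGP implied by the bias formula (\(\beta = 1\), \(\sigma = \tau = 1\), \(\rho = 0.8\)), shown here for \(\pi = 0.5\):

```python
import numpy as np

rng = np.random.default_rng(2024)
beta, pi, n, B = 1.0, 0.5, 500, 2000

ols_est = np.empty(B)
iv_est = np.empty(B)
for b in range(B):
    z = rng.standard_normal(n)
    eta = rng.standard_normal(n)
    eps = 0.8 * eta + 0.6 * rng.standard_normal(n)  # Cov(eps, eta) = 0.8
    t = pi * z + eta
    y = beta * t + eps
    ols_est[b] = np.cov(y, t)[0, 1] / np.var(t, ddof=1)
    iv_est[b] = np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]

print(round(ols_est.mean(), 2))   # near 1 + 0.8/(pi**2 + 1) = 1.64
print(round(iv_est.mean(), 2))    # near the true beta = 1.0
```

Re-running with the other values of \(\pi\) reproduces the pattern in the table below; the exact draws differ from the published run, so individual cells match only approximately.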
Results (\(n = 500\), \(B = 2{,}000\) replications, seed 2024):
| \(\pi\) | \(F\) | Theory bias | OLS mean | OLS RMSE | IV mean | IV RMSE |
|---|---|---|---|---|---|---|
| 0.00 | 1 | +0.800 | 1.801 | 0.802 | — | — |
| 0.10 | 6 | +0.792 | 1.791 | 0.792 | 0.872 | 4.393 |
| 0.15 | 13 | +0.782 | 1.782 | 0.783 | 0.764 | 6.491 |
| 0.20 | 21 | +0.769 | 1.769 | 0.770 | 0.940 | 0.385 |
| 0.30 | 46 | +0.734 | 1.735 | 0.735 | 0.975 | 0.165 |
| 0.50 | 127 | +0.640 | 1.639 | 0.640 | 0.996 | 0.091 |
| 1.00 | 500 | +0.400 | 1.401 | 0.402 | 0.999 | 0.045 |
| 2.00 | 2006 | +0.160 | 1.160 | 0.161 | 1.000 | 0.022 |
Lesson 1: The OLS bias formula is exact. The theory bias \(0.8/(\pi^2+1)\) matches the simulated OLS bias to four decimal places across all eight values of \(\pi\). OLS is biased in the direction of \(\rho\) at every value of \(\pi\), including \(\pi = 0\).
Lesson 2: IV is consistent but has catastrophically heavy tails when the instrument is weak. At \(\pi = 0.10\) (\(F \approx 6\)) and \(\pi = 0.15\) (\(F \approx 13\)), the IV mean is far from the true value. The IV median is \(\approx 1.01\) at both values, confirming consistency — the mean is dragged off by a small fraction of replications in which the first stage is near zero. Standard deviations of \(4.4\) and \(6.5\) make these estimates useless.
Lesson 3: The RMSE crossover occurs near \(F \approx 20\). IV first beats OLS on RMSE at \(\pi = 0.20\) (\(F \approx 21\)): \(0.385 < 0.770\). The Staiger–Stock rule of thumb (\(F \geq 10\)) is slightly too lenient: at \(F \approx 13\), IV RMSE is still \(6.5\), nearly \(8\times\) OLS. A more conservative threshold of \(F \geq 20\)–\(25\) is needed.
Lesson 4: A strong instrument eliminates both problems. At \(\pi = 1.00\) (\(F \approx 500\)), IV RMSE \(= 0.045\) while OLS RMSE \(= 0.402\) — a 9-fold improvement from IV. OLS efficiency is illusory: its small variance is offset by a large, persistent bias.
7.6 Multiple Instruments and Overidentification
When there are exactly as many instruments as endogenous variables (\(q = p\)), the model is just-identified. When \(q > p\), the model is overidentified: the extra instruments impose additional moment restrictions. Under the homogeneous-effect linear model, every valid instrument must imply the same structural coefficient \(\beta\), so those extra restrictions are testable. Under heterogeneous treatment effects, valid instruments can legitimately identify different LATEs because they shift treatment for different complier populations.
The order condition (\(q \geq p\), counting requirement) and the rank condition (instruments are linearly independent in the first stage) are both necessary for identification. The Sargan–Hansen \(J\)-test (Sargan 1958; Hansen 1982) formalizes the mutual-consistency check; rejection indicates at least one instrument either violates exogeneity or exclusion, or identifies a different LATE, but does not localize which. The test statistic and asymptotic distribution are derived in Chapter 13.
7.7 Heterogeneous Treatment Effects and the LATE Framework
7.7.1 Compliance Types
Classify units by their potential treatments \((T_i(0), T_i(1))\): compliers (\(T(0){=}0,\, T(1){=}1\)) take treatment only when encouraged; always-takers (\(1,1\)) and never-takers (\(0,0\)) ignore the instrument; defiers (\(1,0\)) do the opposite of their encouragement. The instrument \(Z\) only shifts treatment for compliers: always-takers and never-takers have the same treatment status regardless of \(Z\), so they contribute nothing to the denominator \(\E[T \mid Z{=}1] - \E[T \mid Z{=}0]\).
7.7.2 The Monotonicity Assumption
Monotonicity is not a generic law of causal inference. It is a design-specific claim about how this particular instrument changes treatment behavior. Switching the instrument from 0 to 1 may induce some units to take treatment (compliers) and leave others unaffected (always-takers or never-takers), but it should not reverse anyone’s treatment decision.
7.7.3 The LATE Theorem
Under relevance, exogeneity, exclusion, and monotonicity, the Wald estimand identifies the local average treatment effect (Imbens and Angrist 1994): \[\frac{\E[Y \mid Z{=}1] - \E[Y \mid Z{=}0]}{\E[T \mid Z{=}1] - \E[T \mid Z{=}0]} = \E[Y(1) - Y(0) \mid T(1) > T(0)] = \tau_{\mathrm{LATE}}.\]
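The theorem's content — under the three IV assumptions plus monotonicity, the Wald ratio equals the complier average effect — can be verified in a simulation with known compliance types. A sketch with hypothetical shares and effects (30% compliers with \(\tau = 6\), 25% always-takers with \(\tau = 3\), 45% never-takers with \(\tau = 1\)):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
u = rng.random(n)
is_co = u < 0.30                      # compliers: T(0)=0, T(1)=1
is_at = (u >= 0.30) & (u < 0.55)      # always-takers: T(0)=T(1)=1
# remainder are never-takers: T(0)=T(1)=0
z = rng.integers(0, 2, n)
t = np.where(is_at, 1, np.where(is_co, z, 0))        # monotone take-up, no defiers
tau = np.where(is_co, 6.0, np.where(is_at, 3.0, 1.0))  # heterogeneous effects
y = rng.standard_normal(n) + tau * t

num = y[z == 1].mean() - y[z == 0].mean()
den = t[z == 1].mean() - t[z == 0].mean()
print(round(den, 2))        # near P(complier) = 0.30
print(round(num / den, 1))  # near the complier effect 6, not the ATE
```

The denominator recovers the complier share, and the ratio recovers the complier average effect even though always-takers and never-takers have very different effects.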
7.8 Interpreting IV Estimands
7.8.1 What the Two Frameworks Say
| | Framework 1 (linear, homogeneous) | Framework 2 (heterogeneous effects) |
|---|---|---|
| Key assumption | \(\tau_i = \beta\) for all \(i\) | Monotonicity; no defiers |
| What IV identifies | \(\beta = \text{ATE} = \text{ATT} = \text{LATE}\) | \(\tau_{\mathrm{LATE}} = \E[\tau_i \mid \text{complier}]\) |
| Estimand depends on instrument? | No (same \(\beta\) regardless of \(Z\)) | Yes (different \(Z\) \(\Rightarrow\) different compliers \(\Rightarrow\) different LATE) |
| Identifies the ATE? | Yes, automatically | Only if all units are compliers or effects are homogeneous |
Framework 1 is a special case of Framework 2: when \(\tau_i = \beta\) for all \(i\), the LATE equals the ATE equals \(\beta\). In applied work, the default interpretation of the Wald estimand is the LATE; the ATE interpretation requires the additional homogeneity argument of Framework 1.
7.8.2 When Does LATE Equal ATE?
LATE equals ATE only under additional structure, most notably treatment-effect homogeneity. To see this, decompose the ATE by compliance type: \[\text{ATE} = P(\text{co})\,\E[\tau_i \mid \text{co}] + P(\text{at})\,\E[\tau_i \mid \text{at}] + P(\text{nt})\,\E[\tau_i \mid \text{nt}].\] The LATE equals only the first term divided by its probability weight. ATE and LATE coincide if and only if: (1) mean treatment effects are equal across compliance types, or (2) everyone is a complier, or (3) the average effects for always-takers and never-takers happen to equal the LATE — an untestable coincidence.
7.8.3 Different Instruments, Different Estimands
Because the LATE is specific to the complier population, and different instruments select different complier populations, two valid instruments for the same treatment can legitimately identify different LATEs. This is informative about treatment effect heterogeneity, not a contradiction.
7.8.4 The Policy Relevance of LATE
For many policy questions, the LATE is exactly the right estimand. If a policy is designed to encourage a subset of the population to take treatment, then the effect on compliers (those who respond to the encouragement) is precisely what the policy-maker wants to know. When the ATE over the full population is required, IV alone is insufficient under heterogeneous effects; additional assumptions or a second instrument are needed to extrapolate from the LATE to the ATE.
7.9 Practical Guidance on Defending an IV Design
A researcher proposing an instrument should be able to answer five questions explicitly:
- What exactly is the instrument? Specify \(Z\) precisely: its source of variation, the level at which it varies, and the population to which it applies.
- Why does it shift treatment? Articulate the causal mechanism by which \(Z\) moves \(T\). The first-stage \(F\)-statistic is a diagnostic for instrument strength, not a substitute for a causal account of the \(Z \to T\) link.
- Why is it as-if random relative to latent outcome determinants? The most credible sources are designed randomization (lotteries, randomized encouragement), natural experiments, and shift-share designs (Bartik 1991; Goldsmith-Pinkham et al. 2020). Placebo regressions on pre-determined outcomes provide partial evidence.
- Why can it affect the outcome only through treatment? Exclusion is untestable in just-identified models. A useful diagnostic: how large would the direct effect \(\delta\) have to be, relative to \(\pi\), to overturn the estimated causal effect? When the first stage is weak, the answer is: not very large.
- What population margin does it shift? Identify the complier population. This determines the LATE that is being identified and governs the external validity of the estimates.
7.10 Applied Example: Charter School Lotteries and the KIPP Lynn Study
This example illustrates a canonical randomized-encouragement IV design. The lottery randomizes offer status \(Z\), not actual treatment \(S\) (years of KIPP attendance). Winning the lottery does not force a student to attend KIPP, and losing does not make later attendance impossible. Thus the lottery offer is the instrument and actual attendance is the endogenous treatment.
Setting. KIPP (Knowledge Is Power Program) schools follow a “No Excuses” model: extended school days, longer academic year, selective teacher hiring, and strict behavioral norms. KIPP Academy Lynn was substantially oversubscribed beginning in 2005. Massachusetts law requires oversubscribed charter schools to select students by lottery, so the school conducted randomized admissions lotteries from 2005 through 2008. The outcome \(Y_{igt}\) is the student’s standardized score on the Massachusetts MCAS, normalized to mean zero and standard deviation one within each subject–grade–year cell statewide.
Mapping the three assumptions (Angrist et al. 2012).
Relevance. Lottery winners were offered a seat and about 80% accepted; losers rarely enrolled elsewhere at KIPP. The first-stage regression yields a coefficient of approximately \(1.2\): lottery winners had spent about 1.2 more years at KIPP than lottery losers at the time of each MCAS exam. The first-stage \(F\)-statistic is far above conventional thresholds.
Exogeneity. Offer status was determined by randomly drawn lottery-sequence numbers, so independence holds by design. A joint test of covariate balance yields \(p = 0.615\), consistent with no pre-lottery differences.
Exclusion. The offer merely provides access to a school; it does not itself deliver instruction. One potential violation is a discouragement effect: losing the lottery might demoralize students. The authors address this by noting that scores of lottery losers are typical of demographically comparable students in Lynn, inconsistent with large discouragement effects. Random assignment of the instrument does not, by itself, imply exclusion — it only guarantees exogeneity.
First stage, reduced form, and 2SLS. The model is just-identified (one excluded instrument per endogenous variable), so the 2SLS estimator of \(\theta\) (effect per year at KIPP) equals the ratio of the reduced-form coefficient on \(Z_i\) to the first-stage coefficient \(\pi\).
| Subject | First stage | Reduced form | 2SLS |
|---|---|---|---|
| Math | 1.221 (0.068) | 0.430 (0.067) | 0.352 (0.053) |
| ELA | 1.228 (0.068) | 0.164 (0.073) | 0.133 (0.059) |
Standard errors clustered at the student level. \(N = 833\) student-by-test observations.
Each year at KIPP raises math scores by approximately \(0.35\sigma\) and ELA scores by approximately \(0.13\sigma\). The reduced-form estimate for math (\(0.43\sigma\)) is larger than the 2SLS estimate (\(0.35\sigma\)) because the first stage exceeds 1: lottery winners accumulated somewhat more than one additional year at KIPP per unit of follow-up time.
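The just-identified ratio structure can be checked directly from the published coefficients (up to rounding of the reported values):

```python
# Published first-stage and reduced-form coefficients (Angrist et al. 2012)
first_stage = {"Math": 1.221, "ELA": 1.228}    # years at KIPP per lottery offer
reduced_form = {"Math": 0.430, "ELA": 0.164}   # score gain (sigma) per lottery offer

for subject in first_stage:
    tsls = reduced_form[subject] / first_stage[subject]
    print(subject, round(tsls, 3))
# Math ~0.352; ELA ~0.134 (the table's 0.133 reflects unrounded inputs)
```

Because the model is just-identified, 2SLS is exactly this ratio; no separate estimation machinery is needed to reproduce the third column.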
LATE interpretation. The 2SLS estimand \(\theta\) is not the effect of KIPP on all students in Lynn, nor on all applicants — it is the average per-year treatment effect for the lottery compliers. Compliance is partial in both directions: some lottery winners do not enroll and some lottery losers eventually find entry.
Treatment effect heterogeneity. Reading gains (\(\approx 0.13\sigma\) overall) are driven almost entirely by students classified as having limited English proficiency (LEP, \(\approx 0.43\sigma\)) and special education needs (SPED, \(\approx 0.27\sigma\)); non-LEP, non-SPED students show negligible ELA gains. Math effects are large and positive across all subgroups but are largest for LEP and lower-achieving students. This connects directly to Section 7.8: different instruments would identify distinct subpopulation LATEs; the overall 2SLS estimate is a weighted average of subgroup LATEs with weights proportional to each subgroup’s share of the complier population.
7.11 IV versus Back-Door Adjustment
| Dimension | Back-door / propensity score | Instrumental variables |
|---|---|---|
| Core assumption | All confounders observed: \((Y(0),Y(1)) \indep T \mid X\) | Valid instrument: relevance, exogeneity, exclusion |
| Unobserved confounders | Fatal: back-door adjustment fails | Permitted: IV routes around \(U\) |
| Estimand | ATE, ATT, or ATC; all coincide under homogeneity | LATE (compliers only); reduces to common \(\beta\) under homogeneous effects |
| Testability | Unconfoundedness is untestable; overlap is testable | Relevance testable; exogeneity and exclusion untestable (just-identified) |
| Main threat | Unmeasured confounder | Exclusion restriction violation |
| Identifies ATE? | Yes, under strong ignorability | Only under homogeneous effects |
Complementary failure modes. Back-door adjustment fails when \(X\) does not capture all confounders. IV fails when the exclusion restriction is violated: the bias in the Wald estimand is \(\delta/\pi\), amplified by weak instruments. The two failures are orthogonal: back-door adjustment requires many observed covariates but tolerates no unobserved ones, while IV tolerates unobserved confounders but requires an instrument with no direct effect on the outcome.
When both strategies are available. The Hausman (1978) endogeneity test compares OLS and IV: under the null that \(T\) is exogenous given \(X\), both are consistent, and a large discrepancy is evidence of endogeneity. Under the null, an appropriately scaled quadratic form in \(\hat\beta_{\mathrm{IV}} - \hat\beta_{\mathrm{OLS}}\) is asymptotically \(\chi^2_p\). Disagreement does not by itself tell us which method is wrong: the two strategies typically target different estimands (ATE vs. LATE), and disagreement can arise from legitimate effect heterogeneity rather than failure of either assumption.
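In the scalar case the Hausman contrast reduces to \(H = (\hat\beta_{\mathrm{IV}} - \hat\beta_{\mathrm{OLS}})^2 / (\hat V_{\mathrm{IV}} - \hat V_{\mathrm{OLS}})\), approximately \(\chi^2_1\) under the null. A sketch with a hypothetical, strongly endogenous DGP, using a common residual-variance estimate (one standard simplification):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
z = rng.standard_normal(n)
eta = rng.standard_normal(n)
eps = 0.8 * eta + 0.6 * rng.standard_normal(n)  # endogeneity: Cov(eps, eta) = 0.8
t = 1.0 * z + eta
y = 1.0 * t + eps                               # true beta = 1

b_ols = np.cov(y, t)[0, 1] / np.var(t, ddof=1)
b_iv = np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]

# homoskedastic variance estimates (scalar slope, intercept ignored),
# residual variance taken at the consistent (IV) estimate
s2 = np.var(y - b_iv * t, ddof=1)
v_ols = s2 / (n * np.var(t, ddof=1))
v_iv = s2 * np.var(z, ddof=1) / (n * np.cov(t, z)[0, 1] ** 2)

H = (b_iv - b_ols) ** 2 / (v_iv - v_iv * 0 - v_ols)
print(H > 3.84)   # True here: the contrast rejects exogeneity at the 5% level
```

With this much endogeneity the statistic is enormous; under an exogenous DGP (\(\mathrm{Cov}(\varepsilon,\eta)=0\)) it would be small. As the text notes, a rejection does not by itself say which estimator is wrong.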
7.12 Chapter Summary
| Symbol | Meaning |
|---|---|
| \(Z\) | Instrument |
| \(\pi\) | First-stage coefficient: \(\E[\partial T/\partial Z]\) |
| \(\rho\) | Endogeneity: \(\mathrm{Cov}(\varepsilon, \eta)/(\sigma\tau)\) |
| \(\tau_{\mathrm{LATE}}\) | \(\E[Y(1) - Y(0) \mid T_i(1) > T_i(0)]\) |
| Wald estimand | \((\E[Y\mid Z{=}1]-\E[Y\mid Z{=}0])/(\E[T\mid Z{=}1]-\E[T\mid Z{=}0])\) |
| Reduced form | Total effect of \(Z\) on \(Y\) |
| First stage | Effect of \(Z\) on \(T\) |
- IV identifies effects from exogenous treatment variation. When back-door adjustment fails because an unobserved \(U\) creates a path \(T \leftarrow U \to Y\), a valid instrument \(Z\) identifies the causal effect by exploiting only the component of treatment variation that \(Z\) induces. IV does not block the confounding path — it avoids it.
- Three assumptions, ordered by testability. Relevance can be assessed with the first-stage \(F\)-statistic (though the \(F\)-statistic is a sample diagnostic). Exogeneity and exclusion must be defended by institutional knowledge, design logic, and causal structure. Each violated assumption produces a distinct, quantifiable bias, amplified by weak instruments.
- Framework 1: homogeneous-effect SEM \(\Rightarrow\) Wald identifies \(\beta\). Under constant treatment effects and the linear structural model, the Wald estimand identifies \(\beta = \text{ATE} = \text{ATT} = \text{LATE}\).
- Framework 2: heterogeneity + monotonicity \(\Rightarrow\) Wald identifies LATE. Under heterogeneous treatment effects and monotonicity, the Wald estimand identifies the average treatment effect for compliers only — those whose treatment status changes with the instrument, a latent subgroup defined by \((T(0), T(1))\).
- Different instruments identify different effects. The LATE depends on the instrument through the complier population it selects. This is informative about treatment effect heterogeneity, not a contradiction. LATE equals ATE only under additional structure.
- IV versus back-door adjustment. Complementary failure modes: back-door fails when confounders are unobserved; IV fails when the exclusion restriction is violated or the instrument is not truly exogenous.
- Estimation deferred to Chapter 13. This chapter establishes what IV identifies and under what assumptions. How the Wald ratio is estimated from finite data — reduced-form regression, two-stage least squares, asymptotic inference, and overidentification tests — is the subject of Chapter 13.
7.13 Problems
1. The three IV assumptions in three languages. Consider the DAG: \(\{Z \to T,\; T \to Y,\; U \to T,\; U \to Y,\; X \to T,\; X \to Y,\; X \to Z\}\) with \(U\) unobserved.
- List all back-door paths from \(T\) to \(Y\). Does \(X\) alone satisfy the back-door criterion? Explain.
- Verify the three IV assumptions using d-separation: (i) Relevance: show \(Z\) and \(T\) are not d-separated in \(\mathcal{G}\). (ii) Exogeneity: show \(Z \indep U \mid X\) in \(\mathcal{G}\). (iii) Exclusion: show \(Y \indep Z \mid T, X\) in \(\mathcal{G}_{\overline{T}}\).
- Now add the arrow \(Z \to Y\) to the DAG. Which IV assumption is violated? Show explicitly which step of the Wald derivation in Section 7.3.3 breaks down.
- Translate each of the three IV assumptions into the structural language: write the equations for \(T\) and \(Y\) and identify which coefficient restriction corresponds to each assumption.
2. Bias under assumption violations. Let \(Y = \beta T + \varepsilon\) and \(T = \pi Z + \eta\) with \(\E[\varepsilon \mid Z] = 0\) and \(\pi \neq 0\).
- Starting from \(\E[Y \mid Z{=}1] - \E[Y \mid Z{=}0]\), substitute the structural equation for \(Y\) and simplify. What role does exogeneity play?
- Show that \(\E[T \mid Z{=}1] - \E[T \mid Z{=}0] = \pi\) in the linear first-stage model.
- Derive the Wald estimand and confirm it equals \(\beta\).
- Now suppose the exclusion restriction fails and \(Y = \beta T + \delta Z + \varepsilon\) with \(\delta \neq 0\). Derive the probability limit of the Wald estimator and confirm the bias formula from Section 7.4.
- Suppose instead that exogeneity fails: \(\E[\varepsilon \mid Z] = cZ\) for some constant \(c \neq 0\). Derive the probability limit of the Wald estimator and express the bias in terms of \(c\) and \(\pi\). Compare the structure of this bias with the exclusion violation bias.
3. Order, rank, and the limits of overidentification. Consider a model with one endogenous variable \(T\) and two instruments \(Z_1\) and \(Z_2\), both satisfying exogeneity and exclusion.
- State the order condition and verify it is satisfied.
- State the rank condition. What would it mean geometrically if the rank condition failed — i.e., if \(Z_1\) and \(Z_2\) were perfectly collinear in the first-stage regression?
- Explain intuitively why having two valid instruments rather than one should improve estimation precision.
- Now suppose \(Z_1\) is valid but \(Z_2\) violates the exclusion restriction. Under what conditions does the Sargan–Hansen \(J\)-test have power to detect \(Z_2\)’s invalidity? Under what conditions does the test fail?
- Why does passing the \(J\)-test not confirm that both \(Z_1\) and \(Z_2\) are valid? Give a concrete example in which both instruments are invalid and the \(J\)-test has no power.
4. Compliance types and the LATE. In a binary-instrument, binary-treatment study, suppose the population has: 30% compliers with average treatment effect \(\tau_c = 6\); 25% always-takers with \(\tau_a = 3\); 45% never-takers with \(\tau_n = 1\); no defiers.
- Compute \(P(\text{complier}) = \E[T \mid Z{=}1] - \E[T \mid Z{=}0]\).
- Compute the ATE as a weighted average of \(\tau_c\), \(\tau_a\), \(\tau_n\) with appropriate weights.
- The Wald estimand equals \(\tau_c = 6\). By how much does this overstate the ATE, and why?
- A second study uses a different binary instrument \(Z'\) with a complier population of 50% and a LATE of 2. Is this contradictory? What can you infer about the relative treatment effect in the two complier populations?
- Explain, using compliance type language, why the denominator of the Wald estimand equals \(P(\text{complier})\).
5. The exclusion restriction: plausibility and violations. Evaluate the exclusion restriction for each proposed instrument. For each, state (i) whether the restriction is plausible and why; (ii) a specific mechanism by which it could be violated; and (iii) whether the violation would bias the IV estimate upward or downward.
- Instrument: rainfall in the home region of a politician, used as an instrument for government infrastructure spending. Outcome: local economic growth.
- Instrument: distance to the nearest hospital, used as an instrument for hospital admission. Outcome: 30-day mortality.
- Instrument: a randomly assigned financial incentive to enroll in a health screening program. Outcome: health status two years later.
- Instrument: lottery number in the Vietnam-era draft lottery, used as an instrument for military service. Outcome: lifetime earnings. [This is the Angrist (1990) study; discuss why this instrument is widely regarded as satisfying the exclusion restriction.]
6. IV versus back-door adjustment. A researcher studies the effect of job training (\(T\)) on earnings (\(Y\)). Two strategies are available: (A) a rich set of pre-treatment covariates \(X\) and a propensity-score estimator; (B) a lottery that randomly selected units to be offered training (not required to attend), used as instrument \(Z\).
- Under what assumption does strategy (A) identify the ATE? What specific unobserved variable would most plausibly violate this assumption?
- Strategy (B) identifies a LATE. Describe the complier population in words. Is the LATE likely to be larger or smaller than the ATE in this setting? Explain.
- Both strategies are implemented and yield estimates of $1,800 and $2,400 per year, respectively. Describe a Hausman-type test that uses both estimates. Under what null hypothesis does the test have an approximate \(\chi^2\) distribution?
- If the two estimates differ significantly, which strategy would you trust more and why? What additional evidence would help distinguish the two explanations (endogeneity bias in (A) versus LATE \(\neq\) ATE in (B))?