13 Estimation under Instrumental Variables
13.1 From Identification to Estimation
Chapter 7 established what IV identifies and under what assumptions. This chapter shows how to estimate that target from finite data using sample analogs of the same orthogonality restrictions.
Chapter 7 showed that when an instrument \(Z\) satisfies relevance, exogeneity, and exclusion, the residualized-covariance ratio \(\mathrm{Cov}(\tilde Z, Y)/\mathrm{Cov}(\tilde Z, T)\) is a well-defined function of observable quantities but does not identify any specific causal parameter without a fourth structural assumption. This chapter works within the linear constant-effect model: \[Y = \alpha + \beta T + \gamma^\top X + \varepsilon, \tag{13.1}\] \[T = a_T + \pi Z + \delta^\top X + \eta, \tag{13.2}\] where the fourth structural assumption is the constant-effect restriction: \(Y_i(1) - Y_i(0) = \beta\) for all \(i\). Under the three core assumptions plus this restriction, the structural coefficient \(\beta\) is identified by the Wald formula: \[\beta = \frac{\mathrm{Cov}(\tilde Z,\, Y)}{\mathrm{Cov}(\tilde Z,\, T)}.\]
This chapter addresses the estimation problem: how to recover \(\beta\) from a finite sample. The answer is not a single formula but a family of estimators — the Wald estimator, two-stage least squares (2SLS), and the generalized method of moments (GMM) — each of which is a sample analog of the same underlying orthogonality restriction.
Chapter structure. Core material (Section 13.2–Section 13.7) develops the mainstream IV estimators and their asymptotic theory. Advanced enrichment (Section 13.8–Section 13.9) introduces GEL and the control function approach.
13.2 The Wald Estimator and the IV Regression Estimator
13.2.1 The Wald Estimator
We begin with the simplest setting: a binary instrument \(Z \in \{0,1\}\), scalar treatment \(T\), scalar outcome \(Y\), and no additional covariates \(X\). Let \(\bar{Y}_z\) and \(\bar{T}_z\) denote the sample means of \(Y\) and \(T\) within the subsample with \(Z_i = z\). The Wald estimator is the ratio of mean contrasts: \[\hat\beta_{\mathrm{Wald}} = \frac{\bar{Y}_1 - \bar{Y}_0}{\bar{T}_1 - \bar{T}_0}. \tag{13.3}\]
The numerator \(\bar{Y}_1 - \bar{Y}_0\) estimates the reduced form: the total effect of the instrument on the outcome. The denominator \(\bar{T}_1 - \bar{T}_0\) estimates the first stage: the effect of the instrument on treatment uptake. Their ratio recovers the effect of treatment on the outcome by attributing all of the instrument’s effect on \(Y\) to the path \(Z \to T \to Y\) — valid precisely because the exclusion restriction rules out any direct path \(Z \to Y\).
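As a minimal numerical sketch of this logic on simulated data (the DGP here, including the true effect \(\beta = 2\) and first-stage coefficient \(0.5\), is an illustrative assumption, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0                     # assumed true constant effect

Z = rng.integers(0, 2, n)                  # binary instrument
eta = rng.normal(size=n)                   # first-stage error
eps = 0.8 * eta + rng.normal(size=n)       # structural error, correlated with eta
T = 0.5 * Z + eta                          # first stage (pi = 0.5)
Y = 1.0 + beta * T + eps                   # structural outcome equation

# Wald estimator: reduced-form contrast over first-stage contrast
num = Y[Z == 1].mean() - Y[Z == 0].mean()  # estimates the reduced form
den = T[Z == 1].mean() - T[Z == 0].mean()  # estimates the first stage
beta_wald = num / den

# OLS of Y on T is inconsistent here because Cov(eps, eta) > 0
beta_ols = np.cov(T, Y)[0, 1] / np.var(T)
```

The OLS slope is pulled away from \(\beta\) by the correlation between \(\varepsilon\) and \(\eta\); the Wald ratio is not, because it uses only the variation in \(T\) induced by \(Z\).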
13.2.2 The IV Regression Estimator with Covariates
When observed covariates \(X \in \mathbb{R}^p\) are present, the Wald estimator is no longer applicable. We derive the general IV estimator from the estimating-equation principle.
Notation. Stack observations into \(\mathbf{Y}, \mathbf{T}, \mathbf{Z} \in \mathbb{R}^n\) and \(\mathbf{X} \in \mathbb{R}^{n \times p}\). Let \(\bar{\mathbf{X}} = [\mathbf{1}_n,\,\mathbf{X}]\) and define the annihilator matrix: \[M_X = I_n - \bar{\mathbf{X}}\bigl(\bar{\mathbf{X}}^\top\bar{\mathbf{X}}\bigr)^{-1}\bar{\mathbf{X}}^\top, \tag{13.4}\] the orthogonal projection onto the orthogonal complement of \(\mathrm{col}(\bar{\mathbf{X}})\).
Derivation via constrained normal equations. The structural model imposes: \[\E\!\left[\begin{pmatrix} 1 \\ X_i \\ Z_i \end{pmatrix}\varepsilon_i\right] = 0, \qquad \E[T_i\,\varepsilon_i] \neq 0. \tag{13.5}\]
Consider the sample normal equations one would obtain by running OLS of \(\mathbf{Y}\) on \((\mathbf{T}, \bar{\mathbf{X}}, \mathbf{Z})\), with residual \(\hat{\mathbf{e}} = \mathbf{Y} - \mathbf{T}\beta - \bar{\mathbf{X}}\bar\gamma\):
\[\mathbf{T}^\top \hat{\mathbf{e}} = \mathbf{0}, \quad (\text{NE-T}) \quad \bar{\mathbf{X}}^\top \hat{\mathbf{e}} = \mathbf{0}, \quad (\text{NE-X}) \quad \mathbf{Z}^\top \hat{\mathbf{e}} = \mathbf{0}. \quad (\text{NE-Z})\]
At the true \(\beta\), (NE-T) fails because \(\E[T_i\varepsilon_i] \neq 0\). We drop the invalid (NE-T) and solve the \((p+2)\)-dimensional system (NE-X), (NE-Z). From (NE-X): \(\hat{\bar\gamma} = (\bar{\mathbf{X}}^\top\bar{\mathbf{X}})^{-1}(\bar{\mathbf{X}}^\top\mathbf{Y} - \bar{\mathbf{X}}^\top\mathbf{T}\beta)\). Substituting into (NE-Z) and simplifying using \(M_X\): \(\mathbf{Z}^\top M_X \mathbf{T}\,\beta = \mathbf{Z}^\top M_X \mathbf{Y}\). Solving yields the IV regression estimator: \[\hat\beta_{\mathrm{IV}} = \frac{\mathbf{Z}^\top M_X \mathbf{Y}}{\mathbf{Z}^\top M_X \mathbf{T}}. \tag{13.6}\]
The Wald estimator is the special case \(Z \in \{0,1\}\), no covariates: \(M_X = I_n - n^{-1}\mathbf{1}\mathbf{1}^\top\) is the centering matrix, and \(\hat\beta_{\mathrm{IV}} = \hat\beta_{\mathrm{Wald}}\).
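A short sketch of the IV regression estimator with a covariate, on simulated data (all parameter values are assumptions of this illustration). Since \(M_X\) is a symmetric idempotent projection, \(\mathbf{Z}^\top M_X \mathbf{Y} = (M_X\mathbf{Z})^\top(M_X\mathbf{Y})\), so we can residualize each variable on \(\bar{\mathbf{X}}\) instead of forming the \(n \times n\) matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, gamma = 50_000, 2.0, 1.5          # assumed true values

X = rng.normal(size=n)
Z = 0.3 * X + rng.normal(size=n)           # instrument may be correlated with X
eta = rng.normal(size=n)
eps = 0.7 * eta + rng.normal(size=n)
T = 0.8 * Z + 0.5 * X + eta
Y = 1.0 + beta * T + gamma * X + eps

Xbar = np.column_stack([np.ones(n), X])    # [1_n, X]

def residualize(v):
    # applies the annihilator M_X without materializing the n x n matrix
    coef, *_ = np.linalg.lstsq(Xbar, v, rcond=None)
    return v - Xbar @ coef

Zt, Tt, Yt = residualize(Z), residualize(T), residualize(Y)
beta_iv = (Zt @ Yt) / (Zt @ Tt)            # Z' M_X Y / Z' M_X T
```
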
13.2.3 Structural Form, Reduced Form, and the Reduced Form Regression
The structural form. The system Equation 13.1–Equation 13.2 is the structural form: each equation describes how one variable is determined by others, including endogenous variables on the right-hand side. The structural coefficient \(\beta\) has a causal interpretation, but \(T\) is correlated with \(\varepsilon\), so OLS applied to the structural form is inconsistent.
The reduced form. The reduced form is obtained by solving the structural system so each endogenous variable is expressed purely as a function of exogenous variables \(Z\) and \(X\). Substituting Equation 13.2 into Equation 13.1: \[Y = \underbrace{(\alpha + \beta a_T)}_{\alpha_{\mathrm{rf}}} + \underbrace{\beta\pi}_{\phi}\, Z + \underbrace{(\beta\delta + \gamma)^\top}_{\gamma_{\mathrm{rf}}^\top}\, X + \underbrace{(\beta\eta + \varepsilon)}_{\nu}. \tag{13.7}\]
Writing compactly: \(Y = \alpha_{\mathrm{rf}} + \phi Z + \gamma_{\mathrm{rf}}^\top X + \nu\), where \(\phi = \beta\pi\), \(\nu = \beta\eta + \varepsilon\). Both right-hand-side variables (\(Z\) and \(X\)) are exogenous, so OLS is consistent for \(\phi\). Neither \(\beta\) nor \(\pi\) is separately identified from the reduced form alone.
Relationship to the first stage and IV estimator. The reduced-form regression estimator is \[\hat\phi_{\mathrm{RF}} = \frac{\widehat{\mathrm{Cov}}(\tilde{Z}, Y)}{\widehat{\mathrm{Var}}(\tilde{Z})}, \tag{13.8}\] and the first-stage regression estimator is \(\hat\pi_{\mathrm{FS}} = \widehat{\mathrm{Cov}}(\tilde{Z}, T)/\widehat{\mathrm{Var}}(\tilde{Z})\). Since \(\phi = \beta\pi\), the sample analog gives: \[\hat\beta_{\mathrm{IV}} = \frac{\hat\phi_{\mathrm{RF}}}{\hat\pi_{\mathrm{FS}}}, \tag{13.9}\] reproducing Equation 13.6. The reduced form delivers the instrument’s total effect on the outcome; the first stage scales it by the instrument’s effect on treatment; the ratio recovers the structural parameter.
Intent-to-treat interpretation. When \(Z\) is randomly assigned, \(\hat\phi_{\mathrm{RF}}\) estimates the intent-to-treat (ITT) effect: the average effect on \(Y\) of being assigned \(Z = 1\) rather than \(Z = 0\), regardless of actual treatment uptake. The ITT requires only exogeneity of \(Z\), not the exclusion restriction, and is often of direct policy interest.
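The reduced-form/first-stage ratio can be checked numerically on simulated data (DGP values are illustrative assumptions; with a binary regressor, each OLS slope equals the corresponding difference in group means, so the ratio reproduces the Wald estimator exactly):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, pi = 200_000, 2.0, 0.5            # assumed true values; phi = beta * pi = 1

Z = rng.integers(0, 2, n)
eta = rng.normal(size=n)
eps = 0.8 * eta + rng.normal(size=n)
T = pi * Z + eta
Y = 1.0 + beta * T + eps

Zc = Z - Z.mean()
phi_rf = (Zc @ Y) / (Zc @ Zc)              # reduced-form slope: OLS of Y on Z (the ITT)
pi_fs = (Zc @ T) / (Zc @ Zc)               # first-stage slope: OLS of T on Z
beta_iv = phi_rf / pi_fs                   # ratio recovers the structural parameter

# identical to the Wald contrast of group means
beta_wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (T[Z == 1].mean() - T[Z == 0].mean())
```
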
13.3 Two-Stage Least Squares
Two-stage least squares (2SLS) extends the Wald estimator to settings with continuous or multi-valued instruments, multiple instruments, and observed covariates.
Stage 1: The First-Stage Regression. Regress \(T\) on \(Z\) and \(X\) by OLS, obtaining fitted values \(\hat{T}_i = \hat{a}_T + \hat\pi^\top Z_i + \hat\delta^\top X_i\). In matrix form, \(\hat{\mathbf{T}} = P_W \mathbf{T}\) where \(P_W\) is the projection onto the column space of the instrument design matrix \(\mathbf{W}\). The fitted values \(\hat{T}_i\) isolate the component of treatment variation spanned by \((1, Z, X)\) — the variation that is exogenous under IV validity.
Stage 2: The Second-Stage Regression. Regress \(Y\) on \(\hat{T}\) and \(X\) by OLS. With \(\hat{\mathbf{D}} = [\mathbf{1}_n,\,\hat{\mathbf{T}},\,\mathbf{X}]\), the second-stage OLS coefficient vector is \[\hat\theta_{\mathrm{2SLS}} = \bigl(\hat{\mathbf{D}}^\top\hat{\mathbf{D}}\bigr)^{-1}\hat{\mathbf{D}}^\top\mathbf{Y}, \tag{13.10}\] whose component on \(\hat{T}\) is \(\hat\beta_{\mathrm{2SLS}}\).
The component of \(T\) orthogonal to \((Z, X)\) — the residual \(\hat\eta_i = T_i - \hat{T}_i\), correlated with \(\varepsilon_i\) when \(T\) is endogenous — is dropped before the causal coefficient is estimated.
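The two stages can be sketched directly with ordinary least squares (simulated data; all DGP values are assumptions of this illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta = 50_000, 2.0                      # assumed true effect

X = rng.normal(size=n)
Z = rng.normal(size=n)
eta = rng.normal(size=n)
eps = 0.7 * eta + rng.normal(size=n)
T = 0.8 * Z + 0.5 * X + eta
Y = 1.0 + beta * T + 1.5 * X + eps

# Stage 1: OLS of T on (1, Z, X); fitted values T_hat = P_W T
W = np.column_stack([np.ones(n), Z, X])
T_hat = W @ np.linalg.lstsq(W, T, rcond=None)[0]

# Stage 2: OLS of Y on (1, T_hat, X); the coefficient on T_hat is beta_2sls
D_hat = np.column_stack([np.ones(n), T_hat, X])
theta = np.linalg.lstsq(D_hat, Y, rcond=None)[0]
beta_2sls = theta[1]
```

Note that the endogenous residual component \(\hat\eta_i = T_i - \hat{T}_i\) never enters the second stage: it is discarded in Stage 1.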
13.4 Equivalence of the IV Regression Estimator and 2SLS
In the scalar-instrument linear model, Wald, IV regression, and 2SLS are not competing methods; they are different representations of the same sample analog of the identification formula.
13.5 The Moment-Condition View and GMM
13.5.1 2SLS as a Method-of-Moments Estimator
Stack the constant, covariates, and instrument into \(W = (1,\,X^\top,\,Z^\top)^\top\). The IV moment condition is \(\E[W\,(Y - \alpha - \beta T - \gamma^\top X)] = 0\). Setting \(\theta = (\alpha, \beta, \gamma^\top)^\top\) and defining \(U(O;\,\theta) = W\,(Y - D^\top\theta)\) where \(D = (1, T, X^\top)^\top\), the identifying condition is \(\E\{U(O;\,\theta_0)\} = 0\) — exactly an estimating equation in the sense of Chapter 10.
13.5.2 Overidentification and GMM
When \(q > 1\) instruments are available, the model is overidentified: more moment conditions than parameters. The system \(\mathbb{P}_n U(O;\,\theta) = 0\) is generically overdetermined and has no exact solution.
The generalized method of moments (GMM) minimizes a weighted quadratic form in the sample moments: \[\hat\theta_{\mathrm{GMM}} = \arg\min_{\theta}\;\bigl[\mathbb{P}_n U(O;\,\theta)\bigr]^\top\,\hat\Omega_n\,\bigl[\mathbb{P}_n U(O;\,\theta)\bigr].\]
Different choices of \(\hat\Omega_n\) yield different estimators: \(\hat\Omega_n = (n^{-1}\sum_i W_i W_i^\top)^{-1}\) yields 2SLS; \(\hat\Omega_n = [\mathbb{P}_n U(O;\,\hat\theta)U(O;\,\hat\theta)^\top]^{-1}\) yields the efficient GMM estimator. Under homoskedasticity, efficient GMM and 2SLS coincide. Under heteroskedasticity, efficient GMM is weakly (and generically strictly) more efficient.
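A compact sketch of linear GMM with two instruments on simulated data (DGP values are illustrative assumptions). For the linear moment \(U = W(Y - D^\top\theta)\), the minimizer of the quadratic form has the closed form \(\hat\theta = (\hat A^\top\Omega\hat A)^{-1}\hat A^\top\Omega\,\hat b\) with \(\hat A = n^{-1}\mathbf{W}^\top\mathbf{D}\), \(\hat b = n^{-1}\mathbf{W}^\top\mathbf{Y}\):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 100_000, 2.0                     # assumed true effect

Z1, Z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments: overidentified
eta = rng.normal(size=n)
eps = 0.7 * eta + rng.normal(size=n)
T = 0.6 * Z1 + 0.4 * Z2 + eta
Y = 1.0 + beta * T + eps

W = np.column_stack([np.ones(n), Z1, Z2])  # m = 3 moment conditions
D = np.column_stack([np.ones(n), T])       # k = 2 parameters

def gmm(Omega):
    # closed-form minimizer of the GMM quadratic form for linear moments
    A = W.T @ D / n
    b = W.T @ Y / n
    return np.linalg.solve(A.T @ Omega @ A, A.T @ Omega @ b)

# 2SLS weighting matrix
theta_2sls = gmm(np.linalg.inv(W.T @ W / n))
# efficient weighting: inverse moment variance at first-step residuals
e = Y - D @ theta_2sls
Sigma_hat = (W * e[:, None] ** 2).T @ W / n
theta_eff = gmm(np.linalg.inv(Sigma_hat))
```
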
13.6 Asymptotic Theory of the GMM Estimator
13.6.1 Setup and Notation
Stack the structural regressors and instruments: \[D_i = (1,\, T_i,\, X_i^\top)^\top \in \mathbb{R}^k, \qquad W_i = (1,\, X_i^\top,\, Z_i^\top)^\top \in \mathbb{R}^m,\] where \(k = p + 2\) and \(m = p + 1 + q\). The IV moment condition is \(\E[U(O_i;\,\theta_0)] = 0\), \(U(O_i;\,\theta) = W_i(Y_i - D_i^\top\theta)\). Define the sensitivity matrix \(A = \E[W_i D_i^\top] \in \mathbb{R}^{m \times k}\) and moment variance matrix \(\Sigma = \E[\varepsilon_i^2\, W_i W_i^\top]\).
13.6.2 Asymptotic Distribution
Under standard regularity conditions, with weighting matrix \(\hat\Omega_n \xrightarrow{p} \Omega\) positive definite, \[\sqrt{n}\bigl(\hat\theta_{\mathrm{GMM}} - \theta_0\bigr) \xrightarrow{d} N\bigl(0,\ V_{\mathrm{GMM}}(\Omega)\bigr), \qquad V_{\mathrm{GMM}}(\Omega) = (A^\top\Omega A)^{-1}\,A^\top\Omega\,\Sigma\,\Omega A\,(A^\top\Omega A)^{-1}. \tag{13.11}\] The formula is an instance of the general estimating-equation theory from Chapter 10: the sensitivity matrix \(A\) plays the role of \(-\E[\partial U/\partial\theta^\top]\) and the moment variance \(\Sigma\) plays the role of \(\E[UU^\top]\).
13.6.3 Two Important Special Cases
Exactly identified case (\(q = 1\), \(m = k\)). \(A\) is square and invertible. All weighting matrices yield the same estimator. The sandwich variance simplifies to: \[V_{\mathrm{IV}} = A^{-1}\,\Sigma\,(A^\top)^{-1}. \tag{13.12}\] In the scalar no-covariate case (\(p = 0\)), the block corresponding to \(\beta\) is: \[V_\beta = \frac{\E[\varepsilon^2 \tilde{Z}^2]}{(\E[\tilde{Z}\,T])^2}, \qquad \tilde{Z} = Z - \E[Z].\] Under homoskedasticity and first-stage relation \(T = a_T + \pi Z + \eta\): \(V_\beta = \sigma^2/(\pi^2\,\mathrm{Var}(Z))\).
Efficient GMM. The asymptotic variance \(V_{\mathrm{GMM}}(\Omega)\) is minimized by \(\Omega^\ast = \Sigma^{-1}\), giving: \[V_{\mathrm{eff}} = \bigl(A^\top\Sigma^{-1}A\bigr)^{-1}. \tag{13.13}\] This is the semiparametric efficiency lower bound for IV estimation within the linear IV moment-restriction model \(\E[W\varepsilon] = 0\).
13.6.4 Consistent Variance Estimation
Let \(\hat\varepsilon_i = Y_i - D_i^\top\hat\theta\) denote the structural residuals. Consistent estimators: \(\hat{A} = n^{-1}\sum_i W_i D_i^\top\), \(\hat\Sigma = n^{-1}\sum_i \hat\varepsilon_i^2\, W_i W_i^\top\). The heteroskedasticity-robust sandwich variance estimator is: \[\hat{V}_{\mathrm{GMM}} = (\hat{A}^\top\hat\Omega_n\hat{A})^{-1}\,\hat{A}^\top\hat\Omega_n\,\hat\Sigma\,\hat\Omega_n\hat{A}\,(\hat{A}^\top\hat\Omega_n\hat{A})^{-1}. \tag{13.14}\]
In the exactly identified case, Equation 13.14 reduces to \(\hat{A}^{-1}\hat\Sigma(\hat{A}^\top)^{-1}\), independent of \(\hat\Omega_n\). In applications, the default should be heteroskedasticity-robust standard errors; cluster-robust standard errors are required when observations within groups share unmodeled common shocks.
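A minimal sketch of the sandwich variance estimator in the exactly identified case, on simulated data (all DGP values are assumptions of this illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta = 100_000, 2.0                     # assumed true effect

Z = rng.normal(size=n)
eta = rng.normal(size=n)
eps = 0.7 * eta + rng.normal(size=n)
T = 0.5 * Z + eta
Y = 1.0 + beta * T + eps

W = np.column_stack([np.ones(n), Z])       # instruments: m = k = 2, exactly identified
D = np.column_stack([np.ones(n), T])       # structural regressors

# Exactly identified IV: solve the sample moment equations W'(Y - D theta) = 0
theta = np.linalg.solve(W.T @ D, W.T @ Y)
e = Y - D @ theta                          # structural residuals (not second-stage residuals)

A_hat = W.T @ D / n
Sigma_hat = (W * e[:, None] ** 2).T @ W / n
A_inv = np.linalg.inv(A_hat)
V_hat = A_inv @ Sigma_hat @ A_inv.T        # exactly identified sandwich
se_beta = np.sqrt(V_hat[1, 1] / n)         # heteroskedasticity-robust SE for beta_hat
```
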
13.7 Weak Instruments and Inferential Fragility
When the first-stage relationship is weak, IV estimation becomes severely fragile. A weak instrument is not merely an efficiency problem: it makes the finite-sample distribution of IV estimators highly non-normal, magnifies bias toward OLS, and undermines conventional confidence intervals.
13.7.1 The Weak-Instrument Problem
The 2SLS closed form Equation 13.10 divides by the sample covariance \(\widehat{\mathrm{Cov}}(\tilde{Z}, T)\). When the population first-stage coefficient \(\pi\) is near zero, the consequences are (Bound et al. 1995):
- Finite-sample bias toward OLS. As \(\pi \to 0\), the 2SLS bias approaches the OLS bias rather than zero.
- Non-Gaussian finite-sample distribution. The distribution of \(\hat\beta_{\mathrm{2SLS}}\) can be highly skewed or heavy-tailed, rendering the \(N(0,\,V_{\mathrm{2SLS}})\) approximation unreliable.
- Size distortion. Wald-type confidence intervals can severely undercover the true parameter.
13.7.2 Diagnostic: The First-Stage \(F\)-Statistic
The most widely used diagnostic is the \(F\)-statistic from the first-stage regression, testing the joint significance of \(Z\) after partialling out \(X\). Staiger and Stock (1997) argued informally for \(F \geq 10\) as adequate instrument strength; Stock and Yogo (2005) provided formal critical values. A large first-stage \(F\) supports relevance, but says nothing directly about exogeneity or exclusion. The first-stage \(F\)-statistic is a relevance diagnostic, not a certificate of instrument validity.
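With a single instrument the first-stage \(F\) is the squared \(t\)-statistic on \(Z\) after partialling out \((1, X)\). A sketch contrasting a strong and a weak first stage on simulated data (coefficients \(0.5\) and \(0.01\) are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000

X = rng.normal(size=n)
Z = rng.normal(size=n)
eta = rng.normal(size=n)
T_strong = 0.5 * Z + 0.3 * X + eta         # strong first stage
T_weak = 0.01 * Z + 0.3 * X + eta          # nearly irrelevant instrument

def first_stage_F(T):
    # F-statistic for H0: coefficient on Z is zero, after partialling out (1, X)
    Xbar = np.column_stack([np.ones(n), X])
    Zt = Z - Xbar @ np.linalg.lstsq(Xbar, Z, rcond=None)[0]   # residualized Z
    Tt = T - Xbar @ np.linalg.lstsq(Xbar, T, rcond=None)[0]
    pi_hat = (Zt @ Tt) / (Zt @ Zt)
    resid = Tt - pi_hat * Zt
    se = np.sqrt((resid @ resid) / (n - 3) / (Zt @ Zt))       # homoskedastic SE
    return (pi_hat / se) ** 2              # one instrument: F = t^2
```
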
13.7.3 Weak-Instrument-Robust Inference
When first-stage strength is uncertain, alternative inferential procedures with size guarantees are needed.
The Anderson–Rubin (AR) test (Anderson and Rubin 1949) inverts the question: it tests whether a hypothesized value \(\beta_0\) is consistent with the IV moment condition. Substituting \(\beta_0\) into the structural model gives \(Y - \beta_0 T = \alpha + \gamma^\top X + (\varepsilon + (\beta - \beta_0)T)\). If \(\beta_0 = \beta\), the composite error is uncorrelated with \(Z\) by instrument validity. The AR test regresses \(Y - \beta_0 T\) on \(Z\) and \(X\) and tests the null that the coefficient on \(Z\) is zero. Under the classical homoskedastic Gaussian linear model the \(F\)-statistic has an exact finite-sample \(F\)-distribution regardless of instrument strength. Inverting this test yields a confidence set valid whether or not the instrument is weak.
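A sketch of AR test inversion on simulated data (no covariates; the asymptotic \(\chi^2_1\) critical value 3.84 is used rather than the exact \(F\); all DGP values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta = 5_000, 2.0                       # assumed true effect

Z = rng.normal(size=n)
eta = rng.normal(size=n)
eps = 0.8 * eta + rng.normal(size=n)
T = 0.5 * Z + eta
Y = 1.0 + beta * T + eps

def ar_stat(beta0):
    # regress Y - beta0*T on (1, Z); Wald statistic for the Z coefficient
    U = Y - beta0 * T
    Zc = Z - Z.mean()
    g = (Zc @ U) / (Zc @ Zc)               # slope on Z
    r = (U - U.mean()) - g * Zc            # OLS residuals
    se = np.sqrt((r @ r) / (n - 2) / (Zc @ Zc))
    return (g / se) ** 2                   # approx chi2_1 under H0: beta0 = beta

# invert the test over a grid: keep the beta0 values that are not rejected
grid = np.linspace(0.0, 4.0, 401)
mask = np.array([ar_stat(b) < 3.84 for b in grid])
ar_ci = grid[mask]                         # AR confidence set
```

Values far from the truth (such as \(\beta_0 = 0\)) produce a large statistic and are excluded; the resulting set remains valid even when the first stage is weak.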
The conditional likelihood ratio (CLR) test of Moreira (2003) extends the AR idea more efficiently to the multiple-instrument case.
13.8 Generalized Empirical Likelihood
GEL uses the same IV moment restrictions but incorporates them through an implied reweighting of the sample rather than through a separately estimated covariance weight matrix. Section 13.6 showed that efficient GMM requires a two-step procedure; this two-step structure introduces finite-sample bias (Newey and Smith 2004). GEL provides a one-step alternative.
13.8.1 The GEL Estimator
Let \(U_i(\theta) = W_i(Y_i - D_i^\top\theta)\) and \(\bar{U}(\theta) = n^{-1}\sum_i U_i(\theta)\). GEL introduces a strictly convex function \(G\colon \mathcal{V} \to \mathbb{R}\) (open interval \(\mathcal{V}\) containing zero) and solves the saddle-point problem: \[\hat\theta_{\mathrm{GEL}} = \arg\min_{\theta}\;\sup_{\lambda \in \Lambda_n(\theta)}\;\frac{1}{n}\sum_{i=1}^n [-G(\lambda^\top U_i(\theta))], \tag{13.15}\] where \(\lambda \in \mathbb{R}^m\) is an auxiliary dual variable. \(G\) is normalized: \(G(0) = 0\), \(g(0) = 1\), \(g'(0) = 1\) where \(g = G'\).
13.8.2 The Convex-Conjugate Duality (Optional)
The GEL saddle-point problem Equation 13.15 is the Lagrangian dual of a minimum-discrepancy (MD) primal problem that re-weights observations. Define the Legendre–Fenchel conjugate of \(G\): \[F(\omega) = \sup_{v \in \mathcal{V}}\,[\omega v - G(v)], \tag{13.16}\] a strictly convex function with \(F(1) = 0\) (by the normalization). The MD estimator minimizes a convex divergence between the observation weights \(\omega_i\) and the uniform reference \(\omega_i = 1\), subject to the moment constraint: \[\hat\theta_{\mathrm{MD}} = \arg\min_{\theta}\;\min_{\omega_1,\dots,\omega_n}\;\frac{1}{n}\sum_{i=1}^n F(\omega_i) \quad \text{subject to} \quad \frac{1}{n}\sum_{i=1}^n \omega_i\,U_i(\theta) = 0. \tag{13.17}\]
The GEL special cases correspond to different divergences: EL uses reverse KL; ET uses forward KL; CUE uses a quadratic distance.
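For CUE, \(G(v) = v + v^2/2\), the inner supremum has a closed form and the profile objective reduces to \(\frac{1}{2}\bar U(\theta)^\top[n^{-1}\sum_i U_iU_i^\top]^{-1}\bar U(\theta)\). A sketch on simulated overidentified data, minimizing that profile by grid search (the intercept-free DGP and all parameter values are assumptions of this illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta = 20_000, 2.0                      # assumed true effect

Z1, Z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments
eta = rng.normal(size=n)
eps = 0.7 * eta + rng.normal(size=n)
T = 0.6 * Z1 + 0.4 * Z2 + eta
Y = beta * T + eps                         # no intercept: theta is scalar

W = np.column_stack([Z1, Z2])

def cue_objective(b):
    # profile GEL objective for G(v) = v + v^2/2 (the CUE):
    #   (1/2) Ubar' [n^{-1} sum_i U_i U_i']^{-1} Ubar
    U = W * (Y - b * T)[:, None]           # U_i(b) = W_i (Y_i - b T_i)
    Ubar = U.mean(axis=0)
    S = U.T @ U / n
    return 0.5 * Ubar @ np.linalg.solve(S, Ubar)

grid = np.linspace(1.5, 2.5, 1001)
beta_cue = grid[np.argmin([cue_objective(b) for b in grid])]
```

Unlike two-step GMM, the weighting matrix here is recomputed at every candidate \(b\) ("continuously updated") rather than fixed at a first-step estimate.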
13.8.3 Asymptotic Properties and Comparison with GMM
All GEL estimators are consistent and asymptotically normal with the efficient GMM variance \(V_{\mathrm{eff}}\), without a preliminary weighting step. The profiled GEL objective also supplies an overidentification statistic asymptotically equivalent to the \(J\)-statistic; for EL specifically, \(T_{\mathrm{EL}} = -2\sum_i \log(n\hat\pi_i)\), the empirical likelihood ratio statistic, where \(\hat\pi_i\) are the implied observation probabilities. To higher order, GEL estimators have smaller bias than two-step GMM: GEL eliminates the bias from estimating the Jacobian; EL additionally eliminates bias from estimating the weighting matrix \(\Sigma\).
13.9 The Control Function Approach
The control-function approach offers an alternative route to handling endogeneity: instead of projecting treatment onto instruments, it augments the outcome model with a control variable that absorbs the endogenous component of treatment selection.
2SLS achieves identification by replacing the endogenous regressor \(T\) with its exogenous projection \(\hat{T}\). The control function approach adds the first-stage residual to the outcome regression as an explicit control for the endogenous variation, rather than removing it from the treatment variable.
13.9.1 Linear Model and Equivalence to 2SLS
The linear control-function representation requires the linear control-function assumption: \[\E[\varepsilon \mid \eta, Z, X] = \rho\,\eta, \qquad \rho = \frac{\mathrm{Cov}(\varepsilon, \eta)}{\mathrm{Var}(\eta)}. \tag{13.18}\]
This holds under joint normality of \((\varepsilon, \eta)\) given \((Z, X)\). Defining \(\xi = \varepsilon - \rho\eta\), assumption Equation 13.18 is equivalent to \(\E[\xi \mid \eta, Z, X] = 0\). Substituting into the outcome equation: \(Y = \alpha + \beta T + \gamma^\top X + \rho\eta + \xi\). Including \(\eta\) as an additional regressor renders \(T\) exogenous in the augmented regression.
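A sketch of the two-step control function procedure on simulated data (all DGP values, including \(\rho = 0.6\), are illustrative assumptions), checking the numerical equivalence to 2SLS:

```python
import numpy as np

rng = np.random.default_rng(9)
n, beta = 10_000, 2.0                      # assumed true effect

Z = rng.normal(size=n)
eta = rng.normal(size=n)
eps = 0.6 * eta + rng.normal(size=n)       # rho = Cov(eps, eta)/Var(eta) = 0.6
T = 0.5 * Z + eta
Y = 1.0 + beta * T + eps

# First stage: OLS of T on (1, Z); keep the residual eta_hat
W = np.column_stack([np.ones(n), Z])
eta_hat = T - W @ np.linalg.lstsq(W, T, rcond=None)[0]

# Control function: OLS of Y on (1, T, eta_hat); eta_hat absorbs the endogeneity
Dcf = np.column_stack([np.ones(n), T, eta_hat])
coef = np.linalg.lstsq(Dcf, Y, rcond=None)[0]
beta_cf, rho_hat = coef[1], coef[2]

# 2SLS for comparison: second stage on the fitted values T_hat = T - eta_hat
T_hat = T - eta_hat
beta_2sls = np.linalg.lstsq(np.column_stack([np.ones(n), T_hat]), Y, rcond=None)[0][1]
```

The coefficients on \(T\) agree exactly: the column spaces \((1, T, \hat\eta)\) and \((1, \hat T, \hat\eta)\) coincide, and \(\hat\eta\) is orthogonal to \((1, \hat T)\) by construction.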
13.9.2 Testing Endogeneity via Residual Inclusion
The control function representation yields a natural test of \(H_0: \rho = 0\) (exogeneity of \(T\)). The \(t\)-statistic on \(\hat\eta\) in the augmented regression tests endogeneity. A rejection suggests endogeneity under the maintained instrument assumptions; it is not a stand-alone validation of the instrument, since the test takes instrument validity as given. This is the regression-based form of the Hausman (1978) endogeneity test.
13.9.3 Brief Note on Nonlinear Extensions
2SLS does not carry over. In a probit or Poisson outcome model, 2SLS is generally inconsistent: plugging in \(\hat T\) breaks the nonlinear link function.
Related control-variable methods exist. Imbens and Newey (2009) showed that under independence, \((\varepsilon, \eta) \indep Z \mid X\), and scalar monotonicity, \(T = h(Z, X, \eta)\) strictly monotonic in a scalar \(\eta\), the conditional CDF: \[V = F_{T \mid Z, X}(T \mid Z, X) \tag{13.19}\] is a valid control variable in the sense that \(T \indep \varepsilon \mid X, V\). Conditioning on \((X, V)\) recovers structural variation in \(T\), allowing identification of the average structural function. Note that \(X\) must be retained in the conditioning set; dropping it gives the stronger \(T \indep \varepsilon \mid V\), which fails whenever \(X\) has any direct effect on \(\varepsilon\).
13.10 Chapter Summary
| Symbol | Meaning |
|---|---|
| \(\hat\beta_{\mathrm{Wald}}\) | Wald estimator: \((\bar{Y}_1 - \bar{Y}_0)/(\bar{T}_1 - \bar{T}_0)\) |
| \(\hat\phi_{\mathrm{RF}}\) | Reduced form regression estimator of \(\phi = \beta\pi\) |
| \(\hat\pi_{\mathrm{FS}}\) | First-stage regression estimator of \(\pi\) |
| \(M_X\) | Annihilator matrix (within-\(X\) residuals) |
| \(\hat\beta_{\mathrm{IV}}\) | IV regression estimator Equation 13.6 |
| \(\hat\beta_{\mathrm{2SLS}}\) | 2SLS estimator Equation 13.10 |
| \(V_{\mathrm{GMM}}(\Omega)\) | Sandwich variance Equation 13.11 |
| \(V_{\mathrm{eff}}\) | Efficient GMM variance Equation 13.13 |
| \(\hat{V}_{\mathrm{GMM}}\) | Consistent variance estimator Equation 13.14 |
| \(J\)-statistic | Sargan–Hansen overidentification test |
| \(\hat\theta_{\mathrm{GEL}}\) | GEL estimator Equation 13.15 |
- Wald, IV regression, and 2SLS are one estimator. The Wald estimator, IV regression estimator, and 2SLS are numerically identical in the single-instrument case: they are different representations of the same sample analog of the identification formula. 2SLS extends to multiple instruments; the Wald estimator is the further special case \(Z \in \{0,1\}\), \(X\) absent.
- Structural form vs. reduced form. The structural form contains endogenous regressors; the reduced form expresses each endogenous variable as a function of exogenous variables. The reduced form coefficient \(\phi = \beta\pi\) is estimable by OLS; \(\beta\) is recovered only via the ratio \(\hat\phi_{\mathrm{RF}}/\hat\pi_{\mathrm{FS}}\).
- 2SLS as method-of-moments. 2SLS is the method-of-moments estimator for the IV orthogonality condition \(\E[W\varepsilon] = 0\), an instance of Chapter 10’s estimating-equation framework.
- GMM and efficient weighting. Efficient GMM achieves the semiparametric efficiency bound; 2SLS is efficient under homoskedasticity but not generally under heteroskedasticity. All standard errors should be heteroskedasticity-robust; cluster-robust when observations within groups share common shocks.
- Weak instruments. Weak instruments cause finite-sample bias toward OLS, non-Gaussian distributions, and size distortion. The first-stage \(F\)-statistic is a relevance diagnostic, not a validity certificate. Anderson–Rubin confidence sets provide weak-instrument-robust inference.
- GEL. GEL estimators achieve the efficient GMM variance in one step without a preliminary weighting step. To higher order, they have smaller bias than two-step GMM in overidentified models.
- Control function. In the linear model, the control function approach is numerically equivalent to 2SLS; it provides a direct test of endogeneity via the \(t\)-statistic on \(\hat\eta\). In nonlinear models, related control-variable methods exist under the stronger independence and scalar-monotonicity conditions of Imbens and Newey (2009).
13.11 Problems
1. The Wald estimator as a ratio-of-moments estimator.
- Augment \(\beta\) with an intercept \(\alpha\) and express the structural model as the solution to the two-dimensional moment condition \(\E[(1, Z)^\top(Y - \alpha - \beta T)] = 0\). Verify exact identification and solve to recover \(\beta = \Delta_Y/\Delta_T\), where \(\Delta_Y = \E[Y \mid Z = 1] - \E[Y \mid Z = 0]\) and \(\Delta_T = \E[T \mid Z = 1] - \E[T \mid Z = 0]\).
- Using \(\bar{Y}_z \xrightarrow{p} \mu_Y(z)\) and \(\bar{T}_z \xrightarrow{p} \mu_T(z)\), prove \(\hat\beta_{\mathrm{Wald}} \xrightarrow{p} \beta\) via the continuous mapping theorem.
- Apply the delta method to \((\hat\Delta_Y, \hat\Delta_T)^\top\) to show \(\sqrt{n}(\hat\beta_{\mathrm{Wald}} - \beta) \xrightarrow{d} N(0, V)\) where \(V = \Delta_T^{-2}\sum_{z \in \{0,1\}}\mathrm{Var}(Y_i - \beta T_i \mid Z_i = z)/p_z\), and confirm this matches the IV variance formula Equation 13.12 in the scalar no-covariate case.
2. Matrix form of 2SLS and why second-stage standard errors are wrong.
Let \(\mathbf{D} \in \mathbb{R}^{n \times k}\) be the full regressor matrix and \(\mathbf{W} \in \mathbb{R}^{n \times m}\) the full instrument matrix. Let \(P_\mathbf{W} = \mathbf{W}(\mathbf{W}^\top\mathbf{W})^{-1}\mathbf{W}^\top\).
- Show the 2SLS estimator can be written as \(\hat\theta_{\mathrm{2SLS}} = (\mathbf{D}^\top P_\mathbf{W}\mathbf{D})^{-1}\mathbf{D}^\top P_\mathbf{W}\mathbf{Y}\).
- In the single-instrument, no-covariate case, verify from the matrix formula that \(\hat\beta_{\mathrm{2SLS}} = \hat\beta_{\mathrm{IV}}\).
- The second-stage OLS uses \(\hat{\mathbf{D}}\) in place of \(\mathbf{D}\). Let \(\hat{\boldsymbol\varepsilon}_{\mathrm{2nd}} = \mathbf{Y} - \hat{\mathbf{D}}\hat\theta_{\mathrm{2SLS}}\). Show \(\hat{\boldsymbol\varepsilon}_{\mathrm{2nd}} \neq \mathbf{Y} - \mathbf{D}\hat\theta_{\mathrm{2SLS}}\) in general. Explain why this discrepancy makes the second-stage OLS standard errors invalid, and identify the correct residuals for the sandwich variance Equation 13.14.
3. Efficiency of GMM and the Sargan–Hansen \(J\)-statistic.
- Prove \(V_{\mathrm{GMM}}(\Omega) \succeq V_{\mathrm{eff}}\) for every positive-definite \(\Omega\), where \(V_{\mathrm{eff}} = (A^\top\Sigma^{-1}A)^{-1}\). (Hint: factor \(V_{\mathrm{GMM}}(\Omega) - V_{\mathrm{eff}}\) as \(C^\top\Sigma^{-1}C\) for a suitable matrix \(C\).)
- Under homoskedasticity, show that 2SLS is the efficient GMM estimator by verifying \(\Omega_{\mathrm{2SLS}}\) is a scalar multiple of \(\Sigma^{-1}\).
- Return to the two-instrument example. At \(\hat\beta \approx 0.764\) with \(\hat\Sigma = I_2\) and \(n = 200\), compute \(J = n\,\hat{U}(\hat\beta)^\top\hat\Sigma^{-1}\hat{U}(\hat\beta)\) and determine using the \(\chi^2_1\) critical value at the 5% level whether the overidentifying restriction is rejected.
4. GEL first-order conditions and the minimum-discrepancy dual.
- For EL, \(G(v) = -\log(1 - v)\). Write the first-order condition for the inner supremum and show it implies \(\sum_i \hat\pi_i U_i(\theta) = 0\) where \(\hat\pi_i \propto (1 - \hat\lambda^\top U_i(\theta))^{-1}\).
- For CUE, \(G(v) = v + v^2/2\). Solve the inner supremum explicitly at fixed \(\theta\) to show \(\hat\lambda = -[n^{-1}\sum_i U_i(\theta)U_i(\theta)^\top]^{-1}\bar{U}(\theta)\), and confirm the profile objective equals \(\frac{1}{2}\bar{U}^\top[n^{-1}\sum_i U_iU_i^\top]^{-1}\bar{U}\).
- In the exactly identified case (\(m = k\)), show that \(\hat\lambda = 0\) at any GEL solution \(\hat\theta\). Conclude that every GEL estimator coincides with the just-identified GMM estimator and the empirical probabilities all equal \(1/n\).
5. Control function, endogeneity testing, and the limits of instrument diagnostics.
Let \(\varepsilon = \rho\eta + \xi\) with \(\rho = \mathrm{Cov}(\varepsilon,\eta)/\mathrm{Var}(\eta)\).
- Show \(\E[\xi] = 0\) and \(\mathrm{Cov}(\xi, \eta) = 0\) by construction. Then assume \(\E[\varepsilon \mid \eta, Z] = \rho\eta\) and verify \(\E[\xi \mid \eta, Z] = 0\). Explain why this renders \(T\) exogenous in the augmented regression of \(Y\) on \((T, \eta)\).
- Show that the coefficient on \(\hat\eta\) in the augmented regression is a consistent estimator of \(\rho\), and connect the \(t\)-test on \(\hat\eta\) to the Hausman (1978) endogeneity test.
- Suppose an instrument \(Z\) affects wages both through education and through a direct network effect, but the model is exactly identified. Explain why neither the first-stage \(F\)-test nor the control function endogeneity test can detect this exclusion-restriction violation, and what additional information would be needed.