9  Sensitivity Analysis and Partial Identification

NoteLearning Objectives

By the end of this chapter, students should be able to:

  1. Distinguish sampling uncertainty, model misspecification, and identification uncertainty, and explain why confidence intervals do not measure the credibility of identifying assumptions.
  2. Formulate a sensitivity analysis by introducing a sensitivity parameter \(\lambda\) and reporting the sensitivity curve, bounds, or tipping point associated with it.
  3. Carry out a sensitivity analysis for unmeasured confounding under back-door adjustment, including the linear bias decomposition and the binary-confounder formula.
  4. State and use the three canonical sensitivity models: the Rosenbaum \(\Gamma\)-sensitivity model, the E-value of VanderWeele and Ding (2017), and the marginal sensitivity model of Tan (2006) and Zhao et al. (2019).
  5. Interpret sensitivity parameters through benchmarking with observed covariates.
  6. State and prove Manski’s no-assumption bound on the average treatment effect, and contrast point identification with partial identification.
  7. Recognize positivity and overlap violations as a form of identification failure, and interpret trimming as a change of estimand.
  8. Explain why weak-instrument diagnostics do not address exogeneity or exclusion violations, and apply the IV bias formula for a scalar instrument.
  9. Describe how sensitivity analysis applies to mediation assumptions through a residual-correlation parameter.
  10. Recognize sensitivity analysis as complementary to the modern estimation methods of Chapters 10–13: orthogonality and cross-fitting protect against nuisance-estimation error, not against violations of causal identification assumptions.

9.1 Why Sensitivity Analysis?

Chapters 5–8 developed a sequence of identification strategies: back-door adjustment, propensity-score re-weighting, instrumental variables, and front-door and mediation analysis. Each strategy identifies a causal parameter as a functional of the observed-data distribution under a corresponding set of assumptions. Each set of assumptions is, however, stated in terms of unobserved quantities — potential outcomes, hypothetical interventions, or unmeasured variables — and cannot be verified from the data alone.

The chapter ahead (Chapter 10) opens by noting that identification does not by itself provide a statistically reliable estimator: once a parameter is identified, a separate theory of estimation and inference is still needed. The present chapter makes the complementary point, which closes out Part II:

Statistical reliability does not by itself validate identification.

A confidence interval reports sampling variation conditional on a maintained identification assumption. It is silent about whether that assumption is correct. A causal analysis is credible only if both components — identification and estimation — are credible. Sensitivity analysis is the tool for reporting how much the conclusion depends on the first.

9.1.1 Sampling Uncertainty vs. Identification Uncertainty

Consider an analyst who reports \(\hat\tau = 2.0\), 95% CI \(= [1.2,\; 2.8]\). This interval answers one question and only one:

If the identifying assumptions are correct, what range of values of \(\tau\) is consistent with the observed sampling variation?

As \(n \to \infty\) the interval will shrink around \(\tau\), provided the identifying assumptions hold. If the assumptions do not hold, the interval will instead shrink around a biased target. More data drawn from the same observational regime do not remove this bias.

Formally, there are three distinct sources of error:

  • Sampling uncertainty: \(\hat P_n \neq P\). The empirical distribution differs from the population. Vanishes as \(n \to \infty\).
  • Model misspecification: the working parametric or semiparametric model for a nuisance function is incorrect. Partly addressable by flexible modeling, doubly robust construction, and cross-fitting (Chapters 10–12).
  • Identification uncertainty: the causal parameter \(\psi\) is not in fact identified by the functional \(\Psi(P)\) assumed by the analysis. Does not vanish merely by increasing the sample size from the same observational regime; can be reduced only by adding new assumptions, new design information, or new measurements.

The usual confidence interval addresses only the first; the estimation chapters to come address primarily the second; this chapter addresses the third.

WarningStatistical Significance and Causal Credibility

A narrow, highly significant confidence interval is not evidence that the identifying assumptions are correct. It is evidence that, if they are correct, \(\psi\) is close to the estimate. A sensitivity analysis is required to assess the second “if.”

9.1.2 The Role of Sensitivity Analysis

Sensitivity analysis is a structured way to vary the strength of violations of identifying assumptions and to report how the causal conclusion responds. It is not a test of the assumptions — they involve unobserved quantities and are therefore not testable. It is a statement of conditional robustness: how far can the assumptions be violated before the substantive conclusion changes?

Three reporting objects:

  • Sensitivity curve. A plot of the estimand against a single sensitivity parameter \(\lambda\), with \(\lambda=0\) recovering the baseline identifying assumption.
  • Sensitivity bounds. The range of estimand values consistent with any \(\lambda\) in a specified plausibility set \(\Lambda\).
  • Tipping point. The smallest violation magnitude sufficient to change the sign, magnitude, or statistical significance of the conclusion.

9.2 A General Framework for Sensitivity Analysis

The three views of sensitivity analysis — curves, bounds, tipping points — all arise from a single construction. Let \(\psi\) be the causal estimand and \(A_0\) the baseline identifying assumption with \(\psi = \Psi(P)\). A sensitivity model is a one-parameter relaxation \(\{A_\lambda : \lambda \in \Lambda\}\) of \(A_0\), with \(A_0\) recovered at \(\lambda = 0\). Under \(A_\lambda\): \[\psi(\lambda) = \Psi(P;\, \lambda), \qquad \Psi(P;\, 0) = \Psi(P). \tag{9.1}\]

The key reporting object is the set \(\{\psi(\lambda) : \lambda \in \Lambda\}\).

NoteDefinition: Sensitivity Model

A sensitivity model for the causal estimand \(\psi\) and baseline identifying assumption \(A_0\) is a pair \((\{A_\lambda\}_{\lambda \in \Lambda},\, \Psi(\cdot;\cdot))\) such that (i) \(A_0\) is recovered at \(\lambda = 0\); (ii) under \(A_\lambda\) and the observable restrictions, \(\psi = \Psi(P;\lambda)\); and (iii) \(\Psi(\cdot;\lambda)\) is a continuous function of \(\lambda\).

Sensitivity parameters used in this chapter:

  • \(\lambda\) = strength of unmeasured confounding (odds-ratio bound, E-value, or likelihood-ratio bound).
  • \(\delta\) = magnitude of a structural outcome-effect coefficient treated as zero under the baseline assumption (used both for \(U\)-outcome effect and for IV exclusion violations).
  • \(\alpha\) = propensity-score trimming threshold.
  • \(\rho\) = residual correlation between mediator and outcome disturbances.

Sensitivity curve: \(\lambda \mapsto \hat\psi(\lambda)\). Useful when the reader wants to see the effect deform as the assumption is relaxed.

Sensitivity bounds: \(\psi_L = \inf_\Lambda \psi(\lambda)\), \(\psi_U = \sup_\Lambda \psi(\lambda)\).

Tipping point: \(\lambda^\star = \inf\{\lambda \in \Lambda : \psi(\lambda) = 0\}\), or \(\lambda^\star_{\mathrm{CI}} = \inf\{\lambda : 0 \in \widehat{\mathrm{CI}}(\lambda)\}\).

NoteExample: Tipping-Point Reading

Suppose \(\hat\psi(0) = 1.8\) with 95% CI \([1.1, 2.5]\), and the sensitivity model produces \(\hat\psi(\lambda) = 1.8 - 2.0\lambda\). Then: the sensitivity curve is linear in \(\lambda\); the tipping point for the sign is \(\lambda^\star = 0.9\); and the tipping point for significance is \(\lambda^\star_{\mathrm{CI}} = 0.55\) (the value at which the lower CI endpoint \(1.1 - 2.0\lambda\) hits zero). The significance tipping point is always at least as strict as the sign tipping point.
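The arithmetic in this reading is a one-liner; a minimal sketch (the baseline 1.8, lower limit 1.1, and slope 2.0 are taken from the example; the function name is ours):

```python
# Tipping points for a linear sensitivity curve psi(lam) = psi0 - slope * lam.
def tipping_point(value_at_zero, slope):
    """Smallest lam >= 0 at which value_at_zero - slope * lam hits zero."""
    return value_at_zero / slope

psi0, ci_lower, slope = 1.8, 1.1, 2.0

lam_sign = tipping_point(psi0, slope)     # sign of the point estimate flips
lam_ci = tipping_point(ci_lower, slope)   # lower CI endpoint crosses zero

print(lam_sign)  # 0.9
print(lam_ci)    # 0.55
```

As the example notes, the significance tipping point (0.55) is reached before the sign tipping point (0.9).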

9.3 Sensitivity to Unmeasured Confounding

The most common application is to unmeasured confounding under the back-door framework. The baseline assumption is conditional exchangeability \(Y(t) \indep T \mid X\) together with consistency and positivity. Sensitivity analysis contemplates a world with an unobserved \(U\) such that \(Y(t) \indep T \mid X, U\) but \(Y(t) \nindep T \mid X\).

[Figure: the unmeasured-confounding DAG. Observed covariate \(X\) is adjusted for; \(U\) is an unobserved common cause (dashed). Back-door adjustment for \(X\) alone closes the \(X\) path but leaves \(T \leftarrow U \to Y\) open. The causal effect \(T \to Y\) (green) is the target.]

9.3.1 The Bias Decomposition

Let \(\Delta_{\mathrm{obs}} = \E\{\E(Y \mid T=1, X) - \E(Y \mid T=0, X)\}\) denote the observed adjusted contrast. Under ignorability given \(X\) alone, \(\Delta_{\mathrm{obs}} = \tau\). Under the relaxed condition, the contrast departs from \(\tau\) by a bias \(B\).

TipTheorem: Bias Decomposition under Unmeasured Confounding

Suppose \(Y(t) \indep T \mid X, U\) holds along with consistency and positivity. Define \(m_t(x,u) = \E(Y \mid T=t, X=x, U=u)\) and \(g_t(x,u) = P(u \mid T=t, X=x) - P(u \mid X=x)\). Then: \[\Delta_{\mathrm{obs}} = \tau + B, \tag{9.2}\] where the confounding bias is: \[B = \E\!\left\{\sum_t (-1)^{1-t} \int m_t(X, u)\, g_t(X, u)\, du\right\}. \tag{9.3}\]

Proof. Under conditional exchangeability given \((X, U)\) and consistency: \[\E[Y(t)] = \E\!\left\{\int m_t(X, u)\, P(u \mid X)\, du\right\}.\] The conditional mean given \((T, X)\) marginalizes \(U\) over its distribution conditional on \(T\): \[\E(Y \mid T=t, X) = \int m_t(X, u)\, P(u \mid T=t, X)\, du.\] Taking expectations over \(X\) and subtracting: \[\Delta_{\mathrm{obs}} - \tau = \E\!\left\{\int m_1(X, u)\,[P(u \mid T=1, X) - P(u \mid X)]\, du\right\} - \E\!\left\{\int m_0(X, u)\,[P(u \mid T=0, X) - P(u \mid X)]\, du\right\} = B. \quad\square\]

NoteRemark: Bias in Words

The observed adjusted contrast is the true effect plus a term controlled by (the \(U\)-outcome association) \(\times\) (the \(U\)-treatment imbalance). Every sensitivity model in Section 9.4 is a formal version of this slogan.

9.3.2 A Simple Linear Sensitivity Model

In the fully linear model \(Y = \alpha + \tau T + \gamma^\top X + \delta U + \varepsilon\) with \(T = h(X, U, \eta)\):

TipLemma: Linear Sensitivity Bias

Under the linear outcome model and conditional exchangeability given \((X, U)\): \[B = \delta \cdot \E\!\left\{\E(U \mid T=1, X) - \E(U \mid T=0, X)\right\}. \tag{9.4}\]

Proof. In the linear model, \(m_t(x, u) = \alpha + \tau t + \gamma^\top x + \delta u\). The components of \(m_t\) not depending on \(u\) multiply integrals of \(g_t(X, u)\), which integrate to zero. The \(\delta u\) term gives Equation 9.4. \(\square\)

This gives the canonical teaching decomposition: \[\text{bias} \approx \underbrace{\delta}_{U\text{-outcome effect}} \;\times\; \underbrace{\E\{\E(U \mid T=1, X) - \E(U \mid T=0, X)\}}_{U\text{-treatment imbalance}}. \tag{9.5}\]

Both factors are sensitivity parameters because \(U\) is unobserved.

9.3.3 Binary Unmeasured Confounder

When \(U \in \{0,1\}\), define \(p_t(x) = P(U=1 \mid T=t, X=x)\).

TipLemma: Binary-Confounder Bias

If \(U \in \{0,1\}\) and the \(U\)-outcome contrast is constant across \((t, x)\) at value \(\delta\), then: \[B = \delta \cdot \E\{p_1(X) - p_0(X)\}. \tag{9.6}\]

Proof. With \(U \in \{0,1\}\), \(m_t(x,u) = m_t(x,0) + u\,\delta\). Substituting into Equation 9.3: \(B = \delta\,\E\{P(U{=}1 \mid T{=}1, X) - P(U{=}1 \mid T{=}0, X)\}\). \(\square\)

Formula Equation 9.6 is the workhorse of applied sensitivity analysis. A sensitivity analysis consists of computing \(\hat\tau_{\mathrm{sens}} = \hat\Delta_{\mathrm{obs}} - \hat B\) over a grid of \((\delta,\, \E\{p_1-p_0\})\) values.

9.3.4 A Sensitivity Table

For reporting, tabulate the sensitivity-adjusted effect over a grid of parameter values. Here for observed adjusted effect \(\hat\Delta_{\mathrm{obs}} = 2.0\) (cells show \(\hat\tau_{\mathrm{sens}} = \hat\Delta_{\mathrm{obs}} - \delta(p_1-p_0)\)):

| \(U\)-outcome effect \(\delta\) | \(p_1 - p_0 = 0.10\) | \(p_1 - p_0 = 0.30\) | \(p_1 - p_0 = 0.50\) |
|---|---|---|---|
| weak (\(\delta = 2\)) | 1.80 | 1.40 | 1.00 |
| moderate (\(\delta = 4\)) | 1.60 | 0.80 | 0.00 |
| strong (\(\delta = 8\)) | 1.20 | −0.40 | −2.00 |

The tipping point for the sign is the diagonal cell \((\delta=4,\, p_1-p_0=0.5)\).
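The grid above is a direct computation from Equation 9.6; a minimal sketch that reproduces it (variable names are ours):

```python
# Sensitivity grid for the binary-confounder bias formula (Equation 9.6):
# tau_sens = Delta_obs - delta * (p1 - p0).
delta_obs = 2.0
deltas = [2, 4, 8]                 # posited U-outcome effect delta
imbalances = [0.10, 0.30, 0.50]   # posited E{p1(X) - p0(X)}

grid = {(d, imb): round(delta_obs - d * imb, 2)
        for d in deltas for imb in imbalances}

for (d, imb), tau in sorted(grid.items()):
    print(f"delta={d}, p1-p0={imb:.2f}: tau_sens={tau:+.2f}")
```

The cell at \((\delta=4,\, p_1-p_0=0.50)\) is exactly zero — the sign tipping point.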

9.4 Three Canonical Sensitivity Models

The bias decomposition Equation 9.3 is a general identity. Three canonical models specialize it by imposing a structural restriction on either \(g_t\) or on the induced observed-data relationships. They are presented in order of increasing generality of the estimator they support.

9.4.1 Rosenbaum’s \(\Gamma\)-Sensitivity Model

NoteDefinition: \(\Gamma\)-Sensitivity Model (Rosenbaum 2002)

The \(\Gamma\)-sensitivity model at level \(\Gamma \geq 1\) is the set of distributions \(P(U \mid T, X)\) satisfying: \[\frac{1}{\Gamma} \leq \frac{P(T=1 \mid X, U=u)/P(T=0 \mid X, U=u)}{P(T=1 \mid X, U=u')/P(T=0 \mid X, U=u')} \leq \Gamma. \tag{9.7}\]

\(\Gamma = 1\) recovers conditional exchangeability given \(X\) alone. \(\Gamma = 2\) says that two units with the same \(X\) may differ by up to a factor of 2 in their odds of treatment because of the unmeasured \(U\).

TipTheorem: \(\Gamma\)-Bounds on the Rank Statistic (Rosenbaum 2002)

For a matched-pair design, the null distribution of the sign-rank statistic under the \(\Gamma\)-model is bounded by Binomial reference distributions with treatment probabilities \(\Gamma/(1+\Gamma)\) and \(1/(1+\Gamma)\).

The practical outcome is a tipping \(\Gamma^\star\): the smallest \(\Gamma\) at which the test fails to reject. Small \(\Gamma^\star\) (near 1) indicates a fragile conclusion; large \(\Gamma^\star\) indicates a robust one.

9.4.2 The E-Value and the VanderWeele–Ding Bound

Ding and VanderWeele (2016) and VanderWeele and Ding (2017) proposed a sensitivity summary that requires no matched design and takes the form of a single closed-form number. It has become the most widely reported sensitivity summary in the applied literature.

Define two risk-ratio sensitivity parameters: \(\mathrm{RR}_{TU} = \max_x P(U=1 \mid T=1, X=x)/P(U=1 \mid T=0, X=x)\) (maximum RR of \(U\) with \(T\)), and \(\mathrm{RR}_{UY} = \max_{t,x} P(Y=1 \mid T=t, X=x, U=1)/P(Y=1 \mid T=t, X=x, U=0)\) (maximum RR of \(U\) with \(Y\)). The bias factor is: \[B(\mathrm{RR}_{TU}, \mathrm{RR}_{UY}) = \frac{\mathrm{RR}_{TU}\, \mathrm{RR}_{UY}}{\mathrm{RR}_{TU} + \mathrm{RR}_{UY} - 1}. \tag{9.8}\]

TipTheorem: VanderWeele–Ding Bound

\(\mathrm{RR}_{TY \mid X}^{\mathrm{true}} \geq \mathrm{RR}_{TY \mid X}^{\mathrm{obs}} / B(\mathrm{RR}_{TU}, \mathrm{RR}_{UY})\).

The observed risk ratio can be explained away only if both \(\mathrm{RR}_{TU}\) and \(\mathrm{RR}_{UY}\) are at least as large as the E-value: \[\mathrm{EV} = \mathrm{RR}_{TY \mid X}^{\mathrm{obs}} + \sqrt{\mathrm{RR}_{TY \mid X}^{\mathrm{obs}}\,(\mathrm{RR}_{TY \mid X}^{\mathrm{obs}} - 1)}. \tag{9.9}\]

Proof sketch. Fix \(x\) and write \(p_t = P(U{=}1 \mid T{=}t, X{=}x)\), \(p = P(U{=}1 \mid X{=}x)\), and \(r_t\) for the outcome risk ratio of \(U\) at arm \(t\). The ratio of observed to causal risk ratio is \([1 + p_1(r_1 - 1)][1 + p(r_0 - 1)] / \{[1 + p(r_1 - 1)][1 + p_0(r_0 - 1)]\}\). Maximizing over admissible \((p_0, p_1, p, r_0, r_1)\) subject to \(p_1/p_0 \le \mathrm{RR}_{TU}\) and \(\max_t r_t \le \mathrm{RR}_{UY}\), the maximum is attained at \(p_0 = 0\), \(r_0 = 1\), giving the bound Equation 9.8.

For Equation 9.9: set \(\mathrm{RR}_{TU} = \mathrm{RR}_{UY} = e\) and ask for the smallest \(e\) such that \(B(e,e) = R\) (where \(R = \mathrm{RR}_{TY|X}^{\mathrm{obs}}\)). This gives \(e^2/(2e-1) = R\), so \(e = R + \sqrt{R(R-1)}\). \(\square\)

NoteExample: An E-Value Calculation

An observational study reports \(\mathrm{RR}_{TY|X}^{\mathrm{obs}} = 2.5\) with 95% CI \([1.7, 3.7]\).

\(\mathrm{EV} = 2.5 + \sqrt{2.5 \times 1.5} \approx 4.44\). To reduce the point estimate to the null, an unmeasured confounder must have both a RR association with treatment and an outcome-conditional RR association of at least 4.4.

\(\mathrm{EV}_{\mathrm{CL}} = 1.7 + \sqrt{1.7 \times 0.7} \approx 2.79\). To reduce the lower CI to the null, the associations must each be at least 2.8.
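Both quantities follow directly from Equations 9.8 and 9.9; a minimal sketch (function names are ours):

```python
import math

def bias_factor(rr_tu, rr_uy):
    """VanderWeele-Ding joint bias factor (Equation 9.8)."""
    return rr_tu * rr_uy / (rr_tu + rr_uy - 1)

def e_value(rr):
    """E-value for an observed risk ratio rr >= 1 (Equation 9.9)."""
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(2.5), 2))  # point estimate: 4.44
print(round(e_value(1.7), 2))  # lower CI limit: 2.79

# Sanity check: on the symmetric diagonal, B(e, e) recovers the observed RR.
e = e_value(2.5)
print(round(bias_factor(e, e), 6))  # 2.5
```

The same `bias_factor` function also computes the benchmarking quantities \(B_j\) of Section 9.5.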

NoteRemark: Why a Single Number

The E-value collapses the two parameters to one by asking about the symmetric diagonal \(\mathrm{RR}_{TU} = \mathrm{RR}_{UY} = e\). This is conservative: asymmetric configurations can also explain away the effect. The E-value is reported because it is a single interpretable scalar, not because it is the tightest possible summary.

9.4.3 The Marginal Sensitivity Model

NoteDefinition: Marginal Sensitivity Model (Tan 2006)

The marginal sensitivity model at level \(\Lambda \geq 1\) bounds the odds ratio between the nominal propensity \(\pi(x) = P(T=1 \mid X=x)\) and the true propensity \(\pi^{\mathrm{true}}(x,u) = P(T=1 \mid X=x, U=u)\): \[\frac{1}{\Lambda} \leq \frac{\{1-\pi^{\mathrm{true}}(X,U)\}\,\pi(X)}{\pi^{\mathrm{true}}(X,U)\,\{1-\pi(X)\}} \leq \Lambda \quad\text{a.s.} \tag{9.10}\]

\(\Lambda = 1\) recovers no unmeasured confounding.

TipTheorem: MSM Bounds on the ATE (Zhao et al. 2019)

Under the marginal sensitivity model at level \(\Lambda\), the ATE satisfies \(\tau_L(\Lambda) \leq \tau \leq \tau_U(\Lambda)\), where the sharp bounds are obtained by solving linear programs over the admissible weight perturbations induced by Equation 9.10.

Proof sketch. The true IPW weights equal the nominal weights multiplied by a perturbation factor \(\phi_i \in [\Lambda^{-1}, \Lambda]\). Because the target is linear in \(\phi\), the extrema are attained at \(\phi_i \in \{\Lambda^{-1}, \Lambda\}\). Zhao et al. (2019) give closed-form percentile expressions. \(\square\)

NoteRemark: MSM as the Estimation-Oriented Sensitivity Model

The MSM is the natural model to pair with IPW, AIPW, TMLE, and DML (Chapters 10–12), because it bounds the violation on the weight scale rather than on the unobservable \(P(T \mid X, U)\). The sensitivity analysis reduces to a perturbation of the nominal weights by a factor in \([\Lambda^{-1}, \Lambda]\).
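The vertex structure in the proof sketch can be made concrete. Below is a minimal sketch, under our own naming and setup, of the linear-fractional optimization for one arm: the range of a Hajek (self-normalized) IPW estimate of \(\E\{Y(1)\}\) over the treated units when each nominal weight may be perturbed by a factor in \([\Lambda^{-1}, \Lambda]\). Since the extremum assigns \(\Lambda\) above a threshold in sorted \(Y\) and \(\Lambda^{-1}\) below it, enumerating the \(n+1\) thresholds finds the sharp bound:

```python
import numpy as np

def msm_bounds_hajek(y, pi, lam):
    """Range of the Hajek IPW estimate sum(w*phi*y)/sum(w*phi) over
    perturbations phi_i in [1/lam, lam] of the nominal weights w = 1/pi.
    The objective is linear-fractional in phi, so extrema sit at
    phi_i in {1/lam, lam} with a threshold in sorted y; enumerate them."""
    order = np.argsort(y)
    y = np.asarray(y, float)[order]
    w = 1.0 / np.asarray(pi, float)[order]
    n = len(y)
    lo, hi = np.inf, -np.inf
    for k in range(n + 1):
        # upper bound: down-weight the k smallest y, up-weight the rest
        phi_up = np.r_[np.full(k, 1 / lam), np.full(n - k, lam)]
        hi = max(hi, (w * phi_up * y).sum() / (w * phi_up).sum())
        # lower bound: mirror image
        phi_dn = np.r_[np.full(k, lam), np.full(n - k, 1 / lam)]
        lo = min(lo, (w * phi_dn * y).sum() / (w * phi_dn).sum())
    return lo, hi

rng = np.random.default_rng(0)
y = rng.normal(size=200)
pi = rng.uniform(0.2, 0.8, size=200)
lo1, hi1 = msm_bounds_hajek(y, pi, lam=1.0)  # collapses to a point
lo2, hi2 = msm_bounds_hajek(y, pi, lam=2.0)  # a nondegenerate interval
print(round(hi1 - lo1, 12), lo2 < lo1 <= hi1 < hi2)  # -> 0.0 True
```

At \(\Lambda = 1\) the interval degenerates to the nominal estimate, and the bounds nest monotonically as \(\Lambda\) grows — the behavior Equation 9.10 requires.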

9.4.4 Comparing the Three Models

| | Rosenbaum \(\Gamma\) | E-value | MSM (\(\Lambda\)) |
|---|---|---|---|
| Parameter | bounds odds ratio of treatment given \(X, U\) | pair of risk-ratio associations \(\mathrm{RR}_{TU}, \mathrm{RR}_{UY}\) | odds ratio of nominal to true propensity |
| Natural estimator | matched-pair and weighted rank tests | any relative-effect summary | IPW, AIPW |
| One-number summary | tipping \(\Gamma^\star\) | \(\mathrm{EV}\) (symmetric diagonal) | bounds \([\tau_L, \tau_U]\) or tipping \(\Lambda^\star\) |
| Primary reference | Rosenbaum (2002) | VanderWeele and Ding (2017) | Tan (2006); Zhao et al. (2019) |

NoteRemark: Which Model to Report

Reporting more than one summary is common and useful. The E-value is almost always included because it is a single number that is widely understood; the \(\Gamma\)-value is reported in matched or rank-based analyses; and the MSM bounds are reported when the primary estimator is IPW or AIPW. The three are not substitutes: each bounds a different object.

9.5 Benchmarking Sensitivity Parameters

Every sensitivity model has parameters that are not identified by the data. A sensitivity curve of the form “at \(\lambda = 0.2\) the effect is halved” is mathematically precise but scientifically empty until \(\lambda = 0.2\) has been anchored to something concrete. Benchmarking gives sensitivity parameters an empirical referent by comparing them to the analogous quantities computed for the observed covariates (Cinelli and Hazlett 2020).

9.5.1 Benchmarking Against Observed Covariates

For the binary-confounder setup, compute for each observed covariate \(X_j\): the coefficient on \(X_j\) in a regression of \(Y\) on \((T, X_1, \ldots)\) (the role of \(\delta\)), and the difference in conditional means of \(X_j\) across treatment arms (the role of the imbalance). Plot these against the tipping contour.

For the E-value, for each \(X_j\) compute the bias factor: \[B_j = \frac{\mathrm{RR}_{TX_j}\,\mathrm{RR}_{X_jY}}{\mathrm{RR}_{TX_j} + \mathrm{RR}_{X_jY} - 1}.\] If \(B_j < \mathrm{EV}\) for every observed covariate, an unmeasured confounder capable of explaining away the observed effect would have to be stronger than every observed covariate. Cinelli and Hazlett (2020) develop partial-\(R^2\) benchmarks for the linear regression setting.

NoteExample: Benchmarked Reporting Language

The estimated effect of the job-training program on annual earnings is $1,800 with 95% CI \([\$900, \$2,700]\). The E-value for the lower confidence limit is 2.3. Among the observed covariates, the largest bias factor is that of prior-year earnings, at \(B = 2.0\); baseline education has \(B = 1.5\). The conclusion is robust to an unmeasured confounder as strong as any observed covariate, but could be overturned by a confounder approximately 15% stronger than the strongest observed one.

WarningSensitivity without Benchmarking

Reporting “the effect is robust up to \(\Gamma = 2\)” without a statement of whether \(\Gamma = 2\) is plausible in the study at hand is the sensitivity-analysis analog of reporting a confidence interval without stating the confidence level: technically complete, but uninformative.

9.6 Partial Identification and Bounds

The sensitivity models of Section 9.4 each restrict the magnitude of a violation by a single parameter. A more radical approach imposes no parametric restriction at all — asking what the observed data can say about the causal parameter without any untestable identifying assumption. The answer is a set of values: the theory of partial identification (Manski 1990, 2003).

9.6.1 Point Identification vs. Partial Identification

Under point identification, \(\psi = \Psi(P)\). Under partial identification, the assumptions pin down a set: \[\psi \in \Psi_A(P) = \{\text{values of }\psi\text{ consistent with assumption set }A\text{ and distribution }P\}. \tag{9.11}\]

NoteDefinition: Sharp and Conservative Bounds

A bound \([\psi_L, \psi_U]\) is sharp under assumption set \(A\) if it equals \([\inf \Psi_A(P),\, \sup \Psi_A(P)]\). It is conservative (valid) if it contains the identified set but may be wider. Sharp bounds cannot be improved without adding assumptions; conservative bounds can sometimes be tightened by a more careful derivation.

9.6.2 Manski’s No-Assumption Bound

TipTheorem: Manski’s No-Assumption Bound (Manski 1990)

Suppose \(Y \in [y_L, y_U]\) a.s. and \(p = P(T=1)\). Then \(\tau \in [\tau_L, \tau_U]\) with: \[\tau_L = [\E(Y \mid T=1)\,p + y_L\,(1-p)] - [\E(Y \mid T=0)\,(1-p) + y_U\,p], \tag{9.12}\] \[\tau_U = [\E(Y \mid T=1)\,p + y_U\,(1-p)] - [\E(Y \mid T=0)\,(1-p) + y_L\,p]. \tag{9.13}\]

These bounds are sharp, and the width of the identified set is: \[\tau_U - \tau_L = y_U - y_L. \tag{9.14}\]

Proof. By the law of total probability: \(\E\{Y(1)\} = \E\{Y(1) \mid T=1\}\,p + \E\{Y(1) \mid T=0\}\,(1-p)\) and \(\E\{Y(0)\} = \E\{Y(0) \mid T=1\}\,p + \E\{Y(0) \mid T=0\}\,(1-p)\). Consistency identifies \(\E\{Y(1) \mid T=1\} = \E(Y \mid T=1)\) and \(\E\{Y(0) \mid T=0\} = \E(Y \mid T=0)\). The unobserved counterfactual means \(\E\{Y(1) \mid T=0\}\) and \(\E\{Y(0) \mid T=1\}\) are restricted only to \([y_L, y_U]\). Taking the Minkowski difference of the resulting intervals for \(\E\{Y(1)\}\) and \(\E\{Y(0)\}\) gives Equation 9.12Equation 9.13. Sharpness follows from the existence of degenerate potential-outcome distributions attaining each extreme. Width: \(\tau_U - \tau_L = [y_U - y_L](1-p) + [y_U-y_L]p = y_U - y_L\). \(\square\)

NoteExample: Manski Bound in a Binary Outcome

Suppose \(Y \in \{0,1\}\), \(P(T=1) = 0.4\), \(\E(Y \mid T=1) = 0.6\), \(\E(Y \mid T=0) = 0.3\). Then: \(\tau_L = (0.6)(0.4) + (0)(0.6) - [(0.3)(0.6) + (1)(0.4)] = 0.24 - 0.58 = -0.34\); \(\tau_U = (0.6)(0.4) + (1)(0.6) - [(0.3)(0.6) + (0)(0.4)] = 0.84 - 0.18 = 0.66\). The identified set \(\tau \in [-0.34, 0.66]\) includes zero: the observed positive association of 0.3 is consistent with a causal effect anywhere from a meaningful harm to a substantial benefit.
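The bound is a plug-in computation from Equations 9.12–9.13; a minimal sketch reproducing the example (the function name is ours):

```python
def manski_bounds(p, mean_y1, mean_y0, y_lo, y_hi):
    """No-assumption bounds on the ATE (Equations 9.12-9.13).
    p: P(T=1); mean_y1, mean_y0: E(Y | T=1), E(Y | T=0);
    y_lo, y_hi: a.s. bounds on the outcome."""
    tau_lo = (mean_y1 * p + y_lo * (1 - p)) - (mean_y0 * (1 - p) + y_hi * p)
    tau_hi = (mean_y1 * p + y_hi * (1 - p)) - (mean_y0 * (1 - p) + y_lo * p)
    return tau_lo, tau_hi

# Binary-outcome example; note the width is y_hi - y_lo = 1 regardless of the data.
lo, hi = manski_bounds(p=0.4, mean_y1=0.6, mean_y0=0.3, y_lo=0.0, y_hi=1.0)
print(round(lo, 2), round(hi, 2))  # -0.34 0.66
```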

The no-assumption bound is as wide as the support of the outcome — typically too wide to be informative. Productive partial identification therefore proceeds by adding shape restrictions that narrow the set.

Monotone treatment response (\(Y(1) \geq Y(0)\) a.s.) tightens the bounds. Monotone treatment selection (\(\E\{Y(t) \mid T=1\} \geq \E\{Y(t) \mid T=0\}\)) is another common restriction. Combining these produces informative bounds even when neither alone suffices (Manski 2003).

NoteBounds as Sensitivity Analysis

Partial identification and sensitivity analysis are two views of the same underlying object. A sensitivity model with bounds \([\psi_L(\Lambda), \psi_U(\Lambda)]\) is a parametric partial-identification analysis; Manski’s no-assumption bound is the limit when \(\Lambda\) permits arbitrary violations. Intermediate models (the \(\Gamma\)-model, the MSM) lie between these extremes, trading informativeness for robustness.

9.7 Sensitivity to Positivity and Overlap Violations

Chapter 6 introduced positivity as a structural requirement for the identification formulas underlying propensity-score methods. When \(\pi(x_0) = 0\), the counterfactual mean \(\E\{Y(1) \mid X=x_0\}\) is not identified, and the ATE may not be point-identified over the full covariate support. When positivity holds but weakly, IPW weights \(1/\pi(X)\) generate extreme values that destabilize estimates.

9.7.1 Trimming as a Sensitivity Analysis

NoteDefinition: Trimmed ATE

\[\tau_\alpha = \E\{Y(1) - Y(0) \mid \alpha \leq \pi(X) \leq 1-\alpha\}, \tag{9.15}\] for \(\alpha \in [0, 0.5)\). The choice \(\alpha = 0\) recovers the full-population ATE.

TipLemma: Trimmed Estimand Identity

Let \(S_\alpha = \{X : \alpha \leq \pi(X) \leq 1-\alpha\}\) and \(q_\alpha = P(X \in S_\alpha) > 0\). Under ignorability: \[\tau_\alpha = \frac{1}{q_\alpha}\,\E\!\left\{[\mu_1(X) - \mu_0(X)]\,\mathbf{1}\{X \in S_\alpha\}\right\}. \tag{9.16}\] On the trimmed population, the IPW weights are bounded: \(w(T,X) \leq 1/\alpha\) on \(S_\alpha\).

Proof. Conditional on \(X \in S_\alpha\), positivity holds with slack \(\alpha\), so the back-door formula applies pointwise. Integrating and normalizing gives Equation 9.16. The weight bound follows directly from \(\pi(X) \geq \alpha\) on \(S_\alpha\). \(\square\)

\(\tau_\alpha\) and \(\tau_0\) are different estimands: trimming is a retargeting strategy, not a variance-reduction trick. A trimmed analysis answers “what is the causal effect on the subpopulation for which the treatment decision is not already essentially forced by covariates?”

WarningTrimming Changes the Estimand

Reporting \(\hat\tau_\alpha\) as a stabilized estimate of \(\tau\) without acknowledging that the target population has changed is a common mistake. Whenever \(\alpha > 0\), the estimand is Equation 9.15, not the full-population ATE. A proper sensitivity analysis reports \(\hat\tau_\alpha\) and \(q_\alpha\) together for a grid of \(\alpha \in \{0, 0.01, 0.05, 0.10\}\).
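The recommended \((\hat\tau_\alpha, q_\alpha)\) grid can be sketched on simulated data with a known propensity score; everything below (DGP, function names) is our own illustrative setup, with a constant unit treatment effect so \(\tau_\alpha = 1\) for every \(\alpha\):

```python
import numpy as np

def trimmed_report(y, t, pi, alphas=(0.0, 0.01, 0.05, 0.10)):
    """For each threshold alpha, report the retained mass q_alpha and the
    Hajek IPW estimate of the trimmed ATE (Equation 9.15)."""
    out = []
    for a in alphas:
        keep = (pi >= a) & (pi <= 1 - a)
        q = keep.mean()
        w1 = keep * t / pi
        w0 = keep * (1 - t) / (1 - pi)
        tau = (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()
        out.append((a, q, tau))
    return out

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
pi = 1 / (1 + np.exp(-2.0 * x))        # known propensity; weak overlap in the tails
t = rng.binomial(1, pi)
y = 1.0 * t + x + rng.normal(size=n)   # true ATE = 1 on every subpopulation

for a, q, tau in trimmed_report(y, t, pi):
    print(f"alpha={a:.2f}  q_alpha={q:.3f}  tau_hat={tau:.3f}")
```

Because the effect is homogeneous here, trimming changes \(q_\alpha\) but not the target value; with heterogeneous effects the two columns would drift together, which is exactly what the joint report is meant to reveal.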

9.8 Sensitivity to Invalid Instruments

Of the three IV assumptions, relevance is the most empirically diagnosable. Exogeneity and exclusion involve unobserved relationships and are not testable. A sensitivity analysis for IV concentrates on these two untestable assumptions.

Relevance is a statement about \((Z, T)\) given \(X\) — observable. Exogeneity and exclusion are statements about \((Z, Y)\) given \(X\) and the unobserved \(U\) — not observable. A complete robustness argument has two parts: (a) evaluation of first-stage strength and (b) sensitivity analysis for exogeneity and exclusion.

9.8.1 Direct-Effect Violation of Exclusion

Consider the scalar-IV model with a direct effect \(\delta\) of \(Z\) on \(Y\): \[Y = \alpha + \beta T + \delta Z + \varepsilon, \qquad \E(\varepsilon \mid Z) = 0. \tag{9.17}\]

TipTheorem: IV Bias under Exclusion Violation

Under Equation 9.17 and standard IV regularity conditions: \[\frac{\mathrm{Cov}(Y,Z)}{\mathrm{Cov}(T,Z)} = \beta + \delta\,\frac{\mathrm{Var}(Z)}{\mathrm{Cov}(T,Z)}. \tag{9.18}\]

Equivalently, the Wald estimand at posited \(\delta\) is: \[\beta(\delta) = \frac{\mathrm{Cov}(Y,Z) - \delta\,\mathrm{Var}(Z)}{\mathrm{Cov}(T,Z)}. \tag{9.19}\]

Proof. Take covariance of Equation 9.17 with \(Z\): \(\mathrm{Cov}(Y,Z) = \beta\,\mathrm{Cov}(T,Z) + \delta\,\mathrm{Var}(Z) + \mathrm{Cov}(\varepsilon,Z)\). By orthogonality, \(\mathrm{Cov}(\varepsilon,Z)=0\). Dividing and rearranging gives Equation 9.18 and Equation 9.19. \(\square\)

Key pedagogical consequence: the bias is proportional to \(\mathrm{Var}(Z)/\mathrm{Cov}(T,Z)\), the inverse of the first-stage slope. A weak first stage amplifies the bias from any exclusion violation.

NoteExample: IV Sensitivity Curve

\(\mathrm{Cov}(Y,Z) = 3\), \(\mathrm{Cov}(T,Z) = 1.5\), \(\mathrm{Var}(Z) = 1\). Wald estimate: \(\hat\beta(0) = 2.0\). Applying Equation 9.19: \(\hat\beta(\delta) = (3-\delta)/1.5\). Values: \(\hat\beta(0.5) = 1.67\), \(\hat\beta(1.0) = 1.33\), \(\hat\beta(2.0) = 0.67\), \(\hat\beta(3.0) = 0\). The tipping point is \(\delta^\star = 3\).
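The curve is a direct evaluation of Equation 9.19; a minimal sketch reproducing the example (the function name is ours):

```python
def wald_with_exclusion_violation(cov_yz, cov_tz, var_z, delta):
    """Exclusion-robust Wald estimand beta(delta) (Equation 9.19)."""
    return (cov_yz - delta * var_z) / cov_tz

cov_yz, cov_tz, var_z = 3.0, 1.5, 1.0

# Reproduces the sensitivity curve of the example: 2.0, 1.67, 1.33, 0.67, 0.0.
for d in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(d, round(wald_with_exclusion_violation(cov_yz, cov_tz, var_z, d), 2))

# Tipping point: beta(delta) = 0 at delta* = cov_yz / var_z = 3.
```

Note how the bias term scales with \(1/\mathrm{Cov}(T,Z)\): halving the first-stage covariance would double the slope of this curve and halve the tipping point.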

9.8.2 Connection to LATE and Monotonicity

Under heterogeneous treatment effects, sensitivity analysis can also address monotonicity violations, with the sensitivity parameter becoming the proportion of defiers (Angrist et al. 1996). The same conceptual framework — parameterize the violation, report a curve or bound — applies.

9.9 Sensitivity in Mediation Analysis

Sensitivity analysis is especially important in mediation analysis because the identifying assumptions are strictly stronger than those for total effects. Even in a randomized trial, Assumptions 3 and 4 of sequential ignorability cannot be guaranteed.

9.9.1 Residual-Correlation Sensitivity Parameter

Consider the linear mediation system \(M = a_0 + aT + a_X^\top X + \varepsilon_M\), \(Y = b_0 + \tau' T + bM + b_X^\top X + \varepsilon_Y\), with sensitivity parameter: \[\rho = \mathrm{Corr}(\varepsilon_M, \varepsilon_Y \mid T, X). \tag{9.20}\]

If sequential ignorability holds, \(\rho = 0\): disturbances are orthogonal because any shared source of variation has been conditioned out. A nonzero \(\rho\) represents unobserved \(M\)\(Y\) confounding. Imai et al. (2010) derive the bias in the estimated NIE as a smooth function of \(\rho\), giving a sensitivity curve \(\rho \mapsto \widehat{\mathrm{NIE}}(\rho)\) with tipping point \(\rho^\star\).

A well-reported mediation sensitivity analysis states: the point estimate of the indirect effect (with sampling CI), the tipping value \(\rho^\star\), and a benchmark for what magnitude of \(\rho\) is plausible.

9.10 Sensitivity Analysis and Modern Estimators

Chapters 10–13 develop doubly robust estimation, orthogonal scores, cross-fitting, and semiparametrically efficient IV estimation. These tools make estimators more robust — but robust to a specific class of perturbations: nuisance-model misspecification. They are silent about identification.

What robust estimation does. The AIPW estimator is doubly robust: it is consistent if either the outcome model or the propensity model is correctly specified. Neyman orthogonality combined with cross-fitting means nuisance estimators need only converge at rate \(n^{-1/4}\) to yield \(\sqrt{n}\)-asymptotics. Both properties address estimation under correctly maintained identifying assumptions.

What robust estimation does not do: Orthogonality and cross-fitting do not solve unmeasured confounding, positivity failure, IV invalidity, or mediation assumption failure.

Neyman orthogonality and cross-fitting protect the estimator against nuisance-estimation error. They do not protect against violations of causal identification assumptions.

This is the reason sensitivity analysis belongs in Part II (identification) rather than Part III (estimation). Every method in Chapters 10–13 assumes identification and refines the estimation.

NoteApplied Workflow (Recommended)
  1. State the estimand (potential-outcome or do-notation).
  2. State the identifying assumptions (back-door, IV, front-door, mediation, etc.).
  3. Estimate the causal parameter (outcome regression, IPW, AIPW, DML, 2SLS, etc.).
  4. Quantify sampling uncertainty (standard errors, confidence intervals).
  5. Conduct sensitivity analysis for the least credible assumption (sensitivity curve, tipping point, or bounds, with benchmarking).
  6. Report both the point estimate with sampling CI and the sensitivity summary, with plain-language interpretation of robustness.

9.11 Lab: A Tipping-Point Analysis for an Observational ATE

DGP. Let \(X, U \overset{\mathrm{i.i.d.}}{\sim} N(0,1)\) independently. Treatment: \(P(T=1 \mid X, U) = \mathrm{expit}(-0.3 + 0.6X + 1.0U)\). Outcome: \(Y(t) = 1.5t + 0.8X + 2.0U + \varepsilon\), \(\varepsilon \sim N(0,1)\). The analyst observes \((Y, T, X)\) but not \(U\). True ATE: \(\tau = 1.5\).

Naive adjusted estimator. OLS coefficient on \(T\) in the regression of \(Y\) on \((1, T, X)\). Running 500 Monte Carlo replicates (\(n = 2000\)): mean of \(\hat\tau_X\) is 3.172, SD is 0.099. Implied 95% CI \(\approx [2.98, 3.37]\). The naive estimator is badly biased, but the CI is narrow and does not contain the truth. This is the identification-uncertainty failure mode in concrete form.
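This failure mode is easy to reproduce. A minimal Python sketch of a single large replicate of the lab's DGP, using a plain least-squares fit (no particular package assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# DGP from the lab: U is unobserved to the analyst; true ATE = 1.5
X = rng.standard_normal(n)
U = rng.standard_normal(n)
T = rng.binomial(1, expit(-0.3 + 0.6 * X + 1.0 * U))
Y = 1.5 * T + 0.8 * X + 2.0 * U + rng.standard_normal(n)

# naive adjusted estimator: OLS of Y on (1, T, X), omitting U
design = np.column_stack([np.ones(n), T, X])
tau_hat = np.linalg.lstsq(design, Y, rcond=None)[0][1]
print(round(tau_hat, 3))  # near 3.17 rather than the true 1.5
```

A single replicate at this sample size already lands close to the Monte Carlo mean of 3.172, with a standard error far too small for the interval to cover 1.5.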

Sensitivity adjustment. Using the binary-\(U\) decomposition, bias is \(B(\alpha_U, \delta) = \delta \cdot \mathrm{imbalance}(\alpha_U)\). Sensitivity-adjusted estimate \(\hat\tau_{\mathrm{sens}} = \hat\tau_X - B\) over a grid (\(\hat\tau_X = 3.172\); the true configuration is \(\alpha_U = 1.00, \delta = 2.0\)):

| Posited \(\delta\) | \(\alpha_U=0.25\) (imb = 0.232) | \(\alpha_U=0.50\) (imb = 0.440) | \(\alpha_U=1.00\) (imb = 0.787) | \(\alpha_U=1.50\) (imb = 1.025) |
|---|---|---|---|---|
| 0.5 | 3.056 | 2.952 | 2.779 | 2.660 |
| 1.0 | 2.940 | 2.732 | 2.385 | 2.147 |
| 2.0 | 2.708 | 2.292 | 1.598 ✓ | 1.123 |
| 3.0 | 2.477 | 1.851 | 0.811 | 0.098 |

The true configuration \((\alpha_U=1.00, \delta=2.0)\) gives the sensitivity-adjusted value \(1.598 \approx \tau = 1.5\) (checkmark).

Tipping points. \(\delta^\star(\alpha_U) = \hat\tau_X / \mathrm{imbalance}(\alpha_U)\). At \(\alpha_U = 1.00\): \(\delta^\star = 3.172/0.787 \approx 4.03\).
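The grid and tipping point follow from a single line of arithmetic, \(\hat\tau_{\mathrm{sens}} = \hat\tau_X - \delta \cdot \mathrm{imbalance}(\alpha_U)\); a minimal sketch using the lab's numbers:

```python
# sensitivity grid for the binary-U bias decomposition
tau_hat_X = 3.172
imbalances = {0.25: 0.232, 0.50: 0.440, 1.00: 0.787, 1.50: 1.025}

for delta in (0.5, 1.0, 2.0, 3.0):
    row = [round(tau_hat_X - delta * imb, 3) for imb in imbalances.values()]
    print(delta, row)

# tipping point for the sign at alpha_U = 1.00
delta_star = tau_hat_X / imbalances[1.00]
print(round(delta_star, 2))  # ≈ 4.03
```

The \(\delta = 2.0\), \(\alpha_U = 1.00\) cell reproduces the checked entry 1.598, and the tipping point matches the 4.03 reported above (cells in the \(\alpha_U = 1.50\) column differ in the last digit from the table because the printed imbalances are rounded).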

Benchmarking against observed \(X\) (outcome coefficient \(\gamma_X = 0.8\)): at \(\alpha_U = 1.00\), the unmeasured confounder would need an outcome association about \(4.03/0.8 \approx 5\) times stronger than \(X\) to zero out the estimate. A confounder as strong as \(X\) (\(\delta = 0.8\), \(\alpha_U = 1.0\)) would adjust the estimate to \(\approx 2.54\) — still clearly positive.

Four lessons: (1) The naive CI \([2.98, 3.37]\) is narrow but misses the true ATE \(\tau = 1.5\). Sampling uncertainty \(\neq\) identification uncertainty. (2) Sensitivity adjustment at the correct configuration recovers the truth to within Monte Carlo error. (3) The tipping point depends jointly on both sensitivity parameters. (4) Benchmarking against observed \(X\) anchors the analysis: the conclusion “the effect is positive” is robust to unmeasured confounders up to about \(5\gamma_X\) in outcome association.

9.12 Practical Reporting Guidelines

At minimum, a report should include: the point estimate \(\hat\psi\) and its 95% CI; a sensitivity summary (curve, bound, E-value, or tipping point); a benchmarking statement; and a plain-language interpretation.

WarningMisleading Phrases to Avoid
  • “The result is causal because the estimate is statistically significant.” Statistical significance addresses sampling uncertainty, not identification credibility.
  • “The result is robust because we used machine learning.” Flexible nuisance estimation improves the estimator; it does not validate the identifying assumptions.
  • “Sensitivity analysis showed the result is not biased.” Sensitivity analysis quantifies how much bias the identifying assumptions could produce if violated — it does not test for bias.
  • “The E-value is \(X\), therefore the effect is robust.” Whether an effect with E-value \(X\) is robust depends on whether an unmeasured confounder of strength \(X\) is plausible in the application. A large E-value without benchmarking is an unanchored statistic.
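To make the last point concrete, the E-value of Equation 9.9 is a one-line computation; the sketch below uses a hypothetical adjusted risk ratio of 2.5, and, as the warning stresses, the resulting number is informative only once benchmarked:

```python
import math

def e_value(rr):
    # E-value for a risk ratio RR >= 1 (Equation 9.9): EV = RR + sqrt(RR * (RR - 1))
    return rr + math.sqrt(rr * (rr - 1.0))

rr = 2.5  # hypothetical adjusted risk ratio
print(round(e_value(rr), 2))  # ≈ 4.44
```

An unmeasured confounder would need risk-ratio associations of at least this magnitude with both treatment and outcome to fully explain away the estimate; whether such a confounder is plausible is the benchmarking question, not something the number itself answers.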

9.13 Chapter Summary

| Symbol | Meaning |
|---|---|
| \(\Delta_{\mathrm{obs}}\) | Observed adjusted contrast |
| \(B\) | Confounding bias (Equation 9.3) |
| \(\Gamma\) | Rosenbaum odds-ratio bound |
| \(\mathrm{EV}\) | E-value, \(R + \sqrt{R(R-1)}\) (Equation 9.9) |
| \(\Lambda\) | MSM odds-ratio bound |
| \(\tau_\alpha\) | Trimmed ATE (Equation 9.15) |
| \(\rho\) | Residual correlation for mediation (Equation 9.20) |
| \(\lambda^\star\) | Tipping point for sign |
| \(\lambda^\star_{\mathrm{CI}}\) | Tipping point for statistical significance |
  1. Sampling uncertainty, model misspecification, and identification uncertainty are distinct sources of error. A confidence interval addresses only the first, and identification uncertainty does not vanish merely by collecting more data from the same observational regime.
  2. A sensitivity analysis introduces a parameter \(\lambda\) that quantifies the strength of a violation, with \(\lambda = 0\) recovering the baseline assumption. The three reporting objects are the sensitivity curve, sensitivity bounds, and the tipping point.
  3. The master bias decomposition \(\Delta_{\mathrm{obs}} = \tau + B\) underwrites sensitivity analysis for unmeasured confounding, specializing cleanly in the linear and binary-\(U\) cases.
  4. Three canonical sensitivity models differ by what the sensitivity parameter bounds: Rosenbaum’s \(\Gamma\) bounds a treatment-odds ratio; the E-value \(\mathrm{EV} = R + \sqrt{R(R-1)}\) gives a closed-form symmetric threshold; and the MSM bounds an odds ratio of nominal to true propensity.
  5. Sensitivity parameters need benchmarking against observed covariates to be interpretable.
  6. Partial identification reports an identified set rather than a single value. Manski’s no-assumption bound is as wide as the support of the outcome; shape restrictions narrow it.
  7. Positivity violations are an identification failure, not a variance problem. Trimming changes the estimand to a region of covariate overlap; it is a retargeting strategy, not a correction for the original population ATE.
  8. Invalid instruments produce bias \(\delta \cdot \mathrm{Var}(Z)/\mathrm{Cov}(T,Z)\) amplified by weak first stages. Weak-instrument diagnostics do not address exclusion violations.
  9. Mediation sensitivity analysis proceeds by a residual-correlation parameter \(\rho\) that captures unmeasured \(M\)–\(Y\) confounding.
  10. Modern estimation methods (AIPW, TMLE, DML, efficient GMM) address nuisance estimation, not identification. Neyman orthogonality protects the estimator; it does not protect the causal claim.

9.14 Problems

1. Sampling vs. identification uncertainty. Explain in your own words the difference between a narrow sampling confidence interval and a robust causal conclusion. Construct an example data-generating process, with numerical parameters, in which the CI for a back-door-adjusted ATE is very narrow (SD below 0.05) yet the true ATE lies outside it. Identify the identifying assumption that is violated and the magnitude of the violation.

2. Binary unmeasured confounder. Suppose \(\hat\Delta_{\mathrm{obs}} = 2.0\). Assume a binary unmeasured confounder \(U\) with constant outcome contrast \(\delta = 4\) and imbalance \(p_1 - p_0 = 0.3\).

  1. Use the Binary-Confounder Bias Lemma to compute \(B\) and the sensitivity-adjusted estimate \(\hat\tau_{\mathrm{sens}}\).
  2. What outcome contrast \(\delta\) would zero out the estimate at the same imbalance level?
  3. What imbalance \(p_1 - p_0\) would zero out the estimate at \(\delta = 4\)?

3. Tipping point in a linear bias model. For \(\hat\Delta_{\mathrm{obs}} = 1.5\) and bias model \(B(\lambda) = 0.4\lambda\):

  1. Find the tipping point \(\lambda^\star\) for the sign of the adjusted estimate.
  2. Suppose the sampling standard error is 0.3, independent of \(\lambda\). Find the tipping point \(\lambda^\star_{\mathrm{CI}}\) for the lower confidence limit to reach zero (95% level). Compare to \(\lambda^\star\).

4. Benchmarking. A researcher reports a sensitivity analysis in which the tipping point for the \(U\)-outcome effect is \(\delta^\star = 3.0\). Observed covariate outcome coefficients are \((\hat\gamma_{\mathrm{age}}, \hat\gamma_{\mathrm{income}}, \hat\gamma_{\mathrm{education}}) = (0.3, 1.8, 2.4)\). Identify the strongest benchmark. Is the result robust to an unmeasured confounder as strong as the strongest observed covariate? Write two sentences of interpretation suitable for an applied report.

5. E-value calculation. An observational study reports an adjusted risk ratio of \(\mathrm{RR}_{TY \mid X}^{\mathrm{obs}} = 1.9\) with 95% CI \([1.3, 2.8]\).

  1. Compute the E-value for the point estimate using Equation 9.9.
  2. Compute the E-value for the lower confidence limit.
  3. Write one sentence of interpretation for each.

6. IV sensitivity curve. In a scalar-IV model with \(\mathrm{Cov}(Y,Z) = 3\), \(\mathrm{Cov}(T,Z) = 1.5\), \(\mathrm{Var}(Z) = 1\):

  1. Use Equation 9.19 to compute \(\hat\beta(\delta)\) for \(\delta \in \{0, 0.5, 1.0, 1.5, 2.0\}\).
  2. Find the tipping point \(\delta^\star\).
  3. Repeat for the weaker-instrument case \(\mathrm{Cov}(T,Z) = 0.5\) (keeping other quantities fixed). Explain why the same \(\delta\) produces a larger bias.

7. Manski bound calculation. Suppose \(Y \in [0, 10]\), \(P(T=1) = 0.3\), \(\mathbb{E}(Y \mid T=1) = 6.5\), \(\mathbb{E}(Y \mid T=0) = 4.0\).

  1. Compute Manski’s no-assumption bound using Equations 9.12 and 9.13.
  2. Compute the width and verify Equation 9.14.
  3. The observed association is 2.5. Does the no-assumption identified set include zero? Discuss what this implies about the informational content of the data alone.

8. Positivity sensitivity. Explain why the trimmed estimand \(\tau_\alpha\) of Equation 9.15 is generally not equal to the untrimmed ATE \(\tau\). In a dataset with \(\pi(X)\) distributed uniformly on \([0,1]\), what fraction of the population is retained at trimming levels \(\alpha \in \{0.01, 0.05, 0.10, 0.20\}\)? Under what scientific questions is \(\tau_\alpha\) preferable to \(\tau\) as a target?

9. Mediation sensitivity. In a mediation study of a behavioral intervention (\(T\)) on depression (\(Y\)) through sleep quality (\(M\)), explain why randomization of \(T\) alone does not eliminate the need for a sensitivity analysis for the indirect effect. Draw the DAG that describes the residual concern and identify which arrow corresponds to a nonzero \(\rho\) in Equation 9.20. Give a plausible scientific story in which the residual correlation could be large.

10. Modern estimators and identification. A colleague argues that because DML and AIPW are “doubly robust and orthogonalized,” they “automatically correct for unmeasured confounding provided the machine-learning models are good enough.” Explain why this is incorrect, referring to the bias decomposition of the Bias Decomposition Theorem. Identify which term in that decomposition machine-learning methods can estimate consistently, and which term they cannot estimate at all.

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444–55.
Cinelli, Carlos, and Chad Hazlett. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society, Series B 82 (1): 39–67. https://doi.org/10.1111/rssb.12348.
Ding, Peng, and Tyler J. VanderWeele. 2016. “Sensitivity Analysis Without Assumptions.” Epidemiology 27 (3): 368–77. https://doi.org/10.1097/EDE.0000000000000457.
Imai, Kosuke, Luke Keele, and Teppei Yamamoto. 2010. “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects.” Statistical Science 25 (1): 51–71.
Manski, Charles F. 1990. “Nonparametric Bounds on Treatment Effects.” American Economic Review, Papers and Proceedings 80 (2): 319–23.
Manski, Charles F. 2003. Partial Identification of Probability Distributions. Springer. https://doi.org/10.1007/b97478.
Rosenbaum, Paul R. 2002. Observational Studies. 2nd ed. Springer. https://doi.org/10.1007/978-1-4757-3692-2.
Tan, Zhiqiang. 2006. “A Distributional Approach for Causal Inference Using Propensity Scores.” Journal of the American Statistical Association 101 (476): 1619–37. https://doi.org/10.1198/016214506000000023.
VanderWeele, Tyler J., and Peng Ding. 2017. “Sensitivity Analysis in Observational Research: Introducing the E-Value.” Annals of Internal Medicine 167 (4): 268–74. https://doi.org/10.7326/M16-2607.
Zhao, Qingyuan, Dylan S. Small, and Bhaswar B. Bhattacharya. 2019. “Sensitivity Analysis for Inverse Probability Weighting Estimators via the Percentile Bootstrap.” Journal of the Royal Statistical Society, Series B 81 (4): 735–61. https://doi.org/10.1111/rssb.12327.