8  Mediation and Front-Door Identification

Note: Learning Objectives

By the end of this chapter, students should be able to:

  1. Explain the distinction between the total causal effect of \(T\) on \(Y\) and a pathway-specific effect that operates through a mediator \(M\), and describe why this distinction matters scientifically.
  2. Draw the prototype mediation DAG, write its structural equations, and identify the direct and indirect pathways.
  3. Define the controlled direct effect (CDE) using the do-operator, identify it via the back-door formula for the joint intervention \((T, M)\), and explain why it depends on the fixed level \(m\).
  4. Define the natural direct and indirect effects (NDE, NIE) using the potential outcomes notation \(Y(t, M(t'))\), state the NDE\(+\)NIE\(=\)TE decomposition, and explain why these quantities involve cross-world counterfactuals.
  5. State the four sequential ignorability assumptions for identification of natural effects, write the mediation formula, and identify which assumption is violated when there is an unmeasured treatment-induced mediator–outcome confounder.
  6. Set up the Baron–Kenny three-equation system, derive the product and difference formulas for the indirect effect, and explain why the decomposition fails in nonlinear or interaction models.
  7. State the three front-door conditions, derive the front-door formula using the do-calculus, and explain why the front-door graph enables identification despite unobserved \(T\)–\(Y\) confounding.
  8. Contrast mediation analysis and instrumental variables on the dimensions of variable position, identification goal, and key assumption.

8.1 Motivation: Mechanisms

The identification results of Chapters 5–7 all answer the same question: what is the total causal effect of \(T\) on \(Y\)? Mediation analysis asks a finer question: through what mechanism does that effect operate?

The total effect of \(T\) on \(Y\) may flow along multiple causal pathways. Some of this effect passes through an intermediate variable \(M\) — the mediator — along the path \(T \to M \to Y\). The remainder flows directly along \(T \to Y\), bypassing the mediator entirely. Mediation analysis aims to study mechanisms by defining direct and indirect effect concepts that target each pathway; only some of these concepts yield an additive decomposition of the total effect.

This mechanism question matters for scientific and policy reasons. In a clinical trial of a behavioral intervention (\(T\)) on depression (\(Y\)), a researcher may want to know how much of the benefit operates through improved sleep quality (\(M\)) versus other pathways — because if sleep is the main channel, targeting sleep directly may be a more efficient intervention. In an economics study of education (\(T\)) on wages (\(Y\)), how much operates through occupation (\(M\)) versus cognitive skills? The answer determines whether a policy should target educational attainment or occupational access.

Running example. Throughout this chapter we anchor the abstract formulas to a single concrete scenario: \(T\) is a randomized behavioral intervention, \(M\) is self-reported sleep quality measured mid-trial, and \(Y\) is a depression score (e.g., on the PHQ-9 scale).

The challenge is that mediators are post-treatment variables: they are affected by the treatment, and may themselves be confounded with the outcome. Conditioning on a post-treatment variable creates exactly the collider and selection-bias problems studied in Chapters 2 and 3. A naïve approach — simply including \(M\) as a covariate in a regression of \(Y\) on \(T\) — conflates adjustment with mediation and can introduce bias even in a randomized experiment.

This chapter also develops the front-door criterion, a distinct identification strategy that uses the mediation structure of the DAG to identify causal effects even when treatment and outcome are confounded by an unobserved variable, making mediation analysis relevant not only to mechanism research but also to the core identification problem of earlier chapters.

8.2 The Mediation DAG

8.2.1 The Prototype Graph

Figure: The prototype mediation DAG, with nodes \(T\) (treatment), \(M\) (mediator), \(Y\) (outcome), \(\mathbf{X}\) (covariates), and \(U\) (unobserved). The causal effect of \(T\) on \(Y\) operates through two pathways: the direct path \(T \to Y\) and the indirect path \(T \to M \to Y\) via the mediator. The unobserved variable \(U\) confounds the treatment–outcome relationship; \(\mathbf{X}\) denotes observed pre-treatment covariates.

The graph encodes two causal pathways: the direct pathway \(T \to Y\) (treatment affects outcome without passing through the mediator) and the indirect pathway \(T \to M \to Y\) (treatment first shifts the mediator, which in turn shifts the outcome).

8.2.2 Structural Equations

The prototype graph corresponds to the nonparametric SEM: \[T = f_T(\mathbf{X},\, U,\, \varepsilon_T), \qquad M = f_M(T,\, \mathbf{X},\, \varepsilon_M), \qquad Y = f_Y(T,\, M,\, \mathbf{X},\, U,\, \varepsilon_Y), \tag{8.1}\] where each arrow corresponds to the presence of the parent in the child’s structural equation. \(U\) enters both \(T\)’s and \(Y\)’s equations, making the confounding paths explicit; \(U\) is absent from \(M\)’s equation, reflecting the absence of an arrow \(U \to M\) in the DAG.

8.2.3 What Makes Mediation Harder Than Total Effect Estimation

Collider bias. Conditioning on \(M\) can open collider paths. Suppose an unobserved variable \(V\) satisfies \(V \to M\) and \(V \to Y\). The path \(T \to M \leftarrow V \to Y\) is blocked at the collider \(M\) when \(M\) is not conditioned on, but opens as soon as \(M\) is included as a covariate. This is precisely why naively regressing \(Y\) on \((T, M)\) does not isolate the direct effect.

Mediator–outcome confounding. Even when treatment is randomized, the mediator \(M\) is never randomized. An unobserved variable \(V\) with \(V \to M\) and \(V \to Y\) creates a back-door path from \(M\) to \(Y\) that randomization of \(T\) does not close.
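The bias described above can be seen in a small simulation. The sketch below is illustrative only: the data-generating process, coefficients, and the helper `ols` are assumptions made for this example, not part of the chapter's running example. Treatment is randomized, \(V\) is an unobserved common cause of \(M\) and \(Y\), and \(M\) has no effect on \(Y\), so the true direct effect equals the total effect (0.5 in this hypothetical model).

```python
import random

def ols(rows, y):
    """Least-squares coefficients of y on the columns of rows, intercept first."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)
T, M, Y = [], [], []
for _ in range(20000):
    t = 1.0 if rng.random() < 0.5 else 0.0   # randomized treatment
    v = rng.gauss(0, 1)                      # unobserved V -> M and V -> Y
    m = 1.0 * t + v + rng.gauss(0, 1)        # mediator depends on T and V
    y = 0.5 * t + v + rng.gauss(0, 1)        # direct effect 0.5; M does not affect Y
    T.append(t); M.append(m); Y.append(y)

total = ols([[t] for t in T], Y)[1]                        # regress Y on T only
naive_direct = ols([[t, m] for t, m in zip(T, M)], Y)[1]   # regress Y on (T, M)
print(f"total effect estimate:  {total:.3f}")
print(f"naive 'direct' estimate: {naive_direct:.3f}")
```

In this hypothetical model the regression of \(Y\) on \(T\) alone recovers the effect of \(T\), while the coefficient on \(T\) after adding \(M\) converges to zero rather than 0.5: conditioning on the collider \(M\) opens the path \(T \to M \leftarrow V \to Y\).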

Warning: Post-Treatment Variables Are Dangerous to Condition On

The mediator \(M\) is caused by the treatment \(T\). Conditioning on a post-treatment variable in a regression or matching procedure can open collider paths, introduce selection bias, and produce estimates that correspond to neither the total effect nor any well-defined direct effect. This chapter studies three regimes in which conditioning on or marginalizing over \(M\) is justified: (1) CDE (Section 8.4): both \(T\) and \(M\) are intervened on via \(\doop(T, M)\); (2) NDE/NIE (Sections 8.5–8.6): \(M\) is marginalized over using a distribution from a different treatment arm, justified by sequential ignorability; (3) Front-door (Section 8.8): \(M\) is summed over in the front-door formula, justified by graphical conditions. Outside these three regimes, conditioning on a post-treatment variable should be treated as an error until proven otherwise.

8.2.4 A Working Graph for the Identification Sections

Note: Working Assumption for Sections 8.3–8.7

Throughout Sections 8.3–8.7, we work with the reduced prototype graph obtained from the prototype by absorbing all \(T\)–\(Y\) and \(T\)–\(M\) confounding into the observed \(\mathbf{X}\) (i.e., \(U\) is assumed absent or observed). Section 8.8 on the front-door criterion restores \(U\) as unobserved and derives identification of the total effect without access to \(U\).

8.3 Total Causal Effect

Definition: Total Effect

The total effect (TE) of \(T\) on \(Y\) is: \[\mathrm{TE} = \E[Y(1) - Y(0)] = \E[Y \mid \doop(T{=}1)] - \E[Y \mid \doop(T{=}0)].\]

Under the working assumption, TE is identified by the back-door formula: \[\mathrm{TE} = \sum_{\mathbf{x}} [\E[Y \mid T{=}1, \mathbf{X}{=}\mathbf{x}] - \E[Y \mid T{=}0, \mathbf{X}{=}\mathbf{x}]]\, P(\mathbf{X}{=}\mathbf{x}).\]

Running example. The TE is the expected change in depression score if the entire study population were assigned to the intervention versus control. It combines the effect operating through sleep improvement with every other pathway. Mediation analysis asks: of this total, how much is due to sleep?

8.4 Controlled Direct Effect

Definition: Controlled Direct Effect (CDE)

The controlled direct effect of changing \(T\) from 0 to 1 while fixing \(M = m\) by intervention is: \[\mathrm{CDE}(m) = \E[Y \mid \doop(T{=}1), \doop(M{=}m)] - \E[Y \mid \doop(T{=}0), \doop(M{=}m)]. \tag{8.2}\]

Equation 8.2 involves two simultaneous interventions, corresponding to the mutilated graph \(\mathcal{G}_{\overline{TM}}\) (all edges into both \(T\) and \(M\) deleted). Fixing \(M = m\) for everyone shuts down the indirect pathway \(T \to M \to Y\): any remaining \(T\)-to-\(Y\) effect flows only through the direct edge.

Running example. \(\mathrm{CDE}(m)\) at \(m =\) “poor sleep” is the expected change in depression score comparing intervention to control if every participant’s sleep quality were externally held at the poor-sleep level. Whether sleep can be externally fixed in a clinical trial is a separate question, which is why the policy interpretation of the CDE in this scenario is strained.

Remark: CDE Depends on \(m\)

In general, the direct effect may vary across values of \(m\); this is effect modification by the mediator. In a linear additive model, \(\mathrm{CDE}(m) = \tau'\) for all \(m\), where \(\tau'\) is the coefficient on \(T\) in the outcome equation that also includes \(M\) (Equation 8.10); this constancy is a special property of linearity. When \(T\) and \(M\) interact, CDEs at different values of \(m\) differ.

8.4.1 Identification of the CDE

Theorem: Identification of the CDE

Suppose \(\mathbf{Z}\) satisfies the back-door criterion for the joint intervention \(\doop(T, M)\) on \(Y\): (i) \(\mathbf{Z}\) contains no descendant of \(T\) or \(M\), (ii) \(\mathbf{Z}\) blocks every back-door path from \(\{T, M\}\) to \(Y\), and (iii) joint positivity holds: \(P(T{=}t, M{=}m \mid \mathbf{Z}{=}\mathbf{z}) > 0\) for \(P\)-almost every \(\mathbf{z}\). Then: \[\E[Y \mid \doop(T{=}t), \doop(M{=}m)] = \sum_{\mathbf{z}} \E[Y \mid t, m, \mathbf{z}]\, P(\mathbf{z}). \tag{8.3}\]

Under the working assumption, \(\mathbf{Z} = \mathbf{X}\) is a valid adjustment set.
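As a minimal sketch of Equation 8.3 in use, the snippet below standardizes a known outcome regression over a discrete adjustment variable and differences the two treatment arms to obtain \(\mathrm{CDE}(m)\). The regression function `mu`, its coefficients, and the distribution `p_z` are hypothetical values chosen for illustration; the \(T \times M\) interaction term is included so that the dependence of the CDE on \(m\) is visible.

```python
def cde(mu, p_z, m):
    """CDE(m): Equation 8.3 evaluated at t=1 and t=0, then differenced."""
    return sum((mu(1, m, z) - mu(0, m, z)) * pz for z, pz in p_z.items())

def mu(t, m, z):
    """Hypothetical outcome regression E[Y | T=t, M=m, Z=z] with a T x M interaction."""
    return 1.0 + 2.0 * t + 3.0 * m + 1.5 * t * m + 1.0 * z

p_z = {0: 0.6, 1: 0.4}   # hypothetical distribution of a binary adjustment variable Z

cde_at_0 = cde(mu, p_z, 0)
cde_at_1 = cde(mu, p_z, 1)
print(cde_at_0, cde_at_1)
```

In this hypothetical model \(\mathrm{CDE}(m) = 2 + 1.5\,m\), so the two values differ by the interaction coefficient; dropping the interaction term would make the CDE constant in \(m\).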

Graph surgery and back-door adjustment are distinct steps. Graph surgery defines the interventional target by deleting arrows into \(T\) and \(M\). Expressing that target as a functional of the observed distribution is a separate step requiring a valid adjustment set \(\mathbf{Z}\) in the original graph. If \(U\) is unobserved and \(\mathbf{X}\) alone cannot block the back-door path \(T \leftarrow U \to Y\), then Equation 8.3 with \(\mathbf{Z} = \mathbf{X}\) does not identify the CDE. Alternative strategies (front-door, IV) are required.

8.4.2 Physical Manipulability, and Why the CDE Does Not Decompose

The CDE is only scientifically meaningful when an intervention to fix \(M\) at a specified level \(m\) is physically realizable. In many substantive settings, the mediator cannot be independently manipulated (education–occupation, behavioral intervention–sleep), making the CDE’s policy interpretation strained.

The CDE and some “controlled indirect effect” do not sum to the total effect in general. The residual \(\mathrm{TE} - \mathrm{CDE}(m)\) depends on \(m\) and has no clean do-calculus expression corresponding to “the indirect pathway.” The correct estimands for a pathway decomposition are the NDE and NIE, introduced next.

Warning: Natural Effects Are a Strictly Harder Estimand Than the CDE

The CDE is a single-world estimand: \(\E[Y \mid \doop(T{=}t), \doop(M{=}m)]\) involves one joint intervention, and identification reduces to the back-door criterion for the pair \((T, M)\).

The NDE and NIE are cross-world estimands: \(Y(t, M(t'))\) requires an individual to simultaneously inhabit two worlds — the world where \(T = t\) (which determines \(Y\)) and the world where \(T = t'\) (which determines \(M\)). When \(t \neq t'\), no single experimental intervention can realize both at once. The do-operator alone cannot express this quantity.

The identification price is steep. The CDE requires only the back-door criterion. The NDE and NIE require four sequential ignorability assumptions, including Assumption 4 (no treatment-induced mediator–outcome confounder), which cannot be satisfied by randomizing \(T\) and cannot be verified from data.

8.5 Natural Direct and Indirect Effects

8.5.1 Cross-World Counterfactuals

The CDE fixes the mediator by external intervention. A more scientifically natural question is: what is the effect of \(T\) on \(Y\) that bypasses \(M\) when \(M\) is held at the value it would naturally take under the reference treatment \(T = 0\)? This requires the nested potential outcomes notation \(Y(t, M(t'))\), which denotes the outcome that would be observed if \(T\) were set to \(t\) and \(M\) were simultaneously set to the value it would naturally take if \(T\) were \(t'\). These are called cross-world counterfactuals because \(t\) and \(t'\) may differ.

Definition: Natural Direct and Indirect Effects (Pearl 2001)

For binary \(T \in \{0,1\}\): \[\mathrm{NDE} = \E[Y(1, M(0)) - Y(0, M(0))], \tag{8.4}\] \[\mathrm{NIE} = \E[Y(1, M(1)) - Y(1, M(0))]. \tag{8.5}\]

The natural direct effect (NDE) is the expected change in the outcome when \(T\) shifts from 0 to 1, holding the mediator at the value it would naturally take under \(T = 0\). The natural indirect effect (NIE) is the expected change in the outcome due to the shift in the mediator from \(M(0)\) to \(M(1)\), holding \(T = 1\) fixed.

Running example. The NDE is the part of the depression reduction that comes from cognitive, behavioral, or therapeutic-alliance pathways, not from sleep. The NIE is the complementary piece: how much of the depression reduction is due to the sleep improvement that the intervention itself causes.

The TE = NDE + NIE decomposition: \[\mathrm{TE} = \mathrm{NDE} + \mathrm{NIE}. \tag{8.6}\]

Proof. \(\mathrm{NDE} + \mathrm{NIE} = \E[Y(1, M(0)) - Y(0, M(0))] + \E[Y(1, M(1)) - Y(1, M(0))] = \E[Y(1, M(1)) - Y(0, M(0))] = \E[Y(1) - Y(0)] = \mathrm{TE}\). \(\square\)
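The decomposition can also be checked by simulation when the structural equations are known, because the nested counterfactual \(Y(t, M(t'))\) can then be computed directly by reusing each unit's noise terms. The sketch below assumes a hypothetical binary mediator and an outcome equation with a \(T \times M\) interaction; the functions `m_of` and `y_of` and all coefficients are illustrative assumptions.

```python
import random

def m_of(t, u_m):
    """Hypothetical structural mediator: M(t) = 1 if u_m < 0.3 + 0.4 t."""
    return 1 if u_m < 0.3 + 0.4 * t else 0

def y_of(t, m, e_y):
    """Hypothetical structural outcome Y(t, m) with a T x M interaction."""
    return 1.0 + 2.0 * t + 3.0 * m + 1.5 * t * m + e_y

rng = random.Random(1)
n = 50000
nde = nie = te = 0.0
for _ in range(n):
    u_m, e_y = rng.random(), rng.gauss(0, 1)   # unit-level noise, shared across worlds
    m0, m1 = m_of(0, u_m), m_of(1, u_m)        # M(0) and M(1) for the same unit
    y_10 = y_of(1, m0, e_y)                    # cross-world outcome Y(1, M(0))
    nde += (y_10 - y_of(0, m0, e_y)) / n       # Equation 8.4
    nie += (y_of(1, m1, e_y) - y_10) / n       # Equation 8.5
    te += (y_of(1, m1, e_y) - y_of(0, m0, e_y)) / n

print(f"NDE={nde:.3f}  NIE={nie:.3f}  NDE+NIE={nde + nie:.3f}  TE={te:.3f}")
```

Under these assumed equations the population values are \(\mathrm{NDE} = 2 + 1.5 \times 0.3 = 2.45\) and \(\mathrm{NIE} = 4.5 \times 0.4 = 1.8\), and the sum matches the total effect up to floating-point error because the cross-world term \(Y(1, M(0))\) cancels unit by unit. Such a computation is possible only in a simulation; with real data the cross-world outcome is never observed.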

Remark: Alternative Decomposition

The “starred” decomposition uses \(\mathrm{NDE}^* = \E[Y(1, M(1)) - Y(0, M(1))]\) and \(\mathrm{NIE}^* = \E[Y(0, M(1)) - Y(0, M(0))]\). Both decompositions sum to TE, and the two agree when there is no \(T \times M\) interaction. When interaction is present, the gap \(\mathrm{NDE} - \mathrm{NDE}^* = \mathrm{NIE}^* - \mathrm{NIE}\) quantifies the discrepancy. This textbook uses the pair in Equations 8.4 and 8.5 (the Pearl-style decomposition) throughout.

Remark: CDE and NDE Coincide under Linearity

In the linear additive Baron–Kenny SEM with no \(T \times M\) interaction, the CDE and NDE coincide: \(\mathrm{CDE}(m) = \mathrm{NDE} = \tau'\) for every \(m\), and \(\mathrm{NIE} = ab\). When there is a \(T \times M\) interaction, \(\mathrm{NDE} \neq \mathrm{CDE}(m)\) in general, and the two concepts address different scientific questions.
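
A short worked derivation shows where the coincidence comes from and how it breaks. As an illustrative assumption, extend the linear outcome equation (Equation 8.10) with an interaction term whose coefficient \(\theta\) is a symbol introduced here only for this example, and treat both equations as structural with mean-zero errors: \[M = \alpha_2 + a T + \boldsymbol{\gamma}_2^\top\mathbf{X} + \varepsilon_2, \qquad Y = \alpha_3 + \tau' T + b M + \theta\, T M + \boldsymbol{\gamma}_3^\top\mathbf{X} + \varepsilon_3.\] Then \(Y(1, m) - Y(0, m) = \tau' + \theta m\), \(Y(1, M(0)) - Y(0, M(0)) = \tau' + \theta M(0)\), and \(Y(1, M(1)) - Y(1, M(0)) = (b + \theta)(M(1) - M(0))\), so taking expectations gives \[\mathrm{CDE}(m) = \tau' + \theta m, \qquad \mathrm{NDE} = \tau' + \theta\, \E[M(0)], \qquad \mathrm{NIE} = (b + \theta)\, a.\] Setting \(\theta = 0\) recovers \(\mathrm{CDE}(m) = \mathrm{NDE} = \tau'\) and \(\mathrm{NIE} = ab\). With \(\theta \neq 0\), the CDE varies with \(m\) and agrees with the NDE only at \(m = \E[M(0)]\).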

8.6 Identification of Natural Effects

8.6.1 Sequential Ignorability

Note: Sequential Ignorability (Imai et al. 2010)
  1. No unmeasured treatment–outcome confounding (joint form). \(\{Y(t', m), M(t)\} \indep T \mid \mathbf{X}\) for all \(t, t', m\). Graphically: \(\mathbf{X}\) blocks all back-door paths from \(T\) to \(Y\) and from \(T\) to \(M\).
  2. No unmeasured treatment–mediator confounding. \(M(t) \indep T \mid \mathbf{X}\) for all \(t\). (Follows from the joint Assumption 1; listed separately to highlight the \(T \to M\) sub-problem.)
  3. No unmeasured mediator–outcome confounding given \(T\). \(Y(t', m) \indep M \mid T, \mathbf{X}\) for all \(t', m\). Graphically: \((T, \mathbf{X})\) blocks all back-door paths from \(M\) to \(Y\).
  4. No treatment-induced mediator–outcome confounder. There is no post-treatment variable \(L\) such that \(T \to L\), \(L \to M\), and \(L \to Y\).

Note: Positivity Conditions

(P1) Treatment overlap. \(P(T{=}t \mid \mathbf{X}{=}\mathbf{x}) > 0\) for all \(t \in \{0,1\}\) and for \(P\)-almost every \(\mathbf{x}\). Randomization of \(T\) secures (P1) by design.

(P2) Mediator overlap across treatment arms. \(P(M{=}m \mid T{=}t, \mathbf{X}{=}\mathbf{x}) > 0\) whenever \(P(M{=}m \mid T{=}t', \mathbf{X}{=}\mathbf{x}) > 0\), for pairs \((t, t')\) in the mediation formula and \(P\)-almost every \(\mathbf{x}\). This is a data-dependent requirement that no aspect of experimental assignment guarantees.

Assumptions 1–3 are the natural extensions of the Baron–Kenny conditions to the potential outcomes setting. Assumption 4 is the critical new requirement: it rules out a variable \(L\) that is caused by the treatment and confounds the mediator–outcome relationship.

What randomization of \(T\) does and does not provide. Randomizing \(T\) satisfies Assumptions 1 and 2 by design. It does not satisfy Assumptions 3 or 4. The mediator \(M\) is a post-treatment variable that is never randomized; any unobserved variable \(V\) with \(V \to M\) and \(V \to Y\) violates Assumption 3 regardless of how \(T\) was assigned. Assumption 4 is even more demanding: it can be violated by a variable \(L\) that is itself caused by the treatment.

Warning: Randomization of \(T\) Does Not Secure Assumption 4

A treatment-induced mediator–outcome confounder \(L\) is itself caused by the treatment: the path \(T \to L\) is set in motion by the intervention, so no aspect of how \(T\) is assigned can rule out a violation of Assumption 4. The absence of such an \(L\) must be argued from design, timing, measurement, or subject-matter knowledge. This is why identification of natural effects is strictly harder than identification of the total effect or the CDE.

8.6.2 The Mediation Formula

Theorem: Mediation Formula (Pearl 2001)

Under sequential ignorability, positivity (P1)–(P2), and with discrete \(M\) and \(\mathbf{X}\): \[\E[Y(t, M(t'))] = \sum_{m}\sum_{\mathbf{x}} \E[Y \mid T{=}t, M{=}m, \mathbf{X}{=}\mathbf{x}]\, P(M{=}m \mid T{=}t', \mathbf{X}{=}\mathbf{x})\, P(\mathbf{X}{=}\mathbf{x}). \tag{8.7}\]

The derivation proceeds in five labeled steps.

Step 1. By the law of total expectation and the composition axiom \(Y(t, M(t')) = Y(t, m)\) on the event \(\{M(t') = m\}\): \[\E[Y(t, M(t'))] = \sum_{m,\mathbf{x}} \E[Y(t, m) \mid M(t'){=}m, \mathbf{X}{=}\mathbf{x}]\, P(M(t'){=}m \mid \mathbf{X}{=}\mathbf{x})\, P(\mathbf{X}{=}\mathbf{x}).\]

Step 2 (cross-world independence under NPSEM-IE). Assumptions 1–4 jointly imply \(Y(t, m) \indep M(t') \mid \mathbf{X}\). The role of Assumption 4 is to ensure \(M(t')\) shares no source of variation with \(Y(t,m)\) beyond \(\mathbf{X}\): if such an \(L\) existed, \(M(t')\) would inherit \(L\)-dependence and \(Y(t,m)\) would as well. With cross-world independence: \(\E[Y(t, m) \mid M(t'){=}m, \mathbf{X}{=}\mathbf{x}] = \E[Y(t, m) \mid \mathbf{X}{=}\mathbf{x}]\).

Step 3 (Assumptions 1 and 3). By \(Y(t, m) \indep T \mid \mathbf{X}\) and \(Y(t, m) \indep M \mid T, \mathbf{X}\): \(\E[Y(t, m) \mid \mathbf{X}{=}\mathbf{x}] = \E[Y(t, m) \mid T{=}t, M{=}m, \mathbf{X}{=}\mathbf{x}]\).

Step 4 (consistency for \(Y\)). On the event \(\{T = t, M = m\}\), \(Y(t, m) = Y\): \(\E[Y(t, m) \mid T{=}t, M{=}m, \mathbf{X}{=}\mathbf{x}] = \E[Y \mid T{=}t, M{=}m, \mathbf{X}{=}\mathbf{x}]\).

Step 5 (Assumption 2 and consistency for \(M\)). \(P(M(t'){=}m \mid \mathbf{X}{=}\mathbf{x}) = P(M{=}m \mid T{=}t', \mathbf{X}{=}\mathbf{x})\).

Substituting Steps 2–5 into Step 1 yields Equation 8.7. \(\square\)

Interpretation. The mediation formula “mixes” the outcome regression under \(T = t\) with the mediator distribution under \(T = t'\). To compute the NDE, set \(t = 1\) and \(t' = 0\): take the conditional mean of \(Y\) at treatment 1, but weight the mediator by its distribution under treatment 0. This counterfactual reweighting is what makes the formula non-trivial. The presence of the second treatment index \(t'\) on the right-hand side is the visible trace of the cross-world step.

The NDE and NIE from the formula: \[\mathrm{NDE} = \sum_{m,\mathbf{x}} [\E[Y \mid 1, m, \mathbf{x}] - \E[Y \mid 0, m, \mathbf{x}]]\, P(M{=}m \mid T{=}0, \mathbf{x})\, P(\mathbf{x}),\] \[\mathrm{NIE} = \sum_{m,\mathbf{x}} \E[Y \mid 1, m, \mathbf{x}]\, [P(M{=}m \mid T{=}1, \mathbf{x}) - P(M{=}m \mid T{=}0, \mathbf{x})]\, P(\mathbf{x}).\]

Estimation Recipe: Plug-In for the Mediation Formula

Given data \(\{(Y_i, T_i, M_i, \mathbf{X}_i)\}_{i=1}^n\), a plug-in estimator requires two working models:

  1. Fit an outcome model. Regress \(Y\) on \((T, M, \mathbf{X})\) to obtain \(\hat\mu(t, m, \mathbf{x})\).
  2. Fit a mediator model. Regress \(M\) on \((T, \mathbf{X})\) to obtain \(\hat p(m \mid t, \mathbf{x})\).
  3. Predict outcomes at the evaluation arm. For each unit \(i\), compute \(\hat\mu(t, m, \mathbf{X}_i)\) at the evaluation treatment level \(t\).
  4. Reweight or simulate the mediator at the reference arm. For discrete \(M\), weight \(\hat\mu(t, m, \mathbf{X}_i)\) by \(\hat p(m \mid t', \mathbf{X}_i)\).
  5. Average over the empirical distribution of \(\mathbf{X}\). The plug-in estimate is \(n^{-1}\sum_i \sum_m \hat\mu(t, m, \mathbf{X}_i)\, \hat p(m \mid t', \mathbf{X}_i)\).
  6. Quantify uncertainty. Nonparametric bootstrap gives valid confidence intervals under correct model specification. Influence-function-based estimators (Chapters 10–11) deliver asymptotic normality under doubly-robust conditions.

The plug-in estimator is consistent only when both \(\hat\mu\) and \(\hat p\) are correctly specified. The semiparametric estimators developed later in the book are designed to partly shield inference from this sensitivity.
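Steps 1–5 of the recipe can be sketched for the simplest case of binary \(T\), \(M\), and \(X\). The code below is a minimal illustration under stated assumptions: the simulated data-generating process is hypothetical, and the two working models are saturated (cell means for \(\hat\mu\), cell frequencies for \(\hat p\)), a choice made here for simplicity rather than one prescribed by the recipe. Step 6 (uncertainty quantification) is omitted.

```python
import random
from collections import defaultdict

# Hypothetical data: randomized T, binary X and M, T x M interaction in Y.
rng = random.Random(2)
n = 40000
data = []
for _ in range(n):
    x = int(rng.random() < 0.5)
    t = int(rng.random() < 0.5)
    m = int(rng.random() < 0.2 + 0.4 * t + 0.2 * x)
    y = 1.0 + 2.0 * t + 3.0 * m + 1.5 * t * m + x + rng.gauss(0, 1)
    data.append((y, t, m, x))

# Steps 1-2: saturated working models built from cell sums and counts.
y_sum, y_cnt, tx_cnt = defaultdict(float), defaultdict(int), defaultdict(int)
for y, t, m, x in data:
    y_sum[t, m, x] += y
    y_cnt[t, m, x] += 1
    tx_cnt[t, x] += 1

def mu_hat(t, m, x):
    """Outcome model: sample mean of Y in the cell (T=t, M=m, X=x)."""
    return y_sum[t, m, x] / y_cnt[t, m, x]

def p_hat(m, t, x):
    """Mediator model: sample frequency of M=m in the cell (T=t, X=x)."""
    return y_cnt[t, m, x] / tx_cnt[t, x]

def mediation_mean(t, t_ref):
    """Steps 3-5: plug-in estimate of E[Y(t, M(t_ref))] via Equation 8.7."""
    return sum(mu_hat(t, m, x_i) * p_hat(m, t_ref, x_i)
               for _, _, _, x_i in data for m in (0, 1)) / n

nde = mediation_mean(1, 0) - mediation_mean(0, 0)
nie = mediation_mean(1, 1) - mediation_mean(1, 0)
te = mediation_mean(1, 1) - mediation_mean(0, 0)
print(f"NDE={nde:.3f}  NIE={nie:.3f}  TE={te:.3f}")
```

In this hypothetical design sequential ignorability holds by construction, and the population targets are \(\mathrm{NDE} = 2.45\) and \(\mathrm{NIE} = 1.8\). With continuous covariates or mediators the saturated cells would be replaced by fitted regression models, at which point the specification caveat above applies.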

Remark: Three Things to Keep in View
  1. Randomization handles two of the four assumptions, not all four. Randomizing \(T\) secures Assumptions 1 and 2 by design. It leaves Assumptions 3 and 4 entirely open.
  2. The formula looks like routine adjustment but is not. The expression \(\sum_m \E[Y \mid t, m, \mathbf{x}]\, P(M{=}m \mid t', \mathbf{x})\) resembles a standardization formula, but it computes a cross-world expectation \(\E[Y(t, M(t'))]\), not a do-expression.
  3. Sequential ignorability is an untestable assumption bundle. Unlike treatment ignorability, which randomization of \(T\) secures by design, the mediator–outcome ignorability in Assumption 3 and the no-treatment-induced-confounder condition in Assumption 4 must be defended on subject-matter grounds in every application.

8.7 The Linear Mediation Model: A Historical Special Case

8.7.1 The Baron–Kenny Three-Equation System

The regression-based approach of Baron and Kenny (1986) restricts the reduced prototype graph to a linear SEM: \[Y = \alpha_1 + \tau T + \boldsymbol{\gamma}_1^\top\mathbf{X} + \varepsilon_1, \tag{8.8}\] \[M = \alpha_2 + a T + \boldsymbol{\gamma}_2^\top\mathbf{X} + \varepsilon_2, \tag{8.9}\] \[Y = \alpha_3 + \tau' T + b M + \boldsymbol{\gamma}_3^\top\mathbf{X} + \varepsilon_3. \tag{8.10}\]

The four coefficients: \(\tau\) (total effect), \(a\) (first-stage effect of \(T\) on \(M\)), \(\tau'\) (direct effect of \(T\) on \(Y\) controlling for \(M\)), \(b\) (second-stage effect of \(M\) on \(Y\) controlling for \(T\)).

8.7.2 The Component Pathways

First stage (\(T \to M\)): Equation 8.9 implements the back-door formula for \(T\) on \(M\). Under Condition 2 (no unmeasured \(T\)–\(M\) confounding), conditioning on \(\mathbf{X}\) blocks all back-door paths from \(T\) to \(M\), and \(a\) identifies \(\E[M(1)] - \E[M(0)]\).

Second stage (\(M \to Y\) given \(T\)): Equation 8.10 implements the back-door formula for \(M\) on \(Y\) given \(T\). Conditioning on \((T, \mathbf{X})\) blocks every back-door path from \(M\) to \(Y\) in the reduced prototype graph, provided Assumption 3 (no unmeasured \(M\)–\(Y\) confounding given \(T\)) holds.

Warning: The Second Stage Is Harder Than It Looks

Even in a randomized experiment, the mediator \(M\) is never randomized. An unobserved variable \(V\) with \(V \to M\) and \(V \to Y\) creates a back-door path from \(M\) to \(Y\) that conditioning on \((T, \mathbf{X})\) cannot block. Identifying \(b\) requires no-unmeasured-confounding for the mediator–outcome relationship, an assumption that randomization of \(T\) does not provide.

8.7.3 The Product and Difference Formulas

Proposition: Mediation Decomposition in the Linear Model (Baron and Kenny 1986)

Under Equations 8.8–8.10: \(\tau = \tau' + ab\).

The indirect and direct effects are: \[\tau_{\mathrm{ind}} = ab \quad\text{(product method)}, \qquad \tau_{\mathrm{dir}} = \tau - ab = \tau' \quad\text{(difference method)}. \tag{8.11}\]

Proof. Substitute Equation 8.9 into Equation 8.10: \(Y = (\alpha_3 + b\alpha_2) + (\tau' + ab)T + (\boldsymbol{\gamma}_3 + b\boldsymbol{\gamma}_2)^\top\mathbf{X} + (b\varepsilon_2 + \varepsilon_3)\). Comparing with Equation 8.8 gives \(\tau = \tau' + ab\). \(\square\)
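The same identity holds exactly for the ordinary least squares estimates whenever the three regressions use the same sample and the same covariates, which can be checked numerically. The sketch below is illustrative: the simulated data, the single covariate, and the helper `ols` (normal equations solved by Gauss-Jordan elimination) are assumptions made for this example.

```python
import random

def ols(rows, y):
    """Least-squares coefficients of y on the columns of rows, intercept first."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical linear data with one covariate X.
rng = random.Random(3)
T, X, M, Y = [], [], [], []
for _ in range(5000):
    x = rng.gauss(0, 1)
    t = 1.0 if rng.random() < 0.5 else 0.0
    m = 0.4 * t + 0.3 * x + rng.gauss(0, 1)
    y = 0.26 * t + 0.6 * m + 0.2 * x + rng.gauss(0, 1)
    T.append(t); X.append(x); M.append(m); Y.append(y)

tau = ols([[t, x] for t, x in zip(T, X)], Y)[1]          # Equation 8.8
a = ols([[t, x] for t, x in zip(T, X)], M)[1]            # Equation 8.9
coef = ols([[t, m, x] for t, m, x in zip(T, M, X)], Y)   # Equation 8.10
tau_prime, b = coef[1], coef[2]

gap = tau - (tau_prime + a * b)
print(f"tau={tau:.4f}  tau'+ab={tau_prime + a * b:.4f}  gap={gap:.2e}")
```

The gap is zero up to rounding regardless of whether the causal assumptions hold, which illustrates that the equality is a property of linear least squares rather than evidence about mediation.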

Warning: The Decomposition \(\tau = \tau' + ab\) Is an Algebraic Identity, Not a Causal Theorem

The equality holds because linearity makes the indirect effect separable and additive. It does not hold in general:

  • In nonlinear models (binary outcomes, count outcomes, survival models), the product and difference methods yield numerically different estimates. Neither equals the NIE in general.
  • When \(T\) and \(M\) interact in their effect on \(Y\), the CDE depends on \(m\), the NDE and CDE diverge, and the indirect effect cannot be summarized by a single number \(ab\).
  • For non-continuous mediators, the product \(ab\) has no simple causal interpretation outside the linear normal model.

The correct generalization is the mediation formula (Equation 8.7), which reduces to \(ab\) and \(\tau'\) only in the linear, no-interaction special case.

Running example. Suppose a randomized trial yields \(\hat\tau = 0.50\), \(\hat a = 0.40\), \(\hat b = 0.60\), \(\hat\tau' = 0.26\).

  • Indirect effect (product method): \(\hat a \hat b = 0.40 \times 0.60 = 0.24\).
  • Direct effect (difference method): \(\hat\tau - \hat a \hat b = 0.50 - 0.24 = 0.26 = \hat\tau'\).
  • Proportion mediated: \(\hat a \hat b / \hat\tau = 0.24/0.50 = 0.48\) — roughly 48% of the total effect operates through sleep improvement.

| Effect | Formula | Path(s) |
| --- | --- | --- |
| Total | \(\tau\) | \(T \to Y\) and \(T \to M \to Y\) combined |
| Direct | \(\tau' = \tau - ab\) | \(T \to Y\) only |
| Indirect | \(ab\) | \(T \to M \to Y\) only |
| Proportion mediated | \(ab/\tau\) | Share of total effect via \(M\) |

8.7.4 Inference: The Sobel Test and Bootstrap

The delta method gives an approximate variance for the product \(\hat a \hat b\): \[\widehat{\mathrm{Var}}(\hat a \hat b) \approx \hat b^2\, \widehat{\mathrm{Var}}(\hat a) + \hat a^2\, \widehat{\mathrm{Var}}(\hat b) + 2\, \hat a\, \hat b\, \widehat{\mathrm{Cov}}(\hat a, \hat b). \tag{8.12}\]

The Sobel test (Sobel 1982) drops the cross-covariance: \[\widehat{\mathrm{Var}}_{\mathrm{Sobel}}(\hat a \hat b) = \hat b^2\, \widehat{\mathrm{Var}}(\hat a) + \hat a^2\, \widehat{\mathrm{Var}}(\hat b), \tag{8.13}\] yielding \(z = \hat a \hat b / \sqrt{\widehat{\mathrm{Var}}_{\mathrm{Sobel}}(\hat a \hat b)}\). In practice, bootstrap confidence intervals for \(ab\) are preferred over the Sobel test because the distribution of a product of estimates is skewed in finite samples.
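Both approaches can be sketched in a few lines. In the example below, the standard errors passed to `sobel_z` (0.10 and 0.15) are hypothetical values paired with the running-example estimates \(\hat a = 0.40\) and \(\hat b = 0.60\), and the bootstrap is run on simulated data from an assumed linear model with binary \(T\) and no covariates. The helper `ab_hat` uses the fact that, for binary \(T\), \(\hat a\) is a difference in group means and \(\hat b\) is the slope of \(Y\) on \(M\) after demeaning both within treatment arms (Frisch–Waugh).

```python
import math
import random

def sobel_z(a, se_a, b, se_b):
    """Sobel z-statistic using the variance in Equation 8.13."""
    var = b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2
    return a * b / math.sqrt(var)

def ab_hat(sample):
    """Product a*b from OLS with binary T and no covariates."""
    groups = {0: [], 1: []}
    for t, m, y in sample:
        groups[t].append((m, y))
    mbar = {t: sum(m for m, _ in g) / len(g) for t, g in groups.items()}
    ybar = {t: sum(y for _, y in g) / len(g) for t, g in groups.items()}
    a = mbar[1] - mbar[0]                                   # slope of M on T
    smy = sum((m - mbar[t]) * (y - ybar[t]) for t, m, y in sample)
    smm = sum((m - mbar[t]) ** 2 for t, m, _ in sample)
    return a * smy / smm                                    # a times slope of Y on M given T

z = sobel_z(0.40, 0.10, 0.60, 0.15)   # standard errors are hypothetical

# Percentile bootstrap for ab on simulated (hypothetical) data.
rng = random.Random(4)
sample = []
for _ in range(400):
    t = int(rng.random() < 0.5)
    m = 0.4 * t + rng.gauss(0, 1)
    y = 0.26 * t + 0.6 * m + rng.gauss(0, 1)
    sample.append((t, m, y))

point = ab_hat(sample)
boot = sorted(ab_hat([rng.choice(sample) for _ in sample]) for _ in range(500))
ci = (boot[12], boot[487])            # approximate 2.5% and 97.5% percentiles
print(f"Sobel z = {z:.3f}")
print(f"ab = {point:.3f}, bootstrap 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With the assumed standard errors the Sobel variance is \(0.36 \times 0.01 + 0.16 \times 0.0225 = 0.0072\), so \(z = 0.24/\sqrt{0.0072} \approx 2.83\). The percentile interval does not rely on a normal approximation to the product, which is the reason it is usually preferred.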

Warning: The “Significance of Both Paths” Criterion Is Not a Test for Mediation

A common misuse declares mediation when (i) \(T \to Y\) is significant, (ii) \(T \to M\) is significant, (iii) \(M \to Y\) is significant. This approach has three defects: (1) statistical significance does not establish mediation; (2) the Sobel test tests a regression product, not a causal quantity; (3) a zero total effect does not preclude indirect effects (direct and indirect effects of opposite sign can cancel). The modern alternative is to estimate \(ab\) or the NIE directly, construct bootstrap confidence intervals, and interpret the result as a point estimate with uncertainty.

8.7.5 The Baron–Kenny Assumptions: Two Distinct Categories

Causal ignorability conditions (conditions 1–3): these are causal identification assumptions about unmeasured confounding. Violating them introduces bias that no amount of additional data can remove.

  1. No unmeasured \(T\)–\(Y\) confounding: \(\varepsilon_1 \indep T \mid \mathbf{X}\).
  2. No unmeasured \(T\)–\(M\) confounding: \(\varepsilon_2 \indep T \mid \mathbf{X}\).
  3. No unmeasured \(M\)–\(Y\) confounding given \(T\): \(\varepsilon_3 \indep M \mid T, \mathbf{X}\).

Structural modeling restrictions (conditions 4–5): these are parametric assumptions about functional form. Violating them does not introduce identification bias in the causal sense, but \(\tau'\) and \(ab\) no longer equal the NDE and NIE, respectively.

  4. Linearity and additivity: Equations 8.8–8.10 are correctly specified as linear and additive.
  5. No \(T \times M\) interaction: the coefficient \(b\) is the same for all values of \(T\).

Conditions 1–3 cannot be tested from observed data at all. Conditions 4–5 can be partially probed by residual diagnostics and interaction terms.

Note: Framework Map

| Estimand | Framework needed | Section |
| --- | --- | --- |
| Total effect \(P(y \mid \doop(t))\) | Do-calculus | Ch. 5 (back-door) |
| Controlled direct effect (CDE) | Do-calculus | Section 8.4 |
| Natural direct effect (NDE) | Potential outcomes | Section 8.5 |
| Natural indirect effect (NIE) | Potential outcomes | Section 8.5 |
| Total effect via front-door | Do-calculus | Section 8.8 |

The first and last rows name the same estimand — the total effect — but identified by different routes.

8.8 Front-Door Identification

8.8.1 The Front-Door DAG

Every identification strategy in Sections 8.4–8.6 assumed an observed covariate set \(\mathbf{X}\) that blocks the back-door paths from \(T\) to \(Y\) through \(U\). What if \(U\) is wholly unobserved and no such adjustment set exists? The front-door criterion turns this obstacle into an opportunity: under two additional restrictions on the prototype mediation graph, the mediation structure itself provides identification of the total effect without conditioning on \(U\).

The two restrictions are: (i) remove the direct \(T \to Y\) edge, so \(M\) fully mediates; and (ii) require \(U\) has no arrow into \(M\), so the \(T \to M\) sub-effect is unconfounded.

Why the running example does not apply here. The depression/sleep scenario fails Condition 1 (full mediation): a behavioral intervention plausibly operates through several non-sleep channels. The canonical front-door example is Pearl’s smoking–tar–cancer graph: \(T\) = smoking, \(M\) = tar deposits, \(Y\) = lung cancer, \(U\) = genetic susceptibility. If all of smoking’s carcinogenic effect flows through tar, and genetic susceptibility does not act on tar directly, the front-door formula identifies the causal effect without observing \(U\).

Figure: The front-door graph, with nodes \(T\) (treatment), \(M\) (mediator), \(Y\) (outcome), and \(U\) (unobserved). Compared with the prototype mediation DAG, two edges are absent: the direct \(T \to Y\) edge and \(U \to M\). These two omissions enable identification: \(M\) fully mediates \(T \to Y\), and the \(T \to M\) sub-effect is unconfounded.

Note: Two Different Questions About a Mediator-Like Variable

| | Ordinary mediation | Front-door identification |
| --- | --- | --- |
| Question | How much of the total effect of \(T\) on \(Y\) flows through \(M\)? | Can \(M\) be used to identify the total effect despite unmeasured \(T\)–\(Y\) confounding? |
| Role of \(M\) | Pathway: carries part of the causal effect | Relay: routes around unmeasured \(T\)–\(Y\) confounding |
| Target estimand | NDE, NIE (decomposition) | \(P(y \mid \doop(t))\) (total effect) |
| Direct \(T \to Y\) edge | Present | Absent (required) |

The front-door formula does not decompose the total effect — it identifies the total effect as a whole, using \(M\) as a relay that is unconfounded on the \(T\)-side.

Warning: “Instrument-Like” Does Not Mean Instrument

The front-door mediator \(M\) lies on the causal path from \(T\) to \(Y\) and enters the \(Y\) structural equation directly. The IV instrument \(Z\) satisfies the exclusion restriction — it affects \(Y\) only through \(T\). These are structurally opposite positions. The two strategies share the goal of identifying \(T\)’s total effect under unobserved confounding, but they place the third variable in fundamentally different roles.

8.8.2 The Three Front-Door Conditions

Note: Front-Door Conditions

A variable \(M\) satisfies the front-door criterion for the effect of \(T\) on \(Y\) if:

  1. Full mediation. All directed paths from \(T\) to \(Y\) pass through \(M\) (no direct \(T \to Y\) edge).
  2. No unblocked back-door path from \(T\) to \(M\). All back-door paths from \(T\) to \(M\) are blocked (no unobserved variables affect both \(T\) and \(M\)).
  3. No unblocked back-door path from \(M\) to \(Y\) given \(T\). All back-door paths from \(M\) to \(Y\) are blocked by conditioning on \(T\).

In the front-door DAG: Condition 1 holds (no \(T \to Y\) edge). Condition 2 holds (\(U\) has no arrow into \(M\)). Condition 3 holds: the only back-door path from \(M\) to \(Y\) is \(M \leftarrow T \leftarrow U \to Y\), which is blocked by conditioning on \(T\).
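
The three checks above can also be carried out mechanically. The following is a minimal sketch, assuming the front-door DAG has exactly the edges \(U \to T\), \(U \to Y\), \(T \to M\), \(M \to Y\); the helper names (`simple_paths`, `is_blocked`, `open_backdoor_paths`, `directed_paths`) are hypothetical and written for this illustration, and the path enumeration is only practical for small graphs.

```python
# Front-door DAG as (parent, child) edges. U is unobserved but still part of the graph.
EDGES = {("U", "T"), ("U", "Y"), ("T", "M"), ("M", "Y")}

def descendants(node, edges):
    """All nodes reachable from `node` along directed edges."""
    out, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in out:
                out.add(b)
                stack.append(b)
    return out

def simple_paths(src, dst, edges):
    """All simple paths from src to dst in the undirected skeleton."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == dst:
            yield list(path)
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from walk(path + [n])

    yield from walk([src])

def is_blocked(path, given, edges):
    """d-separation rule applied to one path, conditioning on the set `given`."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            # A collider blocks unless it or one of its descendants is conditioned on.
            if mid not in given and not (descendants(mid, edges) & given):
                return True
        elif mid in given:
            # A chain or fork node blocks when it is conditioned on.
            return True
    return False

def open_backdoor_paths(x, y, given, edges):
    """Unblocked paths from x to y that start with an arrow into x."""
    return [p for p in simple_paths(x, y, edges)
            if (p[1], p[0]) in edges and not is_blocked(p, given, edges)]

def directed_paths(x, y, edges):
    """Paths from x to y in which every edge points forward."""
    return [p for p in simple_paths(x, y, edges)
            if all((a, b) in edges for a, b in zip(p, p[1:]))]

# Condition 1: every directed T -> Y path passes through M.
print(directed_paths("T", "Y", EDGES))
# Condition 2: no open back-door path from T to M with the empty adjustment set.
print(open_backdoor_paths("T", "M", set(), EDGES))
# Condition 3: the M-to-Y back-door path is open unconditionally, closed given T.
print(open_backdoor_paths("M", "Y", set(), EDGES))
print(open_backdoor_paths("M", "Y", {"T"}, EDGES))
```

In this sketch the back-door path \(T \leftarrow U \to Y \leftarrow M\) is closed without any conditioning because \(Y\) is a collider on it, which is why Condition 2 holds with the empty adjustment set.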

8.8.3 Derivation of the Front-Door Formula

TipTheorem: Front-Door Formula (Pearl 1995)

Suppose \(M\) satisfies the front-door criterion and positivity conditions (F1) \(P(T{=}t') > 0\) for every \(t'\) and (F2) for every \(m\) with \(P(M{=}m \mid T{=}t) > 0\) and \(t'\) with \(P(T{=}t') > 0\), \(P(M{=}m \mid T{=}t') > 0\). Then: \[P(y \mid \doop(T{=}t)) = \sum_m P(m \mid t) \sum_{t'} P(y \mid m, t')\, P(t'). \tag{8.14}\]

Step 1: Identify the effect of \(T\) on \(M\). By Condition 2, there are no unblocked back-door paths from \(T\) to \(M\). The empty set is a valid back-door adjustment set, so: \[P(m \mid \doop(T{=}t)) = P(m \mid t).\]

Step 2: Identify the effect of \(M\) on \(Y\). By Condition 3, conditioning on \(T\) blocks all back-door paths from \(M\) to \(Y\). The set \(\{T\}\) is a valid back-door adjustment set: \[P(y \mid \doop(M{=}m)) = \sum_{t'} P(y \mid m, t')\, P(t').\]

Step 3: Combine via full mediation. The law of total probability under \(\doop(T{=}t)\) gives: \[P(y \mid \doop(T{=}t)) = \sum_m P(m \mid \doop(T{=}t))\, P(y \mid \doop(T{=}t), \doop(M{=}m)).\] By Condition 1 (no direct \(T \to Y\) edge), once \(M = m\) is fixed by intervention, \(T\) is d-separated from \(Y\) in \(\mathcal{G}_{\overline{T}\,\overline{M}}\). By Rule 3 of the do-calculus, \(P(y \mid \doop(T{=}t), \doop(M{=}m)) = P(y \mid \doop(M{=}m))\).

Substituting Steps 1 and 2 gives Equation 8.14. \(\square\)
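
The result can be checked numerically on a small discrete model. The sketch below assumes a hypothetical binary front-door SCM (all probability tables are made-up illustration values, not data from the chapter). It builds the observational joint \(P(t, m, y)\) by marginalizing out \(U\), applies Equation 8.14 using only that joint, and compares the answer with the interventional distribution computed directly from the full model, which uses \(U\).

```python
import numpy as np

# Hypothetical binary SCM on the front-door graph: U -> T, T -> M, M -> Y, U -> Y.
p_u = np.array([0.6, 0.4])                 # P(U=u)
p_t_given_u = np.array([[0.8, 0.2],        # P(T=t | U=u), rows indexed by u
                        [0.3, 0.7]])
p_m_given_t = np.array([[0.9, 0.1],        # P(M=m | T=t), rows indexed by t
                        [0.2, 0.8]])
p_y1_given_mu = np.array([[0.1, 0.5],      # P(Y=1 | M=m, U=u), rows m, columns u
                          [0.4, 0.9]])
p_y_given_mu = np.stack([1 - p_y1_given_mu, p_y1_given_mu], axis=-1)  # axes [m, u, y]

# Full joint P(u, t, m, y), then the observational joint P(t, m, y) without U.
joint = np.einsum("u,ut,tm,muy->utmy", p_u, p_t_given_u, p_m_given_t, p_y_given_mu)
obs = joint.sum(axis=0)

def front_door(obs, t):
    """Equation 8.14 evaluated from the observational joint only."""
    p_t = obs.sum(axis=(1, 2))               # P(t')
    p_tm = obs.sum(axis=2)                   # P(t', m)
    pm_t = p_tm[t] / p_t[t]                  # P(m | t)
    p_y_given_tm = obs / p_tm[:, :, None]    # P(y | m, t'), axes [t', m, y]
    inner = np.einsum("tmy,t->my", p_y_given_tm, p_t)  # sum_t' P(y | m, t') P(t')
    return pm_t @ inner                      # distribution over y

def truth(t):
    """P(y | do(T=t)) from the full model: sum_u P(u) sum_m P(m | t) P(y | m, u)."""
    return np.einsum("u,m,muy->y", p_u, p_m_given_t[t], p_y_given_mu)

def naive(obs, t):
    """Observational P(y | T=t), which is confounded by U."""
    p_ty = obs.sum(axis=1)
    return p_ty[t] / p_ty[t].sum()

for t in (0, 1):
    print(t, front_door(obs, t), truth(t), naive(obs, t))
```

In this toy model the front-door expression matches the interventional distribution for both values of \(t\), while the plain conditional \(P(y \mid t)\) does not. The positivity conditions (F1) and (F2) hold because every table entry is strictly positive.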

NoteRemark: Two Unconfounded Sub-Effects

The front-door formula achieves identification in two steps: (1) \(T\) to \(M\): there is no confounding on the \(T \to M\) edge (\(U\) does not affect \(M\)), so \(P(m \mid t)\) is the causal effect. (2) \(M\) to \(Y\): there is back-door confounding on \(M \to Y\) through \(M \leftarrow T \leftarrow U \to Y\), but \(T\) is a non-collider, so conditioning on \(T\) closes it. The result \(P(y \mid m, t')\) is then averaged over the marginal distribution of \(T\).

The key insight is that \(U\) confounds \(T\) and \(Y\) but not the \(T \to M\) edge. The front-door formula exploits this asymmetry without ever observing or conditioning on \(U\).
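
A linear special case makes the asymmetry concrete. As an illustrative sketch (the linear model and the symbols \(\lambda\), \(\gamma\), \(\kappa\) are assumptions introduced here, and \(a\), \(b\) are used only by analogy with the Baron–Kenny notation), suppose the front-door graph is generated by zero-mean variables with mutually independent errors: \[T = \lambda U + \varepsilon_T, \qquad M = aT + \varepsilon_M, \qquad Y = bM + \gamma U + \varepsilon_Y.\] The total effect of a unit change in \(T\) on \(Y\) is \(ab\). The naive regression of \(Y\) on \(T\) has slope \[\frac{\operatorname{Cov}(Y, T)}{\operatorname{Var}(T)} = ab + \gamma\kappa, \qquad \kappa = \frac{\operatorname{Cov}(U, T)}{\operatorname{Var}(T)},\] which is biased whenever \(\gamma\kappa \neq 0\). The two front-door regressions avoid this bias. Regressing \(M\) on \(T\) recovers \(a\), because \(\varepsilon_M\) is independent of \(T\). Regressing \(Y\) on \(M\) and \(T\) gives the linear projection \[bM + \gamma\kappa\, T,\] because \(\varepsilon_M\) is independent of \(U\), so \(M\) carries no information about \(U\) beyond what \(T\) already provides. The coefficient on \(M\) is therefore \(b\), and the product of the two coefficients is \(ab\), the total effect, obtained without observing \(U\).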

8.9 Mediation vs. Instrumental Variables

| Feature | Instrumental Variables | Mediation Analysis |
|---|---|---|
| Position of third variable | Pre-treatment (\(Z\) precedes \(T\)) | Post-treatment (\(M\) follows \(T\)) |
| Causal role | Exogenous source of variation in \(T\) | Pathway through which \(T\) affects \(Y\) |
| Primary goal | Identification of \(T \to Y\) effect | Mechanism analysis (decomposition) |
| Key assumption | Exclusion: \(Z\) affects \(Y\) only through \(T\) | Sequential ignorability: no unmeasured \(M\)–\(Y\) confounding |
| Estimand | LATE (Wald) or ATE (homogeneity) | NDE, NIE, or CDE |
| Unobserved \(T\)–\(Y\) confounders | Permitted | Must be addressed separately |
| Testability | Relevance testable; exclusion untestable | Sequential ignorability untestable |

NoteThe Key Conceptual Distinction
| | IV | Mediation |
|---|---|---|
| What the third variable does | Generates clean variation in \(T\) (exogenous source) | Carries part of \(T\)’s causal effect (pathway) |
| Exclusion vs. inclusion | \(Z\) excluded from \(Y\)’s structural equation | \(M\) included in \(Y\)’s structural equation |
| Question answered | Does \(T\) cause \(Y\)? | How does \(T\) cause \(Y\)? |

Can the same variable be both? Not for the same treatment–outcome relation. Within a single causal question of how \(T\) affects \(Y\), the mediator role places \(M\) on the causal path (inclusion required), whereas the IV role demands the exclusion restriction. These are mutually incompatible structural assumptions. The front-door identification formula is the closest bridge between the two within a single \((T, Y)\) analysis: it uses the mediator \(M\) to identify the total effect of \(T\) on \(Y\) even when \(T\) is confounded — but the front-door \(M\) is not an instrument.

8.10 Chapter Summary

| Symbol | Meaning |
|---|---|
| TE | Total effect \(\E[Y(1) - Y(0)]\) |
| \(\mathrm{CDE}(m)\) | Controlled direct effect at mediator level \(m\) (Equation 8.2) |
| NDE | Natural direct effect (Equation 8.4) |
| NIE | Natural indirect effect (Equation 8.5) |
| \(Y(t, M(t'))\) | Cross-world counterfactual (nested potential outcome) |
| \(\tau = \tau' + ab\) | Baron–Kenny decomposition (linear model only) |
| Equation 8.7 | Mediation formula (nonparametric) |
| Equation 8.14 | Front-door formula |

  1. Mediation studies mechanisms. Mediation analysis defines direct and indirect effect concepts that target the pathways \(T \to Y\) and \(T \to M \to Y\). Only natural effects yield an additive TE decomposition; the CDE does not.
  2. The total effect is the baseline estimand. TE \(= \E[Y(1) - Y(0)]\) captures all pathways. It may be identified by randomization, back-door adjustment, front-door identification, or IV.
  3. The CDE is defined with the do-operator. The CDE fixes \(M = m\) by joint intervention \(\doop(T, M)\) and is identified by the back-door formula applied to the pair \((T, M)\). The CDE depends on the fixed level \(m\) and has no natural “indirect” complement.
  4. Natural effects require potential outcomes. The NDE and NIE involve cross-world counterfactuals \(Y(t, M(t'))\) that cannot be expressed with the do-operator alone. They are identified by the mediation formula under sequential ignorability. The critical Assumption 4 (no treatment-induced mediator–outcome confounder) is not secured by randomization of \(T\) and must be defended on subject-matter grounds.
  5. The linear model simplifies but restricts. The Baron–Kenny system gives \(\tau_{\mathrm{ind}} = ab\) and \(\tau_{\mathrm{dir}} = \tau'\), with \(\tau = \tau' + ab\). This decomposition is purely algebraic and holds only under linearity and no interaction.
  6. Front-door identification uses mediation structure. When \(M\) fully mediates \(T \to Y\), no \(T \to M\) confounding exists, and \(T\) blocks the back-door paths from \(M\) to \(Y\), the front-door formula identifies the total effect despite unobserved \(T\)–\(Y\) confounding, by composing two unconfounded sub-effects.
  7. Mediation and IV are complementary, not equivalent. IV uses a pre-treatment variable to generate exogenous variation in \(T\); mediation uses a post-treatment variable to study how the causal effect operates. The same variable cannot simultaneously serve as a mediator and a valid IV for the same treatment–outcome relation.

8.11 Problems

1. Identifying the CDE. Consider the DAG: \(T \to M\), \(T \to Y\), \(M \to Y\), \(X \to T\), \(X \to M\), \(X \to Y\), with all variables observed.

  1. Write the identification formula for \(\E[Y \mid \doop(T{=}1), \doop(M{=}m)]\) using the back-door criterion for the joint intervention \((T, M)\).
  2. Add an unobserved \(U\) with \(U \to T\) and \(U \to Y\). Does the back-door formula still identify the CDE? Explain which condition fails.
  3. Instead add \(U\) with \(U \to M\) and \(U \to Y\). Does the back-door formula still identify the CDE? Explain.

2. CDE vs. total effect. In the reduced prototype graph, let \(\mathbf{Z}\) satisfy the back-door criterion for both the total effect and the joint intervention \((T, M)\).

  1. Write expressions for the total effect and \(\mathrm{CDE}(m)\) using the back-door formula.
  2. Under what graphical condition does \(\mathrm{CDE}(m)\) equal the total effect for all \(m\)? Interpret this condition.

3. Natural direct and indirect effects. Verify the NDE + NIE = TE decomposition algebraically for the linear SEM \(M = \alpha T + \eta\), \(Y = \beta T + \gamma M + \varepsilon\) (no interaction).

  1. Compute \(Y(t, M(t'))\) in the linear model. Show that \(Y(t, M(t')) = \beta t + \gamma(\alpha t') + \text{noise}\).
  2. Derive \(\mathrm{NDE} = \beta\) and \(\mathrm{NIE} = \alpha\gamma\) from the definitions in Equations 8.4 and 8.5.
  3. Confirm \(\mathrm{NDE} + \mathrm{NIE} = \beta + \alpha\gamma = \mathrm{TE}\).
  4. Now suppose a \(T \times M\) interaction is added: \(Y = \beta T + \gamma M + \delta (T \cdot M) + \varepsilon\). Compute \(\mathrm{CDE}(m)\) and \(\mathrm{NDE}\). Show that \(\mathrm{NDE} \neq \mathrm{CDE}(m)\) when \(\delta \neq 0\).

4. The Baron–Kenny three-equation system. In the reduced prototype graph with the linear SEM of Equations 8.8–8.10:

  1. State the three identification assumptions. For each, give the graphical condition in terms of back-door paths.
  2. Derive the equality \(\tau = \tau' + ab\) algebraically.
  3. Suppose \(\hat\tau = 0.50\), \(\hat a = 0.40\), \(\hat b = 0.60\), \(\hat\tau' = 0.26\). Compute the indirect effect by both the product and difference methods. Do they agree? Compute the proportion mediated.
  4. With \(\widehat{\mathrm{SE}}(\hat a) = 0.08\) and \(\widehat{\mathrm{SE}}(\hat b) = 0.10\), compute the Sobel standard error for \(\hat a \hat b\) using Equation 8.13 and construct an approximate 95% confidence interval.

5. The critical role of Assumption 3. Consider the graph where an unobserved \(V\) has \(V \to M\) and \(V \to Y\), with \(T\) randomized.

  1. Identify all back-door paths from \(M\) to \(Y\) in this graph.
  2. Can any combination of observed variables \((T, \mathbf{X})\) block all of these paths? Explain using d-separation.
  3. Suppose an analyst fits Equation 8.10 ignoring \(V\) and obtains \(\hat b = 0.80\). In which direction is \(\hat b\) biased if \(V\) has positive effects on both \(M\) and \(Y\)?
  4. State the additional data structure that would be needed to identify the second-stage effect nonparametrically.

6. Front-door identification. Consider the front-door graph.

  1. Verify that the three front-door conditions hold.
  2. Walk through the three-step proof of the Front-Door Formula (Equation 8.14): identify which do-calculus rule justifies each step.
  3. Add a direct edge \(T \to Y\) to the graph. Which front-door condition is violated? Does formula Equation 8.14 still hold?
  4. Explain why the front-door graph does not require “no unmeasured \(M\)–\(Y\) confounding given \(T\)” as an identifying assumption.

7. Mediation vs. instrumental variables. A researcher studies the effect of a job training program (\(T\)) on wages (\(Y\)). She proposes two intermediate variables: (A) motivation (\(M_A\)), measured after the program starts; (B) a lottery that randomly selects applicants for admission (\(Z\)), measured before the program.

  1. For variable (A): draw the mediation DAG including \(M_A\), \(T\), \(Y\), and unobserved ability \(U\). State the sequential ignorability assumption needed to identify the NIE through \(M_A\). Explain why randomization of \(T\) does not automatically satisfy this assumption.
  2. For variable (B): draw the IV DAG with \(Z\), \(T\), \(Y\), and \(U\). State the three IV assumptions. Explain why the exclusion restriction and the “mediator inclusion” of mediation analysis are mutually incompatible conditions for the same intermediate variable.
  3. The researcher argues that \(M_A\) (motivation) and \(Z\) (lottery) are both “intermediate” variables and that the analyses are interchangeable. Write a one-paragraph critique of this argument, using the distinctions from Section 8.9.
  4. Can the front-door formula be applied if motivation \(M_A\) fully mediates the effect of \(T\) on \(Y\) and \(U\) (unobserved ability) does not directly affect \(M_A\)? State the three conditions and assess whether they hold.
Baron, Reuben M., and David A. Kenny. 1986. “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51 (6): 1173–82.
Imai, Kosuke, Luke Keele, and Teppei Yamamoto. 2010. “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects.” Statistical Science 25 (1): 51–71.
Pearl, Judea. 1995. “Causal Diagrams for Empirical Research.” Biometrika 82 (4): 669–88.
Pearl, Judea. 2001. “Direct and Indirect Effects.” Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI 2001), 411–20.
Sobel, Michael E. 1982. “Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models.” Sociological Methodology 13: 290–312.