3  The Do-Calculus and Identification Criteria

Learning Objectives

By the end of this chapter, students should be able to:

  1. Define the two intervention graph operations \(\mathcal{G}_{\overline{X}}\) and \(\mathcal{G}_{\underline{X}}\), explain the causal meaning of \(\mathcal{G}_{\overline{X}}\), and describe the technical role of \(\mathcal{G}_{\underline{X}}\).
  2. Apply the back-door criterion to determine whether a set \(\mathbf{S}\) identifies a causal effect, and write the back-door adjustment formula.
  3. Apply the front-door criterion in graphs where the confounder is unobserved but a suitable mediator exists.
  4. State all three rules of do-calculus, identify the graph condition that licenses each rule, and use them to prove the back-door and front-door formulas.
  5. State the completeness theorem of Shpitser and Pearl (2006) and explain its practical implication: if the do-calculus cannot identify a quantity from the observational distribution relative to the assumed graph, then no purely observational method can do so without additional assumptions or new data.
How to Read This Chapter

This chapter has two pedagogical layers. Sections 3.1–3.3 develop the two most important concrete identification criteria — back-door and front-door adjustment — using intervention graphs and direct graphical reasoning. These sections are the core material for a first reading.

Sections 3.4–3.6 then place these criteria inside the more general theory of the do-calculus and identifiability. These later sections are conceptually important but more abstract and may be read as a second pass after the concrete criteria are understood.

3.1 From d-Separation to Intervention: Intervention Graphs

In Chapter 2 we learned to read conditional independence from a DAG using d-separation. This chapter takes the next step: translating the graphical language into identification formulas — expressions that write the interventional distribution \(P(y \mid \doop(T{=}t))\) entirely in terms of quantities observable from data.

3.1.1 What Does Intervention Mean?

Conditioning versus intervening. The conditional distribution \(P(Y \mid T{=}t)\) describes the subpopulation of units for whom \(T\) was observed to equal \(t\). The interventional distribution \(P(Y \mid \doop(T{=}t))\), by contrast, describes the population that would result if \(T\) were externally set to \(t\) for everyone. The two coincide only when \(T\) shares no common causes with \(Y\) (measured or not).

A simple illustration: temperature \(Z\) causes both ice-cream sales \(T\) and crime \(Y\). \(P(Y \mid T{=}t)\) is the crime rate on days when sales happen to equal \(t\) — confounded by \(Z\). \(P(Y \mid \doop(T{=}t))\) is the crime rate we would observe if we fixed sales at level \(t\) by decree. If \(T\) has no causal effect on \(Y\) (no directed path from \(T\) to \(Y\)), then \(P(Y \mid \doop(T{=}t)) = P(Y)\) for all \(t\).

The structural basis of the do-operator. In the SEM framework, every variable is generated by a structural equation. For the treatment node: \(T = f_T(\mathrm{Pa}(T),\, U_T)\). An intervention \(\doop(T{=}t)\) replaces this entire equation with the constant \(T = t\), with two consequences: (1) \(T\) is no longer influenced by its parents; (2) all other structural equations remain unchanged.

Graph surgery as the graphical realization. Because \(\mathrm{Pa}(T)\) no longer affects \(T\) after the intervention, every arrow pointing into \(T\) in the DAG should be removed. The resulting graph — the intervention graph \(\mathcal{G}_{\overline{T}}\) — represents the post-intervention world. D-separation in \(\mathcal{G}_{\overline{T}}\) encodes conditional independence in \(P(\cdot \mid \doop(T{=}t))\), not in the original \(P\). This is the key link that allows graphical reasoning to answer causal questions.

3.1.2 The Two Graph Operations

Definition: Intervention Graphs \(\mathcal{G}_{\overline{X}}\) and \(\mathcal{G}_{\underline{X}}\)

Let \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\) be a DAG and \(X \subseteq \mathcal{V}\).

  • \(\mathcal{G}_{\overline{X}}\) (intervention graph, or mutilated graph) is obtained by deleting all arrows into \(X\). This represents \(\doop(X{=}x)\), severing \(X\)’s dependence on its former parents.
  • \(\mathcal{G}_{\underline{X}}\) (auxiliary observation graph) is obtained by deleting all arrows out of \(X\). This is a technical device used in graphical conditions for certain do-calculus steps. It does not represent conditioning on \(X\).
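The two operations are simple enough to state as code. A minimal sketch (the edge-set representation and function names are our own, for illustration only):

```python
def g_overline(edges, X):
    """Intervention graph: delete every arrow pointing INTO a node of X."""
    return {(p, c) for (p, c) in edges if c not in X}

def g_underline(edges, X):
    """Auxiliary graph: delete every arrow pointing OUT OF a node of X."""
    return {(p, c) for (p, c) in edges if p not in X}

# Single-confounder DAG T <- C -> Y with T -> Y, as (parent, child) pairs.
edges = {('C', 'T'), ('C', 'Y'), ('T', 'Y')}
assert g_overline(edges, {'T'}) == {('C', 'Y'), ('T', 'Y')}   # C -> T deleted
assert g_underline(edges, {'T'}) == {('C', 'T'), ('C', 'Y')}  # T -> Y deleted
```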

3.2 The Back-Door Criterion

Definition: Back-Door Criterion (Pearl 1993)

A set \(\mathbf{S}\) of observed variables satisfies the back-door criterion for the effect of \(T\) on \(Y\) in \(\mathcal{G}\) if:

  1. No node in \(\mathbf{S}\) is a descendant of \(T\).
  2. \(\mathbf{S}\) blocks every back-door path from \(T\) to \(Y\) — every path that begins with an arrow pointing into \(T\).
Remark

The term “back-door” reflects the geometry: a back-door path enters \(T\) from behind — it begins with an arrow pointing into \(T\) — and represents a confounding route. By contrast, directed causal paths \(T \to \cdots \to Y\) leave \(T\) through the front and carry the genuine effect.

Condition 2 is equivalent to requiring \((Y \indep T \mid \mathbf{S})_{\mathcal{G}_{\underline{T}}}\): \(\mathbf{S}\) d-separates \(T\) and \(Y\) in the graph obtained by deleting all arrows out of \(T\) (which eliminates causal paths, leaving only back-door paths). The graph \(\mathcal{G}_{\overline{T}}\), which deletes arrows into \(T\), is the wrong graph for this check.

Condition 1 adds an important constraint absent from Chapter 2: \(\mathbf{S}\) must contain no descendant of \(T\). Every valid back-door set is a d-separator in \(\mathcal{G}_{\underline{T}}\), but not vice versa.

Example: A Single Confounder

Consider the DAG \(T \leftarrow C \to Y\) with \(T \to Y\), where \(C\) is observed. Does \(\mathbf{S} = \{C\}\) satisfy the back-door criterion?

  1. \(C\) is not a descendant of \(T\). ✓
  2. \(\{C\}\) blocks \(T \leftarrow C \to Y\) (fork at \(C\)). ✓

The back-door formula gives: \(P(y \mid \doop(T{=}t)) = \sum_c P(y \mid t, c)\,P(c)\).

Example: Two Back-Door Paths — Both Must Be Blocked

Consider treatment \(T\), outcome \(Y\), observed \(C_1\) and \(C_2\), unobserved \(U\), with edges \(C_1 \to T\), \(C_1 \to Y\), \(U \to T\), \(U \to C_2\), \(C_2 \to Y\), \(T \to Y\). There are exactly two back-door paths:

  1. \(T \leftarrow C_1 \to Y\) (direct confounding by \(C_1\))
  2. \(T \leftarrow U \to C_2 \to Y\) (indirect confounding via \(U\) routing through proxy \(C_2\))

\(\mathbf{S} = \{C_1\}\) fails: blocks path 1 but leaves path 2 open. \(\mathbf{S} = \{C_2\}\) fails: blocks path 2 but leaves path 1 open. \(\mathbf{S} = \{C_1, C_2\}\) succeeds: \[P(y \mid \doop(T{=}t)) = \sum_{c_1, c_2} P(y \mid t, c_1, c_2)\,P(c_1, c_2).\] The unobserved \(U\) never appears: conditioning on \(C_2\) blocks the path at the chain node, exploiting the position of the observed variables, not direct measurement of the confounders.
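Enumerating back-door paths mechanically guards against overlooking one. A small sketch (representation and names are illustrative, not from the text): walk all simple paths in the skeleton and keep those whose first edge points into \(T\):

```python
from collections import defaultdict

def backdoor_paths(edges, t, y):
    """All simple paths from t to y whose first edge points INTO t.

    `edges` is a set of directed (parent, child) pairs.
    """
    nbrs = defaultdict(set)
    for a, b in edges:          # undirected skeleton
        nbrs[a].add(b)
        nbrs[b].add(a)
    paths = []

    def walk(path):
        v = path[-1]
        if v == y:
            paths.append(tuple(path))
            return
        for w in sorted(nbrs[v]):
            if w not in path:
                walk(path + [w])

    walk([t])
    # back-door: the first step traverses an edge pointing into t
    return {p for p in paths if (p[1], p[0]) in edges}

# The two-back-door-path example from the text:
edges = {('C1', 'T'), ('C1', 'Y'), ('U', 'T'), ('U', 'C2'), ('C2', 'Y'), ('T', 'Y')}
assert backdoor_paths(edges, 'T', 'Y') == {('T', 'C1', 'Y'), ('T', 'U', 'C2', 'Y')}
```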

Why Condition 1 Is Essential: Conditioning on a Mediator

Consider a drug \(T\) that affects recovery \(Y\) via two routes: direct (\(T \to Y\)) and indirect through blood pressure \(M\) (\(T \to M \to Y\)), with unobserved confounder \(U \to T\), \(U \to Y\).

Naively trying \(\mathbf{S} = \{M\}\) fails for two reasons: (1) \(M\) is a descendant of \(T\), violating Condition 1; (2) conditioning on \(M\) blocks the causal pathway \(T \to M \to Y\) while leaving the back-door path \(T \leftarrow U \to Y\) untouched. The adjusted quantity \(\sum_m P(y \mid t, m) P(m)\) is a causally ambiguous mixture of attenuated causal signal and uncontrolled confounding — neither the total causal effect nor the direct effect.

M-Bias: Conditioning on a Collider Opens a Closed Back-Door Path

Consider \(T \to Y\) with \(U_1 \to T\), \(U_1 \to C\), \(U_2 \to C\), \(U_2 \to Y\) (\(U_1\), \(U_2\) unobserved, \(C\) observed). The only back-door path is \(T \leftarrow U_1 \to C \leftarrow U_2 \to Y\), which has a collider at \(C\). With \(\mathbf{S} = \varnothing\), the path is blocked — the causal effect is already identified: \(P(y \mid \doop(t)) = P(y \mid t)\).

Including \(C\) in the adjustment set activates the collider, opening the previously dormant path. The set \(\{C\}\) fails Condition 2: it does not block the back-door path; it creates one. This is called M-bias (from the M-shaped skeleton). The practical lesson: Condition 1 alone (“include all pretreatment variables”) is not sufficient.
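The bias is easy to exhibit in a linear-Gaussian version of the M-graph. The numbers below are illustrative assumptions (all structural coefficients set to 1, exogenous terms unit-variance, true effect \(\tau\)); closed-form OLS slopes then show that adjusting for the collider \(C\) distorts an otherwise identified effect:

```python
# Linear-Gaussian M-graph sketch (assumed coefficients, not from the text):
#   T = U1 + eT;  C = U1 + U2 + eC;  Y = tau*T + U2 + eY
tau = 1.0
var_T = 2.0            # Var(U1) + Var(eT)
var_C = 3.0            # Var(U1) + Var(U2) + Var(eC)
cov_TC = 1.0           # through U1
cov_TY = tau * var_T   # only the causal path contributes
cov_CY = tau * cov_TC + 1.0   # T-channel plus U2

# OLS slope of T in Y ~ T (no adjustment):
slope_unadj = cov_TY / var_T
# OLS slope of T in Y ~ T + C (standard two-regressor formula):
slope_adj = (cov_TY * var_C - cov_TC * cov_CY) / (var_T * var_C - cov_TC**2)

assert slope_unadj == tau             # the empty set identifies the effect
assert abs(slope_adj - 0.8) < 1e-12   # conditioning on the collider biases it
```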

Theorem: Back-Door Adjustment Formula (Pearl 1993)

If \(\mathbf{S}\) satisfies the back-door criterion for the effect of \(T\) on \(Y\), and \(P(T{=}t \mid \mathbf{S}{=}\mathbf{s}) > 0\) for all \(\mathbf{s}\) with \(P(\mathbf{S}{=}\mathbf{s}) > 0\) (positivity), then: \[P\!\left(y \mid \doop(T{=}t)\right) = \int P(y \mid T{=}t,\;\mathbf{S}{=}\mathbf{s})\,dP(\mathbf{s}). \tag{3.1}\]

Equivalently: \(\E\!\left[h(Y) \mid \doop(T{=}t)\right] = \E_{\mathbf{S}}\!\bigl[\E\bigl[h(Y) \mid T{=}t,\;\mathbf{S}\bigr]\bigr]\).

Remark: Positivity Is a Data-Support Condition

The positivity condition requires every value of \(T\) to be observable within each stratum of \(\mathbf{S}\). It is a property of the joint distribution \(P\), not of the graph. Both the back-door criterion (graphical) and positivity (distributional) must hold simultaneously. When positivity fails on a set of measure zero, identification still holds in theory but sparse or empty cells pose practical difficulties.

3.2.1 The Adjustment Formula as Standardization

Equation 3.1 is also known as the standardization formula or g-formula (Robins 1986). The structure is important:

  • \(P(y \mid t, \mathbf{s})\) is the stratum-specific conditional distribution. Within each stratum, all back-door paths are blocked by \(\mathbf{S}\), so this conditional distribution is causal.
  • The outer integration \(\int \cdots\, dP(\mathbf{s})\) weights strata by the population marginal distribution of \(\mathbf{S}\), not the treatment-conditional \(dP(\mathbf{s} \mid t)\). In a randomized experiment, \(\mathbf{S} = \varnothing\) and Equation 3.1 reduces to \(P(y \mid \doop(t)) = P(y \mid t)\).
Example: Back-Door Adjustment — A Numerical Walkthrough

Setup. Single confounder graph (\(T \leftarrow C \to Y\), \(T \to Y\)), binary variables. \(C\) = disease severity (\(0\) = mild, \(1\) = severe); \(T\) = treatment; \(Y\) = recovery. \(P(C{=}1) = 0.5\), \(P(T{=}1 \mid C{=}0) = 0.20\), \(P(T{=}1 \mid C{=}1) = 0.80\).

Outcome probabilities. \(P(Y{=}1 \mid T{=}1, C{=}0) = 0.55\), \(P(Y{=}1 \mid T{=}0, C{=}0) = 0.45\), \(P(Y{=}1 \mid T{=}1, C{=}1) = 0.45\), \(P(Y{=}1 \mid T{=}0, C{=}1) = 0.35\).

Naïve (unadjusted) estimates. Bayes’ rule gives \(P(C{=}1 \mid T{=}1) = 0.8\) and \(P(C{=}1 \mid T{=}0) = 0.2\), so \(P(Y{=}1 \mid T{=}1) = 0.55 \times 0.2 + 0.45 \times 0.8 = 0.47\) and \(P(Y{=}1 \mid T{=}0) = 0.45 \times 0.8 + 0.35 \times 0.2 = 0.43\), difference \(= +0.04\).

Back-door adjusted estimates. \[P(Y{=}1 \mid \doop(T{=}1)) = 0.55 \times 0.5 + 0.45 \times 0.5 = 0.50.\] \[P(Y{=}1 \mid \doop(T{=}0)) = 0.45 \times 0.5 + 0.35 \times 0.5 = 0.40.\] Causal risk difference \(= 0.50 - 0.40 = 0.10\). The naïve difference (\(+0.04\)) understates the causal effect: severe cases (\(C{=}1\)), who recover less often at either treatment level, are overrepresented among the treated, biasing the unadjusted comparison toward the null.
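The walkthrough can be replayed in a few lines (helper names are ours); the naïve contrast (0.47 vs 0.43) understates the adjusted one (0.50 vs 0.40):

```python
# Probabilities exactly as given in the walkthrough.
pC = {0: 0.5, 1: 0.5}
pT1_C = {0: 0.20, 1: 0.80}                    # P(T=1 | C=c)
pY1_TC = {(1, 0): 0.55, (0, 0): 0.45,         # P(Y=1 | T=t, C=c)
          (1, 1): 0.45, (0, 1): 0.35}

def p_t_c(t, c):                              # P(T=t | C=c)
    return pT1_C[c] if t == 1 else 1 - pT1_C[c]

def naive(t):                                 # P(Y=1 | T=t), averaging over P(c | t)
    num = sum(pC[c] * p_t_c(t, c) * pY1_TC[(t, c)] for c in (0, 1))
    den = sum(pC[c] * p_t_c(t, c) for c in (0, 1))
    return num / den

def adjusted(t):                              # back-door: sum_c P(Y=1 | t, c) P(c)
    return sum(pY1_TC[(t, c)] * pC[c] for c in (0, 1))

assert abs(adjusted(1) - 0.50) < 1e-12 and abs(adjusted(0) - 0.40) < 1e-12
assert abs(naive(1) - 0.47) < 1e-12 and abs(naive(0) - 0.43) < 1e-12
```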

3.3 The Front-Door Criterion

The back-door criterion requires observed variables to block every confounding path. When the confounder \(U\) is unobserved and no observed variable lies on the back-door path, a different strategy is needed: the front-door criterion intercepts the causal pathway at an observed mediator.

Definition: Front-Door Criterion

A set \(M\) of observed variables satisfies the front-door criterion for the effect of \(T\) on \(Y\) in \(\mathcal{G}\) if:

  1. \(M\) intercepts all directed paths from \(T\) to \(Y\): every directed path from \(T\) to \(Y\) passes through some node in \(M\).
  2. There are no unblocked back-door paths from \(T\) to \(M\): the path set \(\{T \leftarrow \cdots \to M\}\) is empty or blocked.
  3. All back-door paths from \(M\) to \(Y\) are blocked by \(T\): conditioning on \(T\) blocks every path that begins with an arrow into \(M\) and ends at \(Y\).

The front-door strategy avoids unobserved confounding by splitting the total effect into two identifiable pieces: (1) the effect of \(T\) on \(M\) (unconfounded by Condition 2), and (2) the effect of \(M\) on \(Y\) (made identifiable after conditioning on \(T\) by Condition 3).

Example: Smoking, Tar, and Cancer (Pearl 1995)

\(T\) = smoking, \(M\) = tar deposits, \(Y\) = lung cancer, \(U\) = genetic predisposition (unobserved). Graph: \(T \to M \to Y\) with \(U \to T\) and \(U \to Y\), no direct \(T \to Y\) edge.

Conditions 1–3 hold: (1) the only directed path \(T \to M \to Y\) passes through \(M\); (2) no back-door path from \(T\) to \(M\) exists (no \(U \to M\) edge); (3) the only back-door path from \(M\) to \(Y\) is \(M \leftarrow T \leftarrow U \to Y\), which is blocked by conditioning on \(T\).

The front-door formula applies: \[P(y \mid \doop(T{=}t)) = \sum_m P(m \mid t) \sum_{t'} P(y \mid t', m)\,P(t').\]

Example: When the Front-Door Criterion Fails

Case (a): Condition 2 fails — the \(T \to M\) link is confounded. Adding the edge \(U \to M\): now the path \(T \leftarrow U \to M\) is an unblocked back-door path from \(T\) to \(M\). Stage 1 uses \(P(M{=}m \mid T{=}t)\) as if the \(T \to M\) link were unconfounded, but \(U \to M\) means the association mixes cause and confounding.

Case (b): Condition 3 fails — the \(M \to Y\) link has an extra confounder. Adding \(V \to M\) and \(V \to Y\) (\(V\) unobserved): the path \(M \leftarrow V \to Y\) is not blocked by conditioning on \(T\). Stage 2 can no longer use \(T\) as a valid back-door adjustment for \(M \to Y\).

| Modification | Condition violated | Stage broken |
|---|---|---|
| Add \(U \to M\) | Cond. 2: \(T \to M\) confounded | Stage 1 |
| Add \(V \to M\), \(V \to Y\) | Cond. 3: \(M \to Y\) confounded by \(V\) | Stage 2 |

In both cases the criterion detects the failure before any formula is written down.

3.3.1 The Front-Door Formula

Theorem: Front-Door Adjustment Formula (Pearl 1995)

If \(M\) satisfies the front-door criterion for the effect of \(T\) on \(Y\), and \(P(t) > 0\) for all \(t\), then: \[P\!\left(y \mid \doop(T{=}t)\right) = \sum_m \underbrace{P(m \mid T{=}t)}_{\text{Stage 1}} \;\underbrace{\sum_{t'} P(y \mid T{=}t',\,M{=}m)\,P(t')}_{\text{Stage 2}}. \tag{3.2}\]

Equivalently, writing \(\mu(t', m) = \E[Y \mid T{=}t', M{=}m]\): \[\E\!\left[Y \mid \doop(T{=}t)\right] = \E_{M \mid T=t}\!\bigl[\E_T\bigl[\mu(T,\,M)\bigr]\bigr].\]

Reading the formula. Stage 1: \(P(m \mid T{=}t)\) is the distribution of \(M\) given \(T{=}t\). Condition 2 guarantees no confounding between \(T\) and \(M\), so this is directly identifiable. Stage 2: \(\sum_{t'} P(y \mid T{=}t', M{=}m)\,P(t')\) is the back-door-adjusted effect of \(M\) on \(Y\), with \(T\) as the adjustment variable. Condition 3 guarantees that conditioning on \(T\) blocks all back-door paths from \(M\) to \(Y\).

Heuristic derivation. The formula is the composition of two back-door applications. The role of each condition: Conditions 1 and 2 together allow the decomposition through \(M\); Condition 2 licenses replacing \(\doop(T{=}t)\) with conditioning in the \(M\)-marginal; Condition 3 licenses back-door adjustment for \(M \to Y\).
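The two-stage structure can be verified numerically. The sketch below builds a discrete SEM compatible with the front-door graph (all probability values are illustrative assumptions, not from the text), computes the observational joint, applies the front-door formula, and checks it against the ground-truth interventional distribution obtained directly from the structural equations:

```python
from itertools import product

# Discrete SEM for the front-door graph: U -> T, T -> M, M -> Y, U -> Y.
pU = {0: 0.5, 1: 0.5}
pT1_U = {0: 0.2, 1: 0.8}              # P(T=1 | u)
pM1_T = {0: 0.1, 1: 0.9}              # P(M=1 | t)   (no U -> M edge)
pY1_MU = {(0, 0): 0.2, (1, 0): 0.7,   # P(Y=1 | m, u) (no direct T -> Y edge)
          (0, 1): 0.4, (1, 1): 0.9}

def bern(p, v):                       # P(V=v) for V ~ Bernoulli(p)
    return p if v == 1 else 1 - p

# Observational joint P(t, m, y), with the latent U marginalized out.
joint = {}
for u, t, m, y in product((0, 1), repeat=4):
    p = pU[u] * bern(pT1_U[u], t) * bern(pM1_T[t], m) * bern(pY1_MU[(m, u)], y)
    joint[(t, m, y)] = joint.get((t, m, y), 0.0) + p

def P(**fixed):                       # marginal probability over (t, m, y)
    return sum(p for (t, m, y), p in joint.items()
               if all({'t': t, 'm': m, 'y': y}[k] == v for k, v in fixed.items()))

def front_door(t):                    # the two-stage front-door formula
    total = 0.0
    for m in (0, 1):
        stage1 = P(t=t, m=m) / P(t=t)
        stage2 = sum(P(t=tp, m=m, y=1) / P(t=tp, m=m) * P(t=tp) for tp in (0, 1))
        total += stage1 * stage2
    return total

def truth(t):                         # set T=t in the SEM; U keeps its distribution
    return sum(pU[u] * bern(pM1_T[t], m) * pY1_MU[(m, u)]
               for u in (0, 1) for m in (0, 1))

for t in (0, 1):
    assert abs(front_door(t) - truth(t)) < 1e-12
```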

3.4 The Three Rules of Do-Calculus

Both the back-door and front-door formulas are derivable from a single algebraic engine: the three rules of the do-calculus. Each rule is a conditional independence statement in a specific intervention graph.

Graph subscripts as bookkeeping. Each rule checks d-separation in a graph formed by specific edge deletions:

| Subscript | Appears in | Arrows deleted |
|---|---|---|
| \(\mathcal{G}_{\overline{X}}\) | Rule 1 | all arrows into \(X\) |
| \(\mathcal{G}_{\overline{X}\,\underline{Z}}\) | Rule 2 | all into \(X\); all out of \(Z\) |
| \(\mathcal{G}_{\overline{X}\,\overline{Z(W)}}\) | Rule 3 | all into \(X\); into \(Z\)-nodes not ancestors of \(W\) in \(\mathcal{G}_{\overline{X}}\) |

When \(W = \varnothing\), Rule 3 uses \(\mathcal{G}_{\overline{X}\,\overline{Z}}\) (into both \(X\) and \(Z\) deleted). D-separation in the modified graph is checked by the same algorithm as Chapter 2.
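These d-separation checks can be automated. Below is a minimal reachability-based checker in the standard "Bayes-ball" style (the implementation is a sketch, not from the text), applied to the mutilated graphs of the examples in this section:

```python
from collections import defaultdict

def d_separated(edges, xs, ys, zs):
    """True iff xs and ys are d-separated given zs in the DAG `edges`.

    `edges` is a set of (parent, child) pairs; xs, ys, zs are node sets.
    """
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)

    # Nodes in zs, or with a descendant in zs: these open colliders.
    openers, frontier = set(zs), list(zs)
    while frontier:
        v = frontier.pop()
        for p in parents[v]:
            if p not in openers:
                openers.add(p)
                frontier.append(p)

    # State (v, d): 'up' = v entered from a child, 'down' = from a parent.
    visited, stack = set(), [(x, 'up') for x in xs]
    while stack:
        v, d = stack.pop()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v in ys:
            return False              # an active path reaches ys
        if d == 'up' and v not in zs:
            stack += [(p, 'up') for p in parents[v]]
            stack += [(c, 'down') for c in children[v]]
        elif d == 'down':
            if v not in zs:           # chain/fork: continue to children
                stack += [(c, 'down') for c in children[v]]
            if v in openers:          # collider opened by conditioning
                stack += [(p, 'up') for p in parents[v]]
    return True

# Rule 1 example: Z -> T -> Y; in G_overline{T} only T -> Y survives, so Z _||_ Y.
assert d_separated({('T', 'Y')}, {'Z'}, {'Y'}, set())

# M-bias graph in G_underline{T} (edge T -> Y removed):
m_bias = {('U1', 'T'), ('U1', 'C'), ('U2', 'C'), ('U2', 'Y')}
assert d_separated(m_bias, {'T'}, {'Y'}, set())        # empty set blocks the path
assert not d_separated(m_bias, {'T'}, {'Y'}, {'C'})    # conditioning on C opens it
```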

Theorem: The Three Rules of Do-Calculus (Pearl 1995)

Let \(\mathcal{G}\) be a DAG over \(\mathcal{V}\), and let \(X, Y, Z, W \subseteq \mathcal{V}\) be disjoint. The rules hold for any distribution \(P\) Markov with respect to \(\mathcal{G}\).

Rule 1 (Insertion/Deletion of Observations). Remove or insert an observation when it becomes irrelevant in the post-intervention graph. \[P(y \mid \doop(x), z, w) = P(y \mid \doop(x), w) \quad \text{if} \quad (Y \indep Z \mid X, W)_{\mathcal{G}_{\overline{X}}}. \tag{3.3}\]

Rule 2 (Action/Observation Exchange). Replace an intervention by an observation, or vice versa, when the modified graph makes them equivalent. \[P(y \mid \doop(x), \doop(z), w) = P(y \mid \doop(x), z, w) \quad \text{if} \quad (Y \indep Z \mid X, W)_{\mathcal{G}_{\overline{X}\,\underline{Z}}}. \tag{3.4}\]

Rule 3 (Insertion/Deletion of Actions). Remove or insert an intervention when it becomes irrelevant in the modified graph. \[P(y \mid \doop(x), \doop(z), w) = P(y \mid \doop(x), w) \quad \text{if} \quad (Y \indep Z \mid X, W)_{\mathcal{G}_{\overline{X}\,\overline{Z(W)}}}. \tag{3.5}\]

Remark: Soundness of the Rules

Each rule is sound: it holds for every distribution Markov with respect to \(\mathcal{G}\). Rule 2’s intuition is instructive: deleting \(Z\)’s outgoing arrows in \(\mathcal{G}_{\overline{X}\,\underline{Z}}\) simulates what intervening on \(Z\) would remove from \(Y\)’s perspective. If \(Y\) is d-separated from \(Z\) after this deletion, then there is no causal channel whose behavior depends on whether \(Z\) was intervened on or merely observed.

3.4.1 Intuition for Each Rule

Rule 1 — Adding/Removing Observations. If \(Y\) and \(Z\) are independent given \(W\) in the post-\(\doop(x)\) world, then \(Z\) carries no additional information about \(Y\) and can be added or dropped from the conditioning set.

Rule 2 — Swapping Action for Observation. The most frequently used rule. Deleting \(Z\)’s outgoing arrows cuts \(Z\)’s causal paths to its descendants. If \(Y\) and \(Z\) are d-separated after this deletion, then all dependence between \(Z\) and \(Y\) (given \(W\)) flows through \(Z\)’s outgoing causal paths, and those paths transmit the same influence whether \(Z\) is set by intervention or merely observed, so the two operations induce the same distribution of \(Y\).

Rule 3 — Deleting an Intervention. After deleting arrows into \(X\) and into the relevant \(Z\)-nodes, \(Z\) has no remaining path to \(Y\). Setting \(Z\) to any value has no effect on \(Y\), so \(\doop(z)\) can be dropped.

Example: Rule 1 — Deleting a Redundant Observation

Graph: \(Z \to T \to Y\), no other edges. Intervene on \(T\). In \(\mathcal{G}_{\overline{T}}\): delete \(Z \to T\); node \(Z\) is disconnected from \(Y\). Hence \((Y \indep Z)_{\mathcal{G}_{\overline{T}}}\). Rule 1 gives \(P(y \mid \doop(t), z) = P(y \mid \doop(t))\): once \(T\) is fixed externally, its former cause \(Z\) carries no information about \(Y\).

Example: Rule 2 — Action/Observation Exchange

Graph: \(X \to Z \to Y\), no other edges. Form \(\mathcal{G}_{\overline{X}\,\underline{Z}}\) (delete arrows into \(X\) — none — and arrows out of \(Z\) — removes \(Z \to Y\)). Node \(Z\) has no outgoing arrows; no path from \(Z\) to \(Y\) exists. Hence \((Y \indep Z)_{\mathcal{G}_{\overline{X}\,\underline{Z}}}\). Rule 2 gives \(P(y \mid \doop(x), \doop(z)) = P(y \mid \doop(x), z)\).

Example: Rule 3 — Deleting an Irrelevant Intervention

Graph: \(Z \to X \to Y\), no other edges. Form \(\mathcal{G}_{\overline{X}\,\overline{Z}}\) (delete into \(X\) — removes \(Z \to X\) — and into \(Z\) — none). Node \(Z\) is isolated. Hence \((Y \indep Z)_{\mathcal{G}_{\overline{X}\,\overline{Z}}}\). Rule 3 gives \(P(y \mid \doop(x), \doop(z)) = P(y \mid \doop(x))\): once \(X\) is fixed, \(Z\)’s only route to \(Y\) is severed.

Example: A Two-Rule Derivation — The Proxy Variable Graph

Graph. \(T \to Y\), \(U \to T\) (unobserved), \(U \to S\) (observed), \(S \to Y\).

Goal. Simplify \(P(y \mid \doop(t))\) to an observational expression. \(S\) lies on the back-door path \(T \leftarrow U \to S \to Y\) and is observed.

Step 1. Introduce \(S\) by total probability: \(P(y \mid \doop(t)) = \sum_s P(y \mid \doop(t), s)\,P(s \mid \doop(t))\).

Step 2. Simplify \(P(s \mid \doop(t))\) by Rule 3. The required graph is \(\mathcal{G}_{\overline{T}}\). Because \(S\) is not a descendant of \(T\), no path from \(T\) to \(S\) exists in \(\mathcal{G}_{\overline{T}}\). Hence \((S \indep T)_{\mathcal{G}_{\overline{T}}}\), and Rule 3 gives \(P(s \mid \doop(t)) = P(s)\).

Step 3. Simplify \(P(y \mid \doop(t), s)\) by Rule 2. The required graph is \(\mathcal{G}_{\underline{T}}\). In \(\mathcal{G}_{\underline{T}}\), the only paths between \(T\) and \(Y\) are back-door paths. The set \(\{S\}\) d-separates \(T\) and \(Y\) in \(\mathcal{G}_{\underline{T}}\) (blocking \(T \leftarrow U \to S \to Y\) at \(S\)). Rule 2 gives \(P(y \mid \doop(t), s) = P(y \mid t, s)\).

Combining: \(P(y \mid \doop(t)) = \sum_s P(y \mid t, s)\,P(s)\). \(\square\)
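The derived identity can be sanity-checked on a discrete SEM for the proxy graph (all probability values are illustrative assumptions, not from the text):

```python
from itertools import product

# Discrete SEM for the proxy graph: U -> T, U -> S, S -> Y, T -> Y.
pU = {0: 0.5, 1: 0.5}
pT1_U = {0: 0.3, 1: 0.7}                  # P(T=1 | u)
pS1_U = {0: 0.2, 1: 0.9}                  # P(S=1 | u)
pY1_TS = {(0, 0): 0.1, (0, 1): 0.4,       # P(Y=1 | t, s)
          (1, 0): 0.5, (1, 1): 0.8}

def bern(p, v):                           # P(V=v) for V ~ Bernoulli(p)
    return p if v == 1 else 1 - p

# Observational joint P(t, s), with the latent U marginalized out.
joint = {}
for u, t, s in product((0, 1), repeat=3):
    p = pU[u] * bern(pT1_U[u], t) * bern(pS1_U[u], s)
    joint[(t, s)] = joint.get((t, s), 0.0) + p

def adjusted(t):                          # sum_s P(Y=1 | t, s) P(s)
    return sum(pY1_TS[(t, s)] * sum(joint[(tp, s)] for tp in (0, 1))
               for s in (0, 1))

def truth(t):                             # set T=t; U and S keep their mechanisms
    return sum(pU[u] * bern(pS1_U[u], s) * pY1_TS[(t, s)]
               for u in (0, 1) for s in (0, 1))

for t in (0, 1):
    assert abs(adjusted(t) - truth(t)) < 1e-12
```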

3.5 Do-Calculus Proofs of the Main Theorems

Proof of the back-door adjustment formula (Equation 3.1). Graph: \(T \to Y\), \(\mathbf{S} \to T\), \(\mathbf{S} \to Y\) (with \(\mathbf{S}\) satisfying the back-door criterion).

Step 1. Introduce \(\mathbf{S}\): \(P(y \mid \doop(t)) = \int P(y \mid \doop(t), \mathbf{s})\,dP(\mathbf{s} \mid \doop(t))\).

Step 2. Rule 3 with \(X = \varnothing\), \(Z = T\), \(W = \varnothing\): required graph \(\mathcal{G}_{\overline{T}}\). Since \(\mathbf{S}\) contains no descendants of \(T\) (back-door condition 1), \((\mathbf{S} \indep T)_{\mathcal{G}_{\overline{T}}}\), so \(P(\mathbf{s} \mid \doop(t)) = P(\mathbf{s})\).

Step 3. Rule 2 with \(X = \varnothing\), \(Z = T\), \(W = \mathbf{S}\): required graph \(\mathcal{G}_{\underline{T}}\). Back-door condition 2 states \((Y \indep T \mid \mathbf{S})_{\mathcal{G}_{\underline{T}}}\), so \(P(y \mid \doop(t), \mathbf{s}) = P(y \mid t, \mathbf{s})\).

Combining: \(P(y \mid \doop(t)) = \int P(y \mid t, \mathbf{s})\,dP(\mathbf{s})\). \(\square\)

Proof of the front-door adjustment formula (Equation 3.2). Graph: \(T \to M \to Y\), \(U \to T\) (unobserved), \(U \to Y\).

Step 1. Introduce \(M\): \(P(y \mid \doop(t)) = \int P(y \mid \doop(t), m)\,dP(m \mid \doop(t))\).

Step 2. Rule 2 to simplify \(P(m \mid \doop(t))\): required graph \(\mathcal{G}_{\underline{T}}\) (delete \(T \to M\)). In this graph no path from \(T\) to \(M\) remains (front-door condition 2 ensures no \(U \to M\) edge), so \((M \indep T)_{\mathcal{G}_{\underline{T}}}\). Rule 2 gives \(P(m \mid \doop(t)) = P(m \mid t)\).

Step 3a. Convert conditioning on \(m\) to intervention. Rule 2 with \(X = T\), \(Z = M\), \(W = \varnothing\): form \(\mathcal{G}_{\overline{T}\,\underline{M}}\) (delete \(U \to T\) and \(M \to Y\)). In this graph no path from \(M\) to \(Y\) exists, so \((Y \indep M \mid T)_{\mathcal{G}_{\overline{T}\,\underline{M}}}\). Rule 2 gives \(P(y \mid \doop(t), m) = P(y \mid \doop(t), \doop(m))\).

Step 3b. Drop the redundant \(\doop(t)\). Rule 3 with \(X = M\), \(Z = T\), \(W = \varnothing\): form \(\mathcal{G}_{\overline{M}\,\overline{T}}\) (delete \(T \to M\) and \(U \to T\)). Node \(T\) is isolated, so \((Y \indep T)_{\mathcal{G}_{\overline{M}\,\overline{T}}}\). Rule 3 gives \(P(y \mid \doop(t), \doop(m)) = P(y \mid \doop(m))\).

Step 3c. Back-door formula for \(M \to Y\). By front-door condition 3, \(\{T\}\) satisfies the back-door criterion for \(M \to Y\): \(P(y \mid \doop(m)) = \int P(y \mid t', m)\,dP(t')\).

Assembling: \(P(y \mid \doop(t)) = \int\!\left[\int P(y \mid t', m)\,dP(t')\right]\!dP(m \mid t)\). \(\square\)

3.6 The Do-Calculus Is Complete

A natural question: are the three rules enough? Could there be a graph where the causal effect is identifiable in principle, but no sequence of rules can derive it? Shpitser and Pearl (2006) answered this definitively.

Semi-Markovian DAGs encode latent common causes compactly as bidirected edges: \(X \leftrightarrow Y\) stands for an unobserved \(U\) with \(U \to X\) and \(U \to Y\).

Theorem: Completeness of the Do-Calculus (Shpitser and Pearl 2006)

Let \(\mathcal{G}\) be a semi-Markovian DAG. A causal quantity \(P(y \mid \doop(t))\) is identifiable from \(P\) relative to \(\mathcal{G}\) if and only if the do-calculus can derive a purely observational expression for it.

Example: The Bow Graph — The Simplest Non-Identifiable Structure

The bow graph has one directed edge \(T \to Y\) and one bidirected edge \(T \leftrightarrow Y\) (representing unobserved \(U\) with \(U \to T\), \(U \to Y\)). No observed variable can block the back-door path \(T \leftarrow U \to Y\); no observed mediator exists for front-door. We show directly that \(P(y \mid \doop(t))\) is not identified by constructing two models that agree on \(P(T, Y)\) but disagree on \(P(Y \mid \doop(T{=}1))\).

Let \(T, Y, U \in \{0,1\}\) with \(U \sim \mathrm{Bernoulli}(1/2)\).

Model \(\mathcal{M}_1\) (pure confounding; no causal effect): \(T = U\), \(Y = U\). Both \(T\) and \(Y\) are driven by \(U\). The arrow \(T \to Y\) carries no causal influence. Observed distribution: \(P(T{=}0, Y{=}0) = P(T{=}1, Y{=}1) = 1/2\). Under \(\doop(T{=}1)\): \(P_1(Y{=}1 \mid \doop(T{=}1)) = P(U{=}1) = 1/2\).

Model \(\mathcal{M}_2\) (full causal effect): \(T = U\), \(Y = T\). \(Y\) is determined entirely by \(T\). Since \(T = U\), the observed distribution is again \(P(T{=}0, Y{=}0) = P(T{=}1, Y{=}1) = 1/2\), identical to \(\mathcal{M}_1\). Under \(\doop(T{=}1)\): \(P_2(Y{=}1 \mid \doop(T{=}1)) = 1\).

The two models produce identical observational distributions but assign different values (\(1/2\) vs \(1\)) to \(P(Y{=}1 \mid \doop(T{=}1))\). No data set, however large, can distinguish them. The causal effect is not identified.
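The two models are small enough to enumerate exactly; this sketch encodes their structural equations and confirms both claims:

```python
def simulate(model):
    """Enumerate one bow-graph model: returns (observational joint, P(Y=1 | do(T=1)))."""
    obs, do1 = {}, 0.0
    for u in (0, 1):                      # U ~ Bernoulli(1/2)
        t = u                             # T = U in both models
        y = u if model == 1 else t        # M1: Y = U;  M2: Y = T
        obs[(t, y)] = obs.get((t, y), 0.0) + 0.5
        # Under do(T=1): replace the T-equation by the constant 1.
        y_do = u if model == 1 else 1
        do1 += 0.5 * y_do                 # accumulates P(Y=1 | do(T=1))
    return obs, do1

obs1, do1_m1 = simulate(1)
obs2, do1_m2 = simulate(2)
assert obs1 == obs2                       # identical observational distributions
assert (do1_m1, do1_m2) == (0.5, 1.0)     # different interventional answers
```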

Shpitser and Pearl (2006) introduced the ID algorithm: it takes a semi-Markovian DAG \(\mathcal{G}\) and either returns an observational formula or reports non-identification.

Sufficiency (do-calculus identifies whenever possible): proved constructively. The key concept is the c-component — a maximal set of nodes connected by bidirected edges. The ID algorithm: (1) removes non-ancestors of \(Y\); (2) decomposes over c-components of \(\mathcal{G}_{\overline{T}}\); (3) recursively identifies each factor; (4) reports FAIL if any subproblem cannot be reduced.

Necessity (do-calculus cannot identify non-identifiable quantities): proved via the hedge. A hedge for \(P(y \mid \doop(t))\) is a pair of subgraphs \((F, F')\) encoding a “loop” of confounding the do-calculus cannot break. When a hedge exists, one can construct two models \(\mathcal{M}_1\) and \(\mathcal{M}_2\) with identical observed \(P\) but different \(P(y \mid \doop(t))\). Combining: the do-calculus succeeds if and only if no hedge exists. \(\square\)

Remark: Practical Implication of Completeness

If the ID algorithm reports non-identification, then no estimation method — however clever — can recover the causal effect from observational data alone given the assumed graph structure. Non-identification is not a limitation of a particular technique; it is a fundamental property of the causal model. Three remedies: impose additional structural assumptions (e.g., linearity, monotonicity); collect additional data (e.g., an instrument or a randomized experiment); or target a different, identified causal quantity (e.g., partial identification bounds, or the effect among the treated).

3.7 The Big Picture

This chapter completes the theoretical identification machinery of the course. The full causal inference pipeline is:

\[\text{SEM/DAG} \;\xrightarrow{\text{d-separation}}\; \text{conditional independences} \;\xrightarrow{\text{back-door, front-door, do-calculus}}\; P(y \mid \doop(t)) \;\xrightarrow{\text{estimation}}\; \hat\tau.\]

The three identification criteria relate as follows. The back-door criterion applies when observed covariates can block all confounding paths; it is the workhorse of regression adjustment and propensity score methods (Chapters 5–6). The front-door criterion applies when an observed mediator intercepts all causal paths while the mediator mechanism remains unconfounded with the treatment (Section 3.3). The do-calculus subsumes both: it is complete in the sense of the Completeness Theorem, and its three rules constitute a universal derivation engine for identification.

3.8 Summary

| Symbol | Meaning |
|---|---|
| \(\mathcal{G}_{\overline{X}}\) | Mutilated graph: all arrows into \(X\) deleted |
| \(\mathcal{G}_{\underline{X}}\) | Auxiliary graph: all arrows out of \(X\) deleted |
| Back-door criterion | Observed \(\mathbf{S}\) containing no descendant of \(T\); blocks all back-door paths |
| Front-door criterion | Observed \(M\) intercepting all causal paths; satisfying three conditions |
| Rule 1 | Delete/insert observations if d-separated in \(\mathcal{G}_{\overline{X}}\) |
| Rule 2 | Swap action for observation if d-separated in \(\mathcal{G}_{\overline{X}\,\underline{Z}}\) |
| Rule 3 | Delete action if d-separated in \(\mathcal{G}_{\overline{X}\,\overline{Z(W)}}\) |

  1. Graph surgery formalizes intervention: \(\mathcal{G}_{\overline{T}}\) (delete arrows into \(T\)) represents \(\doop(T{=}t)\); d-separation in \(\mathcal{G}_{\overline{T}}\) encodes conditional independence in \(P(\cdot \mid \doop(t))\).
  2. The back-door criterion identifies causal effects when observed variables block all confounding paths. Condition 1 (no descendants of \(T\)) and Condition 2 (d-separation in \(\mathcal{G}_{\underline{T}}\)) are both essential: the M-bias example shows that including a collider can introduce confounding.
  3. The front-door criterion identifies causal effects via an observed mediator when direct confounding is unblockable. The two-stage structure of the formula corresponds to two unconfounded identification steps chained together.
  4. The three rules of do-calculus — insertion/deletion of observations, action/observation exchange, and insertion/deletion of actions — are a sound and complete derivation engine for identification. The back-door and front-door formulas are special cases.
  5. The completeness theorem of Shpitser and Pearl (2006) implies that non-identification is a property of the graph, not of the method: when the ID algorithm fails, no observational method can succeed.

3.9 Problems

1. Graph surgery and d-separation. Consider the DAG with edges \(Z \to T\), \(U \to T\), \(U \to Y\), \(T \to Y\) (\(U\) unobserved).

  1. Draw \(\mathcal{G}_{\overline{T}}\) and \(\mathcal{G}_{\underline{T}}\). For each graph, state which edges were deleted and why.
  2. In \(\mathcal{G}_{\overline{T}}\), is \((Y \indep Z)_{\mathcal{G}_{\overline{T}}}\)? Justify by listing all paths and determining whether each is blocked.
  3. In the original \(\mathcal{G}\), is \((Y \indep Z \mid T)_{\mathcal{G}}\)? How does this relate to the instruction “never condition on a mediator to remove confounding”?
  4. How does the absence of a direct edge \(Z \to Y\) in the DAG relate to the exclusion restriction (the assumption that \(Z\) affects \(Y\) only through \(T\))?

2. Back-door practice. Consider the DAG: \(X \to T\), \(X \to Y\), \(T \to M\), \(M \to Y\), \(T \to Y\), where \(X\) is observed.

  1. List all back-door paths from \(T\) to \(Y\).
  2. Does \(\{X\}\) satisfy the back-door criterion for the effect of \(T\) on \(Y\)? Write the resulting adjustment formula.
  3. Does \(\{M\}\) satisfy the back-door criterion? Explain why or why not.
  4. Does \(\{X, M\}\) satisfy the back-door criterion? Identify which condition of the criterion \(M\) violates.
  5. Even if the formal criterion issue in (d) were set aside, explain the substantive bias that arises from conditioning on a mediator \(M\) that lies on a causal path \(T \to M \to Y\).

3. Front-door identification. Consider the DAG: \(T \to M \to Y\), with \(U \to T\) and \(U \to Y\) (unobserved \(U\), no direct \(T \to Y\) edge).

  1. Verify that \(M\) satisfies all three front-door conditions.
  2. Derive the front-door formula (Equation 3.2) step by step, citing the do-calculus rule used at each step.
  3. Now add a direct edge \(T \to Y\). Does \(M\) still satisfy the front-door criterion? Explain which condition fails.

4. Identification or non-identification? For each DAG, determine whether \(P(y \mid \doop(t))\) is identified non-parametrically. If identified, state the formula; if not, explain the obstruction.

  1. \(T \to Y\), \(U \to T\), \(U \to Y\), \(U\) unobserved.
  2. Same as (a), with instrument \(Z \to T\) and \(Z \indep U\) (no direct \(Z \to Y\) edge). Show \(P(y \mid \doop(t))\) is not identified by constructing two binary SEMs \(\mathcal{M}_1\) and \(\mathcal{M}_2\), each compatible with the IV graph, such that \(P_{\mathcal{M}_1}(Z, T, Y) = P_{\mathcal{M}_2}(Z, T, Y)\) but \(P_{\mathcal{M}_1}(Y{=}1 \mid \doop(T{=}1)) \neq P_{\mathcal{M}_2}(Y{=}1 \mid \doop(T{=}1))\).
  3. \(T \to M \to Y\), \(U \to T\), \(U \to Y\), \(U \to M\), \(U\) unobserved.
  4. \(T \to Y\), \(X \to T\), \(X \to Y\), \(X\) observed.

5. Back-door formula: proof and uniqueness.

  1. Under the conditions of the Back-Door Adjustment Formula (Section 3.2), prove Equation 3.1 using the three rules of do-calculus, following the proof sketch in Section 3.5.
  2. Suppose \(\mathbf{S}_1\) and \(\mathbf{S}_2\) both satisfy the back-door criterion (with positivity). Deduce from (a) that \(\int P(y \mid t, \mathbf{s}_1)\,dP(\mathbf{s}_1) = \int P(y \mid t, \mathbf{s}_2)\,dP(\mathbf{s}_2)\). Interpret: the identification target \(P(y \mid \doop(t))\) is unique even when the adjustment set is not.

6. Rule sequencing on a graph with two observed confounders. Consider the DAG: \(C_1 \to T\), \(C_1 \to Y\), \(U \to T\), \(U \to C_2\), \(C_2 \to Y\), \(T \to Y\), with \(C_1\), \(C_2\) observed and \(U\) unobserved. The worked example in Section 3.2 established that \(\{C_1, C_2\}\) is a valid back-door set. Derive \(P(y \mid \doop(t)) = \sum_{c_1,c_2} P(y \mid t, c_1, c_2)\,P(c_1, c_2)\) from first principles using the three rules. For each step: (i) name the rule applied, (ii) state which arrows are deleted and what edges remain, (iii) verify the required d-separation condition.

7. Non-identifiability by construction. Consider the graph from Problem 4(c): \(T \to M \to Y\), \(U \to T\), \(U \to M\), \(U \to Y\) (front-door condition 2 violated). Let \(T, M, Y, U \in \{0,1\}\) with \(U \sim \mathrm{Bernoulli}(1/2)\).

  1. Construct two SEMs \(\mathcal{M}_1\) and \(\mathcal{M}_2\), each compatible with the graph, such that \(P_{\mathcal{M}_1}(T, M, Y) = P_{\mathcal{M}_2}(T, M, Y)\) but \(P_{\mathcal{M}_1}(Y{=}1 \mid \doop(T{=}1)) \neq P_{\mathcal{M}_2}(Y{=}1 \mid \doop(T{=}1))\).
  2. Explain which identification strategy fails: (i) back-door criterion, (ii) front-door criterion (which condition fails), (iii) completeness theorem (what does the existence of your two models imply?).