Appendix B — Single World Intervention Graphs
This appendix develops single world intervention graphs (SWIGs), a graphical device introduced by Richardson and Robins (2014) and given an accessible practical treatment by Bezuidenhout et al. (2025). A SWIG provides a formal bridge between the potential outcomes framework and the do-calculus by encoding both in a single diagram. A reader familiar with Chapters 2–4 will find that SWIGs require no new conceptual ingredients: they are ordinary DAGs in which the treatment node is split into two pieces, making potential outcomes and identifiability assumptions graphically explicit rather than leaving them implicit.
Chapter 4 used the back-door criterion and the adjustment formula to identify causal effects. SWIGs make that identification argument graphically explicit: potential outcomes appear as labeled nodes inside the graph, so ignorability becomes a routine d-separation statement rather than a free-standing probabilistic assumption. Section B.5 restates the back-door identification result of Chapter 4 as a graph-theoretic theorem inside the SWIG, and examines a partial converse: under faithfulness of the SWIG and a restriction of the adjustment set to nondescendants of the treatment, the single-world ignorability statement \(Y(t) \indep T \mid X\) entails the back-door criterion.
B.1 The SWIG Construction
In the original DAG \(\mathcal{G}\), the node \(T\) represents the treatment as it naturally occurs. Under \(\doop(T{=}t)\), however, the treatment is externally fixed. The SWIG separates these two roles by splitting \(T\) into two pieces displayed side by side in the graph:
- a random half (left side, capital letter \(T\)): represents the treatment value that would naturally occur, and retains all incoming edges from \(T\)’s causal parents;
- a fixed half (right side, lowercase letter \(t\)): represents the externally imposed intervention value, carries all outgoing edges that formerly left \(T\), and has no edge to or from the random half.
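The splitting rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `build_swig`, the edge-list representation, and the convention that the fixed half is named by lower-casing the treatment name are all assumptions made for this example.

```python
def build_swig(edges, treatment):
    """Return the edge set of the SWIG obtained by splitting `treatment`.

    edges: iterable of (parent, child) pairs describing a DAG.
    The random half keeps the name `treatment`; the fixed half is named
    `treatment.lower()` (an assumed naming convention); every descendant V
    of the treatment is relabeled "V(t)".
    """
    edges = set(edges)
    fixed = treatment.lower()

    # Collect the descendants of the treatment by following outgoing edges.
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    descendants, stack = set(), [treatment]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in descendants:
                descendants.add(child)
                stack.append(child)

    def label(node):
        # Descendants of the treatment become potential outcomes V(t).
        return f"{node}({fixed})" if node in descendants else node

    swig = set()
    for parent, child in edges:
        # Outgoing edges of the treatment leave from the fixed half;
        # incoming edges still point into the random half.
        source = fixed if parent == treatment else label(parent)
        swig.add((source, label(child)))
    return swig


dag = [("X", "T"), ("X", "Y"), ("T", "Y")]
swig = build_swig(dag, "T")
print(sorted(swig))
```

In the resulting edge set the random half `"T"` never appears as a parent and the fixed half `"t"` never appears as a child, which matches the two bullet points above.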
B.2 The Fixed Half as a Source Node
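After the SWIG \(\mathcal{G}(t)\) is constructed, d-separation among its nodes is evaluated using the standard rules of Chapter 2. The key structural fact is that the fixed half \(t\) has no incoming edges, so it is a source node, while the random half \(T\) has no outgoing edges. Every path that starts at the random half therefore begins with an edge pointing into \(T\) from one of its parents in \(\mathcal{G}\).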
Reading the back-door criterion off the SWIG. By the source-node analysis above, every path from the random half \(T\) to a potential outcome \(V(t)\) in \(\mathcal{G}(t)\) begins with an edge into \(T\) and therefore corresponds to a back-door path in \(\mathcal{G}\). Blocking all back-door paths is thus exactly what d-separation of \(T\) and \(V(t)\) in the SWIG requires.
Recovering ignorability from the SWIG. In the SWIG of the confounded-treatment example, the only path from the random half \(A\) to \(Y(a)\) is \(A \leftarrow L \to Y(a)\), which is open marginally and blocked once \(L\) is conditioned on. Hence \(A\) and \(Y(a)\) are d-separated given \(L\) in \(\mathcal{G}(a)\), and the Markov property gives \(A \indep Y(a) \mid L\). This is exactly the conditional ignorability of Chapter 4, now derived as a graph-structural consequence rather than asserted as an assumption.
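The d-separation check in this example can be reproduced mechanically. The sketch below uses the ancestral-moral-graph formulation of d-separation, written in plain Python; the function name `d_separated` and the string labels for the nodes are assumptions of this illustration, and the fixed half `a` is treated as an ordinary source node, which is harmless here because no path between \(A\) and \(Y(a)\) passes through it.

```python
from itertools import combinations


def d_separated(edges, x, y, given=()):
    """Return True if x and y are d-separated by `given` in the DAG `edges`.

    Uses the moralized ancestral graph criterion; x and y are assumed
    not to belong to the conditioning set.
    """
    given = set(given)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)

    # 1. Keep x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: connect each node to its parents, marry co-parents,
    #    and forget edge directions.
    neighbors = {node: set() for node in keep}
    for child in keep:
        pas = parents.get(child, set())
        for p in pas:
            neighbors[child].add(p)
            neighbors[p].add(child)
        for p, q in combinations(pas, 2):
            neighbors[p].add(q)
            neighbors[q].add(p)

    # 3. Remove the conditioning set and test whether x still reaches y.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(neighbors[node])
    return True


# SWIG G(a) of the confounded-treatment example: L -> A, L -> Y(a), a -> Y(a).
swig = {("L", "A"), ("L", "Y(a)"), ("a", "Y(a)")}

print(d_separated(swig, "A", "Y(a)"))          # False: A <- L -> Y(a) is open
print(d_separated(swig, "A", "Y(a)", {"L"}))   # True: conditioning on L blocks it
```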
B.4 What SWIGs Achieve
1. Potential outcomes as random variables in a DAG. In \(\mathcal{G}(t)\), the variable \(Y(t)\) is an ordinary node with well-defined parents: the fixed half \(t\) and any other causes of \(Y\) retained from \(\mathcal{G}\). Its marginal distribution equals the interventional distribution \(f(y \mid \doop(T{=}t))\).
2. The mutilated graph as a special case. Removing the random half \(T\) from \(\mathcal{G}(t)\), letting the fixed half stand in for the treatment node, and suppressing the \((t)\) labels on its descendants recovers the usual mutilated graph \(\mathcal{G}_{\overline{T}}\) of Chapter 1 representing \(\doop(T{=}t)\). The SWIG is more informative: it retains the random half alongside the fixed half, making it possible to ask questions about the relationship between the natural treatment \(T\) and the potential outcomes \(V(t)\) via d-separation.
3. Unconfoundedness as a d-separation statement. The ignorability condition \(Y(t) \indep T \mid X\) corresponds, under the Markov property of the SWIG, to a d-separation statement in \(\mathcal{G}(t)\) verifiable by path-tracing. Ignorability is no longer an assertion encoded in potential-outcome notation alone; it is a structural claim about the graph.
4. A single graph for both frameworks. Before SWIGs, combining the potential outcomes framework and the do-calculus required switching notation. The SWIG eliminates this translation step: one graph simultaneously hosts the natural treatment \(T\) (random half), the intervention value \(t\) (fixed half), the potential outcomes \(V(t)\), and all d-separation relationships among them.
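Point 2 in the list above can be checked on a small example. The sketch below is illustrative only: the helper names `mutilate` and `collapse_swig`, the edge-set representation, and the lower-case naming of the fixed half are assumptions, and the SWIG is written out by hand for the DAG \(X \to T\), \(X \to Y\), \(T \to Y\).

```python
def mutilate(edges, treatment):
    """Mutilated graph: delete every edge pointing into the treatment."""
    return {(p, c) for p, c in edges if c != treatment}


def collapse_swig(swig_edges, treatment):
    """Drop the random half and strip the fixed-half labels from a SWIG."""
    fixed = treatment.lower()
    suffix = f"({fixed})"

    def unlabel(node):
        if node == fixed:              # the fixed half stands in for the treatment
            return treatment
        if node.endswith(suffix):      # V(t) -> V
            return node[: -len(suffix)]
        return node

    # The random half has only incoming edges, so removing it amounts to
    # discarding the edges whose child is the random half.
    return {(unlabel(p), unlabel(c)) for p, c in swig_edges if c != treatment}


dag = {("X", "T"), ("X", "Y"), ("T", "Y")}
swig = {("X", "T"), ("X", "Y(t)"), ("t", "Y(t)")}   # G(t), written out by hand

print(collapse_swig(swig, "T") == mutilate(dag, "T"))   # prints True
```

Because graphs are stored as edge sets here, a node left without any edges would silently vanish; a fuller implementation would track the node set separately.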
B.5 Single-World Ignorability and the Back-Door Criterion
This section restates the back-door identification result of Chapter 4 graphically as a theorem inside the SWIG, and then examines a partial converse. The target conditional independence is the single-world (or weak) ignorability statement \(Y(t) \indep T \mid X\) for each \(t \in \{0, 1\}\). The cross-world joint statement \((Y(0), Y(1)) \indep T \mid X\) involves \(Y(0)\) and \(Y(1)\), which live in two distinct SWIGs, so a single SWIG cannot encode it as a d-separation.
Faithfulness is invoked only for the converse direction below.
Proof. By the source-node analysis of Section B.2, every path from the random half \(T\) to the potential outcome \(Y(t)\) in \(\mathcal{G}(t)\) leaves \(T\) through one of \(T\)’s parents in \(\mathcal{G}\) and is therefore a back-door path. By hypothesis, \(X\) blocks every such path; the no-descendants clause ensures that conditioning on members of \(X\) does not open a previously blocked collider path. Hence \(T\) and \(Y(t)\) are d-separated by \(X\) in \(\mathcal{G}(t)\), and the Markov property of \(P_t\) delivers \(Y(t) \indep T \mid X\). \(\square\)
For binary treatment, applying the theorem separately at \(t = 0\) and \(t = 1\) yields the weak ignorability conditions \(Y(0) \indep T \mid X\) and \(Y(1) \indep T \mid X\) that justify the adjustment formula of Chapter 4.
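One way to see how the single-world conditions justify the adjustment formula is the short derivation below. It is a sketch that assumes discrete \(X\), positivity (\(\Pr(T = t \mid X = x) > 0\) whenever \(\Pr(X = x) > 0\)), and consistency (\(Y = Y(t)\) on the event \(\{T = t\}\)); the \(\Pr\) notation is chosen here for illustration and may differ from the notation of Chapter 4.

\[
\begin{aligned}
\Pr(Y(t) = y) &= \sum_x \Pr(Y(t) = y \mid X = x)\,\Pr(X = x) \\
&= \sum_x \Pr(Y(t) = y \mid T = t,\, X = x)\,\Pr(X = x) && \text{by } Y(t) \indep T \mid X \\
&= \sum_x \Pr(Y = y \mid T = t,\, X = x)\,\Pr(X = x) && \text{by consistency.}
\end{aligned}
\]

Each line uses the independence for one fixed value of \(t\) only, so nothing about the joint behavior of \(Y(0)\) and \(Y(1)\) enters the argument.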
From single-world to cross-world. The strong ignorability condition \((Y(0), Y(1)) \indep T \mid X\) is a statement about the joint distribution of \(Y(0)\) and \(Y(1)\) together with \(T\). Because \(Y(0)\) and \(Y(1)\) are nodes in two distinct SWIGs, no single SWIG represents this joint distribution as a d-separation, and the theorem above correspondingly speaks only about the single-world form. For ATE identification this is no obstacle: the single-world form \(Y(t) \indep T \mid X\) for each \(t \in \{0,1\}\) is enough. Identification of joint or distributional functionals of \((Y(0), Y(1))\), for instance the variance of the unit-level treatment effect or the proportion of units harmed by treatment, requires the cross-world joint form and assumptions across multiple SWIGs that single-world graphical reasoning cannot supply.
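A small numerical illustration makes the gap concrete. The two joint distributions below are hypothetical and chosen only for this sketch: both give a binary outcome the same marginals for \(Y(0)\) and \(Y(1)\), and hence the same ATE, yet they disagree on the cross-world functionals mentioned above. The example assumes that a larger outcome is better, so a unit is "harmed" when \(Y(1) < Y(0)\).

```python
from fractions import Fraction

half = Fraction(1, 2)

# Hypothetical joint distributions over (Y(0), Y(1)); keys are (y0, y1).
comonotone = {(0, 0): half, (1, 1): half}     # Y(1) = Y(0) for every unit
antimonotone = {(0, 1): half, (1, 0): half}   # Y(1) = 1 - Y(0) for every unit


def marginal_mean(joint, arm):
    """Mean of Y(arm), a single-world quantity."""
    return sum(p * y[arm] for y, p in joint.items())


def ate(joint):
    """Average treatment effect E[Y(1)] - E[Y(0)]."""
    return marginal_mean(joint, 1) - marginal_mean(joint, 0)


def proportion_harmed(joint):
    """P(Y(1) < Y(0)), a cross-world quantity (larger outcome assumed better)."""
    return sum(p for (y0, y1), p in joint.items() if y1 < y0)


def effect_variance(joint):
    """Variance of the unit-level effect Y(1) - Y(0), also cross-world."""
    mean = ate(joint)
    return sum(p * ((y1 - y0) - mean) ** 2 for (y0, y1), p in joint.items())


for name, joint in [("comonotone", comonotone), ("antimonotone", antimonotone)]:
    print(name, ate(joint), proportion_harmed(joint), effect_variance(joint))
```

Both couplings have ATE 0, but the proportion harmed is 0 in the first and 1/2 in the second, and the effect variance is 0 versus 1. Single-world margins cannot tell the two apart.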
Conceptual takeaway. The back-door criterion is a graphical sufficient condition for the single-world conditional exchangeability assumption needed for ATE identification. Under the additional restriction that the candidate adjustment set is composed of nondescendants of \(T\), and under faithfulness of the relevant counterfactual distribution to the SWIG, the absence of unblocked back-door paths corresponds to the single-world independence \(Y(t) \indep T \mid X\). In this sense the graphical and counterfactual languages describe the same identifying structure. Strict logical equivalence in full generality is more delicate: ignorability can hold in particular models for parametric reasons that no graphical criterion will detect.
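A hypothetical linear model illustrates the last point. The graph, coefficients, and error terms below are assumptions introduced only for this sketch: take the DAG with edges \(U \to T\), \(U \to W\), \(U \to Y\), \(W \to Y\), \(T \to Y\), leave \(U\) and \(W\) unadjusted (\(X = \emptyset\)), and let the errors \(\eta, \varepsilon_T, \varepsilon_Y\) be mutually independent and independent of \(U\).

\[
\begin{aligned}
W &= \alpha U + \eta, \qquad T = g(U, \varepsilon_T), \\
Y(t) &= \beta t + \gamma U + \delta W + \varepsilon_Y
      = \beta t + (\gamma + \alpha\delta)\,U + \delta\eta + \varepsilon_Y .
\end{aligned}
\]

If the coefficients happen to satisfy \(\gamma + \alpha\delta = 0\), then \(Y(t) = \beta t + \delta\eta + \varepsilon_Y\) depends only on \((\eta, \varepsilon_Y)\), which is independent of \((U, \varepsilon_T)\). Hence \(Y(t) \indep T\) holds even though the back-door paths \(T \leftarrow U \to Y\) and \(T \leftarrow U \to W \to Y\) are both open. The independence arises from exact cancellation, which is a violation of faithfulness, and it generally disappears once \(\gamma + \alpha\delta \neq 0\).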
Problems
1. SWIG construction. Consider the DAG: \(X \to T\), \(X \to Y\), \(T \to Y\), with \(X\) fully observed.
- Construct the SWIG \(\mathcal{G}(t)\) by splitting \(T\) into its random and fixed halves. Draw the result, labeling the random half, fixed half, and the potential outcome \(Y(t)\).
- In \(\mathcal{G}(t)\), identify all paths between the random half \(T\) and \(Y(t)\). Apply d-separation to determine which are blocked and which are open before conditioning.
- Verify that \(X\) satisfies the back-door criterion in \(\mathcal{G}\): show that \(X\) blocks the unique back-door path \(T \leftarrow X \to Y\) and that \(X\) contains no descendant of \(T\). Apply the theorem above to conclude \(Y(t) \indep T \mid X\), and confirm that the same conclusion can be read directly off \(\mathcal{G}(t)\) via d-separation.
- Now add a hidden common cause \(U \to T\), \(U \to Y\). Draw the revised SWIG. Does the ignorability argument still hold? State the correct conclusion and identify what additional structure (if any) would be needed for identification.
2. Sequential exchangeability in a longitudinal SWIG. Suppose the treatment is time-varying: \(T_0\) is assigned at baseline and \(T_1\) at a second time point, and a time-varying covariate \(L\) is measured between the two assignments. The DAG has edges \(X \to T_0\), \(X \to L\), \(X \to Y\), \(T_0 \to L\), \(T_0 \to T_1\), \(T_0 \to Y\), \(L \to T_1\), \(L \to Y\), and \(T_1 \to Y\).
- Construct the SWIG \(\mathcal{G}(t_0, t_1)\) by splitting both \(T_0\) and \(T_1\). Label the random halves, the fixed halves \(t_0\) and \(t_1\), and the potential outcomes. In particular, the random half of \(T_1\) should be labeled \(T_1(t_0)\) and the time-varying covariate should be labeled \(L(t_0)\).
- List the back-door paths in \(\mathcal{G}(t_0, t_1)\) from the random half \(T_0\) to \(Y(t_0, t_1)\), and from \(T_1(t_0)\) to \(Y(t_0, t_1)\). For each, identify the smallest subset of \(\{X,\, T_0,\, L(t_0)\}\) whose conditioning achieves d-separation.
- State the two sequential exchangeability conditions that the SWIG makes graphically transparent: \(Y(t_0, t_1) \indep T_0 \mid X\) and \(Y(t_0, t_1) \indep T_1(t_0) \mid X,\, T_0 = t_0,\, L(t_0)\). Explain in plain language what each says about treatment assignment at the corresponding time point.
- In observed-data shorthand, the second condition is often written \(Y(\bar t) \indep T_1 \mid X,\, T_0 = t_0,\, L\), where \(\bar t = (t_0, t_1)\). State the implicit consistency step that licenses replacing \(L(t_0)\) with the observed \(L\) on the event \(\{T_0 = t_0\}\).
- What goes wrong if the analyst omits \(L\) from the conditioning set at the second stage? What goes wrong if the analyst conditions on \(L\) but treats it as a baseline covariate? Connect your answers to the role of \(L(t_0)\) as a treatment-induced confounder.
3. Why the no-descendants clause is a hypothesis. Consider the DAG with edges \(T \to Y\) and \(T \to D\), no confounders, and let \(X = \{D\}\).
- Construct the SWIG \(\mathcal{G}(t)\). Identify all paths between the random half \(T\) and \(Y(t)\), and determine whether any back-door paths exist.
- Argue that, in this graph, \(Y(t) \indep T \mid D(t)\) holds in \(P_t\), where \(D(t)\) is the SWIG label of the descendant \(D\), even though \(X = \{D\}\) violates the back-door criterion.
- Explain in your own words why this example shows that the no-descendants clause must be imposed as a hypothesis on the adjustment set rather than derived from a conditional independence statement.