Appendix B — Single World Intervention Graphs

This appendix develops single world intervention graphs (SWIGs), a graphical device introduced by Richardson and Robins (2014) and given an accessible practical treatment by Bezuidenhout et al. (2025). A SWIG provides a formal bridge between the potential outcomes framework and the do-calculus by encoding both in a single diagram. A reader familiar with Chapters 2–4 will find that SWIGs require no new conceptual ingredients: they are ordinary DAGs in which the treatment node is split into two pieces, making potential outcomes and identifiability assumptions graphically explicit rather than leaving them implicit.

Chapter 4 used the back-door criterion and the adjustment formula to identify causal effects. SWIGs make that identification argument graphically explicit: potential outcomes appear as labeled nodes inside the graph, so ignorability becomes a routine d-separation statement rather than a free-standing probabilistic assumption. Section B.5 restates the back-door identification result of Chapter 4 as a graph-theoretic theorem inside the SWIG, and examines a partial converse: under faithfulness of the SWIG and a restriction of the adjustment set to nondescendants of the treatment, the single-world ignorability statement $Y(t) \indep T \mid X$ entails the back-door criterion.

B.1 The SWIG Construction

In the original DAG $\mathcal{G}$, the node $T$ represents the treatment as it naturally occurs. Under $\doop(T{=}t)$, however, the treatment is externally fixed. The SWIG separates these two roles by splitting $T$ into two pieces displayed side by side in the graph:

a random half (left side, capital letter $T$): represents the treatment value that would naturally occur, and retains all incoming edges from $T$’s causal parents;
a fixed half (right side, lowercase letter $t$): represents the externally imposed intervention value, carries all outgoing edges that formerly left $T$, and has no edge to or from the random half.

Definition: Single World Intervention Graph (Richardson and Robins 2014)

The SWIG $\mathcal{G}(t)$ is constructed from $\mathcal{G}$ by:

Replacing $T$ with two nodes: random half $T$ (all incoming edges retained) and fixed half $t$ (all outgoing edges retained).
Severing the direct connection between the two halves.
Redrawing causal arrows at the split node: all arrows into the split node land on the random half; all arrows departing the split node leave from the fixed half.
Relabeling every descendant $V$ of $T$ in $\mathcal{G}$ as $V(t)$, indicating it is a potential outcome under $\doop(T{=}t)$.
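The four construction steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `build_swig`, the edge-list representation, and the edges assumed for the confounded-treatment DAG ($L \to A$, $L \to Y$, $A \to Y$) are choices made here for the example.

```python
def build_swig(edges, treatment, value_label):
    """Split `treatment` into a random half and a fixed half.

    edges: iterable of (parent, child) pairs describing the DAG.
    The random half keeps the treatment's name; the fixed half is named
    `value_label`. Returns the SWIG edge list and the relabeling map.
    """
    edges = list(edges)
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Collect the descendants of the treatment; these become V(t).
    descendants, stack = set(), [treatment]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in descendants:
                descendants.add(child)
                stack.append(child)
    relabel = {v: f"{v}({value_label})" for v in descendants}

    swig_edges = []
    for parent, child in edges:
        # Outgoing edges of the treatment now leave the fixed half;
        # incoming edges still land on the random half (name unchanged).
        if parent == treatment:
            new_parent = value_label
        else:
            new_parent = relabel.get(parent, parent)
        swig_edges.append((new_parent, relabel.get(child, child)))
    return swig_edges, relabel


# Assumed edges of the confounded-treatment DAG.
dag = [("L", "A"), ("L", "Y"), ("A", "Y")]
swig_edges, relabel = build_swig(dag, treatment="A", value_label="a")
print(swig_edges)  # [('L', 'A'), ('L', 'Y(a)'), ('a', 'Y(a)')]
```

No edge joins `'A'` and `'a'` in the output, which is the severed connection of the second step.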

Converting the confounded-treatment DAG ($L$ is a confounder) into the SWIG for $\doop(A{=}a)$. The treatment node splits: the random half $A$ (circle) retains the incoming arrow from $L$; the fixed half $a$ (square) carries the outgoing arrow to $Y(a)$. D-separation in the SWIG gives $A \indep Y(a) \mid L$ (conditional ignorability).

B.2 The Fixed Half as a Source Node

After the SWIG $\mathcal{G}(t)$ is constructed, d-separation among its nodes is evaluated using the standard rules of Chapter 2. The key structural fact is:

The Fixed Half Is a Source Node

In $\mathcal{G}(t)$, no directed edge points into the fixed half $t$. Two structural consequences follow.

The route $T \to t \to Y(t)$ does not exist. Because no edge connects the random half $T$ to the fixed half $t$, the directed concatenation $T \to t \to Y(t)$ is not a path in the graph — there is nothing for d-separation to block.
Every path from $T$ to a potential outcome $V(t)$ leaves $T$ through one of $T$’s natural causes. The only edges incident to the random half $T$ in $\mathcal{G}(t)$ are the incoming edges from $T$’s parents in $\mathcal{G}$. Any path from $T$ to $V(t)$ must therefore start with an arrow pointing into $T$ — that is, every such path is a back-door path.

“Source Node” Means No Parents, Not “On No Path”

Calling $t$ a source node refers only to the absence of directed edges into $t$. A path that ignores edge direction can still pass through a source node, as a fork $\leftarrow t \to$. What matters for the back-door argument is the complementary fact about the random half: at $T$, all edges point inward, so any path leaving $T$ must do so through one of $T$’s causes.

Reading the back-door criterion off the SWIG. By the source-node analysis above, every path from the random half $T$ to a potential outcome $V(t)$ in $\mathcal{G}(t)$ is a back-door path in $\mathcal{G}$, so blocking all back-door paths is exactly what d-separation of $T$ and $V(t)$ in the SWIG asks for.
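The claim can be checked mechanically on a small graph by listing every path out of the random half. The sketch below is illustrative only: the helper names `undirected_paths` and `show` and the assumed edge lists for the confounded-treatment DAG and its SWIG are not part of the original development.

```python
def undirected_paths(edges, start, goal):
    """Return every simple path from start to goal, ignoring edge direction.

    A path is a list of steps (u, arrow, v): arrow is '->' when the graph
    contains u -> v and '<-' when it contains v -> u.
    """
    steps = {}
    for parent, child in edges:
        steps.setdefault(parent, []).append(("->", child))
        steps.setdefault(child, []).append(("<-", parent))

    found = []

    def extend(node, visited, path):
        if node == goal:
            found.append(list(path))
            return
        for arrow, nxt in steps.get(node, []):
            if nxt not in visited:
                path.append((node, arrow, nxt))
                extend(nxt, visited | {nxt}, path)
                path.pop()

    extend(start, {start}, [])
    return found


def show(path):
    """Render a path such as 'A <- L -> Y(a)'."""
    return path[0][0] + "".join(f" {arrow} {nxt}" for _, arrow, nxt in path)


# Assumed edges of the confounded-treatment DAG and of its SWIG.
dag = [("L", "A"), ("L", "Y"), ("A", "Y")]
swig = [("L", "A"), ("L", "Y(a)"), ("a", "Y(a)")]

dag_paths = [show(p) for p in undirected_paths(dag, "A", "Y")]
swig_paths = [show(p) for p in undirected_paths(swig, "A", "Y(a)")]
every_path_is_back_door = all(p.startswith("A <-") for p in swig_paths)

print(dag_paths)                # ['A <- L -> Y', 'A -> Y']
print(swig_paths)               # ['A <- L -> Y(a)']
print(every_path_is_back_door)  # True
```

In the original DAG the causal path `A -> Y` sits alongside the back-door path. In the SWIG only the back-door path remains, because the outgoing edge now leaves the fixed half.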

Recovering ignorability from the SWIG. In the SWIG of the confounded-treatment example, the only path from the random half $A$ to $Y(a)$ is $A \leftarrow L \to Y(a)$, a fork at $L$ that is open marginally and blocked once $L$ is conditioned on. Conditioning on $L$ therefore d-separates $A$ from $Y(a)$ in $\mathcal{G}(a)$, and the Markov property gives $A \indep Y(a) \mid L$. This is exactly the conditional ignorability of Chapter 4, now derived as a graph-structural consequence rather than asserted as an assumption.
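The d-separation check itself can be automated. The sketch below uses the ancestral-moral-graph test, one standard way to decide d-separation in a DAG, and treats the fixed half as an ordinary parentless node; both are choices made for this illustration. The function name `d_separated` and the SWIG edge list are assumptions of the example.

```python
def d_separated(edges, xs, ys, zs):
    """Return True if xs and ys are d-separated by zs in the DAG `edges`.

    Ancestral-moral-graph test: keep only ancestors of xs, ys and zs,
    marry co-parents, drop directions, delete zs, then test connectivity.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())

    # 1. Ancestral closure of the nodes involved.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: link each child to its parents and parents to each other.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            nbrs[p].add(child)
            nbrs[child].add(p)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # 3. Delete the conditioning set and search outward from xs.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in ys:
            return False
        stack.extend(n for n in nbrs[node] if n not in zs and n not in seen)
    return True


# Assumed SWIG of the confounded-treatment example.
swig = [("L", "A"), ("L", "Y(a)"), ("a", "Y(a)")]
given_l = d_separated(swig, {"A"}, {"Y(a)"}, {"L"})
marginal = d_separated(swig, {"A"}, {"Y(a)"}, set())
print(given_l)   # True
print(marginal)  # False
```

The first result is the graphical content of $A \indep Y(a) \mid L$; the second shows that without conditioning on $L$ the fork at $L$ leaves the two nodes d-connected.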

B.3 When Ignorability Fails: Hidden Confounding

The SWIG is equally useful for diagnosing failures of ignorability. Suppose the DAG is augmented by a hidden common cause $U$ that affects both $A$ and $Y$ but is not recorded in the data.

Adding a hidden confounder $U$ (dashed circle) to the confounded-treatment DAG and its SWIG. In $\mathcal{G}(a)$, the hidden path $A \leftarrow U \to Y(a)$ cannot be blocked by conditioning on observed covariates. Ignorability fails.

In the SWIG $\mathcal{G}(a)$ with hidden $U$, the paths from $A$ to $Y(a)$ are: (1) $A \leftarrow L \to Y(a)$, blocked by conditioning on $L$; and (2) $A \leftarrow U \to Y(a)$, open regardless of $L$, since $U$ is unobserved and cannot be conditioned on. After conditioning on $L$, $A$ and $Y(a)$ therefore remain d-connected, and barring degenerate cases of the kind excluded by faithfulness (Section B.5), $A \not\perp\!\!\!\perp Y(a) \mid L$. Ignorability fails, and no adjustment for observed covariates alone can identify the ATE. The failure is immediately visible from the SWIG: an unblocked dashed arrow entering the random half $A$ signals that selection into treatment carries information about the potential outcome that no set of observed covariates can remove.
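A small exact calculation makes the failure concrete. The binary structural model below is entirely hypothetical: the probabilities, the function name `mean_y_a`, and the independence of $U$ and $L$ are assumptions chosen so the arithmetic can be done with exact fractions. It compares $E[Y(1) \mid A = 1, L = 0]$ with $E[Y(1) \mid A = 0, L = 0]$, once with the $U \to A$ edge present and once with it removed.

```python
from fractions import Fraction as F


def mean_y_a(a, a_observed, l, u_to_a):
    """Exact E[Y(a) | A = a_observed, L = l] in a hypothetical binary SCM.

    U ~ Bernoulli(1/2) is hidden and independent of L. Given (L, U) the
    noise terms of A and Y(a) are independent, so any dependence between
    A and Y(a) within a level of L runs through U.
    u_to_a is the strength of the U -> A edge (0 removes the edge).
    """
    def p_a1(u):  # P(A = 1 | L = l, U = u)
        return F(2, 10) + F(3, 10) * l + u_to_a * u

    def p_y1(u):  # P(Y(a) = 1 | L = l, U = u)
        return F(1, 10) + F(2, 10) * a + F(2, 10) * l + F(4, 10) * u

    # Unnormalized posterior of U given the observed treatment and L = l.
    weight = {}
    for u in (0, 1):
        likelihood = p_a1(u) if a_observed == 1 else 1 - p_a1(u)
        weight[u] = F(1, 2) * likelihood
    total = weight[0] + weight[1]
    return (p_y1(0) * weight[0] + p_y1(1) * weight[1]) / total


confounded = [mean_y_a(1, obs, 0, F(4, 10)) for obs in (1, 0)]
unconfounded = [mean_y_a(1, obs, 0, 0) for obs in (1, 0)]
print(confounded)    # [Fraction(3, 5), Fraction(13, 30)]
print(unconfounded)  # [Fraction(1, 2), Fraction(1, 2)]
```

With the hidden edge present, the mean of $Y(1)$ differs between treated and untreated units at the same level of $L$, so $A$ and $Y(1)$ are dependent given $L$ in this model. With the edge removed, the two conditional means coincide.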

Remark

This failure motivates the instrumental variable designs of Chapter 7. When a valid instrument $Z$ is available — one that affects $A$ but has no direct effect on $Y$ and is independent of $U$ — the IV strategy sidesteps the unblocked $U$-path rather than attempting to block it by covariate adjustment.

B.4 What SWIGs Achieve

1. Potential outcomes as random variables in a DAG. In $\mathcal{G}(t)$, the variable $Y(t)$ is an ordinary node with well-defined parents: the fixed half $t$ and any other causes of $Y$ retained from $\mathcal{G}$. Its marginal distribution equals the interventional distribution $f(y \mid \doop(T{=}t))$.

2. The mutilated graph as a special case. Removing the random half $T$ from $\mathcal{G}(t)$, and suppressing the fixed-half labeling on its descendants, recovers the usual mutilated graph $\mathcal{G}_{\overline{T}}$ of Chapter 1 representing $\doop(T{=}t)$. The SWIG is more informative: it retains the random half alongside the fixed half, making it possible to ask questions about the relationship between the natural treatment $T$ and the potential outcomes $V(t)$ via d-separation.

3. Unconfoundedness as a d-separation statement. The ignorability condition $Y(t) \indep T \mid X$ corresponds, under the Markov property of the SWIG, to a d-separation statement in $\mathcal{G}(t)$ verifiable by path-tracing. Ignorability is no longer an assertion encoded in potential-outcome notation alone; it is a structural claim about the graph.

4. A single graph for both frameworks. Before SWIGs, combining the potential outcomes framework and the do-calculus required switching notation. The SWIG eliminates this translation step: one graph simultaneously hosts the natural treatment $T$ (random half), the intervention value $t$ (fixed half), the potential outcomes $V(t)$, and all d-separation relationships among them.

Cross-World Independence and Its Limits

Consider $Y(1) \indep Y(0)$: $Y(1)$ lives in SWIG $\mathcal{G}(1)$ and $Y(0)$ lives in SWIG $\mathcal{G}(0)$. No unit is ever observed in both worlds simultaneously, so no data can directly test this independence. A more substantive cross-world example arises in mediation analysis (Chapter 8): identification of natural direct and indirect effects relies on assumptions of the form $Y(t, m) \indep M(t') \mid X$ ($t \neq t'$), which combine potential outcomes from different SWIGs. SWIGs make such assumptions explicit: the analyst must specify a joint distribution across multiple SWIGs. The single-world results of Section B.5 therefore cannot themselves supply cross-world assumptions.

Remark

More than one node may be split in a single SWIG. In longitudinal settings with time-varying treatments $(A_0, A_1)$, splitting both nodes yields the sequential-randomization SWIG, from which the two sequential exchangeability conditions needed to identify the g-formula follow directly by d-separation.

B.5 Single-World Ignorability and the Back-Door Criterion

This section restates the back-door identification result of Chapter 4 graphically as a theorem inside the SWIG, and then examines a partial converse. The target conditional independence is the single-world (or weak) ignorability statement $Y(t) \indep T \mid X$ ($t = 0, 1$). The cross-world joint statement $(Y(0), Y(1)) \indep T \mid X$ involves $Y(0)$ and $Y(1)$, which live in two distinct SWIGs; a single SWIG cannot encode it as a d-separation.

Definition: Faithfulness of the SWIG

Let $\mathcal{G}(t)$ be the SWIG induced by an intervention $\doop(T{=}t)$, and let $P_t$ denote the joint distribution it induces over the random nodes. The distribution $P_t$ is faithful to $\mathcal{G}(t)$ if every conditional independence that holds in $P_t$ corresponds to a d-separation in $\mathcal{G}(t)$: \[A \indep B \mid C \text{ in } P_t \;\Longrightarrow\; A \text{ and } B \text{ are d-separated by } C \text{ in } \mathcal{G}(t).\] Faithfulness rules out accidental cancellations of causal paths in the structural model. In smooth parametric families with a fixed graph, the parameter configurations that produce unfaithfulness typically form a lower-dimensional subset, so faithfulness is, in that sense, generic. The qualifier matters: deterministic relationships, context-specific independencies, and graph misspecification can all produce substantive failures of faithfulness.
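A worked example of an accidental cancellation may help. Consider a hypothetical linear-Gaussian model, introduced here purely for illustration, whose SWIG $\mathcal{G}(t)$ has edges $X \to T$, $X \to W$, $X \to Y(t)$, $W \to Y(t)$, and $t \to Y(t)$, with mutually independent Gaussian errors and Gaussian $X$: \[T = \alpha X + \varepsilon_T, \qquad W = a X + \varepsilon_W, \qquad Y(t) = \tau t + b X + c W + \varepsilon_Y.\] Substituting $W$ into $Y(t)$ gives \[\operatorname{Cov}\bigl(T, Y(t)\bigr) = \operatorname{Cov}\bigl(\alpha X + \varepsilon_T,\; (b + ac) X + c\,\varepsilon_W + \varepsilon_Y\bigr) = \alpha\,(b + ac)\,\operatorname{Var}(X).\] When $b = -ac$ the covariance vanishes, and joint Gaussianity then gives $Y(t) \indep T$ marginally, even though the paths $T \leftarrow X \to Y(t)$ and $T \leftarrow X \to W \to Y(t)$ are both open given the empty set. Such a $P_t$ is Markov but not faithful to $\mathcal{G}(t)$. The offending parameter values lie on the surface $b = -ac$, a lower-dimensional subset of the parameter space.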

Faithfulness is invoked only for the converse direction below.

Theorem: Back-Door Criterion Implies Single-World Ignorability

Fix an intervention value $t$. Suppose the structural causal model underlying $\mathcal{G}$ induces a distribution $P_t$ over the random nodes of the SWIG $\mathcal{G}(t)$ that is Markov with respect to $\mathcal{G}(t)$. If $X$ contains no descendants of $T$ in $\mathcal{G}$ and blocks every back-door path from $T$ to $Y$, then $Y(t) \indep T \mid X$.

Proof. By the source-node analysis of Section B.2, every path from the random half $T$ to the potential outcome $Y(t)$ in $\mathcal{G}(t)$ leaves $T$ through one of $T$’s parents in $\mathcal{G}$ and is therefore a back-door path. By hypothesis, $X$ blocks every such path; the no-descendants clause ensures that conditioning on members of $X$ does not open a previously blocked collider path. Hence $T$ and $Y(t)$ are d-separated by $X$ in $\mathcal{G}(t)$, and the Markov property of $P_t$ delivers $Y(t) \indep T \mid X$. $\square$

For binary treatment, applying the theorem separately at $t = 0$ and $t = 1$ yields the weak ignorability conditions $Y(0) \indep T \mid X$ and $Y(1) \indep T \mid X$ that justify the adjustment formula of Chapter 4.
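For completeness, the step from weak ignorability to the adjustment formula can be written out. The derivation below is a sketch under three assumptions stated here rather than taken from the theorem: $X$ is discrete, consistency holds ($Y = Y(t)$ on the event $T = t$), and positivity holds ($P(T = t \mid X = x) > 0$ whenever $P(X = x) > 0$). \[\begin{aligned} E[Y(t)] &= \sum_x E[Y(t) \mid X = x]\, P(X = x) && \text{(law of total expectation)} \\ &= \sum_x E[Y(t) \mid T = t, X = x]\, P(X = x) && \text{(since } Y(t) \indep T \mid X \text{)} \\ &= \sum_x E[Y \mid T = t, X = x]\, P(X = x) && \text{(consistency).} \end{aligned}\] Positivity is what makes the conditional expectation in the second line well defined. The last line involves only observed quantities, and taking the difference between $t = 1$ and $t = 0$ gives the ATE.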

Remark: A Partial Converse

If $X$ is restricted a priori to pre-treatment (nondescendant) variables in $\mathcal{G}$ and $P_t$ is faithful to $\mathcal{G}(t)$, the implication above can be partially reversed: \[Y(t) \indep T \mid X \text{ in } P_t \;\Longrightarrow\; X \text{ blocks every back-door path from } T \text{ to } Y.\] By faithfulness, the assumed independence corresponds to d-separation of the random half $T$ and $Y(t)$ by $X$ in $\mathcal{G}(t)$; since every path from $T$ to $Y(t)$ is a back-door path, $X$ must block all of them. Without the nondescendant restriction, however, conditional exchangeability alone does not imply the full back-door criterion.

Why “No Descendants of $T$” Is a Hypothesis, Not a Conclusion

The partial converse imposes “no descendants of $T$” as a hypothesis. This is essential: the conditional independence $Y(t) \indep T \mid X$ does not by itself imply that $X$ contains no descendant of $T$. For instance, take the DAG with edges $T \to Y$ and $T \to D$ and no confounding, and let $X = \{D\}$. $D$ is a descendant of $T$, so $X$ violates the back-door criterion. Nonetheless, $Y(t) \indep T \mid D$, read in $\mathcal{G}(t)$ with $D$ relabeled as $D(t)$, holds: there is no back-door path from $T$ to $Y$ at all, so the random half $T$ has no incident edges in $\mathcal{G}(t)$ and is d-separated from $Y(t)$ given any conditioning set. Conditioning on a descendant of $T$ may open a previously blocked collider path, but it need not. The no-descendants clause belongs to the definition of an admissible graphical adjustment set and cannot be inferred from observed conditional independencies alone.

From single-world to cross-world. The strong ignorability condition $(Y(0), Y(1)) \indep T \mid X$ is a statement about the joint distribution of $Y(0)$ and $Y(1)$ together with $T$. Because $Y(0)$ and $Y(1)$ are nodes in two distinct SWIGs, no single SWIG represents this joint distribution as a d-separation, and the theorem above correspondingly speaks only about the single-world form. For ATE identification this is no obstacle: the single-world form $Y(t) \indep T \mid X$ for each $t \in \{0,1\}$ is enough. Identification of joint or distributional functionals of $(Y(0), Y(1))$ — for instance, the variance of the unit-level treatment effect, or the proportion of units harmed by treatment — requires the cross-world joint form and assumptions across multiple SWIGs that single-world graphical reasoning cannot supply.

Conceptual takeaway. The back-door criterion is a graphical sufficient condition for the single-world conditional exchangeability assumption needed for ATE identification. Under the additional restriction that the candidate adjustment set is composed of nondescendants of $T$, and under faithfulness of the relevant counterfactual distribution to the SWIG, the absence of unblocked back-door paths corresponds to the single-world independence $Y(t) \indep T \mid X$. In this sense the graphical and counterfactual languages describe the same identifying structure. Strict logical equivalence in full generality is more delicate: ignorability can hold in particular models for parametric reasons that no graphical criterion will detect.

Problems

1. SWIG construction. Consider the DAG: $X \to T$, $X \to Y$, $T \to Y$, with $X$ fully observed.

Construct the SWIG $\mathcal{G}(t)$ by splitting $T$ into its random and fixed halves. Draw the result, labeling the random half, fixed half, and the potential outcome $Y(t)$.
In $\mathcal{G}(t)$, identify all paths between the random half $T$ and $Y(t)$. Apply d-separation to determine which are blocked and which are open before conditioning.
Verify that $X$ satisfies the back-door criterion in $\mathcal{G}$: show that $X$ blocks the unique back-door path $T \leftarrow X \to Y$ and that $X$ contains no descendant of $T$. Apply the theorem above to conclude $Y(t) \indep T \mid X$, and confirm that the same conclusion can be read directly off $\mathcal{G}(t)$ via d-separation.
Now add a hidden common cause $U \to T$, $U \to Y$. Draw the revised SWIG. Does the ignorability argument still hold? State the correct conclusion and identify what additional structure (if any) would be needed for identification.

2. Sequential exchangeability in a longitudinal SWIG. Suppose the treatment is time-varying: $T_0$ is assigned at baseline and $T_1$ at a second time point, and a time-varying covariate $L$ is measured between the two assignments. The DAG has edges $X \to T_0$, $X \to L$, $X \to Y$, $T_0 \to L$, $T_0 \to T_1$, $T_0 \to Y$, $L \to T_1$, $L \to Y$, and $T_1 \to Y$.

Construct the SWIG $\mathcal{G}(t_0, t_1)$ by splitting both $T_0$ and $T_1$. Label the random halves, the fixed halves $t_0$ and $t_1$, and the potential outcomes. In particular, the random half of $T_1$ should be labeled $T_1(t_0)$ and the time-varying covariate should be labeled $L(t_0)$.
List the back-door paths in $\mathcal{G}(t_0, t_1)$ from the random half $T_0$ to $Y(t_0, t_1)$, and from $T_1(t_0)$ to $Y(t_0, t_1)$. For each, identify the smallest subset of $\{X,\, T_0,\, L(t_0)\}$ whose conditioning achieves d-separation.
State the two sequential exchangeability conditions that the SWIG makes graphically transparent: $Y(t_0, t_1) \indep T_0 \mid X$ and $Y(t_0, t_1) \indep T_1(t_0) \mid X,\, T_0 = t_0,\, L(t_0)$. Explain in plain language what each says about treatment assignment at the corresponding time point.
In observed-data shorthand, with $\bar t = (t_0, t_1)$, the second condition is often written $Y(\bar t) \indep T_1 \mid X,\, T_0 = t_0,\, L$. State the implicit consistency step that licenses replacing $L(t_0)$ with the observed $L$, and $T_1(t_0)$ with the observed $T_1$, on the event $\{T_0 = t_0\}$.
What goes wrong if the analyst omits $L$ from the conditioning set at the second stage? What goes wrong if the analyst conditions on $L$ but treats it as a baseline covariate? Connect your answers to the role of $L(t_0)$ as a treatment-induced confounder.

3. Why the no-descendants clause is a hypothesis. Consider the DAG with edges $T \to Y$ and $T \to D$, no confounders, and let $X = \{D\}$.

Construct the SWIG $\mathcal{G}(t)$. Identify all paths between the random half $T$ and $Y(t)$, and determine whether any back-door paths exist.
Argue that, in this graph, $Y(t) \indep T \mid D(t)$ holds in $P_t$ (the descendant $D$ appears in $\mathcal{G}(t)$ as $D(t)$), even though $X = \{D\}$ violates the back-door criterion.
Explain in your own words why this example shows that the no-descendants clause must be imposed as a hypothesis on the adjustment set rather than derived from a conditional independence statement.

Bezuidenhout, Dana, Sarah Forthal, Kara Rudolph, and Matthew R. Lamb. 2025. “Single World Intervention Graphs (SWIGs): A Practical Guide.” American Journal of Epidemiology 194: 2047–52. https://doi.org/10.1093/aje/kwae353.

Richardson, Thomas S., and James M. Robins. 2014. ACE Bounds; Single World Intervention Graphs (SWIGs) and Identification of Causal Effects. University of Washington.