Appendix A — Graphical Intuition for Conditional Independence and \(d\)-Separation
This appendix provides a gentle introduction to the probabilistic and graphical ideas that underlie Chapters 2 and 3. Our goal is not to give a complete treatment of graphical models, but rather to develop enough intuition so that the formal machinery of \(d\)-separation, back-door adjustment, and intervention graphs does not appear abruptly.
A directed acyclic graph (DAG) is more than a picture. It is a compact language for expressing assumptions about how variables are related. Once those assumptions are represented graphically, the graph tells us which variables may be associated, which paths transmit dependence, and which variables should or should not be conditioned on. These ideas become central in Chapter 2, where we study \(d\)-separation, and in Chapter 3, where we use graph surgery to derive identification results.
The key pedagogical idea of this appendix is simple: before learning the full \(d\)-separation criterion, it is helpful to understand three elementary three-node patterns. Those local patterns explain most of what happens later in larger graphs.
A.1 Conditional Independence: The Probabilistic Language Behind Graphs
Before introducing graphs, we first recall the probabilistic notion that graphs are designed to encode.
We say that \(X\) and \(Y\) are conditionally independent given \(Z\), written \(X \indep Y \mid Z\), if \[P(X \in A,\, Y \in B \mid Z) = P(X \in A \mid Z)\, P(Y \in B \mid Z) \quad \text{almost surely}\] for all measurable sets \(A\) and \(B\). Equivalently, once \(Z\) is known, learning \(X\) gives no additional information about \(Y\), and learning \(Y\) gives no additional information about \(X\).
In terms of densities (with respect to a dominating measure on \((X, Y, Z)\)), conditional independence admits the following equivalent characterizations, each interpreted almost everywhere: \[X \indep Y \mid Z \;\iff\; f(x,y,z)\,f(z) = f(x,z)\,f(y,z) \;\iff\; \exists\, a, b \colon f(x,y,z) = a(x,z)\,b(y,z).\] The last form is especially useful: it says that the joint density factors into one piece depending on \((x,z)\) and another depending on \((y,z)\), with no cross-term in \(x\) and \(y\).
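As a quick numerical sanity check (not part of the original text; the discrete setup and the use of NumPy are illustrative choices), the following sketch builds a small joint distribution over binary \(X, Y, Z\) that satisfies \(X \indep Y \mid Z\) by construction and verifies the middle characterization, \(f(x,y,z)\,f(z) = f(x,z)\,f(y,z)\), cell by cell.

```python
import numpy as np

rng = np.random.default_rng(5)

# Random ingredients of a joint with X independent of Y given Z.
pz = rng.dirichlet(np.ones(2))            # f(z)
px_z = rng.dirichlet(np.ones(2), size=2)  # f(x | z), row z
py_z = rng.dirichlet(np.ones(2), size=2)  # f(y | z), row z

# f(x, y, z) = f(z) f(x | z) f(y | z), stored as joint[x, y, z].
joint = np.einsum("z,zx,zy->xyz", pz, px_z, py_z)

fz = joint.sum(axis=(0, 1))   # f(z)
fxz = joint.sum(axis=1)       # f(x, z)
fyz = joint.sum(axis=0)       # f(y, z)

# Check f(x,y,z) f(z) == f(x,z) f(y,z) over all cells.
lhs = joint * fz[None, None, :]
rhs = fxz[:, None, :] * fyz[None, :, :]
print(np.allclose(lhs, rhs))  # True
```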
Conditional independence obeys several general closure properties: (C1) symmetry, \(X \indep Y \mid Z \implies Y \indep X \mid Z\); (C2) decomposition, \(X \indep (Y, W) \mid Z \implies X \indep Y \mid Z\); (C3) weak union, \(X \indep (Y, W) \mid Z \implies X \indep Y \mid (Z, W)\); (C4) contraction, \(X \indep Y \mid Z\) and \(X \indep W \mid (Y, Z)\) together imply \(X \indep (Y, W) \mid Z\); and (C5) intersection, \(X \indep Y \mid (W, Z)\) and \(X \indep W \mid (Y, Z)\) together imply \(X \indep (Y, W) \mid Z\). Properties (C1)–(C4) hold for any probability distribution and are known as the semigraphoid axioms. Property (C5) additionally requires the joint density to be strictly positive; together with (C1)–(C4) it forms the graphoid axioms. These properties are used implicitly throughout the course whenever conditional independence statements are combined or simplified.
Conditional independence is not the same as marginal independence. Two variables may be dependent marginally but independent after conditioning on a third variable. Conversely, two variables may be independent marginally but become dependent after conditioning. Both phenomena occur repeatedly in causal inference.
A classic illustration of the first phenomenon: across days, ice cream sales and drowning incidents are positively associated, yet the association vanishes once we condition on temperature, since warm weather drives both. This illustrates a recurring theme in causal inference: an observed association may be induced by a third variable, and conditioning on that variable can remove the spurious dependence.
A.2 Three Basic Motifs: Chain, Fork, and Collider
Every path in a DAG is built from local three-node configurations. There are three fundamental types: a chain, a fork, and a collider. Their behavior under conditioning is the foundation of \(d\)-separation.
A.2.1 Chain
Consider the pattern \(X_1 \to X_2 \to X_3\), where the middle node \(X_2\) lies on a directed pathway from \(X_1\) to \(X_3\).
In a chain, \(X_1\) and \(X_3\) are in general marginally dependent, but \(X_1 \indep X_3 \mid X_2\): once the middle variable is fixed, the chain no longer transmits additional information from \(X_1\) to \(X_3\).
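A minimal simulation sketch can make this concrete; the linear Gaussian model and all coefficients below are illustrative choices rather than anything prescribed above. For jointly Gaussian variables, the correlation of the residuals after linearly regressing out \(X_2\) is exactly the partial correlation, so a value near zero reflects \(X_1 \indep X_3 \mid X_2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Linear Gaussian chain X1 -> X2 -> X3; coefficients are illustrative.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)

def residualize(y, x):
    """Residuals of y after a simple linear regression on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

print(np.corrcoef(x1, x3)[0, 1])  # clearly nonzero: the chain is open
print(np.corrcoef(residualize(x1, x2), residualize(x3, x2))[0, 1])  # ~ 0
```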
A.2.2 Fork
Consider the pattern \(X_1 \leftarrow X_2 \to X_3\), where the middle node \(X_2\) is a common cause of \(X_1\) and \(X_3\).
In terms of independence, a fork behaves exactly like a chain: \(X_1\) and \(X_3\) are in general marginally dependent, yet \(X_1 \indep X_3 \mid X_2\). A fork is the simplest graphical form of confounding: the middle node creates association, and conditioning on it blocks the path.
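The same phenomenon can be simulated for a fork; in this sketch the common cause is binary (an illustrative choice), so conditioning amounts to stratifying on the two levels of \(X_2\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Fork: X2 is a binary common cause of X1 and X3 (setup is illustrative).
x2 = rng.integers(0, 2, size=n)
x1 = x2 + 0.5 * rng.normal(size=n)
x3 = x2 + 0.5 * rng.normal(size=n)

print(np.corrcoef(x1, x3)[0, 1])              # marginally correlated (~ 0.5)
for v in (0, 1):                              # but independent within
    m = x2 == v                               # each level of X2
    print(v, np.corrcoef(x1[m], x3[m])[0, 1]) # ~ 0
```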
A.2.3 Collider
Consider the pattern \(X_1 \to X_2 \leftarrow X_3\), where the middle node \(X_2\) is a common effect of \(X_1\) and \(X_3\).
Unlike chains and forks, a collider blocks the path by default: \(X_1 \indep X_3\) marginally. Conditioning on the collider opens the path and may induce association that was not present marginally.
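A short simulation (again with illustrative coefficients) shows the reversal: the marginal correlation is near zero, but selecting on a range of the collider induces a clearly negative association, the classic "explaining away" effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Collider: X1 and X3 are independent causes of X2 (setup is illustrative).
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)
x2 = x1 + x3 + 0.5 * rng.normal(size=n)

print(np.corrcoef(x1, x3)[0, 1])        # ~ 0: the path is blocked by default
m = x2 > 1.0                            # condition on (select on) the collider
print(np.corrcoef(x1[m], x3[m])[0, 1])  # clearly negative: the path is now open
```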
A.2.4 Summary of the Three Motifs
A chain (\(X_1 \to X_2 \to X_3\)) and a fork (\(X_1 \leftarrow X_2 \to X_3\)) are each open by default and blocked by conditioning on the middle node \(X_2\). A collider (\(X_1 \to X_2 \leftarrow X_3\)) is blocked by default and opened by conditioning on the middle node or any of its descendants.
Chapter 2 develops these three motifs into the full \(d\)-separation criterion; see in particular the blocking rules and the extended treatment of collider bias.
A.3 \(d\)-Separation: When Does Conditioning Block a Path?
The three-node motifs explain what happens locally on a path. The next step is to extend this logic to a general DAG, where two variables may be connected by many paths.
A.3.1 A Confounding Example
Consider the DAG with edges \(X \to T\), \(T \to Y\), and \(X \to Y\). Here \(T\) is the treatment, \(Y\) is the outcome, and \(X\) is a pre-treatment covariate that affects both. There are two paths from \(T\) to \(Y\): the directed causal path \(T \to Y\), and the back-door path \(T \leftarrow X \to Y\).
Without conditioning, the back-door path is open, so the observed association between \(T\) and \(Y\) mixes the causal effect with confounding. Conditioning on \(X\) blocks the fork \(T \leftarrow X \to Y\), thereby isolating the causal path. This confounding graph anticipates the back-door criterion of Chapter 3: the graphical condition that makes adjustment valid is precisely that all back-door paths are blocked.
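The following sketch illustrates this numerically under an assumed linear model (the coefficients, including a true effect of \(1.0\), are illustrative). Regressing \(Y\) on \(T\) alone picks up the open back-door path, while including \(X\) in the regression blocks the fork and recovers the causal coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Confounded treatment: X -> T, X -> Y, T -> Y; true effect of T on Y is 1.0.
x = rng.normal(size=n)
t = x + rng.normal(size=n)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)

# Naive regression of Y on T alone: the back-door path T <- X -> Y is open.
naive = np.polyfit(t, y, 1)[0]

# Adjusted regression of Y on (T, X): conditioning on X blocks the fork.
design = np.column_stack([t, x, np.ones(n)])
adjusted = np.linalg.lstsq(design, y, rcond=None)[0][0]

print(naive)     # ~ 2.0: causal effect plus confounding bias
print(adjusted)  # ~ 1.0: the causal coefficient
```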
A.3.2 A Collider Warning
Now consider the DAG with edges \(T \to C\), \(U \to C\), and \(U \to Y\). Here \(C\) is a collider on the path \(T \to C \leftarrow U \to Y\). Without conditioning on \(C\), the path is blocked at the collider. Conditioning on \(C\) opens the path, creating a spurious association between \(T\) and \(Y\) through \(U\).
Even more subtly, conditioning on a descendant of \(C\) can open the path as well: if additionally \(C \to D\), then conditioning on \(D\) may induce association between \(T\) and \(Y\) through the collider at \(C\).
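A simulation sketch (with illustrative coefficients) confirms both warnings: here \(T\) has no effect on \(Y\) and shares no cause with it, yet selecting on the collider \(C\), or on its descendant \(D\), manufactures an association.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# T does not affect Y and shares no cause with it; C is a collider on
# the path T -> C <- U -> Y, and D is a descendant of C.
t = rng.normal(size=n)
u = rng.normal(size=n)
c = t + u + 0.5 * rng.normal(size=n)
y = u + rng.normal(size=n)
d = c + 0.5 * rng.normal(size=n)

print(np.corrcoef(t, y)[0, 1])        # ~ 0: no open path
m = c > 1.0                           # conditioning on the collider C ...
print(np.corrcoef(t[m], y[m])[0, 1])  # ... induces a spurious association
m = d > 1.0                           # and so does its descendant D
print(np.corrcoef(t[m], y[m])[0, 1])
```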
A.3.3 A Practical Checklist
To decide whether \(X\) and \(Y\) are \(d\)-separated by \(S\), proceed as follows. First, list all paths between \(X\) and \(Y\). On each path, classify each interior node as part of a chain, fork, or collider. Then check whether each path is blocked by \(S\): a path is blocked if some chain or fork node on it belongs to \(S\), or if some collider on it has neither itself nor any descendant in \(S\). Finally, conclude that \(X\) and \(Y\) are \(d\)-separated by \(S\) if and only if every path is blocked.
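For larger graphs it is easy to miss a path, so it helps to check a hand computation against software. The sketch below uses NetworkX, which provides a built-in \(d\)-separation test (named nx.is_d_separator in recent releases; older releases call it nx.d_separated with the same signature on node sets); the graph is the collider DAG of Section A.3.2.

```python
import networkx as nx

# The collider DAG of Section A.3.2: T -> C <- U -> Y.
G = nx.DiGraph([("T", "C"), ("U", "C"), ("U", "Y")])

# Is T d-separated from Y given the conditioning set?
print(nx.is_d_separator(G, {"T"}, {"Y"}, set()))  # True: the collider blocks
print(nx.is_d_separator(G, {"T"}, {"Y"}, {"C"}))  # False: conditioning opens it
```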
A.4 DAG Factorization and the Markov Property
Up to this point, we have used DAGs qualitatively, to decide which paths are open or blocked. We now connect the graph to probability algebra.
A distribution with density \(f\) is said to factorize according to a DAG \(G\) with nodes \(X_1, \dots, X_d\) if \[f(x_1, \dots, x_d) = \prod_{j=1}^{d} f\big(x_j \mid x_{\mathrm{pa}(j)}\big),\] where \(\mathrm{pa}(j)\) denotes the parents of \(X_j\) in \(G\). For the confounding DAG of Section A.3.1, for instance, the factorization reads \(f(x, t, y) = f(x)\, f(t \mid x)\, f(y \mid x, t)\). This factorization implies the local Markov property: once its parents are known, a node is conditionally independent of all variables that are neither its descendants nor its parents.
Optional: Moralization as an Alternative Criterion
There is an alternative graph-theoretic way to check \(d\)-separation based on constructing an undirected graph called the moral graph. To check whether \(X \indep Y \mid S\): First, take the induced subgraph on \(\mathrm{An}(X \cup Y \cup S)\), the set consisting of \(X\), \(Y\), \(S\), and all of their ancestors. Second, connect any two parents of a common child by an undirected edge (this step is called "marrying" the parents, hence the name moral graph). Third, drop all arrow directions. Finally, check whether \(S\) separates \(X\) and \(Y\) in the resulting undirected graph, i.e., whether every path between \(X\) and \(Y\) passes through \(S\). This procedure yields a criterion equivalent to \(d\)-separation.
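For readers who like to see the procedure as an algorithm, here is a from-scratch sketch of the moralization criterion built on NetworkX graph primitives; the function name and the sanity checks at the end (on the DAG of Section A.3.2) are our own illustrations, assuming the conditioning set is disjoint from \(X\) and \(Y\).

```python
import networkx as nx
from itertools import combinations

def d_separated_via_moralization(G, X, Y, S):
    """Check whether S d-separates X from Y in the DAG G via the moral graph."""
    # 1. Ancestral subgraph on An(X u Y u S): the sets plus all their ancestors.
    keep = set(X) | set(Y) | set(S)
    for v in list(keep):
        keep |= nx.ancestors(G, v)
    H = G.subgraph(keep)

    # 2 & 3. Moralize: marry any two parents of a common child, drop directions.
    M = nx.Graph(H.edges())
    M.add_nodes_from(H.nodes())
    for child in H.nodes():
        for p, q in combinations(H.predecessors(child), 2):
            M.add_edge(p, q)

    # 4. S separates X and Y iff no path connects them once S is removed.
    M.remove_nodes_from(S)
    return not any(nx.has_path(M, x, y) for x in X for y in Y)

# Sanity checks on the collider DAG T -> C <- U -> Y of Section A.3.2.
G = nx.DiGraph([("T", "C"), ("U", "C"), ("U", "Y")])
print(d_separated_via_moralization(G, {"T"}, {"Y"}, set()))  # True
print(d_separated_via_moralization(G, {"T"}, {"Y"}, {"C"}))  # False
```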
Summary
This appendix introduced the graphical ideas that support Chapters 2 and 3. Conditional independence is the probabilistic language that graphs are designed to encode. The three local motifs — chain, fork, and collider — determine how conditioning affects association along a path. \(d\)-Separation extends these local rules to arbitrary graphs: two variables are \(d\)-separated by \(S\) if every path between them is blocked by \(S\). The most important application in causal inference is to distinguish confounding paths from causal paths and to identify valid adjustment sets. Finally, the Markov property gives a probabilistic interpretation to the graph by linking graphical structure to a factorization of the joint distribution.