2 DAGs and d-Separation
Readers who want a gentler introduction to conditional independence, the three basic path motifs, and the Markov-property interpretation may consult Appendix A before or alongside this chapter.
2.1 Directed Acyclic Graphs
DAGs allow us to translate qualitative causal assumptions into quantitative statistical restrictions. Once we draw a graph encoding our causal assumptions, d-separation tells us which conditional independences are implied by the graph and helps identify candidate adjustment sets for removing confounding.
2.1.1 Basic Definition
The acyclicity condition guarantees a topological ordering so that the joint density admits the recursive factorization \(p(v_1,\dots,v_k) = \prod_{i=1}^{k} p(v_i \mid \Pa(v_i))\).
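A topological ordering is straightforward to compute. Below is a minimal Python sketch (Kahn's algorithm; the node-and-edge-list representation is an assumption of this sketch, not notation used elsewhere in the text):

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: return a topological ordering of the DAG
    with the given directed edges, or raise if a cycle is found."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for a, b in edges:                  # edge a -> b
        children[a].append(b)
        indegree[b] += 1
    frontier = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while frontier:
        v = frontier.popleft()
        order.append(v)
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                frontier.append(c)
    if len(order) != len(nodes):
        raise ValueError("edge set contains a cycle")
    return order

# Example: the IV DAG of Section 2.3 (Z -> T, T -> Y, U -> T, U -> Y).
iv_edges = [("Z", "T"), ("T", "Y"), ("U", "T"), ("U", "Y")]
order = topological_order(["Z", "T", "Y", "U"], iv_edges)
```

Any ordering in which every node follows its parents is acceptable; acyclicity guarantees at least one exists, which is exactly what the recursive factorization requires.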
2.1.2 Structural Relationships
2.2 Paths, Blocking, and d-Separation
A DAG encodes direct causal relationships through its edges, but variables can also be statistically related through longer routes. The concept of a path formalizes these routes, and d-separation answers whether a path transmits or blocks statistical dependence.
2.2.1 Path Structures
Every intermediate node \(V_m\) on a path plays one of three structural roles. In a chain (\(V_{m-1} \to V_m \to V_{m+1}\)), \(V_m\) is a causal intermediary. In a fork (\(V_{m-1} \leftarrow V_m \to V_{m+1}\)), \(V_m\) is a common cause. In a collider (\(V_{m-1} \to V_m \leftarrow V_{m+1}\)), both arrows point into \(V_m\); this asymmetric case has the opposite behavior under conditioning from the other two.
2.2.2 Blocking Rules
| Structure | Pattern | Conditioning on \(M\) | Key intuition |
|---|---|---|---|
| Chain | \(A \to M \to B\) | Blocks the path | Causal transmission |
| Fork | \(A \leftarrow M \to B\) | Blocks the path | Common cause |
| Collider | \(A \to M \leftarrow B\) | Opens the path | Common effect |
The critical asymmetry: colliders are closed by default and opened by conditioning, while chains and forks are open by default and closed by conditioning.
Chain: Smoking \(\to\) Tar \(\to\) Cancer. Marginally, smokers have elevated cancer rates (path is open). Conditioning on tar level blocks the path: once tar is fixed, the causal channel is fully accounted for.
Fork: Poverty \(\to\) Poor Diet; Poverty \(\to\) Lack of Exercise. Unconditionally, diet quality and exercise are correlated through poverty. Conditioning on poverty (measured, for instance, by income level) removes the spurious association.
Collider: Accident \(\to\) Hospitalization \(\leftarrow\) Cancer. In the general population, \(A \indep B\). Among hospitalized patients — conditioning on \(M\) — the two become negatively associated. This is Berkson’s bias (Berkson 1946).
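Berkson's bias is easy to reproduce by simulation. The Python sketch below (the 10% incidence rates are illustrative assumptions) draws independent accident and cancer indicators, makes hospitalization their common effect, and compares the association before and after conditioning on the collider:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Independent causes (10% incidence each is an illustrative assumption).
A = rng.random(n) < 0.10          # accident
B = rng.random(n) < 0.10          # cancer
M = A | B                         # hospitalization: common effect (collider)

# Marginally, A and B are uncorrelated up to sampling noise.
corr_marginal = np.corrcoef(A.astype(float), B.astype(float))[0, 1]

# Restricting to hospitalized patients conditions on the collider and
# induces a strong negative association: among the hospitalized, the
# absence of an accident makes cancer the likely explanation.
corr_hospitalized = np.corrcoef(A[M].astype(float), B[M].astype(float))[0, 1]
```

With these rates the conditional correlation is strongly negative even though the marginal one is essentially zero.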
2.2.3 The d-Separation Criterion
2.2.4 Practical Ways to Check d-Separation
1. Bayes-Ball Intuition. Imagine releasing a ball from \(X\) and asking whether it can reach \(Y\), with the nodes in \(\mathbf{S}\) shaded. The ball: passes through chain/fork nodes not in \(\mathbf{S}\); stops at chain/fork nodes in \(\mathbf{S}\); stops at colliders not in \(\mathbf{S}\) (with no descendant in \(\mathbf{S}\)); passes through colliders in \(\mathbf{S}\) (or with a descendant in \(\mathbf{S}\)). The algorithm is formalized in Shachter (1998).
2. Moral Graph Transformation. (1) Restrict to the ancestral set \(\An(X \cup Y \cup \mathbf{S})\). (2) Moralize: for every collider \(A \to C \leftarrow B\), add an undirected edge \(A - B\). (3) Remove all edge directions. (4) Delete all nodes in \(\mathbf{S}\). If \(X\) and \(Y\) are disconnected, then \((X \indep Y \mid \mathbf{S})_{\Gcal}\). See Lauritzen (1996, Ch. 3) for a full treatment.
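The moral-graph recipe translates directly into code. The following Python sketch implements the four steps for small graphs (the list-of-edges representation is an assumption of this sketch) and is exercised on the IV DAG analyzed in Section 2.3.2:

```python
from itertools import combinations

def d_separated(edges, X, Y, S):
    """Moral-graph test for (X independent of Y given S) in the DAG
    given by `edges` (a list of (parent, child) pairs); S is a set."""
    parents = {}
    for a, b in edges:
        parents.setdefault(a, set())
        parents.setdefault(b, set()).add(a)
    # (1) Restrict to the ancestral set An(X, Y, S).
    ancestral, stack = set(), [X, Y, *S]
    while stack:
        v = stack.pop()
        if v not in ancestral:
            ancestral.add(v)
            stack.extend(parents.get(v, ()))
    # (2)+(3) Moralize and drop directions: connect each node to its
    # parents, and "marry" every pair of parents of a common child.
    adj = {v: set() for v in ancestral}
    for v in ancestral:
        for p in parents.get(v, ()):
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(parents.get(v, ()), 2):
            adj[p].add(q)
            adj[q].add(p)
    # (4) Delete S, then test whether X and Y are disconnected.
    reached, stack = set(), [X]
    while stack:
        v = stack.pop()
        if v in reached or v in S:
            continue
        reached.add(v)
        stack.extend(adj[v])
    return Y not in reached

# IV DAG of Section 2.3.2: Z -> T, T -> Y, U -> T, U -> Y.
iv = [("Z", "T"), ("T", "Y"), ("U", "T"), ("U", "Y")]
# The collider at T lies outside An({Z, U}), so the path is blocked:
z_indep_u = d_separated(iv, "Z", "U", set())
```

On the IV DAG this reproduces the answers to Questions 1 through 4 of Section 2.3.2.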
2.3 Collider Bias and the IV DAG
2.3.1 Berkson’s Bias
2.3.2 Full d-Separation Analysis of the IV DAG
We work through the IV DAG with edges \(Z \to T\), \(T \to Y\), \(U \to T\), and \(U \to Y\), where \(U\) is unobserved. Because \(U\) cannot be conditioned on, the back-door path \(T \leftarrow U \to Y\) cannot be blocked by adjustment, making the instrument \(Z\) the only route to identification.
Classifying paths by type:
| Endpoints | Path | Type | Why |
|---|---|---|---|
| \(Z \leftrightarrow Y\) | \(Z \to T \to Y\) | Causal | Directed; carries the IV signal |
| \(Z \leftrightarrow Y\) | \(Z \to T \leftarrow U \to Y\) | Associational | Collider at \(T\); activates back-door via \(U\) |
| \(Z \leftrightarrow U\) | \(Z \to T \leftarrow U\) | Associational | Collider at \(T\); dormant unless \(T\) conditioned on |
Question 1. Is \(Z \indep Y\)? No. The directed path \(Z \to T \to Y\) is open (a chain, with \(T\) not conditioned on). The path \(Z \to T \leftarrow U \to Y\) has a collider at \(T\) with \(T \notin \varnothing\) and no descendant of \(T\) in \(\varnothing\), so it is blocked. One open path suffices: \(Z \nindep Y\).
Question 2. Is \(Z \indep Y \mid T\)? No. The causal path \(Z \to T \to Y\) is blocked (conditioning on \(T\) in the chain). But the collider at \(T\) in \(Z \to T \leftarrow U \to Y\) is now opened. Hence \(Z \nindep Y \mid T\).
Question 3. Is \(Z \indep Y \mid \{T, U\}\)? Yes. The causal path is blocked by conditioning on \(T\). The path through \(U\) has the collider at \(T\) opened (by conditioning on \(T\)), but then blocked at \(U\) (by conditioning on \(U\)). Both paths blocked.
Question 4. Is \(Z \indep U\)? Yes. The only path is \(Z \to T \leftarrow U\) with a collider at \(T\). Since \(T \notin \varnothing\) and no descendant of \(T\) is in \(\varnothing\), the path is blocked.
Interpretation. Question 1 establishes relevance: \(Z\) has an open causal path to \(Y\) through \(T\). Question 4 establishes exogeneity: the instrument is independent of unmeasured confounding. Question 3 establishes the exclusion restriction: once \(T\) and \(U\) are held fixed, \(Z\) carries no residual information about \(Y\). Note that this requires observing \(U\), which is unavailable by assumption; the IV strategy exploits the exclusion restriction indirectly through the moment condition \(\E[\varepsilon \cdot Z] = 0\) (Chapter 7). Question 2 is the warning: conditioning on \(T\) alone simultaneously closes the causal channel and opens the confounded channel — worse than no adjustment at all.
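The four d-separation answers can be corroborated numerically. A linear-Gaussian simulation of the IV DAG (a Python sketch; all coefficients are illustrative assumptions) checks each question via ordinary and partial correlations:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Linear-Gaussian IV DAG (coefficients are illustrative assumptions):
# Z -> T, U -> T, T -> Y, U -> Y, with U "unobserved" in practice.
Z = rng.standard_normal(n)
U = rng.standard_normal(n)
T = Z + U + 0.5 * rng.standard_normal(n)
Y = T + U + 0.5 * rng.standard_normal(n)

def partial_corr(a, b, controls):
    """Correlation of a and b after residualizing both on `controls` (OLS)."""
    Xc = np.column_stack([np.ones(n)] + list(controls))
    ra = a - Xc @ np.linalg.lstsq(Xc, a, rcond=None)[0]
    rb = b - Xc @ np.linalg.lstsq(Xc, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

q1 = np.corrcoef(Z, Y)[0, 1]       # Question 1: nonzero (relevance)
q2 = partial_corr(Z, Y, [T])       # Question 2: nonzero (collider opened)
q3 = partial_corr(Z, Y, [T, U])    # Question 3: ~ 0 (both paths blocked)
q4 = np.corrcoef(Z, U)[0, 1]       # Question 4: ~ 0 (exogeneity)
```

Conditioning on \(T\) alone flips the \(Z\)-\(Y\) association to a sizeable negative value (the collider trap of Question 2); adding \(U\), which only the simulation can observe, drives it back to zero.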
The IV DAG encodes the substantive assumptions behind instrument validity graphically, but does not make them testable in any strong sense.
2.4 The Markov Property and Factorization
Up to this point, we have used DAGs qualitatively. We now connect the graph to probability algebra: the same parent structure that governs d-separation also determines how the joint distribution factorizes.
Without structural assumptions, any joint distribution can always be written by repeated conditioning, but that generic representation is too high-dimensional to reveal much structure. A DAG becomes statistically meaningful because, together with the Markov property, it replaces the generic factorization by a sparse one involving only the parents of each node.
An immediate consequence is the Markov factorization: \[p(v_1, \dots, v_k) = \prod_{i=1}^{k} p\!\left(v_i \mid \Pa(v_i)\right). \tag{2.1}\]
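One way to see what the factorization buys is to count parameters. For \(k\) binary variables the generic factorization needs \(2^k - 1\) numbers, while the Markov factorization needs only \(\sum_i 2^{|\Pa(V_i)|}\). A quick Python check for a length-\(k\) chain:

```python
def markov_param_count(nodes, parents):
    """Free parameters when all variables are binary: each node needs
    one probability per configuration of its parents (2^|Pa| rows)."""
    return sum(2 ** len(parents.get(v, ())) for v in nodes)

k = 10
chain_nodes = [f"V{i}" for i in range(1, k + 1)]
# Chain V1 -> V2 -> ... -> Vk, so Pa(Vi) = {V_{i-1}} for i >= 2.
chain_parents = {f"V{i}": [f"V{i-1}"] for i in range(2, k + 1)}

generic = 2 ** k - 1                                     # 1023
sparse = markov_param_count(chain_nodes, chain_parents)  # 1 + 9*2 = 19
```

The same count for the IV DAG of Section 2.3 gives \(2^0 + 2^0 + 2^2 + 2^2 = 10\) parameters instead of \(2^4 - 1 = 15\).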
2.5 Worked Example: The Education–Earnings DAG
This section is the template for how we will use DAGs throughout the course: specify the DAG, read off the factorization, use d-separation to identify the implied conditional independences, and interpret in causal terms.
Practice. Before working through the example below, return to the eight-node DAG example and rework each of the five queries from scratch, following three steps: (1) list every path; (2) classify every intermediate node as chain, fork, or collider; (3) determine which paths are blocked or open.
The causal story. Education (\(E\)) affects earnings (\(Y\)); family background (\(B\)) is a common cause of both; neighborhood (\(N\)) affects education but has no direct effect on earnings. All four variables are observed (no latent confounders).
Step 1 — Markov factorization. \(\Pa(N) = \Pa(B) = \varnothing\), \(\Pa(E) = \{N, B\}\), \(\Pa(Y) = \{E, B\}\): \[p(n, b, e, y) = p(n)\,p(b)\,p(e \mid n, b)\,p(y \mid e, b).\]
Step 2 — d-Separation. There are two paths between \(N\) and \(Y\):
- Path 1: \(N \to E \to Y\) (a chain through \(E\)).
- Path 2: \(N \to E \leftarrow B \to Y\) (a collider at \(E\), followed by the fork leg \(B \to Y\)).
(i) Is \((N \indep Y)_{\Gcal}\)? Path 1 is a chain with nothing conditioned on, so it is open. Path 2 has a collider at \(E\) (not conditioned on), so it is blocked. One open path: \(N \nindep Y\).
(ii) Is \((N \indep Y \mid E)_{\Gcal}\)? Path 1 is blocked (conditioning on \(E\) in the chain). Path 2 has a collider at \(E\) opened by conditioning, and the activated path continues through \(B\) (not conditioned on), so it is open. Hence \(N \nindep Y \mid E\): a collider trap, where conditioning on \(E\) introduces bias rather than removing it.
(iii) Is \((N \indep Y \mid \{E, B\})_{\Gcal}\)? Path 1 is blocked by \(E\). Path 2 is opened at collider \(E\) but then blocked at \(B\). All paths blocked: \((N \indep Y \mid E, B)_{\Gcal}\).
(iv) Is \((N \indep B)_{\Gcal}\)? The only path \(N \to E \leftarrow B\) has a collider at \(E\) (not conditioned on). Path is blocked: \(N \indep B\).
Step 3 — Conditional Independence. By the Soundness theorem: \[N \indep B, \qquad N \indep Y \mid E, B, \qquad N \nindep Y, \qquad N \nindep Y \mid E.\] The statement \(N \nindep Y \mid E\) is an important warning: conditioning only on education when studying the neighborhood–earnings association introduces bias through the activated collider path.
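These four statements can be checked by simulation. A linear-Gaussian Python sketch of the education–earnings DAG (coefficients are illustrative assumptions, not estimates):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Linear-Gaussian education-earnings DAG (coefficients are illustrative):
N = rng.standard_normal(n)                         # neighborhood
B = rng.standard_normal(n)                         # family background
E = 0.8 * N + B + 0.5 * rng.standard_normal(n)     # education
Y = E + B + 0.5 * rng.standard_normal(n)           # earnings

def partial_corr(a, b, controls):
    """Correlation of a and b after residualizing both on `controls` (OLS)."""
    Xc = np.column_stack([np.ones(n)] + list(controls))
    ra = a - Xc @ np.linalg.lstsq(Xc, a, rcond=None)[0]
    rb = b - Xc @ np.linalg.lstsq(Xc, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

r_nb = np.corrcoef(N, B)[0, 1]        # N indep B: ~ 0
r_ny = np.corrcoef(N, Y)[0, 1]        # N not indep Y: nonzero
r_ny_e = partial_corr(N, Y, [E])      # collider trap: nonzero (negative)
r_ny_eb = partial_corr(N, Y, [E, B])  # all paths blocked: ~ 0
```

The sign of the conditional-on-\(E\) association is negative here: among equally educated individuals, a better neighborhood signals weaker family background, exactly the collider logic of (ii).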
Step 4 — Identification. We wish to identify \(P(y \mid \doop(E{=}e))\). The only back-door path is \(E \leftarrow B \to Y\) (a fork at \(B\)).
- \(\mathbf{S} = \{B\}\): blocks the fork \(E \leftarrow B \to Y\), and \(B\) is not a descendant of \(E\). Valid.
- \(\mathbf{S} = \{N\}\): does not block \(E \leftarrow B \to Y\). Invalid.
With \(\mathbf{S} = \{B\}\), the back-door formula (Chapter 3) gives: \[P(y \mid \doop(E{=}e)) = \sum_{b} P(y \mid e, b)\,P(b),\] identifying the causal effect entirely from observational data.
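The back-door formula can be exercised on simulated discrete data. In the Python sketch below (all probabilities are illustrative assumptions), the naive conditional \(P(Y{=}1 \mid E{=}1)\) is biased by the fork at \(B\), while the adjusted estimate recovers the interventional value \(0.6\) implied by the assumed structural probabilities:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Binary version of the education-earnings DAG (probabilities assumed):
# P(B=1) = 0.5,  P(E=1|B) = 0.2 + 0.5 B,  P(Y=1|E,B) = 0.1 + 0.3 E + 0.4 B.
B = rng.random(n) < 0.5
E = rng.random(n) < 0.2 + 0.5 * B
Y = rng.random(n) < 0.1 + 0.3 * E + 0.4 * B

# Naive conditional P(Y=1 | E=1): contaminated by the back-door path.
naive = Y[E].mean()

# Back-door adjustment: sum_b P(Y=1 | E=1, B=b) P(B=b).
adjusted = sum(Y[E & (B == b)].mean() * (B == b).mean() for b in (0, 1))

# Ground truth from the assumed structural probabilities:
# P(Y=1 | do(E=1)) = 0.5 * 0.4 + 0.5 * 0.8 = 0.6.
```

The naive estimate overshoots because high-\(B\) individuals are overrepresented among the highly educated; stratifying on \(B\) and reweighting by \(P(b)\) removes exactly that imbalance.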
2.6 The Big Picture
The central logic runs from a qualitative causal graph to an identification formula:
\[\text{structural assumptions} \Longrightarrow \text{DAG} \Longrightarrow \text{d-separation relations} \Longrightarrow \text{conditional independences (Markov)} \Longrightarrow \text{identification formulas}.\]
Chapter 2 establishes the first four links in this chain. Chapters 3 and beyond use the same graphical machinery to derive specific identification results: back-door adjustment, front-door adjustment, and do-calculus formulas.
2.7 Summary
DAGs as causal structure. A DAG encodes which variables are causally connected, which are potential confounders, and which variables may block or open paths.
Three-node motifs. Every path is built from chains, forks, and colliders. Chains and forks are open by default and are blocked by conditioning on the middle node. Colliders are blocked by default and are opened by conditioning on the collider or on one of its descendants.
d-Separation and conditional independence. \(X\) and \(Y\) are d-separated by \(\mathbf{S}\) exactly when every path between them is blocked by \(\mathbf{S}\). Under the Markov property, d-separation implies the corresponding conditional independence. Some implications may be assessed empirically, but assumptions involving unobserved variables are generally not testable.
Markov factorization. The local Markov property yields \(p(v_1,\dots,v_k) = \prod_{i=1}^{k} p(v_i \mid \Pa(v_i))\), the bridge from graphical structure to probability calculus.
Collider bias. Conditioning on a collider — or on a descendant of a collider — can induce a spurious association between otherwise independent variables (Berkson 1946). This is the opposite of confounding adjustment.
A practical workflow. To analyze a DAG: identify the graph structure, determine which paths are open or blocked, translate d-separation statements into conditional independences under the Markov property, and interpret in light of the causal question.
2.8 Problems
1. Warm-up: a single collider. Consider the DAG \(X \to Y \leftarrow Z\).
- Identify the structural role of \(Y\) on the path \(X \to Y \leftarrow Z\).
- Is \(X \indep Z\)? Apply the d-separation criterion and state which blocking rule applies.
- Is \(X \indep Z \mid Y\)? Explain what happens to the path when \(Y\) is conditioned on, and describe in one sentence the real-world phenomenon this illustrates.
2. d-Separation practice. Consider the DAG: \(A \to B \to D\), \(A \to C \to D\), \(B \to E\), \(C \to E\).
- List all paths between \(A\) and \(E\). (Hint: there are four paths; two pass through \(D\).)
- For each path, identify the role (chain, fork, collider) of each intermediate node.
- Does \(\{B, C\}\) d-separate \(A\) and \(E\)?
- Does \(\{D\}\) d-separate \(B\) and \(C\)? What type of node is \(D\) on the path \(B \to D \leftarrow C\)?
3. Berkson’s bias. Suppose \(X\) and \(Y\) are independent standard normal variables, and let \(S = \mathbf{1}[X + Y > 0]\).
- Verify analytically that \(\mathrm{Cov}(X, Y \mid S{=}1) < 0\).
- Draw the DAG for \((X, Y, S)\) and identify \(S\) as a collider.
- Explain in one sentence why restricting the analysis to the subsample with \(S=1\) biases estimates of any association between \(X\) and \(Y\).
4. Markov factorization and collider activation. Consider the DAG with edges \(A \to E\), \(A \to W\), \(F \to E\), \(E \to W\), where \(A\) = ability, \(F\) = family income, \(E\) = education, \(W\) = wages.
- Write down the Markov factorization \(p(a, f, e, w)\).
- Is \((F \indep W)_{\Gcal}\)? List all paths between \(F\) and \(W\) and determine which are open.
- Is \((F \indep W \mid E)_{\Gcal}\)? Identify the role of \(E\) on each path.
- A researcher regresses \(W\) on \(E\) and \(F\), omitting \(A\). Is the coefficient on \(E\) a causal effect? Explain using the graph.
5. Soundness theorem for the collider. Consider the collider \(A \to M \leftarrow B\) with factorization \(p(a, m, b) = p(a)\,p(b)\,p(m \mid a, b)\).
- By marginalizing over \(M\), show that \(A \indep B\) in the joint distribution. This verifies the Soundness theorem for \(\mathbf{S} = \varnothing\).
- Show that conditioning on \(M\) breaks this independence: write out \(p(a, b \mid m)\) and explain why it does not factorize into \(p(a \mid m)\,p(b \mid m)\) in general.
- Explain in one sentence why parts (a) and (b) together are consistent with the Soundness theorem. (Hint: the theorem is a one-directional statement.)
6. Terminology check. Consider the DAG with edges \(U \to X\), \(U \to Z\), \(X \to W\), \(Z \to W\), \(W \to Y\).
- Identify the parents, children, ancestors, descendants, and non-descendants of node \(W\).
- List all pairs of adjacent nodes.
- Which pairs of nodes are connected by a directed path? List every such pair and the corresponding path.
- Write the Markov factorization \(p(u, x, z, w, y)\).
7. Markov factorization and local Markov property. Consider the DAG with edges \(X_1 \to X_2\), \(X_1 \to X_3\), \(X_2 \to X_4\), \(X_3 \to X_4\).
- Write the joint density \(p(x_1, x_2, x_3, x_4)\) implied by the Markov factorization.
- State the local Markov property for each of the four nodes.
- Is \((X_2 \indep X_3)_{\Gcal}\)? Is \((X_2 \indep X_3 \mid X_1)_{\Gcal}\)? Justify each answer by listing all paths.
8. Toy proof: conditional independence in the fork. Consider the fork \(X_1 \leftarrow X_2 \to X_3\) with factorization \(p(x_1, x_2, x_3) = p(x_2)\,p(x_1 \mid x_2)\,p(x_3 \mid x_2)\).
- Show directly, by conditioning on \(X_2 = x_2\), that \(X_1 \indep X_3 \mid X_2\).
- Is \(X_1 \indep X_3\) marginally? Justify both graphically and algebraically.
- Explain in one sentence what the fork represents substantively and why conditioning on the common cause removes the association.
9. d-Separation in the practice DAG. Refer to the eight-node DAG in the Practice DAG example (nodes \(W\), \(X_1\), \(T\), \(M_1\), \(M_2\), \(Y\), \(C\), \(D\); edges \(W \to X_1\), \(W \to T\), \(W \to Y\), \(T \to M_1\), \(M_1 \to Y\), \(M_1 \to M_2\), \(X_1 \to C\), \(M_2 \to C\), \(C \to D\)). For each query, state whether it is true or false and justify by listing all relevant paths.
- \(X_1 \indep Y\)
- \(X_1 \indep Y \mid W\)
- \(T \indep M_2 \mid M_1\)
- Does conditioning on \(C\) induce collider bias between \(X_1\) and \(M_2\)? Identify the specific path opened.
- Does conditioning on \(D\) (a descendant of \(C\)) open the path \(X_1 \to C \leftarrow M_2\)? State which clause of the d-blocking definition applies.