2  DAGs and d-Separation

NoteLearning Objectives

By the end of this chapter, students should be able to:

  1. Construct a DAG from a verbal description of a causal model and identify parents, descendants, ancestors, and non-descendants of any node.
  2. Classify every intermediate node on a path as a chain, fork, or collider, and apply the blocking rule for each.
  3. Apply the d-separation criterion to determine all conditional independence relationships implied by a DAG.
  4. Recognize and avoid collider bias, including the case where a descendant of a collider is conditioned upon.
  5. State the Markov factorization and use it to write the joint density of a DAG in terms of local conditional densities.
  6. Interpret d-separation results as conditional independence assumptions on observed variables, and recognize their role as the graphical expression of causal assumptions.

Readers who want a gentler introduction to conditional independence, the three basic path motifs, and the Markov-property interpretation may consult Appendix A before or alongside this chapter.

2.1 Directed Acyclic Graphs

DAGs allow us to translate qualitative causal assumptions into quantitative statistical restrictions. Once we draw a graph encoding our causal assumptions, d-separation tells us which conditional independences are implied by the graph and helps identify candidate adjustment sets for removing confounding.

NoteRemark

In this chapter, the word graph does not mean a plot of data or a graph of a function. It means a collection of nodes and edges used to represent relationships among variables. Our objective is to understand how such graphs encode statistical structure — especially conditional independence, confounding, and path blocking.

2.1.1 Basic Definition

NoteDefinition: Directed Acyclic Graph

A directed acyclic graph (DAG) \(\Gcal = (\mathcal{V}, E)\) consists of a finite set of nodes \(\mathcal{V}\) and a set of directed edges \(E \subseteq \mathcal{V} \times \mathcal{V}\) such that there is no directed cycle. Each node represents a variable and each directed edge \(A \to B\) encodes that \(A\) is a direct cause of \(B\) relative to the variables included in the graph.

The acyclicity condition guarantees a topological ordering so that the joint density admits the recursive factorization \(p(v_1,\dots,v_k) = \prod_{i=1}^{k} p(v_i \mid \Pa(v_i))\).
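
To see concretely why acyclicity delivers such an ordering, here is a minimal Python sketch of Kahn's algorithm; the function name, node labels, and edge list are our own illustrations (the edges are those of the education DAG introduced later in this chapter), not part of any library.

```python
from collections import defaultdict, deque

def topological_order(nodes, edges):
    """Kahn's algorithm: return a topological ordering of a DAG,
    or raise ValueError if the edge set contains a directed cycle."""
    indeg = {v: 0 for v in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indeg[child] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle")
    return order

# Education DAG from the examples below: N -> E, B -> E, B -> Y, E -> Y.
print(topological_order(["N", "B", "E", "Y"],
                        [("N", "E"), ("B", "E"), ("B", "Y"), ("E", "Y")]))
# A valid order such as ['N', 'B', 'E', 'Y'] matches the recursive factorization.
```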

2.1.2 Structural Relationships

NoteDefinition: Structural Relationships in a DAG

For a node \(v \in \mathcal{V}\) in \(\Gcal\):

  • \(\Pa(v) = \{w \in \mathcal{V} : w \to v \in E\}\) are the parents of \(v\).
  • \(\De(v)\), the descendants of \(v\), are all nodes reachable from \(v\) by a directed path.
  • \(\An(v)\), the ancestors of \(v\), are all nodes with a directed path to \(v\).
  • \(\Nd(v) = \mathcal{V} \setminus (\De(v) \cup \{v\})\) are the non-descendants of \(v\).

Note that \(\Nd(v)\) includes the parents of \(v\).
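
These sets are mechanical to compute from an edge list. Below is a minimal Python sketch — the function names are our own, not a library API — that recovers \(\Pa\), \(\De\), \(\An\), and \(\Nd\) by graph search, illustrated on the IV DAG defined in the example further down.

```python
def parents(edges, v):
    """Pa(v): nodes with a directed edge into v."""
    return {p for p, c in edges if c == v}

def descendants(edges, v):
    """De(v): nodes reachable from v by a directed path, excluding v itself."""
    reach, stack = set(), [v]
    while stack:
        w = stack.pop()
        for p, c in edges:
            if p == w and c not in reach:
                reach.add(c)
                stack.append(c)
    return reach

def ancestors(edges, v):
    """An(v): nodes with a directed path to v, excluding v itself."""
    reach, stack = set(), [v]
    while stack:
        w = stack.pop()
        for p, c in edges:
            if c == w and p not in reach:
                reach.add(p)
                stack.append(p)
    return reach

def non_descendants(edges, nodes, v):
    """Nd(v): every node other than v and its descendants; contains Pa(v)."""
    return set(nodes) - descendants(edges, v) - {v}

# IV DAG from the example below: Z -> T, U -> T, T -> Y, U -> Y.
iv = [("Z", "T"), ("U", "T"), ("T", "Y"), ("U", "Y")]
print(descendants(iv, "Z"))                            # {'T', 'Y'}
print(non_descendants(iv, {"Z", "T", "Y", "U"}, "Z"))  # {'U'}
```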

NoteQuick Terminology Summary

If \(A \to B\), then \(A\) is a parent of \(B\) and \(B\) is a child of \(A\). Two nodes are adjacent if connected by an edge. A path is any sequence of connected nodes regardless of arrow direction; a directed path has arrows all pointing in the same direction. A cycle is a directed path that returns to its starting node.

NoteExample: Education, Earnings, and Family Background

Education (\(E\)) affects earnings (\(Y\)); family background (\(B\)) is a common cause of both; neighborhood (\(N\)) affects education but has no direct effect on earnings. The Markov factorization is \(p(n,b,e,y) = p(n)\,p(b)\,p(e\mid n,b)\,p(y\mid e,b)\). Reading off: \(\Pa(E) = \{N,B\}\), \(\Pa(Y) = \{E,B\}\), \(\Pa(N)=\Pa(B)=\varnothing\).

NoteExample: The IV DAG

The canonical IV model involves \(Z\) (instrument), \(T\) (treatment), \(Y\) (outcome), \(U\) (unobserved confounder), with \(\Pa(T) = \{Z,U\}\), \(\Pa(Y) = \{T,U\}\), \(\Pa(Z) = \Pa(U) = \varnothing\). The descendants of \(Z\) are \(\{T, Y\}\); \(U\) and \(Z\) are non-descendants of each other.

NoteExample: The Mediation DAG

Treatment \(T\) affects outcome \(Y\) both directly and through mediator \(M\). An unobserved confounder \(U\) creates a back-door path \(T \leftarrow U \rightarrow Y\). \(\Pa(T) = \{U\}\), \(\Pa(M) = \{T\}\), \(\Pa(Y) = \{M, T, U\}\). The total effect of \(T\) on \(Y\) travels along two paths: \(T \to Y\) (direct) and \(T \to M \to Y\) (indirect). Note: this is a standard mediation DAG, not a front-door DAG. For the front-door criterion, every directed path from \(T\) to \(Y\) must pass through \(M\).

2.2 Paths, Blocking, and d-Separation

A DAG encodes direct causal relationships through its edges, but variables can also be statistically related through longer routes. The concept of a path formalizes these routes, and d-separation answers whether a path transmits or blocks statistical dependence.

2.2.1 Path Structures

NoteDefinition: Path

A path between nodes \(X\) and \(Y\) in \(\Gcal\) is any sequence of distinct nodes \(X = V_0, V_1, \dots, V_k = Y\) such that each consecutive pair is connected by an edge in either direction.

Every intermediate node \(V_m\) on a path plays one of three structural roles. In a chain (\(V_{m-1} \to V_m \to V_{m+1}\)), \(V_m\) is a causal intermediary. In a fork (\(V_{m-1} \leftarrow V_m \to V_{m+1}\)), \(V_m\) is a common cause. In a collider (\(V_{m-1} \to V_m \leftarrow V_{m+1}\)), both arrows point into \(V_m\); this asymmetric case has the opposite behavior under conditioning from the other two.

2.2.2 Blocking Rules

| Structure | Pattern | Effect of conditioning on \(M\) | Key intuition |
|-----------|---------|--------------------------------|---------------|
| Chain | \(A \to M \to B\) | Blocks the path | Causal transmission |
| Fork | \(A \leftarrow M \to B\) | Blocks the path | Common cause |
| Collider | \(A \to M \leftarrow B\) | Opens the path | Common effect |

The critical asymmetry: colliders are closed by default and opened by conditioning, while chains and forks are open by default and closed by conditioning.

NoteThree-Node Motifs at a Glance

In a chain \(A \to M \to B\): conditioning on \(M\) blocks the path. In a fork \(A \leftarrow M \to B\): conditioning on \(M\) removes the spurious association. In a collider \(A \to M \leftarrow B\): the path is blocked by default, but conditioning on \(M\) — or any descendant of \(M\) — opens it. These three motifs are the local building blocks of d-separation.

Chain: Smoking \(\to\) Tar \(\to\) Cancer. Marginally, smokers have elevated cancer rates (path is open). Conditioning on tar level blocks the path: once tar is fixed, the causal channel is fully accounted for.

Fork: Poverty \(\to\) Poor Diet; Poverty \(\to\) Lack of Exercise. Unconditionally, diet quality and exercise are correlated through poverty. Conditioning on poverty, the common cause, removes the spurious association.

Collider: Accident (\(A\)) \(\to\) Hospitalization (\(M\)) \(\leftarrow\) Cancer (\(B\)). In the general population, \(A \indep B\). Among hospitalized patients — conditioning on \(M\) — the two become negatively associated. This is Berkson’s bias (Berkson 1946).
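
All three motifs can be checked by simulation. The sketch below uses made-up linear Gaussian structural equations (our own choices, for illustration only); partial correlation after linear adjustment is an exact measure of conditional association in this jointly Gaussian setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def partial_corr(a, b, m):
    """Correlation of a and b after linearly adjusting each for m."""
    ra = a - np.polyval(np.polyfit(m, a, 1), m)
    rb = b - np.polyval(np.polyfit(m, b, 1), m)
    return np.corrcoef(ra, rb)[0, 1]

# Chain: A -> M -> B.
A = rng.normal(size=n); M = A + rng.normal(size=n); B = M + rng.normal(size=n)
print("chain   ", np.corrcoef(A, B)[0, 1], partial_corr(A, B, M))

# Fork: A <- M -> B.
M = rng.normal(size=n); A = M + rng.normal(size=n); B = M + rng.normal(size=n)
print("fork    ", np.corrcoef(A, B)[0, 1], partial_corr(A, B, M))

# Collider: A -> M <- B.
A = rng.normal(size=n); B = rng.normal(size=n); M = A + B + rng.normal(size=n)
print("collider", np.corrcoef(A, B)[0, 1], partial_corr(A, B, M))
# Chain and fork: marginal correlation far from 0, partial correlation near 0.
# Collider: marginal correlation near 0, partial correlation about -0.5.
```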

2.2.3 The d-Separation Criterion

NoteDefinition: d-Blocking and d-Separation

A path \(\pi\) is d-blocked by a set \(\mathbf{S}\) if:

  1. \(\pi\) contains a chain \(A \to M \to B\) or fork \(A \leftarrow M \to B\) with \(M \in \mathbf{S}\); or
  2. \(\pi\) contains a collider \(A \to C \leftarrow B\) such that \(C \notin \mathbf{S}\) and no descendant of \(C\) is in \(\mathbf{S}\).

Nodes \(X\) and \(Y\) are d-separated by \(\mathbf{S}\), written \((X \indep Y \mid \mathbf{S})_{\Gcal}\), if every path between \(X\) and \(Y\) in \(\Gcal\) is d-blocked by \(\mathbf{S}\).

NoteExample: Applying the Criterion to the Three Toy Settings

(i) Chain. \(A \to M \to B\): \(\mathbf{S} = \varnothing\) gives \(A \nindep B\); \(\mathbf{S} = \{M\}\) gives \(A \indep B \mid M\).

(ii) Fork. \(A \leftarrow M \to B\): \(\mathbf{S} = \varnothing\) gives \(A \nindep B\); \(\mathbf{S} = \{M\}\) gives \(A \indep B \mid M\).

(iii) Collider. \(A \to M \leftarrow B\): \(\mathbf{S} = \varnothing\) gives \(A \indep B\) (collider blocks the path); \(\mathbf{S} = \{M\}\) gives \(A \nindep B \mid M\) (conditioning opens the path — Berkson’s bias).

NoteExample: A Direct Probability Proof for the Chain

Consider the chain \(X_1 \to X_2 \to X_3\) with factorization \(p(x_1, x_2, x_3) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)\). Conditioning on \(X_2\): \[p(x_1, x_3 \mid x_2) = \frac{p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)}{p(x_2)} = p(x_1 \mid x_2)\,p(x_3 \mid x_2).\] Hence \(X_1 \indep X_3 \mid X_2\). The analogous fork case \(X_1 \leftarrow X_2 \to X_3\) is left as an exercise.
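
The algebra can also be verified numerically. The following sketch builds a binary chain with randomly drawn (made-up) probability tables, forms the joint via the Markov factorization, and checks that \(p(x_1, x_3 \mid x_2)\) factorizes for every value of \(x_2\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Random binary tables for the chain X1 -> X2 -> X3.
p1 = rng.dirichlet(np.ones(2))                 # p(x1)
p2_1 = rng.dirichlet(np.ones(2), size=2)       # p(x2 | x1), rows indexed by x1
p3_2 = rng.dirichlet(np.ones(2), size=2)       # p(x3 | x2), rows indexed by x2

# Joint via the Markov factorization p(x1) p(x2 | x1) p(x3 | x2).
joint = np.einsum("i,ij,jk->ijk", p1, p2_1, p3_2)

# For each value of x2, p(x1, x3 | x2) must equal p(x1 | x2) p(x3 | x2).
for x2 in range(2):
    cond = joint[:, x2, :] / joint[:, x2, :].sum()
    assert np.allclose(cond, np.outer(cond.sum(axis=1), cond.sum(axis=0)))
print("X1 and X3 are conditionally independent given X2.")
```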

TipTheorem: Soundness of d-Separation (Pearl 2009, Theorem 1.2.4)

If \((X \indep Y \mid \mathbf{S})_{\Gcal}\), then \(X \indep Y \mid \mathbf{S}\) in every distribution that is Markov with respect to \(\Gcal\).

NoteRemark: Soundness, Not Converse

The theorem states that d-separation implies conditional independence for every distribution Markov with respect to \(\Gcal\). The converse need not hold without additional assumptions such as faithfulness: a conditional independence may occur because of special parameter values even when the corresponding nodes are not d-separated in the graph.

2.2.4 Practical Ways to Check d-Separation

1. Bayes-Ball Intuition. Imagine releasing a ball from \(X\) and asking whether it can reach \(Y\), with nodes in \(\mathbf{S}\) shaded. The ball: passes through chain/fork nodes not in \(\mathbf{S}\); stops at chain/fork nodes in \(\mathbf{S}\); stops at colliders not in \(\mathbf{S}\) (with no descendant in \(\mathbf{S}\)); passes through colliders in \(\mathbf{S}\) (or with a descendant in \(\mathbf{S}\)). The algorithm is formalized in Shachter (1998).

2. Moral Graph Transformation. (1) Restrict to the ancestral set \(\An(X \cup Y \cup \mathbf{S})\) — here taken to include \(X\), \(Y\), and \(\mathbf{S}\) themselves together with all their ancestors. (2) Moralize: for every collider \(A \to C \leftarrow B\), add an undirected edge \(A - B\). (3) Remove all edge directions. (4) Delete all nodes in \(\mathbf{S}\). If \(X\) and \(Y\) are disconnected, then \((X \indep Y \mid \mathbf{S})_{\Gcal}\). See Lauritzen (1996, Ch. 3) for a full treatment.
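
The moral-graph procedure is easy to mechanize. Below is a minimal Python sketch implementing steps (1)–(4) exactly as listed; the names `ancestral_set` and `d_separated` are our own, not a library API, and the usage lines preview the two queries worked by hand in the next example.

```python
from itertools import combinations

def ancestral_set(edges, nodes):
    """The given nodes together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        w = stack.pop()
        for p, c in edges:
            if c == w and p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(edges, x, y, s):
    """Moral-graph test for (x indep y | s) in the DAG given by `edges`."""
    s = set(s)
    # Step (1): restrict to the ancestral set of {x, y} union s.
    keep = ancestral_set(edges, {x, y} | s)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    # Steps (2)-(3): moralize (link co-parents of each child), drop directions.
    und = {frozenset(e) for e in sub}
    for child in {c for _, c in sub}:
        co_parents = {p for p, c in sub if c == child}
        for a, b in combinations(sorted(co_parents), 2):
            und.add(frozenset((a, b)))
    # Step (4): delete the conditioning set, then test connectivity from x.
    und = {e for e in und if not (e & s)}
    reach, stack = {x}, [x]
    while stack:
        w = stack.pop()
        for e in und:
            if w in e:
                for v in e - {w}:
                    if v not in reach:
                        reach.add(v)
                        stack.append(v)
    return y not in reach

# Queries 1 and 2 from the four-node example below:
iv = [("Z", "T"), ("U", "T"), ("T", "Y"), ("U", "Y")]
print(d_separated(iv, "Z", "U", set()))    # True: the collider at T blocks
print(d_separated(iv, "Z", "U", {"T"}))    # False: conditioning on T opens it
```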

NoteExample: Checking d-Separation in a Four-Node DAG

Consider the DAG with edges \(Z \to T\), \(U \to T\), \(T \to Y\), \(U \to Y\).

Query 1. Is \((Z \indep U)_{\Gcal}\)?

Bayes-ball: The only path is \(Z \to T \leftarrow U\). Node \(T\) is a collider with \(T \notin \varnothing\), so the ball stops. Hence \((Z \indep U)_{\Gcal}\).

Moral graph: Ancestral set of \(\{Z, U\}\) is \(\{Z, U\}\) (neither has parents); \(T\) and \(Y\) are discarded. No edges, no colliders. \(Z\) and \(U\) are disconnected \(\Rightarrow (Z \indep U)_{\Gcal}\).

Query 2. Is \((Z \indep U \mid T)_{\Gcal}\)?

Bayes-ball: Same path \(Z \to T \leftarrow U\). Now \(T \in \{T\}\): the collider is observed, so the ball passes through. Hence \(Z \nindep U \mid T\).

Moral graph: Ancestral set of \(\{Z, U, T\}\) includes \(T\). Moralize: add edge \(Z - U\) for the collider \(Z \to T \leftarrow U\). After removing \(T\), the remaining graph has edge \(Z - U\). Connected \(\Rightarrow Z \nindep U \mid T\).

Comparing: \(Z\) and \(U\) are marginally independent (instrument is exogenous) but become dependent once we condition on \(T\) — Berkson’s bias.

NoteExample: Practice DAG — Eight-Node Graph

Consider the DAG with nodes \(W\), \(X_1\), \(T\), \(M_1\), \(M_2\), \(Y\), \(C\), \(D\) and edges \(W \to X_1\), \(W \to T\), \(W \to Y\), \(T \to M_1\), \(M_1 \to Y\), \(M_1 \to M_2\), \(X_1 \to C\), \(M_2 \to C\), \(C \to D\).

(1) Is \(X_1 \indep Y\)? No. Two open paths: \(X_1 \leftarrow W \to Y\) (fork at \(W\), unblocked) and \(X_1 \leftarrow W \to T \to M_1 \to Y\) (fork and chains, also unblocked). Every remaining path contains the collider segment \(X_1 \to C \leftarrow M_2\) with \(C \notin \varnothing\), so it is blocked.

(2) Is \(X_1 \indep Y \mid W\)? Yes. Both open paths above pass through the fork \(W\); conditioning on \(W\) blocks them. The remaining path \(X_1 \to C \leftarrow M_2 \leftarrow \cdots\) has collider \(C \notin \{W\}\) — blocked. All paths blocked.

(3) Is \(T \indep M_2 \mid M_1\)? Yes. The direct path \(T \to M_1 \to M_2\) is a chain with \(M_1 \in \{M_1\}\) — blocked. Other paths through \(W\) or \(Y\) also contain blocked colliders. All paths blocked.

(4) Does conditioning on \(C\) create collider bias? Yes. \(C\) is a collider on \(X_1 \to C \leftarrow M_2\), a path that is blocked by default. Conditioning on \(C\) opens it, adding a spurious non-causal channel between \(X_1\) and \(M_2\) on top of the dependence already transmitted by the open path \(X_1 \leftarrow W \to T \to M_1 \to M_2\).

(5) Does conditioning on \(D\) (a descendant of \(C\)) open the path \(X_1 \to C \leftarrow M_2\)? Yes. By clause (2) of the d-blocking definition, a collider path is unblocked whenever the collider or any of its descendants is in the conditioning set.
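
All five answers can be checked mechanically with the `d_separated` sketch from Section 2.2.4 (assuming that function is in scope; the string labels below stand in for the math symbols).

```python
g = [("W", "X1"), ("W", "T"), ("W", "Y"), ("T", "M1"), ("M1", "Y"),
     ("M1", "M2"), ("X1", "C"), ("M2", "C"), ("C", "D")]

print(d_separated(g, "X1", "Y", set()))    # (1) False: open paths through W
print(d_separated(g, "X1", "Y", {"W"}))    # (2) True: all paths blocked
print(d_separated(g, "T", "M2", {"M1"}))   # (3) True: chain blocked at M1
print(d_separated(g, "X1", "M2", {"C"}))   # (4) False: collider C opened
print(d_separated(g, "X1", "M2", {"D"}))   # (5) False: descendant of C opens it
```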

2.3 Collider Bias and the IV DAG

2.3.1 Berkson’s Bias

NoteDefinition: Collider Bias

Collider bias is the spurious association induced between two variables when one conditions on their common effect (a collider), or on a descendant of that collider. In a path \(A \to C \leftarrow B\), conditioning on \(C\) or any descendant of \(C\) can make \(A\) and \(B\) statistically dependent even if they are marginally independent.

WarningCollider Bias vs. Confounding

Unlike confounding, which is removed by conditioning on a common cause, collider bias is created by conditioning on a common effect. Adding a collider or a descendant of a collider to the adjustment set introduces bias rather than reducing it.

NoteExample: Collider Bias through Selection — Talent and Wealth

Suppose both talent (\(A\)) and wealth (\(B\)) increase the probability of admission to an elite school, so admission (\(S\)) is a collider: \(A \to S \leftarrow B\). In the general population, \(A \indep B\) (the collider blocks the path). Among admitted students — conditioning on \(S\) — lower talent makes higher wealth more likely, and vice versa. This is collider bias: restricting to admitted students is precisely conditioning on the common effect.
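
A quick simulation illustrates the sign flip. The scales and admission threshold below are made up for illustration; the point is only that two independent inputs become negatively correlated once we select on their common effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
talent = rng.normal(size=n)    # A
wealth = rng.normal(size=n)    # B
# Admission is more likely when talent + wealth is high, so S is a collider.
admitted = talent + wealth + rng.normal(size=n) > 1.0

print(np.corrcoef(talent, wealth)[0, 1])                      # approximately 0
print(np.corrcoef(talent[admitted], wealth[admitted])[0, 1])  # clearly negative
```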

2.3.2 Full d-Separation Analysis of the IV DAG

We work through the IV DAG with edges \(Z \to T\), \(T \to Y\), \(U \to T\), \(U \to Y\), with \(U\) unobserved. Because \(U\) cannot be conditioned on, the back-door path \(T \leftarrow U \to Y\) cannot be blocked by adjustment, making the instrument \(Z\) the only route to identification.

Classifying paths by type:

| Endpoints | Path | Type | Why |
|-----------|------|------|-----|
| \(Z \leftrightarrow Y\) | \(Z \to T \to Y\) | Causal | Directed; carries the IV signal |
| \(Z \leftrightarrow Y\) | \(Z \to T \leftarrow U \to Y\) | Associational | Collider at \(T\); activates back-door via \(U\) |
| \(Z \leftrightarrow U\) | \(Z \to T \leftarrow U\) | Associational | Collider at \(T\); dormant unless \(T\) conditioned on |

Question 1. Is \(Z \indep Y\)? No. The directed path \(Z \to T \to Y\) is open: it is a chain whose middle node \(T\) is not conditioned on. The path \(Z \to T \leftarrow U \to Y\) has a collider at \(T\) with \(T \notin \varnothing\) — blocked.

Question 2. Is \(Z \indep Y \mid T\)? No. The causal path \(Z \to T \to Y\) is blocked (conditioning on \(T\) in the chain). But the collider at \(T\) in \(Z \to T \leftarrow U \to Y\) is now opened. Hence \(Z \nindep Y \mid T\).

Question 3. Is \(Z \indep Y \mid \{T, U\}\)? Yes. The causal path is blocked by conditioning on \(T\). The path through \(U\) has the collider at \(T\) opened (by conditioning on \(T\)), but then blocked at \(U\) (by conditioning on \(U\)). Both paths blocked.

Question 4. Is \(Z \indep U\)? Yes. The only path is \(Z \to T \leftarrow U\) with a collider at \(T\). Since \(T \notin \varnothing\) and no descendant of \(T\) is in \(\varnothing\), the path is blocked.

Interpretation. Question 1 establishes relevance: \(Z\) has an open causal path to \(Y\) through \(T\). Question 4 establishes exogeneity: the instrument is independent of unmeasured confounding. Question 3 establishes the exclusion restriction: once \(T\) and \(U\) are held fixed, \(Z\) carries no residual information about \(Y\). Note that this requires observing \(U\), which is unavailable by assumption; the IV strategy exploits the exclusion restriction indirectly through the moment condition \(\E[\varepsilon \cdot Z] = 0\) (Chapter 7). Question 2 is the warning: conditioning on \(T\) alone simultaneously closes the causal channel and opens the confounded channel — worse than no adjustment at all.

The IV DAG encodes the substantive assumptions behind instrument validity graphically, but does not make them testable in any strong sense.

2.4 The Markov Property and Factorization

Up to this point, we have used DAGs qualitatively. We now connect the graph to probability algebra: the same parent structure that governs d-separation also determines how the joint distribution factorizes.

Without structural assumptions, any joint distribution can always be written by repeated conditioning, but that generic representation is too high-dimensional to reveal much structure. A DAG becomes statistically meaningful because, together with the Markov property, it replaces the generic factorization by a sparse one involving only the parents of each node.

TipProposition: Conditional Independence in the Three-Node Chain

Let \(X_1 \to X_2 \to X_3\) be a chain with Markov factorization \(p(x_1, x_2, x_3) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)\). Then \(X_1 \indep X_3 \mid X_2\). (Proved in the chain example above; the fork case is analogous.)

NoteDefinition: Local Markov Property

A distribution \(P\) satisfies the local Markov property with respect to \(\Gcal\) if, for every node \(V_i \in \mathcal{V}\): \[V_i \indep \bigl[\, \Nd(V_i) \setminus \Pa(V_i) \,\bigr] \;\Big|\; \Pa(V_i).\] Once we condition on the direct causes of a node, that node is independent of all variables that are neither its descendants nor its parents.

NoteRemark

The global Markov property is the statement that every d-separation in \(\Gcal\) implies a conditional independence in \(P\) — precisely the content of the Soundness theorem. For DAGs, the local and global Markov properties are equivalent (Lauritzen 1996, Proposition 3.27): the local property implies the global one via the Markov factorization.

An immediate consequence is the Markov factorization: \[p(v_1, \dots, v_k) = \prod_{i=1}^{k} p\!\left(v_i \mid \Pa(v_i)\right). \tag{2.1}\]

NoteExample: Markov Factorization in the Education DAG

For the DAG \(N \to E\), \(B \to E\), \(B \to Y\), \(E \to Y\): \(p(n, b, e, y) = p(n)\,p(b)\,p(e \mid n, b)\,p(y \mid e, b)\). A regression of \(Y\) on \(E\) alone is confounded by \(B\). Conditioning on \(B\) blocks the back-door path \(E \leftarrow B \to Y\) and identifies \(P(y \mid \doop(E{=}e))\) via the back-door formula (Chapter 3).

NoteExample: Markov Factorization in the IV DAG

Including the latent \(U\): \(p(z, t, y, u) = p(z)\,p(u)\,p(t \mid z, u)\,p(y \mid t, u)\). The observable factorization is obtained by marginalizing over \(U\): \[p(z, t, y) = \int p(z)\,p(u)\,p(t \mid z, u)\,p(y \mid t, u)\,\mathrm{d}u.\] This cannot be simplified to \(p(z)\,p(t\mid z)\,p(y\mid t)\) when \(U\) is a common cause of \(T\) and \(Y\) — this is precisely the endogeneity problem.
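
A discrete toy version makes the point concrete. The sketch below uses made-up binary tables, builds the latent joint from the factorization, marginalizes \(U\), and confirms that \(p(y \mid t, z)\) still varies with \(z\) — which would be impossible if \(p(z, t, y)\) factored as \(p(z)\,p(t \mid z)\,p(y \mid t)\).

```python
import numpy as np

rng = np.random.default_rng(3)

# Binary tables: p(z), p(u), p(t | z, u), p(y | t, u).
pz = rng.dirichlet(np.ones(2))
pu = rng.dirichlet(np.ones(2))
pt_zu = rng.dirichlet(np.ones(2), size=(2, 2))   # indexed [z, u, t]
py_tu = rng.dirichlet(np.ones(2), size=(2, 2))   # indexed [t, u, y]

# Latent joint p(z, u, t, y), then marginalize over u.
joint = np.einsum("z,u,zut,tuy->zuty", pz, pu, pt_zu, py_tu)
pzty = joint.sum(axis=1)                         # observable p(z, t, y)

# If p(z, t, y) = p(z) p(t|z) p(y|t), then p(y | t, z) would not vary with z.
py_given_tz = pzty / pzty.sum(axis=2, keepdims=True)
print(py_given_tz[0, :, 1])   # p(y=1 | t, z=0)
print(py_given_tz[1, :, 1])   # p(y=1 | t, z=1): differs, since U confounds T and Y
```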

2.5 Worked Example: The Education–Earnings DAG

This section is the template for how we will use DAGs throughout the course: specify the DAG, read off the factorization, use d-separation to identify the implied conditional independences, and interpret in causal terms.

Practice. Before working through the example below, return to the eight-node DAG example and rework each of the five queries from scratch, following three steps: (1) list every path; (2) classify every intermediate node as chain, fork, or collider; (3) determine which paths are blocked or open.

The causal story. Education (\(E\)) affects earnings (\(Y\)); family background (\(B\)) is a common cause of both; neighborhood (\(N\)) affects education but has no direct effect on earnings. All four variables are observed (no latent confounders).

Step 1 — Markov factorization. \(\Pa(N) = \Pa(B) = \varnothing\), \(\Pa(E) = \{N, B\}\), \(\Pa(Y) = \{E, B\}\): \[p(n, b, e, y) = p(n)\,p(b)\,p(e \mid n, b)\,p(y \mid e, b).\]

Step 2 — d-Separation. There are two paths between \(N\) and \(Y\):

  • Path 1: \(N \to E \to Y\) (a chain through \(E\)).
  • Path 2: \(N \to E \leftarrow B \to Y\) (a collider at \(E\), followed by the fork leg \(B \to Y\)).

(i) Is \((N \indep Y)_{\Gcal}\)? Path 1 is a chain with nothing conditioned on, so it is open. Path 2 has a collider at \(E\) (not conditioned on), so it is blocked. One open path: \(N \nindep Y\).

(ii) Is \((N \indep Y \mid E)_{\Gcal}\)? Path 1 is blocked (conditioning on \(E\) in the chain). Path 2 has a collider at \(E\) opened by conditioning, and the activated path continues through \(B\) (not conditioned on), so it is open. Hence \(N \nindep Y \mid E\) — a collider trap: conditioning on \(E\) introduces bias rather than removing it.

(iii) Is \((N \indep Y \mid \{E, B\})_{\Gcal}\)? Path 1 is blocked by \(E\). Path 2 is opened at collider \(E\) but then blocked at \(B\). All paths blocked: \((N \indep Y \mid E, B)_{\Gcal}\).

(iv) Is \((N \indep B)_{\Gcal}\)? The only path \(N \to E \leftarrow B\) has a collider at \(E\) (not conditioned on). Path is blocked: \(N \indep B\).

Step 3 — Conditional Independence. By the Soundness theorem: \[N \indep B, \qquad N \indep Y \mid E, B, \qquad N \nindep Y, \qquad N \nindep Y \mid E.\] The statement \(N \nindep Y \mid E\) is an important warning: conditioning only on education when studying the neighborhood–earnings association introduces bias through the activated collider path.

Step 4 — Identification. We wish to identify \(P(y \mid \doop(E{=}e))\). The only back-door path is \(E \leftarrow B \to Y\) (a fork at \(B\)).

  • \(\mathbf{S} = \{B\}\): blocks the fork \(E \leftarrow B \to Y\), and \(B\) is not a descendant of \(E\). Valid.
  • \(\mathbf{S} = \{N\}\): does not block \(E \leftarrow B \to Y\). Invalid.

With \(\mathbf{S} = \{B\}\), the back-door formula (Chapter 3) gives: \[P(y \mid \doop(E{=}e)) = \sum_{b} P(y \mid e, b)\,P(b),\] identifying the causal effect entirely from observational data.
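
As a numerical sanity check, the sketch below (made-up binary tables; not the chapter's formal derivation) builds the joint from the Step 1 factorization, computes the interventional truth via the truncated factorization, and compares the back-door adjusted quantity with the naive conditional.

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up binary tables following the Step 1 factorization.
pn = rng.dirichlet(np.ones(2))                   # p(n)
pb = rng.dirichlet(np.ones(2))                   # p(b)
pe_nb = rng.dirichlet(np.ones(2), size=(2, 2))   # p(e | n, b), indexed [n, b, e]
py_eb = rng.dirichlet(np.ones(2), size=(2, 2))   # p(y | e, b), indexed [e, b, y]

joint = np.einsum("n,b,nbe,eby->nbey", pn, pb, pe_nb, py_eb)

# Interventional truth via the truncated factorization: delete p(e | n, b),
# fix E = e, sum out n and b:  P(y=1 | do(e)) = sum_b p(b) p(y=1 | e, b).
do_truth = np.einsum("b,eby->ey", pb, py_eb)[:, 1]

# Back-door adjustment computed from the *observational* joint alone.
p_bey = joint.sum(axis=0)                        # p(b, e, y)
p_be = p_bey.sum(axis=2)                         # p(b, e)
p_b = p_be.sum(axis=1)                           # p(b)
py_given_eb = p_bey / p_be[:, :, None]           # P(y | e, b), indexed [b, e, y]
adjusted = np.einsum("b,bey->ey", p_b, py_given_eb)[:, 1]

# Naive conditional P(y=1 | e), confounded through B.
naive = p_bey.sum(axis=0)[:, 1] / p_be.sum(axis=0)

print(do_truth)    # interventional truth
print(adjusted)    # matches the truth: the back-door formula works
print(naive)       # differs: the E <- B -> Y path is open
```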

2.6 The Big Picture

The central logic runs from a qualitative causal graph to an identification formula:

\[\text{structural assumptions} \Longrightarrow \text{DAG} \Longrightarrow \text{d-separation relations} \Longrightarrow \text{conditional independences (Markov)} \Longrightarrow \text{identification formulas}.\]

Chapter 2 establishes the first four links in this chain. Chapters 3 and beyond use the same graphical machinery to derive specific identification results: back-door adjustment, front-door adjustment, and do-calculus formulas.

2.7 Summary

  1. DAGs as causal structure. A DAG encodes which variables are causally connected, which are potential confounders, and which variables may block or open paths.

  2. Three-node motifs. Every path is built from chains, forks, and colliders. Chains and forks are open by default and are blocked by conditioning on the middle node. Colliders are blocked by default and are opened by conditioning on the collider or on one of its descendants.

  3. d-Separation and conditional independence. \(X\) and \(Y\) are d-separated by \(\mathbf{S}\) exactly when every path between them is blocked by \(\mathbf{S}\). Under the Markov property, d-separation implies the corresponding conditional independence. Some implications may be assessed empirically, but assumptions involving unobserved variables are generally not testable.

  4. Markov factorization. The local Markov property yields \(p(v_1,\dots,v_k) = \prod_{i=1}^{k} p(v_i \mid \Pa(v_i))\), the bridge from graphical structure to probability calculus.

  5. Collider bias. Conditioning on a collider — or on a descendant of a collider — can induce a spurious association between otherwise independent variables (Berkson 1946). This is the opposite of confounding adjustment.

  6. A practical workflow. To analyze a DAG: identify the graph structure, determine which paths are open or blocked, translate d-separation statements into conditional independences under the Markov property, and interpret in light of the causal question.

2.8 Problems

1. Warm-up: a single collider. Consider the DAG \(X \to Y \leftarrow Z\).

  1. Identify the structural role of \(Y\) on the path \(X \to Y \leftarrow Z\).
  2. Is \(X \indep Z\)? Apply the d-separation criterion and state which blocking rule applies.
  3. Is \(X \indep Z \mid Y\)? Explain what happens to the path when \(Y\) is conditioned on, and describe in one sentence the real-world phenomenon this illustrates.

2. d-Separation practice. Consider the DAG: \(A \to B \to D\), \(A \to C \to D\), \(B \to E\), \(C \to E\).

  1. List all paths between \(A\) and \(E\). (Hint: there are four paths; two pass through \(D\).)
  2. For each path, identify the role (chain, fork, collider) of each intermediate node.
  3. Does \(\{B, C\}\) d-separate \(A\) and \(E\)?
  4. Does \(\{D\}\) d-separate \(B\) and \(C\)? What type of node is \(D\) on the path \(B \to D \leftarrow C\)?

3. Berkson’s bias. Suppose \(X\) and \(Y\) are independent standard normal variables, and let \(S = \mathbf{1}[X + Y > 0]\).

  1. Verify analytically that \(\mathrm{Cov}(X, Y \mid S{=}1) < 0\).
  2. Draw the DAG for \((X, Y, S)\) and identify \(S\) as a collider.
  3. Explain in one sentence why restricting the analysis to the subsample with \(S=1\) biases estimates of any association between \(X\) and \(Y\).

4. Markov factorization and collider activation. Consider the DAG with edges \(A \to E\), \(A \to W\), \(F \to E\), \(E \to W\), where \(A\) = ability, \(F\) = family income, \(E\) = education, \(W\) = wages.

  1. Write down the Markov factorization \(p(a, f, e, w)\).
  2. Is \((F \indep W)_{\Gcal}\)? List all paths between \(F\) and \(W\) and determine which are open.
  3. Is \((F \indep W \mid E)_{\Gcal}\)? Identify the role of \(E\) on each path.
  4. A researcher regresses \(W\) on \(E\) and \(F\), omitting \(A\). Is the coefficient on \(E\) a causal effect? Explain using the graph.

5. Soundness theorem for the collider. Consider the collider \(A \to M \leftarrow B\) with factorization \(p(a, m, b) = p(a)\,p(b)\,p(m \mid a, b)\).

  1. By marginalizing over \(M\), show that \(A \indep B\) in the joint distribution. This verifies the Soundness theorem for \(\mathbf{S} = \varnothing\).
  2. Show that conditioning on \(M\) breaks this independence: write out \(p(a, b \mid m)\) and explain why it does not factorize into \(p(a \mid m)\,p(b \mid m)\) in general.
  3. Explain in one sentence why parts (a) and (b) together are consistent with the Soundness theorem. (Hint: the theorem is a one-directional statement.)

6. Terminology check. Consider the DAG with edges \(U \to X\), \(U \to Z\), \(X \to W\), \(Z \to W\), \(W \to Y\).

  1. Identify the parents, children, ancestors, descendants, and non-descendants of node \(W\).
  2. List all pairs of adjacent nodes.
  3. Which pairs of nodes are connected by a directed path? List every such pair and the corresponding path.
  4. Write the Markov factorization \(p(u, x, z, w, y)\).

7. Markov factorization and local Markov property. Consider the DAG with edges \(X_1 \to X_2\), \(X_1 \to X_3\), \(X_2 \to X_4\), \(X_3 \to X_4\).

  1. Write the joint density \(p(x_1, x_2, x_3, x_4)\) implied by the Markov factorization.
  2. State the local Markov property for each of the four nodes.
  3. Is \((X_2 \indep X_3)_{\Gcal}\)? Is \((X_2 \indep X_3 \mid X_1)_{\Gcal}\)? Justify each answer by listing all paths.

8. Toy proof: conditional independence in the fork. Consider the fork \(X_1 \leftarrow X_2 \to X_3\) with factorization \(p(x_1, x_2, x_3) = p(x_2)\,p(x_1 \mid x_2)\,p(x_3 \mid x_2)\).

  1. Show directly, by conditioning on \(X_2 = x_2\), that \(X_1 \indep X_3 \mid X_2\).
  2. Is \(X_1 \indep X_3\) marginally? Justify both graphically and algebraically.
  3. Explain in one sentence what the fork represents substantively and why conditioning on the common cause removes the association.

9. d-Separation in the practice DAG. Refer to the eight-node DAG in the Practice DAG example (nodes \(W\), \(X_1\), \(T\), \(M_1\), \(M_2\), \(Y\), \(C\), \(D\); edges \(W \to X_1\), \(W \to T\), \(W \to Y\), \(T \to M_1\), \(M_1 \to Y\), \(M_1 \to M_2\), \(X_1 \to C\), \(M_2 \to C\), \(C \to D\)). For each query, state whether it is true or false and justify by listing all relevant paths.

  1. \(X_1 \indep Y\)
  2. \(X_1 \indep Y \mid W\)
  3. \(T \indep M_2 \mid M_1\)
  4. Does conditioning on \(C\) induce collider bias between \(X_1\) and \(M_2\)? Identify the specific path opened.
  5. Does conditioning on \(D\) (a descendant of \(C\)) open the path \(X_1 \to C \leftarrow M_2\)? State which clause of the d-blocking definition applies.