Theory and Methods in Causal Inference
Preface
These lecture notes accompany the graduate course STAT 5900B: Theory and Methods in Causal Inference at the Department of Statistics, Iowa State University.
The course targets first-year PhD students in statistics. It assumes familiarity with probability theory, mathematical statistics, and linear models at the level of a graduate sequence, but no prior exposure to causal inference.
Organization
The notes are organized in three parts, with a fourth part on policy learning deferred to a later release.
Part I – Foundations (Chapters 1–3) builds the graphical and conceptual foundation of causal inference. Chapter 1 introduces the three languages of causality – structural equation models (SEMs), directed acyclic graphs (DAGs), and potential outcomes – and explains how they relate. Chapter 2 develops DAGs and d-separation as tools for reading conditional independence off causal structure. Chapter 3 introduces the do-calculus and the main identification criteria (back-door, front-door, do-calculus rules) that translate causal queries into statistical ones.
Part II – Identification (Chapters 4–9) bridges the graphical framework and observed data. Chapter 4 connects the potential outcomes framework to the do-calculus and establishes the consistency, exchangeability, and positivity assumptions. Chapter 5 covers randomization and the back-door adjustment formula. Chapter 6 develops propensity score methods – weighting, matching, and the balancing property. Chapter 7 develops instrumental variables: the Wald estimand, the local average treatment effect (LATE) for compliers, and the three IV assumptions. Chapter 8 covers mediation analysis and the front-door criterion, distinguishing the controlled direct effect from the natural direct and indirect effects. Chapter 9 addresses sensitivity analysis and partial identification: the three canonical sensitivity models (Rosenbaum \(\Gamma\), E-value, marginal sensitivity model), Manski's no-assumption bounds, and the connection between sensitivity analysis and modern estimation methods.
Part III – Estimation (Chapters 10–13) covers semiparametric efficiency theory and its modern applications. Chapter 10 introduces the estimation framework – estimating equations, asymptotic linearity, influence functions, Z-estimation with nuisance parameters, and the semiparametric efficiency bound – that underlies the rest of Part III. Chapter 11 develops the augmented inverse probability weighted (AIPW) estimator from three complementary perspectives – bias correction, optimal augmentation, and the efficient influence function – and establishes double robustness and semiparametric efficiency. Chapter 12 addresses flexible nuisance estimation with orthogonal scores and cross-fitting, culminating in the double machine learning (DML) estimator and its rate conditions. Chapter 13 covers estimation under instrumental variables: the Wald estimator, two-stage least squares, GMM and overidentification tests, weak instruments, generalized empirical likelihood, and the control function approach.
Part IV – Policy Learning and Sequential Decision Making (Chapters 14–16, deferred) will cover policy evaluation, optimal policy learning, and dynamic treatment regimes.
Appendices
Appendix A provides graphical intuition for conditional independence and d-separation. Appendix B introduces Single World Intervention Graphs (SWIGs). Appendix C develops the formal foundations of pathwise differentiability and efficient influence functions for readers who want the semiparametric geometry underlying Chapter 10.
Notation
Throughout these notes:
- \(\doop(T = t)\) denotes the do-operator (intervention).
- \(Y(t)\) denotes the potential outcome under \(T = t\).
- \(\E[\,\cdot\,]\) denotes expectation; \(\indep\) denotes conditional independence.
- \(\mathcal{G}\) denotes a DAG; \(\mathcal{G}_{\overline{T}}\) denotes the mutilated graph after intervening on \(T\).
- \(\pi(x) = P(T=1 \mid X=x)\) denotes the propensity score.
- \(\mu_t(x) = \E(Y \mid T=t, X=x)\) denotes the outcome regression.
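As a concrete illustration of the last two quantities, the following minimal sketch estimates \(\pi(x)\) and \(\mu_t(x)\) nonparametrically on simulated data with a single binary confounder, then combines them via back-door standardization (covered in Chapters 5–6). All variable names and the data-generating process are hypothetical, chosen only to make the notation tangible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated data (hypothetical): one binary confounder X,
# treatment T depends on X, outcome Y depends on T and X.
X = rng.binomial(1, 0.5, size=n)
T = rng.binomial(1, np.where(X == 1, 0.7, 0.3))
Y = 2.0 * T + 1.5 * X + rng.normal(size=n)

# Propensity score pi(x) = P(T = 1 | X = x), estimated within strata of X.
pi_hat = {x: T[X == x].mean() for x in (0, 1)}

# Outcome regression mu_t(x) = E(Y | T = t, X = x), estimated cell by cell.
mu_hat = {(t, x): Y[(T == t) & (X == x)].mean()
          for t in (0, 1) for x in (0, 1)}

# Back-door (standardization) estimate of E[Y(1)] - E[Y(0)]:
# average mu_1(X) - mu_0(X) over the empirical distribution of X.
ate_hat = np.mean([mu_hat[(1, x)] - mu_hat[(0, x)] for x in X])
print(round(ate_hat, 2))  # should be close to the true effect of 2.0
```

With a continuous or high-dimensional \(X\), the stratum-by-stratum means above would be replaced by regression models for \(\pi(x)\) and \(\mu_t(x)\), which is exactly where the estimation theory of Part III enters.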
Acknowledgements
I would like to thank Professors Chan Park, Shu Yang, and Dylan Small for their constructive comments on an earlier version of these lecture notes.