Causal inference with random forests
Abstract
Random forests, introduced by Breiman [2001], have become one of the most popular machine learning algorithms among practitioners and reliably achieve good predictive performance across several application areas. This has led to considerable interest in using random forests for doing science, that is, for drawing statistical inferences in problems that do not reduce immediately to prediction. As a step in this direction, this thesis studies how random forests can be used to understand treatment effect heterogeneity as it may arise in, e.g., personalized medicine. Our main contributions are as follows:
- We develop a causal forest algorithm for heterogeneous treatment effect estimation, and find our method to be substantially more powerful at identifying treatment heterogeneity than traditional methods based on nearest-neighbor matching, especially when the number of covariates considered is large.
- We provide an asymptotic statistical analysis of causal forests and prove a Gaussian limit result. We then propose a practical method for estimating the noise scale of causal forests, thus allowing for valid statistical inference with causal forests.
- In a high-dimensional regime where the problem complexity and the number of observations jointly approach infinity, we identify the signal strength at which tree-based methods become able to accurately detect treatment heterogeneity. Perhaps strikingly, we find that the required signal strength scales only logarithmically in the dimension of the problem.
Taken together, these results show that random forests, despite often being understood as a mere black-box predictive algorithm, provide a powerful toolbox for heterogeneous treatment effect estimation in modern large-scale problems.
Description
Type of resource | text |
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2016 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Wager, Stefan |
Associated with | Stanford University, Department of Statistics. |
Primary advisor | Efron, Bradley |
Primary advisor | Walther, Guenther |
Thesis advisor | Efron, Bradley |
Thesis advisor | Walther, Guenther |
Thesis advisor | Hastie, Trevor |
Advisor | Hastie, Trevor |
Bibliographic information
Statement of responsibility | Stefan Wager. |
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2016. |
Location | electronic resource |
Access conditions
Copyright © 2016 by Stefan De Treville Wager