Causal inference with random forests

Abstract Random forests, introduced by Breiman [2001], have become one of the most popular machine learning algorithms among practitioners, and reliably achieve good predictive performance across several application areas. This has led to considerable interest in using random forests for doing science, or drawing statistical inferences in problems that do not reduce immediately to prediction. As a step in this direction, this thesis studies how random forests can be used for understanding treatment effect heterogeneity as it may arise in, e.g., personalized medicine. Our main contributions are as follows: - We develop a causal forest algorithm for heterogeneous treatment effect estimation, and find our method to be substantially more powerful at identifying treatment heterogeneity than traditional methods based on nearest-neighbor matching, especially when the number of considered covariates is large. - We provide an asymptotic statistical analysis of causal forests, and prove a Gaussian limit result. We then propose a practical method for estimating the noise scale of causal forests, thus allowing for valid statistical inference with causal forests. - In a high-dimensional regime where the problem complexity and the number of observations jointly approach infinity, we identify the signal strength at which tree-based methods become able to accurately detect treatment heterogeneity. Perhaps strikingly, we find that the required signal strength only scales logarithmically in the dimension of the problem. Taken together, these results show that random forests -- despite often being understood as a mere black box predictive algorithm -- provide a powerful toolbox for heterogeneous treatment effect estimation in modern large-scale problems.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2016
Issuance monographic
Language English

Creators/Contributors

Associated with Wager, Stefan
Associated with Stanford University, Department of Statistics.
Primary advisor Efron, Bradley
Primary advisor Walther, Guenther
Thesis advisor Efron, Bradley
Thesis advisor Walther, Guenther
Thesis advisor Hastie, Trevor
Advisor Hastie, Trevor

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Stefan Wager.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2016.
Location electronic resource

Access conditions

Copyright © 2016 by Stefan De Treville Wager

Versions

Version 1 May 8, 2024 You are viewing this version | Copy URL