Machines are increasingly doing “intelligent” things: Facebook recognizes faces in photos, Siri understands voices, and Google translates websites. The fundamental insight behind these breakthroughs is as much statis- tical as computational. Face recognition algorithms, for example, use a large dataset of photos labeled as having a face or not to estimate a function f(x) that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that clarifies its place in the econometric toolbox. Machine learning not only provides new tools, it solves a specific problem. Machine learning revolves around prediction on new sample points from the same distribution, while many economic applications revolve around parameter estimation and counterfactual prognosis. So applying machine learning to economics requires finding relevant prediction tasks.
In this paper we present tools for applied researchers that re-purpose off-the-shelf methods from the computer-science field of machine learning to create a "discovery engine" for data from randomized controlled trials (RCTs). The applied problem we seek to solve is that economists invest vast resources into carrying out RCTs, including the collection of a rich set of candidate outcome measures. But given concerns about inference in the presence of multiple testing, economists usually wind up exploring just a small subset of the hypotheses that the available data could be used to test. This prevents us from extracting as much information as possible from each RCT, which in turn impairs our ability to develop new theories or strengthen the design of policy interventions. Our proposed solution combines the basic intuition of reverse regression, where the dependent variable of interest now becomes treatment assignment itself, with methods from machine learning that use the data themselves to flexibly identify whether there is any function of the outcomes that predicts (or has signal about) treatment group status. This leads to correctly-sized tests with appropriate p-values, which also have the important virtue of being easy to implement in practice. One open challenge that remains with our work is how to meaningfully interpret the signal that these methods find.
Nearest-neighbor matching is a popular nonparametric tool to create balance between treatment and control groups in observational studies. As a preprocessing step before regression, matching reduces the dependence on parametric modeling assumptions. In current empirical practice, however, the matching step is often ignored in the calculation of standard errors and confidence intervals. In this article, we show that ignoring the matching step results in asymptotically valid standard errors if matching is done without replacement and the regression model is correctly specified relative to the population regression function of the outcome variable on the treatment variable and all the covariates used for matching. However, standard errors that ignore the matching step are not valid if matching is conducted with replacement or, more crucially, if the second step regression model is misspecified in the sense indicated above. Moreover, correct specification of the regression model is not required for consistent estimation of treatment effects with matched data. We show that two easily implementable alternatives produce approximations to the distribution of the post-matching estimator that are robust to misspecification. A simulation study and an empirical example demonstrate the empirical relevance of our results.
Econometric analysis typically focuses on the statistical properties of fixed estimators and ignores researcher choices. In this article, I approach the analysis of experimental data as a mechanism-design problem that acknowledges that researchers choose between estimators, sometimes based on the data and often according to their own preferences. Specifically, I focus on covariate adjustments, which can increase the precision of a treatment-effect estimate, but open the door to bias when researchers engage in specification searches. First, I establish that unbiasedness is a requirement on the estimation of the average treatment effect that aligns researchers’ preferences with the minimization of the mean-squared error relative to the truth, and that fixing the bias can yield an optimal restriction in a minimax sense. Second, I provide a constructive characterization of all treatment-effect estimators with fixed bias as sample-splitting procedures. Third, I show how these results imply flexible pre-analysis plans for randomized experiments that include beneficial specification searches and offer an opportunity to leverage machine learning.
The two-stage least-squares (2SLS) estimator is known to be biased when its first-stage fit is poor. I show that better first-stage prediction can alleviate this bias. In a two-stage linear regression model with Normal noise, I consider shrinkage in the estimation of the first-stage instrumental variable coefficients. For at least four instrumental variables and a single endogenous regressor, I establish that the standard 2SLS estimator is dominated with respect to bias. The dominating IV estimator applies James–Stein type shrinkage in a first-stage high-dimensional Normal-means problem followed by a control-function approach in the second stage. It preserves invariances of the structural instrumental variable equations.
Shrinkage estimation usually reduces variance at the cost of bias. But when we care only about some parameters of a model, I show that we can reduce variance without incurring bias if we have additional information about the distribution of covariates. In a linear regression model with homoscedastic Normal noise, I consider shrinkage estimation of the nuisance parameters associated with control variables. For at least three control variables and exogenous treatment, I establish that the standard least-squares estimator is dominated with respect to squared-error loss in the treatment effect even among unbiased estimators and even when the target parameter is low-dimensional. I construct the dominating estimator by a variant of James–Stein shrinkage in a high-dimensional Normal-means problem. It can be interpreted as an invariant generalized Bayes estimator with an uninformative (improper) Jeffreys prior in the target parameter.