Standard approaches for causal inference in difference-in-differences and event-study designs are valid only under the assumption of parallel trends. Researchers are typically unsure whether the parallel trends assumption holds, and therefore gauge its plausibility by testing for pre-treatment differences in trends ("pre-trends") between the treated and untreated groups. This paper proposes robust inference methods that do not require that the parallel trends assumption holds exactly. Instead, we impose restrictions on the set of possible violations of parallel trends that formalize the logic motivating pre-trends testing: namely, that the pre-trends are informative about what would have happened under the counterfactual. Under a wide class of restrictions on the possible differences in trends, the parameter of interest is set-identified and inference on the treatment effect is equivalent to testing a set of moment inequalities with linear nuisance parameters. We derive computationally tractable confidence sets that are uniformly valid ("honest") so long as the difference in trends satisfies the imposed restrictions. Our proposed confidence sets are consistent, and have optimal local asymptotic power for many parameter configurations. We also introduce fixed-length confidence intervals, which can offer finite-sample improvements for a subset of the cases we consider. We recommend that researchers conduct sensitivity analyses to show what conclusions can be drawn under various restrictions on the set of possible differences in trends. We conduct a simulation study and illustrate our recommended approach with applications to two recently published papers.
We provide an R package, HonestDiD, that implements our methods.
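The kind of sensitivity analysis recommended in this abstract can be sketched in a deliberately simplified setting. The Python sketch below is not the HonestDiD interface and does not compute the paper's confidence sets. It assumes a single pre-period coefficient and a single post-period coefficient, and it assumes one illustrative restriction: the post-period violation of parallel trends is at most `m_bar` times the magnitude of the pre-period violation. It also ignores sampling uncertainty, so the output is a plug-in identified set, not an honest confidence set. All function and parameter names are hypothetical.

```python
def identified_set(beta_pre, beta_post, m_bar):
    """Plug-in bounds on the treatment effect under an assumed restriction.

    Assumption (illustrative only): the post-period difference in trends
    delta_post satisfies |delta_post| <= m_bar * |beta_pre|, where beta_pre
    is the pre-period event-study coefficient. Since
    beta_post = tau + delta_post, tau lies within m_bar * |beta_pre| of
    beta_post. Sampling uncertainty is ignored.
    """
    if m_bar < 0:
        raise ValueError("m_bar must be non-negative")
    half_width = m_bar * abs(beta_pre)
    return (beta_post - half_width, beta_post + half_width)


def breakdown_value(beta_pre, beta_post):
    """Smallest m_bar at which the plug-in identified set contains zero."""
    if beta_pre == 0:
        return float("inf")
    return abs(beta_post) / abs(beta_pre)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to illustrate the calculation.
    beta_pre, beta_post = 0.25, 1.0
    for m_bar in (0.0, 1.0, 2.0, 4.0):
        lower, upper = identified_set(beta_pre, beta_post, m_bar)
        print(f"m_bar={m_bar}: tau in [{lower}, {upper}]")
    print("breakdown value:", breakdown_value(beta_pre, beta_post))
```

Reporting the bounds over a grid of `m_bar` values, together with the value at which the sign of the effect is no longer determined, shows readers how strong a restriction is needed to support a given conclusion.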
Tests for pre-existing trends ("pre-trends") are a common way of assessing the plausibility of the parallel trends assumption in difference-in-differences and related research designs. This paper highlights some important limitations of pre-trends testing. From a theoretical perspective, I analyze the distribution of conventional estimates and confidence intervals conditional on surviving a pre-test for pre-trends. I show that in non-pathological cases, the bias of conventional estimates conditional on passing a pre-test can be worse than the unconditional bias. Thus, pre-tests meant to mitigate bias and coverage issues in published work can in fact exacerbate them. I empirically investigate the practical relevance of these concerns in simulations based on a systematic review of recent papers in leading economics journals. I find that conventional pre-tests are often underpowered against plausible violations of parallel trends that produce bias similar in magnitude to the estimated treatment effect. Distortions from pre-testing can also be substantial. Finally, I discuss alternative approaches that can improve upon the standard practice of relying on pre-trends testing.
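The claim that conditioning on a passed pre-test can worsen bias can be illustrated with a small Monte Carlo sketch. It is a stylized example, not the paper's simulation design. It assumes one pre-period and one post-period coefficient that are jointly normal with equal standard errors `sigma` and positive correlation `rho`, a linear violation of parallel trends with slope `gamma`, and a true effect `tau`. The pre-test passes when the pre-period coefficient is insignificant at the 5% level. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_pretest_bias(gamma=0.2, tau=0.0, sigma=0.1, rho=0.5,
                          n_sims=100_000, seed=0):
    """Compare unconditional bias with bias conditional on passing a pre-test.

    Assumed model (illustrative only):
      pre  ~ N(-gamma,       sigma^2)   pre-period coefficient
      post ~ N(tau + gamma,  sigma^2)   post-period coefficient
      corr(pre, post) = rho
    """
    rng = random.Random(seed)
    crit = 1.96 * sigma  # two-sided 5% test on the pre-period coefficient
    sum_all = 0.0
    sum_pass = 0.0
    n_pass = 0
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pre = -gamma + sigma * z1
        # Build the post-period draw so that it has correlation rho with pre.
        post = tau + gamma + sigma * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        sum_all += post
        if abs(pre) < crit:  # pre-trend is "insignificant": pre-test passes
            sum_pass += post
            n_pass += 1
    return {
        "unconditional_bias": sum_all / n_sims - tau,
        "conditional_bias": sum_pass / n_pass - tau,
        "pass_rate": n_pass / n_sims,
    }


if __name__ == "__main__":
    for key, value in simulate_pretest_bias().items():
        print(f"{key}: {value:.3f}")
```

In this setup, passing the pre-test selects draws in which the noise pushed the pre-period coefficient toward zero. Because the two coefficients are positively correlated, the same noise pushes the post-period estimate further from the truth, so the conditional bias exceeds the unconditional bias.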
We consider inference based on linear conditional moment inequalities, which arise in a wide variety of economic applications, including many structural models. We show that linear conditional structure greatly simplifies confidence set construction, allowing for computationally tractable projection inference in settings with nuisance parameters. Next, we derive least favorable critical values that avoid conservativeness due to projection. Finally, we introduce a conditional inference approach which ensures a strong form of insensitivity to slack moments, as well as a hybrid technique which combines the least favorable and conditional methods. Our conditional and hybrid approaches are new even in settings without nuisance parameters. We find good performance in simulations based on Wollmann (2018), especially for the hybrid approach.
This paper studies teacher attrition in Wisconsin following Act 10, a policy change that severely weakened teachers’ unions and capped wage growth for teachers. I document a sharp short-run increase in teacher turnover after the Act was passed, driven almost entirely by teachers over the minimum retirement age of 55, whose turnover rate doubled from 17 to 35 percent. Such teachers faced strong incentives to retire before the end of pre-existing collective bargaining agreements in order to secure collectively bargained retirement benefits (e.g., healthcare), which no longer fell under the scope of collective bargaining after the Act. I find much more modest long-run increases in teacher turnover, consistent with previous estimates of labor supply elasticities. I then attempt to evaluate the effect of the wave of retirements following Act 10 on education quality using grade-level value-added metrics. I find suggestive evidence that student academic performance increased in grades with teachers who retired following the reform, and I obtain similar results when instrumenting for retirement using the pre-existing age distribution of teachers. Differences in value-added between retirees and their replacements can potentially explain some, but not all, of the observed academic improvements.
We evaluate the folk wisdom that algorithmic decision rules trained on data produced by biased human decision-makers necessarily reflect this bias. We consider a setting where training labels are only generated if a biased decision-maker takes a particular action, and so "biased" training data arise due to discriminatory selection into the training data. In our baseline model, the more biased the decision-maker is against a group, the more the algorithmic decision rule favors that group. We refer to this phenomenon as bias reversal. We then clarify the conditions that give rise to bias reversal. Whether a prediction algorithm reverses or inherits bias depends critically on how the decision-maker affects the training data as well as the label used in training. We illustrate our main theoretical results in a simulation study applied to the New York City Stop, Question and Frisk dataset.
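One way selective labeling can generate bias reversal is sketched below in a toy simulation. It is a stylized illustration, not the paper's model and not the Stop, Question and Frisk data. It assumes that each person has a latent risk drawn uniformly from [0, 1], that the decision-maker observes this risk and stops anyone above a group-specific threshold, that the outcome label is recorded only for stopped individuals, and that the algorithm sees only group membership. A decision-maker who is more biased against a group is modeled as using a lower threshold for that group. All names and values are hypothetical.

```python
import random


def predicted_hit_rate(threshold, n=100_000, seed=0):
    """Average label among stopped individuals for a given stop threshold.

    This is what a predictor trained only on stopped individuals, using
    group membership as its sole feature, would predict for the group.
    In this toy model the population value is (1 + threshold) / 2.
    Requires 0 <= threshold < 1 so that some individuals are stopped.
    """
    rng = random.Random(seed)
    stops = 0
    hits = 0
    for _ in range(n):
        risk = rng.random()          # latent risk, seen by the decision-maker
        if risk > threshold:         # stopped, so a label is generated
            stops += 1
            if rng.random() < risk:  # outcome occurs with probability = risk
                hits += 1
    return hits / stops


if __name__ == "__main__":
    # Same risk distribution in every case; only the decision-maker's
    # threshold for the group changes (lower threshold = more bias against).
    for threshold in (0.5, 0.3, 0.1):
        rate = predicted_hit_rate(threshold)
        print(f"threshold={threshold}: predicted risk = {rate:.3f}")
```

Lowering the threshold for a group adds lower-risk individuals to the training data, so the predicted risk for that group falls even though the underlying risk distribution is unchanged. Whether this reversal occurs in other settings depends on how selection and the training label are specified, as the abstract notes.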