Working Paper
Page, L., Feller, A., Grindal, T., Miratrix, L., & Sommers, M.-A. (Working Paper). Principal stratification: A tool for understanding variation in program effects across endogenous subgroups.

Increasingly, researchers are interested in questions regarding treatment-effect variation across partially or fully latent subgroups defined not by pre-treatment characteristics but by post-randomization actions. One promising approach to address such questions is principal stratification. Under this framework, a researcher defines endogenous subgroups, or principal strata, based on post-randomization behaviors under both the observed and counterfactual experimental conditions. This paper provides a non-technical primer to principal stratification. We review selected applications to highlight the breadth of substantive questions and methodological issues that this method can inform. We then discuss its relationship to instrumental variables analysis to address binary non-compliance in an experimental context and highlight how the framework can be generalized to handle more complex post-treatment patterns. We emphasize the counterfactual logic fundamental to principal stratification and the key assumptions that render analytic challenges more tractable. We briefly discuss technical aspects of estimation procedures, providing a short guide for interested readers.

Under review.

Barnes, L., Feller, A., Haselswerdt, J., & Porter, E. (Working Paper). Information and preferences over redistributive policy: A field experiment.

This paper presents results from a field experiment conducted during the rollout of the British government’s “taxpayer statements” in the fall of 2014. These statements present all taxpayers individualized information about the uses of their tax money. By leveraging information provided to us by the British government about the distribution of the statements—and matching that information to nationally representative survey panels—we are able to measure the causal effects of the statements on redistributive preferences, trust in government, political knowledge and consumer preferences. In a pilot study conducted prior to the rollout of the actual statements, mock versions of the statements caused subjects to express more “correct” redistributive preferences, with wealthier respondents preferring lower taxes and poorer respondents preferring higher taxes. The statements also caused subjects to indicate a decreased willingness to make a major consumer purchase in the near future. Together, the pilot results and the scope of the field experiment suggest that this project will help shed new light on longstanding questions about the effects of political information. 

Under review.

Ding, P., Feller, A., & Miratrix, L. (Working Paper). Decomposing treatment effect variation.

A recent literature focuses on the critical role of treatment effect variation in estimating causal effects. This approach, however, is in contrast to much of the foundational research on causal inference; Neyman, for example, avoided such variation through his focus on the Average Treatment Effect (ATE) and his definition of the confidence interval. In this paper, we extend the Neymanian framework to explicitly allow both for treatment effect variation explained by covariates, known as the systematic component, and for unexplained treatment effect variation, known as the idiosyncratic component. This perspective enables estimation and testing of impact variation without imposing a model on the marginal distribution of potential outcomes; the workhorse regression with interaction terms is therefore a special case. Doing so leads to two practical results. First, we combine estimates of systematic impact variation with sharp bounds on overall treatment variation to obtain bounds on the proportion of total impact variation explained by observed covariates, essentially an R^2 for treatment effect variation. Second, we exploit this perspective to sharpen bounds on the variance of the ATE estimate itself. As long as the treatment effect varies across observed covariates, the resulting bounds are sharper than the current sharp bounds in the literature, collapsing to standard bounds in the absence of any systematic variation. We apply these ideas to a large randomized evaluation of a job training program, showing that these results are meaningful in practice.

Liebman, J., & Feller, A. (Working Paper). The economics and econometrics of Social Impact Bonds.

For the past three years, the Harvard Kennedy School Social Impact Bonds (SIB) lab team has been providing pro bono technical assistance to state and local governments around the country seeking to develop pay-for-success contracts using social impact bonds. We discuss the lessons learned from this “action research.” To what problems are SIBs potentially the answer? Which policy areas are a good fit for SIBs and which are not? Is the SIB model scalable? How should the risk inherent in social innovation be shared among government, philanthropies, service providers, and commercial investors? How should contracts be structured to provide good rather than perverse incentives? What level of evaluation rigor is required?

Feller, A., Grindal, T., Miratrix, L., & Page, L. (Working Paper). Compared to what? Variation in the impact of early childhood education by alternative care-type settings.

Early childhood education research often compares a group of children who receive the intervention of interest to a group of children who receive care in a range of different care settings. In this paper, we estimate differential impacts of an early childhood intervention by alternative care setting, using data from the Head Start Impact Study, a large-scale randomized evaluation. To do so, we utilize the principal stratification framework, a generalization of the instrumental variables approach, to estimate separate impacts for two types of Compliers: those children who would otherwise be in other center-based care when assigned to control and those who would otherwise be in home-based care. We find strong, positive short-term effects of Head Start on receptive vocabulary for those Compliers who would otherwise be in home-based care. By contrast, we find no meaningful impact of Head Start on vocabulary for those Compliers who would otherwise be in other center-based care. Our findings suggest that alternative care type is a potentially important source of variation in early childhood education interventions.

Under review.

Feller, A., & Miratrix, L. (Working Paper). Examining the foundations of methods that assess treatment effect heterogeneity.

A large and growing literature addresses the identification and estimation of causal effects for subgroups defined by post-treatment outcomes, known as principal strata or endogenous subgroups. This literature, however, is highly fractured, with different approaches using entirely different terminology. First, we review a range of methods that use covariates to identify and estimate these principal causal effects, including approaches that rely on latent independence assumptions and those that utilize cross-site variation. We then lay out the relevant identifying assumptions using the potential outcomes framework, which allows us to highlight the deep connections between these seemingly unrelated methods and to more easily understand their benefits and limitations. We also provide some recommendations for use, giving guidance to the applied researcher for selecting suitable methods in a variety of settings. We compare these approaches using simulation studies and apply them to several large-scale randomized evaluations.

Ding, P., Feller, A., & Miratrix, L. (Working Paper). Randomization inference for treatment effect variation.

Applied researchers are increasingly interested in whether and how treatment effects vary in randomized evaluations, especially variation not explained by observed covariates. We propose a model-free approach for testing for the presence of such unexplained variation. To use this randomization-based approach, we must address the fact that the average treatment effect, generally the object of interest in randomized experiments, actually acts as a nuisance parameter in this setting. We explore potential solutions and advocate for an approach that guarantees exact tests in finite samples despite this nuisance. We also show how this approach readily extends to testing for heterogeneity beyond a given model, which can be useful for assessing the sufficiency of a given scientific theory. We finally apply our method to the National Head Start Impact Study, a large-scale randomized evaluation of a Federal preschool program, finding that there is indeed significant unexplained treatment effect variation.

Under revision.
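
To make the nuisance-parameter issue in the abstract above concrete, here is a minimal sketch of a naive plug-in randomization test for a constant treatment effect. It is not the exact procedure the abstract advocates: it simply substitutes the estimated average treatment effect for the unknown one, which is the shortcut whose lack of finite-sample guarantees motivates the paper. The function names, the shifted Kolmogorov-Smirnov statistic, and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def shifted_ks(y, z, tau):
    """KS distance between control outcomes and treated outcomes shifted by tau."""
    y_t = np.sort(y[z == 1] - tau)
    y_c = np.sort(y[z == 0])
    grid = np.concatenate([y_t, y_c])
    cdf_t = np.searchsorted(y_t, grid, side="right") / y_t.size
    cdf_c = np.searchsorted(y_c, grid, side="right") / y_c.size
    return float(np.max(np.abs(cdf_t - cdf_c)))


def plugin_randomization_test(y, z, rng, n_perm=2000):
    """Naive plug-in test of a constant effect (assumption: ATE replaced by its estimate)."""
    tau_hat = y[z == 1].mean() - y[z == 0].mean()
    observed = shifted_ks(y, z, tau_hat)
    # Impute both potential outcomes under the constant-effect null.
    y0 = y - z * tau_hat
    y1 = y0 + tau_hat
    exceed = 0
    for _ in range(n_perm):
        z_new = rng.permutation(z)  # re-draw a complete randomization
        y_new = np.where(z_new == 1, y1, y0)
        tau_new = y_new[z_new == 1].mean() - y_new[z_new == 0].mean()
        if shifted_ks(y_new, z_new, tau_new) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Hypothetical data: the treatment changes the spread, not just the mean.
rng = np.random.default_rng(1)
z = rng.permutation(np.repeat([0, 1], 150))
y = rng.normal(size=300) * np.where(z == 1, 2.0, 1.0) + 0.5 * z
p_value = plugin_randomization_test(y, z, rng)
```

Because the plug-in estimate is treated as if it were the true effect, the resulting p-value is only approximate.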

Rogers, T., & Feller, A. (Working Paper). Randomly assigned peer grading and course completion in a Massive Open Online Course.

Students in Massive Open Online Courses (MOOCs) often grade their classmates’ papers, a practice called peer grading. In a MOOC we study (over 150,000 enrolled students, around 10,000 students completing), students wrote a paper and graded three randomly chosen papers from among their peers. This means the portfolio of papers that students grade can vary from low quality on average to high quality on average. And since each paper was graded by at least three students, we can calculate a completely exogenous measure of portfolio quality by averaging the ratings of the other paper graders. We are interested in how the quality of the papers students grade affects their likelihood of completing the course, primarily by affecting how they believe their abilities compare to those of their classmates. We made a linear prediction that the better the papers students grade, the less likely they will be to complete the class. For the subset of 5,740 students who completed the writing assignment, the data support this prediction (p = 0.03). However, we find that this effect is driven almost entirely by students who grade the highest quality papers (F-test for spline fit, p < 0.01). In particular, students with paper portfolios in the top quintile of quality are 4 percentage points less likely to complete the course than students in all other quintiles (64% vs. 68%; p = 0.02), while students with paper portfolios in the other four quintiles have nearly identical levels of course completion.

In Preparation
Feller, A., Miratrix, L., & Pillai, N. (In Preparation). Causal inference in the Twilight Zone: Estimating principal stratification models with finite mixtures.
Morgan, K. L., Kennedy, C., Feller, A., Mann, C., & Nickerson, D. (In Preparation). Re-randomization in large, multi-arm experiments.

Political scientists have long recognized the role of careful experimental design in increasing the precision of treatment effect estimates. However, many advances in experimental design, such as improved matching methods, are infeasible to implement at the scale of modern field experiments. In this paper, we propose combining off-the-shelf methods, such as blocking, with re-randomization, in which the experimenter repeatedly generates new random assignments until covariate balance between the assigned treatment and control groups exceeds a pre-determined threshold. We outline existing theoretical results for re-randomization, extend these to the common setting of multi-armed trials, and show how the approach can scale to experiments with hundreds of thousands of units. We then describe the results of three large-scale voter mobilization experiments from the 2012 election that utilized both blocking and re-randomization. Through simulations, we show that the treatment effect confidence intervals from these experiments are significantly shorter than they would have been under blocking alone. Finally, we give practical guidance for implementing re-randomization using the Stata package rerandomize.
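
The re-randomization loop described in the abstract above can be sketched in a few lines. This is a minimal two-arm illustration in Python, not the Stata package mentioned in the abstract and not the multi-arm procedure the paper develops. The Mahalanobis balance criterion, the threshold value, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def mahalanobis_imbalance(X, assign):
    """Mahalanobis distance between treated and control covariate means."""
    n = X.shape[0]
    n_t = int(assign.sum())
    n_c = n - n_t
    diff = X[assign == 1].mean(axis=0) - X[assign == 0].mean(axis=0)
    # Covariance of the difference in means under complete randomization.
    cov = np.atleast_2d(np.cov(X, rowvar=False)) * (1.0 / n_t + 1.0 / n_c)
    return float(diff @ np.linalg.solve(cov, diff))


def rerandomize(X, n_treated, threshold, rng, max_draws=10_000):
    """Redraw assignments until imbalance falls at or below the threshold."""
    base = np.zeros(X.shape[0], dtype=int)
    base[:n_treated] = 1
    for draw in range(1, max_draws + 1):
        assign = rng.permutation(base)
        if mahalanobis_imbalance(X, assign) <= threshold:
            return assign, draw
    raise RuntimeError("no acceptable assignment found; loosen the threshold")


# Hypothetical example: 200 units, 3 covariates, half assigned to treatment.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
assignment, n_draws = rerandomize(X, n_treated=100, threshold=1.0, rng=rng)
```

A smaller threshold enforces tighter balance but requires more draws on average, and any subsequent inference should account for the restricted set of acceptable assignments.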

Rogers, T., & Feller, A. (In Preparation). A randomized experiment using absenteeism information to nudge attendance.
Ding, P., & Feller, A. (In Preparation). Ecological inference: Likelihood and Empirical Bayes perspectives.

In the ecological inference problem, a researcher observes the marginal counts for a series of contingency tables for two binary variables, such as race and partisanship, and wants to infer their joint distribution. This setting arises in a range of disciplines including law, political science, sociology, and epidemiology and has led to a half-century of statistical methods searching for a solution. In this paper, we re-parameterize the ecological inference problem in terms of estimating the correlation for a constrained bivariate Normal variate. This leads to a natural interpretation of classic methods, like ecological regression, as well as to simple improvements on these approaches. More generally, we exploit this re-parameterization to explore full likelihood and Empirical Bayes models and extend these methods to the multi-level setting, allowing the bivariate associations to vary across tables. In a series of simulation studies and applications to benchmark data sets, we find that the resulting estimates perform comparably to or better than existing approaches. Finally, we apply this method to a large randomized evaluation of a voter persuasion program, in which the outcome, candidate vote share, is only available aggregated to the precinct level.

Feller, A., & Gelman, A. (2014). Hierarchical models for causal effects. In Emerging Trends in the Behavioral and Social Sciences. Thousand Oaks, CA: Sage.

Hierarchical models play three important roles in modeling causal effects: (1) accounting for data collection, such as in stratified and split-plot experimental designs; (2) adjusting for unmeasured covariates, such as in panel studies; and (3) capturing treatment effect variation, such as in subgroup analyses. Across all three areas, hierarchical models, especially Bayesian hierarchical modeling, offer substantial benefits over classical, non-hierarchical approaches. After discussing each of these topics, we explore some recent developments in the use of hierarchical models for causal inference and conclude with some thoughts on new directions for this research area.

Lemieux, J., Kyes, S., Otto, T., Feller, A., Eastman, R., Pinches, R., Berriman, M., et al. (2013). Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation. Molecular Microbiology, 90(3), 519–537.

Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 Kb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites.

Feller, A., & Airoldi, E. (2013). Comment on 'How to find an appropriate clustering for mixed type variables with application to socio-economic stratification' by Hennig and Liao. Journal of the Royal Statistical Society, Series C, 62(3), 347–348.

In this brief comment, we highlight the dangers of over-interpreting the results of unsupervised learning algorithms like clustering. Based on exploratory analyses, we argue that the latent structure in the authors' example is better characterized by continuous, as opposed to discrete, latent variation.  

Mwai, L., Diriye, A., Masseno, V., Muriithi, S., Feltwell, T., Musyoki, J., Lemieux, J., et al. (2012). Genome wide adaptations of Plasmodium falciparum in response to lumefantrine selective drug pressure. PLoS ONE, 7(2), e31623.

The combination therapy of the Artemisinin-derivative Artemether (ART) with Lumefantrine (LM) (Coartem®) is an important malaria treatment regimen in many endemic countries. Resistance to Artemisinin has already been reported, and it is feared that LM resistance (LMR) could also evolve quickly. Therefore molecular markers which can be used to track Coartem® efficacy are urgently needed. Often, stable resistance arises from initial, unstable phenotypes that can be identified in vitro. Here we have used the Plasmodium falciparum multidrug resistant reference strain V1S to induce LMR in vitro by culturing the parasite under continuous drug pressure for 16 months. The initial IC50 (inhibitory concentration that kills 50% of the parasite population) was 24 nM. The resulting resistant strain V1SLM, obtained after culture for an estimated 166 cycles under LM pressure, grew steadily in 378 nM of LM, corresponding to 15 times the IC50 of the parental strain. However, after two weeks of culturing V1SLM in drug-free medium, the IC50 returned to that of the initial, parental strain V1S. This transient drug tolerance was associated with major changes in gene expression profiles: using the PFSANGER Affymetrix custom array, we identified 184 differentially expressed genes in V1SLM. Among those are 18 known and putative transporters including the multidrug resistance gene 1 (pfmdr1), the multidrug resistance associated protein and the V-type H+ pumping pyrophosphatase 2 (pfvp2) as well as genes associated with fatty acid metabolism. In addition we detected a clear selective advantage provided by two genomic loci in parasites grown under LM drug pressure, suggesting that all, or some of those genes contribute to development of LM tolerance; they may prove useful as molecular markers to monitor P. falciparum LM susceptibility.

Feller, A., Gelman, A., & Shor, B. (2012). Red state/blue state divisions in the 2012 election. The Forum, 10(4), 127–131.

The so-called “red/blue paradox” is that rich individuals are more likely to vote Republican but rich states are more likely to support the Democrats. Previous research argued that this seeming paradox could be explained by comparing rich and poor voters within each state—the difference in the Republican vote share between rich and poor voters was much larger in low-income, conservative, middle-American states like Mississippi than in high-income, liberal, coastal states like Connecticut. We use exit poll and other survey data to assess whether this was still the case for the 2012 Presidential election. Based on this preliminary analysis, we find that, while the red/blue paradox is still strong, the explanation offered by Gelman et al. no longer appears to hold. We explore several empirical patterns from this election and suggest possible avenues for resolving the questions posed by the new data.

Lemieux, J.*, Feller, A.*, Holmes, C., & Newbold, C. (2009). Reply to Wirth et al.: In vivo profiles show continuous variation between 2 cellular populations. Proceedings of the National Academy of Sciences, 106(27), E71–E72.

In this reply, we argue that seemingly discrete variation in the authors’ published gene expression profiles is actually a combination of continuous variation and technical biological errors in the original experiment.

* Joint first authors

Lemieux, J.*, Gomez-Escobar, N.*, Feller, A.*, Carret, C., Amambua-Ngwa, A., Pinches, R., Day, F., et al. (2009). Statistical estimation of cell-cycle progression and lineage commitment in Plasmodium falciparum reveals a homogeneous pattern of transcription in ex vivo culture. Proceedings of the National Academy of Sciences, 106(18), 7559–7564.

We have cultured Plasmodium falciparum directly from the blood of infected individuals to examine patterns of mature-stage gene expression in patient isolates. Analysis of the transcriptome of P. falciparum is complicated by the highly periodic nature of gene expression because small variations in the stage of parasite development between samples can lead to an apparent difference in gene expression values. To address this issue, we have developed statistical likelihood-based methods to estimate cell cycle progression and commitment to asexual or sexual development lineages in our samples based on microscopy and gene expression patterns. In cases subsequently matched for temporal development, we find that transcriptional patterns in ex vivo culture display little variation across patients with diverse clinical profiles and closely resemble transcriptional profiles that occur in vitro. These statistical methods, available to the research community, assist in the design and interpretation of P. falciparum expression profiling experiments where it is difficult to separate true differential expression from cell-cycle dependent expression. We reanalyze an existing dataset of in vivo patient expression profiles and conclude that previously observed discrete variation is consistent with the commitment of a varying proportion of the parasite population to the sexual development lineage.

* Joint first authors