In this thought experiment, we explore how tutoring could be scaled nationally to address COVID-19 learning loss and become a permanent feature of the U.S. public education system. We outline a blueprint centered on ten core principles and a federal architecture to support adoption, while providing for local ownership over key implementation features. High school students would tutor in elementary schools via an elective class, college students in middle schools via federal work-study, and full-time 2- and 4-year college graduates in high schools via AmeriCorps. We envision an incremental, demand-driven expansion process with priority given to high-needs schools. Our blueprint highlights a range of design tradeoffs and implementation challenges and provides estimates of program costs. Our estimates suggest that targeted approaches to scaling school-wide tutoring nationally, such as focusing on K-8 Title I schools, would cost between $5 and $15 billion annually.
COVID-19 shuttered schools across the United States, upending traditional approaches to education. We examine teachers’ experiences during emergency remote teaching in the spring of 2020 using responses to a working conditions survey from a sample of 7,841 teachers across 206 schools and 9 states. Teachers reported a range of challenges related to engaging students in remote learning and balancing their professional and personal responsibilities. Teachers in high-poverty and majority Black schools perceived these challenges to be the most severe, suggesting the pandemic further increased existing educational inequities. Using data from both pre-post and retrospective surveys, we find that the pandemic and pivot to emergency remote teaching resulted in a sudden, large drop in teachers’ sense of success. We also demonstrate how supportive working conditions in schools played a critical role in helping teachers to sustain their sense of success. Teachers who could depend on their district- and school-based leadership for strong communication, targeted training, meaningful collaboration, fair expectations, and recognition of their efforts were least likely to experience declines in their sense of success.
We examine the dynamic nature of student-teacher match quality by studying the effect of having a teacher for more than one year. Using statewide data from Tennessee and panel methods, we find that having a repeat teacher improves achievement and decreases absences, truancy, and suspensions. These results are robust to a range of tests for teacher and student sorting. White girls benefit most academically from repeat teachers and boys of color benefit most behaviorally. Effects increase with the share of repeat students in a teacher’s class, suggesting that intentional classroom assignment policies such as looping may have even larger benefits.
Narrative accounts of classroom instruction suggest that external interruptions, such as intercom announcements and visits from staff, are a regular occurrence in U.S. public schools. We study the frequency, nature, and duration of external interruptions in the Providence Public School District (PPSD) using original data from a district-wide survey and classroom observations. We estimate that a typical classroom in PPSD is interrupted over 2,000 times per year, and that these interruptions and the disruptions they cause result in the loss of 10 to 20 days of instructional time. Administrators appear to systematically underestimate the frequency and negative consequences of these interruptions. We propose several organizational approaches schools might adopt to reduce external interruptions to classroom instruction.
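To make the scale of these estimates concrete, the back-of-envelope sketch below converts days lost into average instructional minutes lost per interruption (including the disruption that follows). It assumes a 390-minute (6.5-hour) instructional day, a hypothetical figure not reported in the study; only the 2,000-interruption and 10-to-20-day figures come from the abstract.

```python
# Back-of-envelope check on the interruption estimates.
# From the abstract: ~2,000 interruptions per classroom per year and
# 10 to 20 instructional days lost.
# ASSUMPTION (hypothetical, not from the study): a 390-minute school day.

INTERRUPTIONS_PER_YEAR = 2000
MINUTES_PER_DAY = 390  # hypothetical instructional day length


def implied_minutes_per_interruption(days_lost: float,
                                     interruptions: int = INTERRUPTIONS_PER_YEAR,
                                     minutes_per_day: int = MINUTES_PER_DAY) -> float:
    """Average minutes lost per interruption implied by `days_lost` total days."""
    return days_lost * minutes_per_day / interruptions


for days in (10, 20):
    print(f"{days} days lost -> {implied_minutes_per_interruption(days):.2f} min per interruption")
```

Under these assumptions the estimates imply roughly 2 to 4 minutes lost per interruption. Because the abstract reports "over 2,000" interruptions, these per-interruption figures are best read as approximate upper bounds.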
Starting in 2011, Boston Public Schools (BPS) implemented major reforms to its teacher evaluation system with a focus on promoting teacher development. We administered independent district-wide surveys in 2014 and 2015 to capture BPS teachers’ perceptions of the evaluation feedback they receive. Teachers generally reported that evaluators were fair and accurate, but that evaluators struggled to provide high-quality feedback. We conduct a randomized controlled trial to evaluate the district’s efforts to improve this feedback through an intensive training program for evaluators. We find little evidence that the program affected evaluators’ feedback, teacher retention, or student achievement. Our results suggest that improving the quality of evaluation feedback may require more fundamental changes to the design and implementation of teacher evaluation systems.
Numerous high-profile efforts have sought to “turn around” low-performing schools. Evidence of these programs’ effectiveness, however, is mixed, and research offers little guidance on which types of turnaround models are more likely to succeed. We present a case study of turnaround efforts led by the Blueprint Schools Network in three schools in Boston. Using a difference-in-differences framework, we find that Blueprint raised student achievement in mathematics and ELA by at least a quarter of a standard deviation, on average. We document qualitatively how differential impacts across the three Blueprint schools relate to contextual and implementation factors. In particular, Blueprint’s role as a turnaround partner (in two schools) versus school operator (in one school) shaped its ability to implement its model. As a partner, Blueprint provided expertise and guidance but had limited ability to fully implement its model. In its role as an operator, Blueprint had full authority to implement its turnaround model, but was also responsible for managing the day-to-day operations of the school, a role for which it had limited prior experience.
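For readers unfamiliar with the difference-in-differences logic, the sketch below shows the canonical 2x2 estimator: the change in outcomes for treated schools minus the change for comparison schools over the same period. The score values are hypothetical and do not come from the study, and the study's actual specification (for example, a regression with student-level controls) is not described here, so this illustrates only the basic idea.

```python
from dataclasses import dataclass


@dataclass
class GroupMeans:
    """Mean outcome (e.g., standardized test score in SD units) before and after."""
    pre: float
    post: float


def did_estimate(treated: GroupMeans, comparison: GroupMeans) -> float:
    """Canonical 2x2 difference-in-differences: the treated group's change
    minus the comparison group's change over the same period."""
    return (treated.post - treated.pre) - (comparison.post - comparison.pre)


# Hypothetical standardized math scores, NOT results from the study.
treated_schools = GroupMeans(pre=-0.60, post=-0.25)
comparison_schools = GroupMeans(pre=-0.55, post=-0.50)
print(round(did_estimate(treated_schools, comparison_schools), 2))
```

Subtracting the comparison group's change nets out shocks common to both groups (such as a statewide test change), which is why the design relies on the assumption that the two groups would otherwise have followed parallel trends.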
We study the adoption and implementation of a new mobile communication app among a sample of 132 New York City public schools. The app provides a platform for sharing general announcements and news as well as engaging in personalized two-way communication with individual parents. We provide participating schools with free access to the app and randomize schools to receive intensive support (training, guidance, monitoring, and encouragement) for maximizing the efficacy of the app. Although user supports led to higher levels of communication within the app in the treatment year, overall usage remained low and declined in the following year when treatment schools no longer received intensive supports. We find few subsequent effects on perceptions of communication quality or student outcomes. We leverage rich internal user data to explore how take-up and usage patterns varied across staff and school characteristics. These analyses help to identify early adopters and reluctant users, revealing both opportunities and obstacles to engaging parents through new communication technology.
Educators, I have a request. When you are finally able to return to your classroom this fall—or whenever it’s possible—keep a tally of every time learning is disrupted by interruptions coming from outside your class. Keep note: How often do you have to pause instruction because of intercom announcements, calls to the classroom phone, and teachers, administrators and staff knocking at your door? Five, ten—even 20 times a day?
Researchers commonly interpret effect sizes by applying benchmarks proposed by Cohen more than half a century ago. However, effects that are small by Cohen’s standards are large relative to the impacts of most field-based interventions. These benchmarks also fail to consider important differences in study features, program costs, and scalability. In this paper, I present five broad guidelines for interpreting effect sizes that are applicable across the social sciences. I then propose a more structured schema with new empirical benchmarks for interpreting a specific class of studies: causal research on education interventions with standardized achievement outcomes. Together, these tools provide a practical approach for incorporating study features, cost, and scalability into the process of interpreting the policy importance of effect sizes.
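The sketch below illustrates the conventional approach the paper critiques: compute a standardized mean difference (Cohen's d, using a pooled standard deviation) and label it with Cohen's rule-of-thumb thresholds of 0.2, 0.5, and 0.8. The summary statistics are hypothetical, and the paper's own empirical benchmarks are deliberately not encoded here because the abstract does not state them.

```python
import math


def cohens_d(mean_t: float, mean_c: float, sd_t: float, sd_c: float,
             n_t: int, n_c: int) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd


def cohen_label(d: float) -> str:
    """Cohen's conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below 'small'"


# Hypothetical test-score summary statistics, not from any study.
d = cohens_d(mean_t=252.0, mean_c=250.0, sd_t=20.0, sd_c=20.0, n_t=400, n_c=400)
print(round(d, 2), cohen_label(d))
```

Here a 0.10 SD effect falls below Cohen's "small" threshold, even though, as the abstract notes, effects of that magnitude can be large relative to the impacts of most field-based interventions.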
This paper describes and evaluates a web-based coaching program designed to support teachers in implementing Common Core-aligned math instruction. Web-based coaching programs can be operated at relatively low cost, are scalable, and make it more feasible to pair teachers with coaches who have expertise in their content area and grade level. Results from our randomized field trial document sizable and sustained effects on both teachers’ ability to analyze instruction and their instructional practice, as measured by the Mathematical Quality of Instruction (MQI) instrument and student surveys. However, these improvements in instruction did not result in corresponding increases in math test scores as measured by state standardized tests or interim assessments. We discuss several possible explanations for this pattern of results.
We examine the dynamic nature of teacher skill development using panel data on principals’ subjective performance ratings of teachers. Past research on teacher productivity improvement has focused primarily on one important but narrow measure of performance: teachers’ value-added to student achievement on standardized tests. Unlike value-added, subjective performance ratings provide detailed information about specific skill dimensions and are available for the many teachers in non-tested grades and subjects. Using a within-teacher returns to experience framework, we find, on average, large and rapid improvements in teachers’ instructional practices throughout their first ten years on the job as well as substantial differences in improvement rates across individual teachers. We also document that subjective performance ratings contain important information about teacher effectiveness. In the district we study, principals appear to differentiate teacher performance throughout the full distribution instead of just in the tails. Furthermore, prior performance ratings and gains in these ratings provide additional information about teachers’ ability to improve test scores that is not captured by prior value-added scores. Taken together, our study provides new insights on teacher performance improvement and variation in teacher development across instructional skills and individual teachers.
In recent years, states have sought to increase accountability for public school teachers by implementing a package of reforms centered on high-stakes evaluation systems. We examine the effect of these reforms on the supply and quality of new teachers. Leveraging variation across states and time, we find that accountability reforms reduced the number of newly licensed teacher candidates and increased the likelihood of unfilled teaching positions, particularly in hard-to-staff schools. Evidence also suggests that reforms increased the quality of new labor supply by reducing the likelihood that new teachers attended unselective undergraduate institutions. Decreases in job security, satisfaction, and autonomy are likely mechanisms for these effects.
Over the past 15 years, the education research community has advocated for the application of more rigorous research designs that support causal inferences, for research that provides more generalizable results across settings, and for the value of research-practice partnerships that inform the design of local programs and policies. However, these goals are often in tension with each other. We propose a research design – the multi-cohort, longitudinal experimental (MCLE) design – as one approach to balancing these competing goals of high-quality research. We illustrate the uses and benefits of MCLEs with an example from a research-practice partnership aimed at evaluating the effect of a teacher coaching program. We find that the coaching program failed to replicate the effectiveness it demonstrated with an initial cohort, likely due to changes in personnel, duration, and content. Our analyses can help researchers weigh the tradeoffs of different design features of MCLEs.
I exploit the random assignment of class rosters in the MET Project to estimate teacher effects on students’ performance on complex open-ended tasks in math and reading, as well as their growth mindset, grit, and effort in class. I find large teacher effects across this expanded set of outcomes, but weak relationships between these effects and performance measures used in current teacher evaluation systems including value-added to state standardized tests. These findings suggest teacher effectiveness is multidimensional, and high-stakes evaluation decisions are only weakly informed by the degree to which teachers are developing students’ complex cognitive skills and social-emotional competencies.
Bush’s and Obama’s federal education reforms were remarkably similar in their goals and ambitions. Bush’s No Child Left Behind (NCLB) Act and Obama’s Race to the Top (RTTT) and NCLB state waiver programs leveraged federal funding and authority to address four broad areas: academic standards, data and accountability, teacher quality, and school turnarounds. This chapter focuses specifically on how these efforts have influenced the teaching profession. During Bush’s and Obama’s combined sixteen years in office, the federal government succeeded in fundamentally changing licensure requirements and evaluation systems for public school teachers. Reflecting on the successes and failures of these reforms provides important lessons about the potential and limitations of federal policy as a tool for improving the quality of the US teacher workforce.
Teacher coaching has emerged as a promising alternative to traditional models of professional development. We review the empirical literature on teacher coaching and conduct meta-analyses to estimate the mean effect of coaching programs on teachers’ instructional practice and students’ academic achievement. Combining results across 60 studies that employ causal research designs, we find pooled effect sizes of 0.49 standard deviations (SD) on instruction and 0.18 SD on achievement. Much of this evidence comes from literacy coaching programs for pre-kindergarten and elementary school teachers. Although these findings affirm the potential of coaching as a development tool, further analyses illustrate the challenges of taking coaching programs to scale while maintaining effectiveness. Average effects from effectiveness trials of larger programs are only a fraction of the effects found in efficacy trials of smaller programs. We conclude by discussing ways to address scale-up implementation challenges and providing guidance for future causal studies.
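Pooled effect sizes in a meta-analysis are weighted averages of study-level estimates. The sketch below shows the simplest version, fixed-effect inverse-variance pooling, where more precise studies (smaller standard errors) receive more weight. The inputs are hypothetical, and this is not necessarily the estimator used in the review; random-effects approaches, for example, add a between-study variance term to each study's weight.

```python
import math


def inverse_variance_pool(effects: list[float],
                          std_errors: list[float]) -> tuple[float, float]:
    """Fixed-effect meta-analytic mean and its standard error.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical study-level effect sizes (SD units) and standard errors.
effects = [0.30, 0.10, 0.20]
std_errors = [0.10, 0.05, 0.10]
pooled, pooled_se = inverse_variance_pool(effects, std_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

The most precise study (SE of 0.05) carries four times the weight of each of the others, pulling the pooled estimate toward its 0.10 effect.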
The vast differences in summer learning activities among children present a substantial challenge to providing equal educational opportunity in the United States. Most initiatives aimed at reversing summer learning loss focus on school- or center-based programs. This study explores the potential of enabling parents to provide literacy development opportunities at home as a low-cost alternative. We conduct a randomized field trial of a summer text-messaging pilot program for parents focused on promoting literacy skills among first through fourth graders. We find positive effects on reading comprehension among third and fourth graders, with effect sizes of 0.21 to 0.29 standard deviations, but no effects for first and second graders. Texts also increased attendance at parent-teacher conferences but not at other school-related activities. Parents’ responses to a follow-up survey provide evidence to inform future efforts to reverse summer learning loss.