Working Paper
Kraft MA, Christian A. In Search of High-Quality Evaluation Feedback: An Administrator Training Field Experiment. Working Paper.

Starting in 2011, Boston Public Schools (BPS) implemented major reforms to its teacher evaluation system with a focus on promoting teacher development.  We administered independent district-wide surveys in 2014 and 2015 to capture BPS teachers’ perceptions of the evaluation feedback they receive.  Teachers generally reported that evaluators were fair and accurate, but that they struggled to provide high-quality feedback.  We conduct a randomized controlled trial to evaluate the district’s efforts to improve this feedback through an intensive training program for evaluators.  We find little evidence the program affected evaluators’ feedback, teacher retention, or student achievement.  Our results suggest that improving the quality of evaluation feedback may require more fundamental changes to the design and implementation of teacher evaluation systems.

Kraft MA. Interpreting Effect Sizes of Education Interventions. Working Paper.

Researchers commonly interpret effect sizes by applying benchmarks proposed by Cohen over a half century ago. However, effects that are small by Cohen’s standards are large relative to the impacts of most field-based interventions. These benchmarks also fail to consider important differences in study features, program costs, and scalability. In this paper, I present five broad guidelines for interpreting effect sizes that are applicable across the social sciences. I then propose a more structured schema with new empirical benchmarks for interpreting a specific class of studies: causal research on education interventions with standardized achievement outcomes. Together, these tools provide a practical approach for incorporating study features, cost, and scalability into the process of interpreting the policy importance of effect sizes.

Kraft MA, Hill HC. Developing Ambitious Mathematics Instruction through Web-Based Coaching: A Randomized Field Trial. Working Paper.

This paper describes and evaluates a web-based coaching program designed to support teachers in implementing Common Core-aligned math instruction. Web-based coaching programs can be operated at relatively low cost, are scalable, and make it more feasible to pair teachers with coaches who have expertise in their content area and grade level. Results from our randomized field trial document sizable and sustained effects on both teachers’ ability to analyze instruction and their instructional practice, as measured by the Mathematical Quality of Instruction (MQI) instrument and student surveys. However, these improvements in instruction did not result in corresponding increases in math test scores as measured by state standardized tests or interim assessments. We discuss several possible explanations for this pattern of results.

Kraft MA, Papay JP, Chi OL. Teacher skill development: Evidence from performance ratings by principals. Working Paper.

We examine the dynamic nature of teacher skill development using panel data on principals’ subjective performance ratings of teachers. Past research on teacher productivity improvement has focused primarily on one important but narrow measure of performance: teachers’ value-added to student achievement on standardized tests. Unlike value-added, subjective performance ratings provide detailed information about specific skill dimensions and are available for the many teachers in non-tested grades and subjects. Using a within-teacher returns to experience framework, we find, on average, large and rapid improvements in teachers’ instructional practices throughout their first ten years on the job as well as substantial differences in improvement rates across individual teachers. We also document that subjective performance ratings contain important information about teacher effectiveness. In the district we study, principals appear to differentiate teacher performance throughout the full distribution instead of just in the tails. Furthermore, prior performance ratings and gains in these ratings provide additional information about teachers’ ability to improve test scores that is not captured by prior value-added scores. Taken together, our study provides new insights on teacher performance improvement and variation in teacher development across instructional skills and individual teachers.

Kraft MA, Brunner EJ, Dougherty SM, Schwegman DJ. Teacher Evaluation Reforms and the Supply and Quality of New Teachers. Working Paper.

In recent years, states have sought to increase accountability for public school teachers by implementing high-stakes evaluation systems. We examine the effect of these reforms on the supply and quality of new teachers. Leveraging variation across states and time, we find that evaluation reforms reduced the supply of new teaching candidates by 17 percent and increased the likelihood of unfilled teaching positions, particularly in hard-to-staff schools. Reforms also increased the quality of newly hired teachers by shifting the lower tail of the distribution upward. We find evidence that decreased job security, satisfaction, and autonomy are likely mechanisms for these effects.

Kraft MA, Brunner EJ, Dougherty SM, Schwegman D. Teacher Accountability Reforms and the Supply of New Teachers. Working Paper.

In recent years, states across the country have attempted to increase the accountability of public school teachers by implementing rigorous, high-stakes evaluation systems and in some cases repealing teacher tenure protections. We examine the effect of these reforms on the supply of new entrants into the teacher labor market by exploiting a unique panel dataset that includes the number of teaching licenses granted by states. Leveraging variation in the adoption of reforms across states and time, we find that evaluation reforms resulted in a steady decline in the statewide supply of new teachers, whereas tenure reforms produced a sharp but more temporary contraction. In exploratory analyses, we find mixed evidence of the effect of accountability on the selectivity of the institutions where prospective teachers earned their teaching degrees. There is little evidence evaluation reforms had any differential effect by university selectivity, while tenure reforms appear to have reduced supply more among candidates from less selective universities. We find no evidence that decreases in labor supply were concentrated in non-shortage or shortage licensure areas.

Blazar DL, Kraft MA. Balancing Rigor, Replication, and Relevance: A Case for Multiple-Cohort, Longitudinal Experiments. AERA Open. Forthcoming.

Over the past 15 years, the education research community has advocated for the application of more rigorous research designs that support causal inferences, for research that provides more generalizable results across settings, and for the value of research-practice partnerships that inform the design of local programs and policies. However, these goals are often in tension with each other. We propose a research design – the multi-cohort, longitudinal experimental (MCLE) design – as one approach to balancing these competing goals of high-quality research. We illustrate the uses and benefits of MCLEs with an example from a research-practice partnership aimed at evaluating the effect of a teacher coaching program. We find that the coaching program failed to replicate the effectiveness it showed with an initial cohort, likely due to changes in personnel, duration, and content. Our analyses can help researchers weigh the tradeoffs of different design features of MCLEs.

Kraft MA. Teacher Effects on Complex Cognitive Skills and Social-Emotional Competencies. Journal of Human Resources. 2019;54(1):1-36.


I exploit the random assignment of class rosters in the MET Project to estimate teacher effects on students’ performance on complex open-ended tasks in math and reading, as well as their growth mindset, grit, and effort in class. I find large teacher effects across this expanded set of outcomes, but weak relationships between these effects and performance measures used in current teacher evaluation systems including value-added to state standardized tests. These findings suggest teacher effectiveness is multidimensional, and high-stakes evaluation decisions are only weakly informed by the degree to which teachers are developing students’ complex cognitive skills and social-emotional competencies.


Kraft MA. Federal efforts to improve teacher quality. In: Bush-Obama School Reform: Lessons Learned. Harvard Education Press; 2018. pp. 69-84.
Bush’s and Obama’s federal education reforms were remarkably similar in their goals and ambitions. Bush’s No Child Left Behind (NCLB) Act and Obama’s Race to the Top (RTTT) and NCLB state waiver programs leveraged federal funding and authority to address four broad areas: academic standards, data and accountability, teacher quality, and school turnarounds. This chapter focuses specifically on how these efforts have influenced the teaching profession. During Bush’s and Obama’s combined sixteen years in office, the federal government succeeded in fundamentally changing licensure requirements and evaluation systems for public school teachers. Reflecting on the successes and failures of these reforms provides important lessons about the potential and limitations of federal policy as a tool for improving the quality of the US teacher workforce.
Kraft MA, Blazar D. Taking Teacher Coaching To Scale: Can Personalized Training Become Standard Practice? Education Next. 2018;18(4).
Kraft MA, Blazar D, Hogan D. The Effect of Teacher Coaching on Instruction and Achievement: A Meta-Analysis of the Causal Evidence. Review of Educational Research. 2018;88(4):547-588.

Teacher coaching has emerged as a promising alternative to traditional models of professional development.  We review the empirical literature on teacher coaching and conduct meta-analyses to estimate the mean effect of coaching programs on teachers’ instructional practice and students’ academic achievement.  Combining results across 60 studies that employ causal research designs, we find pooled effect sizes of 0.49 standard deviations (SD) on instruction and 0.18 SD on achievement.  Much of this evidence comes from literacy coaching programs for pre-kindergarten and elementary school teachers.  Although these findings affirm the potential of coaching as a development tool, further analyses illustrate the challenges of taking coaching programs to scale while maintaining effectiveness.  Average effects from effectiveness trials of larger programs are only a fraction of the effects found in efficacy trials of smaller programs. We conclude by discussing ways to address scale-up implementation challenges and providing guidance for future causal studies.

Papay JP, Kraft MA. Developing Workplaces Where Teachers Stay, Improve, and Succeed. In: Teaching in Context: How Social Aspects of School and School Systems Shape Teachers’ Development & Effectiveness. Cambridge: Harvard Education Press; 2017. pp. 15-35.
Kraft MA. Engaging Parents Through Better Communication Systems. Educational Leadership. 2017;71(1):58-62.

New research shows that frequent, personalized outreach to parents works. How can schools support the practice?

Kraft MA, Monti-Nussbaum M. Can schools enable parents to prevent summer learning loss? A text messaging field experiment to promote literacy skills. The ANNALS of the American Academy of Political and Social Science. 2017;674(1):85-112.

The vast differences in summer learning activities among children present a substantial challenge to providing equal educational opportunity in the United States. Most initiatives aimed at reversing summer learning loss focus on school- or center-based programs. This study explores the potential of enabling parents to provide literacy development opportunities at home as a low-cost alternative. We conduct a randomized field trial of a summer text-messaging pilot program for parents focused on promoting literacy skills among first through fourth graders. We find positive effects on reading comprehension among third and fourth graders, with effect sizes of 0.21 to 0.29 standard deviations, but no effects for first and second graders. Texts also increased attendance at parent-teacher conferences but not at other school-related activities. Parents’ responses to a follow-up survey provide evidence to inform future efforts to reverse summer learning loss.

Steinberg M, Kraft MA. The Sensitivity of Teacher Performance Ratings to the Design of Teacher Evaluation Systems. Educational Researcher. 2017;46(7):378-396.

In recent years, states and districts have responded to federal incentives by instituting major reforms to their teacher evaluation systems. The passage of the Every Student Succeeds Act in 2015 now provides policymakers with even greater autonomy to redesign existing evaluation systems. Yet, little evidence exists to inform decisions about two key system design features – teacher performance measure weights and performance ratings thresholds. Using data from the Measures of Effective Teaching study, we conduct simulation-based analyses that illustrate the critical role that performance measure weights and ratings thresholds play in determining teachers’ summative evaluation ratings and the distribution of teacher proficiency rates. These findings offer insights to policymakers and administrators as they refine and possibly remake teacher evaluation systems. 

Blazar D, Kraft MA. Teacher and teaching effects on students’ attitudes and behaviors. Educational Evaluation and Policy Analysis. 2017;39(1):146-170.


Research has focused predominantly on how teachers affect students’ achievement on tests despite evidence that a broad range of attitudes and behaviors are equally important to their long-term success. We find that upper-elementary teachers have large effects on self-reported measures of students’ self-efficacy in math, and happiness and behavior in class. Students’ attitudes and behaviors are predicted by teaching practices most proximal to these measures, including teachers’ emotional support and classroom organization. However, teachers who are effective at improving test scores often are not equally effective at improving students’ attitudes and behaviors. These findings lend empirical evidence to well-established theory on the multidimensional nature of teaching and the need to identify strategies for improving the full range of teachers’ skills.

Charner-Laird M, Ng M, Johnson SM, Kraft MA, Papay JP, S.K R. Gauging goodness of fit: Teachers’ expectations for their instructional teams in high-poverty schools. American Journal of Education. 2017;123(4):553-584.

Teacher teams are increasingly common in urban schools. Here we analyze teachers' responses to teams in six high-poverty schools. Teachers used two criteria to assess teams' "goodness of fit" in meeting the demands of their work: whether their team helped them teach better and whether it contributed to a better school. Their responses differed notably by school, depending largely on the principal's approach to implementation. In the three schools where teachers assessed teams favorably, principals set a meaningful purpose for teachers' collaborative work, contributed structural and professional expertise for their deliberations, and established a safe environment for teachers' on-the-job growth.

Kraft MA, Gilmour AF. Revisiting the Widget Effect: Teacher Evaluation Reforms and the Distribution of Teacher Effectiveness. Educational Researcher. 2017;46(5):234-249.

In 2009, The New Teacher Project (TNTP)’s The Widget Effect documented the failure to recognize and act on differences in teacher effectiveness. We revisit these findings by compiling teacher performance ratings across 24 states that adopted major reforms to their teacher evaluation systems. In the vast majority of these states, the percentage of teachers rated Unsatisfactory remains less than 1%. However, the full distributions of ratings vary widely across states with 0.7% to 28.7% rated below Proficient and 6% to 62% rated above Proficient. We present original survey data from an urban district illustrating that evaluators perceive more than three times as many teachers in their schools to be below Proficient than they rate as such. Interviews with principals reveal several potential explanations for these patterns.

Kraft MA, Blazar DL. Individualized Coaching to Improve Teacher Practice Across Grades and Subjects: New Experimental Evidence. Educational Policy. 2017;31(7):1033-1068.

This paper analyzes a coaching model focused on classroom management skills and instructional practices across grade levels and subject areas. We describe the design and implementation of MATCH Teacher Coaching among an initial cohort of fifty-nine teachers working in New Orleans charter schools.  We evaluate the effect of the program on teachers’ instructional practices using a block randomized trial and find that coached teachers scored 0.59 standard deviations higher on an index of effective teaching practices comprised of observation scores, principal evaluations, and student surveys. We discuss implementation challenges and make recommendations for researcher-practitioner partnerships to address key remaining questions.

Papay JP, Kraft MA. The Myth of the Teacher Performance Plateau. Educational Leadership. 2016;73(8):36-42.

It’s almost accepted as fact that teachers don’t improve after their first few years on the job. New research challenges this common assumption.
