Kraft MA, Gilmour AF. Revisiting the Widget Effect: Teacher Evaluation Reforms and the Distribution of Teacher Effectiveness. Educational Researcher [Internet]. 2017;46 (5) :234-249.

In 2009, The New Teacher Project (TNTP)’s The Widget Effect documented the failure to recognize and act on differences in teacher effectiveness. We revisit these findings by compiling teacher performance ratings across 24 states that adopted major reforms to their teacher evaluation systems. In the vast majority of these states, the percentage of teachers rated Unsatisfactory remains less than 1%. However, the full distributions of ratings vary widely across states with 0.7% to 28.7% rated below Proficient and 6% to 62% rated above Proficient. We present original survey data from an urban district illustrating that evaluators perceive more than three times as many teachers in their schools to be below Proficient than they rate as such. Interviews with principals reveal several potential explanations for these patterns.

Kraft MA, Blazar DL. Individualized Coaching to Improve Teacher Practice Across Grades and Subjects: New Experimental Evidence. Educational Policy [Internet]. 2017;31 (7) :1033-1068.

This paper analyzes a coaching model focused on classroom management skills and instructional practices across grade levels and subject areas. We describe the design and implementation of MATCH Teacher Coaching among an initial cohort of fifty-nine teachers working in New Orleans charter schools. We evaluate the effect of the program on teachers’ instructional practices using a block randomized trial and find that coached teachers scored 0.59 standard deviations higher on an index of effective teaching practices composed of observation scores, principal evaluations, and student surveys. We discuss implementation challenges and make recommendations for researcher-practitioner partnerships to address key remaining questions.

Papay JP, Kraft MA. The Myth of the Teacher Performance Plateau. Educational Leadership [Internet]. 2016;73 (May) :36-42.

It’s almost accepted as fact that teachers don’t improve after their first few years on the job. New research challenges this common assumption.

West MR, Kraft MA, Finn AS, Martin RE, Duckworth AL, Gabrieli CFO, Gabrieli JDE. Promise and Paradox: Measuring Students’ Non-cognitive Skills and the Impact of Schooling. Educational Evaluation and Policy Analysis [Internet]. 2016;38 (1) :148-170.

We used self-report surveys to gather information on a broad set of non-cognitive skills from 1,368 8th-graders. At the student level, scales measuring conscientiousness, self-control, grit, and growth mindset are positively correlated with attendance, behavior, and test-score gains between 4th and 8th grade. Conscientiousness, self-control, and grit are unrelated to test-score gains at the school level, however, and students attending over-subscribed charter schools score lower on these scales than do students attending district schools. Exploiting admissions lotteries, we find positive impacts of charter school attendance on achievement and attendance but negative impacts on these non-cognitive skills. We provide suggestive evidence that these paradoxical results are driven by reference bias, or the tendency for survey responses to be influenced by social context.

Papay JP, Kraft MA. The productivity costs of inefficient hiring practices: Evidence from late teacher hiring. Journal of Policy Analysis and Management [Internet]. 2016;35 (4) :791-817.

We use matched employee-employer records from the teacher labor market to explore the trade-offs between the timing of hiring and match quality. Hiring teachers after the school year starts reduces student achievement by 0.042 SD in mathematics and 0.026 SD in reading. This reflects, in part, a temporary disruption effect in the first year. In mathematics, but not in reading, late-hired teachers remain persistently less effective, evidence of negative selection in the teacher labor market. Late hiring concentrates in schools that disproportionately serve disadvantaged student populations, contributing to challenges in ensuring an equitable distribution of educational resources across students.

Kraft MA, Gilmour A. Can Principals Promote Teacher Development as Evaluators? A Case Study of Principals’ Views and Experiences. Educational Administration Quarterly [Internet]. 2016;52 (5) :711-753.

Purpose: New teacher evaluation systems have expanded the role of principals as instructional leaders, but little is known about principals’ ability to promote teacher development through the evaluation process. We conducted a case study of principals’ perspectives on evaluation and their experiences implementing observation and feedback cycles to better understand whether principals feel as though they are able to promote teacher development as evaluators.


Research Methods: We conducted interviews with a stratified random sample of 24 principals in an urban district that recently implemented major reforms to its teacher evaluation system. We analyzed these interviews by drafting thematic summaries, coding interview transcripts, creating data-analytic matrices, and writing analytic memos.


Findings: We found that the evaluation reforms provided a common framework and language that helped facilitate principals’ feedback conversations with teachers. However, we also found that tasking principals with primary responsibility for conducting evaluations resulted in a variety of unintended consequences which undercut the quality of evaluation feedback they provided. We analyze five broad solutions to these challenges: strategically targeting evaluations, reducing operational responsibilities, providing principal training, hiring instructional coaches, and developing peer evaluation systems.


Implications: The quality of feedback teachers receive through the evaluation process depends critically on the time and training evaluators have to provide individualized and actionable feedback. Districts that task principals with primary responsibility for conducting observation and feedback cycles must attend to the many implementation challenges associated with this approach in order for next-generation evaluation systems to successfully promote teacher development.

Kraft MA, Marinell WM, Yee D. School organizational contexts, teacher turnover, and student achievement: Evidence from panel data. American Educational Research Journal [Internet]. 2016;53 (5) :1411-1499.

We study the relationship between school organizational contexts, teacher turnover, and student achievement in New York City (NYC) middle schools. Using factor analysis, we construct measures of four distinct dimensions of school contexts captured on the annual NYC School Survey. We identify credible estimates by isolating variation in organizational contexts within schools over time. We find that improvements in school leadership, academic expectations, teacher relationships, and school safety are all independently associated with corresponding reductions in teacher turnover. Increases in school safety and academic expectations for students also correspond to increases in student achievement. These results are robust to a range of potential threats to validity, suggesting that our findings are likely driven by an underlying causal relationship.

Blazar DL, Kraft MA. Exploring Mechanisms of Effective Teacher Coaching: A Tale of Two Cohorts From a Randomized Experiment. Educational Evaluation and Policy Analysis [Internet]. 2015;37 (4) :542-566.

Although previous research has shown that teacher coaching can improve teaching practices and student achievement, little is known about specific features of effective coaching programs. We estimate the impact of MATCH Teacher Coaching (MTC) on a range of teacher practices using a blocked randomized trial and explore how changes in the coaching model across two cohorts are related to program effects. Findings indicate large positive effects on teachers’ practices in cohort 1 but no effects in cohort 2. After ruling out explanations related to the research design, a set of exploratory analyses suggests that differential treatment effects may be attributable to differences in coach effectiveness and the focus of coaching across cohorts.

Kraft MA. Teacher layoffs, teacher quality and student achievement: Evidence from a discretionary layoff policy. Education Finance and Policy [Internet]. 2015;10 (4) :467-507.

Most teacher layoffs during the Great Recession were implemented following inverse-seniority policies. In this paper, I examine the implementation of a discretionary layoff policy in Charlotte Mecklenburg Schools. Administrators did not uniformly lay off the most or least senior teachers but instead selected teachers who were previously retired, late-hired, unlicensed, low-performing, or nontenured. Using quasi-experimental variation within schools across grades, I then estimate the differential effects of teacher layoffs on student achievement based on teacher seniority and effectiveness. Mathematics achievement in grades that lost an effective teacher, as measured by principal evaluations or value-added scores, decreased 0.05 to 0.11 standard deviations more than in grades that lost an ineffective teacher. In contrast, teacher seniority has little predictive power on the effects of layoffs. Simulation analyses show that the district selected teachers who were, on average, less effective than those teachers identified under an inverse-seniority policy, and also reduced job losses.

Kraft MA, Rogers T. The underutilized potential of teacher-to-parent communication: Evidence from a field experiment. Economics of Education Review [Internet]. 2015;47 :49-63.

We study an intervention designed to increase the effectiveness of parental involvement in their children’s education. Each week we sent brief individualized messages from teachers to the parents of high school students in a credit recovery program. This light-touch communication increased the probability students earned credits by 6.5 percentage points, a 41% reduction in the proportion failing to earn credit. This improvement resulted primarily from preventing drop-outs, rather than from reducing failure or dismissal rates. The intervention shaped the content of parent-child conversations with messages emphasizing what students could improve, versus what students were doing well, producing the largest effects. Our results illustrate the underutilized potential of communication policies with clear but reasonable expectations for teachers and program designs that make communication efficient and effective.

Kraft MA, Papay JP, Charner-Laird M, Johnson SM, Ng M, Reinhorn S. Educating Amidst Uncertainty: The Organizational Supports Teachers Need to Serve Students in High-poverty, Urban Schools. Educational Administration Quarterly [Internet]. 2015;51 (5) :753-790.


Purpose: We examine how uncertainty, both about students and the context in which they are taught, remains a persistent condition of teachers’ work in high-poverty, urban schools. We describe six schools’ organizational responses to these uncertainties, analyze how these responses reflect more open- versus closed-system approaches, and examine how this orientation affects teachers’ work.

Research Methods: We draw on interviews with a diverse set of 95 teachers and administrators across a purposive sample of six high-poverty, urban schools in one district. We analyzed these interviews by drafting thematic summaries, coding interview transcripts, creating data-analytic matrices, and writing analytic memos.

Findings: We find that students introduced considerable uncertainty into teachers’ work. Although most teachers we spoke with embraced the challenges of their work and the expanded responsibilities that it entailed, they recognized that their individual efforts were not sufficient to succeed. Teachers consistently spoke about the need for organizational responses that addressed the environmental uncertainty of working with students from low-income families whose experience in school often has been unsuccessful. We describe four types of organizational responses (coordinated instructional supports, systems to promote order and discipline, socio-emotional supports for students, and efforts to engage parents) and illustrate how these responses affected teachers’ ability to manage the uncertainty introduced by their environment.

Implications: Traditional public schools are open systems, and require systematic organizational responses to address the uncertainty introduced by their environments. Uncoordinated individual efforts alone are not sufficient to meet the needs of students in high-poverty urban communities.

Papay JP, Kraft MA. Productivity returns to experience in the teacher labor market: Methodological challenges and new evidence on long-term career improvement. Journal of Public Economics [Internet]. 2015;130 :105-119.

We present new evidence on the relationship between teacher productivity and job experience. Econometric challenges require identifying assumptions to model the within-teacher returns to experience with teacher fixed effects. We describe the bias introduced by violations of different identifying assumptions, including a new approach that we propose. Consistent with past research, we find that teachers experience rapid productivity improvement early in their careers. However, we find suggestive evidence of returns to experience later in the career, indicating that teachers continue to build human capital beyond these first years.

Kraft MA. How to Make Additional Time Matter: Integrating Individualized Tutorials into an Extended Day. Education Finance and Policy [Internet]. 2015;10 (1) :81-116.

Support for extending the school day has gained substantial momentum despite limited causal evidence that it increases student achievement. Existing evidence is decidedly mixed, in part, because of the stark differences in how schools use additional time. In this paper, I focus on the effect of additional time in school when that time is used for individualized tutorials. In 2005, MATCH Charter Public High School integrated two hours of individualized tutorials throughout an extended school day. The unanticipated implementation of this initiative and the school’s lottery enrollment policy allow me to use two complementary quasi-experimental methods to estimate program effects. I find that providing students with two hours of daily tutorials that are integrated into the school day and taught by full-time, recent college graduates increased achievement on 10th grade English language arts exams by 0.15 to 0.25 standard deviations per year. I find no average effect in mathematics beyond the large gains students were already achieving, although quantile regression estimates suggest that the tutorials raised the lowest end of the achievement distribution in mathematics.

Finn AS, Kraft MA, West MR, Leonard JA, Bisch CE, Martin RE, Sheridan MA, Gabrieli CF, Gabrieli JD. Cognitive Skills, Student Achievement Tests, and Schools. Psychological Science [Internet]. 2014;25 (3) :736-744.

Cognitive skills predict academic performance, so schools that improve academic performance might also improve cognitive skills. To investigate the impact schools have on both academic performance and cognitive skills, we related standardized achievement test scores to measures of cognitive skills in a large sample (N=1,367) of 8th-grade students attending traditional, exam, and charter public schools. Test scores and gains in test scores over time correlated with measures of cognitive skills. Despite wide variation in test scores across schools, differences in cognitive skills across schools were negligible after controlling for 4th-grade test scores. Random offers of enrollment to over-subscribed charter schools resulted in positive impacts of such school attendance on math achievement, but had no impact on cognitive skills. These findings suggest that schools that improve standardized achievement tests do so primarily through channels other than cognitive skills.

Johnson SM, Reinhorn SK, Charner-Laird M, Kraft MA, Ng M, Papay JP. Ready to Lead, But How? Teachers’ Experiences in High-poverty Urban Schools. Teachers College Record [Internet]. 2014;116 (10) :1-50.

We examine how teachers in six high-poverty, urban schools of one school district respond to and participate in leadership beyond their classrooms. We found that teachers wanted to participate in developing and implementing their school’s plans for change. When teachers believed that their principal took an inclusive approach to leadership, looking to them for ideas about how to improve the school, they were energized, committed to the joint effort and readily remained “in the game.” However, when teachers thought that their principal took an instrumental approach to leadership, expecting them to comply with fixed plans or to passively endorse administrative decisions, they expressed frustration and tended to withdraw to their classroom, sometimes intending to leave the school.

Kraft MA, Papay JP. Can Professional Environments in Schools Promote Teacher Development? Explaining Heterogeneity in Returns to Teaching Experience. Educational Evaluation and Policy Analysis [Internet]. 2014;36 (4) :476-500.

Although wide variation in teacher effectiveness is well established, much less is known about differences in teacher improvement over time. We document that average returns to teaching experience mask large variation across individual teachers, and across groups of teachers working in different schools. We examine the role of school context in explaining these differences using a measure of the professional environment constructed from teachers’ responses to state-wide surveys. Our analyses show that teachers working in more supportive professional environments improve their effectiveness more over time than teachers working in less supportive contexts. On average, teachers working in schools at the 75th percentile of professional environment ratings improved 38% more than teachers in schools at the 25th percentile after ten years.

Herlihy C, Karger E, Pollard C, Hill HC, Kraft MA, Williams M, Howard S. State and local efforts to investigate the validity and reliability of scores from teacher evaluation systems. Teachers College Record [Internet]. 2014;116 (1) :1-28.

Context: In the past two years, states have implemented sweeping reforms to their teacher evaluation systems in response to Race to the Top legislation and, more recently, NCLB waivers. With these new systems, policy-makers hope to make teacher evaluation both more rigorous and more grounded in specific job performance domains such as teaching quality and contributions to student outcomes. Attaching high stakes to teacher scores has prompted an increased focus on the reliability and validity of these scores. Teachers unions have expressed strong concerns about the reliability and validity of using student achievement data to evaluate teachers and the potential for subjective ratings by classroom observers to be biased. The legislation enacted by many states also requires scores derived from teacher observations and the overall systems of teacher evaluation to be valid and reliable.

Focus of the study: In this paper, we explore how state education officials and their district and local partners plan to implement and evaluate their teacher evaluation systems, focusing in particular on states’ efforts to investigate the reliability and validity of scores emerging from the observational component of these systems.

Research design: Through a document analysis and interviews with state education officials, we explore several issues that arise in observational systems, including the overall generalizability of teacher scores, the training, certification, and reliability of observers, and specifications regarding the sampling and number of lessons observed per teacher.

Findings: Respondents’ reports suggest that states are attending to the reliability and validity of scores, but inconsistently; in only a few states does there appear to be a coherent strategy regarding reliability and validity in place.

Conclusions: There remain a variety of system design and implementation decisions that states can optimize to increase the reliability and validity of their teacher evaluation scores. While a state may engage in auditing scores, for instance, it may miss the gains to reliability and validity that would accrue from periodic rater retraining and recertification, a stiff program of rater monitoring, and the use of multiple raters per teacher. Most troublesome are decisions about which and how many lessons to sample, which are either mandated legislatively, result from practical concerns or negotiations between stakeholders, or, in the best case, rest on broad research not directly related to the state context. This suggests that states should more actively investigate the number of lessons and lesson sampling designs required to yield high-quality scores.

Kraft MA, Dougherty SM. The Effect of Teacher-Family Communication on Student Engagement: Evidence from a Randomized Field Experiment. Journal of Research on Educational Effectiveness [Internet]. 2013;6 (3) :199-222.

In this study, we seek to evaluate the efficacy of teacher communication with parents and students as a means of increasing student engagement. We estimate the causal effect of teacher communication by conducting a randomized field experiment in which 6th and 9th grade students were assigned to receive a daily phone call home and a text/written message during a mandatory summer school program. We find that frequent teacher-family communication immediately increased student engagement as measured by homework completion rates, on-task behavior and class participation. On average, teacher-family communication increased the odds a student completed their homework by 42% and decreased instances in which teachers had to redirect students’ attention to the task at hand by 25%. Class participation rates among 6th grade students increased by 49%, while communication appeared to have a small negative effect on 9th grade students’ willingness to participate. Drawing upon surveys and interviews with participating teachers and students, we identify three primary mechanisms through which communication likely affected engagement: stronger teacher-student relationships, expanded parental involvement, and increased student motivation.

Hill HC, Charalambous CY, Blazar D, McGinn D, Kraft MA, Beisiegel M, Humez A, Litke E, Lynch K. Validating arguments for observational instruments: Attending to multiple sources of variation. Educational Assessment [Internet]. 2012;17 (2-3) :88-106.

Measurement scholars have recently constructed validity arguments in support of a variety of educational assessments, including classroom observation instruments. In this article, we note that users must examine the robustness of validity arguments to variation in the implementation of these instruments. We illustrate how such an analysis might be used to assess a validity argument constructed for the Mathematical Quality of Instruction instrument, focusing in particular on the effects of varying the rater pool, subject matter content, observation procedure, and district context. Variation in the subject matter content of lessons did not affect rater agreement with master scores, but the evaluation of other portions of the validity argument varied according to the composition of the rater pool, observation procedure, and district context. These results demonstrate the need for conducting such analyses, especially for classroom observation instruments that are subject to multiple sources of variation.

Hill HC, Charalambous CY, Kraft MA. When rater reliability is not enough: Teacher observation systems and a case for the G-study. Educational Researcher [Internet]. 2012;41 (2) :56-64.

In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluations of classroom-based interventions. While educational practitioners and researchers have developed numerous observational instruments for these purposes, many fail to specify important criteria regarding their use. In this paper, we argue that for classroom observation to succeed in its aims, improved observational systems must be developed. These systems should include not only observational instruments, but also scoring designs capable of producing reliable and cost-efficient scores and processes for rater recruitment, training and certification. To illustrate how such a system might be developed and improved, we provide an empirical example that applies Generalizability Theory to data from a mathematics observational instrument.
