In 2009, The New Teacher Project (TNTP)’s The Widget Effect documented the failure to recognize and act on differences in teacher effectiveness. We revisit these findings by compiling teacher performance ratings across 24 states that adopted major reforms to their teacher evaluation systems. In the vast majority of these states, the percentage of teachers rated Unsatisfactory remains less than 1%. However, the full distributions of ratings vary widely across states with 0.7% to 28.7% rated below Proficient and 6% to 62% rated above Proficient. We present original survey data from an urban district illustrating that evaluators perceive more than three times as many teachers in their schools to be below Proficient than they rate as such. Interviews with principals reveal several potential explanations for these patterns.
This paper analyzes a coaching model focused on classroom management skills and instructional practices across grade levels and subject areas. We describe the design and implementation of MATCH Teacher Coaching among an initial cohort of fifty-nine teachers working in New Orleans charter schools. We evaluate the effect of the program on teachers’ instructional practices using a block randomized trial and find that coached teachers scored 0.59 standard deviations higher on an index of effective teaching practices composed of observation scores, principal evaluations, and student surveys. We discuss implementation challenges and make recommendations for researcher-practitioner partnerships to address key remaining questions.
We used self-report surveys to gather information on a broad set of non-cognitive skills from 1,368 8th-graders. At the student level, scales measuring conscientiousness, self-control, grit, and growth mindset are positively correlated with attendance, behavior, and test-score gains between 4th and 8th grade. Conscientiousness, self-control, and grit are unrelated to test-score gains at the school level, however, and students attending over-subscribed charter schools score lower on these scales than do students attending district schools. Exploiting admissions lotteries, we find positive impacts of charter school attendance on achievement and attendance but negative impacts on these non-cognitive skills. We provide suggestive evidence that these paradoxical results are driven by reference bias, or the tendency for survey responses to be influenced by social context.
We use matched employee-employer records from the teacher labor market to explore the trade-offs between the timing of hiring and match quality. Hiring teachers after the school year starts reduces student achievement by 0.042 SD in mathematics and 0.026 SD in reading. This reflects, in part, a temporary disruption effect in the first year. In mathematics, but not in reading, late-hired teachers remain persistently less effective, evidence of negative selection in the teacher labor market. Late hiring concentrates in schools that disproportionately serve disadvantaged student populations, contributing to challenges in ensuring an equitable distribution of educational resources across students.
Purpose: New teacher evaluation systems have expanded the role of principals as instructional leaders, but little is known about principals’ ability to promote teacher development through the evaluation process. We conducted a case study of principals’ perspectives on evaluation and their experiences implementing observation and feedback cycles to better understand whether principals feel as though they are able to promote teacher development as evaluators.
Research Methods: We conducted interviews with a stratified random sample of 24 principals in an urban district that recently implemented major reforms to its teacher evaluation system. We analyzed these interviews by drafting thematic summaries, coding interview transcripts, creating data-analytic matrices, and writing analytic memos.
Findings: We found that the evaluation reforms provided a common framework and language that helped facilitate principals’ feedback conversations with teachers. However, we also found that tasking principals with primary responsibility for conducting evaluations resulted in a variety of unintended consequences which undercut the quality of evaluation feedback they provided. We analyze five broad solutions to these challenges: strategically targeting evaluations, reducing operational responsibilities, providing principal training, hiring instructional coaches, and developing peer evaluation systems.
Implications: The quality of feedback teachers receive through the evaluation process depends critically on the time and training evaluators have to provide individualized and actionable feedback. Districts that task principals with primary responsibility for conducting observation and feedback cycles must attend to the many implementation challenges associated with this approach in order for next-generation evaluation systems to successfully promote teacher development.
We study the relationship between school organizational contexts, teacher turnover, and student achievement in New York City (NYC) middle schools. Using factor analysis, we construct measures of four distinct dimensions of school contexts captured on the annual NYC School Survey. We identify credible estimates by isolating variation in organizational contexts within schools over time. We find that improvements in school leadership, academic expectations, teacher relationships, and school safety are all independently associated with corresponding reductions in teacher turnover. Increases in school safety and academic expectations for students also correspond to increases in student achievement. These results are robust to a range of potential threats to validity, suggesting that our findings are likely driven by an underlying causal relationship.
Although previous research has shown that teacher coaching can improve teaching practices and student achievement, little is known about specific features of effective coaching programs. We estimate the impact of MATCH Teacher Coaching (MTC) on a range of teacher practices using a blocked randomized trial and explore how changes in the coaching model across two cohorts are related to program effects. Findings indicate large positive effects on teachers’ practices in cohort 1 but no effects in cohort 2. After ruling out explanations related to the research design, a set of exploratory analyses suggests that differential treatment effects may be attributable to differences in coach effectiveness and the focus of coaching across cohorts.
Most teacher layoffs during the Great Recession were implemented following inverse-seniority policies. In this paper, I examine the implementation of a discretionary layoff policy in Charlotte Mecklenburg Schools. Administrators did not uniformly lay off the most or least senior teachers but instead selected teachers who were previously retired, late-hired, unlicensed, low-performing, or nontenured. Using quasi-experimental variation within schools across grades, I then estimate the differential effects of teacher layoffs on student achievement based on teacher seniority and effectiveness. Mathematics achievement in grades that lost an effective teacher, as measured by principal evaluations or value-added scores, decreased 0.05 to 0.11 standard deviations more than in grades that lost an ineffective teacher. In contrast, teacher seniority has little predictive power on the effects of layoffs. Simulation analyses show that the district selected teachers who were, on average, less effective than those teachers identified under an inverse-seniority policy, and also reduced job losses.
We study an intervention designed to increase the effectiveness of parental involvement in their children’s education. Each week we sent brief individualized messages from teachers to the parents of high school students in a credit recovery program. This light-touch communication increased the probability students earned credits by 6.5 percentage points – a 41% reduction in the proportion failing to earn credit. This improvement resulted primarily from preventing drop-outs, rather than from reducing failure or dismissal rates. The intervention shaped the content of parent-child conversations with messages emphasizing what students could improve, versus what students were doing well, producing the largest effects. Our results illustrate the underutilized potential of communication policies with clear but reasonable expectations for teachers and program designs that make communication efficient and effective.
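The two figures reported in this abstract can be cross-checked with simple arithmetic. The sketch below assumes, as the wording suggests, that the 6.5 percentage-point gain and the 41% reduction describe the same outcome. The implied baseline rate is a back-of-envelope inference rather than a number reported here, and the function name is purely illustrative.

```python
def implied_baseline_failure_rate(pp_gain: float, relative_reduction: float) -> float:
    """Baseline share failing to earn credit implied by the two reported figures.

    If the failure share falls by `pp_gain` (as a proportion) and that drop
    equals `relative_reduction` of the baseline share, then
    baseline = pp_gain / relative_reduction.
    """
    if not 0 < relative_reduction <= 1:
        raise ValueError("relative_reduction must be in (0, 1]")
    return pp_gain / relative_reduction


# Figures taken from the abstract: 6.5 percentage points and a 41% reduction.
baseline_failure = implied_baseline_failure_rate(0.065, 0.41)
# Implied (not reported) failure share among students whose parents got messages.
treated_failure = baseline_failure - 0.065

print(f"Implied comparison-group failure share: {baseline_failure:.3f}")
print(f"Implied treatment-group failure share:  {treated_failure:.3f}")
```

Under that assumption, roughly 16% of comparison-group students failed to earn credit, which is what makes a 6.5-point change a large relative reduction.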
We examine how uncertainty, both about students and the context in which they are taught, remains a persistent condition of teachers’ work in high-poverty, urban schools. We describe six schools’ organizational responses to these uncertainties, analyze how these responses reflect more open- versus closed-system approaches, and examine how this orientation affects teachers’ work.
We draw on interviews with a diverse set of 95 teachers and administrators across a purposive sample of six high-poverty, urban schools in one district. We analyzed these interviews by drafting thematic summaries, coding interview transcripts, creating data-analytic matrices, and writing analytic memos.
We find that students introduced considerable uncertainty into teachers’ work. Although most teachers we spoke with embraced the challenges of their work and the expanded responsibilities that it entailed, they recognized that their individual efforts were not sufficient to succeed. Teachers consistently spoke about the need for organizational responses that addressed the environmental uncertainty of working with students from low-income families whose experience in school often has been unsuccessful. We describe four types of organizational responses — coordinated instructional supports, systems to promote order and discipline, socio-emotional supports for students, and efforts to engage parents — and illustrate how these responses affected teachers’ ability to manage the uncertainty introduced by their environment.
Traditional public schools are open systems, and require systematic organizational responses to address the uncertainty introduced by their environments. Uncoordinated individual efforts alone are not sufficient to meet the needs of students in high-poverty urban communities.
We present new evidence on the relationship between teacher productivity and job experience. Econometric challenges require identifying assumptions to model the within-teacher returns to experience with teacher fixed effects. We describe the bias introduced by violations of different identifying assumptions, including a new approach that we propose. Consistent with past research, we find that teachers experience rapid productivity improvement early in their careers. However, we find suggestive evidence of returns to experience later in the career, indicating that teachers continue to build human capital beyond these first years.
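One commonly discussed source of the econometric challenge mentioned above, which the abstract itself does not spell out, is that experience and calendar year move in lockstep for any teacher who works continuously. The sketch below illustrates this with a small hypothetical panel; the data, names, and the choice of this particular challenge are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical panel: (teacher_id, school_year, years_of_experience).
# Both teachers work every year, so experience rises one-for-one with the year.
panel = [
    ("A", 2005, 0), ("A", 2006, 1), ("A", 2007, 2),
    ("B", 2006, 3), ("B", 2007, 4), ("B", 2008, 5),
]


def demean_within_teacher(rows, column):
    """Subtract each teacher's own mean from a column (the fixed-effects transform)."""
    values_by_teacher = {}
    for row in rows:
        values_by_teacher.setdefault(row[0], []).append(row[column])
    teacher_means = {t: mean(v) for t, v in values_by_teacher.items()}
    return [row[column] - teacher_means[row[0]] for row in rows]


year_within = demean_within_teacher(panel, 1)
experience_within = demean_within_teacher(panel, 2)

# Once teacher means are removed, the two variables coincide, so a model with
# teacher fixed effects cannot separate experience effects from year effects
# unless some further identifying assumption is imposed.
print(year_within)
print(experience_within)
print(year_within == experience_within)
```

Any approach to estimating within-teacher returns to experience therefore has to restrict either the experience profile or the year effects, or rely on teachers with interrupted careers, and each choice carries its own potential bias.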
Support for extending the school day has gained substantial momentum despite limited causal evidence that it increases student achievement. Existing evidence is decidedly mixed, in part because of the stark differences in how schools use additional time. In this paper, I focus on the effect of additional time in school when that time is used for individualized tutorials. In 2005, MATCH Charter Public High School integrated two hours of individualized tutorials throughout an extended school day. The unanticipated implementation of this initiative and the school’s lottery enrollment policy allow me to use two complementary quasi-experimental methods to estimate program effects. I find that providing students with two hours of daily tutorials that are integrated into the school day and taught by full-time, recent college graduates increased achievement on 10th grade English language arts exams by 0.15 to 0.25 standard deviations per year. I find no average effect in mathematics beyond the large gains students were already achieving, although quantile regression estimates suggest that the tutorials raised the lowest end of the achievement distribution in mathematics.
Cognitive skills predict academic performance, so schools that improve academic performance might also improve cognitive skills. To investigate the impact schools have on both academic performance and cognitive skills, we related standardized achievement test scores to measures of cognitive skills in a large sample (N=1,367) of 8th-grade students attending traditional, exam, and charter public schools. Test scores and gains in test scores over time correlated with measures of cognitive skills. Despite wide variation in test scores across schools, differences in cognitive skills across schools were negligible after controlling for 4th-grade test scores. Random offers of enrollment to over-subscribed charter schools resulted in positive impacts of such school attendance on math achievement, but had no impact on cognitive skills. These findings suggest that schools that improve standardized achievement tests do so primarily through channels other than cognitive skills.
We examine how teachers in six high-poverty, urban schools of one school district respond to and participate in leadership beyond their classrooms. We found that teachers wanted to participate in developing and implementing their school’s plans for change. When teachers believed that their principal took an inclusive approach to leadership, looking to them for ideas about how to improve the school, they were energized, committed to the joint effort and readily remained “in the game.” However, when teachers thought that their principal took an instrumental approach to leadership, expecting them to comply with fixed plans or to passively endorse administrative decisions, they expressed frustration and tended to withdraw to their classroom, sometimes intending to leave the school.
Although wide variation in teacher effectiveness is well established, much less is known about differences in teacher improvement over time. We document that average returns to teaching experience mask large variation across individual teachers, and across groups of teachers working in different schools. We examine the role of school context in explaining these differences using a measure of the professional environment constructed from teachers’ responses to state-wide surveys. Our analyses show that teachers working in more supportive professional environments improve their effectiveness more over time than teachers working in less supportive contexts. On average, teachers working in schools at the 75th percentile of professional environment ratings improved 38% more than teachers in schools at the 25th percentile after ten years.
Context: In the past two years, states have implemented sweeping reforms to their teacher evaluation systems in response to Race to the Top legislation and, more recently, NCLB waivers. With these new systems, policy-makers hope to make teacher evaluation both more rigorous and more grounded in specific job performance domains such as teaching quality and contributions to student outcomes. Attaching high stakes to teacher scores has prompted an increased focus on the reliability and validity of these scores. Teachers unions have expressed strong concerns about the reliability and validity of using student achievement data to evaluate teachers and the potential for subjective ratings by classroom observers to be biased. The legislation enacted by many states also requires scores derived from teacher observations and the overall systems of teacher evaluation to be valid and reliable.
Focus of the study: In this paper, we explore how state education officials and their district and local partners plan to implement and evaluate their teacher evaluation systems, focusing in particular on states’ efforts to investigate the reliability and validity of scores emerging from the observational component of these systems.
Research design: Through a document analysis and interviews with state education officials, we explore several issues that arise in observational systems, including the overall generalizability of teacher scores, the training, certification, and reliability of observers, and specifications regarding the sampling and number of lessons observed per teacher.
Findings: Respondents’ reports suggest that states are attending to the reliability and validity of scores, but inconsistently; in only a few states does there appear to be a coherent strategy regarding reliability and validity in place.
Conclusions: There remain a variety of system design and implementation decisions that states can optimize to increase the reliability and validity of their teacher evaluation scores. While a state may engage in auditing scores, for instance, it may miss the gains to reliability and validity that would accrue from periodic rater retraining and recertification, a stiff program of rater monitoring, and the use of multiple raters per teacher. Most troublesome are decisions about which and how many lessons to sample, which are mandated legislatively, result from practical concerns or negotiations between stakeholders, or, at best, rest on broad research not directly related to the state context. This suggests that states should more actively investigate the number of lessons and lesson sampling designs required to yield high-quality scores.
In this study, we seek to evaluate the efficacy of teacher communication with parents and students as a means of increasing student engagement. We estimate the causal effect of teacher communication by conducting a randomized field experiment in which 6th and 9th grade students were assigned to receive a daily phone call home and a text/written message during a mandatory summer school program. We find that frequent teacher-family communication immediately increased student engagement as measured by homework completion rates, on-task behavior and class participation. On average, teacher-family communication increased the odds a student completed their homework by 42% and decreased instances in which teachers had to redirect students’ attention to the task at hand by 25%. Class participation rates among 6th grade students increased by 49%, while communication appeared to have a small negative effect on 9th grade students’ willingness to participate. Drawing upon surveys and interviews with participating teachers and students, we identify three primary mechanisms through which communication likely affected engagement: stronger teacher-student relationships, expanded parental involvement, and increased student motivation.
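A 42% increase in odds is not the same as a 42% increase in the completion rate, and how large it looks in probability terms depends on the starting point. The sketch below converts an odds multiplier into a probability for a few baseline completion rates. Those baseline rates are hypothetical, since the abstract does not report them, and the function name is illustrative.

```python
def probability_after_odds_change(baseline_probability: float, odds_multiplier: float) -> float:
    """Apply a multiplicative change in odds to a baseline probability."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_multiplier
    return new_odds / (1 + new_odds)


# 1.42 corresponds to the reported 42% increase in the odds of homework completion.
# The baseline completion rates below are assumptions chosen for illustration.
for baseline in (0.25, 0.50, 0.75):
    updated = probability_after_odds_change(baseline, 1.42)
    print(f"baseline {baseline:.2f} -> {updated:.3f} "
          f"(+{(updated - baseline) * 100:.1f} percentage points)")
```

The same odds multiplier yields a smaller percentage-point change when the baseline rate is already high, which is worth keeping in mind when comparing it with the percentage changes reported for the other outcomes.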
Measurement scholars have recently constructed validity arguments in support of a variety of educational assessments, including classroom observation instruments. In this article, we note that users must examine the robustness of validity arguments to variation in the implementation of these instruments. We illustrate how such an analysis might be used to assess a validity argument constructed for the Mathematical Quality of Instruction instrument, focusing in particular on the effects of varying the rater pool, subject matter content, observation procedure, and district context. Variation in the subject matter content of lessons did not affect rater agreement with master scores, but the evaluation of other portions of the validity argument varied according to the composition of the rater pool, observation procedure, and district context. These results demonstrate the need for conducting such analyses, especially for classroom observation instruments that are subject to multiple sources of variation.
In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluations of classroom-based interventions. While educational practitioners and researchers have developed numerous observational instruments for these purposes, many fail to specify important criteria regarding their use. In this paper, we argue that for classroom observation to succeed in its aims, improved observational systems must be developed. These systems should include not only observational instruments, but also scoring designs capable of producing reliable and cost-efficient scores and processes for rater recruitment, training and certification. To illustrate how such a system might be developed and improved, we provide an empirical example that applies Generalizability Theory to data from a mathematics observational instrument.
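To make the link between scoring designs and reliability concrete, the sketch below computes the standard relative generalizability coefficient for a fully crossed teacher × lesson × rater design and varies the number of lessons and raters, in the spirit of a decision study. The crossed design and all variance components are hypothetical values chosen for illustration; they are not estimates from the instrument discussed here.

```python
def generalizability_coefficient(
    var_teacher: float,
    var_teacher_lesson: float,
    var_teacher_rater: float,
    var_residual: float,
    n_lessons: int,
    n_raters: int,
) -> float:
    """Relative G coefficient for a crossed teacher x lesson x rater design.

    Error variance shrinks as scores are averaged over more lessons and raters.
    """
    if n_lessons < 1 or n_raters < 1:
        raise ValueError("n_lessons and n_raters must be at least 1")
    relative_error = (
        var_teacher_lesson / n_lessons
        + var_teacher_rater / n_raters
        + var_residual / (n_lessons * n_raters)
    )
    return var_teacher / (var_teacher + relative_error)


# Hypothetical variance components (assumed, not taken from any study).
components = dict(
    var_teacher=0.30,
    var_teacher_lesson=0.20,
    var_teacher_rater=0.10,
    var_residual=0.40,
)

# Compare candidate scoring designs by the reliability each would yield.
for n_lessons in (1, 2, 4):
    for n_raters in (1, 2):
        g = generalizability_coefficient(**components, n_lessons=n_lessons, n_raters=n_raters)
        print(f"{n_lessons} lesson(s), {n_raters} rater(s): G = {g:.2f}")
```

Comparing rows of this kind against the cost of each additional lesson or rater is one way to look for a design that is both reliable and cost-efficient.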