Washington, D.C. – Classroom observations did not reliably identify individual teachers’ strengths and weaknesses, finds an American Institutes for Research (AIR) study examining performance feedback for teachers and principals. The Institute of Education Sciences at the U.S. Department of Education published the report.
“In theory, educator evaluation systems can be used as a tool to improve student achievement,” said Andrew Wayne, a report co-author. “We learned that tools designed to measure many aspects of a teacher’s classroom practice or a principal’s leadership do not reliably tell us which aspect they need most help with.”
Many states and school districts see evaluation systems as a way to improve educators’ instruction and student achievement. The AIR study is examining the impact of a two-year intervention that shares performance feedback with teachers and principals. Informed by recent research, the intervention gave teachers feedback on classroom practice four times a year, as well as annual feedback on their contributions to student achievement. Principals received feedback on their leadership.
More than 100 elementary and middle schools in eight school districts and five states were part of the randomized controlled trial. About 1,000 fourth- through eighth-grade math and English teachers participated. The study was designed to track the intervention’s implementation and its effect on teacher classroom practice, principal leadership and student achievement.
The new study report is the first of two. It focuses primarily on the implementation of the intervention in its first year. The authors found that:
- Teachers in intervention schools received more than four times as many rounds of classroom performance feedback that included ratings and a written narrative as teachers in schools without the intervention.
- Even though observers completed specialized training and passed tests on how to rate instruction, they gave teachers performance ratings above the midpoint of the rating scale, leaving those teachers little room to grow.
The authors also examined the reliability of the ratings given to teachers and principals, meaning the degree to which the ratings provided a consistent message from occasion to occasion or rater to rater. The authors found that:
- The average score across four classroom observations provided some reliable information about the quality of a teacher’s practice. A two-year average of a teacher’s value-added scores (a measure of the teacher’s contribution to students’ growth) was also reasonably reliable. But scores from a single observation had limited reliability, meaning that teachers received different messages about their overall performance from one observation to the next.
- Teachers received a report with ratings on several dimensions of their practice each time they were observed. The classroom practice reports usually showed that teachers performed better on some dimensions and worse on others. However, the results varied from one observation to the next. Even after averaging the scores from four observations, the dimension ratings did not reliably identify an individual teacher’s weaknesses, so it was not clear which aspect of teaching the teacher needed the most help with.
The study’s second report will examine the impact of the two-year intervention on teacher classroom practice, principal leadership and student achievement.
Established in 1946, with headquarters in Washington, D.C., the American Institutes for Research (AIR) is a nonpartisan, not-for-profit organization that conducts behavioral and social science research and delivers technical assistance both domestically and internationally in the areas of health, education, and workforce productivity. For more information, visit www.air.org.