I am a strong advocate of using student test scores to assess teacher quality, and the most effective way to do so is through value-added measures (VAM scores), which are designed to assess students’ academic growth in a teacher’s class compared with that of their peers. VAM is an important tool for all who wish to strengthen American education and ensure that all our students have excellent teachers. But using VAM poses real challenges. After linking VAM to teacher evaluations, Connecticut, the District of Columbia, New York and others are now looking to drop it from those evaluations, even as states like Tennessee fight for the ability to continue considering VAM scores. And as Education Week reported, even early VAM adopters like Georgia, Louisiana and Tennessee are finding that, despite the best of intentions, linking teacher preparation to VAM is a challenging task, one that has led them to explore reducing VAM’s weight in evaluating teacher performance.

The Woodrow Wilson Foundation has faced similar challenges. Since our first Teaching Fellows finished their preparation for teaching in 2010, the Woodrow Wilson National Fellowship Foundation has sought to measure fellows’ classroom effectiveness. To evaluate these educators, the foundation uses state data to determine VAM scores. So far, three of our five partner states are able to provide the test data necessary to calculate VAM scores. Yet even in those states, where student achievement data is available, the limitations of VAM measures and related sampling issues make VAM problematic in practice. These are problems that everyone using current VAM models faces.
Few states test students, particularly those in middle or high school, in the academic subjects for which we need data.
Despite the growing focus on STEM instruction in the United States, student achievement data is still not available in a number of science-related subjects. Thus, it is not possible to assess the performance of teachers in many of the courses we expect high school students to take.
In small classes—generally fewer than seven students, depending on the VAM method—obtaining meaningful scores has been impossible. This is a serious problem in rural and some urban schools, where enrollments in many classes are low.
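To see why small classes defeat the statistics, consider a deliberately simplified sketch in Python. It is not the model any state or the foundation uses. It assumes each student’s growth has already been expressed as a residual (actual score minus the score predicted from prior achievement), and the function names and residual values are hypothetical. It estimates a class’s average growth and an approximate 95% interval around that average.

```python
import math


def class_growth_interval(residuals, z=1.96):
    """Average growth for one class plus the half-width of an approximate
    95% confidence interval (normal approximation).

    Each residual is a student's actual score minus the score predicted
    from prior achievement (a hypothetical, simplified growth model)."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two students to estimate spread")
    mean = sum(residuals) / n
    # Sample variance of the residuals (n - 1 in the denominator).
    variance = sum((r - mean) ** 2 for r in residuals) / (n - 1)
    # The standard error of the class mean shrinks only as 1 / sqrt(n).
    half_width = z * math.sqrt(variance / n)
    return mean, half_width


def distinguishable_from_average(residuals):
    """True if the interval around the class's average growth excludes zero."""
    mean, half_width = class_growth_interval(residuals)
    return abs(mean) > half_width


# Hypothetical residuals (in scale-score points) for a six-student class.
small_class = [3, -5, 8, 1, -2, 6]
# The same pattern of growth repeated across thirty students.
large_class = small_class * 5

for label, scores in [("6 students", small_class), ("30 students", large_class)]:
    mean, hw = class_growth_interval(scores)
    print(f"{label}: {mean:.2f} +/- {hw:.2f}, "
          f"distinguishable: {distinguishable_from_average(scores)}")
```

With the same average growth, the six-student interval straddles zero while the thirty-student interval does not, so only the larger class yields a score that can be told apart from average. Operational VAM models include more controls, but the underlying problem, that sampling error shrinks only with the square root of class size, is the same.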
To produce statistically significant results, VAM requires relatively large numbers of teachers teaching the same tested subjects and grades in the same schools and districts.
Unfortunately, this isn’t the reality that many teachers, particularly those in high-need schools, experience. In one state where our Teaching Fellows are working, fellows were teaching 66 different subjects in one academic year, and the number teaching any particular course was relatively small. This made most assessments statistically insignificant.
In practical terms, this means that VAM scores, and other types of teacher performance measures as well, cannot be used to gauge the effectiveness of teacher education programs at specific universities in fields that produce small numbers of graduates, such as the STEM subjects and other high-need fields like special education and English language learning.
It becomes difficult to determine the contribution of educators who replace the original teacher of record, which happens often, particularly in high-need schools.
In many of these schools, the current teacher of record is the second or third teacher in the classroom in a given academic year. Moreover, these educators do not work in isolation: they benefit from working with colleagues and contribute to the school environment, factors the current VAM model does not account for. This is one more challenge in obtaining meaningful data on teacher effectiveness.
The Future of VAM
While the idea and intent of VAM scores remain excellent, in current practice they have proven inadequate to measure teacher effectiveness. VAM 1.0 may be sufficient to measure results in teacher education programs that produce large numbers of graduates (such as elementary education) and in high-enrollment, required secondary subjects such as social studies and English. But the future of VAM requires a model that can measure all teacher education programs well, regardless of size or content area. Yes, we will continue to collect VAM data on our fellows, just as states will continue to collect data on their teachers, but we have also begun to explore other assessments and measures that will produce more practical and useful results. First-generation VAM has taught us a lot, including how much more we must learn about our teachers and their classroom experiences. There is a pressing need to do more in assessment if the nation is to move to an education system more concerned with what students have learned than with what their teachers have taught. To determine teacher effectiveness in the years ahead, we need to supplement VAM scores with other measures of student growth, further develop state data systems on student achievement, and create a more advanced and sensitive VAM 2.0. We need to apply what has been learned and develop a next-generation VAM that will help strengthen teaching and learning for the nation’s children.