NAEP Validity Studies Panel Responses to the Re-analysis of TUDA Mathematics Scores
The National Assessment of Educational Progress (NAEP) is meant to reflect the entirety of what is taught in the United States. However, with many changes over time, such as states’ adoption of rigorous college and career readiness standards like the Common Core State Standards (CCSS), there are questions about the extent to which NAEP continues to meet this objective. During this shift in the national educational landscape, the NAEP Validity Studies (NVS) Panel has monitored, studied, and commented on issues such as this potential mismatch between the content of the states’ assessments and that of NAEP.
Based on the 2017 NAEP Mathematics TUDA (Trial Urban District Assessment) results, it appeared that student performance trends on NAEP were not as positive as student performance trends on the state assessments aligned to college and career readiness standards. At that time, several leaders in the affected urban districts called for what amounted to a “recount.”
The recent NVS Panel study by Daro et al. (in press) documents the extent of alignment between NAEP and state mathematics assessments along several important dimensions, one of which is content distribution. These results provided an opportunity for further analysis, which was taken up by Dogan (2019; reproduced in the appendix). Dogan’s TUDA reanalysis study was designed to explore whether content misalignment might explain the mismatched results for the TUDAs on NAEP and the respective state assessments.
The following research questions were asked:
- How would the 2017 mathematics Grades 4 and 8 TUDA mean scores change if the NAEP subscales were weighted according to the content distribution of selected state assessments?
- How would the mathematics Grades 4 and 8 TUDA mean scores in 2013, 2015, and 2019 change if the NAEP subscales were weighted according to the content emphasis of selected state assessments, assuming that the content emphasis of those assessments and of NAEP in these years was similar to that in 2017?
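The reweighting idea behind these research questions can be illustrated with a small numerical sketch: a composite NAEP mean is a weighted average of subscale means, and the analysis asks how that mean would shift if the framework weights were replaced with weights matching a state assessment's content distribution. All subscale means and weights below are invented for illustration; they are not the actual NAEP framework weights or TUDA results.

```python
def reweighted_mean(subscale_means, weights):
    """Weighted average of subscale means; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(m * w for m, w in zip(subscale_means, weights))

# Hypothetical subscale means for one district (Grade 4 NAEP mathematics
# reports five content areas, e.g., Number Properties and Operations,
# Measurement, Geometry, Data Analysis, Algebra).
subscale_means = [240.0, 232.0, 236.0, 228.0, 238.0]  # invented values

naep_weights  = [0.40, 0.20, 0.15, 0.10, 0.15]  # hypothetical framework weights
state_weights = [0.55, 0.10, 0.10, 0.05, 0.20]  # hypothetical state content distribution

print(round(reweighted_mean(subscale_means, naep_weights), 1))   # 236.3
print(round(reweighted_mean(subscale_means, state_weights), 1))  # 237.8
```

Note that this sketch only reweights existing subscale means; as the Panel's comments below emphasize, that is not equivalent to administering a test built to a blueprint with greater emphasis in a given domain.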
This report, authored by NVS panel members and AIR staff, serves as a response from the NVS Panel to the analysis conducted by Dogan (2019).
First, an extensive background section provides context for the motivation behind conducting such an analysis. This section covers important historical background on the alignment of standards and assessments, the implications of rigorous college and career readiness state content standards for NAEP, and the value of alignment studies to investigate the validity of NAEP.
The second section provides two major comments on, and caveats for, the methods used in Dogan’s analysis: 1) statistically overweighting a domain is not likely to produce the same result as creating a test blueprint with a greater emphasis in that domain; and 2) the analysis is limited by its reliance on state assessments as proxies for the opportunity to learn.
The final section considers the implications of Dogan’s results for NAEP and the challenges that would be inherent in considering the reporting of results from such analysis in any official or systematic manner.
The conclusion of the report is that the secondary analysis done by Dogan for the NAEP TUDA scores is important and worthy of further exploration as part of ongoing efforts to monitor the validity of NAEP. However, such analyses should not be used in the reporting of any official statistics or even as a recurring set of ancillary results or appendix material. To the extent that there is a real and educationally significant mismatch between the content covered on NAEP and that in the states, the best way to ameliorate this is by modifying the NAEP frameworks, not through post hoc reweighting of the NAEP results. Such an updating of the Mathematics framework is already underway by the National Assessment Governing Board.