A Comparison Study of the Program for International Student Assessment (PISA) 2012 and the National Assessment of Educational Progress (NAEP) 2013 Mathematics Assessments

Kim Gattis, Young Yee Kim, Maria Stephens, Linda Dager Hall, Fei Liu, and Juliet Holmes

In the United States, nationally representative data on student achievement come primarily from two sources: the National Assessment of Educational Progress (NAEP)—also known as “The Nation’s Report Card”—and U.S. participation in international assessments, including the Program for International Student Assessment (PISA). Together, these national and international sources provide important information on the performance of U.S. students in key subjects, such as mathematics, science, and reading. While the national assessment provides data on achievement that is tailored to students’ school experiences in the United States, the international assessments allow U.S. student performance to be benchmarked to that of students in other countries.

This study compares the mathematics framework and item pool of NAEP with those of PISA, and vice versa. It also describes differences in item features between the two assessments.

This paper is part of a series of AIR-NAEP working papers that showcase AIR’s expertise and experience not only with NAEP but with other large-scale assessments and survey-based longitudinal studies.

In the winter of 2013, mathematics results from both assessments were made publicly available, covering the eighth-graders assessed by NAEP 2013 and the 15-year-olds assessed by PISA 2012. NCES thus commissioned a study to compare these two mathematics assessments so that researchers, educators, the mathematics community, and the public can gain a deeper understanding of the mathematics being assessed in each program, how the two compare and contrast, and what each assessment contributes to the knowledge base about U.S. students’ mathematics performance.

The study found many similarities between the two assessments. However, it also found important differences in the relative emphasis across content areas or categories, in the role of context, in the level of complexity, in the degree of mathematizing, in the overall amount of text, and in the use of representations in the assessments.