The Mexican Translation of TIMSS-1995: Lessons on Test Translation from a Post-Mortem Study

The translation of tests into languages in which they were not originally written, and their adaptation for groups not originally targeted, has become an important component of education in the context of a global economy (Hambleton, 1994). International comparisons have posed important challenges for the development of measurement instruments in more than one language. As a result, norms and procedures for the linguistic adaptation of tests keep evolving (Grisay, 1998).

This paper is a contribution to test translation review. We address the fact that, while necessary, guidelines for test translation are not sufficient for ensuring the validity of translated tests. We describe an approach to test translation review that addresses a wider variety of translation issues, from production of translated tests, to curriculum representation, to social aspects of language use and language usage. More specifically, we offer a conceptual framework for the coding of translation errors, and provide some empirical evidence on the effect of translation errors on student performance.

We examined the Mexican translation of the Third International Mathematics and Science Study (TIMSS-1995) test. Mexican data from TIMSS-1995 are largely unknown because Mexico withdrew its participation after data had been collected but before the results were published. Examining this test addresses expectations of transparency that the Mexican public now has regarding education issues. While the test is ten years old, lessons learned from that experience will inform internal decisions about Mexico’s participation in future international comparisons. This effort is funded by the National Institute for Educational Evaluation (INEE), whose creation in 2002 initiated in Mexico an era of public awareness of evaluation and sensitivity to accountability issues.

Most of the Mexican TIMSS-1995 data were destroyed. However, soon after the creation of INEE, we were able to recover blank copies of all the test booklets used with Population 1 (grades 3 and 4) and Population 2 (grades 7 and 8) students. We were also able to recover the items’ p-values (the proportion of students who responded correctly to each item) for two years: 1995, when data were collected as part of Mexico’s participation in the international comparison, and 2000, when the Mexican Ministry of Education’s General Directorate of Evaluation (DGE) administered the same test to a new sample of students (see Backhoff & Solano-Flores, 2003).

The lack of information at the student level (such as external measures of academic performance, demographic information, or information on linguistic abilities) limited the kinds of analyses that we could perform. Our analyses focus on the frequency of translation errors observed across populations, grades, and content areas, and on the effect of translation quality on student performance, as reflected by the correlations between item translation quality measures and item p-values.
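
To make the item-level analysis concrete, the following Python sketch shows how an item p-value and its correlation with a translation quality measure could be computed. It is a minimal illustration under stated assumptions, not the procedure used in this study: the function names `item_p_value` and `pearson_r` are hypothetical, the number of translation errors per item stands in for whatever quality measure is adopted, the Pearson coefficient is one possible choice of correlation, and all numbers are invented for illustration rather than taken from TIMSS-1995.

```python
from math import sqrt


def item_p_value(scored_responses):
    """Proportion of students who answered an item correctly.

    scored_responses: sequence of 1 (correct) and 0 (incorrect).
    """
    if not scored_responses:
        raise ValueError("at least one response is required")
    return sum(scored_responses) / len(scored_responses)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Invented illustrative values (NOT TIMSS data): one entry per item.
# Assumption: translation quality is summarized as an error count per item.
translation_errors = [0, 1, 2, 3, 4]
p_values = [0.80, 0.70, 0.65, 0.50, 0.45]

r = pearson_r(translation_errors, p_values)
print(f"correlation between error count and p-value: {r:.2f}")
```

In this invented example the coefficient is negative, meaning that items with more translation errors tend to have lower p-values. Because only item-level data are available, such a coefficient describes an association across items and cannot by itself establish that translation errors caused the lower performance.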