State Student Testing
Materials commissioned by the NAEP Validity Studies Panel and available for downloading in Adobe Acrobat:
Bock, R. D., and Zimowski, M. F. (1998). Feasibility studies of two-stage testing in large-scale educational assessment: Implications for State NAEP.
Chromy, J. R. (1998). The effects of finite sampling corrections on state assessment sample requirements.
Duran, R. P. (2000). Implications of electronic technology for the NAEP assessment.
Hedges, L. V., and Vevea, J. L. (1997). A study of equating in NAEP.
Hedges, L. V., Konstantopoulos, S., and Thoreson, A. (November 2000). Computer Use and Its Relation to Academic Achievement in Mathematics, Reading, and Writing.
Jaeger, R. M. (1998). Reporting the results of the National Assessment of Educational Progress.
Jakwerth, P. M., Stancavage, F. B., and Reed, E. D. (1999). An investigation of why students do not respond to questions.
Linn, B., McLaughlin, D., Jiang, T., and Gallagher, L. (2004). Assigning adaptive NAEP booklets based on state assessment scores: A simulation study of the impact on standard errors.
McLaughlin, D., Gallagher, L., and Stancavage, F. (2004). Evaluation of bias correction methods for "worst-case" selective non-participation in NAEP.
Mosquin, P., and Chromy, J. (2004). Federal sample sizes for confirmation of state tests in the No Child Left Behind Act.
Mullis, I. V. S. (1997). Optimizing state NAEP: Issues and possible improvements.
Pearson, P. D., and Garavaglia, D. R. (1997). Improving the information value of performance items in large scale assessments.
Stancavage, F. B., et al. (October 2002). An Agenda for NAEP Validity Research.
Weston, T. J. (July 2002). The Validity of Oral Accommodation in Testing.
Additional assessment materials:
Acosta, B. (February 2001). Alternative assessment for English language learners. Paper presented at the National Association for Bilingual Education Conference, Phoenix, AZ.
Acosta, B. (April 2001). One right way? What culturally diverse students can teach us about math. Paper presented at the Council for Exceptional Children Annual Convention, Kansas City, MO.
Baldi, S., Skidmore, D., and Ritter, S. (2000). Developing alternatives to traditional standardized assessments: The Philadelphia grade-four “Second Chance” assessments. Washington, DC: American Institutes for Research.
Betts, J. R., and Shkolnik, J. L. (2000). Key difficulties in identifying the effects of ability grouping on student achievement. Economics of Education Review 19 (1): pp. 21–26.
Bohrnstedt, G. W. (1997). U.S. mathematics and science achievement: How are we doing? Teachers College Record, 99 (1), 19–28. Columbia University.
Bohrnstedt, G. W. (1997). Classical measurement theory: Its utility and limitations for attitude research. In D. Krebs and P. Schmidt (Eds.), New Directions in Attitude Measurement. de Gruyter: Berlin.
Clemans, W. V., Lunneborg, C. E., and Raju, N. S. (2004). Professor Paul Horst's legacy: A differential prediction model for effective guidance in course selection. Educational Measurement: Issues and Practice. Copyright 2004 by the National Council on Measurement in Education. Posted by permission of the publisher.
Chittenden, E., and Salinger, T. (2001). Inquiry into meaning: An investigation of learning to read. New York: Teachers College Press. (Revising and updating: Bussis, A. M., Chittenden, E. A., Amarel, M., and Klauser, E., originally published in 1985).
Cole, S., & McLaughlin, D. (March 2001). Study of the feasibility of using state assessment data for Title I reporting: Data collection process report. Palo Alto, CA: American Institutes for Research.
Farr, B. P., and Estrin, E. (1997). Assessment alternatives for diverse classrooms. Norwood, MA: Christopher-Gordon Publishers.
Farr, R., Farr, B., and Jongsma, G. (1992). The need for new views of language arts assessment. In R. Smith and D. Birdyshaw (Eds.), Perspectives on Assessment. Grand Rapids, MI: Michigan Reading Association.
Fast, M. (April 2001). Focus on foreign language learning in the National Assessment of Educational Progress (NAEP). Paper presented at the 54th Kentucky Foreign Language Conference, Lexington, KY.
Ferrara, S., Huynh, H., and Baghi, H. (1997). Contextual characteristics of locally dependent open-ended item clusters in a large-scale performance assessment. Applied Measurement in Education, 10 (2), 123-144.
Ferrara, S., Huynh, H., and Michaels, H. (1999). Contextual explanations of local dependence in item clusters in a large scale hands-on science performance assessment. Journal of Educational Measurement, 36 (2), 119-140.
Ferrara, S., and McTighe, J. (1992). Assessment: A thoughtful process. In A. Costa, J. Bellanca, and R. Fogarty (Eds.), If Minds Matter: A Foreword to the Future (Vol. 2). Palatine, IL: Skylight Publishing; reprinted in K. Burke (Ed.), Authentic assessment: A collection. Palatine, IL: IRI/Skylight Publishing.
Ferrara, S., Willhoft, J., Seburn, C., Slaughter, F., and Stevenson, J. (1991). Local assessments designed to parallel statewide minimum competency tests: Benefits and drawbacks. In R. Stake and R. O’Sullivan (Eds.), Advances in Program Evaluation: Vol. 1b. Effects of Mandated Assessment on Teaching (pp. 41-74). Greenwich, CT: JAI Press.
Fort Fast, E., and Tucker, C. (April 2001). Redesign of the student assessment reporting system in Connecticut. Paper presented at the Annual Meeting of the American Educational Research Association.
Greenberg, E., Macias, R., Rhodes, D., and Chan, T. (2001). English literacy and language minorities in the United States: Results from the National Adult Literacy Survey (NCES 2001-464). U.S. Department of Education, National Center for Education Statistics. Washington, DC: U.S. Government Printing Office.
Hanson, K., Brown, B., Levine, R., and García, T. (2001). Should standard calculators be provided in testing situations? An investigation of performance and preference differences. Applied Measurement in Education, 14, 59-72.
Hubbard, M., & Levine, R. (2000). The impact of the Spanish Immersion Pilot Program at Escondido School: School year 1999-2000. Palo Alto, CA: American Institutes for Research.
Huynh, H., and Ferrara, S. (1994). A comparison of equal percentile and partial credit equatings for performance-based assessments composed of free-response items. Journal of Educational Measurement, 31 (2), 125-141.
Innes, F. K., Anstrom, T., and Woodward, K. (December 2000). Bias, sensitivity, and language simplification review of VNT items in 4th-grade reading and 8th-grade mathematics. Report for the National Assessment Governing Board, American Institutes for Research, Washington, DC.
Innes, F., Klein, S., Best, C., Braswell, J., Ferrara, S., Salinger, T., and Woodward, K. (April 2000). Development and refinement of rubrics for constructed-response items in the Voluntary National Tests during year 2 of the VNT project. Report for the National Assessment Governing Board, American Institutes for Research, Washington, DC.
Innes, F., Mitchell, J., and Farr, B. (June 2000). Language simplification review of the items for the Voluntary National Tests in 4th-grade reading and 8th-grade mathematics. Paper presented at the CCSSO 30th Annual National Conference on Large-Scale Assessment, Snowbird, UT.
Innes, F., Zing, J., Chen, L., Ferrara, S., Garavaglia, D., Johnson, E., and Oppler, S. (April 2000). Results from the 1999 achievement levels study. Report for the National Assessment Governing Board, American Institutes for Research, Washington, DC.
Jakwerth, P. M., Anthony, J. J., and Cole, S. C. (February 2000). Assessing the performance of students with moderate to profound disabilities: What should be measured? (Report to the Office of Special Education Programs.) Palo Alto, CA: American Institutes for Research.
Kenyon, D. M., Farr, B., Mitchell, J., and Armengol, R. (2000). Framework for the 2003 Foreign Language National Assessment of Educational Progress. Washington, DC: Center for Applied Linguistics.
Levine, R., and Huberman, M. (2000). High school exit examination: Cognitive laboratory testing of selected items. Palo Alto, CA: American Institutes for Research.
Levine, R., Huberman, M., Allen, J., and DuBois, P. (1999). The measurement of home background indicators: Cognitive laboratory investigations of the responses of fourth and eighth graders to questionnaire item and parental assessment of the invasiveness of these items. Palo Alto, CA: American Institutes for Research.
Levine, R., Rathbun, A., Selden, R., and Davis, A. (1998). NAEP’s constituents: What do they want? Report of the National Assessment of Educational Progress Constituents Survey and Focus Groups. Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, NCES 98–521.
McLaughlin, D., Bandeira de Mello, V., Cole, S., and Arenson, E. (January 2000). Comparison of National Assessment of Educational Progress (NAEP) and statewide assessment results: Report to state on 1996 and 1998 assessments. Palo Alto, CA: American Institutes for Research.
McTighe, J., and Ferrara, S. (1995). Assessing learning in the classroom. Journal of Quality Learning, 5 (12), 11-28.
McTighe, J., and Ferrara, S. (1998). Assessing learning in the classroom (rev. ed.). Washington, DC: National Education Association.
Michaels, H., and Ferrara, S. (1999). Evolution of educational reform in Maryland: Using data to drive state policy and local reform. In G. J. Cizek (Ed.), Handbook of Educational Policy. San Diego: Academic Press.
Mitchell, J. H. (2000). Assessment and exercise specifications for the Foreign Language National Assessment of Educational Progress. Washington, DC: Center for Applied Linguistics.
Oppler, S. H., Campbell, J. P., Pulakos, E. D., and Borman, W. C. (1992). Three approaches to the investigation of subgroup bias in performance measurement: Review, results, and conclusions. Journal of Applied Psychology, 77, 201-217.
Rodríguez, C. (1996). Our Nation on the fault line: Hispanic American education. President’s Advisory Commission on Educational Excellence for Hispanic Americans, The White House Initiative for Educational Excellence for Hispanic Americans, Washington, DC.
Reed, E., Levine, R., and Huberman, M. (2000). A cognitive laboratory investigation of the performance of learning disabled students on VNT items. Palo Alto, CA: American Institutes for Research.
Rodríguez, C. (1993). The cultural component of ESL/bilingual education and instruction, foundations and assessment of ESL/bilingual populations (two graduate curriculum modules). The University of Phoenix, Arizona.
Sager, C. E., Peterson, N. P., Oppler, S. H., Rosse, R. L., and Walker, C. B. (1997). An examination of five indices of test battery performance: Analysis of the Enhanced Computer-Administered Test (ECAT) battery. Military Psychology, 9, 97-120.
Salinger, T. (2001). Assessing the literacy of young children: The case for multiple forms of evidence. In S. B. Neuman and D. K. Dickinson (Eds.), Handbook of Early Literacy Development. New York: Guilford Publications, Inc.
Salinger, T. (1998). How do we assess young children’s literacy learning? In S. B. Neuman and K. A. Roskos (Eds.), Children Achieving. Newark, DE: International Reading Association.
Salinger, T. (1997). International perspectives on reading assessment: Theory and practice (co-edited with C. Harrison). London: Routledge.
Salinger, T. (1997). Consequential validity of an early literacy portfolio. In C. Harrison and T. Salinger (Eds.), International Perspectives on Reading Assessment: Theory and Practice. London: Routledge.
Salinger, T. (1998). A case study of system change: “Doing” portfolio assessment. In M. Coles and R. Jenkins (Eds.), International Perspectives on Reading Assessment: Classroom Innovations and Challenges. London: Routledge.
Salinger, T. (1995). Preparing professionals to assist students in reaching their full potential in reading. In F. Murray (Ed.), The Knowledge Base for Teacher Educators. San Francisco, CA: Jossey-Bass.
Salinger, T. (1995). Literacy for young children. Columbus, OH: Merrill/Prentice-Hall.
Salinger, T. (1995). The gate keepers: Tests for entry into and exit from teacher education. In S. W. Soled (Ed.), Assessment, Testing, and Evaluation in Teacher Education. Norwood, NJ: Ablex.
Salinger, T. (1992). Classroom-based and portfolio assessment for elementary grades. In C. Hedley, D. Feldman, and P. Antonacci (Eds.), Literacy Across the Curriculum. Norwood, NJ: Ablex.
Salinger, T., and Campbell, J. (1997). National assessment in the United States. In C. Harrison and T. Salinger (Eds.), International Perspectives on Reading Assessment: Theory and Practice. London: Routledge.
Stecher, B. M., McCaffrey, D. F., Burroughs, D., Wiley, E. W., and Bohrnstedt, G. W. (2000). Chapter 5: Achievement. In B. M. Stecher and G. W. Bohrnstedt (Eds.), Class Size Reduction in California: The 1998–99 Evaluation Findings. Sacramento, CA: California Department of Education.
Wasik, B., Dobbins, D., and Herrmann, S. (2001). Intergenerational family literacy: Concepts, research, and practice. In S. B. Neuman and D. K. Dickinson (Eds.), Handbook of Early Literacy Development. New York: The Guilford Press.
Yen, W. M., and Ferrara, S. (1997). The Maryland School Performance Assessment Program: Performance assessments with psychometric quality suitable for high-stakes usage. Educational and Psychological Measurement, 57 (1), 60-84.
Yoon, B., and Young, M. J. (April 2000). Estimating the reliability of test scores with mixed item formats: Internal consistency and generalizability. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
