American Institutes for Research

NCES Data R Project - EdSurvey

Project

EdSurvey is an R statistical package designed for the analysis of national and international education data from the National Center for Education Statistics (NCES). The current release, EdSurvey Version 2.6, supports the following data sources:

  • National Assessment of Educational Progress (NAEP)
  • Trends in International Mathematics and Science Study (TIMSS) and TIMSS Advanced
  • Progress in International Reading Literacy Study (PIRLS) and ePIRLS
  • International Computer and Information Literacy Study (ICILS)
  • International Civic and Citizenship Education Study (ICCS)
  • Civic Education Study (CivEd)
  • Programme for International Student Assessment (PISA)
  • Programme for the International Assessment of Adult Competencies (PIAAC)
  • Teaching and Learning International Survey (TALIS)
  • Early Childhood Longitudinal Study (ECLS)
  • Education Longitudinal Study of 2002 (ELS)
  • High School Longitudinal Study of 2009 (HSLS)
  • Beginning Teacher Longitudinal Study (BTLS)

EdSurvey is developed by AIR and commissioned by NCES. It is tailored to processing and analyzing NCES large-scale education data efficiently, with procedures that account for the data's complex sample survey design and the use of plausible values.

Here is how to get started with EdSurvey:

  • Installing and Loading EdSurvey
  • Key Functions
  • Technical Papers
  • Contact and Bug Reports

Installing and Loading EdSurvey

Unless you already have R version 3.5.0 or later, install the latest version of R. You may also want to install RStudio Desktop, which has an interface that many users find easier to work with. RStudio is available online from rstudio.

Inside R, run the following command to install EdSurvey as well as its package dependencies:

install.packages("EdSurvey")

Once the package is successfully installed, EdSurvey can be loaded with the following command:

library(EdSurvey)
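The two steps above can be combined into a small guard for the top of a script. This is a minimal sketch, not part of EdSurvey: the helper names `r_is_recent_enough` and `load_edsurvey` are hypothetical, and the minimum version is the one stated in the text above.

```r
# Minimum R version stated in the installation notes above.
min_r_version <- "3.5.0"

# TRUE when the running R session meets the minimum version.
# (Hypothetical helper, not an EdSurvey function.)
r_is_recent_enough <- function(min_version = min_r_version) {
  getRversion() >= min_version
}

# Install EdSurvey only when it is missing, then attach it.
# (Hypothetical helper, not an EdSurvey function.)
load_edsurvey <- function() {
  if (!r_is_recent_enough()) {
    stop("EdSurvey needs R ", min_r_version, " or later; found ",
         as.character(getRversion()))
  }
  if (!requireNamespace("EdSurvey", quietly = TRUE)) {
    install.packages("EdSurvey")
  }
  library(EdSurvey)
}
```

Calling `load_edsurvey()` once at the start of an analysis script installs the package on first use and simply attaches it on later runs.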


Key Functions

The key functions of EdSurvey Version 2.6 include:

  • data processing, including downloading publicly available data and reading data in R;
  • data manipulation, such as the subsetting and merging of data, as well as renaming and recoding variables;
  • data exploration, including methods to better understand survey attributes and search for variables and levels in codebooks;
  • summary statistics, including unweighted and weighted totals, conditional means, the percentage of respondents in a category (conditional on an ancillary categorical variable or on the interactions of an arbitrary number of categorical variables), and estimation of scale score means based on plausible values;
  • percentiles of a numeric variable or of plausible values;
  • analysis of achievement levels and benchmarks for NAEP and international assessment data;
  • correlations, including Pearson, Spearman, polyserial, polychoric, and correlation between plausible values, with or without weights applied;
  • linear regression with or without plausible values as the dependent variable;
  • logistic regression that allows either a discrete variable or dichotomized plausible values as the dependent variable;
  • gap analysis that compares the average, percentile, achievement level, or percentage of survey responses between two groups that potentially share members (in EdSurvey 2.6, the gap function also accounts for linking error between NAEP paper-based and digitally based assessments);
  • multilevel models using weights at multiple levels and allowing plausible values in the dependent variable;
  • multivariate regression that extends multiple linear regression to include models with multiple outcome variables; and
  • quantile regression that uses weights and variance estimates appropriate for the data.
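Several of the functions listed above can be tried on sample data. The following is a rough sketch rather than an official walkthrough: it assumes the separate NAEPprimer package (which supplies the NAEP Primer sample file) is installed alongside EdSurvey, and the variable names `composite`, `dsex`, and `b017451` are taken from that sample file. The guard at the top skips everything when either package is absent.

```r
# Run only when both packages are installed
# (assumption: NAEPprimer provides the sample data file).
if (requireNamespace("EdSurvey", quietly = TRUE) &&
    requireNamespace("NAEPprimer", quietly = TRUE)) {
  library(EdSurvey)

  # Data processing: read the NAEP Primer file into an edsurvey.data.frame.
  sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat",
                              package = "NAEPprimer"))

  # Data exploration: search the codebook and list survey attributes.
  print(searchSDF("book", sdf))
  showPlausibleValues(sdf)
  showWeights(sdf)

  # Summary statistics: weighted percentages and mean scale score by sex.
  tab <- edsurveyTable(composite ~ dsex, data = sdf)
  print(tab)

  # Percentiles of the composite scale, estimated from plausible values.
  print(percentile(variable = "composite",
                   percentiles = c(10, 50, 90), data = sdf))

  # Linear regression with plausible values as the dependent variable.
  fit <- lm.sdf(composite ~ dsex + b017451, data = sdf)
  print(summary(fit))

  # Gap analysis: difference in mean composite score between two groups.
  print(gap(variable = "composite", data = sdf,
            groupA = dsex == "Male", groupB = dsex == "Female"))
}
```

Each call uses the default weights and variance estimation recorded for the survey, so no weight variable needs to be named explicitly in this sketch.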

As development of EdSurvey progresses, additional functions, such as direct estimation, item response theory (IRT), and other statistical methods, will be added to the package.


Technical Papers

Book and Journal Publication

Bailey, P., Lee, M., Nguyen, T., & Zhang, T. (2020). Using EdSurvey to Analyse PIAAC Data. In Large-Scale Cognitive Assessment (pp. 209-237). Springer, Cham.

Data Set Specific Overviews

Documents that describe the analysis of specific survey data in the EdSurvey package include the following:

  • Using EdSurvey to Analyze ECLS-K:2011 Data (PDF) describes methods for analyzing NCES longitudinal data, using ECLS-K:2011 data as examples. The vignette covers topics including preparing the R environment, downloading and processing the data, exploring and manipulating data, and running statistical analyses such as summary tables, correlations, and regression models.
  • Using EdSurvey to Analyze NCES Data: An Illustration of Analyzing NAEP Primer (PDF) describes the basics of using the EdSurvey package for analysis of NAEP data. This vignette covers an introduction to the EdSurvey package with topics such as preparing the R environment for processing, creating summary tables, calculating percentiles and achievement levels, running correlations, linear regression and logistic regression, and conducting gap analysis.
  • Using EdSurvey to Analyze TIMSS Data (PDF) describes the methods used in analysis of large-scale educational assessment programs such as Trends in International Mathematics and Science Study (TIMSS) using the EdSurvey package. The vignette covers topics such as preparing the R environment for processing, creating summary tables, running linear regression models, and correlating variables.
  • Using EdSurvey to Analyze NAEP Data With and Without Accommodations (PDF) provides an overview of the use of NAEP data with accommodations and describes methods used to analyze these data.

Task Specific Walkthroughs

Documents providing an overview of functions developed in the EdSurvey package include the following:

  • Installing the EdSurvey Package on a Restricted-Use Data Computer (PDF) provides guidance for how to install EdSurvey on a restricted-use data (RUD) computer without an Internet connection.
  • Converting Text Data File(s) With Companion SPSS Script to SPSS Data File Format (PDF) details the process of converting a data file and SPSS script to an SPSS Data File for use with EdSurvey.
  • Using the getData Function in EdSurvey (PDF) describes the use of the EdSurvey package when extensive data manipulation is required before analysis.
  • Using EdSurvey for Trend Analysis (PDF) describes the methods used in the EdSurvey package to conduct analyses of statistics that change over time in large-scale educational studies.
  • Exploratory Data Analysis on NCES Data (PDF) provides examples of conducting exploratory data analysis on NAEP data.
  • Calculating Adjusted p-Values From EdSurvey Results (PDF) describes the basics of adjusting p-values to account for multiple comparisons.
  • Producing LaTeX Tables From edsurveyTable Results With edsurveyTable2pdf (PDF) details the creation of PDF summary tables from summary results using the edsurveyTable2pdf function.
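The idea behind the adjusted p-values walkthrough listed above can be illustrated with base R alone. This is a minimal sketch using the `p.adjust` function from the stats package; the p-values are hypothetical, and whether the walkthrough uses these particular adjustment methods is an assumption.

```r
# Hypothetical p-values from three pairwise comparisons (illustrative only).
p_raw <- c(0.01, 0.02, 0.03)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

# Show the raw and adjusted values side by side.
print(data.frame(raw = p_raw, bonferroni = p_bonf, BH = p_bh))
```

Bonferroni is the more conservative of the two, so it is the safer default when only a handful of comparisons are made.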

Methodology Resources

Documents that describe the statistical methodology used in the EdSurvey package include the following:

  • Statistical Methods (PDF) details estimation procedures of the statistics in the lm.sdf, achievementLevels, and edsurveyTable functions.
  • Analyses Using Achievement Levels Based on Plausible Values describes the methodology for NAEP achievement level estimation.
  • Gap Analysis (PDF) covers the methods comparing the difference between two statistics for two groups that potentially share members.
  • Estimating Percentiles (PDF) describes the methods used to estimate percentiles.
  • Estimating Mixed-Effects Models (PDF) describes the methods used to estimate mixed-effects models with plausible values and survey weights, and how to fit different types of mixed-effects models using the EdSurvey package.
  • Multivariate Regression (PDF) details the estimation of multivariate regression models using mvrlm.sdf.
  • Running Wald Tests (PDF) details the use of the Wald test to examine the joint significance of regression coefficients using lm.sdf and glm.sdf.
  • Weighted and Unweighted Correlation Methods for Large-Scale Educational Assessment: wCorr Formulas introduces the methodology used by the wCorr R package for computing the Pearson, Spearman, polyserial, polychoric, and tetrachoric correlations, with and without weights applied. Simulation evidence is presented to show correctness of the methods, including an examination of the bias and consistency.
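As a small illustration of what "with weights applied" means for the Pearson correlation mentioned above, the sketch below computes the standard weighted Pearson formula in base R and checks it against `cov.wt` from the stats package. The function name `weighted_pearson` and the data are hypothetical, and this is not the wCorr implementation; whether wCorr matches it in every detail is an assumption.

```r
# Standard weighted Pearson correlation (hypothetical helper, base R only):
# weighted covariance divided by the product of weighted standard deviations.
weighted_pearson <- function(x, y, w) {
  stopifnot(length(x) == length(y), length(x) == length(w), all(w >= 0))
  mx <- sum(w * x) / sum(w)  # weighted mean of x
  my <- sum(w * y) / sum(w)  # weighted mean of y
  num <- sum(w * (x - mx) * (y - my))
  den <- sqrt(sum(w * (x - mx)^2) * sum(w * (y - my)^2))
  num / den
}

# Illustrative data and weights (not from any NCES survey).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)
w <- c(1, 1, 2, 2, 4)

r_w <- weighted_pearson(x, y, w)

# Cross-check against the weighted correlation from stats::cov.wt.
r_check <- cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]
print(c(weighted = r_w, cov_wt = r_check, unweighted = cor(x, y)))
```

With equal weights the function reduces to the ordinary Pearson correlation returned by `cor(x, y)`.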


Contact and Bug Reports

Send us your questions and comments via e-mail to: EdSurvey.help@air.org

Contact

Ting Zhang

Senior Researcher
Paul Bailey

Senior Economist

Topics

Education
NAEP

EdSurvey Team

NCES Project Officer
Emmanuel Sikali
 

Team Leaders
Ting Zhang
Paul Bailey
 

Current Team Members
Charles Blankenship
Michael Cohen
Thomas Fink
Huade Huo
Michael Lee
Sun-Joo Lee
Yuqi Liao
Qingshu Xie

Related Resources

Analyses Using Achievement Levels Based on Plausible Values
Weighted and Unweighted Correlation Methods for Large-Scale Educational Assessment: wCorr Formulas
NAEP Data in Focus: Examining the Research

American Institutes for Research

1400 Crystal Drive, 10th Floor
Arlington, VA 22202-3289
Call: (202) 403-5000
Fax: (202) 403-5000

Copyright © 2021 American Institutes for Research®.  All rights reserved.
