Weighted and Unweighted Correlation Methods for Large-Scale Educational Assessment: wCorr Formulas

Ahmad Emad
Qingshu Xie, MacroSys
Emmanuel Sikali, National Center for Education Statistics

Correlation analysis is widely used by researchers and analysts working with large-scale assessment data. However, limited research has provided reliable methods for estimating the various correlations and their standard errors while taking the complex sampling design and multiple plausible values into account. This report introduces the methodology used by the wCorr R package (Emad & Bailey, 2017) for computing the Pearson, Spearman, polyserial, and polychoric correlations, with and without weights applied. The methodology treats the tetrachoric correlation as a special case of the polychoric correlation and the biserial correlation as a special case of the polyserial correlation.

Simulation evidence is presented to demonstrate the correctness of the methods, including an examination of bias and consistency. Overall, the simulations show first-order convergence for each unweighted correlation coefficient, with an approximately linear computation cost. Further, under our simulation assumptions, the weighted correlation performs better than the unweighted correlation for all correlation coefficients.

We show the first-order convergence of the weighted Pearson, polyserial, and polychoric correlation coefficients. Under the assumptions of our simulation, the Spearman correlation coefficient is shown not to consistently estimate the population Pearson correlation coefficient, but it is shown to consistently estimate the population Spearman correlation coefficient.
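
To make concrete what "with and without weights applied" means for the simplest of the four coefficients, the following is a minimal sketch of a weighted Pearson correlation in which each deviation from the weighted mean is scaled by its observation weight. It is written in Python for illustration only and is not the wCorr implementation; the function name `weighted_pearson` and its arguments are assumptions introduced here. With equal weights it reduces to the ordinary Pearson correlation.

```python
import math


def weighted_pearson(x, y, w=None):
    """Weighted Pearson correlation of x and y (illustrative sketch).

    w holds one non-negative weight per observation; w=None means
    equal weights, which gives the usual unweighted coefficient.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if w is None:
        w = [1.0] * n
    if len(w) != n:
        raise ValueError("w must have one weight per observation")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means of x and y.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted sums of cross-products and squared deviations.
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: three observations, the first carrying twice the weight.
print(weighted_pearson([1, 2, 4], [1, 3, 2], w=[2, 1, 1]))
```

In this sketch an integer weight behaves like replicating the observation that many times, which is one simple way to sanity-check a weighted estimator against its unweighted counterpart.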