Applying Social Science in the Real World
Informing practice with the best available research and making research more relevant to practice are easier said than done—whether in health care, education, or adult learning. Making a measurable difference in people’s lives is harder still.
The following short reflections on these challenges point to how we might make headway applying what is learned from research studies to the “real worlds” of practice and policy.
The researchers who contributed here work in different fields and research traditions, but all hope to prime conversation and collaboration with policymakers and practitioners, strengthening both research and practice.
Both David Osher and Terry Salinger see the “messiness” of real-world settings, compared to the controlled conditions of many research studies, as a high but surmountable hurdle in generalizing, adopting, or scaling up evidence-based practices.
Commercial applications pose particular problems related to educating both developers and consumers, as George Rebok points out in his think piece on adopting cognitive training programs for older adults. And so do efforts to inform government policies, as George Bohrnstedt contends—witness researchers’ frustration when decades of research on achievement gaps between Black and White students go largely unheeded.
Addressing these challenges requires keener understanding of the real world that researchers hope to help improve. One way is to walk in the shoes of your research’s intended beneficiaries. In her personal account of navigating the healthcare system as both a health and aging researcher and a caregiver, Marilyn Moon sees the issues from a new “bottom-up” vantage point. This perspective also underlies Bea Birman’s call for knowing how individuals and organizations get information and learn new approaches.
To better grasp implementation challenges, researchers may need to interact more with the practitioners and policymakers who might use research to inform their work. Steven Garfinkel describes a new way for researchers to work in real-world settings through rapid-cycle evaluation, which allows both researchers and practitioners to better understand how an innovation is being implemented and provides practitioners with a steady flow of information so they can keep improving practice in response. Such new ways of working challenge some of researchers’ well-honed traditional skills along with the underpinnings of some traditional research paradigms.
Research can help solve problems of practice, and practitioners can help make research relevant. The key, these commentaries suggest, is balancing the needs of practitioners and policymakers with the requirements of research rigor, often through shared work.
Significant Differences!! What Do They Really Mean?
In simple terms, education evaluations typically compare classrooms or schools that receive specific programs (the treatment) with schools that receive business-as-usual services. In a recent study of a program designed to improve teacher practice and student reading, for example, all kindergarten to Grade 2 teachers in participating schools received the same business-as-usual services while teachers in schools randomly assigned to the treatment received extra resources, summer professional development institutes, and instructional coaching throughout the school year.
An equally simple description of our job as evaluators is that we must collect the data needed to investigate whether extra resources and services seemed to have a positive impact on teachers or students or both. To continue with the early-grades reading example, after the second and the third (and final) years of implementation, statistically significant differences between treatment and comparison schools emerged in teacher practice, overall reading achievement in kindergarten and second grade, and other variables. The positive findings, solid because we had validated both the method and the data, affirmed the program’s promise for improving teaching and learning.
Rarely do evaluations of interventions in early reading find such significant differences between treatment and comparison conditions. In a 2003 meta-analysis, only nine studies out of over 1,300 met standards for high-quality, rigorous research. And this shortage of well-designed studies makes it difficult to generalize about how strong an impact professional development really can have on teacher and student outcomes.
But here we were with a rigorous study and statistically significant results—and a pressing need to understand what the results did and did not mean. In a nutshell, the evaluation found positive impacts for the program in specific schools in specific districts, but the findings did not guarantee that positive impacts would be found in other districts or even in other schools in the same study districts. The study met standards for rigor (including matched schools, large sample size, consistent data collection over three years of implementation, and equal attrition rates under the two conditions). But standing by the results is one thing; overgeneralizing from them is another. So we cautioned the program’s developer that the findings were indeed a big deal but still needed to be viewed realistically.
Why? For starters, even the best-designed study can’t stop the inevitable and often unexpected fluctuations in research settings like schools and districts, especially in urban areas. There’s an inherent messiness in schools and districts. Teachers, students and administrators move around frequently, and curriculum changes often, too. Amid such instability, even positive findings like ours may not justify districts’ adoption of the new approach.
More messiness: schools and districts grappling with poor student performance on state reading tests or other accountability measures often search for whatever is marketed as new or special or guaranteed to improve student achievement. They put their trust in the “next big thing” instead of in the slow and steady process of building professional knowledge and teachers’ instructional capacity.
Then, too, the business-as-usual professional development and training or overall instructional procedures in study districts may be intrinsically strong, raising the possibility that all teachers are getting the support needed to improve their skills. As other scholars have pointed out, the nature and quality of instruction in comparison classes and the training provided to those teachers need to be measured carefully if researchers are to understand the real impact of positive program results.
All these factors can cloud the story that evaluation data tell about treatment and comparison schools, making it difficult to determine the extent to which the program being evaluated has produced real change. Evaluators like to assume that the messiness will be equally distributed across treatment and comparison schools, but experimental studies rarely collect the data to prove or disprove this assumption.
So while studies may find real, significant differences, there’s no guarantee that the program evaluated would have the same impact in other settings, even those nearby. This is the impact evaluator’s dilemma.
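The assumption that messiness is spread evenly across conditions can be checked when the data exist. The sketch below is a minimal illustration, not the method used in the study described here: it computes a standardized mean difference for one indicator of instability across treatment and comparison schools. The turnover figures, the variable names, and the 0.25 cutoff are all hypothetical choices made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treatment, comparison):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(comparison)
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(comparison) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(comparison)) / sqrt(pooled_variance)


# Hypothetical annual teacher-turnover rates, one value per school.
treatment_turnover = [0.10, 0.20, 0.30]
comparison_turnover = [0.20, 0.30, 0.40]

smd = standardized_mean_difference(treatment_turnover, comparison_turnover)

# Illustrative cutoff (an assumption): treat the groups as similar on this
# indicator only if they differ by less than a quarter of a standard deviation.
BALANCE_THRESHOLD = 0.25
is_balanced = abs(smd) < BALANCE_THRESHOLD

print(f"standardized mean difference: {smd:.2f}; balanced: {is_balanced}")
```

A check like this says only whether the two groups look alike on the indicators that were actually measured; it cannot speak to sources of messiness that no one recorded.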
Implementing Evidence-Based Interventions in Real-World Settings
So why don’t many practitioners implement evidence-based programs and practices? And, when practitioners do practice what is preached, why don’t they strictly follow the recipe? And, when they implement the research with fidelity, why don’t they get the results that efficacy studies say are possible?
These questions point to three research-to-practice challenges. Addressing them now is particularly important as the “rotten social outcomes” identified by Lisbeth Schorr and Paul Steele and the “wicked policy problems” that relate to them get increasing attention.
The first two challenges have been referred to as the “research-to-practice gap” and the third as the gap between efficacy research (which is implemented under relatively ideal conditions) and effectiveness research (implemented under more normal conditions). Addressing these challenges requires (in the words of Peter Jensen, Kimberly Hoagwood, and Edison Trickett) moving research from “ivory towers,” where graduate and postdoctoral students deliver interventions to well-selected samples, to “‘earthen trenches’ where children are more complex and resources exigent, to examine what is palatable, feasible, durable, affordable, and sustainable in real-world settings.”
Earthen trenches are messy and complex, contextually rich and interdependent, where in-the-moment (“hot action”) decisions are often required and practitioners must grapple with multiple and competing demands for their time, attention, energy and cognitive reserve. Teachers, for example, work in what Michael Huberman referred to as “busy kitchens” while other practitioners (to borrow Donald Schoen’s metaphor) confront tough and complex decisions in the “swampy lowlands of practice where situations are confusing 'messes' incapable of technical solution.” Think here about how the diverse academic, social, emotional and behavioral needs of every student in a classroom can change from day to day or even hour to hour.
Think, too, about having to decide whether a child has been abused or neglected by family members or whether a youth accused of delinquent behavior should be diverted from the juvenile justice system. Change, rarely easy, is especially hard in highly stressed settings, particularly without ample resources and support for learning, reflecting, collaborating and mastering new approaches and technologies.
Paradoxically, successful implementation of evidence-based strategies and programs may depend on moving from a developer/research-centric perspective to one focused on setting. Research-based interventions are not just matters of adhering to blueprints and implementing plans faithfully. Rather, their “ecology” includes other programs and competing demands on practitioner and consumer time and attention. These so-called “setting effects” can either amplify or diminish intervention effects. In short, research, evaluation and technical assistance should account for how multiple evidence-based and non-evidence-based practices affect particular outcomes.
All this said, it is possible to implement evidence-based practices and programs successfully in earthen trenches. I have seen it happen as a researcher, evaluator and technical assistance provider. But doing so takes time, and success also depends on organizational readiness, the support practitioners changing practices receive, and the ability of those promoting scale-up to adapt evidence-based programs to individual contexts while maintaining the program’s core ingredients.
Applying Research to Practice on a Personal Level
A major challenge of being a health and aging researcher arises when facing those issues personally. It’s humbling to try to reconcile theory and research with practice. But understanding how issues and policies play out in real life can help. As health care becomes more complicated and fragmented, consumers are increasingly responsible for making good choices and even managing what happens at various stages of treatment. Consequently, researchers have worked hard both to measure quality and good practice and to develop materials that consumers can use in decision-making. All that said, practical advice during times of need is hard to come by. Most of us are “just-in-time” information users, seeking advice while in the throes of our complex and fragmented health care system. More needs to be done to empower consumers so the tools that have been developed get used.
The fragmented system we have is difficult to navigate. My firsthand experience with helping my spouse get care following a stroke is pretty typical. While there is a fairly common path to getting care, it wends into different settings managed by different organizations, with almost no coordination or even shared knowledge. Even when the same overarching institution is presumably involved, each handoff occurs with uncertainty and with little sense of how one set of services helps or informs the next. Even knowing the formal rules surrounding health care policy, as I do, helps little since the practice can look quite different from what is implied in the regulations governing Medicare, for example.
For a stroke victim and other patients requiring hospitalization and considerable follow-up care, the usual progression is inpatient hospital, inpatient rehabilitation hospital, home health care, and then outpatient therapy. Technically, discharge planning is offered or required at various stages, but it can amount to as little as handing the family a list of eligible providers, with no supporting information or documentation. Research tells us that we don’t want health care providers steering people to their own agencies or best friends, so we need a better way of providing decision-making information than a midnight-to-1 a.m. search activity by an exhausted caregiver (my experience).
Care providers should be knowledgeable about the quality and ratings information available and share copies of such materials with those moving on to the next site of care. Currently, this is one missing link in health care decision-making. Busy professionals in one setting have little knowledge of how the other settings operate and so can offer little guidance. Materials developed won’t be used if they don’t make sense to both patients and care providers.
AIR research done several years ago found that health care professionals and consumers often talk past each other: They are looking for different things and often express very different reasons for ignoring quality information, for example. Getting them on the same page can be challenging.
Other AIR research has also found that many people use proxy information as a shorthand for quality—such as equating higher prices with higher quality care. But many studies have shown that lower cost providers may provide equivalent or higher quality care.
Timing also complicates information-seeking. In my husband’s case, each time there was to be a handoff to another setting I would be reassured that I had several days to make arrangements—but would actually be forced to make a decision on the spot. Quality information that is supposed to help with these decisions is difficult to access and understand when under the stress of both a deadline and the general worry over being the caregiver for someone who is very ill. For that matter, other information, such as on the availability of services, often does not exist.
Some researchers have suggested adding a care coordination specialist to the mix. That might help, but only if that person follows the patient and isn’t housed in a single caregiving setting. And even then, who would coordinate and oversee the coordinators? How would they be accessed—or compensated? At the moment, such activities are largely cottage industries. And services are available only to those who can afford to pay out of pocket.
One answer might be to have a single organization provide all necessary care at each stage of the process. Integrated health care systems promise that they will manage the handoffs and see that the care is seamless. In practice, though, it does not always work that way. In one short-turnaround handoff, it seemed the best approach would be to work with the home health agency affiliated with the rehabilitation facility. But, absent coordination and any advantage of using related entities, I had to “fire” the home health agency. After a brief orientation, it was our responsibility to call all the individual aides to set up appointments; the nurse finally called back after two weeks (a week after I had informed the agency that we were going elsewhere) and said she was ready to meet with us. A homebound, very ill patient in a “coordinated” situation was not going to go untreated for over two weeks! Fee-for-service gave us the option of finding another provider. In a managed care environment, we would have to use the designated agency. Again, research indicates that, overall, quality is fairly equivalent for Medicare Advantage (coordinated) plans and traditional Medicare. But it is hard to find information on the various practical dimensions of receiving care when choosing among health plan options.
Our second experience with home health was more successful—but only because I used personal connections. None of the information on quality or availability indicated anything about actual access to services and timeliness of care. Someone without a network of professional friends would have been hard-pressed to figure out what to do. Moreover, research on this topic needs to recognize the subtle differences between acute care needs and supportive services when both are needed but, in our system, do not come from the same providers.
Every new twist and turn in the caregiving process further convinces me that there must be a better way. And now I know firsthand that it’s nearly as confusing to look at the problems facing the U.S. health care system from the standpoint of a participant as from the standpoint of a researcher. (When I find the time and insight to combine my practical experience and research knowledge, I expect to have more lessons learned to share.) Research needs to help inform consumers but can do so only if researchers choose to study the key questions that matter to patients. So far, consumers must learn the hard way that there are no easy paths for navigating our current healthcare system.
What Is Fidelity in Evaluation Research Anyway?
Evaluation design in the social sciences is a puzzle—literally. As government’s role in everyday life expanded during the 20th century, the demand for accountability, and with it evaluation, grew too. Investigators proposed designs, identified flaws, puzzled out solutions, and so on. My favorite puzzle guide is Campbell and Stanley’s Experimental and Quasi-Experimental Designs for Research. In it, the authors concisely synthesize 13 classic threats to the validity of inferences made from evaluation research and 16 evaluation designs that address those threats.
Campbell and Stanley popularized the use of “X” to indicate the intervention being evaluated and “O” to indicate observations or measurements of the intervention’s effects. Rereading this work recently, I was struck by what a great choice X was. Undoubtedly, it was chosen to represent any intervention that readers might consider. But X also conveys, perhaps unintentionally, the notion of the intervention as a black box. Interventions in social interaction—teaching, providing health care—are hard to implement precisely, and implementers can take various approaches. Without addressing implementation fidelity explicitly, Campbell and Stanley do recognize it in their discussion of threats to validity. However, they implicitly treat X as a single, coherent intervention common to all participating organizations and persons and treat outside events (history) and internal growth (maturation) as alternative explanations that compete with the uniform X.
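As a point of reference, the X and O notation can be written out for one familiar design, the pretest-posttest control group design, where R marks random assignment. The second display is only an illustrative extension and is not notation taken from Campbell and Stanley: the site subscript s on X is a hypothetical device for showing an intervention that differs across participating sites rather than a single uniform X.

```latex
% Pretest-posttest control group design in X/O notation
\[
\begin{array}{cccc}
R & O_1 & X & O_2 \\
R & O_3 &   & O_4
\end{array}
\]

% Illustrative extension (an assumption): X_s differs by site s
\[
\begin{array}{cccc}
R & O_{1s} & X_s & O_{2s} \\
R & O_{3s} &     & O_{4s}
\end{array}
\qquad s = 1, \dots, S
\]
```

Written this way, the fidelity question becomes how far each site's X_s departs from the intended intervention model.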
Since this influential text was written, fidelity has become a challenging concept, particularly in evaluating health care insurance and delivery system interventions. In 2010, the Affordable Care Act (ACA) accorded unprecedented importance and funding to the design, implementation, and evaluation of innovations that would improve quality and safety, control costs, and optimize patient outcomes in Medicare, Medicaid, and the Children’s Health Insurance Program (CHIP). Congress created the Center for Medicare & Medicaid Innovation (CMMI) at the Centers for Medicare & Medicaid Services (CMS) to carry out this work. Expanding funding and authority to act on the results of the kinds of rigorous evaluations that CMS had long carried out raised the stakes for all Medicare, Medicaid, and CHIP evaluations. Under the ACA, the Secretary of Health and Human Services can expand an innovation demonstration program widely without congressional authorization if the CMS Chief Actuary construes evaluation results and actuarial analysis to mean that certain cost and quality criteria are met.
To achieve cost and quality goals like the ACA’s, the Institute for Healthcare Improvement has, since the early 1990s, promoted the identification and diffusion of best practices through continuous quality improvement. Since 2010, organizational learning and diffusion of best practices have become essential elements of CMMI’s vision, mission and operations. This drive came at about the same time that Congress raised the stakes for evaluations.
With the importance of rigorous evaluation greater than ever and innovation evolving during the demonstrations that are being evaluated, the notion of fidelity, so long central to rigor in evaluation, has been challenged. On the one hand, why should the intervention remain static when we already know how to improve its implementation? Defending scientific rigor, Campbell and Stanley might think of these improvements as threats to validity from history or maturation that should be minimized through experimental design and statistical control. But, by definition, organizational learning and diffusion within the demonstration change X intentionally while it is being evaluated.
CMMI itself embraces rapid-cycle evaluation (RCE) as the answer to this conundrum. If you are continually changing the intervention, then you must also measure outcomes as you go along to see if those changes are harmful or helpful. This means both feeding back results to the demonstration organizations periodically for rapid-cycle improvement and drawing evaluation conclusions from them. Obviously, RCE can identify only short-term effects, but more traditional summative evaluation at the demonstration’s end can capture longer-term effects using the kind of rigorous evaluation designs described by Campbell and Stanley.
All this said, do rapid-cycle improvement (RCI), intentional organizational learning, and their challenge to traditional notions of implementation fidelity threaten or enhance the chances of getting accurate results from the overall rigorous evaluation? With or without RCI and RCE, adherence across demonstration sites to a well-specified intervention model (fidelity) is challenging when the pace and direction of history and maturation vary.
At first blush, fidelity seems degraded when the intervention is altered intentionally while it is being evaluated. But the changes made by communities of practice and rapid feedback of standard performance measures might also move diverse participants toward consistency in implementation and, thus, greater fidelity, at least by the end of the demonstration.
We don’t yet fully understand these trade-offs’ impact on our ability to draw actionable conclusions from demonstration evaluations. Still, it is clear that carefully measuring the shifts and changes introduced by active organizational learning activities throughout a demonstration and considering them as explicit variables in the summative evaluation should help define fidelity for a new research age.
Training the Aging Brain: Fact or Fiction?
There has been ongoing debate for a decade now over whether cognitive stimulation—through such everyday activities as completing crossword puzzles, learning to play a musical instrument, and participating in a book club or through more formal cognitive training interventions—can help maintain or even enhance cognitive functioning as people age.
An equally important question is whether the results of cognitive stimulation and training will transfer to both laboratory and real-life tasks. For example, will training people on a laboratory memory task help them better recall the names and faces of people they meet in their everyday lives? Or does improving processing speed on a simulated driving task improve people’s actual driving ability and on-road safety?
Fortunately, a growing number of randomized controlled trials on the effects of cognitive training programs, including adaptive computer training, are assessing the immediate and long-term benefits for cognitive performance and whether such training will “generalize” to abilities and skills besides those targeted by training.
The Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE) clinical trial, the largest test of whether cognitive training can improve the cognitive and processing-speed abilities of healthy older adults, so far shows promising results for cognitive stimulation and cognitive training. It demonstrates that older adults can improve their cognitive abilities, though not as fast as younger adults can, and the improvements last for several months or even years (up to 10 years in the case of the ACTIVE trial).
The evidence for whether training transfers is more mixed. Relatively few studies show transfer to non-trained tasks, including those involving everyday skills. However, in the ACTIVE trial, trained participants self-reported fewer daily living problems, and those getting processing speed training were less likely to cease driving or have at-fault automobile crashes.
Despite positive results, there is often a disconnect between laboratory research findings on cognitive stimulation and cognitive training and their use in commercial “brain training” products designed to stave off mental decline and forgetfulness. Brain training products have become a billion-dollar industry worldwide; revenues are projected to surpass $6 billion by 2020. However, promises of real-life benefits from cognitive training products are often unwarranted, and some products aren’t based on current research evidence.
What works in the laboratory may not work in the real world, so claims about the efficacy of these commercial programs may be premature. For example, no study has shown that brain training programs cure or prevent Alzheimer’s disease, despite claims to the contrary by some commercial vendors.
So one key question is why research is not used more in the development of brain training programs for older adults. Although there is steadily growing scientific evidence for the benefits of cognitive training, many program developers are not trained scientists and often cite research findings that are only tangentially related to their scientific claims about a product.
Developers may also be reluctant to use research findings because the results of many training studies are modest or fleeting, hardly the stuff of strong advertising claims. Too often, in their haste to sell brain-improvement products and games, developers rely on one or two studies to back their claims of effectiveness rather than drawing on an accumulated body of research (which may not exist for a particular program, might take time to collect, or might pose product validity questions that product developers can’t answer). Although pharmaceutical claims are subject to regulatory review, so far brain fitness programs aren’t, so some developers cherry-pick results and make unsubstantiated advertising claims.
Further complicating the issue are important questions about implementing and disseminating cognitive training programs for older adults in community settings. Many such programs are computer-based—inaccessible to those who lack adequate computer or literacy skills, don’t know such programs exist, or find them hard to use for other reasons.
Researchers and developers need to pay more attention to making cognitive training programs accessible and affordable for the increasingly diverse population of older persons, especially those who are most in need. Guidelines for designing training and instructional programs for older learners are available and could inform this translational effort.
Other important unanswered questions are how early cognitive training should begin, how much a person should train, and how long the effects of training can be expected to last. Until we know how to answer these questions, potential consumers should ask questions and require scientific evidence that a cognitive training program works. Which questions? For starters, are there scientists (ideally neuropsychologists) and a scientific advisory board behind the program? Have these advisers published peer-reviewed scientific papers? How many? What benefits are being claimed for using this program? And, does the program fit my personal goals? (For more questions, see the SharpBrains checklist.)
Using Research to Improve Practice: Which Research Makes a Difference?
Efforts by policymakers and program administrators to identify “what works” in education are legion. During the 1970s, the Joint Dissemination Review Panel evaluated the impacts of educational interventions so that the federal government could share them more widely. In recent years, the Education Department’s What Works Clearinghouse has identified practices that improve outcomes, relying primarily on research’s “gold standard,” randomized controlled trials. Yet, despite some positive changes in student outcomes (such as the modest narrowing of achievement gaps between minority and nonminority students), simply identifying effective practices hasn’t yielded widespread or system-wide improvements in outcomes. Is the research on education practices partly to blame? Does it lack rigor or, on the other hand, the breadth needed to make results generalizable?
Certainly one difficulty is that finding practices that “work,” however rigorous the research behind them, requires taking into account what is known about the organizations using the practices successfully and how the people in these organizations—principals and teachers—learn. Too often, research to determine whether interventions work ignores knowledge from both research and practice about what it takes for teachers and schools to implement effective practices—about how people and organizations develop the capacity to improve.
By the same token, few policymakers design programs that create the optimal conditions for improving education practices. Understanding how people and organizations learn could help shape policies that support practice improvements rather than impede them.
Take teacher learning. Available evidence suggests that teachers learn best in an atmosphere of trust. To improve, teachers must be able to learn new skills and unlearn old habits and behaviors. This means making mistakes, at least at first. To risk trying something new, and to practice enough to develop expertise, teachers require the kind of trust that takes time to develop, along with supportive colleagues.
Beyond individual teachers, school improvement requires organizational learning. Implementing new practices often requires breaking with entrenched organizational routines, monitoring how the new practices are working, and making improvements along the way. Such changes don’t happen overnight! Changing what individuals and organizations do is best done in a durable community that supports both individual and organizational learning.
Schools can be such learning communities, and some already are. But education policies and practices beyond the school level can undermine the very conditions that these communities need to thrive. For example, schools can’t initiate or sustain effective practices without a stable teaching force. Yet, district, state or federal policies can foster “churn” in the teaching force if district rules don’t incentivize teachers to stay in challenging schools or if rules mandate blanket staffing changes (if, for example, School Improvement Grants require some schools to replace leaders or half of the teaching force). And, beyond fostering a stable teaching force, continuous school improvement requires leadership and resources from outside the school. Here, time for ongoing professional learning springs to mind.
Some researchers and technical assistance providers recognize that identifying evidence-based interventions is only one part of changing practice. AIR’s National Center on Intensive Intervention, for instance, employs randomized controlled trials and other rigorous research on “data-based individualization” as the foundation for designing a five-step process of diagnosis, intervention, progress monitoring, analysis and adaptation. Beyond rigorous research, “build[ing] district and school capacity to support implementation of data-based individualization in reading, mathematics, and behavior for students with severe and persistent learning and behavioral needs”—the Center’s mission—requires helping schools prepare to initiate change and to commit to the long haul. Since implementation is multifaceted, it can’t succeed without a host of supports ranging from strong leadership and teacher and parent involvement to opportunities for professional learning and data systems to monitor progress. There’s no one-size-fits-all formula, but these ingredients are all needed in some form.
Identifying interventions that “work,” no matter how high the research standards, is only one part of improving education practices and outcomes. Long-term improvement requires knowledge about the ongoing individual and organizational learning inherent in implementation itself.
Closing the Black-White Achievement Gap: Good News, Bad News
With each National Assessment of Educational Progress (NAEP) release we read how sluggish American students’ progress is in subjects such as mathematics, reading and U.S. history and, especially, how poor the achievement of Blacks is and, consequently, how large the Black-White achievement gaps are. The most recent release of the 2015 NAEP results was no different, except there were declines in Grades 4 and 8 mathematics and Grade 8 reading, and the Black-White achievement gaps remained large.
The NAEP assesses changes in the educational achievement of the nation’s fourth- and eighth-graders in mathematics and reading every other year and several other subjects less frequently; U.S. history is currently assessed every four years. When we examine roughly 25 years of achievement assessments for White and Black students in mathematics, reading and U.S. history, a “good news/bad news” picture emerges.
What is the good news? Save for 2015, scores have gone up for all students, and the gains have been greater for Blacks than for Whites. For example, scores at Grades 4 and 8 in mathematics for Whites and Blacks have all risen. In the past 25 years, the scores for Whites in Grade 4 have risen 29 points; for Blacks, 36 points. The somewhat smaller gains at Grade 8 follow this same pattern, although the gain for Black students is only 1 point greater than for Whites—22 points for Whites and 23 points for Blacks.
To understand what these gains mean, consider that roughly 40 NAEP points separate the average Grade 8 and Grade 4 scores. Spread over the four school years between those grades, that implies students gain about 10 NAEP points per year on average. By that yardstick, the increases in student performance over the past 25 years are considerable, especially for Black students at Grade 4.
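The conversion from NAEP points to years of learning can be made concrete with a small calculation. The sketch below is illustrative only. It uses the rough 40-point separation between the Grade 4 and Grade 8 averages and the mathematics gains quoted in this essay, and it assumes that points accrue at a constant rate across grades, which is a simplification. The function and variable names are hypothetical.

```python
# Rough conversion of NAEP score gains into "years of learning".
# Assumption: the ~40-point Grade 4 to Grade 8 separation is spread evenly
# over the four school years between those grades.

GRADE_SPAN_POINTS = 40      # approximate Grade 8 minus Grade 4 average score
GRADE_SPAN_YEARS = 8 - 4    # school years between the two grades

POINTS_PER_YEAR = GRADE_SPAN_POINTS / GRADE_SPAN_YEARS


def gain_in_years(points: float) -> float:
    """Express a NAEP point gain as approximate years of schooling."""
    return points / POINTS_PER_YEAR


# Mathematics gains over roughly 25 years, as quoted in the text.
math_gains = {
    ("Grade 4", "White"): 29,
    ("Grade 4", "Black"): 36,
    ("Grade 8", "White"): 22,
    ("Grade 8", "Black"): 23,
}

years_by_group = {key: gain_in_years(pts) for key, pts in math_gains.items()}

for (grade, group), years in years_by_group.items():
    print(f"{grade} mathematics, {group} students: about {years:.1f} years")
```

On this rough scale, the 36-point gain for Black fourth-graders corresponds to about three and a half years of learning.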
In reading, the same pattern holds, though the overall gains are less than for mathematics. Between 1992 and 2015, White fourth-graders gained 8 points, but Blacks gained 14 points. In Grade 8, White students gained 7 points, compared to 11 for Black eighth-graders. This is all pretty good news so far.
The pattern for U.S. history is similar. At Grade 4, the growth for Black students between 1994 and 2010, 22 points, far exceeded that for White students, 9 points. For eighth-graders, who were most recently administered the assessment in 2014, the results are similar but not as dramatic: 13 points for Black students compared to 11 points for White students.
These results as a whole are very encouraging: both our White and Black students are showing academic performance growth. Most impressive, Black student growth exceeds that of White students for both fourth and eighth grades and in all three subjects.
And the bad news? When we compare the Black-White achievement gaps over time, we can see that the gaps are closing but at a snail’s pace. Most progress has been made in Grade 4 history: Over a 16-year period, the gap has closed 12 points. But progress has been much slower in the other grade-subject combinations—8 points in Grade 4 mathematics, 6 points in Grade 4 reading, 3 points in Grade 8 reading, 2 points in Grade 8 history, and 1 point in Grade 8 mathematics.
One way to gauge how fast gaps are closing is to examine the performance of Black students in a given grade and subject area in the most recent assessment and compare that to the White students’ score at the earliest point for which we have data. There is but a single instance—fourth-grade mathematics—in which Black students’ most recent score equals or exceeds that earned by White students two or more decades earlier. The average score for Black students in 2015 was 224—just 4 points higher than White students scored in 1990. Still focusing on Grade 4 mathematics, it took 15 years, until 2005, for Black student achievement to reach the 1990 level of White student achievement. Importantly, Black students still have not caught up to early-1990s White student achievement for any of the other grade-subject combinations.
So the good news is that Black students are improving their academic performance faster than White students in key subject areas. But the bad news is that, at the current rate, closing the gaps will take impossibly long. Even for Grade 4 mathematics, where progress has been greatest, it would take a century to close the gap!
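A back-of-the-envelope projection shows how slowly the Grade 4 mathematics gap would close at the observed pace. The sketch below relies only on the rounded figures quoted in this essay: the 2015 Black average of 224, the 4-point margin over the 1990 White average, the 29-point White gain, and the 8-point narrowing of the gap. It assumes the gap keeps narrowing at a constant rate, which is a crude simplification, and the names are hypothetical.

```python
# Linear projection for the Grade 4 mathematics Black-White gap.
# All inputs are rounded figures quoted in the essay; the constant-rate
# assumption is a simplification made for illustration.

BLACK_2015 = 224                 # average Black score in 2015
WHITE_1990 = BLACK_2015 - 4      # Black 2015 score was 4 points above White 1990
WHITE_GAIN = 29                  # White gain over the period
YEARS_OBSERVED = 2015 - 1990     # length of the observation window
GAP_CLOSED = 8                   # points by which the gap narrowed


def years_to_close(current_gap: float, closed: float, years: float) -> float:
    """Years needed to close current_gap if it keeps narrowing at the same pace."""
    if closed <= 0:
        raise ValueError("gap is not narrowing; no finite projection")
    return current_gap * years / closed


white_2015 = WHITE_1990 + WHITE_GAIN       # implied White average in 2015
current_gap = white_2015 - BLACK_2015      # implied gap in 2015
projection = years_to_close(current_gap, GAP_CLOSED, YEARS_OBSERVED)

print(f"Estimated 2015 gap: {current_gap} points")
print(f"Projected years to close at the observed pace: {projection:.0f}")
```

With these rounded inputs the projection comes out at roughly eight decades, the same order of magnitude as the century cited above. Small changes in the rounded inputs shift the estimate by a decade or more, so it is best read as an indication of scale and not as a forecast.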
While the data do not tell us which policies would close this unacceptable Black-White achievement gap, we know enough from other studies to implement changes that could speed up progress. Most important is the need for early childhood education, that is, education from birth through a child’s arrival at kindergarten. The Early Childhood Longitudinal Study indicates that Black children arrive at kindergarten scoring over 20 percent lower on tests of cognitive ability than White children. To address this disparity, the evidence suggests the importance of wrap-around childhood education programs that include emotional, nutritional, and health supports in addition to learning activities in reading and mathematics. Finally, the evidence is clear that the most effective interventions begin at or shortly after birth.
Black students also have higher absence rates than White students and are more likely to be in schools with less-experienced and more non-credentialed teachers. And a recent AIR study showed that the average eighth-grade Black student attends a school that is 48 percent Black, while the average eighth-grade White student’s school is about 10 percent Black—a differential negatively related to Black male students’ academic performance when socio-economic status, teacher qualifications and classroom practices are taken into account.
If we as a nation care about closing the Black-White achievement gaps, research tells us that early childhood education, reducing segregation, and providing better teachers for our Black students would be good places to start.