How Public Datasets Can Address Data Obstacles in Pay for Success: A Demonstration for Veterans with Service-Connected Disabilities

Pay for Success has become an attractive way for policy makers, government agencies, and service providers to drive improvements in essential social services by tying funding to meaningful, long-lasting outcomes. Because of this emphasis on outcomes, a deep understanding of the service’s targeted population is essential to any Pay for Success project. For many projects, there are no perfect data sources.

Project (re)Launch

As part of its Pay for Success work, AIR developed a workaround to common data obstacles these projects face. AIR was contracted to develop the evaluation design for the San Diego-based Project (re)Launch, where Third Sector Capital Partners, Inc., served as the project manager. This project sought to improve employment and health outcomes for veterans with service-connected disabilities by providing intensive case management and wraparound supports.

To determine what meaningful impact Project (re)Launch might have on veterans with service-connected disabilities, the AIR team needed to obtain baseline data on economic and health measures for this specific population of veterans in San Diego. AIR researchers found several public datasets with information on veterans, but no single dataset included all of the data needed.

For example, one of the datasets AIR researchers explored was the American Community Survey (ACS), which included information identifying veterans and their service-connected disability status. The AIR team was able to use the ACS to obtain baseline economic outcomes, but the dataset did not include health-related outcomes. Another dataset, the Behavioral Risk Factor Surveillance System (BRFSS), included information on physical and mental health outcomes and identified whether a respondent was a veteran, but it lacked information on service-connected disabilities.

The AIR team thus focused on determining whether a veteran’s service-connected disability status could be predicted from factors common to both the ACS and BRFSS datasets, such as age, race, gender, type of disability, education, and income. Because the ACS records that status, it could be used to build the prediction. Information in the datasets was de-identified.

Data Workaround

Burhan Ogut, principal researcher at AIR, led the effort to find a creative solution. “We were able to develop a model where we had the information we needed to predict a variable that is missing [from] another dataset, and then we applied the results from this model to the other dataset that was lacking this information but included other data that we needed,” Ogut says. A brief by AIR and Third Sector describes this method in detail.
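
The general idea Ogut describes can be illustrated with a minimal sketch. This is not the AIR team's actual specification, which the article does not give. The sketch assumes a logistic regression, uses synthetic data rather than real ACS or BRFSS records, and the predictor names, coefficients, and 0.5 cutoff are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(c * v for c, v in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [c - lr * g / n for c, g in zip(w, grad)]
    return w


def predict_proba(w, X):
    """Predicted probability of the missing indicator for each row."""
    return [sigmoid(w[0] + sum(c * v for c, v in zip(w[1:], xi))) for xi in X]


# --- Synthetic stand-ins for the two surveys (NOT real ACS or BRFSS data) ---
random.seed(0)


def make_row():
    # Hypothetical shared predictors: age rescaled to [0, 1],
    # any disability (0/1), college degree (0/1).
    x = [random.random(), float(random.random() < 0.4), float(random.random() < 0.3)]
    # Made-up rule linking the predictors to the indicator, for simulation only.
    p = sigmoid(-2.0 + 1.0 * x[0] + 3.0 * x[1])
    return x, int(random.random() < p)


source = [make_row() for _ in range(500)]  # plays the dataset WITH the indicator
target = [make_row() for _ in range(200)]  # plays the dataset WITHOUT it

X_src = [x for x, _ in source]
y_src = [label for _, label in source]
X_tgt = [x for x, _ in target]  # the simulated labels are discarded here

# Step 1: learn the relationship where the indicator is observed.
weights = fit_logistic(X_src, y_src)

# Step 2: apply it to the dataset that lacks the indicator.
p_tgt = predict_proba(weights, X_tgt)

# Step 3 (one possible choice): flag rows above a 0.5 cutoff.
flagged = [p >= 0.5 for p in p_tgt]
print(f"{sum(flagged)} of {len(X_tgt)} target rows flagged")
```

In practice the shared predictors would need to be coded identically in both datasets before the model is transferred, and an analyst might prefer to carry the predicted probabilities forward rather than apply a hard cutoff.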

The model also surfaced information on veterans with service-connected disabilities that AIR researchers might not otherwise have found. For example, the data showed that veterans with service-connected disabilities had worse labor market outcomes (lower employment and wages) and were more likely to have mental health challenges than veterans with disabilities not related to their service.

The approach has applications beyond finding data on veteran populations. For example, researchers might want to know a student’s free or reduced-price lunch status, or another measure of poverty, but cannot find that information in a particular dataset. If a second dataset records that status along with factors known to be correlated with it, and those same factors appear in the dataset of interest, a model built on the second dataset can predict the free or reduced-price lunch status of students in the dataset of interest. The approach also could be used to predict missing information, such as receipt of Vocational Rehabilitation services or 504 or Individualized Education Program status, in a dataset on individuals with disabilities.

“You can apply this model for any group. If you are missing identifying information on your target population, and that information exists in another dataset, and both datasets share common variables to use as predictors, then you can just use the same idea that we did,” Ogut says.