Data Science and Advanced Analytics
Featured Project
DARPA World Modelers
The Defense Advanced Research Projects Agency’s (DARPA) investment in the World Modelers program created the opportunity to develop sophisticated, near-real-time, integrated modeling infrastructure to help policymakers quickly respond to or prevent humanitarian disasters.
Our Capabilities
AIR Data Science and Advanced Analytics experts employ a range of data science methods to address complex public policy questions.
Data Science Research and Methodology
Artificial Intelligence
AIR uses state-of-the-art artificial intelligence tools, such as ChatGPT, to conduct intensive computational analyses of records, postings, comments, and investigative data. AIR experts use ChatGPT to execute a variety of advanced analytical techniques, including natural language processing, topic modeling, and microsimulations.
Predictive Modeling
Whether it is used to predict the impact of a drought on the food security of a region, the probability of a system failure, or the performance of a new testing standard in schools, AIR leverages predictive modeling to generate valuable insights.
Machine Learning
To scale the scope of analysis beyond what is humanly possible, AIR applies machine learning practices to detect patterns in data and make predictions about outcomes. The outputs inform policy and decisions impacting education, workforce, and international development.
MicroSim
Policy decisions can have unexpected consequences. Microsimulation methods provide a sandbox to test such interventions and to reveal unexpected outcomes by simulating real-world events. AIR experts use microsimulation tools to test various scenarios and to understand the distribution of policy outcomes.
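As a rough illustration of the idea, the sketch below simulates a synthetic population and compares outcomes with and without an intervention. Everything in it is an assumption made for the example: the function name `simulate_policy`, the log-normal income distribution, the flat subsidy, and the threshold are hypothetical and do not describe AIR's microsimulation tools.

```python
import random
import statistics

def simulate_policy(n_households=1000, subsidy=200.0, threshold=1500.0, seed=0):
    """Return the share of households below `threshold` before and after a flat subsidy."""
    rng = random.Random(seed)
    # Hypothetical monthly incomes drawn from a log-normal distribution.
    incomes = [rng.lognormvariate(7.5, 0.5) for _ in range(n_households)]
    before = sum(x < threshold for x in incomes) / n_households
    after = sum(x + subsidy < threshold for x in incomes) / n_households
    return before, after

# Re-run with different seeds to see a distribution of outcomes, not a single number.
reductions = [before - after
              for before, after in (simulate_policy(seed=s) for s in range(50))]

print(f"mean reduction in share below threshold: {statistics.mean(reductions):.3f}")
print(f"range across runs: {min(reductions):.3f} to {max(reductions):.3f}")
```

Running the same scenario many times is what exposes the spread of possible outcomes rather than a single point estimate.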
Text Analytics and NLP
AIR uses data science techniques to analyze audio and text data. These techniques leverage a machine's computational abilities to quickly uncover insights and patterns in text data. This process can often be more efficient, more accurate, and less biased than human interpretations of language.
Computer Vision
To derive meaningful data from digital images, videos and other visual inputs, AIR trains machines to observe and identify patterns in the data that would otherwise go unnoticed. One of AIR's tools can generate a nutritional status score from a simple facial image.
Network Analysis
Being able to identify strengths and trends in the relationships between objects/people means we can better understand social networks. AIR uses this information to inform decision makers and validate assumptions.
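One simple way to quantify the strength of relationships in a network is degree centrality: the fraction of other nodes each node is directly connected to. The sketch below is a minimal, standard-library version; the function name and the example edge list are hypothetical.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Map each node to the fraction of other nodes it is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(linked) / (n - 1) for node, linked in neighbors.items()}

# Hypothetical undirected collaboration network.
edges = [("Ana", "Ben"), ("Ana", "Cy"), ("Ana", "Dee"), ("Ben", "Cy")]
centrality = degree_centrality(edges)

for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

In this toy network, "Ana" is connected to every other node and therefore receives the highest score.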
Entity Resolution and Disambiguation
AIR has extensive knowledge of entity resolution and disambiguation methods. For the PatentsView project, we use these methods to identify unique attorneys, inventors, organizations, and locations associated with ambiguous raw patent data.
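To give a feel for the problem, the sketch below groups name variants by a normalized key. It is a deliberately simple, rule-based illustration and is not the method used for PatentsView; the function names, the suffix list, and the sample names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "llc", "ltd"}

def normalize(name):
    """Lowercase, strip punctuation and suffixes, and sort tokens to form a match key."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(sorted(tokens))

def resolve(names):
    """Group raw names that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return list(groups.values())

raw = ["Acme Corp.", "ACME CORP", "Smith, John", "John Smith", "Acme Widgets Inc"]
clusters = resolve(raw)
for cluster in clusters:
    print(cluster)
```

Exact matching on a normalized key misses misspellings and can merge distinct entities that share a name, which is why real disambiguation work typically adds fuzzy matching and contextual attributes.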
Outlier Detection
AIR identifies outliers in order to remove, correct, or study them. When an observation is a true anomaly, removing it can reveal more accurate trends; when it stems from an input mistake, it is corrected after investigation.
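One common rule of thumb for flagging candidate outliers is the interquartile range (IQR) rule, sketched below. This is only one of many possible approaches and is not presented as AIR's method; the function name, the multiplier `k=1.5`, and the sample readings are assumptions for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
flagged = iqr_outliers(readings)
print(flagged)
```

Flagged values are candidates for investigation, not automatic deletions: whether to remove, correct, or study them depends on what caused them.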
Statistical Software Development
Statistical software doesn't always meet all the needs of users. AIR draws on its combined years of data analysis and software development experience to develop statistical software that's tailor-made to address unique requirements.
Geospatial Analysis
Geospatial analysis enables the modeling and visualization of spatial patterns and the prediction of trends for phenomena that may vary across space. AIR uses geospatial visualizations to support crop yield monitoring, climate change modeling, crime prediction, and more.
Topological Data Analysis
Topological Data Analysis is a collection of mathematical tools that extract information about the shape of a data set. The intent is to analyze qualitative and quantitative features of the data set, which is useful for noisy, high-dimensional, or sparse data sets.
Data Visualization & Reporting
Interactive Data Visualization
Data visualization displays data in a compelling way, combining and dynamically presenting it so that our clients can easily understand trends and relationships. We work with clients to design and build web visualizations, data-driven dashboards, geospatial visualizations, and traditional visualizations for print.
Dashboards and Web Applications
Tools such as RShiny, Tableau, and ArcGIS enable the creation of dashboards that allow the user to explore data without having to remake a plot many times over with slight variations. Additionally, these dashboards can help users understand new data as it comes in.
Data Storytelling
Data storytelling builds a narrative from statistical analysis and data visualization to give insight into, or tell the story of, one or more datasets. Data stories show users what the data is telling them without requiring them to interpret numbers and tables.
Automated Reporting and 508 Compliance
AIR creates automated publications such as visualizations and reports that are customized based on client needs. These publications are scalable and can be reused with consistency across time.
Data Engineering
Data Cleaning and Transformation
Data transformation involves converting data from its current format or structure into one better suited to the analysis. It can also involve imputation: filling in missing values, when both possible and appropriate, using statistics such as the mean.
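A minimal sketch of mean imputation follows, assuming missing values are represented as `None`. The function name and the sample scores are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical survey scores with two missing responses.
scores = [4.0, None, 6.0, 8.0, None]
imputed = impute_mean(scores)
print(imputed)
```

Mean imputation preserves the average but shrinks the variance, so it is appropriate only when the share of missing values is small and the values are plausibly missing at random.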
Record Linkage
Generating meaningful insights can require integrating complementary data. AIR applies effective entity resolution and connection practices to link records and synthesize comprehensive datasets and detailed perspectives.
Unstructured Data Extraction and Manipulation
Data is often only available in formats that are not easy to work with, such as PDF, HTML, and Word. In these situations, we leverage our data extraction and manipulation processes and models to extract the relevant information and store it in a more accessible format.
Data Pipeline Development & Automation
A data pipeline allows for automated processing of data. Automation removes the need for human interactions, allowing analysts to focus on other tasks while also greatly reducing the chance of human error.
NoSQL and Database Design
NoSQL database designs allow for the storage of data in non-tabular formats, such as documents, key-value pairs, graphs, and wide-column stores. For some workloads, these storage formats can offer better scalability and more flexible access than the strictly tabular format of relational (SQL) databases.
Data Acquisition & Web Scraping
Web scraping is a method of gathering data by extracting it directly from websites. Web scraping, among other data acquisition methods leveraged at AIR, allows for the collection of large amounts of data for training machine learning algorithms and for gathering observational data from social media, news, and blog sites.
API and Machine-readable Data Development
An API (Application Programming Interface) allows systems to communicate easily and relay machine-readable content to one another. AIR implements APIs to enhance integration between data sources and facilitate data access for researchers.
Get in touch with us
Have a question about our capabilities or want to discuss a use case? Connect with us here.