2024 Research Review

AI Robustness (AIR)

Modern data analytic methods and tools—including artificial intelligence (AI) and machine learning (ML) classifiers—are revolutionizing prediction capabilities and automation through their capacity to analyze and classify data. To produce such results, these methods depend on correlations. However, an overreliance on correlations can lead to prediction bias and reduced confidence in AI outputs.

Data and concept drift, evolving edge cases, and emerging phenomena can undermine the correlations that AI classifiers rely on. As the Department of Defense (DoD) increases its use of AI classifiers and predictors, users may grow to distrust results because of these issues. To regain user trust, we need new methods for ongoing test and evaluation of AI and ML accuracy. The Carnegie Mellon University Software Engineering Institute (CMU SEI) has developed a new AI Robustness (AIR) tool that allows users to gauge AI and ML classifier performance with data-based confidence.

The SEI AIR tool offers a precedent-setting capability to improve the correctness of AI classifications and predictions, increasing confidence in the use of AI in development, testing, and operations decision making.

Linda Parker Gates
Initiative Lead, Software Acquisition Pathways

The AIR tool uses state-of-the-art algorithms and techniques to

  • build a causal graph (see Step 1 in Figure 1) from variables in the dataset, including treatment (X), outcome (Y), and intermediate (M) variables.
  • determine adjustment sets (see Step 2 in Figure 1) consisting of the parents of X (Z1, top) and M (Z2, bottom), which allows it to remove any potential bias introduced between X and Y.
  • calculate the average risk difference and associated 95% confidence intervals for each adjustment set (see Step 3 in Figure 1). In this way, the AIR tool shows when the AI classifier can’t be trusted and suggests where it could be improved.
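The Step 3 calculation can be sketched in code. The following is a minimal, hypothetical illustration of a stratified risk-difference estimate with a normal-approximation 95% confidence interval: strata are defined by the values of an adjustment set Z, and each stratum's treated-versus-untreated outcome contrast is weighted by its share of the data. The function name and data layout are assumptions for illustration; this is not the AIR tool's actual implementation.

```python
from collections import defaultdict
import math

def adjusted_risk_difference(records, z_crit=1.96):
    """Estimate an adjusted risk difference with a 95% CI.

    records: list of (x, y, z) tuples, where x (treatment) and
    y (outcome) are binary and z is the adjustment-set value.
    Illustrative sketch only, not the AIR tool's algorithm.
    """
    # Group outcomes by (z, x) stratum.
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, y, z in records:
        strata[z][x].append(y)

    n = len(records)
    rd, var = 0.0, 0.0
    for groups in strata.values():
        n1, n0 = len(groups[1]), len(groups[0])
        if n1 == 0 or n0 == 0:
            continue  # stratum has no treated/untreated contrast
        p1 = sum(groups[1]) / n1          # risk under treatment
        p0 = sum(groups[0]) / n0          # risk under no treatment
        w = (n1 + n0) / n                 # weight by stratum size
        rd += w * (p1 - p0)
        var += w * w * (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

    se = math.sqrt(var)
    return rd, (rd - z_crit * se, rd + z_crit * se)
```

A wide confidence interval, or an interval that shifts markedly between adjustment sets, is the kind of signal the AIR tool uses to flag where a classifier's predictions may not be trustworthy.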

This project is sponsored and funded by the Office of the Under Secretary of Defense for Research and Engineering (OUSD(R&E)) to transition the AIR tool to users across the DoD to help identify when and where their AI classifiers are giving biased predictions. We are looking for DoD collaborators to use our technology and provide feedback. As a participant, your AI and subject matter experts will work with our team to identify known causal relationships and build an initial causal graph. Contact us at info@sei.cmu.edu to learn more.

Figure 1: Steps in the AIR Tool Analysis Process. Results and interpretations given by the AIR tool are based on output from all three steps.
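The graph and adjustment-set logic of Steps 1 and 2 can be illustrated with a toy edge-list representation. The graph below (Z1 → X → M → Y, with Z2 → M) is an assumption loosely modeled on the description of Figure 1, and the `parents` helper is illustrative rather than part of the AIR tool.

```python
# Hypothetical sketch of Steps 1-2: represent a causal graph as a
# directed edge list and read off the two adjustment sets described
# above: the parents of the treatment X and the parents of the
# intermediate M. Graph and helper names are illustrative.

def parents(edges, node):
    """Return the sorted parents of `node` in a directed edge list."""
    return sorted(src for src, dst in edges if dst == node)

# Toy graph assumed for illustration: Z1 -> X -> M -> Y, Z2 -> M.
edges = [("Z1", "X"), ("X", "M"), ("Z2", "M"), ("M", "Y")]

z1 = parents(edges, "X")                            # parents of X
z2 = [p for p in parents(edges, "M") if p != "X"]   # parents of M, minus X
```

Each adjustment set blocks a different family of back-door paths between X and Y, which is why the tool computes and compares risk differences under both.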

In Context

This FY2024-26 project

  • is a collaborative effort between SEI and CMU researchers to build on knowledge gained from previous SEI projects, including the FY22 Maturation of Determining the Limits of AI Robustness (MDLAR) research project
  • leverages SEI expertise and experience in necessary data science and programming, mathematical statistics, causal discovery, and causal inference
  • aligns with the CMU SEI technical objective to modernize software engineering and acquisition (codify AI engineering practices; improve designed-in trustworthiness), and meets the SEI’s primary enduring goals of delivering trustworthiness, capability, and timeliness
  • aligns with the OUSD(R&E) critical technology priority of building trusted AI and autonomy

Mentioned in this Article

Can You Rely on Your AI? Applying the AIR Tool – Software Engineering Institute (SEI) Webcast Series, Apple Podcasts: https://podcasts.apple.com/us/podcast/can-you-rely-on-your-ai-applying-the-air-tool-to/id924045987?i=1000657439121

Measuring AI Accuracy with the AI Robustness (AIR) Tool Blog: https://insights.sei.cmu.edu/blog/measuring-ai-accuracy-with-the-ai-robustness-air-tool/

Center for Causal Discovery Summer Short Course: https://www.youtube.com/watch?v=9yEYZURoE3Y

J. Pearl, M. Glymour, and N. Jewell, Causal Inference in Statistics: A Primer. Wiley, March 2016. ISBN: 978-1-119-18684-7.

Hoffman, K. (2020). An Illustrated Guide to TMLE, Part I: Introduction and Motivation. https://www.khstats.com/blog/tmle/tutorial