
SEI Tool Helps Determine Causes of AI Bias, Improves AI Trust

Article

With the increasing use of AI classifiers across the Department of Defense and the federal government to make classifications and predictions in operations, mission planning, simulation, logistics, and threat modeling, trust in AI robustness is a critical issue. Recognizing the urgency of monitoring AI for potential adverse effects, the Office of Management and Budget (OMB) recently released a memo on Accelerating Federal Use of AI through Innovation, Governance, and Public Trust. The Software Engineering Institute’s free AI Robustness (AIR) tool helps agencies meet this critical need by identifying the causes of adverse impacts in AI classifiers.

The OMB memo identifies the needed monitoring activities: “Agencies must conduct testing and periodic human review of AI use cases, where feasible, to identify any adverse impacts to the performance and security of AI functionality…. Ongoing monitoring must be designed to detect unforeseen circumstances, changes to an AI system after deployment, or changes to the context of use or associated data. Agencies must implement appropriate mitigations and ensure proper system and use documentation; and where possible, develop processes enabling traceability and transparency in this evaluation.”
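
To make the memo's language concrete, the sketch below shows one simple form such ongoing monitoring could take: a statistical comparison of a feature's distribution in post-deployment data against the training data to flag possible drift for human review. It is a hypothetical illustration using synthetic data and SciPy's two-sample Kolmogorov-Smirnov test, not part of the AIR tool or the memo's requirements.

```python
# Hypothetical monitoring sketch (not the AIR tool): flag possible data drift
# by comparing a feature's post-deployment distribution to its training
# distribution with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)  # data the model was trained on
deployed_feature = rng.normal(loc=0.4, scale=1.2, size=10_000)  # data observed in operation

statistic, p_value = ks_2samp(training_feature, deployed_feature)
if p_value < 0.01:
    print(f"Possible data drift (KS statistic={statistic:.3f}); trigger human review.")
else:
    print("No significant distribution shift detected.")
```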

“The AIR tool fills a critical void. It uses causal learning to improve the correctness of AI classifications and predictions that are largely dependent on data correlation. Its effect is to increase confidence in the use of AI in development, testing, and operations decision making,” said Linda Parker Gates, AIR project principal investigator.

AI and ML classifiers are powerful tools that have revolutionized prediction and automation through their capacity to analyze and classify data. To produce their results, most AI and ML methods depend on correlations. However, overreliance on correlations can lead to prediction bias and reduced confidence in AI outputs. Data drift, concept drift, evolving edge cases, and emerging phenomena can undermine the correlations that AI classifiers rely on, degrading robustness: the ability to perform accurately in unusual or changing contexts. Users may grow to distrust results because of these issues.
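
The toy example below illustrates the failure mode described above: a classifier that leans on a spurious correlation performs well under training conditions but degrades once drift removes that correlation. It is a minimal sketch using synthetic data, NumPy, and scikit-learn, and is not drawn from the AIR tool's implementation.

```python
# Hypothetical illustration (not the AIR tool): a classifier that learns a
# spurious correlation looks accurate in training conditions but degrades
# when drift breaks the correlation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, corr_strength):
    cause = rng.normal(size=n)                       # true cause of the label
    label = (cause + 0.3 * rng.normal(size=n) > 0).astype(int)
    # Spurious feature: tracks the label only as strongly as corr_strength allows
    spurious = label * corr_strength + rng.normal(size=n)
    return np.column_stack([cause, spurious]), label

# Train where the spurious correlation is strong
X_train, y_train = make_data(5000, corr_strength=2.0)
clf = LogisticRegression().fit(X_train, y_train)

# Evaluate before and after drift removes the spurious correlation
X_same, y_same = make_data(5000, corr_strength=2.0)
X_drift, y_drift = make_data(5000, corr_strength=0.0)

print("accuracy, same conditions:", clf.score(X_same, y_same))
print("accuracy, after drift    :", clf.score(X_drift, y_drift))
```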

The SEI’s open-source AIR tool provides a new method for ongoing testing and evaluation of AI and ML accuracy, allowing users to gauge classifier performance with greater confidence. The tool draws on the SEI’s experience applying and adapting novel techniques from causal discovery and causal inference, which evaluate cause-effect relationships rather than correlations alone, to assess AI and ML classifications and predictions with more nuance. The result is greater confidence that AI and ML models are less biased, better suited to guiding intervention and control of a system’s performance, and better able to attribute outliers to their causes.
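
To give a flavor of the kind of reasoning causal methods add, the sketch below runs a minimal conditional independence check: a feature whose correlation with the outcome vanishes once the true cause is controlled for is likely a spurious correlate rather than a cause. This is a simplified, hypothetical example with synthetic data and is not the AIR tool's algorithm or API.

```python
# Hedged sketch (not the AIR tool's implementation): a minimal conditional
# independence check of the kind causal discovery algorithms build on.
import numpy as np

rng = np.random.default_rng(1)
n = 5000

cause = rng.normal(size=n)                  # genuine cause of the outcome
outcome = cause + 0.5 * rng.normal(size=n)  # outcome driven by the cause
proxy = cause + 0.5 * rng.normal(size=n)    # correlated with outcome only via the cause

def residualize(x, z):
    # Remove the linear influence of z from x (least-squares residuals)
    design = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ beta

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

print("corr(proxy, outcome)        :", corr(proxy, outcome))  # large
print("corr(proxy, outcome | cause):",
      corr(residualize(proxy, cause), residualize(outcome, cause)))  # near zero
```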

“Powerful AI and ML tools are revolutionizing prediction, automation, cybersecurity, intelligence gathering, training and simulation, object detection, and more. Yet we know there are weaknesses associated with these tools that we must consider,” said Anita Carleton, director of the SEI Software Solutions Division. “The AIR tool offers insight into not only where AI might go astray, but also why it happens.”

The SEI is currently seeking DoD partners to use and provide feedback on this technology. Collaborators’ AI and subject-matter experts will partner with the SEI team, which will provide implementation guidance and support, to identify known causal relationships and use a cutting-edge causal discovery tool to gain actionable insights into an AI classifier’s health.

To work with the SEI on use of the AIR tool or for more information, contact info@sei.cmu.edu.