<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>SEI Blog | Artificial Intelligence Engineering</title><link href="http://sei.cmu.edu/feeds/topic/artificial-intelligence-engineering/atom/?utm_source=blog&amp;utm_medium=rss" rel="alternate"/><link href="http://sei.cmu.edu/feeds/topic/artificial-intelligence-engineering/atom/?utm_source=blog&amp;utm_medium=rss" rel="self"/><id>http://sei.cmu.edu/feeds/topic/artificial-intelligence-engineering/atom/?utm_source=blog&amp;utm_medium=rss</id><updated>2025-09-09T00:00:00-04:00</updated><subtitle>Updates on changes and additions to the SEI Blog for posts matching Artificial Intelligence Engineering</subtitle><entry><title>My AI System Works…But Is It Safe to Use?</title><link href="https://www.sei.cmu.edu/blog/my-ai-system-worksbut-is-it-safe-to-use/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-09-09T00:00:00-04:00</published><updated>2025-09-09T00:00:00-04:00</updated><author><name>David Schulker, Matt Walsh, Emil Mathew</name></author><id>https://www.sei.cmu.edu/blog/my-ai-system-worksbut-is-it-safe-to-use/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">This blog post introduces System Theoretic Process Analysis (STPA), a hazard analysis technique uniquely suited to dealing with the complexity of AI systems.</summary></entry><entry><title>Artificial Intelligence in National Security: Acquisition and Integration</title><link href="https://www.sei.cmu.edu/blog/artificial-intelligence-in-national-security-acquisition-and-integration/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-08-05T00:00:00-04:00</published><updated>2025-08-05T00:00:00-04:00</updated><author><name>Paige Rishel, Carol Smith, Brigid O'Hearn, Rita 
Creel</name></author><id>https://www.sei.cmu.edu/blog/artificial-intelligence-in-national-security-acquisition-and-integration/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">This blog post highlights practitioner insights from a recent AI Acquisition workshop, including challenges in differentiating AI systems, guidance on when to use AI, and matching AI tools to mission needs.</summary></entry><entry><title>Amplifying AI Readiness in the DoD Workforce</title><link href="https://www.sei.cmu.edu/blog/amplifying-ai-readiness-in-the-dod-workforce/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-06-23T00:00:00-04:00</published><updated>2025-06-23T00:00:00-04:00</updated><author><name>Eric Keylor, Robert Beveridge, Jonathan Frederick</name></author><id>https://www.sei.cmu.edu/blog/amplifying-ai-readiness-in-the-dod-workforce/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">The SEI recently partnered with the Department of the Air Force Chief Data and AI Office to develop a strategy to identify and assess hidden workforce talent for data and AI work roles.</summary></entry><entry><title>Out of Distribution Detection: Knowing When AI Doesn't Know</title><link href="https://www.sei.cmu.edu/blog/out-of-distribution-detection-knowing-when-ai-doesnt-know/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-06-09T00:00:00-04:00</published><updated>2025-06-09T00:00:00-04:00</updated><author><name>Eric Heim, Cole Frank</name></author><id>https://www.sei.cmu.edu/blog/out-of-distribution-detection-knowing-when-ai-doesnt-know/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">How do we know when an AI system is operating outside its intended knowledge boundaries?</summary></entry><entry><title>10 Things Organizations Should Know About AI Workforce 
Development</title><link href="https://www.sei.cmu.edu/blog/10-things-organizations-should-know-about-ai-workforce-development/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-04-28T00:00:00-04:00</published><updated>2025-04-28T00:00:00-04:00</updated><author><name>Jonathan Frederick, Dominic Ross, Eric Keylor, Cole Frank, Intae Nam</name></author><id>https://www.sei.cmu.edu/blog/10-things-organizations-should-know-about-ai-workforce-development/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">This post outlines 10 recommendations developed in response to work with our mission partners in the Department of Defense.</summary></entry><entry><title>DataOps: Towards More Reliable Machine Learning Systems</title><link href="https://www.sei.cmu.edu/blog/dataops-towards-more-reliable-machine-learning-systems/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-04-21T00:00:00-04:00</published><updated>2025-04-21T00:00:00-04:00</updated><author><name>Daniel DeCapria</name></author><id>https://www.sei.cmu.edu/blog/dataops-towards-more-reliable-machine-learning-systems/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">Decisions based on ML models can have significant consequences, and managing the raw material—data—in ML systems is a challenge. 
This post explains DataOps, an area that focuses on the management and optimization of data throughout its lifecycle.</summary><category term="Artificial Intelligence Engineering"/><category term="Machine Learning"/></entry><entry><title>Evaluating LLMs for Text Summarization: An Introduction</title><link href="https://www.sei.cmu.edu/blog/evaluating-llms-for-text-summarization-introduction/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-04-07T00:00:00-04:00</published><updated>2025-04-07T00:00:00-04:00</updated><author><name>Shannon Gallagher, Swati Rallapalli, Tyler Brooks</name></author><id>https://www.sei.cmu.edu/blog/evaluating-llms-for-text-summarization-introduction/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">Deploying LLMs without human supervision and evaluation can lead to significant errors. This post outlines the fundamentals of LLM evaluation for text summarization in high-stakes applications.</summary><category term="Machine Learning"/></entry><entry><title>The Essential Role of AISIRT in Flaw and Vulnerability Management</title><link href="https://www.sei.cmu.edu/blog/the-essential-role-of-aisirt-in-flaw-and-vulnerability-management/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-03-26T00:00:00-04:00</published><updated>2025-03-26T00:00:00-04:00</updated><author><name>Lauren McIlvenny, Vijay Sarvepalli</name></author><id>https://www.sei.cmu.edu/blog/the-essential-role-of-aisirt-in-flaw-and-vulnerability-management/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">The SEI established the first Artificial Intelligence Security Incident Response Team (AISIRT) in 2023. 
This post discusses the role of AISIRT in coordinating the response to flaws and vulnerabilities in AI systems.</summary><category term="CERT/CC Vulnerabilities"/><category term="Cybersecurity"/><category term="AISIRT"/></entry><entry><title>Enhancing Machine Learning Assurance with Portend</title><link href="https://www.sei.cmu.edu/blog/enhancing-machine-learning-assurance-with-portend/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-03-24T00:00:00-04:00</published><updated>2025-03-24T00:00:00-04:00</updated><author><name>Jeffrey Hansen, Sebastián Echeverría, Lena Pons, Gabriel Moreno, Grace Lewis, Lihan Zhan</name></author><id>https://www.sei.cmu.edu/blog/enhancing-machine-learning-assurance-with-portend/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">This post introduces Portend, a new open source toolset that simulates data drift in machine learning models and identifies the proper metrics to detect drift in production environments.</summary><category term="Software Assurance"/><category term="Machine Learning"/></entry><entry><title>Introducing MLTE: A Systems Approach to Machine Learning Test and Evaluation</title><link href="https://www.sei.cmu.edu/blog/introducing-mlte-systems-approach-to-machine-learning-test-and-evaluation/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-02-17T00:00:00-05:00</published><updated>2025-02-17T00:00:00-05:00</updated><author><name>Alex Derr, Sebastián Echeverría, Katherine Maffey, Grace Lewis</name></author><id>https://www.sei.cmu.edu/blog/introducing-mlte-systems-approach-to-machine-learning-test-and-evaluation/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">Machine learning systems are notoriously difficult to test. 
This post introduces Machine Learning Test and Evaluation (MLTE), a new process and tool to mitigate this problem and create safer, more reliable systems.</summary><category term="Testing"/><category term="Machine Learning"/></entry><entry><title>The Myth of Machine Learning Non-Reproducibility and Randomness for Acquisitions and Testing, Evaluation, Verification, and Validation</title><link href="https://www.sei.cmu.edu/blog/the-myth-of-machine-learning-reproducibility-and-randomness-for-acquisitions-and-testing-evaluation-verification-and-validation/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2025-01-13T00:00:00-05:00</published><updated>2025-01-13T00:00:00-05:00</updated><author><name>Andrew Mellinger, Daniel Justice, Marissa Connor, Shannon Gallagher, Tyler Brooks</name></author><id>https://www.sei.cmu.edu/blog/the-myth-of-machine-learning-reproducibility-and-randomness-for-acquisitions-and-testing-evaluation-verification-and-validation/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">A reproducibility challenge faces machine learning (ML) systems today. 
This post explores configurations that increase reproducibility and offers recommendations for addressing these challenges.</summary><category term="Acquisition Transformation"/><category term="Testing"/><category term="Machine Learning"/><category term="Verification"/></entry><entry><title>Beyond Capable: Accuracy, Calibration, and Robustness in Large Language Models</title><link href="https://www.sei.cmu.edu/blog/beyond-capable-accuracy-calibration-and-robustness-in-large-language-models/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-12-03T00:00:00-05:00</published><updated>2024-12-03T00:00:00-05:00</updated><author><name>Matt Walsh, David Schulker, Shing-hon Lau</name></author><id>https://www.sei.cmu.edu/blog/beyond-capable-accuracy-calibration-and-robustness-in-large-language-models/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">For any organization seeking to responsibly harness the potential of large language models, we present a holistic approach to LLM evaluation that goes beyond accuracy.</summary></entry><entry><title>GenAI for Code Review of C++ and Java</title><link href="https://www.sei.cmu.edu/blog/genai-for-code-review-of-c-and-java/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-11-18T00:00:00-05:00</published><updated>2024-11-18T00:00:00-05:00</updated><author><name>David Schulker</name></author><id>https://www.sei.cmu.edu/blog/genai-for-code-review-of-c-and-java/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">Would ChatGPT-3.5 and ChatGPT-4o correctly identify errors in noncompliant code and correctly recognize compliant code as error-free?</summary></entry><entry><title>Introduction to MLOps: Bridging Machine Learning and Operations</title><link 
href="https://www.sei.cmu.edu/blog/introduction-to-mlops-bridging-machine-learning-and-operations/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-11-04T00:00:00-05:00</published><updated>2024-11-04T00:00:00-05:00</updated><author><name>Daniel DeCapria</name></author><id>https://www.sei.cmu.edu/blog/introduction-to-mlops-bridging-machine-learning-and-operations/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">Machine learning operations (MLOps) has emerged as a critical discipline in artificial intelligence and data science. This post introduces MLOps and its applications.</summary><category term="Artificial Intelligence Engineering"/><category term="Machine Learning"/><category term="Edge Computing"/></entry><entry><title>Measuring AI Accuracy with the AI Robustness (AIR) Tool</title><link href="https://www.sei.cmu.edu/blog/measuring-ai-accuracy-with-the-ai-robustness-air-tool/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-09-30T00:00:00-04:00</published><updated>2024-09-30T00:00:00-04:00</updated><author><name>Michael Konrad, Nicholas Testa, Linda Parker Gates, Crisanne Nolan, David Shepard, Julie Cohen, Andrew Mellinger, Suzanne Miller, Melissa Ludwick</name></author><id>https://www.sei.cmu.edu/blog/measuring-ai-accuracy-with-the-ai-robustness-air-tool/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">Understanding your artificial intelligence (AI) system’s predictions can be challenging. 
In this post, SEI researchers discuss a new tool to help improve AI classifier performance.</summary><category term="Machine Learning"/><category term="Artificial Intelligence"/></entry><entry><title>Weaknesses and Vulnerabilities in Modern AI: AI Risk, Cyber Risk, and Planning for Test and Evaluation</title><link href="https://www.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-ai-risk-cyber-risk-and-planning-for-test-and-evaluation/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-08-12T00:00:00-04:00</published><updated>2024-08-12T00:00:00-04:00</updated><author><name>Bill Scherlis</name></author><id>https://www.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-ai-risk-cyber-risk-and-planning-for-test-and-evaluation/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">Modern AI systems pose consequential, poorly understood risks. This blog post explores strategies for framing test and evaluation practices based on a holistic approach to AI risk.</summary></entry><entry><title>Weaknesses and Vulnerabilities in Modern AI: Integrity, Confidentiality, and Governance</title><link href="https://www.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-integrity-confidentiality-and-governance/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-08-05T00:00:00-04:00</published><updated>2024-08-05T00:00:00-04:00</updated><author><name>Bill Scherlis</name></author><id>https://www.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-integrity-confidentiality-and-governance/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">In the rush to develop AI, it is easy to overlook factors that increase risk. 
This post explores AI risk through the lens of confidentiality, governance, and integrity.</summary></entry><entry><title>Weaknesses and Vulnerabilities in Modern AI: Why Security and Safety Are so Challenging</title><link href="https://www.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-why-security-and-safety-are-so-challenging/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-07-29T00:00:00-04:00</published><updated>2024-07-29T00:00:00-04:00</updated><author><name>Bill Scherlis</name></author><id>https://www.sei.cmu.edu/blog/weaknesses-and-vulnerabilities-in-modern-ai-why-security-and-safety-are-so-challenging/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">This post explores concepts of security and safety for neural-network-based AI, including ML and generative AI, as well as AI-specific challenges in developing safe and secure systems.</summary></entry><entry><title>Auditing Bias in Large Language Models</title><link href="https://www.sei.cmu.edu/blog/auditing-bias-in-large-language-models/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" rel="alternate"/><published>2024-07-22T00:00:00-04:00</published><updated>2024-07-22T00:00:00-04:00</updated><author><name>Katherine-Marie Robinson, Violet Turri</name></author><id>https://www.sei.cmu.edu/blog/auditing-bias-in-large-language-models/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">This post discusses recent research that uses a role-playing scenario to audit ChatGPT, an approach that opens new possibilities for revealing unwanted biases.</summary><category term="Artificial Intelligence"/></entry><entry><title>Cost-Effective AI Infrastructure: 5 Lessons Learned</title><link href="https://www.sei.cmu.edu/blog/cost-effective-ai-infrastructure-5-lessons-learned/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates" 
rel="alternate"/><published>2024-05-13T00:00:00-04:00</published><updated>2024-05-13T00:00:00-04:00</updated><author><name>William Nichols, Bryan Brown</name></author><id>https://www.sei.cmu.edu/blog/cost-effective-ai-infrastructure-5-lessons-learned/?utm_source=blog&amp;utm_medium=rss&amp;utm_campaign=my_site_updates</id><summary type="html">This post details the challenges and state of the art of cost-effective AI infrastructure, along with five lessons learned from standing up an LLM.</summary></entry></feed>