Assessing LLMs for High Stakes Applications

White Paper
This work explores LLM deployment in intelligence reporting, highlighting key challenges in data, scaling, and assessment.
Publisher

Association for Computing Machinery

DOI (Digital Object Identifier)
10.1145/3639477.3639720

Abstract

Large Language Models (LLMs) promise strategic benefits for numerous application domains. The current state of the art in LLMs, however, lacks the trust, security, and reliability needed for high-stakes applications. To address this gap, our work investigated the challenges of developing, deploying, and assessing LLMs within a specific high-stakes application: intelligence reporting workflows. We identified the following challenges that must be addressed before LLMs can be used in high-stakes applications: (1) unverified data and data leakage, (2) fine-tuning and inference at scale, and (3) reproducibility and assessment of LLMs. We argue that researchers should prioritize test and assessment metrics, as better metrics will yield the insight needed to further improve these LLMs.
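
To make the reproducibility-and-assessment challenge concrete, the following is a minimal sketch (not taken from the paper) of one simple check an assessment pipeline might include: repeatedly querying a model with the same prompt and scoring how consistent its outputs are. The `generate` callable and the token-overlap metric are illustrative assumptions, stand-ins for whatever model interface and similarity measure a real evaluation would use.

```python
# Illustrative sketch only: a repeatability check of the kind an LLM
# assessment pipeline might run. `generate` stands in for any LLM call;
# the metric is plain pairwise token overlap (Jaccard), chosen for
# simplicity, not a method described in the paper.
from itertools import combinations
from typing import Callable, List


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens of two generations."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def repeatability(generate: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Mean pairwise similarity across repeated generations for one prompt.

    Scores near 1.0 suggest the model and decoding settings are stable
    enough to assess reproducibly; lower scores flag prompts whose
    outputs drift between runs.
    """
    outputs: List[str] = [generate(prompt) for _ in range(runs)]
    pairs = list(combinations(outputs, 2))
    return sum(token_jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Stand-in "model" that alternates between two phrasings, purely for demo;
    # a real evaluation would plug in an actual LLM client here.
    canned = iter([
        "The convoy departed at 0600.",
        "The convoy departed at 0600 hours.",
        "The convoy departed at 0600.",
        "The convoy departed at 0600 hours.",
        "The convoy departed at 0600.",
    ])
    fake_llm = lambda prompt: next(canned)
    print(f"repeatability: {repeatability(fake_llm, 'Summarize the field report.'):.2f}")
```

A check like this only measures output stability; a full assessment for high-stakes use would layer task-specific correctness, security, and trust metrics on top of it.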