Assessing LLMs for High Stakes Applications

White Paper
This work explores LLM deployment in intelligence reporting, highlighting key challenges in data, scaling, and assessment.
Publisher

Association for Computing Machinery

DOI (Digital Object Identifier)
10.1145/3639477.3639720

Abstract

Large Language Models (LLMs) promise strategic benefits for numerous application domains. The current state of the art in LLMs, however, lacks the trust, security, and reliability needed for high-stakes applications. To address this gap, our work investigated the challenges of developing, deploying, and assessing LLMs within a specific high-stakes application: intelligence reporting workflows. We identified the following challenges that must be addressed before LLMs can be used in high-stakes applications: (1) unverified data and data leakage, (2) fine-tuning and inference at scale, and (3) reproducibility and assessment of LLMs. We argue that researchers should prioritize test and assessment metrics, as better metrics will yield the insight needed to further improve these LLMs.
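
To make the reproducibility-and-assessment challenge concrete, the following is a minimal sketch (not taken from the paper) of one simple check an assessment pipeline might include: repeatedly querying a model with the same prompt and scoring how consistent its outputs are. The `generate` callable and the token-overlap metric are illustrative assumptions, stand-ins for whatever model interface and similarity measure a real evaluation would use.

```python
# Illustrative sketch only: a repeatability check of the kind an LLM
# assessment pipeline might run. `generate` stands in for any LLM call;
# the metric is plain pairwise token overlap (Jaccard), chosen for
# simplicity, not a method described in the paper.
from itertools import combinations
from typing import Callable, List


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens of two generations."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def repeatability(generate: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Mean pairwise similarity across repeated generations for one prompt.

    Scores near 1.0 suggest the model and decoding settings are stable
    enough to assess reproducibly; lower scores flag prompts whose
    outputs drift between runs.
    """
    outputs: List[str] = [generate(prompt) for _ in range(runs)]
    pairs = list(combinations(outputs, 2))
    return sum(token_jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Stand-in "model" that alternates between two phrasings, purely for demo;
    # a real evaluation would plug in an actual LLM client here.
    canned = iter([
        "The convoy departed at 0600.",
        "The convoy departed at 0600 hours.",
        "The convoy departed at 0600.",
        "The convoy departed at 0600 hours.",
        "The convoy departed at 0600.",
    ])
    fake_llm = lambda prompt: next(canned)
    print(f"repeatability: {repeatability(fake_llm, 'Summarize the field report.'):.2f}")
```

A check like this only measures output stability; a full assessment for high-stakes use would layer task-specific correctness, security, and trust metrics on top of it.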