icon-carat-right menu search cmu-wordmark

Tales from the Wild West: Crafting Scenarios to Audit Bias in LLMs

White Paper
This work introduces a scenario-based audit using RPG-style prompts where LLMs role-play characters to reveal bias in descriptions of individuals around them.
Publisher

Software Engineering Institute

Abstract

While large language models (LLMs) introduce opportunities to build exciting new kinds of human-computer interactions, they also present a host of risks such as the unintended perpetuation of harmful biases. To better identify and mitigate biases in LLMs, new evaluation and auditing methods are needed that circumvent safeguards and reveal underlying learned behaviors. In this work, we present a scenario-based auditing approach to uncovering biases in which the LLM plays the role of a character and describes individuals living in the world around them in the context of a role-playing game (RPG). Through a scenario centered around a cowboy named Jett, we elicit open-ended responses from ChatGPT that reveal ethnic and gender biases. Our findings demonstrate the importance of taking an exploratory approach to identifying bias in LLMs and suggest paths for future investigation.