Tales from the Wild West: Crafting Scenarios to Audit Bias in LLMs
White Paper
Publisher: Software Engineering Institute
Abstract
While large language models (LLMs) introduce opportunities to build exciting new kinds of human-computer interactions, they also present a host of risks, such as the unintended perpetuation of harmful biases. To better identify and mitigate biases in LLMs, new evaluation and auditing methods are needed that circumvent safeguards and reveal underlying learned behaviors. In this work, we present a scenario-based auditing approach to uncovering biases in which the LLM, in the context of a role-playing game (RPG), plays the role of a character and describes the individuals living in the world around that character. Through a scenario centered on a cowboy named Jett, we elicit open-ended responses from ChatGPT that reveal ethnic and gender biases. Our findings demonstrate the importance of taking an exploratory approach to identifying bias in LLMs and suggest paths for future investigation.