Learning Patterns by Observing Behavior with Inverse Reinforcement Learning
Created February 2020
Like humans, machines can learn by observing and repeating behaviors. However, in its most basic form, this type of imitation learning often fails to carry what it has learned over to real-world scenarios. To address this challenge, Emerging Technology Center researchers have been looking to inverse reinforcement learning (IRL), an area of machine learning, to more efficiently and effectively teach novices how to perform expert tasks, achieve robotic control, and perform activity-based intelligence.
Understanding Patterns in Activity-based Intelligence
Data is key to the mission of the U.S. Department of Defense (DoD) and the intelligence community (IC), allowing the DoD to make informed decisions based on an accurate understanding of events. Modern technology allows data to be collected from many sources and combined into a complete picture. The challenge is in sifting through this vast amount of data to find the signals buried in the white noise of routine observations. If data could be analyzed faster and more precisely, decisions based on complex data could be made more accurately and in real time.
Analyzing activity-based intelligence has allowed the DoD to take large sets of data from multiple sources, understand patterns in the data, identify changes and outliers in those patterns, and make decisions based on how those patterns and changes are characterized. The amount of data that must be analyzed to find patterns and identify outliers is staggering; by automating this analysis through artificial intelligence (AI) engineering, the DoD is able to translate data into action more quickly. But automation through machine learning and human-machine teaming raises additional challenges: knowing how best to teach systems to perform this analysis, how to transfer it to different scenarios, and how to scale it to large problems.
SEI Emerging Technology Center researchers are working with Dr. Anind Dey of the University of Washington and Dr. Stephanie Rosenthal of Carnegie Mellon University.
Developing Statistical Models of Behavior
Led by Software Engineering Institute (SEI) principal investigator Dr. Eric Heim, researchers at the Emerging Technology Center have been working to improve the efficiency and effectiveness of this activity-based intelligence analysis through artificial intelligence engineering. We have proposed an alternative approach to working with activity-based intelligence: inverse reinforcement learning (IRL). Inverse reinforcement learning is a formalization of imitation learning, which involves learning a task by observing how it is done (e.g., a driverless car observing a human driver to learn how to drive). The difference between IRL and simple imitation learning is that, in addition to taking note of the actions and decisions needed to perform a task, IRL also associates those actions with the intrinsic rewards of taking them (e.g., in the driverless car scenario, the rewards associated with driving safely and within the law, which explain why you wouldn’t want to cross a median and hit a pole). Because the agent learns why the demonstrated decisions were made, and not only what they were, it can apply the same reasoning in states it has not yet observed.
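The contrast between copying actions and reasoning from rewards can be illustrated with a toy example. The sketch below is not the project's method: the five-state road, the demonstrations, and the reward vector are all hypothetical, and the reward is simply assumed to have been recovered already. It only shows why a reward lets an agent act in a state it never saw demonstrated.

```python
# Hypothetical 1-D road: states 0..4, with the goal at state 4.
N_STATES = 5
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Deterministic move, clipped to the ends of the road."""
    return min(max(state + ACTIONS[action], 0), N_STATES - 1)

# Expert demonstrations cover only states 0-2; state 3 is never observed.
demos = [(0, "right"), (1, "right"), (2, "right")]

# Simple imitation: memorize the observed state -> action pairs.
cloned_policy = dict(demos)

# Reward-based view: assume IRL has recovered "arriving at the goal is rewarding".
reward = [0.0, 0.0, 0.0, 0.0, 1.0]

def plan(reward, gamma=0.9, sweeps=50):
    """Value iteration followed by a greedy policy for every state."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            max(reward[step(s, a)] + gamma * values[step(s, a)] for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return {
        s: max(ACTIONS, key=lambda a: reward[step(s, a)] + gamma * values[step(s, a)])
        for s in range(N_STATES)
    }

reward_policy = plan(reward)

print(cloned_policy.get(3))  # None: the lookup table has no answer for state 3
print(reward_policy[3])      # right: planning on the reward covers unseen states
```

The memorized policy is silent outside the demonstrations, while the policy derived from the reward is defined everywhere, which is the generalization property described above.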
In the DoD scenario, inverse reinforcement learning would take a set of observed behaviors of one or more agents, captured in data, learn the agents' preferences that explain those behaviors, and compute a statistical model of the world that includes whether each behavior is part of a routine. In order to model the behaviors typically considered in DoD and IC domains, we need methods that scale to large data requirements, are robust enough to model novel behaviors, and faithfully model the domain under consideration.
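As a rough illustration of this pipeline, the sketch below fits a one-step, maximum-entropy-style choice model (a softmax over a linear reward) to logged choices and then scores how routine a new behavior looks. This is a heavy simplification chosen for brevity, not the technique used in the project: full IRL reasons over sequences of states, and the contexts, actions, features, and the small L2 penalty here are all assumptions made for the example.

```python
import math

# Hypothetical action set; the names are illustrative only.
ACTIONS = ["main_road", "side_road", "off_road"]

def features(context, action):
    """Hand-picked features of a choice (an assumption for this sketch)."""
    return [
        1.0 if action == "main_road" else 0.0,
        1.0 if action == "off_road" else 0.0,
        1.0 if (context == "rush_hour" and action == "side_road") else 0.0,
    ]

def action_probs(weights, context):
    """Softmax over linear rewards: P(a | context) is proportional to exp(w . phi)."""
    scores = [sum(w * f for w, f in zip(weights, features(context, a))) for a in ACTIONS]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit(observations, steps=500, lr=0.5, l2=0.01):
    """Gradient ascent on the log-likelihood of the observed choices."""
    weights = [0.0, 0.0, 0.0]
    n = len(observations)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for context, action in observations:
            for i, f in enumerate(features(context, action)):
                grad[i] += f  # observed feature counts
            probs = action_probs(weights, context)
            for a, p in zip(ACTIONS, probs):
                for i, f in enumerate(features(context, a)):
                    grad[i] -= p * f  # expected feature counts under the model
        # The L2 term keeps weights finite for actions that were never observed.
        weights = [w + lr * (g / n - l2 * w) for w, g in zip(weights, grad)]
    return weights

def routine_score(weights, behavior):
    """Average log-probability of a behavior; lower means less routine."""
    logps = [math.log(action_probs(weights, c)[ACTIONS.index(a)]) for c, a in behavior]
    return sum(logps) / len(logps)

# Hypothetical log of (context, action) pairs for one agent.
observations = (
    [("quiet", "main_road")] * 8 + [("quiet", "side_road")] * 2
    + [("rush_hour", "side_road")] * 7 + [("rush_hour", "main_road")] * 3
)
weights = fit(observations)

usual = [("quiet", "main_road"), ("rush_hour", "side_road")]
unusual = [("quiet", "off_road"), ("rush_hour", "off_road")]
print(routine_score(weights, usual), routine_score(weights, unusual))
```

The learned weights play the role of preferences, and the score gives a simple statistical notion of routine: a behavior that the fitted model finds improbable is a candidate outlier.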
We are currently working on addressing these considerations:
- scaling IRL techniques so they can be applied to large-scale DoD/intelligence community problem domains: To apply IRL to large-scale problems through human-machine teaming, we need software that learns a reward function quickly. Our current work has reduced training time from days to minutes.
- creating robust IRL techniques that model rare or novel behaviors: Our work seeks to make models more reliable. If the IRL model hasn’t observed some phenomenon, it’s hard for it to reason about what an agent would do. We are working to make IRL models that are robust to rare events, allowing them to model behavior not explicitly observed.
- using IRL to model human behaviors for the purpose of teaching complex tasks to novices, and giving tips on how to adjust behavior to live healthier lives: We are using models of human behavior to improve quality of life and to teach. We are working to create efficient, robust methods that faithfully model behaviors in important operational domains.