2024 Research Review

Techniques for Detection of Information Flows Indicative of Inserted Malicious Code

Detecting malicious code is a challenge, particularly when it’s implanted in otherwise legitimate software. If undetected, malware injected into legitimate software can result in costly compromises of computer systems, such as in the SolarWinds incident of 2020. This project aims to produce a tool to detect such malicious code to prevent system compromises. Our goal is to enable analysts throughout the U.S. Department of Defense (DoD) to detect malicious code before it can compromise DoD systems.

Dr. Will Klieber
Software Security Engineer

Our tool detects two types of malicious code: (1) exfiltration of sensitive data and (2) timebombs, logic bombs, remote-access Trojans (RATs), and similar malicious code. We use only static analysis, not dynamic analysis, so we do not need to execute potentially malicious code. While our focus is on C/C++ codebases, we perform all analysis at the level of LLVM intermediate representation (IR), so we also have some support for binaries via lifting to LLVM IR.

Our tool uses taint analysis to track the flow of sensitive and auxiliary information. A limitation of traditional taint analysis is that it conflates all flow paths from a given source to a given sink, so a malicious flow path can be "hidden" by a benign flow path. In contrast, in this project we use auxiliary information (e.g., whether a filename was specified by a local user or whether it came from network data) to help separate benign flows from malicious flows.
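
The difference between the two reporting styles can be illustrated with a minimal sketch. This is not the project's implementation: the toy instruction format, the operation names (`argv`, `recv`, `read_file`, `send`), and the label names are all assumptions made up for illustration, and real analysis would operate on LLVM IR rather than on a Python list.

```python
# Toy straight-line "IR": each instruction is (dest, op, args).
# The op names and label strings are hypothetical, chosen for illustration only.
PROGRAM = [
    ("name1", "argv", []),              # filename typed by the local user
    ("data1", "read_file", ["name1"]),
    ("_", "send", ["data1"]),           # benign: the user asked to upload this file
    ("name2", "recv", []),              # filename supplied by network data
    ("data2", "read_file", ["name2"]),
    ("_", "send", ["data2"]),           # suspicious: a remote peer picks the file
]


def analyze(program):
    """Propagate label sets through the program; record labels reaching each send."""
    labels = {}    # variable name -> frozenset of labels
    findings = []  # (instruction index, labels on the data being sent)
    for index, (dest, op, args) in enumerate(program):
        # Union of the labels carried by every argument (empty if no arguments).
        incoming = frozenset().union(*(labels.get(a, frozenset()) for a in args))
        if op == "argv":
            labels[dest] = frozenset({"aux:local_user"})
        elif op == "recv":
            labels[dest] = frozenset({"aux:network"})
        elif op == "read_file":
            # File contents are sensitive; keep the auxiliary label of the filename.
            labels[dest] = incoming | {"sensitive:file"}
        elif op == "send":
            findings.append((index, incoming))
        else:
            labels[dest] = incoming
    return findings


def traditional_view(findings):
    """Source-to-sink pairs only: every file-to-send flow collapses into one entry."""
    return {("file", "send") for _, labs in findings if "sensitive:file" in labs}


def flagged(findings):
    """Indices of sends whose file data came from a network-chosen filename."""
    return [i for i, labs in findings if {"sensitive:file", "aux:network"} <= labs]


if __name__ == "__main__":
    results = analyze(PROGRAM)
    print(traditional_view(results))  # one conflated file-to-send entry
    print(flagged(results))           # only the send fed by a network-chosen filename
```

In this sketch, the source-to-sink view reports a single file-to-send flow, so the benign upload masks the suspicious one. Keeping the auxiliary label on the filename separates the two paths and lets only the second send be flagged for review.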

Our tool flags code as potentially malicious, but further human analysis is required to determine whether the code is actually malicious, because whether a behavior is malicious depends on what the program is supposed to do. Our tool is designed to detect both intentionally inserted malicious code and "business logic" vulnerabilities (such as Log4Shell in Log4j); it doesn’t distinguish between them. The goal for our tool is to produce output that concisely and precisely characterizes the potentially malicious behaviors of the codebase, so that a human analyst can quickly and accurately determine whether the behavior is benign or malicious.

Our project addresses the DoD’s operational need for better capabilities for detecting malicious code inserted through supply-chain attacks. While some existing tools support this task, they still require time-consuming and expensive manual analysis. If successful, our tool will produce a level of assurance (with respect to applicable classes of malicious code) that requires one-tenth of the manual effort needed with existing tools.

Figure 1: Tool Diagram, with Input and Output

In Context: This FY2023-24 Project

  • builds on the CMU SEI’s diverse expertise and experience in software analysis and malware analysis
  • aligns with the CMU SEI technical objective to be trustworthy in construction and implementation and resilient in the face of operational uncertainties, including known and yet unseen adversary capabilities
  • aligns with the OUSD(R&E) critical technology priority of leveraging advanced computing and software technologies