BLC: Blocklist Co-Occurrence Analysis for Large-scale IP Network Traffic Flows
• Presentation
Publisher
Software Engineering Institute
Abstract
We have developed BLC (blocklist co-occurrence analysis for large-scale IP network traffic flows), a novel approach for detecting malicious IP addresses, such as command-and-control (C2) servers and downloaders, by analyzing the co-occurring traffic patterns of known malicious IP addresses and other IP addresses.
Our main detection target is the botnet, a group of malware-infected hosts used for various cyberattacks. Malicious actors control botnets through a globally distributed, layered architecture.
To detect botnets comprehensively, we need Internet-scale traffic analysis and a detection method that performs efficiently at that scale. This is challenging because high-volume traffic consists mostly of benign and noisy data and contains very little malicious traffic.
To address this challenge, BLC starts with an existing blocklist of known malicious IP addresses and recursively expands the list by focusing on co-occurring traffic patterns. BLC rests on a simple idea: malicious actors tend to prepare not just one malicious server but several, for various reasons, and infected hosts tend to connect to more than one of them. From a graph-analysis perspective, when the malicious servers are known, their 1-hop neighbors may be bots, and a few of their 2-hop neighbors may be other malicious nodes. BLC calculates a malicious score for each 2-hop neighbor based on the number of bots connecting to it, the importance of those bots, and the benign hosts connecting to it.
The first step of BLC is to create a graph from IP network flows. Each flow record lists a source and a destination IP address for one direction of traffic, but these addresses do not indicate which host initiated the communication, so we create an undirected graph.
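As an illustration of this first step, the sketch below builds an undirected adjacency map from flow records. It is a minimal example, not the presenters' implementation: the function name `build_undirected_graph`, the reduction of a flow record to a `(source, destination)` pair, and the sample addresses are all assumptions made for the example.

```python
from collections import defaultdict


def build_undirected_graph(flows):
    """Build an undirected adjacency map from (src_ip, dst_ip) pairs.

    Hypothetical helper: a real flow record carries more fields
    (ports, bytes, timestamps); only the two addresses are used here.
    """
    adjacency = defaultdict(set)
    for src, dst in flows:
        if src == dst:
            continue  # a self-loop carries no co-occurrence information
        # Record the edge in both directions so flow direction is ignored.
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return dict(adjacency)


# Illustrative flows using documentation-range addresses.
flows = [
    ("10.0.0.1", "203.0.113.5"),
    ("203.0.113.5", "10.0.0.1"),  # reverse direction of the same pair
    ("10.0.0.2", "203.0.113.5"),
    ("10.0.0.1", "198.51.100.7"),
]
graph = build_undirected_graph(flows)
```

Storing neighbors in sets collapses the two directions of a conversation into a single edge, which is what makes the graph undirected.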
The second step is to list the target nodes for malicious score calculation. The target nodes are all nodes that are 2 hops away from known malicious nodes.
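The target listing can be sketched as a two-level neighborhood expansion. This is an illustrative sketch only: the function name `two_hop_targets` and the toy graph are hypothetical, and the choice to exclude nodes that are themselves malicious or 1-hop neighbors is an assumption about what "2 hops away" means.

```python
def two_hop_targets(graph, malicious):
    """Return nodes exactly 2 hops from any known malicious node.

    `graph` maps each node to the set of its neighbors (undirected).
    Assumption: known malicious nodes and their 1-hop neighbors are
    not themselves targets.
    """
    malicious = set(malicious)
    # 1-hop neighborhood: candidate bots.
    bots = set()
    for m in malicious:
        bots |= graph.get(m, set())
    bots -= malicious
    # 2-hop neighborhood: everything the candidate bots talk to.
    targets = set()
    for b in bots:
        targets |= graph.get(b, set())
    return targets - malicious - bots


# Toy graph: C2_A is known malicious; bot1 and bot2 contact it.
graph = {
    "C2_A": {"bot1", "bot2"},
    "bot1": {"C2_A", "X", "web"},
    "bot2": {"C2_A", "X"},
    "X": {"bot1", "bot2"},
    "web": {"bot1", "client9"},
    "client9": {"web"},
}
targets = two_hop_targets(graph, {"C2_A"})
```

In this toy graph the targets are `X` and `web`; `client9` is 3 hops away and is never listed.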
The third step is to calculate the malicious score of each target node (Node_i) from the following 2 parameters.
- (Co-occurrence Degree) Score related to connecting bots and bot importance: the sum of the weighted Jaccard index between Node_i and each known malicious node.
- (Weight) Score related to benign hosts: (number of nodes connecting to both Node_i and one of the known malicious nodes) / (total number of nodes connecting to Node_i)
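The two parameters above can be sketched as follows. The abstract does not specify how the Jaccard index is weighted or how the two parameters are combined, so this sketch makes two loudly labeled assumptions: it uses the plain (unweighted) Jaccard index over neighbor sets, and it multiplies Weight by Co-occurrence Degree to get a score. All function names and the toy graph are hypothetical.

```python
def jaccard(a, b):
    """Plain Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurrence_degree(graph, node, malicious):
    # ASSUMPTION: unweighted Jaccard stands in for the "weighted
    # Jaccard index"; the bot-importance weighting is not specified.
    neighbors = graph.get(node, set())
    return sum(jaccard(neighbors, graph.get(m, set())) for m in malicious)


def weight(graph, node, malicious):
    """Fraction of node's neighbors that also contact a malicious node."""
    neighbors = graph.get(node, set())
    if not neighbors:
        return 0.0
    shared = set()
    for m in malicious:
        shared |= neighbors & graph.get(m, set())
    return len(shared) / len(neighbors)


def malicious_score(graph, node, malicious):
    # ASSUMPTION: combining the two parameters by multiplication.
    return weight(graph, node, malicious) * co_occurrence_degree(
        graph, node, malicious
    )


graph = {
    "C2_A": {"bot1", "bot2"},
    "bot1": {"C2_A", "X", "web"},
    "bot2": {"C2_A", "X"},
    "X": {"bot1", "bot2"},
    "web": {"bot1", "client9"},
    "client9": {"web"},
}
score_x = malicious_score(graph, "X", {"C2_A"})
score_web = malicious_score(graph, "web", {"C2_A"})
```

Under these assumptions, `X` shares both of its neighbors with `C2_A` and scores 1.0, while `web` shares one of two neighbors and scores 1/6, so the node contacted only by bots ranks higher.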
Because large-scale network traffic contains a huge number of nodes, calculating the malicious score of every target node takes a long time. We therefore developed a practical algorithm that applies BLC to large-scale network traffic while minimizing computing resources.
- (a) Pruning by degree: Malicious nodes are stealthy and generally not popular on the Internet, so when the degree of a node, whether in the 1-hop neighborhood (bots) or the 2-hop neighborhood (target nodes), is high enough, it can be pruned when listing target nodes.
- (b) Pruning by Weight: When the Weight is small enough, the node is likely benign, and its Co-occurrence Degree calculation is skipped.
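The two pruning rules can be sketched as filters applied before the more expensive Co-occurrence Degree calculation. This is a hedged illustration rather than the presenters' algorithm: the function name `prune_and_list_targets`, the thresholds `max_degree` and `min_weight`, their values, and the toy graph are all hypothetical, since the abstract gives no concrete cutoffs.

```python
def prune_and_list_targets(graph, malicious, max_degree, min_weight):
    """Return {target: weight} for targets that survive both pruning rules."""
    malicious = set(malicious)
    all_bots = set()
    for m in malicious:
        all_bots |= graph.get(m, set())
    all_bots -= malicious

    # (a) Pruning by degree: do not expand through very popular 1-hop
    # nodes, and do not keep very popular 2-hop nodes.
    expandable = {b for b in all_bots if len(graph.get(b, set())) <= max_degree}
    candidates = set()
    for b in expandable:
        for t in graph.get(b, set()):
            if t in malicious or t in all_bots:
                continue
            if len(graph.get(t, set())) > max_degree:
                continue
            candidates.add(t)

    # (b) Pruning by Weight: drop targets whose neighbors are mostly
    # hosts that never contact a known malicious node.
    kept = {}
    for t in candidates:
        neighbors = graph.get(t, set())
        if not neighbors:
            continue
        w = len(neighbors & all_bots) / len(neighbors)
        if w >= min_weight:
            kept[t] = w
    return kept


edges = [
    ("C2_A", "bot1"), ("C2_A", "bot2"), ("C2_A", "nat_gw"),
    ("bot1", "X"), ("bot2", "X"), ("bot1", "web"),
    ("web", "c1"), ("web", "c2"), ("web", "c3"),
    ("nat_gw", "s1"), ("nat_gw", "s2"), ("nat_gw", "s3"), ("nat_gw", "s4"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

survivors = prune_and_list_targets(graph, {"C2_A"}, max_degree=4, min_weight=0.5)
```

In this toy graph, `nat_gw` has degree 5 and is not expanded, so `s1` through `s4` are never listed; `web` passes the degree check but has Weight 0.25 and is dropped; only `X` remains for Co-occurrence Degree calculation.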
We evaluate the performance of BLC on flow data from a large ISP network using known C2 server information, and we demonstrate the effectiveness of the method. Specifically, when we start with a large existing blocklist, BLC successfully detects other malicious servers.
Attendees Will Learn
- How to handle large-scale IP network flow data in a graph structure
- Co-occurrence analysis for malicious node detection and its practical algorithm
- Evaluation results of malicious node detection using large ISP network flow data