A large organization can generate over ten billion network flow records per day, a high-velocity data source. Finding useful, security-related anomalies in this volume of data is challenging. Most large network flow tools sample the data to make the problem manageable, but sampling unacceptably reduces the fidelity of analytic conclusions. In this paper we discuss SiLK, a tool suite created to analyze this high-volume data source without sampling. SiLK implementation and architectural design are optimized to manage this Big Data problem. SiLK provides not just network flow capture and analysis, but also includes tools to analyze large sets and dictionaries that frequently relate to network flow data, incorporating higher variety data sources. These tools integrate disparate data sources with SiLK analysis.
Jonathan Spring is a member of the technical staff with the CERT Network Situational Awareness Group of the Software Engineering Institute, Carnegie Mellon University. He began working at CERT in 2009. He also serves as an adjunct professor at the University of Pittsburgh's School of Information Sciences. His current research topics include monitoring cloud computing and DNS traffic analysis. He holds a Master's degree in information security and a Bachelor's degree in philosophy from the University of Pittsburgh.