Network Graph Analysis for Suricata and Zeek using Brim and NetworkX

Oliver Rochford
Brim Security
Published Feb 24, 2021 · 11 min read


Malware Outbreak visualized as Network Graph

Welcome to our second article on Brim’s Data Science blog. In the first article in this series, we learned how to use Brim’s Python library to fetch Zeek data into Pandas.

Today we’re going to build on what we learned last time. Instead of just looking at Zeek data by itself, we’re going to fuse Zeek and Suricata data together. We’re also going to improve how we visualize our network graph to gain some useful insights.

About Brim

If you’re new to Brim, Zeek and Suricata:

  • Brim is an open source tool to search and analyze pcaps, Zeek and Suricata logs.
  • Zeek is the most popular open source platform for network security monitoring.
  • Suricata is an open source threat detection engine (a type of tool commonly called an Intrusion Detection and Prevention System).

Brim can import raw pcaps to enrich and analyze them with embedded Zeek and Suricata engines, and makes them available for search and analysis in the Brim app. Brim also provides a Python library to support data science use cases and pipelines. We’ll be using Brim to create network graphs of network and threat activity.

Instructions and Prep

You can download Brim here
Installation instructions for Brim are here
Instructions for Brim’s Python library can be found here
We’re going to be using NetworkX and Jupyter Notebook
Today’s malware sample (password: infected) is courtesy of Malware Traffic Analysis and contains a Trickbot infection

Jupyter Notebook

The full and functional Gist for the code in this article can be found here

Getting started

First we’ll import all of our required libraries

Import dependencies and libraries

Next, we’ll select our Brim space to work with. You can find your space in the Brim app in the upper left corner. You can right-click the Space and then copy and paste the full name.

Finding the Brim Space name
Brim Space to query

Z Queries

We’re also going to need to define the Z queries we want to use. The first query is similar to the one we used in the last article. We filter for Zeek’s “conn” stream, then group by id.orig_h (source), id.resp_h (target), and id.resp_p (target port) and count the occurrences of each unique combination.

_path=conn | count() by id.orig_h, id.resp_h, id.resp_p | sort id.orig_h, id.resp_h, id.resp_p

Note that we’re using count() to aggregate the logs. While this means that we'll lose some of the fidelity for calculating graph attributes such as clustering, it also pushes some of the heavy processing to ZQ and Brim.

For fetching the Suricata data, we’ll be using the following query:

event_type=alert | count() by src_ip, dest_ip, dest_port, alert.severity, alert.signature | sort src_ip, dest_ip, dest_port, alert.severity, alert.signature

The query filters for Suricata alerts with event_type=alert, then counts and sorts by src_ip (source), dest_ip (target), dest_port (port), alert severity, and alert signature.

Z queries to send to Brim

Creating two DataFrames

If the queries executed correctly, we now have two DataFrames, df containing the Zeek results, and df2 containing the Suricata alert data.

Print out how many records each DataFrame contains
Count of records in the initial DataFrames to hold our Z query results

Prepping the DataFrames

Before we can use our data to create a network graph, we need to do some data preparation. First we’re going to prepare two DataFrames to merge the Zeek and Suricata data, called dfz and dfs.

We’ll assign id.orig_h and src_ip to a column named source to make indexing easier. We also need to change our count columns for both data sources, or we'll have duplicate fields.

Create two DataFrames, one for Zeek and one for Suricata
The output of the two dataframes “dfz” and “dfs”
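The renaming step described above might look roughly like the following. This is only a sketch: the two input frames are tiny made-up stand-ins for the query results, it assumes the result columns carry the same names as the fields in the Z queries, and the count column names (conncount, alertcount) and the signature string are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the Zeek (df) and Suricata (df2) query results.
df = pd.DataFrame({
    "id.orig_h": ["10.2.17.101", "10.2.17.101"],
    "id.resp_h": ["10.2.17.2", "40.122.160.14"],
    "id.resp_p": [53, 443],
    "count": [10, 1],
})
df2 = pd.DataFrame({
    "src_ip": ["179.191.108.58"],
    "dest_ip": ["10.2.17.101"],
    "dest_port": [49800],
    "alert.severity": [1],
    "alert.signature": ["Example signature (made up)"],
    "count": [10],
})

# Give both frames the same key columns (source, target, port) and
# distinct count columns so they do not collide after merging.
dfz = df.rename(columns={
    "id.orig_h": "source",
    "id.resp_h": "target",
    "id.resp_p": "port",
    "count": "conncount",
})
dfs = df2.rename(columns={
    "src_ip": "source",
    "dest_ip": "target",
    "dest_port": "port",
    "alert.severity": "severity",
    "alert.signature": "signature",
    "count": "alertcount",
})

print(dfz.columns.tolist())
print(dfs.columns.tolist())
```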

Merging the DataFrames

Now we need to merge the two DataFrames using pandas.concat(). We should end up with a merged DataFrame keyed by source, target, and port, with the associated count of alerts and connection transactions attached to each connection. Also note that we're setting ignore_index=True so that the rows in the new combined DataFrame get one continuous index.

Merge the DataFrames
The DataFrame “dfc” with merged Zeek and Suricata data
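As a rough illustration of the merge, here is a minimal sketch using two small made-up frames. The column names conncount and alertcount are assumptions carried over for illustration; only pandas.concat() and ignore_index=True come from the text.

```python
import pandas as pd

# Made-up Zeek-style and Suricata-style frames with shared key columns.
dfz = pd.DataFrame({
    "source": ["10.2.17.101", "10.2.17.101"],
    "target": ["10.2.17.2", "40.122.160.14"],
    "port": [53, 443],
    "conncount": [10, 1],
})
dfs = pd.DataFrame({
    "source": ["179.191.108.58"],
    "target": ["10.2.17.101"],
    "port": [49800],
    "severity": [1],
    "alertcount": [10],
})

# Stack the rows of both frames; ignore_index=True renumbers them 0..n-1.
dfc = pd.concat([dfz, dfs], ignore_index=True)

# Columns missing from one side are filled with NaN for that side's rows.
print(dfc)
```

Note that concat() appends rows rather than joining them, so each Zeek row has no alert values and each Suricata row has no connection count.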

Populate NaN fields

Because there are usually far more connections than Suricata alerts, we’ll end up with many records where alertcount and severity are unpopulated and filled with NaN. We're going to populate all NaN fields with 0.

Fill NaN fields with 0
All NaN fields have been filled with a 0.0

Recast types

Because the merged columns contained NaN values, pandas has typed all of the numbers as floats, so we'll recast these as int64.

Recast Floats to Int64
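The NaN fill and the recast can be sketched together as below. The merged frame here is a made-up example, and the column names conncount and alertcount are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up merged frame: NaN wherever one data source had no value.
dfc = pd.DataFrame({
    "source": ["10.2.17.101", "10.2.17.101", "179.191.108.58"],
    "target": ["10.2.17.2", "40.122.160.14", "10.2.17.101"],
    "conncount": [10.0, 1.0, np.nan],
    "alertcount": [np.nan, np.nan, 10.0],
    "severity": [np.nan, np.nan, 1.0],
})

num_cols = ["conncount", "alertcount", "severity"]

# Replace every NaN with 0, then cast the numeric columns back to int64.
dfc = dfc.fillna(0).astype({col: "int64" for col in num_cols})

print(dfc.dtypes)
```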

Calculate weights

We’re also going to calculate some weights based off of the connection and alert counts. It’s not a fantastically sophisticated calculation: we divide 10 by the maximum value for count, multiply it by the count and add 0.1 (so that no weight is ever exactly zero, which avoids divide-by-zero errors). This will give us a range from 0.1 to 10.1.

We are going to calculate an alertweight weight for force-directed graphs.
And we'll also be calculating a connweight to colorize the edges representing the Zeek conn transactions.
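Written out, the calculation described above is weight = (10 / max(count)) * count + 0.1. A minimal sketch of it follows; the helper name scale_weight and the sample counts are hypothetical, and the column names conncount and alertcount are assumptions.

```python
import pandas as pd

# Made-up counts after the NaN fill and recast.
dfc = pd.DataFrame({
    "conncount": [10, 1, 0],
    "alertcount": [0, 0, 10],
})


def scale_weight(counts: pd.Series) -> pd.Series:
    """Scale counts into the range 0.1 to 10.1: (10 / max) * count + 0.1."""
    return 10 / counts.max() * counts + 0.1


dfc["connweight"] = scale_weight(dfc["conncount"])
dfc["alertweight"] = scale_weight(dfc["alertcount"])

print(dfc)
```

One caveat with this sketch: if a count column contains only zeros, its maximum is 0 and the division itself fails, so that case would need separate handling.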

Calculate weights

Let’s print out some data about our DataFrame to validate that our calculations were successful and have been applied.

Inspect some data about our calculated weights
Validating our calculated weights

Creating our Graph

Our DataFrame is now ready to feed into a graph. We’re actually going to create two graphs. The first is a multi-directed graph (a NetworkX MultiDiGraph), which can store multiple parallel, directed edges. This allows us to model connections to different ports and with different Suricata signatures, as well as whether they were sent or received by a node.

The second graph is a standard Undirected graph which we’ll use for algorithmic graph analysis.

Note how we define port as the edge key, and we keep all of the edge attributes such as severity and alert weights by defining edge_attr=True.

Create two Graphs
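A sketch of how the two graphs could be built with networkx.from_pandas_edgelist() is shown below. The DataFrame is a small made-up example, and the edge_key parameter assumes a reasonably recent NetworkX release.

```python
import networkx as nx
import pandas as pd

# Made-up merged data: two connections between the same pair on different ports.
dfc = pd.DataFrame({
    "source": ["10.2.17.101", "10.2.17.101", "179.191.108.58"],
    "target": ["10.2.17.2", "10.2.17.2", "10.2.17.101"],
    "port": [53, 88, 49800],
    "severity": [0, 0, 1],
    "alertweight": [0.1, 0.1, 10.1],
})

# Multi-directed graph: port is the edge key, so parallel edges to
# different ports are kept; edge_attr=True copies all remaining columns.
G = nx.from_pandas_edgelist(
    dfc,
    source="source",
    target="target",
    edge_attr=True,
    edge_key="port",
    create_using=nx.MultiDiGraph(),
)

# Plain undirected graph for the analysis algorithms; parallel edges collapse.
U = nx.from_pandas_edgelist(dfc, source="source", target="target", create_using=nx.Graph())

print(G.number_of_edges(), U.number_of_edges())
```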

Add node attributes

We also need to add attributes to our node list, as this is not done automatically by networkx.from_pandas_edgelist(). We'll add the alertcount, severity, alertweight, and connweight, so that we can use these as weights when we draw our nodes.

Adding attributes to our nodes
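One possible way to attach node attributes is networkx.set_node_attributes(). How the per-node values are derived is not spelled out in the text, so the aggregation below (sum of alert counts and highest severity value over every edge touching a node) is purely an assumption for illustration.

```python
import networkx as nx

# Small made-up graph with per-edge alert data.
G = nx.MultiDiGraph()
G.add_edge("179.191.108.58", "10.2.17.101", key=49800, alertcount=10, severity=1)
G.add_edge("10.2.17.101", "10.2.17.2", key=53, alertcount=0, severity=0)

# Assumed aggregation: total alerts and highest severity value per node,
# counting edges where the node is either source or target.
node_alerts = {n: 0 for n in G.nodes}
node_severity = {n: 0 for n in G.nodes}
for u, v, d in G.edges(data=True):
    for n in (u, v):
        node_alerts[n] += d["alertcount"]
        node_severity[n] = max(node_severity[n], d["severity"])

nx.set_node_attributes(G, node_alerts, "alertcount")
nx.set_node_attributes(G, node_severity, "severity")

print(dict(G.nodes(data=True)))
```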

Adjust for graph size

NetworkX is not the ideal tool for visualizing network graphs with many nodes (it is designed primarily for graph analysis; see the Large Networks section at the end of the article).

1000 nodes is a safe limit for us to visualize.

Adjust graph size based on nodes
Our example graph with 62 nodes is a small graph

Analyzing our graphs

Awesome — our graphs are constructed, so now we can start analyzing them to get a feel for what our data contains.

We’re going to look at a few different graph attributes and metrics.

Graph Density: A dense graph is a graph in which the number of edges is close to the maximal number of edges, i.e. with almost all nodes connected. Density is measured between 0 and 1.
Graph Transitivity: Transitivity is the overall probability for the network to have adjacent nodes interconnected.
Average Clustering: The average clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Nodes tend to create tightly knit groups characterised by a relatively high density of ties.
Greedy Modularity Communities: Finds communities in the graph using Clauset-Newman-Moore greedy modularity maximization.

We’re also verifying if the graph is directed, and if it is already weighted.

Some information about our graphs
Some descriptive characteristics about our graph
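The metrics listed above map onto NetworkX functions fairly directly. Here is a minimal sketch on a tiny made-up undirected graph (a triangle with one extra node attached); the node names are placeholders.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Made-up undirected graph: triangle a-b-c plus a pendant node d.
U = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

density = nx.density(U)                  # share of possible edges present
transitivity = nx.transitivity(U)        # 3 * triangles / connected triples
avg_clustering = nx.average_clustering(U)
communities = greedy_modularity_communities(U)

print("Density:", density)
print("Transitivity:", transitivity)
print("Average clustering:", avg_clustering)
print("# of communities:", len(communities))
print("Directed:", nx.is_directed(U))
print("Weighted:", nx.is_weighted(U))
```

Transitivity and clustering are not defined for multigraphs in NetworkX, which is one reason to keep the separate undirected graph for analysis.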

Drawing our Graph

Now that we’ve created our graph, we can draw it. We’ll just pass the graph to a few different drawing layouts to get a general feel for what’s in the dataset.

Draw four graphs with different layouts

Improving our visualization

Looking at the graphs, they are pretty ugly, so we need to do some work to make our data more legible and more importantly, make the visualization show something useful.

We’re going to create a number of different lists and dictionaries to use when we plot our nodes and edges in groups based on weights and with differentiated labels.

We’ll start by creating lists containing nodes that have either a global, private or reserved IP address, so that we can draw these with different colors and node shapes. Next, we’re going to add some dictionaries of nodes based on Suricata alert severity. This will allow us to draw node labels by their severity.

We also need to create a list of weights to automatically adjust the node sizes based on the alertweight weight. This will make our nodes larger if they have a higher count of Suricata alerts.

Create lists to contextualize our nodes and edges by severity
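Grouping nodes by address type can be done with Python's standard ipaddress module. The sketch below uses three example addresses; the helper name ip_scope and the order of the checks are assumptions made for illustration.

```python
import ipaddress

# Example node addresses (the last one is from the reserved 240.0.0.0/4 block).
nodes = [
    ipaddress.ip_address(a)
    for a in ("10.2.17.101", "179.191.108.58", "240.0.0.1")
]


def ip_scope(addr):
    """Return a coarse label for an address: global, reserved, private or other."""
    if addr.is_global:
        return "global"
    if addr.is_reserved:
        return "reserved"
    if addr.is_private:
        return "private"
    return "other"


# One list per scope, ready to be drawn with its own color and shape.
groups = {"global": [], "reserved": [], "private": [], "other": []}
for node in nodes:
    groups[ip_scope(node)].append(node)

print(groups)
```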

Directed graph — In and Out edges

As we’re using a directed graph, our edges have a direction called In and Out, and we can enumerate these directly via G.in_edges() and G.out_edges(). This allows us to draw incoming and outgoing connections individually.

We’re going to create a set of In and Out edges for each severity, 0–3, so we can draw these separately in different colors to help us identify critical connections.

We also need the list of weights for each of these groups, so that we can use these to draw our edges.

Lastly, we also need dictionaries containing our edge labels, so that we can draw the edges individually by severity.

Create lists for In and Out edges
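As an illustration of splitting In and Out edges by severity, here is a sketch for a single host. The graph is made up, and choosing one host of interest is an assumption; the original notebook may group the edges differently.

```python
import networkx as nx

# Made-up multi-directed graph with severity and alert counts on each edge.
G = nx.MultiDiGraph()
G.add_edge("179.191.108.58", "10.2.17.101", key=49800, severity=1, alertcount=10)
G.add_edge("10.2.17.101", "98.142.109.186", key=443, severity=2, alertcount=2)
G.add_edge("10.2.17.101", "10.2.17.2", key=53, severity=0, alertcount=0)

host = "10.2.17.101"  # hypothetical host of interest

# One bucket per severity (0 to 3) for incoming and outgoing edges.
in_edges = {sev: [] for sev in range(4)}
out_edges = {sev: [] for sev in range(4)}
for u, v, k, d in G.in_edges(host, keys=True, data=True):
    in_edges[d["severity"]].append((u, v, k))
for u, v, k, d in G.out_edges(host, keys=True, data=True):
    out_edges[d["severity"]].append((u, v, k))

# Edge labels keyed by (source, target) for edges that raised alerts.
edge_labels = {
    (u, v): d["alertcount"]
    for u, v, d in G.edges(data=True)
    if d["severity"] > 0
}

print(in_edges)
print(out_edges)
print(edge_labels)
```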

Drawing the graph

If your graph is a small graph, we’re going to go ahead now and visualize it. If your graph is a large graph, there’s some help in the section Large Networks at the end of this article, but you may want to read through the following sections anyway, or you’ll miss some pretty graphs!

We’re going to be using NetworkX’s Spring Layout.

It is based on the Fruchterman-Reingold force-directed algorithm, which treats edges as springs that hold nodes close together and simulates an anti-gravity force that repels nodes from each other.

You’ll see that we are using the lists and dictionaries we created earlier for the edgelist, nodelist, and edge_labels parameters. We also pass our calculated alertweight weight. The weight will be passed on and used by the algorithm to determine the strength of the springs pulling connected nodes together.

Lastly, we also use the node_weights list we created to draw an outline for each node, with the size of a node determined by the weight. We also color the node outline based on weight.

Draw the graph
Graph showing suspicious network activity
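The layout part of this step can be sketched without any plotting. The example below computes weighted spring layout positions and a node size list on a made-up graph; the size multiplier of 100 and the seed are arbitrary assumptions.

```python
import networkx as nx

# Made-up graph with an alertweight on every edge and node.
G = nx.MultiDiGraph()
G.add_edge("179.191.108.58", "10.2.17.101", key=49800, alertweight=10.1)
G.add_edge("10.2.17.101", "10.2.17.2", key=53, alertweight=0.1)
G.add_edge("10.2.17.101", "40.122.160.14", key=443, alertweight=0.1)
nx.set_node_attributes(
    G,
    {
        "179.191.108.58": 10.1,
        "10.2.17.101": 10.1,
        "10.2.17.2": 0.1,
        "40.122.160.14": 0.1,
    },
    "alertweight",
)

# Spring layout using alertweight as the edge weight; seed makes it repeatable.
pos = nx.spring_layout(G, weight="alertweight", seed=42)

# Node sizes grow with the alert weight (the factor 100 is arbitrary).
node_sizes = [100 * G.nodes[n]["alertweight"] for n in G.nodes]

# These could then be passed to the drawing helpers, for example
# nx.draw_networkx_nodes(G, pos, node_size=node_sizes).
print(pos)
print(node_sizes)
```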

Analyzing the graph

There we have it! Our contextualized Suricata and Zeek Network Graph.

We can clearly see nodes of interest, because they have a halo around them, indicating that a large number of alerts were seen against that host. And we can see if the nodes are internal, private IP addresses, or public and internet facing ones.

We can also see Suricata alerts between hosts, with different colors to show the severity. In addition, for hosts with many network connections the blue edges increase in intensity. This allows us to identify suspicious high-volume connections even if they don’t trigger a Suricata signature.

  • Suricata Severity 3 = Red
  • Suricata Severity 2 = Yellow
  • Suricata Severity 1 = Green
  • Network Connections = Blue

Different Graph layouts

We can experiment with different graph layouts to improve the legibility of the visualization and identify different patterns.

For example, below we first create a Circular Layout, positioning our nodes around a circle. We then pass that layout to the Spring layout to force the graph into a better order. You can immediately see that it bunches the nodes differently.

Circular and Spring Layout
Circular and Spring Layout
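Chaining the two layouts comes down to passing the circular positions as the starting positions of the spring layout. A minimal sketch on a made-up graph follows; the seed value is an arbitrary assumption.

```python
import networkx as nx

# Made-up graph with an alertweight on every edge.
G = nx.MultiDiGraph()
G.add_edge("179.191.108.58", "10.2.17.101", key=49800, alertweight=10.1)
G.add_edge("10.2.17.101", "10.2.17.2", key=53, alertweight=0.1)
G.add_edge("10.2.17.101", "40.122.160.14", key=443, alertweight=0.1)

# Step 1: place all nodes evenly on a circle.
circular_pos = nx.circular_layout(G)

# Step 2: use those positions as the initial state of the spring layout.
pos = nx.spring_layout(G, pos=circular_pos, weight="alertweight", seed=7)

print(pos)
```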

Shell Layout

Lastly, we can also use a Shell Layout.

What’s nice about the shell layout is that it allows us to define groups, called “shells”, to plot our nodes in concentric circles. In our example, we’ll use the private and global IPs as shells, but you can create different lists, for example based on Suricata severities.

You can see the global IPs on the outside of the graph, and the private IPs towards the centre.

Shell Layout
Shell Layout — you can see the Shells going from Inside to Outside
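A shell layout with private addresses inside and global addresses outside can be sketched like this. The graph and its addresses are a made-up example.

```python
import ipaddress

import networkx as nx

# Made-up graph whose nodes are IPv4Address objects.
ip = ipaddress.ip_address
G = nx.Graph()
G.add_edge(ip("10.2.17.101"), ip("10.2.17.2"))
G.add_edge(ip("10.2.17.101"), ip("179.191.108.58"))
G.add_edge(ip("10.2.17.101"), ip("40.122.160.14"))
G.add_edge(ip("10.2.17.101"), ip("98.142.109.186"))

# Shells are listed from the innermost to the outermost circle.
private_nodes = [n for n in G.nodes if n.is_private]
global_nodes = [n for n in G.nodes if n.is_global]
pos = nx.shell_layout(G, nlist=[private_nodes, global_nodes])

for node, (x, y) in pos.items():
    print(node, round(float(x), 2), round(float(y), 2))
```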

Conclusion

We’ve combined Zeek and Suricata data to create a unified network graph, and we’ve used weights based off of connection and alert volume to help identify suspicious nodes. You can quickly highlight nodes and communications of interest to guide your further investigation.

Of course, there are always some improvements we can make:

  1. Develop a better weight algorithm: right now we don’t take the severity into account, for example.
  2. Assign the count of severity 1, 2, and 3 to every edge, instead of drawing them separately. We are actually only showing a subset of the connections for each severity, even though it’s sufficient to identify suspicious nodes and activity.
  3. Adjust the edge line widths for the Suricata alerts based on weight
  4. Create a function to draw the graph
  5. Only plot connections and nodes above or below a certain threshold, for example based on alertweight

Large Networks

While what we’ve done so far works really well for smaller networks, it’s the last point (5) that can be leveraged to help us still plot larger networks in a meaningful way. We can create a list of edges based on weights quite easily, for example:

strong_edges = [(u, v) for (u, v, d) in G.edges(data=True) if d["alertweight"] >= 0.5]

If you do find yourself analyzing a large data set, you can still identify the more interesting nodes this way.

Filter by weight
Strong Edges:

[((IPv4Address('179.191.108.58'), IPv4Address('10.2.17.101')), 10), ((IPv4Address('10.2.17.2'), IPv4Address('10.2.17.101')), 6), ((IPv4Address('177.87.0.7'), IPv4Address('10.2.17.101')), 2)]


Weak Edges:

{(IPv4Address('10.2.17.101'), IPv4Address('40.122.160.14')): 1, (IPv4Address('10.2.17.101'), IPv4Address('98.142.109.186')): 2, (IPv4Address('10.2.17.101'), IPv4Address('10.2.17.2')): 10}

Don’t forget that you can also offload a lot of the heavy lifting for aggregation and processing to Brim and ZQ, like we did for the alert counts. Z has a growing set of aggregator and processor functions.

Next time — Graph algorithms

But we’ve only just begun delving into network graph algorithms, such as local clustering and communities. It is these that will allow us to work with larger data sets, by identifying communities and creating subgraphs, and also by plotting nodes with higher centralities.

For example, we can look at centrality metrics:

Degree: Measures the number of connections a node has
Closeness: Measures the minimum number of steps one node needs to reach the others in the network
Eigenvector: Measures a node’s connection to other nodes that are highly connected. A node with a high degree is a key node like a router (or victim X spreading malware).

You can get a hint of what we can do with these below. We could for example plot the Greedy Modularity communities separately if we have several, or we could also use the centralities to create thresholds.

Next time we’ll learn how to work with larger data sets using centrality metrics and communities, so stay tuned!

Graph Centrality and Communities
Graph Communities and Centralities

# of Greedy Modularity Communities: 1

Top 3 Nodes with highest Degree Centrality

{IPv4Address('10.2.17.101'): 1.0, IPv4Address('40.122.160.14'): 0.01639344262295082, IPv4Address('45.14.226.115'): 0.01639344262295082}

Bottom 3 Nodes by Degree Centrality

{IPv4Address('45.14.226.115'): 0.01639344262295082, IPv4Address('40.122.160.14'): 0.01639344262295082, IPv4Address('10.2.17.101'): 1.0}

Top 3 Nodes by Closeness Centrality

{IPv4Address('10.2.17.101'): 1.0, IPv4Address('40.122.160.14'): 0.5041322314049587, IPv4Address('45.14.226.115'): 0.5041322314049587}

Bottom 3 Nodes by Closeness Centrality

{IPv4Address('45.14.226.115'): 0.5041322314049587, IPv4Address('40.122.160.14'): 0.5041322314049587, IPv4Address('10.2.17.101'): 1.0}

Top 3 Nodes by Eigenvector Centrality

{IPv4Address('10.2.17.101'): 0.7071067811865475, IPv4Address('52.183.220.149'): 0.09053574604251859, IPv4Address('13.107.19.254'): 0.09053574604251857}

Bottom 3 Nodes by Eigenvector Centrality

{IPv4Address('13.107.19.254'): 0.09053574604251857, IPv4Address('52.183.220.149'): 0.09053574604251859, IPv4Address('10.2.17.101'): 0.7071067811865475}
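Rankings like the ones above could be produced along the following lines. This is a sketch on a small made-up star-shaped graph, not the article's data set, and the helper name top_n is hypothetical.

```python
import networkx as nx

# Made-up undirected star: one busy host connected to four others.
hub = "10.2.17.101"
leaves = ["40.122.160.14", "45.14.226.115", "52.183.220.149", "13.107.19.254"]
U = nx.Graph([(hub, leaf) for leaf in leaves])


def top_n(scores, n=3):
    """Return the n highest-scoring nodes as an ordered dict."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


degree = nx.degree_centrality(U)
closeness = nx.closeness_centrality(U)
eigenvector = nx.eigenvector_centrality(U, max_iter=1000)

print("Top 3 by degree:", top_n(degree))
print("Top 3 by closeness:", top_n(closeness))
print("Top 3 by eigenvector:", top_n(eigenvector))
```

In a star-shaped graph like this one, the hub scores highest on all three metrics, which matches the pattern in the output above where a single host dominates.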

Further Reading

Complex Network Analysis in Python: Recognize — Construct — Visualize — Analyze — Interpret by Dmitry Zinoviev
Network Science with Python and NetworkX Quick Start Guide: Explore and visualize network data effectively by Edward L. Platt
A First Course in Network Science by Filippo Menczer, Santo Fortunato, and Clayton A. Davis
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney

Oliver Rochford
Brim Security

Oliver is a Security Subject Matter Expert at Brim Security