Analyzing AWS VPC Flow Logs via CloudWatch Logs Insights and Athena

Crishantha Nanayakkara
May 10, 2020

This blog post explains how to use CloudWatch Logs Insights and Athena to analyze AWS VPC Flow Logs interactively, in near real time.

AWS Flow Logs

Flow logs can be enabled at three levels in AWS:

  1. VPC level
  2. Subnet level
  3. ENI level

This blog post focuses only on VPC-level Flow Logs.

AWS VPC Flow Logs

AWS VPC Flow Logs can track the following information about the IP traffic in a VPC:

  1. Source/Destination IP address
  2. Source/Destination Port
  3. Protocol
  4. Bytes
  5. ACCEPT/REJECT action
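To make the fields above concrete, here is a minimal Python sketch that splits one flow log record into named fields. It assumes the default, space-delimited, 14-field record format. The `FlowRecord` type, the `parse_flow_record` helper and the sample line are illustrative only and are not part of any AWS SDK.

```python
from typing import NamedTuple, Optional


class FlowRecord(NamedTuple):
    version: Optional[int]
    account_id: Optional[str]
    interface_id: Optional[str]
    src_addr: Optional[str]
    dst_addr: Optional[str]
    src_port: Optional[int]
    dst_port: Optional[int]
    protocol: Optional[int]
    packets: Optional[int]
    num_bytes: Optional[int]
    start: Optional[int]
    end: Optional[int]
    action: Optional[str]
    log_status: Optional[str]


# Positions of the numeric fields in the default record format (assumption).
INT_POSITIONS = {0, 5, 6, 7, 8, 9, 10, 11}


def parse_flow_record(line: str) -> FlowRecord:
    """Split one space-delimited flow log record into typed fields."""
    parts = line.split()
    if len(parts) != 14:
        raise ValueError(f"expected 14 fields, got {len(parts)}")
    values = []
    for i, part in enumerate(parts):
        if part == "-":  # placeholder used when a field has no data
            values.append(None)
        elif i in INT_POSITIONS:
            values.append(int(part))
        else:
            values.append(part)
    return FlowRecord(*values)


# Illustrative sample record, not taken from a real account.
sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_record(sample)
print(record.src_addr, record.dst_addr, record.dst_port, record.action)
```

Protocol `6` in the sample is TCP, and destination port `22` suggests SSH traffic that was accepted.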

You can send VPC Flow Logs output to two main destinations:

  1. S3 Buckets
  2. CloudWatch Logs

The logs can then be queried interactively: logs in CloudWatch Logs with CloudWatch Logs Insights, and logs in S3 with Athena (See Figure 1).

Figure 1

So let's work through the setup in Figure 1 step by step.

Steps

Step 01: Create a Custom VPC

Create a Custom VPC with a public subnet, if you do not have one already. Create an EC2 instance (t2.micro) and launch it in the public subnet.

(P.Note: You may use either the Default VPC or a Custom VPC here. However, a Custom VPC is recommended for production setups, so let's stick to best practices.)

Step 02: Create a VPC Flow Log (Destination = CloudWatch Logs)

Select the Custom VPC that you have created and click the Flow Logs tab (See Figure 02).

Figure 02

Now, click the Create flow log button and select the following:

Filter: All

Max Aggregation Interval: 10 minutes

Destination: Send to CloudWatch Logs

Destination Log Group: <Select a Log Group under CloudWatch. If you do not have one, create one specifically for the VPC flow logs, for example vpc_flow_logs>

Figure 03

IAM Role: <An IAM Role is required in order to publish VPC flow logs to CloudWatch Logs. Click the “Set Up Permissions” link shown just below the field, then click the “Allow” button to grant the permissions to the newly created role>

Figure 04
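If you prefer to create the role yourself rather than through the console link, it needs two pieces: a trust policy that lets the flow logs service assume the role, and a permissions policy that lets it write to CloudWatch Logs. The sketch below prints both documents as JSON. It follows my reading of the AWS documentation and is not necessarily identical to what the console generates. `"Resource": "*"` is kept for brevity; you would normally narrow it to your log group.

```python
import json

# Trust policy: allows the VPC Flow Logs service to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "vpc-flow-logs.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: allows the role to create log streams and put events.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
            ],
            "Resource": "*",  # assumption: narrow this to your log group ARN
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```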

Now, in the IAM Role drop-down, search for the role that you have just created and click the Create button to confirm the creation of the flow log. This flow log configuration captures all the traffic that passes through the Custom VPC and stores the records in the CloudWatch Log Group that you have created (See Figure 05).

Figure 05
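The same settings can also be expressed as parameters to the EC2 `CreateFlowLogs` API. The sketch below only builds the keyword arguments, so it runs without AWS access. The `flow_log_params` helper and all identifiers passed to it are hypothetical placeholders.

```python
def flow_log_params(vpc_id, log_group_name, role_arn, interval_seconds=600):
    """Build keyword arguments for CreateFlowLogs (CloudWatch Logs destination)."""
    if interval_seconds not in (60, 600):
        raise ValueError("aggregation interval must be 60 or 600 seconds")
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",                       # VPC-level flow log
        "TrafficType": "ALL",                        # Filter: All
        "LogDestinationType": "cloud-watch-logs",    # Send to CloudWatch Logs
        "LogGroupName": log_group_name,
        "DeliverLogsPermissionArn": role_arn,        # the IAM role from above
        "MaxAggregationInterval": interval_seconds,  # 600 s = 10 minutes
    }


# Hypothetical identifiers; replace them with your own.
params = flow_log_params(
    "vpc-0123456789abcdef0",
    "vpc_flow_logs",
    "arn:aws:iam::123456789012:role/flow-logs-role",
)
print(params)

# With boto3 installed and credentials configured, the call would be roughly:
#   import boto3
#   boto3.client("ec2").create_flow_logs(**params)
```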

Step 03: Analyze CloudWatch Logs

Once the VPC Flow Log is created, you can watch CloudWatch Logs receive records as IP traffic flows through the network interfaces in the VPC.

Go to CloudWatch → Log Groups → select the Log Group you created → select the log stream named after the ENI of the target EC2 instance.

(P.Note: You can find the EC2 instance’s ENI by clicking the eth0 link shown in the EC2 instance description)

Figure 06
Figure 07

Select the ENI-related log stream and you will see something similar to the following:

Figure 08

Step 04: Query CloudWatch Logs via CloudWatch Insights

Go to CloudWatch → Select Logs → Select Insights

From the drop-down at the top, select the CloudWatch Log Group that you want to query.

Execute the following query in the query box:

fields @timestamp, interfaceId, srcAddr, dstAddr
| filter interfaceId = 'eni-09376f175e77d41c0'
| sort @timestamp desc
| limit 20

Figure 09
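If you want to run the same query for different network interfaces, a small helper can build the query string for you. This is a minimal sketch. `build_insights_query` is a hypothetical helper, and the boto3 calls in the trailing comment are only an outline of how the string could be submitted outside the console.

```python
import re


def build_insights_query(eni_id: str, limit: int = 20) -> str:
    """Return the Logs Insights query used above for a given ENI."""
    # Reject anything that does not look like an ENI ID before embedding it.
    if not re.fullmatch(r"eni-[0-9a-f]+", eni_id):
        raise ValueError(f"unexpected ENI id: {eni_id!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return "\n".join(
        [
            "fields @timestamp, interfaceId, srcAddr, dstAddr",
            f"| filter interfaceId = '{eni_id}'",
            "| sort @timestamp desc",
            f"| limit {limit}",
        ]
    )


query = build_insights_query("eni-09376f175e77d41c0")
print(query)

# With boto3, the query could be submitted roughly like this:
#   logs = boto3.client("logs")
#   started = logs.start_query(logGroupName="vpc_flow_logs",
#                              startTime=start_epoch, endTime=end_epoch,
#                              queryString=query)
#   results = logs.get_query_results(queryId=started["queryId"])
```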

Step 05: Create a VPC Flow Log (Destination = S3 Bucket)

Create an S3 bucket (crishantha-vpc-flow-logs)

Copy the S3 bucket ARN using the copy ARN button (arn:aws:s3:::crishantha-vpc-flow-logs)

Go to VPC → select the Custom VPC → click the Flow Logs tab → click Create flow log. Choose the S3 bucket destination and paste the bucket ARN that you copied above.

Figure 10

This creates a VPC Flow Log that delivers its output to the S3 bucket. Once the flow log is created, a bucket policy that allows log delivery is attached to the bucket.

Figure 11

Now, you can check the S3 bucket for any logs.

Figure 12

Step 06: Run Query via Athena

Go to Athena

Select “default” as the database

Enter the following query in the “New Query 1” text box

P.Note: The following query was taken from the AWS documentation [2]. Change the bucket name, account ID, and region in the S3 location to match your own setup.

CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime int,
  endtime int,
  action string,
  logstatus string
)
PARTITIONED BY (`date` date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION 's3://crishantha-vpc-flow-logs/AWSLogs/129992820683/vpcflowlogs/us-east-1/'
TBLPROPERTIES ("skip.header.line.count"="1");

Figure 13

Now create a partition so that Athena reads only the data for a specific date.

ALTER TABLE vpc_flow_logs
ADD PARTITION (`date`='2020-05-10')
LOCATION 's3://crishantha-vpc-flow-logs/AWSLogs/129992820683/vpcflowlogs/us-east-1/2020/05/09';

Figure 14
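Since the flow logs land in one S3 folder per day, you need one partition per day. Writing those statements by hand gets tedious, so here is a minimal sketch that generates them. It assumes the `AWSLogs/<account>/vpcflowlogs/<region>/<yyyy>/<mm>/<dd>` layout seen in the location above. `add_partition_sql` is a hypothetical helper, and it derives both the partition value and the folder from the same date so that the two stay in step.

```python
from datetime import date, timedelta


def add_partition_sql(table, bucket, account_id, region, day):
    """Build one ALTER TABLE ... ADD PARTITION statement for a given day."""
    location = (
        f"s3://{bucket}/AWSLogs/{account_id}/vpcflowlogs/{region}/{day:%Y/%m/%d}"
    )
    return (
        f"ALTER TABLE {table} "
        f"ADD IF NOT EXISTS PARTITION (`date`='{day.isoformat()}') "
        f"LOCATION '{location}';"
    )


# Three consecutive days, using the bucket and account from this post.
first_day = date(2020, 5, 9)
statements = [
    add_partition_sql(
        "vpc_flow_logs",
        "crishantha-vpc-flow-logs",
        "129992820683",
        "us-east-1",
        first_day + timedelta(days=offset),
    )
    for offset in range(3)
]
for statement in statements:
    print(statement)
```

Each printed statement can be pasted into the Athena query editor and run on its own.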

Once the partition is created, you can run a query that filters on it.

Figure 15
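To give an idea of what a partition-based query can look like, the sketch below runs one against a tiny in-memory SQLite table that borrows a few column names from the Athena table. The rows are made up for illustration. SQLite stands in for Athena here only so that the example runs locally; in Athena you would write the date comparison as `"date" = DATE '2020-05-10'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vpc_flow_logs ("
    " sourceaddress TEXT, destinationport INTEGER,"
    ' action TEXT, "date" TEXT)'
)

# Made-up rows for illustration only.
rows = [
    ("203.0.113.7", 22, "REJECT", "2020-05-10"),
    ("203.0.113.7", 3389, "REJECT", "2020-05-10"),
    ("198.51.100.4", 22, "REJECT", "2020-05-10"),
    ("172.31.16.139", 443, "ACCEPT", "2020-05-10"),
    ("203.0.113.7", 22, "REJECT", "2020-05-09"),
]
conn.executemany("INSERT INTO vpc_flow_logs VALUES (?, ?, ?, ?)", rows)

# Count rejected connections per source address for one day (one partition).
query = """
SELECT sourceaddress, COUNT(*) AS rejected
FROM vpc_flow_logs
WHERE "date" = '2020-05-10' AND action = 'REJECT'
GROUP BY sourceaddress
ORDER BY rejected DESC
"""
result = conn.execute(query).fetchall()
for source, rejected in result:
    print(source, rejected)
```

Filtering on the partition column means Athena only has to scan the S3 folder for that day instead of the whole bucket.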

References

1. CloudWatch Logs Insights VPC Flow Log Sample Queries: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax-examples.html

2. Athena VPC Flow Log Examples: https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs.html

Crishantha Nanayakkara

Enterprise Architect, Consultant @ FAO (UN), Former CTO, ICTA Sri Lanka