Ingest AWS logging (Terraform setup)

Prerequisites

Before using these scripts to ingest AWS logging, make sure:

  1. You have credentials to access your AWS account.
  2. You are registered with Sneller Cloud and have a bearer token.
  3. You have set up Sneller with a proper ingestion bucket and IAM roles.

If you followed the cloud onboarding, these prerequisites are already met. You can download the Terraform scripts as follows:

git clone https://github.com/snellerinc/examples 
cd examples/terraform/ingest-aws-logging

All examples are written for Linux and should also work on macOS. They have also been tested in WSL2 (Windows Subsystem for Linux).

Summary

The scripts are designed to work with the onboarding scripts, so you should be able to run them like this:

export TF_VAR_sneller_token=<your-bearer-token>
terraform init   # only needed once
terraform apply

The Terraform scripts perform the following tasks:

  1. Create an S3 bucket for AWS logging and allow AWS to store logs in it. The script tries to detect the prefix that was used during onboarding and uses the same prefix for the logging bucket.
  2. Allow the Sneller IAM role to access (read-only) the bucket with AWS logging for ingestion.
  3. Create a table definition (database: aws, table: cloudtrail) that ingests the CloudTrail logging.
  4. Create a table definition (database: aws, table: flow) that ingests the default VPC flow logging.

AWS batches the delivery of log events, so it may take a while before data shows up in Sneller. You can browse the AWS console to make sure that API calls are being invoked on your account. Spinning up an EC2 instance in the default VPC will also generate some VPC flow activity.

If you haven't done so already, set the environment variables used to access the Sneller query engine:

export SNELLER_TOKEN=<your token here>
export SNELLER_ENDPOINT=https://snellerd-production.<region>.sneller.ai

Now run the following command to determine the number of CloudTrail events per service (eventSource):

curl -H "Authorization: Bearer $SNELLER_TOKEN" \
     -H "Accept: application/json" \
     -s "$SNELLER_ENDPOINT/query?database=aws" \
     --data-raw "SELECT eventSource, COUNT(*) FROM cloudtrail GROUP BY eventSource ORDER BY COUNT(*) DESC LIMIT 100"

The following command determines the number of flow log records per interface ID (via the VPC flow logs):

curl -H "Authorization: Bearer $SNELLER_TOKEN" \
     -H "Accept: application/json" \
     -s "$SNELLER_ENDPOINT/query?database=aws" \
     --data-raw "SELECT interface_id, COUNT(*) FROM flow GROUP BY interface_id ORDER BY COUNT(*) DESC LIMIT 10"

Details

Setting up Terraform

These scripts depend on the AWS and Sneller providers. The AWS provider uses the current user’s AWS credentials, so make sure you have sufficient rights.

The scripts use the following variables:

  • region specifies the AWS region of your Sneller instance (default: us-east-1).
  • sneller_token holds the Sneller bearer token. If it’s not set, Terraform will prompt for it.
  • prefix specifies a prefix that is used for the S3 bucket name. If you don’t specify one, the scripts try to autodetect the prefix used during onboarding or generate a new random prefix.
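
A minimal sketch of how these variable declarations could look (the descriptions and defaults are illustrative; the variables file in the repository is authoritative):

variable "region" {
  description = "AWS region of the Sneller instance"
  type        = string
  default     = "us-east-1"
}

variable "sneller_token" {
  description = "Sneller bearer token (can be passed via TF_VAR_sneller_token)"
  type        = string
  sensitive   = true
}

variable "prefix" {
  description = "Prefix for the logging bucket name (autodetected or randomized when empty)"
  type        = string
  default     = ""
}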

The next steps require the SQS queue and IAM role that Sneller uses; these are obtained from the sneller_tenant_region data source, which provides this information.
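
A sketch of that lookup is shown below; note that the region argument and the attribute names referenced in later snippets (sqs_arn, role_arn) are assumptions, so check the snellerinc/sneller provider documentation for the exact schema:

# Look up the SQS queue and IAM role that Sneller uses in this region.
data "sneller_tenant_region" "sneller" {
  region = var.region
}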

Some “magic” is used to automatically derive the prefix from the current Sneller IAM role; when that’s not possible, a random prefix is generated.
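
A simplified sketch of that fallback, assuming a hypothetical local.prefix is what the other resources consume (the actual detection in the repository additionally parses the Sneller IAM role name to reuse the onboarding prefix):

resource "random_id" "prefix" {
  byte_length = 4
}

locals {
  # Use the explicit prefix when given, otherwise fall back to a random one.
  prefix = var.prefix != "" ? var.prefix : random_id.prefix.hex
}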

S3 bucket for AWS logging

All AWS logging is written into an S3 bucket with the following characteristics:

  • Public access is disallowed.
  • A bucket policy allows the AWS services to write to the bucket.
  • An S3 event notification notifies Sneller when new AWS logging objects are available.

Note that this file doesn’t contain the actual bucket policy, but merges all the bucket policies for the individual logging services.
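
A sketch of these bucket resources under the assumptions above (the bucket name and the sqs_arn attribute are illustrative); the merged bucket policy itself would be attached with an aws_s3_bucket_policy resource, omitted here:

resource "aws_s3_bucket" "logging" {
  bucket = "${local.prefix}-aws-logging"   # hypothetical naming scheme
}

# Disallow all public access to the logging bucket.
resource "aws_s3_bucket_public_access_block" "logging" {
  bucket                  = aws_s3_bucket.logging.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Notify Sneller's ingestion queue whenever a new log object is written.
resource "aws_s3_bucket_notification" "logging" {
  bucket = aws_s3_bucket.logging.id

  queue {
    queue_arn = data.sneller_tenant_region.sneller.sqs_arn   # attribute name assumed
    events    = ["s3:ObjectCreated:*"]
  }
}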

Enable IAM role to access AWS logging

The IAM role that Sneller assumes to read the source data must be granted read-only access to the AWS log data.
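
A sketch of such a read-only grant, assuming the Sneller role name can be derived from the role_arn attribute of the data source above:

data "aws_iam_policy_document" "sneller_read_logging" {
  statement {
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.logging.arn]
  }
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.logging.arn}/*"]
  }
}

resource "aws_iam_role_policy" "sneller_read_logging" {
  # Extract the role name from the role ARN (attribute name assumed).
  role   = element(split("/", data.sneller_tenant_region.sneller.role_arn), 1)
  policy = data.aws_iam_policy_document.sneller_read_logging.json
}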

CloudTrail logging

In this example, logging is enabled for all services in all regions, but this can be customized using event filtering. The CloudTrail logs are stored in the logging bucket, so a policy is added that allows AWS to write to this bucket.
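
A sketch of the trail itself (the trail name and key prefix are illustrative); the bucket-policy statement for cloudtrail.amazonaws.com is part of the merged bucket policy described above:

# Multi-region trail that delivers management events into the logging bucket.
resource "aws_cloudtrail" "logging" {
  name                          = "${local.prefix}-cloudtrail"
  s3_bucket_name                = aws_s3_bucket.logging.id
  s3_key_prefix                 = "cloudtrail"
  is_multi_region_trail         = true
  include_global_service_events = true
}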

The data is exposed via the cloudtrail Sneller table, which is created here as well. The table is partitioned by the region of the CloudTrail data, which makes queries on a single region faster and more cost-efficient.
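
A sketch of such a table definition; the sneller_table attribute names and the input pattern below are assumptions, so consult the snellerinc/sneller provider documentation and the repository for the exact schema:

resource "sneller_table" "cloudtrail" {
  database = "aws"
  table    = "cloudtrail"

  # Input pattern and format are illustrative; CloudTrail delivers objects
  # under AWSLogs/<account>/CloudTrail/<region>/<yyyy>/<mm>/<dd>/.
  inputs = [{
    pattern = "s3://${aws_s3_bucket.logging.bucket}/cloudtrail/AWSLogs/*/CloudTrail/{region}/*/*/*/*.json.gz"
    format  = "cloudtrail.json.gz"
  }]

  # Partition on the region captured from the object path.
  partitions = [{
    field = "region"
  }]
}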

VPC flow logging

In this example, flow logging is enabled for the region’s default VPC. The VPC flow logs are stored in the logging bucket, so a policy is added that allows AWS to write to this bucket.
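
A sketch of the flow log setup (the S3 key prefix is illustrative):

# Default VPC of the selected region.
data "aws_vpc" "default" {
  default = true
}

# Deliver flow logs for the default VPC directly to the logging bucket.
resource "aws_flow_log" "default_vpc" {
  vpc_id               = data.aws_vpc.default.id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = "${aws_s3_bucket.logging.arn}/vpcflow/"
}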

The data is exposed via the flow Sneller table that is created here as well.
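
As with CloudTrail, a sketch of the table definition; the attribute names and the input pattern are assumptions:

resource "sneller_table" "flow" {
  database = "aws"
  table    = "flow"

  # VPC flow logs land under AWSLogs/<account>/vpcflowlogs/<region>/...;
  # the exact pattern and input format are defined in the repository.
  inputs = [{
    pattern = "s3://${aws_s3_bucket.logging.bucket}/vpcflow/AWSLogs/*/vpcflowlogs/*/*/*/*/*.log.gz"
  }]
}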