Table definition examples

Introduction

Sneller can ingest data from a variety of sources. This section lists sample definition.json files that can be used for easy setup.

Create new table

All definition state for a table is stored in a single JSON file in S3; see here for the full specification.
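
At its simplest, a definition only needs an input pattern and a format; attributes such as partitions and retention_policy (shown in the templates further down) are optional. As a minimal sketch, the following Python snippet writes such a file — the bucket name and prefix are placeholders, not real resources:

import json

# A minimal table definition: one input pattern plus the format of the
# matched objects. SOURCE-BUCKET and the "logs/" prefix are placeholders.
definition = {
    "input": [
        {
            "pattern": "s3://SOURCE-BUCKET/logs/*.json.gz",
            "format": "json.gz",
        }
    ],
    # Optional attributes such as "partitions" and "retention_policy"
    # are shown in the templates below.
}

with open("definition.json", "w") as f:
    json.dump(definition, f, indent=2)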

Creating a new table is straightforward:

  • Create a definition.json file and update the pattern attribute to point to the correct location in the source bucket.
  • Copy the definition.json file to the ingestion bucket under the path db/<db-name>/<table-name>/ (replacing <db-name> and <table-name> accordingly).
  • If not already configured for another table, enable S3 Event Notifications for the source bucket.

That is all there is to it (see here for an example). Any existing data in the source bucket will be ingested, and newly created objects will be picked up automatically. Steps 2 and 3 are sketched below.
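
Here is a minimal sketch of steps 2 and 3 using boto3, assuming AWS credentials are configured. The bucket names, the db/table path, and the queue ARN are placeholders; the actual notification target depends on how your Sneller installation is set up.

import boto3

s3 = boto3.client("s3")

# Step 2: copy definition.json into the ingestion bucket under
# db/<db-name>/<table-name>/.
s3.upload_file(
    Filename="definition.json",
    Bucket="INGESTION-BUCKET",
    Key="db/mydb/mytable/definition.json",
)

# Step 3: enable S3 Event Notifications on the source bucket so that
# newly created objects are picked up (skip if already configured
# for another table).
s3.put_bucket_notification_configuration(
    Bucket="SOURCE-BUCKET",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:REGION:ACCOUNT-ID:QUEUE-NAME",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)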

Table template definitions

AWS CloudTrail

CloudTrail is Amazon’s most comprehensive logging service; it covers virtually everything, with only a few exceptions. We have provided Terraform scripts that enable CloudTrail logging and make sure it is ingested into Sneller. If you would rather set up CloudTrail ingestion manually, follow these instructions and use the following table definition:

  • aws-cloudtrail-definition.json
{
    "input": [
      {
        "pattern": "s3://config-bucket-XXXXXXXX/cloudtrail/AWSLogs/ACCOUNT-ID/CloudTrail/{region}/*/*/*/*.json.gz",
        "format": "cloudtrail.json.gz"
      }
    ],
    "partitions": [ { "field": "region" } ]
}
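
The {region} segment in the pattern captures a single path component from each object key, and the partitions attribute refers to it by name. As an illustration only — this approximates the matching with a regular expression and is not Sneller's implementation — in Python:

import re

# "{region}" captures one path segment; "*" matches one path segment.
pattern = re.compile(
    r"s3://config-bucket-XXXXXXXX/cloudtrail/AWSLogs/ACCOUNT-ID/"
    r"CloudTrail/(?P<region>[^/]+)/[^/]+/[^/]+/[^/]+/[^/]+\.json\.gz$"
)

# A hypothetical object key matching the pattern above.
key = ("s3://config-bucket-XXXXXXXX/cloudtrail/AWSLogs/ACCOUNT-ID/"
       "CloudTrail/us-east-1/2023/01/15/trail-file.json.gz")

match = pattern.match(key)
if match:
    print(match.group("region"))  # prints: us-east-1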

AWS Config

Setup: instructions.

  • aws-config-definition.json
{
    "input": [
      {
        "pattern": "config-bucket-XXXXXXXX/AWSLogs/ACCOUNT-ID/Config/{region}/*/*/*/*/*.log.gz",
        "format": "json.gz"
      }
    ],
    "partitions": [ { "field": "region" } ]
}

AWS WAF

Setup: instructions.

  • aws-waf-definition.json
{
  "input": [
    {
      "pattern": "s3://aws-waf-logs-DOC-EXAMPLE-BUCKET-SUFFIX/DOC-EXAMPLE-KEY-NAME-PREFIX/AWSLogs/ACCOUNT-ID/WAFLogs/{region}/web-acl-name/*/*/*/*/*/*.log.gz",
"format": "json.gz"
    }
  ],
  "partitions": [ { "field": "region" } ]
}

AWS S3 Inventory

Setup: instructions (use Apache Parquet as the output format).

  • aws-s3-inventory-definition.json
{
    "input": [
      {
        "pattern": "s3://DESTINATION-PREFIX/SOURCE-BUCKET/config-ID/data/*.parquet",
        "format": "parquet"
      }
    ]
}

AWS Route53 (Resolver query logs)

Setup: instructions.

  • aws-route53-definition.json
{
    "input": [
      {
        "pattern": "your_bucket_name/AWSLogs/ACCOUNT-ID/*.log.gz",
        "format": "json.gz"
      }
    ]
}

AWS VPC Flow Logs

Setup: instructions. We have provided Terraform scripts that enable VPC flow logging for the default VPC and make sure it is ingested into Sneller.

  • aws-vpc-flow-logs-definition.json
{
  "input": [
    {
      "pattern": "s3://YOUR_SOURCE_BUCKET_HERE/vpcflowlogs/AWSLogs/YOUR_AWS_ACCOUNT_ID_HERE/vpcflowlogs/{region}/{yyyy}/{mm}/{dd}/*.log.gz",
      "format": "csv.gz",
      "hints": {
        "skip_records": 1,
        "separator": " ",
        "missing_values": [
          "-"
        ],
        "fields": [
          { "name": "version", "type": "int" },
          { "name": "account_id", "type": "string" },
          { "name": "interface_id", "type": "string" },
          { "name": "srcaddr", "type": "string" },
          { "name": "dstaddr", "type": "string" },
          { "name": "srcport", "type": "int" },
          { "name": "dstport", "type": "int" },
          { "name": "protocol", "type": "int" },
          { "name": "packets", "type": "int" },
          { "name": "bytes", "type": "int" },
          { "name": "start", "type": "datetime", "format": "unix_seconds" },
          { "name": "end", "type": "datetime", "format": "unix_seconds" },
          { "name": "action", "type": "string" },
          { "name": "log_status", "type": "string" }
        ]
      }
    }
  ],
  "partitions": [
    { "field": "region" },
    { "field": "date", "value": "$yyyy/$mm/$dd" }
  ],
  "retention_policy": {
    "field": "end",
    "valid_for": "1m"
  }
}
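
The hints block above describes the raw log format: skip the single header record, split fields on spaces, treat "-" as a missing value, and coerce each field to the listed type. To make the field mapping concrete, here is a small Python sketch that decodes one record the same way — the sample line mimics the shape of AWS's documented flow-log examples and is not real data:

from datetime import datetime, timezone

# Field names and conversions, mirroring the "fields" list above;
# "start" and "end" are unix seconds ("unix_seconds").
FIELDS = [
    ("version", int), ("account_id", str), ("interface_id", str),
    ("srcaddr", str), ("dstaddr", str), ("srcport", int),
    ("dstport", int), ("protocol", int), ("packets", int),
    ("bytes", int),
    ("start", lambda s: datetime.fromtimestamp(int(s), tz=timezone.utc)),
    ("end", lambda s: datetime.fromtimestamp(int(s), tz=timezone.utc)),
    ("action", str), ("log_status", str),
]

# "skip_records": 1 means the header line naming these columns is ignored.
record = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 "
          "172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 "
          "ACCEPT OK")

row = {}
for (name, convert), raw in zip(FIELDS, record.split(" ")):
    # "-" is listed under "missing_values", so it becomes an absent value.
    row[name] = None if raw == "-" else convert(raw)

print(row["start"], row["action"])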

AWS S3 Access

Setup: instructions.

Note: alternatively, CloudTrail can be used; see here for more info.

  • aws-s3-access-definition.json
{
  "input": [
    {
      "pattern": "s3://YOUR_SOURCE_BUCKET_HERE/AWSLogs/YOUR_AWS_ACCOUNT_ID_HERE/*.log.gz",
      "format": "csv.gz",
      "hints": {
        "skip_records": 1,
        "separator": " ",
        "missing_values": [
          "-"
        ],
        "fields": [
          { "name": "bucket_owner", "type": "string" },
          { "name": "bucket", "type": "string" },
          { "name": "time", "type": "datetime" }
        ]
      }
    }
  ]
}

AWS ELB

  • aws-elb-definition.json
{
  "input": [
    {
      "pattern": "s3://bucket/prefix/AWSLogs/ACCOUNT-ID/elasticloadbalancing/{region}/*/*/*/*.log.gz",
      "format": "csv.gz",
      "hints": {
        "skip_records": 1,
        "separator": " ",
        "missing_values": [
          "-"
        ],
        "fields": [
          { "name": "type", "type": "string" },
          { "name": "time", "type": "datetime" },
          { "name": "elb", "type": "string" }
        ]
      }
    }
  ]
}