Sneller Cloud Onboarding Tutorial


This tutorial assumes you have successfully completed the steps in the previous section and continues from there.

Add some more data

With event notifications set up, we can simply copy new files to the S3 source bucket. You can either add some ND-JSON encoded files to the sample_data folder and run terraform apply again, or manually copy some JSON data using the AWS CLI:

Note: since Sneller supports dynamic schemas, it will ingest any valid JSON data, irrespective of the structure of the data.

export SNELLER_SOURCE=$(terraform output -json sneller_source | jq -r '.')
aws s3 cp . s3://$SNELLER_SOURCE/sample_data/ --recursive --exclude "*" --include "*.ndjson"

AWS will automatically send S3 event notifications for the source bucket to the SQS queue, and Sneller will immediately add the data to the table.
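If you want to verify the wiring, you can inspect the bucket's notification configuration (assuming your AWS credentials are allowed to read it); it should reference the SQS queue:

aws s3api get-bucket-notification-configuration --bucket "$SNELLER_SOURCE"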

Query again

After a few seconds the data will be available. If you run the query again, you should see a larger count of records:

curl -H "Authorization: Bearer $SNELLER_TOKEN" \
     -H "Accept: application/json" \
     -s "$SNELLER_ENDPOINT/query?database=$SNELLER_DATABASE" \
     --data-raw "SELECT COUNT(*) FROM $SNELLER_TABLE"

Of course you can continue copying more data into the source bucket in order to ingest it.
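If you don't have any ND-JSON files at hand, a small shell loop is a quick way to produce some throwaway data (the id and message fields here are made up; any valid JSON will be ingested):

# generate 1000 throwaway records with made-up fields
for i in $(seq 1 1000); do
  printf '{"id": %d, "message": "sample record %d"}\n' "$i" "$i"
done > extra.ndjson
aws s3 cp extra.ndjson s3://$SNELLER_SOURCE/sample_data/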

Create new table

Creating a new table is simple: it only requires adding a new definition.json file in the right location and copying data to the source bucket.

For this example we will create a table for the gharchive (GitHub Archive) data and ingest some of it.

Add table definition

Create the definition.json like this and copy it into the S3 ingestion bucket:

export SNELLER_SOURCE=$(terraform output -json sneller_source | jq -r '.')
export SNELLER_INGEST=$(terraform output -json sneller_ingest | jq -r '.')
cat > definition.json <<EOF
{
  "input": [
    {
      "pattern": "s3://$SNELLER_SOURCE/gharchive/*.json.gz",
      "format": "json.gz"
    }
  ]
}
EOF
aws s3 cp definition.json s3://$SNELLER_INGEST/db/demo/gharchive/

Note: the pattern in the definition.json file refers to the source bucket whereas the definition.json itself goes into the ingestion bucket.
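To double-check the layout, you can list the table's folder in the ingestion bucket and confirm the definition.json is in place:

aws s3 ls s3://$SNELLER_INGEST/db/demo/gharchive/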

Add some data

Copy some data into the source bucket at the correct path to add it to the gharchive table:

wget https://data.gharchive.org/2015-01-01-{15..16}.json.gz
aws s3 mv 2015-01-01-15.json.gz s3://$SNELLER_SOURCE/gharchive/
aws s3 mv 2015-01-01-16.json.gz s3://$SNELLER_SOURCE/gharchive/
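Before querying, you can check that the files arrived at the expected path:

aws s3 ls s3://$SNELLER_SOURCE/gharchive/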

Query the table

Now you can query the gharchive table (in the demo database):

curl -H "Authorization: Bearer $SNELLER_TOKEN" \
     -H "Accept: application/json" \
     -s "$SNELLER_ENDPOINT/query?database=demo" \
     --data-raw "SELECT COUNT(*) FROM gharchive"

or do a more adventurous query …

curl -H "Authorization: Bearer $SNELLER_TOKEN" \
     -H "Accept: application/x-ndjson" \
     -s "$SNELLER_ENDPOINT/query?database=demo" \
     --data-raw "SELECT type, COUNT(*) FROM gharchive GROUP BY type ORDER BY COUNT(*) DESC"

… and copy some more data …

wget https://data.gharchive.org/2015-01-01-{17..18}.json.gz
aws s3 mv 2015-01-01-17.json.gz s3://$SNELLER_SOURCE/gharchive/
aws s3 mv 2015-01-01-18.json.gz s3://$SNELLER_SOURCE/gharchive/

… and repeat the query (for more results) …

curl -H "Authorization: Bearer $SNELLER_TOKEN" \
     -H "Accept: application/x-ndjson" \
     -s "$SNELLER_ENDPOINT/query?database=demo" \
     --data-raw "SELECT type, COUNT(*) FROM gharchive GROUP BY type ORDER BY COUNT(*) DESC"

Ingesting CloudTrail

For a more elaborate example, see ingesting AWS CloudTrail (and some other AWS services).

Final words

Although we would hate to see you go, in case you want to tear everything down, here’s how to do it:

terraform destroy
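Note: terraform destroy may refuse to delete S3 buckets that still contain objects (unless the Terraform configuration sets force_destroy on them). In that case, empty the buckets first and run the destroy again:

aws s3 rm s3://$SNELLER_SOURCE --recursive
aws s3 rm s3://$SNELLER_INGEST --recursive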

Hasta la vista, baby 😀