AVX-512

64-bit Integer Division with AVX-512

Integer division is an arithmetic operation that x86 SIMD instruction set extensions, AVX-512 included, do not provide natively. In this article we present a vectorized method for dividing signed 64-bit integers that takes advantage of AVX-512.
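
Without a native instruction, each quotient has to be assembled from other operations. The article's own technique is not reproduced here. As a hedged illustration, the Python sketch below emulates one generic branchless approach. It divides the magnitudes with restoring shift-and-subtract division, where every lane runs the same 64 steps and a comparison mask decides which lanes subtract, and then it fixes up the sign. The function names are hypothetical, divisors are assumed to be nonzero, and results truncate toward zero as in C.

```python
MASK64 = (1 << 64) - 1

def udiv64_lanes(n, d):
    """Unsigned restoring division, run on every lane in lock-step."""
    q = [0] * len(n)
    r = [0] * len(n)
    for bit in range(63, -1, -1):
        # Shift the next dividend bit into each lane's partial remainder.
        r = [(ri << 1) | ((ni >> bit) & 1) for ri, ni in zip(r, n)]
        # Comparison mask: lanes whose remainder has reached the divisor.
        k = [ri >= di for ri, di in zip(r, d)]
        # Masked subtract and masked quotient-bit set (masked ops in SIMD, not branches).
        r = [ri - di if ki else ri for ri, di, ki in zip(r, d, k)]
        q = [qi | (1 << bit) if ki else qi for qi, ki in zip(q, k)]
    return q

def sdiv64_lanes(a, b):
    """Signed 64-bit division per lane, truncating toward zero (b != 0 assumed)."""
    # abs(INT64_MIN) == 2**63 still fits in an unsigned 64-bit lane.
    uq = udiv64_lanes([abs(x) for x in a], [abs(y) for y in b])
    # The quotient is negative exactly when the operand signs differ.
    neg = [(x < 0) != (y < 0) for x, y in zip(a, b)]
    q = [-v if s else v for v, s in zip(uq, neg)]
    # Wrap to two's complement; only INT64_MIN / -1 actually overflows.
    return [((v + (1 << 63)) & MASK64) - (1 << 63) for v in q]
```

The fixed trip count of 64 is what keeps this approach branch-free. It is also slow, which is why faster vectorized schemes are worth a dedicated article.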

Branchless Code With AVX-512

Sneller uses 16 parallel data lanes for almost all tasks, including loading and decompressing data, all without branches. To achieve this, we rely heavily on the predicated instruction execution provided by the AVX-512 instruction set. In this post, we walk through a simple example that our string-processing functions use frequently: converting a string to uppercase.
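
At its core, an uppercase conversion like this is a per-byte predicate followed by a masked operation. Below is a minimal Python sketch of that idea. It assumes ASCII-only case mapping, so bytes outside 'a' to 'z' (UTF-8 multi-byte sequences included) pass through unchanged. The function name is hypothetical. In AVX-512, the predicate would correspond to compares that produce a mask register, and the arithmetic would correspond to a masked subtract.

```python
def to_upper_ascii(data: bytes) -> bytes:
    out = bytearray(len(data))
    for i, b in enumerate(data):
        # Predicate: True only for 'a'..'z' (0x61..0x7A); computed, not branched on.
        is_lower = (0x61 <= b) & (b <= 0x7A)
        # Masked subtract: remove 0x20 only where the predicate holds.
        out[i] = b - (is_lower << 5)
    return bytes(out)
```

Because the predicate is folded into the arithmetic, every byte follows the same instruction path regardless of its value.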

Accelerating Fuzzy Search using AVX-512

We present our SQL fuzzy string comparison and fuzzy contains functions, which process data at multiple GiB/s without any preprocessing or indexing. Yes, that's right: fuzzy matching, yet no planning needed!
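
As a scalar reference for the semantics only, here is a Python sketch. It assumes that fuzzy comparison means Levenshtein edit distance within a threshold `k`; the real functions may use a different metric, for example one that also counts transpositions. `fuzzy_contains` uses Sellers' variant of the dynamic program, where a match may start at any position in the haystack. Both function names are hypothetical, and neither reflects how the SIMD implementation works.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete ca
                           cur[j - 1] + 1,           # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def fuzzy_contains(haystack: str, needle: str, k: int) -> bool:
    """True if some substring of haystack is within k edits of needle."""
    # Sellers' variant: row 0 is all zeros, so a match may start anywhere.
    prev = [0] * (len(haystack) + 1)
    for i, cn in enumerate(needle, 1):
        cur = [i]
        for j, ch in enumerate(haystack, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cn != ch)))
        prev = cur
    # The best match may end at any haystack position.
    return min(prev) <= k
```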

Sneller: Querying terabytes of JSON per second

Learn how Sneller can query terabytes of JSON per second on medium-sized clusters.

Accelerating Regular Expressions with AVX-512

We present a high-performance regular expression engine that processes 16 parallel lanes and needs neither branching nor backtracking. The engine was developed for Intel Ice Lake processors and is written in AVX-512 assembly.
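
A deterministic finite automaton (DFA) matches without backtracking because each input character moves a lane to exactly one next state. The Python sketch below illustrates this with several lanes advancing in lock-step. It uses a hypothetical hand-built DFA for the anchored pattern `^ab*c$` and is not a description of Sneller's engine.

```python
# Hand-built DFA for the hypothetical anchored pattern ^ab*c$.
DEAD, START, SEEN_A, ACCEPT = 0, 1, 2, 3
TRANS = {
    (START, "a"): SEEN_A,
    (SEEN_A, "b"): SEEN_A,
    (SEEN_A, "c"): ACCEPT,
}  # every (state, char) pair not listed goes to DEAD

def match_lanes(strings):
    states = [START] * len(strings)
    longest = max((len(s) for s in strings), default=0)
    for pos in range(longest):
        # Active-lane mask: lanes whose input still has a character at pos.
        active = [pos < len(s) for s in strings]
        # Active lanes do one table lookup (a gather in SIMD terms);
        # finished lanes keep their state, like a masked blend.
        states = [TRANS.get((st, s[pos]), DEAD) if act else st
                  for st, s, act in zip(states, strings, active)]
    return [st == ACCEPT for st in states]
```

For example, `match_lanes(["abc", "ac", "abx"])` advances all three inputs together and reports which ones end in the accepting state.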

Blazing Fast Unicode-aware ILIKE in AVX-512

We present a method for case-insensitive comparison of UTF-8 encoded strings that uses 16 parallel lanes and no branching. We use it to implement the ILIKE operator for Intel Skylake-X processors, written in AVX-512 assembly.
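
For reference, the semantics that ILIKE has to reproduce fit in a short scalar Python function. This hedged sketch does not use the branchless method. It assumes case folding with per-code-point `str.lower()`, the SQL wildcards `%` (any run of characters) and `_` (exactly one character), and no ESCAPE clause. The function name is hypothetical. Decoding first matters because `_` must consume one code point, which can span two to four UTF-8 bytes.

```python
def ilike(subject: bytes, pattern: bytes) -> bool:
    s_text, p_text = subject.decode("utf-8"), pattern.decode("utf-8")
    s = p = 0
    star_p, star_s = -1, 0   # position of the last '%' and where it resumed
    while s < len(s_text):
        if p < len(p_text) and p_text[p] == "%":
            # Remember the '%' and first try matching it against nothing.
            star_p, star_s = p, s
            p += 1
        elif p < len(p_text) and (p_text[p] == "_" or
                                  p_text[p].lower() == s_text[s].lower()):
            s += 1
            p += 1
        elif star_p != -1:
            # Mismatch: let the last '%' absorb one more character and retry.
            star_s += 1
            s, p = star_s, star_p + 1
        else:
            return False
    # Trailing '%' wildcards can match the empty remainder.
    while p < len(p_text) and p_text[p] == "%":
        p += 1
    return p == len(p_text)
```

For example, `"Größe"` matches `"GRÖ_E"` here because `_` consumes the two-byte `ß` as a single character. A byte-wise matcher would reject it.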

64-bit Integers to Strings with AVX-512

This article explores converting many signed 64-bit integers to strings at once, without branches, by taking advantage of AVX-512. Most research and implementations focus on speeding up the conversion of a single value rather than performing many conversions at once. At Sneller we use AVX-512 to process 16 values in parallel, and here we describe how our query engine does it.
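
Two parts of such a conversion can avoid data-dependent branches. The digit count can be computed by summing comparisons against powers of ten, and every lane can extract the same fixed number of digits regardless of its value. The Python sketch below shows both ideas. It is an illustrative assumption, not the article's algorithm, and the function name is hypothetical. A real SIMD version would typically replace the repeated `% 10` and `// 10` with cheaper multiply-based digit extraction.

```python
NDIGITS = 19                                   # |int64| <= 2**63 < 10**19
POW10 = [10 ** i for i in range(1, NDIGITS)]   # 10 .. 10**18

def i64_to_str_lanes(values):
    neg = [v < 0 for v in values]
    mag = [abs(v) for v in values]
    # Branchless digit count: add 1 for every power of ten the value reaches.
    length = [1 + sum(m >= p for p in POW10) for m in mag]
    # Fixed trip count: each lane yields NDIGITS digits, least significant first.
    cols = []
    for _ in range(NDIGITS):
        cols.append([48 + m % 10 for m in mag])   # ASCII '0' + digit
        mag = [m // 10 for m in mag]
    out = []
    for lane, (is_neg, n) in enumerate(zip(neg, length)):
        # Keep only the n significant digits, most significant first.
        digits = bytes(cols[k][lane] for k in reversed(range(n)))
        out.append(b"-" * is_neg + digits)
    return out
```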

Building a SQL VM in AVX-512 Assembly

One of Sneller’s novel features is a bytecode-based virtual machine written almost entirely in AVX-512 assembly. Sneller is far from the first project to bring SIMD acceleration to a query engine, but our interpreter is unusual in being implemented almost entirely in assembly.
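
A bytecode VM that works on 16 lanes dispatches once per instruction rather than once per row, and it expresses filtering as a predicate mask instead of a branch. The toy Python interpreter below illustrates that structure. Its opcodes (`load`, `gt`, `lt`, `add`) and its encoding are hypothetical and unrelated to Sneller's actual instruction set.

```python
LANES = 16

def run(program, columns):
    """Interpret a toy bytecode over LANES rows at a time (hypothetical opcodes)."""
    value = [0] * LANES        # one 16-lane value register
    mask = [True] * LANES      # predicate register: which rows are still live
    for op, arg in program:    # dispatch once per instruction, not per row
        if op == "load":
            value = list(columns[arg])
        elif op == "gt":       # mask &= value > arg
            mask = [m and v > arg for m, v in zip(mask, value)]
        elif op == "lt":       # mask &= value < arg
            mask = [m and v < arg for m, v in zip(mask, value)]
        elif op == "add":      # value += arg in live lanes only
            value = [v + arg if m else v for v, m in zip(value, mask)]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return value, mask
```

In this toy encoding, a filter such as `WHERE x > 3 AND x < 8` becomes `[("load", "x"), ("gt", 3), ("lt", 8)]`. The rows that survive are the lanes whose mask is still `True`.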
