OLAP on Your Cassandra Data with Arrow, Flight SQL, ADBC, and DuckDB – Philip Moore, Voltron Data
Cassandra data is not easily accessed for analytical purposes. The Spark Cassandra Connector (through CQL) is traditionally used to tackle this problem, but Spark workloads easily saturate Cassandra and can negatively impact transactional workloads. In this talk we will demonstrate how to convert Cassandra data (SSTable snapshots) directly into Parquet files and query them with a modern OLAP stack. We will also discuss additional steps that could lead to real-time OLAP on Cassandra data.

The future of OLAP is converging around a set of standards and technologies, including Apache Arrow, Arrow JDBC, and ADBC, that enable OLAP engine pluggability. By building out Cassandra’s integration with this ecosystem, we allow Cassandra to interoperate with next-generation analytics tools. We used a Flight SQL server to let users submit OLAP SQL (and Ibis) queries, via Arrow JDBC and ADBC, against Parquet data exported from Cassandra.

Some things you’ll learn:
- How to run a Flight SQL server (with DuckDB and SQLite back-ends) with Parquet files stored in S3
- How to secure Flight SQL
- How to run Flight SQL in Kubernetes with Graviton (arm64) CPUs
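As a rough illustration of the kind of OLAP SQL query the abstract refers to, here is a minimal sketch that runs an aggregate query in-process with Python's built-in sqlite3 module. This is not the talk's setup: SQLite is used only because it is one of the back-ends the abstract names and needs no extra dependencies. The `readings` table, its columns, the sample rows, and the `run_olap_query` helper are all hypothetical stand-ins for data exported from a Cassandra table.

```python
import sqlite3

# Hypothetical rows standing in for data exported from a Cassandra table.
ROWS = [
    ("sensor-a", "2024-01-01", 20.5),
    ("sensor-a", "2024-01-02", 21.5),
    ("sensor-b", "2024-01-01", 18.0),
    ("sensor-b", "2024-01-02", 19.0),
    ("sensor-b", "2024-01-03", 23.0),
]

# A typical analytical query: scan many rows, group, and aggregate.
OLAP_QUERY = """
    SELECT sensor_id,
           COUNT(*)                   AS readings,
           ROUND(AVG(temperature), 2) AS avg_temperature
    FROM readings
    GROUP BY sensor_id
    ORDER BY sensor_id
"""


def run_olap_query(rows):
    """Load rows into an in-memory SQLite table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE readings (sensor_id TEXT, day TEXT, temperature REAL)"
        )
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
        return conn.execute(OLAP_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_olap_query(ROWS):
        print(row)
```

In the stack the abstract describes, a query of this shape would presumably be sent by an ADBC or Arrow JDBC client to a Flight SQL server and evaluated over Parquet files rather than over an in-memory table.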
by The Linux Foundation