
Database Native Support for CDC Streaming Pipeline – Neha Maheshwari & Vinit Gupta, Amazon


Most consumers of Cassandra’s CDC logs set up very similar data processing pipelines. Their responsibilities include, but are not limited to, transforming the log format into application-consumable structures, de-duplicating log entries across multiple replicas of a table or partition, and writing the log records to common streaming platforms such as Kafka or Amazon Kinesis. In this talk, we propose a database-native solution to these commonly occurring problems, which are solved repeatedly by applications consuming Cassandra’s CDC. We will also discuss a re-imagined CDC architecture for Cassandra, based on learnings from these recurring data pipelines as well as from other managed CDC solutions, such as those of ScyllaDB, Amazon DynamoDB and Microsoft Azure.
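
To make the three recurring pipeline stages concrete, here is a minimal Python sketch of what such a consumer-side pipeline might look like. It is an illustration only, not the design proposed in the talk. The record fields (`table`, `partition_key`, `write_timestamp`, `replica`, `values`), the helper names, and the `publish` callback are all hypothetical; real Cassandra CDC data arrives as binary commit log segments that must be parsed first, and a real producer would hand events to a Kafka or Kinesis client rather than to a Python callable.

```python
import hashlib
import json
from typing import Callable, Iterable, Iterator


def mutation_id(record: dict) -> str:
    """Derive an ID that is identical for the same write on every replica.

    Assumption: table + partition key + write timestamp identifies a mutation.
    The replica name is deliberately excluded from the ID.
    """
    key = json.dumps(
        [record["table"], record["partition_key"], record["write_timestamp"]]
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(records: Iterable[dict]) -> Iterator[dict]:
    """Drop copies of a mutation already seen from another replica.

    The seen-set is unbounded here; a long-running pipeline would need to
    expire old IDs (for example by time window).
    """
    seen = set()
    for record in records:
        mid = mutation_id(record)
        if mid in seen:
            continue
        seen.add(mid)
        yield record


def to_event(record: dict) -> dict:
    """Transform a raw log record into an application-consumable structure."""
    return {
        "id": mutation_id(record),
        "table": record["table"],
        "key": record["partition_key"],
        "ts": record["write_timestamp"],
        "values": record["values"],
    }


def run_pipeline(records: Iterable[dict], publish: Callable[[dict], None]) -> int:
    """De-duplicate, transform, and publish; return the number of events sent."""
    count = 0
    for record in dedupe(records):
        publish(to_event(record))  # stand-in for a streaming client send call
        count += 1
    return count
```

For example, feeding the same write as logged by two replicas plus one distinct write through `run_pipeline(records, sink.append)` would publish two events. Every application consuming CDC ends up rebuilding some variant of these stages, which is the duplication a database-native approach aims to remove.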


by The Linux Foundation
