Reducing Client Latency and Timeouts When Running Cassandra on Public Cloud – German Eichberger

January 31, 2024

Reducing Client Latency and Timeouts When Running Cassandra on Public Cloud – German Eichberger, Microsoft

Running Cassandra in a public cloud is popular and provides good performance. However, unlike in an on-premises data center, cloud providers often perform maintenance in the background, whether mitigating hardware problems, upgrading components or host operating systems, or rebalancing their fleet. During those times a virtual machine might be paused, its network or disk might be unavailable, or something else altogether might happen. Though Cassandra will eventually remove the node from the ring, in the meantime clients will experience increased latency when they are connected to this node or try to reach data replicated on it, often leading to database transactions failing and needing to be retried.

This talk will explore metrics that help detect incidents faster and how to turn them into actionable automation to speed up time to recovery. Using Microsoft Azure’s Scheduled Events as an example, it will show how these maintenance announcements can be used to mitigate latency events. It will also discuss best practices for configuring clients, such as speculative execution, to take advantage of this, reduce latency even further, and make latency more predictable.
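As a rough illustration of how maintenance announcements could feed automation, the sketch below polls the Azure Scheduled Events metadata endpoint and filters for events that target a given virtual machine. It is not taken from the talk: the function names, the choice of which event types count as disruptive, and the VM names in the sample are assumptions made for illustration, and what to do with a pending event (for example, draining the node ahead of time) is left to the operator.

```python
import json
import urllib.request

# Azure Instance Metadata Service endpoint for Scheduled Events.
# It is only reachable from inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)

# Event types treated as disruptive for a Cassandra node.
# This selection is an assumption; tune it for your workload.
DISRUPTIVE_TYPES = {"Freeze", "Reboot", "Redeploy", "Preempt", "Terminate"}


def fetch_scheduled_events(timeout_s=5.0):
    """Fetch the current Scheduled Events document from the metadata service."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def pending_disruptions(document, vm_name):
    """Return the events in `document` that are disruptive and target `vm_name`."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") in DISRUPTIVE_TYPES
        and vm_name in event.get("Resources", [])
    ]


if __name__ == "__main__":
    # Hand-written sample document (hypothetical VM names), so the filter
    # can be tried without access to the metadata service.
    sample = {
        "DocumentIncarnation": 1,
        "Events": [
            {"EventId": "a", "EventType": "Freeze", "Resources": ["cass-0"]},
            {"EventId": "b", "EventType": "Reboot", "Resources": ["cass-1"]},
        ],
    }
    for event in pending_disruptions(sample, "cass-0"):
        print(event["EventId"], event["EventType"])
```

On a real node, `fetch_scheduled_events()` would be called periodically and its result passed to `pending_disruptions` together with the node's own VM name.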


by The Linux Foundation
