Improving Bad Partition Handling in Apache Cassandra – Jordan West & Cheng Wang, Netflix
Reading and compacting bad partitions have long been known to hurt Cassandra performance, and bad partitions have been the root cause of various production issues at Netflix. While there are several potential solutions for addressing them at the implementation level, we must also deal with them today when they arise. Bad partitions take several forms, including: a) a partition that grows large in size (several GB or more); b) a partition with many (millions or more) small rows, potentially spread across many SSTables; c) a partition with many small rows, many of which have been deleted or expired; d) a partition whose rows are themselves very large (e.g. blobs of binary or text). In this talk, we present the approaches we use at Netflix to handle bad partitions when they arise. Specifically, we present how we identify, block, and mitigate them during production incidents. We will also share our ongoing efforts to improve some of the existing tools and to build new tools for the Cassandra community. Additionally, we will present examples from real production incidents.
by The Linux Foundation
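To make the four forms of bad partitions (a-d) in the abstract more concrete, here is a minimal sketch that tags a partition with the forms it matches, given a summary of its statistics. This is an illustration only, not the approach presented in the talk: the `PartitionStats` type, the `classify` function, and every threshold below are hypothetical and are not Cassandra defaults or Netflix values. In practice such numbers would come from sources like per-table partition-size histograms or compaction warnings, and a partition can match more than one form.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration. They are NOT
# Cassandra defaults and NOT values taken from the talk.
LARGE_PARTITION_BYTES = 1 * 1024**3   # form (a): "several GB or more"; flag at 1 GiB here
MANY_ROWS = 1_000_000                 # form (b): "millions or more" rows
DEAD_ROW_RATIO = 0.5                  # form (c): share of rows deleted or expired
LARGE_ROW_BYTES = 10 * 1024**2        # form (d): a single row holding a large blob


@dataclass
class PartitionStats:
    """Hypothetical per-partition summary; not a Cassandra API type."""
    size_bytes: int
    row_count: int
    dead_row_count: int   # rows that have been deleted or have expired
    max_row_bytes: int    # size of the largest single row


def classify(stats: PartitionStats) -> list[str]:
    """Return every bad-partition form (a)-(d) that this partition matches."""
    forms = []
    if stats.size_bytes >= LARGE_PARTITION_BYTES:
        forms.append("a: large partition")
    if stats.row_count >= MANY_ROWS:
        forms.append("b: many small rows")
        # Form (c) is a many-rows partition where much of the data is dead;
        # row_count is at least MANY_ROWS here, so the division is safe.
        if stats.dead_row_count / stats.row_count >= DEAD_ROW_RATIO:
            forms.append("c: many deleted/expired rows")
    if stats.max_row_bytes >= LARGE_ROW_BYTES:
        forms.append("d: very large rows")
    return forms
```

For example, a 3 GiB partition with 5 million rows, 4 million of them deleted or expired, and no row larger than 2 KiB matches forms (a), (b), and (c): `classify(PartitionStats(3 * 1024**3, 5_000_000, 4_000_000, 2048))` returns `["a: large partition", "b: many small rows", "c: many deleted/expired rows"]`. Returning a list rather than a single label reflects that the forms overlap.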