Kubernetes Rook/Ceph and Storage Failures
Whatever Can Go Wrong, Will Go Wrong: Rook/Ceph and Storage Failures
Imagine running a 200-node Kubernetes cluster and suddenly losing a node, or even a top-of-rack (ToR) switch. What is the state of the persistent storage your applications rely on? How can you make sure your storage is always available? How do you estimate and plan how long it takes for your storage to return to 100% resiliency? In this presentation we'll go over the basics of storage requirements (RPO/RTO), how the different replication types in Ceph affect recovery time, and how the failure of a component such as a drive, a node, or an entire cluster determines how long we are at risk. We'll include a live demo of Rook/Ceph recovering from a failed component, showing which Rook components are recreated, how Ceph behaves while those pods are recreated, and what the impact is on the application (in our case, MariaDB) while these failures occur.
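As background for the replication discussion, here is a minimal sketch of a Rook CephBlockPool with three-way replication spread across hosts; the pool name is illustrative and rook-ceph is assumed to be the operator namespace:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool        # illustrative name, not from the talk
  namespace: rook-ceph     # assumed default Rook namespace
spec:
  failureDomain: host      # place replicas on different nodes, so a single node loss keeps data available
  replicated:
    size: 3                # three copies; time back to full resiliency depends on how much data must be re-replicated

Applying this with kubectl lets the Rook operator create the pool; a StorageClass backed by it can then provide the PersistentVolumeClaim that an application such as MariaDB mounts.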
This video is part of a series in which we bring the best videos to you. We have made some modifications to the original video.
Speaker: Sagy Volkov, Red Hat
Publication Permissions: The original video was published under the Creative Commons Attribution license (reuse allowed).
Attribution Credits:
Source: https://youtu.be/e0_M1Keed6M
Please consider subscribing to the CNCF channel.
Join us on
Slack: https://join.slack.com/t/shrlrncom/shared_invite/zt-hllhjr28-hVYjXZ9WE8Byx4ZZS8zP8Q
Reddit: https://www.reddit.com/user/ShareLearn
Telegram: https://t.me/shrlrn
Facebook: https://www.facebook.com/shrlrn
Instagram: https://www.instagram.com/shrlrncom
Pinterest: https://www.pinterest.com/shrlrncom
Twitter: https://twitter.com/shrlrncom
by Share Learn