Kubernetes Failover Improvement: Non-Graceful Node Shutdown – Yuiko Mori, NEC Solution Innovators, Ltd.

Many companies now run Kubernetes in production, but some significant issues remain. One of them is that pods in a StatefulSet with persistent volumes (PVs) fail to migrate when a Kubernetes node goes down. We have faced this in our customer’s projects. When a worker node in a Kubernetes cluster goes down or enters a non-recoverable state, such as a hardware failure or a broken OS, the controller tries to terminate the pods on it and recreate them on different nodes. Up to v1.23, however, terminating those pods failed if they had PVs attached, because Kubernetes could not detach PVs from pods on failed nodes. As a workaround, many companies force-deleted the pods, but the pods should be deleted automatically in this case. To fix the issue, we developed a new feature, “Non-graceful node shutdown”. It graduated to GA in Kubernetes v1.28, which was released on August 15. In this session, I will introduce the issue and the “Non-graceful node shutdown” feature that solves it. I will also talk about future work and the “Self Node Remediation” project, whose development team we have joined and which will be one of the solutions for that future work.
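As a rough illustration of the difference between the force-deletion workaround and the new feature, the sketch below builds the corresponding kubectl command lines without running them. It is not taken from the talk: the helper function names and the node and pod names are hypothetical, and the taint shown (`node.kubernetes.io/out-of-service` with value `nodeshutdown` and effect `NoExecute`) follows the upstream Kubernetes documentation for this feature, which expects an operator to apply it only after confirming that the node is really down.

```python
# Illustrative sketch only: builds kubectl argument lists, does not run them.
# Function names and the node/pod names passed in are hypothetical.

# Taint documented upstream for non-graceful node shutdown.
OUT_OF_SERVICE_TAINT = "node.kubernetes.io/out-of-service=nodeshutdown:NoExecute"


def force_delete_pod_cmd(pod: str, namespace: str = "default") -> list[str]:
    """Older workaround: force-delete a pod stuck in Terminating."""
    return ["kubectl", "delete", "pod", pod, "-n", namespace,
            "--grace-period=0", "--force"]


def mark_node_out_of_service_cmd(node: str) -> list[str]:
    """Non-graceful node shutdown: taint a node that is confirmed down."""
    return ["kubectl", "taint", "nodes", node, OUT_OF_SERVICE_TAINT]


def clear_out_of_service_cmd(node: str) -> list[str]:
    """Remove the taint after the node recovers (a trailing '-' removes it)."""
    return ["kubectl", "taint", "nodes", node, OUT_OF_SERVICE_TAINT + "-"]


if __name__ == "__main__":
    # Print the command an operator might run for a failed node.
    print(" ".join(mark_node_out_of_service_cmd("worker-1")))
```

With the workaround, each stuck pod has to be deleted by hand; with the taint, a single node-level action lets Kubernetes delete the pods and detach their volumes so the StatefulSet pods can start on another node.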


by The Linux Foundation
