Unlocking the Full Potential of GPUs for AI Workloads on Kubernetes – Kevin Klues, NVIDIA

December 6, 2023

Dynamic Resource Allocation (DRA) is new Kubernetes feature that puts resource scheduling in the hands of 3rd-party developers. It moves away from the limited “countable” interface for requesting access to resources (e.g. “nvidia.com/gpu: 2”), providing an API more akin to that of persistent volumes. In the context of GPUs, this unlocks a host of new features without the need for awkward solutions shoehorned on top of the existing device plugin API. These features include: * Controlled GPU Sharing (both within a pod and across pods) * Multiple GPU models per node (e.g. T4 and A100) * Specifying arbitrary constraints for a GPU (min/max memory, device model, etc.) * Dynamic allocation of Multi-Instance GPUs (MIG) * … the list goes on … In this talk, you will learn about the DRA resource driver we have built for GPUs. We walk through each of the features it provides, and conclude with a series of demos showing you how you can get started using it today.

source

by CNCF [Cloud Native Computing Foundation]

linux foundation