
Efficient and Cross-Platform LLM Inference in the Heterogeneous Cloud – Michael Yuan, Second State

As AI/LLM applications gain popularity, there is increasing demand to run and scale them in the cloud. However, compared with traditional cloud workloads, AI workloads are heavily reliant on GPUs. Linux containers are not portable across different hardware devices, and traditional container management tools are not set up to re-compile applications on new devices at deployment time. Cloud-native Wasm provides a new portable bytecode format that abstracts away GPUs and hardware accelerators for these applications. With emerging W3C standards like WASI-NN, you can write and test LLM applications in Rust on your MacBook, and then deploy them on an Nvidia cloud server or an ARM NPU device without re-compilation or any change to the Wasm bytecode file. The Wasm apps can also be managed by existing container tools such as Docker, Podman, and K8s, making them a great alternative to Linux containers for this new workload. This talk will discuss how WasmEdge (a CNCF sandbox project) implements WASI-NN and supports a large array of AI/LLM applications. You will learn practical skills for building and running LLM applications on all your local, edge, and cloud devices using a single binary application.
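As a rough sketch of the workflow the abstract describes, the commands below compile a Rust LLM app once to Wasm bytecode and then run that same file either directly with WasmEdge or under Docker. The app and image names, the model file, and the target triple are placeholder assumptions, and the run commands assume WasmEdge's WASI-NN GGML plugin and a Wasm-enabled containerd shim are installed; treat this as an illustration, not the speaker's exact demo.

```shell
# Compile the Rust app once to portable Wasm bytecode
# (crate name and target triple are assumptions).
cargo build --target wasm32-wasip1 --release

# Run locally with WasmEdge; the WASI-NN GGML plugin selects a
# backend (Metal, CUDA, CPU, ...) on each device at run time.
# The model file name here is a placeholder.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:model.gguf \
  target/wasm32-wasip1/release/app.wasm

# The very same .wasm artifact, managed by Docker instead
# (assumes a Wasm-enabled Docker/containerd setup and a
# hypothetical image name).
docker run --runtime=io.containerd.wasmedge.v1 \
  --platform=wasi/wasm myorg/llm-app:latest
```

The key point the example illustrates is that only the host runtime changes between devices; the Wasm bytecode file itself is deployed unmodified.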


by The Linux Foundation
