Wasm Is Becoming the Runtime for LLMs – Michael Yuan, Second State
Today’s LLM apps, including inference apps and agents, are mostly written in Python. But this is about to change. Python is too slow, too bloated, and too complicated to install and manage. That’s why popular LLM frameworks, such as llama2.c, whisper.cpp, and llama.rs, all strive for zero Python dependencies. These post-Python LLM applications and frameworks are written in compiled languages (C/C++/Rust) and can be compiled into Wasm. With WASI-NN, you can now create complex LLM apps in Rust and run them in Wasm sandboxes. Rust and Wasm are high-performance, developer-friendly alternatives to Python today: the combination is faster, safer, and has a much smaller footprint for developing and running LLM apps. In this talk, Michael will demonstrate how to run the Llama 2 series of models in Wasm and how to develop LLM agents in Rust and run them in Wasm. In-production use cases, such as LLM-based code review and book-based learning assistants, will be discussed and demoed.
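As a taste of what the abstract describes, here is a minimal sketch of a WASI-NN inference app in Rust. It assumes the `wasmedge-wasi-nn` crate and a GGUF model preloaded by the WasmEdge runtime under the alias `default`; the crate, alias, and prompt format are assumptions for illustration, not necessarily what the talk itself uses.

```rust
// Minimal WASI-NN inference sketch (assumes the `wasmedge-wasi-nn` crate).
// The model is preloaded by the host runtime before the Wasm app starts, e.g.:
//   wasmedge --dir .:. \
//     --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
//     llama-infer.wasm
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    // Look up the preloaded GGML/GGUF model by its alias ("default" is assumed).
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");

    // Feed the prompt to the model as a UTF-8 byte tensor at input index 0.
    let prompt = "<s>[INST] What is WebAssembly? [/INST]";
    ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
        .expect("failed to set the input tensor");

    // Run inference; the GGML backend generates the completion.
    ctx.compute().expect("inference failed");

    // Read the generated text back from output index 0.
    let mut out = vec![0u8; 4096];
    let n = ctx.get_output(0, &mut out).expect("failed to read the output");
    println!("{}", String::from_utf8_lossy(&out[..n]));
}
```

Compiled with `cargo build --target wasm32-wasi --release`, the resulting `.wasm` binary is portable across machines and GPUs: the sandboxed app only speaks the WASI-NN interface, while the host runtime supplies the native inference backend.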
by CNCF [Cloud Native Computing Foundation]
Agree. I think data connector and transformation plugins are a big opportunity too.