
Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney

Mozilla’s Llamafile open source project democratizes access to AI not only by making open models easier to use, but also by making them run fast on consumer CPUs. Lead developer Justine Tunney will share the insights, tricks, and hacks that she and the project community are using to deliver these performance breakthroughs, and project leader Stephen Hood will discuss Mozilla’s approach to supporting open source AI.
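For readers who want to try this in practice: a llamafile downloaded from the project’s GitHub releases can be launched in server mode, which by default serves an OpenAI-compatible API on http://localhost:8080. The sketch below queries it from Python; the model name and prompt are illustrative placeholders, not from the talk.

```python
# Minimal sketch (assumption: a llamafile has already been downloaded,
# marked executable, and started in server mode, listening on the
# default http://localhost:8080).
import json
import urllib.request

payload = {
    # llamafile serves whichever model it was launched with;
    # the name here is an illustrative placeholder.
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Explain llamafile in one sentence."}
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

print(reply["choices"][0]["message"]["content"])
```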

Recorded live in San Francisco at the AI Engineer World’s Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World’s Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Stephen
Open source AI at Mozilla. Formerly of del.icio.us, Yahoo Search. Co-founder of Storium (AI-assisted storytelling game) and Blockboard.

About Justine
Justine is a founder of Mozilla’s Llamafile project, a Google Brain alumna, and the author of the Cosmopolitan C Library. She focuses on democratizing access to open source AI software while elevating its performance and quality.

Source: AI Engineer


33 thoughts on “Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney”

  • Hats off for the engineering feat. But in terms of application, we are still just talking about text summarization. And the image generation in your own demo was just as disappointing as ever. There's no killer app for LLMs yet even though we keep throwing money and science at it. What are we even doing?

  • He said, "Who remembers using the original Netscape Navigator?" … to that I say, who remembers using the original Mosaic browser? And then telnet before the graphical internet?

  • This is better than the Nvidia NIM solution (which is just containerization). Way better.

  • Awesome, exactly what I have been looking for: no more heavy virtual environments, no more heavy Nvidia CUDA drivers! Let's fricking go!!!

  • Now this is called an achievement. Meanwhile the so-called "open"AI is looting people. You guys are awesome

  • You just destroyed Nvidia's stock 😂

  • Freedom and justice are more expensive than money and power. No one lives and rules forever.
    Respect and salutes to you guys…

  • This is utterly brilliant. What a fantastic presentation. Amazing project.

  • I need an AI that can access the files on my hard drive. Does anyone have a suggestion? I don't want to upload them to the AI. I want the AI to access them directly.

  • This is so awesome! Just tried out llava 1.5 7b llamafile and it worked out of the box running on my CPU, without eating all of my RAM! The token generation speed was good enough for me! And my CPU is ~8 years old. Holy cow!

  • I really like the idea of a Threadripper configuration, but… does anyone have a reference machine configuration for that? I'd like to compare the price to existing alternatives like the dual RTX 4090 setup that was mentioned!

  • Well, this feels like something out of left field. 🤷‍♂️

    Seems too good to be true. What are the catches?

  • Such awesome work. I love that the enhancements are going back upstream to llama.cpp too. We really don’t want another Internet Explorer with Nvidia and OpenAI

  • I went to the llamafile GitHub page, downloaded one of the LLMs, and ran it on my OnePlus 11 smartphone. That's the first time I've been able to run an LLM on-device.

  • Refreshing indeed – tokens per second is one measure, and I like eval speed, but what does that measure and how do you measure it?

  • This is a game-changing breakthrough. There's no other way to put it.
