[Workshop] AI Engineering 201: Inference
Optional introductory course for AI Engineers, free for all Summit attendees. Covers advanced AI Engineering topics, led by instructor Charles Frye of the massively popular Full Stack LLM Bootcamp.
Part I: Running Inference
What is the workload?
Open vs Proprietary Models
Execution
End-User Device
Over a Network
Serving Inference
Timestamps
0:00:00 Intro & Overview
0:03:52 What is Inference?
0:10:16 Proprietary Models for Inference
0:21:22 Open Models for Inference
0:30:41 Will Open or Proprietary Models Win Long-Term?
0:36:19 Q&A on Models
0:44:12 Inference on End-User Devices
1:04:32 Inference-as-a-Service Providers
1:10:00 Cloud Inference and Serverless GPUs
1:17:46 Rack-and-Stack for Inference
1:20:12 Inference Arithmetic for GPUs
1:27:07 TPUs and Other Custom Silicon for Inference
1:36:11 Containerizing Inference and Inference Services
Corrections
1:16:08 — I looked into Cloudflare's GPU Workers, and they are what I call here "inference-as-a-service" (running specific models on your behalf), rather than what I call "serverless GPUs" (running arbitrary workloads for you with scale-to-zero pricing).
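For concreteness, here is a minimal sketch of that distinction. Every endpoint, name, and parameter below is hypothetical — this is not Cloudflare's or any provider's real API, just the shape of the two interaction patterns:

```python
import requests  # third-party HTTP client; all endpoints below are made up

# Inference-as-a-service: the provider runs a *specific* model for you.
# Your only degree of freedom is the input payload.
def call_hosted_model(prompt: str, token: str) -> str:
    resp = requests.post(
        "https://inference.example.com/v1/models/llama-2-7b/generate",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {token}"},
        json={"prompt": prompt, "max_tokens": 256},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["output"]

# Serverless GPUs: the provider runs *arbitrary* code you supply (usually
# packaged in a container), attaches a GPU on demand, and bills with
# scale-to-zero pricing. This handler is the kind of entrypoint you'd ship.
def gpu_handler(event: dict) -> dict:
    import torch  # your dependencies, your weights, your pre/post-processing
    model = torch.nn.Linear(4, 2).to("cuda")  # stand-in for a real model
    x = torch.tensor(event["inputs"], dtype=torch.float32, device="cuda")
    return {"outputs": model(x).tolist()}
```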
0:12:20 — I misplaced a decimal point on Anthropic's price; they were actually cheaper than OpenAI at the time. But with the new GPT-4 Turbo API announced at Dev Day, OpenAI is now in fact cheaper than Anthropic.
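As a back-of-the-envelope check with purely illustrative numbers (not the actual prices at the time), a one-place decimal slip is a 10x error — easily enough to flip a per-token price comparison:

```python
# Illustrative prices only -- NOT the real OpenAI/Anthropic rates.
anthropic = 0.008  # $/1K tokens (hypothetical)
openai    = 0.010  # $/1K tokens (hypothetical)
misread   = 0.080  # anthropic's price with the decimal off by one place

print(anthropic < openai)  # True:  read correctly, Anthropic is cheaper
print(misread < openai)    # False: the 10x slip reverses the conclusion
```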