[Workshop] AI Engineering 201: Inference
Optional introductory course for AI Engineers, free for all Summit attendees. Covers advanced AI Engineering topics, led by instructor Charles Frye of the massively popular Full Stack LLM Bootcamp.
Part I: Running Inference
What is the workload?
Open vs Proprietary Models
Execution
End-User Device
Over a Network
Serving Inference
Timestamps
0:00:00 Intro & Overview
0:03:52 What is Inference?
0:10:16 Proprietary Models for Inference
0:21:22 Open Models for Inference
0:30:41 Will Open or Proprietary Models Win Long-Term?
0:36:19 Q&A on Models
0:44:12 Inference on End-User Devices
1:04:32 Inference-as-a-Service Providers
1:10:00 Cloud Inference and Serverless GPUs
1:17:46 Rack-and-Stack for Inference
1:20:12 Inference Arithmetic for GPUs
1:27:07 TPUs and Other Custom Silicon for Inference
1:36:11 Containerizing Inference and Inference Services
Corrections
1:16:08 — I looked into Cloudflare's GPU Workers, and they are what I call here "inference-as-a-service" (running specific models on your behalf), rather than what I call "serverless GPUs" (running arbitrary workloads for you with scale-to-zero pricing).
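For concreteness, here is a minimal sketch of that distinction. Every endpoint, name, and parameter below is hypothetical — this is not Cloudflare's or any provider's real API, just the shape of the two interaction patterns:

```python
import requests  # third-party HTTP client; all endpoints below are made up

# Inference-as-a-service: the provider runs a *specific* model for you.
# Your only degree of freedom is the input payload.
def call_hosted_model(prompt: str, token: str) -> str:
    resp = requests.post(
        "https://inference.example.com/v1/models/llama-2-7b/generate",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {token}"},
        json={"prompt": prompt, "max_tokens": 256},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["output"]

# Serverless GPUs: the provider runs *arbitrary* code you supply (usually
# packaged in a container), attaches a GPU on demand, and bills with
# scale-to-zero pricing. This handler is the kind of entrypoint you'd ship.
def gpu_handler(event: dict) -> dict:
    import torch  # your dependencies, your weights, your pre/post-processing
    model = torch.nn.Linear(4, 2).to("cuda")  # stand-in for a real model
    x = torch.tensor(event["inputs"], dtype=torch.float32, device="cuda")
    return {"outputs": model(x).tolist()}
```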
0:12:20 — I misplaced a decimal point on Anthropic's price; they were actually cheaper than OpenAI at the time. But with the new GPT-4 Turbo API announced at Dev Day, OpenAI is now in fact cheaper than Anthropic.
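As a back-of-the-envelope check with purely illustrative numbers (not the actual prices at the time), a one-place decimal slip is a 10x error — easily enough to flip a per-token price comparison:

```python
# Illustrative prices only -- NOT the real OpenAI/Anthropic rates.
anthropic = 0.008  # $/1K tokens (hypothetical)
openai    = 0.010  # $/1K tokens (hypothetical)
misread   = 0.080  # anthropic's price with the decimal off by one place

print(anthropic < openai)  # True:  read correctly, Anthropic is cheaper
print(misread < openai)    # False: the 10x slip reverses the conclusion
```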