Retrieval Augmentation and Semantic Search at Scale – Ash Vardanian, Unum Cloud

Vector Search and Retrieval-Augmented Generation gained much attention in 2023. With thousands of open-source Embedding models on HuggingFace and dozens of Vector DBs on GitHub, there’s a lot to explore. Yet not all of them scale well, and the cost of running AI workloads adds up quickly.

This talk won’t be a tutorial. Instead, we’ll dive into technical benchmarks, spotlight the issues that block different solutions from scaling, and share lessons from multiple CLIP-like AI pre-training experiments and from serving over 10 billion vectors from a single machine.

We will cover the design decisions that went into the USearch and UForm open-source libraries, and answer questions such as: What is the optimal GPU for my inference workload? Can one serve search results from SSDs instead of RAM? And which tools will let me do that?

Source: The Linux Foundation
