The practical challenges of deploying LLMs in production.
The architecture of serving LLMs at scale.
Techniques like quantization to make models faster and smaller.
The key performance metrics for an LLM serving system.
A crucial optimization for speeding up text generation.
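To make the quantization item above concrete, here is a minimal sketch of symmetric int8 per-tensor quantization: weights are mapped to 8-bit integers with a single scale factor, cutting storage 4x versus float32 at the cost of a small rounding error. The function names and the toy tensor are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: one scale maps floats
    # onto the int8 range [-127, 127].
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32; the per-weight
# rounding error is bounded by half the scale factor.
max_error = np.max(np.abs(w - w_hat))
```

Real serving stacks usually quantize per-channel or per-group rather than per-tensor, which keeps the error lower for the same bit width.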
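For the metrics item, the three numbers most serving discussions revolve around are time to first token (TTFT), time per output token (TPOT), and throughput. A small sketch with a hypothetical request timeline (all timestamps below are made-up values for illustration):

```python
# Hypothetical timeline for one request, in seconds.
request_arrival = 0.00
first_token_time = 0.35    # when the first output token was emitted
completion_time = 2.75     # when the last output token was emitted
num_output_tokens = 120

# TTFT: how long the user waits before any output appears.
ttft = first_token_time - request_arrival

# TPOT: average gap between consecutive generated tokens
# (num_output_tokens - 1 gaps between num_output_tokens tokens).
tpot = (completion_time - first_token_time) / (num_output_tokens - 1)

# Throughput for this request, in output tokens per second.
throughput = num_output_tokens / (completion_time - request_arrival)
```

TTFT is dominated by prompt processing (prefill), while TPOT reflects the per-step decode cost, so the two often improve under different optimizations.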
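One widely used generation-time optimization (whether it is the one meant in the last item is an assumption) is KV caching: instead of recomputing attention keys and values for the entire prefix at every step, each step computes them only for the newest token and appends to a cache. A toy single-head sketch with made-up weights:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

d = 8
rng = np.random.default_rng(1)
# Hypothetical single-head attention projections, for illustration only.
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

k_cache, v_cache = [], []

def generate_step(x):
    # Only the new token's K/V is computed; the prefix's K/V is reused.
    q = x @ Wq
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)    # (seq_len, d), grows one row per step
    V = np.stack(v_cache)
    # Newest token attends over all cached positions.
    attn = softmax(K @ q / np.sqrt(d))
    return attn @ V

# Feed three tokens; each step does O(1) new K/V work, not O(seq_len).
tokens = rng.standard_normal((3, d))
outputs = [generate_step(t) for t in tokens]
```

The memory cost of this cache grows with sequence length and batch size, which is exactly why serving systems spend so much effort managing it.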