
Tokenization, Embeddings & Representation

How text is converted into a format LLMs can understand.

4 days

Topics in this Chapter

1. Subword Tokenization (BPE, WordPiece)

Solving the out-of-vocabulary problem with subword units.
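To make the idea concrete, here is a minimal sketch of the core BPE training loop: repeatedly find the most frequent adjacent pair of symbols and merge it into a new symbol. The corpus, function names, and omission of word-boundary markers are all simplifications for illustration, not how a production tokenizer is implemented.

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count every adjacent pair of symbols in the sequence
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace every occurrence of `pair` with its concatenation
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def bpe(tokens, num_merges):
    # Each iteration adds one new subword unit to the vocabulary
    for _ in range(num_merges):
        tokens = merge_pair(tokens, most_frequent_pair(tokens))
    return tokens

# Toy corpus as a flat character sequence
print(bpe(list("lowlowlower"), 2))  # ['low', 'low', 'low', 'e', 'r']
```

After two merges the frequent substring "low" has become a single unit, while the rarer suffix "er" is still split into characters, which is exactly how subword vocabularies cover unseen words.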

2. The Tokenizer Pipeline

From raw text to a list of integer IDs.
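The stages of that pipeline can be sketched with a toy, hand-written vocabulary: normalize the text, pre-tokenize it into pieces, then look each piece up to get an integer ID. The vocabulary and the regex split here are illustrative assumptions; real tokenizers learn the vocabulary and fall back to subwords rather than a single unknown token.

```python
import re

# Hypothetical toy vocabulary; real vocabularies hold tens of thousands of entries
VOCAB = {"<unk>": 0, "hello": 1, "world": 2, "!": 3}

def tokenize(text):
    # 1. Normalize: lowercase and trim whitespace
    text = text.lower().strip()
    # 2. Pre-tokenize: split words and punctuation apart (simplified)
    pieces = re.findall(r"\w+|[^\w\s]", text)
    # 3. Map each piece to its integer ID, falling back to <unk>
    return [VOCAB.get(p, VOCAB["<unk>"]) for p in pieces]

print(tokenize("Hello world!"))  # [1, 2, 3]
```

Any piece missing from the vocabulary maps to ID 0 here; subword tokenizers exist precisely to avoid that information loss.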

3. Embedding Matrices

The lookup table that maps token IDs to dense vectors.
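That lookup is literally a row-indexing operation on a matrix of shape (vocab_size, d_model). A minimal NumPy sketch, with toy sizes and random weights standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 8, 4                       # toy sizes
E = rng.standard_normal((vocab_size, d_model))   # the embedding matrix

token_ids = [3, 1, 3]
vectors = E[token_ids]   # row lookup: one d_model-dim vector per token
print(vectors.shape)     # (3, 4)
```

Note that both occurrences of token 3 get the exact same row, which is the limitation the next topic addresses.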

4. Contextual vs. Static Embeddings

How Transformers produce context-dependent representations, unlike the fixed per-word vectors of Word2Vec.
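The difference can be demonstrated with a toy "bank" example: a static table returns the same vector for "bank" in every sentence, while a single attention-style mixing step (a stand-in for a Transformer layer; real models add learned projections) makes the output depend on the surrounding words. All names and vectors below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"river": 0, "bank": 1, "money": 2}
E = rng.standard_normal((3, 4))   # static table: one vector per word type

def static_embed(words):
    return E[[vocab[w] for w in words]]

def contextual_embed(words):
    # Toy context mixing: each output is a softmax-weighted average of
    # all input vectors, scored by dot products (simplified self-attention)
    X = static_embed(words)
    scores = X @ X.T
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ X

a = contextual_embed(["river", "bank"])[1]
b = contextual_embed(["money", "bank"])[1]
# The static vector for "bank" is identical in both sentences;
# the contextual one is not.
print(np.allclose(a, b))  # False
```

Because the output for "bank" mixes in its neighbors, "river bank" and "money bank" yield different vectors, which is the essential property Word2Vec-style static embeddings lack.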

GeekDost - Roadmaps & Snippets for Developers