The lookup table that maps token IDs to dense vectors.
After the tokenizer has converted an input text into a sequence of integer IDs, the next step is to transform these IDs into meaningful vector representations. This is done with an embedding matrix: a large, learnable lookup table that is a core component of the language model.

The matrix has 'V' rows and 'D' columns, where 'V' is the size of the vocabulary (the total number of unique tokens the model knows) and 'D' is the dimensionality of the embedding space (a hyperparameter, often 768, 1024, or larger for LLMs). Each row corresponds to a unique token ID from the vocabulary and is a dense vector of 'D' floating-point numbers, known as the 'embedding vector' for that token.

When the model receives a sequence of token IDs as input, it performs a simple lookup: for each ID in the sequence, it retrieves the corresponding row (the embedding vector) from the embedding matrix. The result is that the sequence of integer IDs is replaced by a sequence of dense vectors. For example, an input sequence of length 'L' is transformed from a vector of shape [L] into a matrix of shape [L, D].

Crucially, the embedding matrix is not static. Its values are initialized randomly at the start of training and are treated as model parameters, so backpropagation updates them along with the rest of the network. Over the course of training, the model learns representations that place semantically similar tokens closer together in the embedding space, and the learned matrix thus encodes a rich picture of the relationships between the tokens in the vocabulary.
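To make the lookup concrete, below is a minimal sketch using PyTorch's nn.Embedding. The vocabulary size, embedding dimensionality, and token IDs are arbitrary illustrative values, not taken from any particular model.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: V = vocabulary size, D = embedding dimensionality.
V, D = 50_000, 768

# The embedding matrix: a learnable [V, D] lookup table, randomly initialized.
embedding = nn.Embedding(num_embeddings=V, embedding_dim=D)

# A toy input: a sequence of L = 5 token IDs produced by a tokenizer.
token_ids = torch.tensor([15, 2837, 401, 7, 19999])  # shape [L]

# The lookup: each ID selects its corresponding row of the matrix.
vectors = embedding(token_ids)                        # shape [L, D]
print(vectors.shape)  # torch.Size([5, 768])

# The matrix is a model parameter, so backpropagation updates it
# during training like any other weight.
print(embedding.weight.requires_grad)  # True
```

The lookup is equivalent to indexing the [V, D] weight matrix with the ID sequence, which is why the output shape is simply the input shape with a trailing dimension of D appended.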