Defining large language models by their scale, architecture, and training.
A Large Language Model (LLM) is a type of artificial intelligence model designed to understand, generate, and interact with human language. What makes them 'large' are three key dimensions: the size of the model itself (the number of internal parameters, or 'weights'), the vast amount of data it is trained on, and the immense computational resources required for its training.

At a technical level, an LLM is a deep neural network, almost universally based on the Transformer architecture. Its fundamental task is next-token prediction. During training, it is fed an enormous corpus of text from the internet, books, and other sources, and its sole objective is to predict the next word (or token) in a sequence. For example, given the input 'The cat sat on the', the model learns to predict 'mat' with high probability. By performing this simple task billions of times over trillions of words, the model learns remarkably complex patterns, including grammar, syntax, semantics, factual knowledge, and even reasoning abilities.

The key breakthrough is that at a certain scale, these quantitative improvements in prediction lead to qualitative leaps in capability, known as 'emergent abilities.' Smaller models might learn grammar, but only very large models can perform tasks they were never explicitly trained on, such as writing poetry, translating languages, or generating computer code, simply by being prompted in natural language. LLMs are not databases or search engines; they do not 'look up' answers. Instead, they generate responses probabilistically, token by token, based on the patterns learned from their training data.
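To make next-token prediction concrete, the minimal sketch below inspects the probability distribution a trained model assigns to possible continuations of 'The cat sat on the'. It assumes the Hugging Face transformers library and the small GPT-2 model purely for illustration; the section itself does not prescribe any particular model or toolkit.

```python
# A minimal sketch of next-token prediction, assuming the Hugging Face
# `transformers` library and the small GPT-2 model purely for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The cat sat on the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The logits at the last position score every token in the vocabulary as a
# possible continuation; softmax turns those scores into probabilities.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Show the five most likely next tokens and their probabilities.
top_probs, top_ids = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id)):>10s}  {prob.item():.3f}")
```

Generating a full response is just this step repeated: a token is chosen from the distribution, appended to the input, and the model is run again, producing text token by token rather than looking anything up.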