Aligning LLMs with human values and ensuring safe outputs.
Introduction to agents, environments, actions, states, and rewards.
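The agent–environment interaction above can be sketched as a toy loop; the environment, policy, and reward here are illustrative stand-ins (not any specific RL library), assuming only standard Python:

```python
import random

def run_episode(steps: int = 5, seed: int = 0) -> float:
    """Minimal agent-environment loop: states, actions, rewards."""
    rng = random.Random(seed)
    state = 0                                  # initial state
    total_reward = 0.0
    for _ in range(steps):
        action = rng.choice([-1, 1])           # agent selects an action
        state = state + action                 # environment transitions to a new state
        reward = 1.0 if state > 0 else 0.0     # environment emits a reward signal
        total_reward += reward
    return total_reward

print(run_episode())
```

The agent observes a state, chooses an action, and receives a reward; learning algorithms adjust the action choices to increase the total reward over an episode.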
Training a reward model to predict which of two candidate responses humans prefer.
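A common way to train such a reward model is a pairwise (Bradley–Terry style) loss: the human-preferred response should receive a higher score than the rejected one. A minimal sketch in plain Python, with the scores as placeholder inputs rather than real model outputs:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    # Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).
    # Minimizing it pushes the chosen response's score above the rejected one's.
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ordering (chosen outscores rejected) gives a small loss:
print(round(preference_loss(2.0, 0.0), 4))   # → 0.1269
# Reversed ordering gives a large loss:
print(round(preference_loss(0.0, 2.0), 4))   # → 2.1269
```

In practice the two scores come from the same model evaluated on both responses, and the loss is averaged over a dataset of human preference pairs.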
Using RL to fine-tune the LLM so its outputs maximize the reward model's score.
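The RL fine-tuning step typically maximizes the reward-model score minus a KL penalty that keeps the tuned policy close to the original model. A toy per-sample version of that objective, assuming the log-probabilities and reward are given as plain numbers:

```python
def rlhf_objective(reward: float, logp_policy: float, logp_ref: float,
                   beta: float = 0.1) -> float:
    # objective = r(x, y) - beta * (log pi(y|x) - log pi_ref(y|x))
    # The second term approximates a KL penalty: it discourages the
    # fine-tuned policy from drifting far from the reference model.
    return reward - beta * (logp_policy - logp_ref)

# Same reward, but the second response has drifted further from the
# reference model, so its penalized objective is lower:
print(rlhf_objective(1.0, -2.0, -2.5))   # small drift → 0.95
print(rlhf_objective(1.0, -1.0, -2.5))   # larger drift → 0.85
```

The penalty coefficient `beta` trades off reward maximization against staying close to the reference model; without it, the policy can exploit flaws in the reward model ("reward hacking").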
Broader concepts of making AI helpful, honest, and harmless.