RLHF (Reinforcement Learning from Human Feedback) & Safety

Aligning LLMs with human values and ensuring safe outputs.

Estimated time: 5 days

Topics in this Chapter

1. Reinforcement Learning Basics
Introduction to agents, environments, actions, states, and rewards (a minimal agent-environment loop is sketched after this list).

2. Reward Modeling
Training a model to predict human preferences (a pairwise preference-loss sketch follows the list).

3. Policy Optimization (PPO)
Using RL to fine-tune the LLM to maximize the reward score (a clipped-objective sketch follows the list).

4. AI Safety & Alignment
Broader concepts of making AI helpful, honest, and harmless.
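
A minimal, self-contained sketch of the agent-environment loop from topic 1. The ToyEnvironment and RandomAgent classes are illustrative inventions, not part of any RL library; the point is only to show states, actions, and rewards flowing between an agent and an environment.

```python
import random

class ToyEnvironment:
    """A 1-D corridor: the agent starts at position 0 and is rewarded for reaching +3."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (step left) or +1 (step right)
        self.state += action
        reward = 1.0 if self.state == 3 else 0.0   # reward signal from the environment
        done = self.state in (3, -3)               # episode ends at either wall
        return self.state, reward, done

class RandomAgent:
    """Placeholder policy: chooses an action uniformly at random, ignoring the state."""
    def act(self, state):
        return random.choice([-1, +1])

env, agent = ToyEnvironment(), RandomAgent()
state, done, episode_return = 0, False, 0.0
while not done:
    action = agent.act(state)                 # agent picks an action given the current state
    state, reward, done = env.step(action)    # environment returns next state, reward, done flag
    episode_return += reward
print("episode return:", episode_return)
```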

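For reward modeling (topic 2), a common training signal is a pairwise preference loss: the scalar reward assigned to the human-preferred ("chosen") response should exceed the reward of the rejected one. The sketch below assumes PyTorch; the tensor names and toy values are illustrative, not taken from any specific codebase.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: scalar rewards a hypothetical reward model assigned to three
# chosen/rejected response pairs for the same prompts.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, -0.5])
print(reward_model_loss(chosen, rejected))   # smaller when chosen rewards exceed rejected ones
```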
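
For PPO (topic 3), the core update is the clipped surrogate objective, which keeps each policy step close to the previous policy; RLHF pipelines typically combine this with a KL penalty toward the reference model, which is omitted here. Function and variable names, and the clip value of 0.2, are illustrative assumptions.

```python
import torch

def ppo_clipped_objective(logprobs_new: torch.Tensor,
                          logprobs_old: torch.Tensor,
                          advantages: torch.Tensor,
                          clip_eps: float = 0.2) -> torch.Tensor:
    """PPO clipped surrogate loss (to be minimized)."""
    ratio = torch.exp(logprobs_new - logprobs_old)                  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                    # pessimistic bound, negated

# Toy usage with per-token log-probabilities and advantage estimates.
logprobs_new = torch.tensor([-1.0, -0.7, -2.1])
logprobs_old = torch.tensor([-1.1, -0.9, -2.0])
advantages = torch.tensor([0.5, -0.2, 1.0])
print(ppo_clipped_objective(logprobs_new, logprobs_old, advantages))
```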