RLHF (Reinforcement Learning from Human Feedback)
A training method in which human evaluators rank model outputs, a reward model is fitted to those rankings, and the model is then fine-tuned with reinforcement learning to maximize the learned reward. Used for alignment: making models helpful, safe, and instruction-following.
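A toy sketch of the two learned stages in PyTorch, assuming single-token "responses" and randomly generated preference data so it stays self-contained; all names here are hypothetical, and production RLHF uses large transformers and PPO with a KL penalty to a reference model rather than plain REINFORCE:

```python
# Minimal RLHF sketch: (1) fit a reward model on human preference pairs,
# (2) optimize the policy against the learned reward. Toy scale throughout.
import torch
import torch.nn.functional as F

vocab, hidden = 100, 32

policy = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden),
                             torch.nn.Linear(hidden, vocab))
reward_model = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden),
                                   torch.nn.Linear(hidden, 1))

# Stage 1: reward modeling. Humans label pairs (chosen preferred over
# rejected); the Bradley-Terry loss pushes reward(chosen) > reward(rejected).
chosen = torch.randint(0, vocab, (64,))    # stand-in for preferred outputs
rejected = torch.randint(0, vocab, (64,))  # stand-in for dispreferred outputs
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(100):
    margin = reward_model(chosen) - reward_model(rejected)
    rm_loss = -F.logsigmoid(margin).mean()
    rm_opt.zero_grad(); rm_loss.backward(); rm_opt.step()

# Stage 2: RL fine-tuning. Sample from the policy, score samples with the
# frozen reward model, and do a REINFORCE-style policy-gradient update.
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    prompt = torch.randint(0, vocab, (32,))
    dist = torch.distributions.Categorical(logits=policy(prompt))
    action = dist.sample()
    reward = reward_model(action).squeeze(-1).detach()  # learned, not human
    pg_loss = -(dist.log_prob(action) * reward).mean()
    pi_opt.zero_grad(); pg_loss.backward(); pi_opt.step()
```

The key point the sketch illustrates is that humans are only in the loop for Stage 1; Stage 2 scales because the reward model substitutes for per-sample human judgment.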