RLHF (Reinforcement Learning from Human Feedback)
A training method in which human evaluators rank model outputs, a reward model is fitted to those rankings, and the model is then fine-tuned with reinforcement learning to maximize the learned reward. Used for alignment: making models helpful, safe, and instruction-following.
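A toy sketch of the two learned stages in PyTorch, assuming single-token "responses" and randomly generated preference data so it stays self-contained; all names here are hypothetical, and production RLHF uses large transformers and PPO with a KL penalty to a reference model rather than plain REINFORCE:

```python
# Minimal RLHF sketch: (1) fit a reward model on human preference pairs,
# (2) optimize the policy against the learned reward. Toy scale throughout.
import torch
import torch.nn.functional as F

vocab, hidden = 100, 32

policy = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden),
                             torch.nn.Linear(hidden, vocab))
reward_model = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden),
                                   torch.nn.Linear(hidden, 1))

# Stage 1: reward modeling. Humans label pairs (chosen preferred over
# rejected); the Bradley-Terry loss pushes reward(chosen) > reward(rejected).
chosen = torch.randint(0, vocab, (64,))    # stand-in for preferred outputs
rejected = torch.randint(0, vocab, (64,))  # stand-in for dispreferred outputs
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(100):
    margin = reward_model(chosen) - reward_model(rejected)
    rm_loss = -F.logsigmoid(margin).mean()
    rm_opt.zero_grad(); rm_loss.backward(); rm_opt.step()

# Stage 2: RL fine-tuning. Sample from the policy, score samples with the
# frozen reward model, and do a REINFORCE-style policy-gradient update.
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    prompt = torch.randint(0, vocab, (32,))
    dist = torch.distributions.Categorical(logits=policy(prompt))
    action = dist.sample()
    reward = reward_model(action).squeeze(-1).detach()  # learned, not human
    pg_loss = -(dist.log_prob(action) * reward).mean()
    pi_opt.zero_grad(); pg_loss.backward(); pi_opt.step()
```

The key point the sketch illustrates is that humans are only in the loop for Stage 1; Stage 2 scales because the reward model substitutes for per-sample human judgment.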