Alignment

Safety

The process of adjusting an AI model so its behavior matches human values and intentions. Techniques include reinforcement learning from human feedback (RLHF), Constitutional AI, and other safety methods.
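At the core of RLHF is a reward model trained on human preference comparisons between responses. A minimal sketch of the standard Bradley-Terry preference loss (function and variable names are illustrative, not from any specific library):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair.

    The reward model should assign a higher score to the response a
    human preferred (r_chosen) than to the one they rejected
    (r_rejected); the loss shrinks as that margin grows.
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)), written out explicitly
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

A model that scores both responses equally incurs a loss of ln 2, and the loss decreases as the preferred response is scored further above the rejected one.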
