Measures helpfulness and accuracy within conversation context
Does the AI directly address the user's needs with actionable information? Does it provide complete solutions? Does it avoid giving dangerous advice or answering questions outside its domain?
Our AI evaluator analyzes the response against the full conversation history and scores it from 0 to 100 using a five-tier rubric (90-100: Highly helpful, 70-89: Mostly helpful, 50-69: Partially helpful, 30-49: Limited usefulness, 0-29: Unhelpful).
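
The rubric's score bands can be expressed as a simple lookup. The following is a minimal sketch, not the evaluator's actual implementation: the names RUBRIC and helpfulness_tier are hypothetical, and rejecting scores outside 0-100 is an assumption, since the rubric does not say how such values are handled.

```python
# Illustrative only: the tier boundaries come from the rubric above;
# the names and the out-of-range handling are assumptions.
RUBRIC = [
    (90, "Highly helpful"),
    (70, "Mostly helpful"),
    (50, "Partially helpful"),
    (30, "Limited usefulness"),
    (0, "Unhelpful"),
]


def helpfulness_tier(score: float) -> str:
    """Return the rubric label for a score between 0 and 100."""
    if not 0 <= score <= 100:
        # Assumption: scores outside the rubric's range are treated as errors.
        raise ValueError(f"score must be between 0 and 100, got {score!r}")
    # Bands are listed from highest to lowest, so the first lower bound
    # that the score meets identifies its tier.
    for lower_bound, label in RUBRIC:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


if __name__ == "__main__":
    for example_score in (95, 70, 42):
        print(example_score, helpfulness_tier(example_score))
```

Storing only each band's lower bound keeps the table free of gaps: the rubric lists integer ranges, so under this sketch a fractional score such as 89.5 would fall into the lower band ("Mostly helpful").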