Appropriate certainty vs. hedging
Does the AI use appropriate levels of certainty? Are confident claims backed by conversation evidence? Is uncertainty acknowledged with phrases like 'I'll verify' or 'typically'?
Overconfident AI responses erode trust when claims turn out wrong. Scores below 50 often indicate the AI is making definitive claims about policies, features, or specifics that weren't discussed in the conversation.