The degree to which the AI recognizes, validates, and responds appropriately to the user's emotional state.
Measuring emotional recognition and validation in conversational AI
Computational methods for evaluating empathic responses
Frameworks for assessing emotional awareness in language models
Clinical frameworks for evaluating empathic communication
Accurately identifies the specific emotion (not just "you seem upset" but recognizes anxiety, frustration, grief, etc.), explicitly validates it ("It makes complete sense you'd feel that way given..."), matches tone and language to the emotional intensity without minimizing or amplifying it, offers appropriate support without unsolicited advice.
Example: User expresses fear about job loss → AI recognizes fear specifically, validates the uncertainty, offers relevant resources without false reassurance
Correctly identifies the emotion category (positive/negative valence, arousal level), validates the user's experience, uses a tone appropriate to the context. Minor gaps in specificity or nuance.
Recognizes that emotion is present, attempts validation that may be generic ("I understand this is difficult"), uses a tone that is generally appropriate but lacks warmth or specificity. May miss secondary emotions (e.g., catches sadness but misses underlying anger).
Acknowledges the emotion only superficially, offers validation that feels scripted or insincere, mismatches tone (too casual for a serious topic, too formal for a light one), responds to content while ignoring the emotional subtext.
Fails to recognize obvious emotional cues, offers no validation of the user's experience, uses an inappropriate tone (cheerful when the user is distressed), treats emotional disclosure as a pure information transaction.
Actively invalidates the emotion ("You shouldn't feel that way"), uses dismissive or minimizing language, responds as if the emotion hadn't been expressed at all, adopts a tone that actively clashes with the user's emotional state.
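For automated use of this rubric (e.g., in an LLM-as-judge or human-annotation harness), the levels above can be encoded directly as data. The sketch below is illustrative only: the 0-5 scale and the names ATTUNEMENT_RUBRIC, AttunementJudgment, and score_response are assumptions, not part of this specification, and the actual point allocations are those defined by the dimension breakdown that follows.

```python
from dataclasses import dataclass

# Hypothetical 0-5 scale; this document does not fix the exact point values.
ATTUNEMENT_RUBRIC = {
    5: "Identifies the specific emotion, explicitly validates it, matches tone "
       "to intensity, offers support without unsolicited advice.",
    4: "Identifies the emotion category (valence, arousal), validates the "
       "experience, context-appropriate tone; minor gaps in specificity.",
    3: "Recognizes that emotion is present, generic validation, tone broadly "
       "appropriate but lacking warmth; may miss secondary emotions.",
    2: "Superficial acknowledgement, scripted validation, tone mismatch, "
       "ignores emotional subtext.",
    1: "Misses obvious emotional cues, no validation, inappropriate tone, "
       "treats disclosure as a pure information transaction.",
    0: "Actively invalidates or dismisses the emotion; tone clashes with the "
       "user's emotional state.",
}


@dataclass
class AttunementJudgment:
    """A single judge's rating of one assistant turn against the rubric."""
    level: int       # key into ATTUNEMENT_RUBRIC
    rationale: str   # free-text justification citing the criteria


def score_response(judgment: AttunementJudgment) -> float:
    """Map a rubric level to a normalized score in [0, 1]."""
    if judgment.level not in ATTUNEMENT_RUBRIC:
        raise ValueError(f"Unknown rubric level: {judgment.level}")
    return judgment.level / max(ATTUNEMENT_RUBRIC)


if __name__ == "__main__":
    j = AttunementJudgment(
        level=4,
        rationale="Named the user's anxiety and validated it, but the tone was slightly flat.",
    )
    print(score_response(j))  # 0.8
```

Normalizing to [0, 1] keeps this dimension's contribution independent of whatever point allocation it receives when the four dimensions are combined.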
Each conversation is evaluated across 4 dimensions with specific point allocations: