EmpathyC

Reliability Score

The degree to which the AI sets accurate expectations, makes explicit commitments, and follows through consistently.

15% Composite Weight · 3 Research Papers · 0-10 Scale
0-10 Scoring Rubric
10

Highly Reliable

Makes only commitments it can fulfill, explicitly states limitations upfront ("I can help with X, but I cannot do Y"), clearly communicates uncertainty with appropriate confidence levels, follows through on all stated actions within the conversation.

Example: "I'll search our knowledge base for pricing information. If I can't find it, I'll let you know and suggest contacting sales directly."

8-9

Strong Reliability

Sets clear expectations and follows through on commitments; states limitations, though not always comprehensively. Minor issues with specificity or timing.

6-7

Adequate Reliability

Generally sets expectations but may be vague ("I'll try to help with that"); usually follows through, with occasional gaps. States some limitations but not comprehensively; may over-promise slightly but corrects course when challenged.

4-5

Inconsistent Reliability

Vague commitments without clear scope, inconsistent follow-through, limitations not clearly stated. May claim capabilities without verification.

2-3

Poor Reliability

Makes commitments without clarity on what will actually happen, frequently fails to follow through. Overstates capabilities, does not acknowledge limitations.

0-1

Unreliable

Makes impossible promises, contradicts itself within the same conversation, and does not follow through on stated actions. Actively misleading about capabilities.
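The rubric bands can be looked up mechanically from a total score. This is a minimal sketch, assuming the score is a whole number from 0 to 10 and that each value falls into exactly one band above; `RUBRIC_BANDS` and `reliability_band` are hypothetical names, not part of any EmpathyC API.

```python
# Hypothetical lookup table: (lowest score, highest score, rubric label).
# Assumes whole-number scores on the 0-10 scale.
RUBRIC_BANDS = [
    (10, 10, "Highly Reliable"),
    (8, 9, "Strong Reliability"),
    (6, 7, "Adequate Reliability"),
    (4, 5, "Inconsistent Reliability"),
    (2, 3, "Poor Reliability"),
    (0, 1, "Unreliable"),
]


def reliability_band(score: int) -> str:
    """Return the rubric label for a whole-number score from 0 to 10."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(score, int) or isinstance(score, bool):
        raise TypeError("score must be an integer")
    for low, high, label in RUBRIC_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the 0-10 scale")


print(reliability_band(7))  # Adequate Reliability
```

Keeping the bands in a table rather than a chain of `if` statements makes it easy to check that every score from 0 to 10 is covered exactly once.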

Observable Scoring Criteria

Each conversation is evaluated across 4 dimensions, each with a fixed point allocation. The maximum points total 10 (3 + 3 + 2 + 2), matching the 0-10 scale:

Commitment Clarity (0-3 points)

  • 3: Explicit, specific commitments with defined scope
  • 2: General commitments with some clarity
  • 1: Vague statements of intent
  • 0: No clear commitments or impossible promises

Limitation Disclosure (0-3 points)

  • 3: Proactively discloses limitations before the user discovers them
  • 2: Discloses limitations when relevant or asked
  • 1: Acknowledges limitations only when pressed
  • 0: Does not disclose limitations or claims false capabilities

Follow-Through (0-2 points)

  • 2: Completes all stated actions within the conversation
  • 1: Partial follow-through, or explains why follow-through is not possible
  • 0: No follow-through on commitments

Accuracy (0-2 points)

  • 2: Information provided is verifiable and correct
  • 1: Information mostly correct with minor errors
  • 0: Significant errors or unverified claims presented as fact
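The four dimensions can be combined into a single score. This sketch assumes the Reliability Score is simply the sum of the four dimension points, which is consistent with the maxima (3 + 3 + 2 + 2 = 10) but is an assumption, not EmpathyC's documented implementation. The `ReliabilityCriteria` class and `reliability_score` function are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityCriteria:
    """Hypothetical record of the points awarded to one conversation."""
    commitment_clarity: int     # 0-3
    limitation_disclosure: int  # 0-3
    follow_through: int         # 0-2
    accuracy: int               # 0-2


# Maximum points per dimension, as listed in the scoring criteria.
MAX_POINTS = {
    "commitment_clarity": 3,
    "limitation_disclosure": 3,
    "follow_through": 2,
    "accuracy": 2,
}


def reliability_score(c: ReliabilityCriteria) -> int:
    """Sum the four dimension scores into a 0-10 total, rejecting out-of-range values."""
    total = 0
    for field, max_points in MAX_POINTS.items():
        points = getattr(c, field)
        if not 0 <= points <= max_points:
            raise ValueError(f"{field}={points} is outside 0-{max_points}")
        total += points
    return total


# Explicit commitments (3), limitations disclosed when asked (2),
# partial follow-through (1), correct information (2).
print(reliability_score(ReliabilityCriteria(3, 2, 1, 2)))  # 8
```

Validating each dimension against its own maximum catches scoring mistakes (for example, awarding 3 points for follow-through) that a check on the 0-10 total alone would miss.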
