Taxonomy
The 6-layer classification framework
Every resolution packet is classified across six orthogonal layers. The agent must get each one right. Partial credit comes from consistency and rubric checks, but the primary signal is exact match against ground truth.
Issue Type
The primary classification of what went wrong: pricing mismatch, metering discrepancy, outage, maintenance exclusion, customer-caused failure, degraded performance, or ambiguous case.
Root Cause
What specifically caused the issue: a misconfigured pricing tier, a telemetry gap, a maintenance window overlap, a customer misconfiguration, etc.
Customer Impact
The financial, operational, or contractual impact on the customer: overbilling, underbilling, service degradation, missed SLA, or no material impact.
Contractual Applicability
Whether a credit, refund, SLA adjustment, or no action is contractually warranted based on the terms and evidence.
Owner
Which team should own the resolution: finance, RevOps, engineering, shared ownership, or escalation to legal.
Next Action
The concrete next step: issue credit, adjust configuration, escalate, schedule review, or close with no action.
Scoring
Deterministic composite scoring
Exact Match
Each taxonomy field is compared against ground truth. All-or-nothing per field: the agent gets credit only for exact matches.
Consistency
Cross-field logical checks: does the owner match the issue type? Does the action match the contractual finding? Catches plausible but internally contradictory outputs.
Rubric
Structured rubric checks on the notes: are they bounded, factual, and non-hallucinated? Do they avoid overstepping into billing logic or legal interpretation?
Example
How a case flows through the eval
Messy Input
A customer reports being overcharged. The CRM shows a recent tier upgrade, billing shows the old rate, usage telemetry shows consumption above the old tier limit, and the contract has a 30-day pricing lock clause.
What the Agent Should Notice
The pricing config was updated in the CRM but not propagated to billing. The 30-day lock clause means the customer should have been billed at the old rate for the remaining lock period. The overcharge is real but partial, only the delta above the locked rate.
Correct Output
Scoring
5/5 exact matches (70% weight) + all consistency checks pass (20% weight) + notes are bounded and factual (10% weight) = 100% composite.