InfraResolution Bench

Case Explorer

Short GPU node failure does not trigger credit

A short GPU node failure caused customer-visible job failures, but the incident stayed below contract thresholds and does not warrant a credit or goodwill path.

Evidence Packet

CRM Record

Account: Argent Models

Tier: enterprise

Plan: Committed-220

Billing Owner: revops@prime.example

SLA Tier: standard-covered-service

Standard enterprise terms without special goodwill commitments.

Billing Record

Plan: Committed-220

Invoice Preview: $17,690

Credits Applied: $0

Burst GPU Hours: 0

No billing anomaly detected.

Usage & Telemetry

Window: 2026-03-01 to 2026-03-31

GPU Hours: 214

Meter Status: healthy

A single GPU node failure caused several jobs to fail, but the incident resolved within 11 minutes.

Anomalies: Jobs pinned to one GPU pool failed and were rescheduled after node replacement.

Incident Record

Status: resolved

Service: managed-training-api

Duration: 11m

Customer Visible: Yes

A subset of jobs failed during an 11-minute node incident.

A GPU node failure in one pool caused temporary job failures until workloads were rescheduled.

Customer Note

Several jobs failed during a brief platform incident. Please confirm whether anything applies commercially.

Policy Snippet

Covered service credits apply only when customer-visible outages or covered delays exceed 30 minutes, or when monthly uptime thresholds are breached.
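
The policy snippet reduces to a simple threshold check. The sketch below is illustrative only: the names `Incident`, `credit_applies`, and `monthly_uptime_breached` are hypothetical and not part of the benchmark, and the monthly uptime threshold is passed in as a boolean because the snippet does not state its value. Covered delays are treated the same way as outage duration here, which is an assumption.

```python
from dataclasses import dataclass

# Threshold taken from the policy snippet: credits apply only above 30 minutes.
CREDIT_THRESHOLD_MINUTES = 30


@dataclass
class Incident:
    """Hypothetical record holding only the fields the policy snippet needs."""
    duration_minutes: int
    customer_visible: bool


def credit_applies(incident: Incident, monthly_uptime_breached: bool = False) -> bool:
    """Return True if the policy snippet would trigger a covered service credit."""
    # A customer-visible outage must strictly exceed the 30-minute threshold.
    exceeds_duration = (
        incident.customer_visible
        and incident.duration_minutes > CREDIT_THRESHOLD_MINUTES
    )
    # The uptime breach is an independent trigger (assumed to be computed elsewhere).
    return exceeds_duration or monthly_uptime_breached


# The incident in this case: 11 minutes, customer visible, no uptime breach recorded.
incident = Incident(duration_minutes=11, customer_visible=True)
print(credit_applies(incident))  # False
```

With the values from the Incident Record (11 minutes, customer visible), the check returns False, which matches the no_credit_due outcome recorded for this case.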

Ground Truth

issue_type: incident_impact_review
root_cause: gpu_node_failure
customer_impact: job_failure
contractual_applicability: no_credit_due
discrepancy_detected: false
recommended_owner: engineering_owner
recommended_action: send_explanation_only
needs_human_review: false
confidence: high
adjudication_notes: ["The incident is cleanly attributable, but commercial relief is not triggered because duration stayed below threshold and no special account context is present."]
reference_customer_note: We confirmed a brief platform incident caused the job failures, but the event stayed below the contract’s service-credit threshold, so no credit is due. We can share the incident summary and remediation details.
reference_internal_note: Owner: engineering_owner. Action: send_explanation_only because root cause is gpu_node_failure, customer impact is job_failure, and contractual outcome is no_credit_due.