InfraResolution Bench

Case Explorer

Job failure traced to invalid checkpoint format

Customer attributes a job failure to the platform, but engineering evidence shows an invalid checkpoint format provided by the customer.

Evidence Packet

CRM Record

Account: Helix Models

Tier: standard

Plan: OnDemand

Billing Owner: revops@prime.example

SLA Tier: standard-covered-service

Self-serve customer on standard terms.

Billing Record

Plan: OnDemand

Invoice Preview: $980

Credits Applied: $0

No billing anomalies present.

Usage & Telemetry

Window: 2026-03-01 to 2026-03-31

GPU Hours: 14

Meter Status: healthy

Cluster availability remained healthy while the job failed during checkpoint validation.

Incident Record

Status: no_known_incident

Service: job-runtime

Customer Visible: No

Job failed because the uploaded checkpoint format was invalid and could not be parsed by the runtime.

Customer Note

The platform failed our training run again and we think this should be treated as a service issue.

Policy Snippet

Failures caused by invalid customer inputs, artifacts, or configuration are excluded from SLA service credits.
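As a rough illustration, the exclusion in this policy snippet could be expressed as a small rule. This is a minimal sketch, not the benchmark's actual grader: the `Evidence` type, the `contractual_applicability` function, the `needs_review` fallback, and every root-cause label other than `customer_misconfiguration` are assumptions introduced here for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of root-cause labels treated as customer-caused.
# Only "customer_misconfiguration" appears in this case; the others are assumed.
CUSTOMER_CAUSED_ROOT_CAUSES = {
    "customer_misconfiguration",
    "invalid_customer_input",     # assumed label
    "invalid_customer_artifact",  # assumed label
}


@dataclass
class Evidence:
    """Minimal, assumed view of the evidence packet fields used by the rule."""
    root_cause: str
    incident_status: str


def contractual_applicability(evidence: Evidence) -> str:
    """Apply the policy: customer-caused failures are excluded from SLA credits."""
    if evidence.root_cause in CUSTOMER_CAUSED_ROOT_CAUSES:
        return "sla_excluded_customer_caused"
    # Assumed fallback: anything else is left for a separate review path.
    return "needs_review"


# The case on this page: invalid checkpoint format, no known incident.
case = Evidence(
    root_cause="customer_misconfiguration",
    incident_status="no_known_incident",
)
print(contractual_applicability(case))
```

The sketch keys only on the root cause because the policy text does so. A fuller rule would presumably also weigh the incident record and telemetry before excluding a claim.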

Ground Truth

issue_type: customer_caused_issue

root_cause: customer_misconfiguration

customer_impact: job_failure

contractual_applicability: sla_excluded_customer_caused

discrepancy_detected: false

recommended_owner: engineering_owner

recommended_action: send_explanation_only

needs_human_review: false

confidence: high

adjudication_notes: ["The operational evidence clearly attributes the issue to customer input quality, not a platform incident."]

reference_customer_note: We reviewed the runtime logs and found the failure was caused by an invalid checkpoint format rather than a platform outage. Because the issue was customer-caused, it does not qualify for SLA credits, but we can share the validation details to help you rerun successfully.

reference_internal_note: Owner: engineering_owner. Action: send_explanation_only because root cause is customer_misconfiguration and the contract excludes customer-caused failures.