InfraResolution Bench

Case Explorer

Repeated retries from invalid customer config still under final validation

Healthy platform telemetry and repeated retry behavior point to customer configuration, but engineering still wants one final validation pass before closing the case.

Evidence Packet

CRM Record

Account: Pine Harbor ML

Tier: enterprise

Plan: Committed-120

Billing Owner: revops@prime.example

SLA Tier: standard-covered-service

Customer insists the repeated retries reflect platform instability.

Billing Record

Plan: Committed-120

Invoice Preview: $4,120

Credits Applied: $0

Burst GPU Hours: 0

No billing anomaly detected.

Usage & Telemetry

Window: 2026-03-01 to 2026-03-31

GPU Hours: 23

Meter Status: healthy

Cluster availability remained healthy while repeated validation failures triggered customer-visible retries.

Anomalies: Retry loop repeated after invalid runtime configuration was resubmitted.

Incident Record

Status: no_known_incident

Service: job-runtime

Customer Visible: No

Jobs repeatedly retried after failing validation.

Primary logs point to an invalid customer config manifest that triggered repeated retries, but engineering wants one final validation pass before closing the case.

Customer Note

The platform kept retrying our workload and burning time. We think this should be treated as a service issue unless you can prove otherwise.

Policy Snippet

Failures caused by invalid customer inputs, artifacts, or configuration are excluded from SLA service credits.

Ground Truth

issue_type: customer_caused_issueroot_cause: customer_misconfigurationcustomer_impact: retry_stormcontractual_applicability: sla_excluded_customer_causeddiscrepancy_detected: falserecommended_owner: engineering_ownerrecommended_action: hold_for_engineering_reviewneeds_human_review: trueconfidence: mediumadjudication_notes: ["This is still a customer-caused case, but the response should preserve the requested final engineering validation step."]reference_customer_note: We found strong evidence that the repeated retries were caused by a customer-supplied configuration issue rather than a platform outage, but engineering is doing one final validation pass before we send a final resolution. Based on the current evidence, this would not qualify for SLA credits because customer-caused failures are excluded.reference_internal_note: Owner: engineering_owner. Action: hold_for_engineering_review because current evidence points to customer_misconfiguration, customer impact is retry_storm, and engineering requested one final validation pass before closure.