InfraResolution Bench

Scheduler failure delays starts beyond covered threshold

A scheduler control-plane failure delayed new job starts for 37 minutes, beyond the 30-minute covered threshold, and the policy explicitly grants a flat credit for that condition.

Evidence Packet

CRM Record

Account: Quill Compute

Tier: enterprise

Plan: Committed-300

Billing Owner: finance@prime.example

SLA Tier: premium-covered-service

Premium addendum includes flat credit for covered control-plane delays over 30 minutes.

Billing Record

Plan: Committed-300

Invoice Preview: $24,010

Credits Applied: $0

Burst GPU Hours: 0

No pricing anomaly detected.

Usage & Telemetry

Window: 2026-03-01 to 2026-03-31

GPU Hours: 291

Meter Status: healthy

Job execution capacity remained available, but new starts were delayed for 37 minutes by control-plane scheduler failures.

Anomalies: Scheduler backlog prevented new job starts while existing jobs continued.

Incident Record

Status: resolved

Service: managed-training-api

Duration: 37m

Customer Visible: Yes

New training jobs could not start for 37 minutes.

A scheduler failure in the control plane blocked new job admissions until the scheduler quorum was restored.

Customer Note

Our jobs did not start for over half an hour even though existing jobs kept running. Please confirm what applies commercially.

Policy Snippet

Covered control-plane delays longer than 30 minutes on the managed training service receive the same flat service credit as covered outages.
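The coverage rule reduces to a strict threshold comparison: 37 minutes is longer than 30, so the flat credit applies. The following is a minimal Python sketch. It assumes that the incident record's `managed-training-api` service is the "managed training service" named in the policy, and that durations use the `37m` style seen in the incident record. The constant and function names are hypothetical.

```python
import re

# Threshold from the policy snippet: covered control-plane delays
# "longer than 30 minutes" on the managed training service earn the flat credit.
COVERED_DELAY_THRESHOLD_MINUTES = 30

# Assumption: the policy's "managed training service" is this incident service.
COVERED_SERVICE = "managed-training-api"


def parse_duration_minutes(duration: str) -> int:
    """Parse a duration such as '37m' or '1h5m' into whole minutes.

    The 'XhYm' format is an assumption based on the '37m' value in the
    incident record; anything else raises ValueError.
    """
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", duration.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def flat_credit_due(service: str, control_plane_delay: bool, duration: str) -> bool:
    """Return True when the delay is covered and strictly exceeds the threshold."""
    covered = service == COVERED_SERVICE and control_plane_delay
    return covered and parse_duration_minutes(duration) > COVERED_DELAY_THRESHOLD_MINUTES


# This case: a 37-minute scheduler (control-plane) delay on the training API.
print(flat_credit_due("managed-training-api", True, "37m"))
```

Because the policy says "longer than", a delay of exactly 30 minutes would not qualify under this sketch.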

Ground Truth

issue_type: incident_impact_review

root_cause: scheduler_failure

customer_impact: delayed_job_start

contractual_applicability: credit_due

discrepancy_detected: false

recommended_owner: shared_revops_engineering

recommended_action: send_explanation_only

needs_human_review: false

confidence: high

adjudication_notes: ["This is a clean covered incident where delayed starts are contractually covered once they cross the threshold."]

reference_customer_note: We confirmed a covered 37-minute control-plane delay on the managed training service, which qualifies for the contract’s flat service credit. We will communicate the credit application and the scheduler incident summary.

reference_internal_note: Owner: shared_revops_engineering. Action: send_explanation_only because incident evidence is clean, root cause is scheduler_failure, customer impact is delayed_job_start, and contractual outcome is credit_due.
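The reference internal note reads as though it were filled in from the ground-truth labels. The following sketch assumes a fixed template and reproduces that note from this case's values. The `GroundTruth` class and `render_internal_note` function are hypothetical illustrations, not a documented part of the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruth:
    # Field names mirror the ground-truth labels listed for this case.
    issue_type: str
    root_cause: str
    customer_impact: str
    contractual_applicability: str
    discrepancy_detected: bool
    recommended_owner: str
    recommended_action: str
    needs_human_review: bool
    confidence: str


def render_internal_note(gt: GroundTruth) -> str:
    """Rebuild the internal note, assuming a fixed template.

    The phrase "incident evidence is clean" is hard-coded here; that is an
    assumption about how the template treats this kind of case.
    """
    return (
        f"Owner: {gt.recommended_owner}. "
        f"Action: {gt.recommended_action} because incident evidence is clean, "
        f"root cause is {gt.root_cause}, customer impact is {gt.customer_impact}, "
        f"and contractual outcome is {gt.contractual_applicability}."
    )


case = GroundTruth(
    issue_type="incident_impact_review",
    root_cause="scheduler_failure",
    customer_impact="delayed_job_start",
    contractual_applicability="credit_due",
    discrepancy_detected=False,
    recommended_owner="shared_revops_engineering",
    recommended_action="send_explanation_only",
    needs_human_review=False,
    confidence="high",
)
print(render_internal_note(case))
```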