Daily job-level failures across all monitored repos, stacked by category. Click a category in the legend to toggle it.
Loading…
Filter hotspots:
Release Nightly —
Loading…
CI Nightly —
Loading…
Bump PR —
Loading…
Other —
Loading…
Most affected architectures
Loading…
Most affected workflows
Loading…
Slowest jobs —
Top 15 (workflow, job pattern) pairs by median wall time across the selected window. Useful for identifying CI cost-optimization targets.
Loading…
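As a rough illustration of the ranking described above, the sketch below groups runs by (workflow, job pattern) and sorts by median wall time. The record fields (`workflow`, `job_pattern`, `wall_seconds`) and the sample data are assumptions made for the example, not the dashboard's actual schema.

```python
from collections import defaultdict
from statistics import median

def slowest_jobs(runs, top_n=15):
    """Rank (workflow, job_pattern) pairs by median wall time, slowest first."""
    durations = defaultdict(list)
    for run in runs:
        durations[(run["workflow"], run["job_pattern"])].append(run["wall_seconds"])
    ranked = sorted(
        ((key, median(values)) for key, values in durations.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical sample data: one outlier build run (6000 s) does not move the median.
sample_runs = [
    {"workflow": "ci.yml", "job_pattern": "build", "wall_seconds": 1200},
    {"workflow": "ci.yml", "job_pattern": "build", "wall_seconds": 1800},
    {"workflow": "ci.yml", "job_pattern": "build", "wall_seconds": 6000},
    {"workflow": "ci.yml", "job_pattern": "test", "wall_seconds": 2400},
]
slowest = slowest_jobs(sample_runs)
```

Using the median rather than the mean keeps a single stuck run from pushing a job pattern to the top of the list.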
Never been green
Jobs that have been failing for 3+ consecutive days with zero successful runs in the last 14 days. These are the silent CI-cost burners: fix them, exclude them from the matrix, or accept them as known-bad and add them to a quarantine list.
Loading chronically-broken jobs…
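One possible reading of the rule above, sketched for a single job: no successful run inside the 14-day window, and a streak of at least 3 consecutive failing days ending at the most recent failure. The input shape (a list of `(run_date, conclusion)` tuples) and the conclusion strings are assumptions for the example; the real query may differ.

```python
from datetime import date, timedelta

def is_never_green(runs, today, window_days=14, min_streak_days=3):
    """runs: list of (run_date, conclusion) tuples for one job (assumed shape)."""
    cutoff = today - timedelta(days=window_days)
    recent = [(d, c) for d, c in runs if cutoff < d <= today]
    # Any success inside the window disqualifies the job.
    if any(c == "success" for _, c in recent):
        return False
    failing_days = {d for d, c in recent if c == "failure"}
    if not failing_days:
        return False
    # Walk back day by day from the latest failing day to measure the streak.
    day = max(failing_days)
    streak = 0
    while day in failing_days:
        streak += 1
        day -= timedelta(days=1)
    return streak >= min_streak_days

today = date(2025, 1, 15)
broken = [(date(2025, 1, d), "failure") for d in (13, 14, 15)]
recovered = [(date(2025, 1, 13), "failure"), (date(2025, 1, 14), "failure"),
             (date(2025, 1, 15), "success")]
short_streak = [(date(2025, 1, d), "failure") for d in (14, 15)]
```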
Release Nightly
Branch: main
Data generated: — · Fetched: —
⚠ No runs on — yet · showing the most recent build instead. The nightly may still be queued or starting; refresh in a few minutes.
Per-shard results from ci_nightly_pytorch_full_test.yml (6 default + 3 distributed + 2 inductor = 11 shards per run)
Loading…
Linux JAX Wheel Builds & Tests
Loading…
Legend
PASS Build Successful
FAIL Build Failed
SKIPPED Test Skipped
IN PROGRESS Currently Running
CANCELLED Build Cancelled
NO DATA No Recent Runs
PENDING Not Started
Time format: Run time + Queue time
Release History
Last Updated: —
How to read this table.
Each row is one nightly ROCm version (newest first). Each workflow is one column showing passed/total.
Click any column header ▸ to expand it sideways into per-arch status dots; click an arch to drill into its full matrix.
The Downloads button opens a popover with TAR / DEB / RPM artifacts and pip-install snippets per architecture.
From
→
To
Loading release history…
Legend
PASS All jobs in this pipeline succeeded
FAIL One or more jobs failed
RUNNING Pipeline still in progress
N/A No run for this version yet
Each cell shows a status icon on the top line and job / arch counts on the bottom line (e.g. 14 jobs · 4 arch).
Multi-Arch Release - Nightly
Last Updated: —
How to read this table.
Each row is one nightly multi_arch_release.yml run plus its dispatched downstreams
(test_artifacts, PyTorch wheels). Columns follow pipeline execution order across
11 lanes (Setup → Linux Build → Linux Pkg → Publish → Tests → Wheels → same for Windows).
Click ▸ on a column header to expand Math Libs / Test Artifacts / Wheels into per-arch dots.
Click any row to open the full pipeline tree, or the ⬇ Downloads
button for install commands pinned to that exact build.
From
→
To
Loading multi-arch release history…
Legend
PASS All jobs in this column succeeded
FAIL One or more jobs failed
RUNNING Stage still in progress
N/A No jobs for this stage on this version yet
Lanes (11 total): Setup · Linux Build · Linux Pkg · Linux Publish · Linux Tests · Linux PyTorch Wheels · Windows Build · Windows Pkg · Windows Publish · Windows Tests · Windows PyTorch Wheels — each lane is shaded differently (cool blues/greens for Linux, warmer ambers/reds for Windows) and starts with a thicker primary-tinted border.
LINUX / WINDOWS tags above each column header indicate the OS.
Cell badges show passed / total jobs for that column on that ROCm version (multiple workflow runs for the same nightly are summed together).
Build jobs are routed by job-name path (Linux::release / Build Artifacts / {stage}); Test jobs by extracted Test {component} name; Wheels jobs by source workflow_id (multi_arch_release_*_pytorch_wheels.yml) with per-(py × torch) sub-cells.
Downloads — ROCm …
CI HUD
Last Updated: —
Loading CI HUD data...
Showing 0 of 0 commits
Legend
✓ Passed
✗ Failed
⏳ Running
○ Skipped/N/A
Tip: Click on a commit row to expand and see detailed job status. Click any job status to go to GitHub.
Multi-Arch
Last Updated: —
Loading Multi-Arch CI data...
Showing 0 of 0 commits
Legend
✓ Passed
✗ Failed
⏳ Running
○ Skipped/N/A
Tip: Click any category header (▸) to expand it into per-arch status dots. Click a commit row to see all jobs grouped by category.
A release event = the workflow runs sharing one (repo, head_sha) per nightly. Entries appear here once the next nightly fires multi_arch_release.yml, the wheel workflows, and the dispatched test_artifacts.yml runs.
Lane
Linux
Windows
Loading…
Legend
✓ Passed
✗ Failed
◉ Running / queued
○ Empty / N/A
Italic cell = workflow run is OS-combined (mirrored into both columns).
Test Artifacts Runs
No test_artifacts.yml runs in the selected window.
This table fills in once the post-PR-5212 release/CI workflows start dispatching test runs. Try widening the window to 30 days.
Loading…
Legend
✓ Passed
✗ Failed
◉ Running
· No run for this family
Same SHA on the same day → one row; multiple test runs for one family on that day → worst-status wins.
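The "worst-status wins" rule can be sketched as a reduction over runs that share the same (SHA, day, family) key. The severity ordering below (failed over running over passed) and the field names are assumptions for illustration; the dashboard's actual ordering is not stated here.

```python
# Assumed severity ranking: a higher number wins when runs are merged.
SEVERITY = {"none": 0, "passed": 1, "running": 2, "failed": 3}

def worst_status(statuses):
    """Return the most severe status, or "none" for an empty input."""
    return max(statuses, key=SEVERITY.__getitem__, default="none")

def collapse(runs):
    """Merge runs sharing (sha, day, family) into one cell holding the worst status."""
    cells = {}
    for run in runs:
        key = (run["sha"], run["day"], run["family"])
        cells[key] = worst_status([cells.get(key, "none"), run["status"]])
    return cells

# Hypothetical sample: two test runs for the same family on the same day.
sample = [
    {"sha": "abc1234", "day": "2025-01-15", "family": "family-a", "status": "passed"},
    {"sha": "abc1234", "day": "2025-01-15", "family": "family-a", "status": "failed"},
    {"sha": "abc1234", "day": "2025-01-15", "family": "family-b", "status": "passed"},
]
cells = collapse(sample)
```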
🧮 Issues Pivot — Open issues grouped by GPU architecture × framework version
Click any number to drill into those issues
Loading…
Issues
Legend
OPEN Issue is open
CLOSED Issue is closed
Note: Use the multi-select filter to find issues by label. Data updates in real time via webhooks.
Triage
Last Updated: — · Window: last 24h
Failure Clusters
Recent failures grouped by job name + workflow + arch. Larger clusters mean a stage is broken across many runs — start here when triaging.
Failures
Job pattern
Workflow
Architectures
Repository
Last seen
Action
Loading…
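A minimal sketch of the grouping behind this table, assuming each failure record carries `job`, `workflow`, and `arch` fields (hypothetical names, as is the sample data): count failures per key and list the largest clusters first.

```python
from collections import Counter

def cluster_failures(failures):
    """Count failures per (job, workflow, arch) key, largest cluster first."""
    counts = Counter((f["job"], f["workflow"], f["arch"]) for f in failures)
    return counts.most_common()

# Hypothetical sample failures.
failures = [
    {"job": "build", "workflow": "ci.yml", "arch": "gfx942"},
    {"job": "build", "workflow": "ci.yml", "arch": "gfx942"},
    {"job": "build", "workflow": "ci.yml", "arch": "gfx942"},
    {"job": "test", "workflow": "ci.yml", "arch": "gfx1100"},
]
clusters = cluster_failures(failures)
```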
Flake Detection
Jobs where a re-run produced a different result than the original attempt — the test flapped pass↔fail without code changes.
Job
Architecture
Attempt sequence
Repository
Last seen
Action
Loading…
Legend
✓ Attempt passed
✗ Attempt failed
○ Cancelled
Tip: Click any "Logs" link to jump to the failing job on GitHub. A flake of fail → pass usually means a transient issue (resource, network); pass → fail means a regression slipped past the first run.
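The flake rule described in this section can be sketched as a check on one job's attempt sequence for a single commit. The conclusion strings and the choice to ignore cancelled attempts are assumptions made for the example.

```python
def is_flake(attempts):
    """attempts: conclusions for one job on one commit, in attempt order.

    A job counts as flaky when its decided attempts (success or failure)
    disagree with each other. Cancelled attempts are ignored (an assumption).
    """
    decided = [a for a in attempts if a in ("success", "failure")]
    return len(set(decided)) > 1

fail_then_pass = ["failure", "success"]
always_failing = ["failure", "failure"]
cancelled_then_pass = ["cancelled", "success"]
```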
Release Notes
Generated: —
Generate notes between two commits
Paste two commit SHAs (7+ hex chars). The lambda will look up their timestamps and pull every PR merged in that window across the selected repos. Tip: copy SHAs from the CI HUD tab. Share with #release-notes?from=SHA1&to=SHA2&repos=ROCm/TheRock.
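For illustration, a share link of the form shown above could be parsed and validated roughly as follows. The function name, the comma-separated handling of `repos`, and the error messages are assumptions; only the fragment shape and the "7+ hex chars" rule come from the text.

```python
import re
from urllib.parse import parse_qs

SHA_RE = re.compile(r"[0-9a-fA-F]{7,}")  # 7 or more hex characters

def parse_share_fragment(fragment):
    """Parse '#release-notes?from=SHA1&to=SHA2&repos=...' into (from, to, repos)."""
    tab, _, query = fragment.lstrip("#").partition("?")
    if tab != "release-notes":
        raise ValueError(f"unexpected tab: {tab!r}")
    params = parse_qs(query)
    start = params.get("from", [""])[0]
    end = params.get("to", [""])[0]
    for sha in (start, end):
        if not SHA_RE.fullmatch(sha):
            raise ValueError(f"not a commit SHA: {sha!r}")
    # Assumption: multiple repos are comma-separated in one parameter.
    repos = [r for r in params.get("repos", [""])[0].split(",") if r]
    return start, end, repos

parsed = parse_share_fragment("#release-notes?from=abc1234&to=def5678&repos=ROCm/TheRock")
```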
Time-to-first-review is omitted: our ingestion lambda doesn't subscribe to the pull_request_review webhook event. Adding it requires (1) updating the GitHub webhook subscription, (2) extending the lambda + ClickHouse schema, (3) backfilling. Worth doing if review responsiveness becomes a focus area.
Click contributor or label bars to drill into the Issues tab filtered by author or label.
Component Insights
Last Updated: —
From
→
To
0 Merged Bump PRs
0 Issues (filtered)
0 Infra Issues
0 Code Issues
Issues by Component
Click a row to filter the issue list below.
Component
Open
Closed
Total
Distribution
Loading…
Issues by Classification
Infra = CI / build / release-engineering plumbing.
Code = product code bugs & feature requests.
Classification
Open
Closed
Total
Distribution
Loading…
Issues by Age · surface stale work
Bucketed by days since opened. Click a row to filter the issue list below to just that age bucket.
Age
Open
Closed
Total
Distribution
Loading…
Submodule Bump Trends · per-submodule activity over the selected period
Each panel shows total merges + average per week, a scatter chart, a weekly breakdown table,
and the full per-period PR list.
Showing one stackable panel per submodule. Pick a specific submodule to focus.
Loading submodule trends…
Bump Merge Latency · time from PR open → merge · overlay of GPU runner events
Each dot is one bump PR — x-axis is when it was merged, y-axis is how many hours it sat open before merging.
Vertical dashed lines mark OSSCI GPU runner events from frontend/data/gpu-events.json (currently empty —
populate that file to overlay events). Use this to spot whether bump-PR delays line up with GPU removals.
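The y-axis value for each dot is the time between PR open and merge, in hours. A minimal sketch, assuming ISO 8601 timestamps with an explicit UTC offset (the parameter names are hypothetical):

```python
from datetime import datetime

def hours_open(created_at, merged_at):
    """Hours a PR sat open, given ISO 8601 timestamps with an explicit offset."""
    opened = datetime.fromisoformat(created_at)
    merged = datetime.fromisoformat(merged_at)
    return (merged - opened).total_seconds() / 3600

# A PR opened at midnight and merged at 06:00 the next day sat open for 30 hours.
latency = hours_open("2025-01-01T00:00:00+00:00", "2025-01-02T06:00:00+00:00")
```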
Manually-tracked triage rows mapping each bump PR to issues it surfaced + the fix applied.
Source of truth: frontend/data/bump-triage.json in the repo — edit via PR.
Rows that share a Bump PR group visually so you can see all issues caused by one bump together.
Bump PR
Test / Build
Component
Issue
Class
AI Summary
From Comments
Fix
Loading triage data…
Issues — actionable backlog — open issues without a triage row. Use the filters to widen.
Or click a row in the breakdown tables above.
Issue
Title
Status
Component
Class
Triage
Bump PR(s)
Author
Updated
Loading…
Legend & Methodology
Components are derived from issue title, body preview, and labels via a priority-ordered regex matcher (see ISSUE_COMPONENT_RULES in lambdas/query-proxy/index.js). An issue that matches no rule is bucketed as Other.
Classification uses the same approach (ISSUE_INFRA_RULES) plus an allow-list of label names (infra, ci, build, release, tooling, etc.). Issues that don't look like infra default to code.
Merged Bump PRs always source from ROCm/TheRock (the only repo that runs bump_submodules.yml) and are narrowed by the selected period.
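The matching approach described above can be sketched as follows. The rule patterns here are hypothetical stand-ins, not the contents of ISSUE_COMPONENT_RULES or ISSUE_INFRA_RULES; the label allow-list uses the names listed above, and the real implementation lives in lambdas/query-proxy/index.js.

```python
import re

# Hypothetical rules, checked in priority order: the first match wins.
COMPONENT_RULES = [
    ("PyTorch", re.compile(r"\b(pytorch|torch)\b", re.I)),
    ("JAX", re.compile(r"\bjax\b", re.I)),
    ("Build", re.compile(r"\b(cmake|build)\b", re.I)),
]
INFRA_LABELS = {"infra", "ci", "build", "release", "tooling"}
INFRA_PATTERN = re.compile(r"\b(runner|workflow|nightly)\b", re.I)  # hypothetical

def classify_component(title, body_preview="", labels=()):
    """Return the first component whose rule matches, else "Other"."""
    haystack = " ".join([title, body_preview, *labels])
    for component, pattern in COMPONENT_RULES:
        if pattern.search(haystack):
            return component
    return "Other"

def classify_kind(title, body_preview="", labels=()):
    """Return "infra" on an allow-listed label or infra-looking text, else "code"."""
    if any(label.lower() in INFRA_LABELS for label in labels):
        return "infra"
    if INFRA_PATTERN.search(f"{title} {body_preview}"):
        return "infra"
    return "code"
```

Because the rules are priority-ordered, an issue titled "PyTorch build fails" lands in PyTorch even though the Build rule would also match.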