cds-ai/atvm/atvm-automation-runs.md

# ATVM Automation Runs

This file stores run-specific examples only when a run produced a new learning relevant to future automation tasks.

## Entry Rule
- Add an entry only when a run changed workflow behavior, exposed a failure mode, or confirmed a required new check.
- Do not add routine runs with no new learning.

## Current State
- No run-learning entries have been carried over from the `atvm-automation-guide.md` source material; the entries below come from individual runs.

## Run Learning: 2026-03-08 (E2E redhat9.7, pure/fc)
- Request:
  - template: `cmc-e2e`
  - filter: `--containsVm redhat9.7`
  - integration: `--integration_type pure`
  - plugin: `--use_specified_plugin fc`
- Observed result:
  - Cypress spec execution passed (`1` test, `1` passing, `0` failing).
  - Cloud run URL was produced and marked uploaded.
  - `run-sorry-cypress.py` remained running afterward with a defunct `npm exec cypress-cloud` child process and did not exit cleanly on its own.
- Action for future runs:
  - If pass/upload is confirmed but `run-sorry-cypress.py` does not exit, treat it as a runner hang condition.
  - Capture run URL and pass/fail status first, then terminate the stuck runner process cleanly.
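
The capture-then-terminate order above can be sketched as follows. This is a minimal sketch, assuming the runner was started from Python as a `subprocess.Popen` object; `finish_hung_run`, `stop_runner`, and the 10-second grace period are hypothetical names and values, not part of `run-sorry-cypress.py`.

```python
import subprocess


def stop_runner(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Ask a stuck runner to exit, escalating to a hard kill only if needed."""
    if proc.poll() is not None:
        return proc.returncode  # runner already exited on its own
    proc.terminate()  # polite request first (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort after the grace period
        return proc.wait()


def finish_hung_run(proc, run_url, passed, grace_seconds=10.0):
    """Record the run URL and pass/fail status first, then stop the runner."""
    outcome = {"run_url": run_url, "passed": passed}  # captured before any kill
    outcome["runner_exit_code"] = stop_runner(proc, grace_seconds)
    return outcome
```

The sketch only stops the runner process itself; it does not deal with the defunct `npm exec cypress-cloud` child separately.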

## Run Learning: 2026-03-09 (Blacklist handling and status format)
- Observed requirement:
  - Some ATVM machines must be skipped even when a broad selector such as `--containsVm` or `--randomize` would otherwise include them.
- Machines to blacklist via `--exclude_partial_match`:
  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`:
    - `atvm6-centos6.0`
    - `atvm41-redhat6.0`
    - `atvm73-oracle6.0`
  - `BLACKLISTED: SUPPORT REQUEST - WAITING`:
    - `atvm113-debian9.0.0`
    - `atvm115-debian9.1.0`
    - `atvm116-debian9.2.0`
    - `atvm156-debian9.3.0`
  - Needs re-creation:
    - `atvm157-debian13.0.0`
- Action for future runs:
  - Add these machine names to `--exclude_partial_match` when building broad-scope automation commands.
  - When reporting run status, include skipped blacklisted machines separately with their reason, in addition to completed and remaining machines.
  - Use the run `build_name` as the heading/title for status responses so the test type is obvious.
  - For failed machines in status responses, include the failure reason taken from the run log.
  - Include timing details in status responses: start time, end time when complete, and total or elapsed runtime.
  - Also include timing stats in status responses: quickest completed test runtime, longest completed test runtime, and average completed test runtime.
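
The blacklist and timing items above could be kept in one small helper. This is a sketch only: the function names are hypothetical, runtimes are assumed to be available in seconds, and the lookup uses exact machine names. How values are actually passed to `--exclude_partial_match` is not shown here because the source does not specify it.

```python
from statistics import mean

# Blacklist copied from this entry, keyed by skip reason.
BLACKLIST = {
    "CMC INSTALL - CAN'T COMPILE": [
        "atvm6-centos6.0",
        "atvm41-redhat6.0",
        "atvm73-oracle6.0",
    ],
    "SUPPORT REQUEST - WAITING": [
        "atvm113-debian9.0.0",
        "atvm115-debian9.1.0",
        "atvm116-debian9.2.0",
        "atvm156-debian9.3.0",
    ],
    "Needs re-creation": ["atvm157-debian13.0.0"],
}


def excluded_machines():
    """Flat list of every blacklisted machine name."""
    return [m for machines in BLACKLIST.values() for m in machines]


def skipped_with_reason(scope):
    """(machine, reason) pairs for blacklisted machines found in the scope."""
    reason_by_machine = {m: r for r, ms in BLACKLIST.items() for m in ms}
    return [(m, reason_by_machine[m]) for m in scope if m in reason_by_machine]


def timing_stats(runtimes_seconds):
    """Quickest, longest, and average runtime of completed tests, or None."""
    if not runtimes_seconds:
        return None
    return {
        "quickest": min(runtimes_seconds),
        "longest": max(runtimes_seconds),
        "average": mean(runtimes_seconds),
    }
```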

## Run Learning: 2026-03-11 (Machine-first status lines and whole-run ETA)
- Observed requirement:
  - Status output must list each machine first and then its status, rather than leading with the status label.
  - Estimated completion time must refer to the entire remaining automation run, not only the currently running machine.
- Action for future runs:
  - Format machine entries as `machine-name - STATUS`.
  - Keep failure reasons after the machine/status entry when a machine failed.
  - When giving ETA, explicitly state it is the estimate for completion of the full remaining run.
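
A possible formatter for these rules is sketched below. The function names, the `: reason` suffix, and the ETA wording are assumptions. The ETA arithmetic also assumes machines run one after another and that the average completed runtime is a fair predictor for the remaining ones.

```python
from datetime import datetime, timedelta


def status_line(machine, status, reason=None):
    """Machine name first, then status, then the failure reason if any."""
    line = f"{machine} - {status}"
    if reason:
        line += f": {reason}"
    return line


def eta_line(now, remaining_machines, average_seconds):
    """ETA for the full remaining run, not just the machine running now."""
    eta = now + timedelta(seconds=remaining_machines * average_seconds)
    return f"ETA for full remaining run: {eta:%Y-%m-%d %H:%M}"
```

For example, `status_line("atvm6-centos6.0", "SKIPPED")` yields `atvm6-centos6.0 - SKIPPED`; the `SKIPPED` label is illustrative.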

## Run Learning: 2026-03-11 (Categorized run status must be reconstructed across batches)
- Observed failure mode:
  - `run-sorry-cypress.py --categorize` mutates the active config to the current category batch, so live state such as `specPattern`, `current_vm`, and the newest `/tmp` Cypress JSON describes only the current category, not the full automation run.
  - Answering from only the current live batch underreports the run and misses already-finished machines from earlier category batches.
- Action for future runs:
  - Reconstruct whole-run status from the generated machine scope plus all machine result artifacts written since the run start time.
  - Use the current batch only to identify the live `RUNNING` machine and immediate next machine(s), not as the full run scope.
  - Do not answer status requests for categorized runs until earlier category results have been checked as part of the same run.
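
The reconstruction described above amounts to merging three inputs: the full machine scope, the results collected from every batch since the run start, and the live batch. A minimal sketch, assuming those inputs have already been gathered; `reconstruct_status` and every label other than `RUNNING` are hypothetical.

```python
def reconstruct_status(scope, results, running=None, blacklisted=()):
    """Build whole-run status for every machine in scope.

    scope       -- ordered machine names for the entire run
    results     -- machine -> result label, gathered from ALL category batches
    running     -- machine reported by the current live batch, if any
    blacklisted -- machines skipped on purpose
    """
    status = {}
    for machine in scope:
        if machine in blacklisted:
            status[machine] = "SKIPPED"
        elif machine in results:
            status[machine] = results[machine]  # finished in some batch
        elif machine == running:
            status[machine] = "RUNNING"  # only this comes from the live batch
        else:
            status[machine] = "REMAINING"
    return status
```

The live batch contributes only the `running` argument, so machines that finished in earlier category batches are still counted.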

## Run Learning: 2026-03-11 (Hash-named XML files still belong to machine runs)
- Observed failure mode:
  - Same-run JUnit output is not consistently named `test-result-atvm...xml`.
  - Many machine results for the same automation run were written as hash-named files such as `test-result-01fe412894862398d06d9cc4bc7e81a0.xml`.
  - Limiting status reconstruction to machine-named XML files causes major undercounting of completed machines.
- Action for future runs:
  - Parse all `test-result-*.xml` files written since the run start time, not only `test-result-atvm*.xml`.
  - Extract the machine name from XML contents such as `testsuite file=`, `testsuite name=`, or `testcase name=` when the filename does not include the machine name.
  - Treat `check-xml-files.ts` XML outputs as bookkeeping steps, not machine results.
  - Prefer the most recently written same-run XML per machine when multiple XML files exist for that machine.
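
These parsing rules can be sketched with the standard library. Several parts are assumptions: the machine-name pattern is inferred from the names listed in this file, bookkeeping files are detected by looking for `check-xml-files` in suite or test-case attributes, and "written since the run start" is approximated with file modification time.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed naming pattern, e.g. atvm6-centos6.0 or atvm113-debian9.0.0.
MACHINE_RE = re.compile(r"atvm\d+-[a-z]+\d+(?:\.\d+)*")


def machine_from_xml(path):
    """Machine name from the filename, else from the XML contents, else None."""
    match = MACHINE_RE.search(path.name)
    if match:
        return match.group(0)
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return None
    values = [
        elem.get(attr, "")
        for elem in root.iter()
        if elem.tag in ("testsuite", "testcase")
        for attr in ("file", "name")
    ]
    if any("check-xml-files" in v for v in values):
        return None  # bookkeeping output, not a machine result
    for value in values:
        match = MACHINE_RE.search(value)
        if match:
            return match.group(0)
    return None


def latest_result_per_machine(result_dir, run_start_epoch):
    """Map each machine to its most recently written same-run XML file."""
    latest = {}
    for path in Path(result_dir).glob("test-result-*.xml"):
        mtime = path.stat().st_mtime
        if mtime < run_start_epoch:
            continue  # written before this run started
        machine = machine_from_xml(path)
        if machine is None:
            continue
        if machine not in latest or mtime > latest[machine][1]:
            latest[machine] = (path, mtime)
    return {machine: path for machine, (path, _) in latest.items()}
```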