Fix categorized ATVM watcher host result recovery
@@ -73,6 +73,9 @@ Run ATVM CMC automation tests on the designated automation VM without unintended
- per-run HTML reports
- When a machine fails, use the matching `logs/` entry first to capture the detailed failure context for that host.
- When reconstructing historical status, prefer `cmcReporter` artifacts over less-specific runner output because they preserve per-host results after the live run has ended.
- Do not treat the existence of a per-host reporter artifact by itself as proof that the host passed.
- For categorized grouped recovery, prefer the matching per-host reporter JSON or mochawesome result and carry through the real `failures`, `pending`, and failure message instead of assuming `PASS completed`.
- If grouped XML only contains `check-xml-files.ts`, cross-check the grouped result against the per-host reporter artifacts before posting or repeating status for that grouped sub-run.
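The recovery rule in the bullets above can be sketched as a small function. This is a minimal illustration, not the watcher's actual code: it assumes the per-host reporter JSON has already been parsed into a dict with a mochawesome-style `stats` object and nested `results`/`suites`/`tests` entries, and the function names and the `INCONCLUSIVE` label are hypothetical.

```python
from typing import Any, Dict, List, Optional


def first_failure_message(nodes: List[Any]) -> Optional[str]:
    """Walk suites depth-first and return the first failed test's message."""
    for node in nodes:
        if not isinstance(node, dict):
            continue
        for test in node.get("tests", []):
            if test.get("fail") or test.get("state") == "failed":
                title = test.get("fullTitle") or test.get("title") or "unknown test"
                err = test.get("err") or {}
                return f"{title}: {err.get('message') or 'no message recorded'}"
        nested = first_failure_message(node.get("suites", []))
        if nested:
            return nested
    return None


def recover_host_status(report: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a host result from parsed reporter JSON (assumed layout)."""
    stats = report.get("stats")
    if not isinstance(stats, dict):
        stats = {}
    failures = int(stats.get("failures", 0))
    pending = int(stats.get("pending", 0))
    passes = int(stats.get("passes", 0))

    if failures > 0:
        status = "FAIL"
    elif passes > 0:
        status = "PASS"
    else:
        # The artifact exists but records no passing test, so it proves nothing.
        status = "INCONCLUSIVE"

    return {
        "status": status,
        "failures": failures,
        "pending": pending,
        "message": first_failure_message(report.get("results", [])),
    }
```

The point of the sketch is that an empty or count-free artifact never resolves to a pass; only real counts from the host's own report can produce `PASS`.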

Typical sequence:
1. Build the exact `cmc-templates.py` and `run-sorry-cypress.py` commands for the request.
@@ -81,8 +84,9 @@ Typical sequence:
4. Run `cmc-templates.py` with the approved options.
5. Wait for `cmc-templates.py` to fully finish and confirm success.
6. Verify the generated `.ts` files and the config `specPattern` include every requested VM before starting the runner.
7. If the watcher is approved, start the watcher before launching `run-sorry-cypress.py`.
8. Run `run-sorry-cypress.py` with the matching approved config and build name.
7. If the watcher is approved, make sure the controller's deployed watcher code is the intended version before relying on its posts.
8. If the watcher is approved, start the watcher before launching `run-sorry-cypress.py`.
9. Run `run-sorry-cypress.py` with the matching approved config and build name.
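The verification in step 6 could be automated along these lines. This is a sketch under a loud assumption: it treats each generated spec as being named exactly `<vm>.ts`, which the source does not state, and the function name is hypothetical. The caller is expected to pass the paths listed in the config `specPattern` (or the generated file paths).

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def vms_missing_from_specs(requested_vms: Iterable[str],
                           spec_paths: Iterable[str]) -> List[str]:
    """Return requested VMs that have no spec file named '<vm>.ts'.

    Assumption: one generated spec per VM, with the VM name as the file stem.
    """
    # Normalise Windows separators, then keep only the file stem of each path.
    stems = {
        PurePosixPath(path.replace("\\", "/")).stem.lower()
        for path in spec_paths
    }
    return [vm for vm in requested_vms if vm.lower() not in stems]
```

An empty return value means every requested VM is covered; any other result is a reason to stop before starting the runner. Exact stem matching is deliberately strict, so a naming mismatch produces a false alarm rather than a silently skipped VM.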

## Config File / Gold Disk Mapping
- `cypress.atvm-config-gold.ts` -> Gold Disk 1
@@ -356,3 +356,20 @@ This file stores run-specific examples only when a run produced a new learning r
- Keep the maintained `--exclude_partial_match` list for broad selectors such as `--containsVm` or `--randomize`.
- When the operator uses `--specify_vms`, do not auto-add the blacklist unless they explicitly request it.
- Even when the operator uses `--specify_vms`, first check whether any requested VM is on the maintained blacklist and stop instead of launching it if one is included.
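The pre-launch check described in the last bullet above might look like the following. It is only a sketch: it assumes the maintained blacklist is available as a list of name fragments matched case-insensitively as substrings (in the spirit of `--exclude_partial_match`), and the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def blacklisted_requests(requested_vms: Iterable[str],
                         blacklist_fragments: Iterable[str]) -> Dict[str, List[str]]:
    """Map each requested VM to the blacklist fragments it matches.

    An empty dict means no requested VM is on the blacklist.
    """
    fragments = [frag for frag in blacklist_fragments if frag]
    hits: Dict[str, List[str]] = {}
    for vm in requested_vms:
        matched = [frag for frag in fragments if frag.lower() in vm.lower()]
        if matched:
            hits[vm] = matched
    return hits
```

A non-empty result is the signal to stop and report the matching VMs to the operator instead of launching the run.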

## Run Learning: 2026-03-30 (Controller watcher deployment must match the repo watcher before trusting live posts)
- Observed failure mode:
  - The repo watcher had the corrected `cmc-reboot` flow, but the controller install at `/opt/atvm-watcher-service/atvm_run_watcher.py` still had the old generic 5-step fallback.
  - A live categorized reboot subrun therefore posted the stale 5-step `TEST FLOW:` even though the repo copy had already been fixed.
- Action for future runs:
  - Before trusting watcher-generated live posts for new watcher behavior, verify that the controller install matches the intended repo watcher version.
  - If the controller install is stale and the operator approves it, deploy the updated watcher code to `/opt/atvm-watcher-service` and restart only the watcher instance for the active build.
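One way to perform the version check above is to compare content hashes of the repo copy and the deployed copy. The sketch below assumes both files are readable from the machine running the check; the function names are hypothetical, and a byte-for-byte comparison is an assumption about what "matches the intended version" means here.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def watcher_is_current(repo_copy: Path, deployed_copy: Path) -> bool:
    """True only when the deployed watcher is byte-identical to the repo copy."""
    if not repo_copy.is_file() or not deployed_copy.is_file():
        # A missing file on either side can never count as a match.
        return False
    return file_sha256(repo_copy) == file_sha256(deployed_copy)
```

For the install named in this run learning, `deployed_copy` would be `/opt/atvm-watcher-service/atvm_run_watcher.py`; the repo path depends on the local checkout. A `False` result means the live posts should not be trusted until the operator approves a redeploy.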

## Run Learning: 2026-03-30 (Categorized grouped recovery must parse real per-host reporter status, not assume pass)
- Observed failure mode:
  - A categorized Red Hat reboot subrun posted both hosts as passed even though `atvm71-redhat9.1` actually failed during `1. Verifying set up`.
  - The grouped XML only contained `check-xml-files.ts`, and the watcher incorrectly treated the presence of a per-host reporter artifact as `PASS completed`.
- Action for future runs:
  - When grouped XML lacks explicit host testcase results, recover grouped host status from the per-host reporter JSON or equivalent detailed artifact.
  - Carry through the real `failures`, `pending`, and failure message from that host artifact instead of assuming `PASS completed`.
  - If a correction post is needed because stale or reconstructed state was wrong, mark it explicitly as a correction that supersedes the earlier result.
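Detecting the "grouped XML lacks explicit host testcase results" condition could be sketched as follows. This assumes the grouped XML is JUnit-style and that each `testsuite` element records its spec path in a `file` attribute; that layout is an assumption about the reporter output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER_SPEC = "check-xml-files.ts"


def grouped_xml_needs_cross_check(xml_text: str) -> bool:
    """True when the grouped XML references no spec other than the placeholder."""
    root = ET.fromstring(xml_text)
    spec_names = set()
    # iter() also visits the root element itself when it is a <testsuite>.
    for suite in root.iter("testsuite"):
        spec = suite.get("file")
        if spec:
            # Keep only the file name so path prefixes do not matter.
            spec_names.add(spec.replace("\\", "/").rsplit("/", 1)[-1])
    return not spec_names or spec_names == {PLACEHOLDER_SPEC}
```

When this returns `True`, the grouped result carries no per-host evidence, so host status should come from the per-host reporter artifacts before anything is posted.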