Improve ATVM failed-host detail recovery

2026-03-30 21:38:59 -04:00
parent d1a909f9ab
commit 18dcbc89f9
4 changed files with 209 additions and 20 deletions

@@ -76,6 +76,12 @@ Run ATVM CMC automation tests on the designated automation VM without unintended
- `mochawesome/`
- per-run HTML reports
- When a machine fails, use the matching `logs/` entry first to capture the detailed failure context for that host.
- Apply the failed-host detail recovery path to every ATVM template type, not just reboot.
- For any failed host, recover detail in this order when available:
  - consolidated run log
  - matching `mochawesome` HTML
  - structured reporter artifacts such as per-host JSON or XML
  - text reporter artifacts
- When reconstructing historical status, prefer `cmcReporter` artifacts over less-specific runner output because they preserve per-host results after the live run has ended.
- Do not treat the existence of a per-host reporter artifact by itself as proof that the host passed.
- For categorized grouped recovery, prefer the matching per-host reporter JSON or mochawesome result and carry through the real `failures`, `pending`, and failure message instead of assuming `PASS completed`.
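
The recovery order above can be sketched as a simple priority walk. This is a minimal illustration, not the watcher's actual code: the source labels in `RECOVERY_ORDER` and the function name `recover_failure_detail` are hypothetical, and the sketch assumes each source has already been read into a string (or is absent).

```python
from typing import Dict, Optional, Tuple

# Hypothetical labels for the four sources, listed in the documented priority order.
RECOVERY_ORDER = [
    "consolidated_run_log",
    "mochawesome_html",
    "structured_reporter",  # per-host JSON or XML
    "text_reporter",
]


def recover_failure_detail(
    available: Dict[str, Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (source, detail) from the highest-priority source with non-empty text."""
    for source in RECOVERY_ORDER:
        detail = available.get(source)
        if detail and detail.strip():
            return source, detail.strip()
    # Nothing usable was found; the caller decides how to report that.
    return None, None
```

A missing or blank source is skipped rather than treated as a pass, which matches the rule that an artifact's existence alone proves nothing about the host result.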
@@ -269,6 +275,8 @@ Status-report expectations:
- Do not include generic watcher bookkeeping messages in `NOTES:` such as artifact-detection confirmations.
- Do not include internal watcher fallback notes in `NOTES:` such as `check-xml-files.ts` validation confirmations or reporter-artifact recovery details.
- The `HOSTS:` table includes `Host`, `Kernel`, `Status`, and `Detail` columns in that order.
- For any failed host, keep the `Detail` column compact by showing the failing step plus a short error summary, not the full raw stack trace.
- If richer failure text is available, put the longer trimmed excerpt in `NOTES:` so the result stays readable in Mattermost and local status output.
- In `COVERAGE:`, describe the important `cmc-templates.py` command inputs such as template, categorize mode, datastore/config family, config filename, migration style, any real plugin/integration path, and other operator-relevant run options, but do not list target hosts there or include verbose prose scope descriptions.
- Only include coverage fields that the template command actually used. Do not show empty or irrelevant fields such as an integration/plugin path for templates that did not use one.
- If `categorize mode: enabled` is already shown in `COVERAGE:`, do not also repeat `--categorize` under `run options`.
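
One possible way to split failure text between the compact `Detail` column and `NOTES:` is sketched below. The function name, the length limits, and the rule that stack frames are lines beginning with `at ` are all assumptions made for illustration; the real status formatter may trim differently.

```python
from typing import Tuple


def _truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def split_failure_text(
    step: str,
    error_text: str,
    summary_len: int = 80,   # assumed limit for the Detail column
    notes_len: int = 400,    # assumed limit for the NOTES excerpt
) -> Tuple[str, str]:
    """Return (detail, notes): a short Detail cell and a longer trimmed excerpt."""
    lines = [ln.strip() for ln in error_text.splitlines() if ln.strip()]
    first = lines[0] if lines else "unknown error"
    # Drop stack-frame lines so the excerpt keeps only the readable message text.
    body = [ln for ln in lines if not ln.startswith("at ")]
    detail = f"{step}: {_truncate(first, summary_len)}"
    notes = _truncate(" ".join(body), notes_len)
    return detail, notes
```

The `detail` value is meant for the `HOSTS:` table and the `notes` value for `NOTES:`, so the raw stack trace never reaches either place.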

@@ -451,3 +451,30 @@ This file stores run-specific examples only when a run produced a new learning r
- Resolve `TEST FLOW:` from the generated `.ts` spec for the actual run whenever that spec exists.
- Extract the numbered `it(...)` steps from the generated spec referenced by the run's `specPattern`.
- Only use template-level or static fallback flow definitions when the generated spec cannot be found or parsed.
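
As a rough sketch of the `TEST FLOW:` extraction described above, the step titles could be pulled from the generated spec with a regular expression. The helper name `extract_test_flow` is hypothetical, and the sketch assumes plain `it("...")` calls with a string-literal title; forms such as `it.only(...)`, or titles that contain escaped quotes, would need extra handling.

```python
import re
from typing import List

# Matches it("title"), it('title'), or it(`title`) and captures the title text.
IT_TITLE = re.compile(r"""\bit\(\s*(['"`])(.+?)\1""")


def extract_test_flow(spec_source: str) -> List[str]:
    """Return the it(...) titles in the order they appear in the spec source."""
    return [match.group(2) for match in IT_TITLE.finditer(spec_source)]
```

An empty result is the signal to fall back to the template-level or static flow definition.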

## Run Learning: 2026-03-30 (Event-log reporter JSON must not be ignored in non-categorized fallback)
- Observed failure mode:
  - A failed non-categorized run still posted/saved host detail as only `1 failures` even though the per-host reporter artifacts preserved the failing step.
  - The per-host `.json` artifact used an event-log format with `metadata` plus `tests`, but no top-level `stats` block.
  - The watcher ignored that JSON format, fell back to the `.txt`, and lost structured test counts/detail.
- Action for future runs:
  - Support the event-log JSON format directly when parsing per-host reporter artifacts.
  - In non-categorized fallback, prefer the structured `.json` artifact over the matching `.txt` when they belong to the same run timestamp.
  - Recover at least the failing testcase name and a nonzero test count from those artifacts even when the consolidated run log is missing.
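
A parser that accepts both JSON shapes might look like the sketch below. Only the top-level `metadata`, `tests`, and `stats` keys come from the notes above; the per-test fields `title`, `state`, and `error`, the `stats` counters, and the function name are assumptions for illustration, so the real artifact schema should be checked before relying on them.

```python
import json
from typing import Any, Dict


def summarize_reporter_json(text: str) -> Dict[str, Any]:
    """Summarize a per-host reporter artifact in either stats or event-log form."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("unrecognized reporter JSON: top level is not an object")

    stats = data.get("stats")
    if isinstance(stats, dict):
        # Stats-style artifact: counts are already aggregated (field names assumed).
        return {
            "tests": int(stats.get("tests", 0)),
            "failures": int(stats.get("failures", 0)),
            "pending": int(stats.get("pending", 0)),
            "failed_step": None,
            "message": None,
        }

    tests = data.get("tests")
    if not isinstance(tests, list):
        raise ValueError("unrecognized reporter JSON: no 'stats' and no 'tests' list")

    # Event-log artifact: derive the counts from the individual test entries.
    failed = [t for t in tests if t.get("state") == "failed"]
    pending = [t for t in tests if t.get("state") == "pending"]
    first = failed[0] if failed else {}
    return {
        "tests": len(tests),
        "failures": len(failed),
        "pending": len(pending),
        "failed_step": first.get("title"),
        "message": first.get("error"),
    }
```

Raising on an unknown shape, instead of silently returning zero counts, keeps a parsing gap from being reported as `PASS completed`.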

## Run Learning: 2026-03-30 (Use `mochawesome` as the rich fallback for host failure detail)
- Observed failure mode:
  - The full UI-visible Cypress error text for a failed ATVM host run existed in `cypress/cmcReporter/mochawesome/*.html`, but the lower-fidelity host-level `.json` and `.txt` reporter artifacts only preserved the failing step boundary.
  - That made the host detail fall back to a thin summary even though a richer error payload was available on the controller.
- Action for future runs:
  - When the consolidated run log is missing, use `mochawesome` as the rich fallback source for per-host failure text before settling for lower-fidelity reporter artifacts.
  - Keep the `HOSTS` table compact by showing the failing step plus a short error summary.
  - Put the longer trimmed failure excerpt in `NOTES:` instead of dumping the full raw stack trace into the host-detail column.

## Run Learning: 2026-03-30 (Apply rich failed-host detail recovery to every ATVM template)
- Observed operator requirement:
  - The same failed-host recovery and formatting rules should apply across all ATVM template runs, not only reboot scenarios.
  - If any ATVM test template fails, the result should still recover the best available failure detail and present it consistently.
- Action for future runs:
  - Use the same failure-detail recovery order for every ATVM template: consolidated run log, `mochawesome`, structured reporter artifacts, then text reporter artifacts.
  - Keep failed-host `Detail` compact and put the longer trimmed excerpt in `NOTES:` for every template type.