Update ATVM watcher for categorized sub-run posting

- update the watcher design and automation guide to treat --categorize as sequential ATVM sub-runs rather than one parent run with internal phases
- document that categorized runs should send one Mattermost status per completed grouped sub-run instead of one parent-only final post
- add a --categorize option to the watcher start helper so categorized mode is explicit in watcher startup
- update the watcher implementation to track categorized sub-runs separately, write per-subrun state, and post each completed grouped run once
2026-03-26 11:00:39 -04:00
parent 68cd428733
commit d60b8b9b18
6 changed files with 399 additions and 89 deletions

@@ -8,8 +8,9 @@ This is a deployment plan only. It does not perform the installation.
 Install the local watcher package so the controller can:
-- watch one ATVM run per watcher instance
-- send final Mattermost status only for `COMPLETED` or `FAILED`
+- watch one requested ATVM run per watcher instance
+- for non-categorized runs, send one final Mattermost status only for `COMPLETED` or `FAILED`
+- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
 - suppress Mattermost posts for `CANCELLED`, `TERMINATED`, `HUNG`, and `UNKNOWN`
 - stop automatically after the watched run reaches a terminal state
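The posting rules in this hunk can be sketched as a small predicate. This is an illustrative sketch only, not the watcher's actual code: the names `POSTING_STATES`, `SUPPRESSED_STATES`, and `should_post` are hypothetical, and treating any unrecognized state string as "do not post" is an assumption.

```python
# Hypothetical sketch of the posting policy described above.
# Only the state names come from the plan; everything else is assumed.

POSTING_STATES = frozenset({"COMPLETED", "FAILED"})
SUPPRESSED_STATES = frozenset({"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN"})


def should_post(state: str) -> bool:
    """Return True only for states that warrant a Mattermost post.

    Unrecognized states fall through to False, which is an assumption
    chosen so that an unexpected value never triggers a post.
    """
    return state in POSTING_STATES


if __name__ == "__main__":
    for state in sorted(POSTING_STATES | SUPPRESSED_STATES):
        print(f"{state}: post={should_post(state)}")
```

In categorized mode the same predicate would presumably be applied once per grouped sub-run rather than once for the parent request.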
@@ -116,7 +117,9 @@ Recommended permissions:
 9. Do a real ATVM run test.
 - launch a real run
 - start the watcher for that build name
+- if the run uses `--categorize`, also pass `--categorize` to the watcher start helper
 - confirm final Mattermost delivery for a completed run
+- confirm categorized execution sends one post per completed grouped sub-run
 ## Recommended Validation Commands
@@ -163,6 +166,7 @@ Example:
 --config-family gold \
 --migration-style "ATVM end-to-end migration validation" \
 --integration-plugin "pure with fc" \
+--categorize \
 --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
 ```
@@ -184,9 +188,11 @@ The cancel helper should:
 - This is not a daemon.
 - One watcher instance is started per ATVM run.
+- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
 - The watcher exits after the run reaches a terminal state.
 - The watcher writes state under `/var/lib/atvm-run-watcher/<build-name>`.
-- The watcher prevents duplicate Mattermost posts by writing a posted marker.
+- The watcher prevents duplicate Mattermost posts by writing posted markers.
+- Categorized sub-run state is written under `/var/lib/atvm-run-watcher/<build-name>/subruns/<subrun-key>/`.
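The state layout and posted-marker behavior listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the watcher's implementation: the helper names `state_dir` and `claim_post` and the marker filename `posted` are hypothetical. Only the directory layout comes from the plan. The sketch creates the marker atomically before posting; the real watcher may instead write it after the Mattermost response is verified.

```python
# Hypothetical sketch: per-run and per-sub-run state directories plus an
# atomic "posted" marker that makes posting at-most-once.
import os
from pathlib import Path
from typing import Optional

STATE_ROOT = Path("/var/lib/atvm-run-watcher")  # path taken from the plan


def state_dir(build_name: str, subrun_key: Optional[str] = None,
              root: Path = STATE_ROOT) -> Path:
    """Return the state directory for a run, or for one categorized sub-run."""
    base = root / build_name
    if subrun_key is None:
        return base
    return base / "subruns" / subrun_key


def claim_post(directory: Path, marker_name: str = "posted") -> bool:
    """Create the marker file; return True only for the first caller.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so a
    restarted watcher cannot claim the same post twice.
    """
    directory.mkdir(parents=True, exist_ok=True)
    try:
        fd = os.open(directory / marker_name,
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

With one marker per sub-run directory, each grouped sub-run can be posted once independently of the others, which matches the "posted markers" wording in the plan.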
 ## Failure Handling
@@ -200,6 +206,10 @@ Expected terminal behavior:
 - post to Mattermost
 - verify `ok`
 - exit
+- categorized `COMPLETED` / `FAILED`
+- post once for that grouped sub-run
+- verify `ok`
+- continue until the parent request finishes
 - `CANCELLED`
 - write final `CANCELLED` state to `state.json`
 - do not post