Reset reused watcher state before starting a new ATVM run

- update the watcher start helper to stop any stale watcher instance for the same requested parent build name and remove its old state directory before starting fresh
- document that reused parent build names must not inherit stale cancelled.marker, posted.marker, state.json, or subruns/ state from older runs
- update the watcher install and design docs so the controller workflow explicitly treats stale reused-build-name state as part of startup cleanup
2026-03-26 11:30:28 -04:00
parent dda0a0b4c0
commit f5849dde0c
4 changed files with 15 additions and 0 deletions


@@ -144,6 +144,10 @@ Possible contents:
- one cancellation marker per parent run id
- optional lock file to prevent multiple watcher instances from racing
When the same requested parent build name is reused for a new run:
- the watcher start workflow must clear old watcher state for that requested build name before starting
- stale `cancelled.marker`, `posted.marker`, `state.json`, and `subruns/` contents must not be allowed to affect the new run
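
A minimal sketch of the cleanup rule above, assuming per-build state lives under `<state root>/<requested build name>` as in the start helper. The function name `reset_watcher_state` and the empty-argument guard are illustrative additions, not part of the documented helper.

```bash
# Hypothetical helper: clear all watcher state for one requested build name.
# Assumes state is stored under <state_root>/<build_name>.
reset_watcher_state() {
    local state_root="$1"
    local build_name="$2"

    # Refuse empty arguments so rm -rf can never resolve to "/" or the state root.
    if [ -z "$state_root" ] || [ -z "$build_name" ]; then
        echo "reset_watcher_state: state root and build name are required" >&2
        return 1
    fi

    local run_dir="${state_root}/${build_name}"

    # Removing the whole directory drops cancelled.marker, posted.marker,
    # state.json, and subruns/ in one step, so nothing from an older run survives.
    rm -rf -- "$run_dir"
    mkdir -p -- "$run_dir"
}
```

Removing the directory as a unit, rather than deleting individual files, means any artifact added to the state layout later is also cleared without changing the cleanup code.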
## Recommended Operator Workflow
Normal completion workflow:
1. ATVM run starts.


@@ -120,6 +120,7 @@ Recommended permissions:
- if the run uses `--categorize`, also pass `--categorize` to the watcher start helper
- confirm final Mattermost delivery for a completed run
- confirm categorized execution sends one post per completed grouped sub-run
- confirm reused parent build names do not inherit stale `cancelled.marker`, `posted.marker`, or `subruns/` state from older runs
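
The last check can be scripted. The sketch below assumes the markers and `subruns/` sit directly inside the per-build state directory. The function name `check_no_stale_state` is hypothetical. It is meant to run right after the start helper has reset the directory and before the new watcher has written any state of its own.

```bash
# Hypothetical check: report stale watcher artifacts left in a run directory.
# Returns 0 when the directory is clean, 1 when any stale artifact is found.
check_no_stale_state() {
    local run_dir="$1"
    local stale=0
    local name

    for name in cancelled.marker posted.marker; do
        if [ -e "${run_dir}/${name}" ]; then
            echo "stale: ${run_dir}/${name}"
            stale=1
        fi
    done

    # subruns/ may exist, but it must not carry entries from an older run.
    if [ -d "${run_dir}/subruns" ] && [ -n "$(ls -A "${run_dir}/subruns")" ]; then
        echo "stale: ${run_dir}/subruns/ is not empty"
        stale=1
    fi

    return "$stale"
}
```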
## Recommended Validation Commands
@@ -154,6 +155,7 @@ Once installed, the intended workflow is:
1. Launch the ATVM run as usual.
2. Start the watcher for that build name.
- the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
3. Let the watcher run on the controller.
4. The watcher exits on terminal state.


@@ -49,6 +49,7 @@ Typical workflow:
1. Launch the ATVM run.
2. Start the watcher for that run.
3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
- before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
4. For non-categorized runs, when the run reaches a terminal state:
- `COMPLETED` or `FAILED`
- build the final ATVM status
@@ -105,6 +106,12 @@ That results in:
- service instance:
- `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
The helper also:
- stops any stale watcher instance for that same requested build name
- removes the old watcher state directory for that requested build name
- starts the new watcher with a clean state root for the new run
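
The order of those three steps matters: stopping first prevents the old instance from rewriting state after it has been removed. A rough sketch of the sequence follows. The function name `start_clean_watcher` is hypothetical, starting the unit with `systemctl start` is an assumption, and the step where the real helper writes the watcher environment file is omitted.

```bash
# Hypothetical wrapper showing the order: stop, remove, recreate, start.
# The state layout and unit name follow the examples in this document;
# starting the unit with `systemctl start` is an assumption.
start_clean_watcher() {
    # ${N:?} aborts the calling script if an argument is missing or empty.
    local state_root="${1:?state root required}"
    local build_name="${2:?build name required}"
    local unit="atvm-run-watcher@${build_name}.service"
    local run_dir="${state_root}/${build_name}"

    # 1. Stop a stale instance; ignore the error if none is running.
    systemctl stop "$unit" >/dev/null 2>&1 || true

    # 2. Remove the old state directory, then recreate it empty.
    rm -rf -- "$run_dir"
    mkdir -p -- "$run_dir"

    # 3. Start the new watcher against the clean state directory.
    systemctl start "$unit"
}
```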
## Cancel Example
```bash


@@ -49,6 +49,8 @@ if [[ -z "$BUILD_NAME" ]]; then
fi
RUN_DIR="${STATE_ROOT}/${BUILD_NAME}"
systemctl stop "atvm-run-watcher@${BUILD_NAME}.service" >/dev/null 2>&1 || true
rm -rf "$RUN_DIR"
mkdir -p "$RUN_DIR"
cat >"${RUN_DIR}/watch.env" <<EOF