From f5849dde0c9623e08d76bf95849f6dda68da5bf0 Mon Sep 17 00:00:00 2001
From: "anthony.wen"
Date: Thu, 26 Mar 2026 11:30:28 -0400
Subject: [PATCH] Reset reused watcher state before starting a new ATVM run

- update the watcher start helper to stop any stale watcher instance for the same requested parent build name and remove its old state directory before starting fresh
- document that reused parent build names must not inherit stale cancelled, posted, state.json, or subruns state from older runs
- update the watcher install and design docs so the controller workflow explicitly treats stale reused-build-name state as part of startup cleanup
---
 atvm/docs/automation/mattermost-watcher-design.md | 4 ++++
 atvm/watcher-service/INSTALL.md                   | 2 ++
 atvm/watcher-service/README.md                    | 7 +++++++
 atvm/watcher-service/start-atvm-run-watcher.sh    | 2 ++
 4 files changed, 15 insertions(+)

diff --git a/atvm/docs/automation/mattermost-watcher-design.md b/atvm/docs/automation/mattermost-watcher-design.md
index 74fd536..ae1e8b4 100644
--- a/atvm/docs/automation/mattermost-watcher-design.md
+++ b/atvm/docs/automation/mattermost-watcher-design.md
@@ -144,6 +144,10 @@ Possible contents:
 - one cancellation marker per parent run id
 - optional lock file to prevent multiple watcher instances from racing
 
+When the same requested parent build name is reused for a new run:
+- the watcher start workflow must clear old watcher state for that requested build name before starting
+- stale `cancelled.marker`, `posted.marker`, `state.json`, and `subruns/` contents must not be allowed to affect the new run
+
 ## Recommended Operator Workflow
 Normal completion workflow:
 1. ATVM run starts.
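The reset rule in the design doc can be sketched in bash to match the start helper. This is a minimal illustration, not the helper's actual code: the patch itself only adds a `systemctl stop ... || true` line and an `rm -rf "$RUN_DIR"` line. The function name `reset_watcher_state`, the `SKIP_SYSTEMCTL` switch, and the `demo-build` / `group-1` names are hypothetical. The sketch assumes the layout described in this patch: one `STATE_ROOT/<build name>` directory per requested build name, and one `atvm-run-watcher@<build name>.service` instance per watcher. The empty-value and path checks are extra guards that the patch does not add.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical names) of the per-build-name state reset performed
# before a new watcher starts. Assumed layout: STATE_ROOT/<build name>/ holds
# cancelled.marker, posted.marker, state.json, and subruns/, and each watcher
# runs as atvm-run-watcher@<build name>.service.
set -euo pipefail

# reset_watcher_state STATE_ROOT BUILD_NAME
# Stops any stale watcher for BUILD_NAME, deletes its old state directory,
# recreates it empty, and prints the new run directory path.
reset_watcher_state() {
  local state_root="$1" build_name="$2"

  # Refuse empty or path-like values so rm -rf cannot reach outside STATE_ROOT.
  if [[ -z "$state_root" || -z "$build_name" ]]; then
    echo "reset_watcher_state: STATE_ROOT and BUILD_NAME are required" >&2
    return 1
  fi
  if [[ "$build_name" == */* || "$build_name" == "." || "$build_name" == ".." ]]; then
    echo "reset_watcher_state: invalid build name: ${build_name}" >&2
    return 1
  fi

  local run_dir="${state_root}/${build_name}"

  # Stop first so a stale watcher cannot write into the recreated directory.
  # Errors (unit not loaded, no systemd) are ignored, like "|| true" in the
  # patch. SKIP_SYSTEMCTL is a hypothetical switch for running this off-host.
  if [[ -z "${SKIP_SYSTEMCTL:-}" ]] && command -v systemctl >/dev/null 2>&1; then
    systemctl stop "atvm-run-watcher@${build_name}.service" >/dev/null 2>&1 || true
  fi

  rm -rf -- "$run_dir"
  mkdir -p -- "$run_dir"
  printf '%s\n' "$run_dir"
}

# Demo: plant stale state from an "older run" under a throwaway root.
SKIP_SYSTEMCTL=1
demo_root="$(mktemp -d)"
trap 'rm -rf -- "$demo_root"' EXIT
mkdir -p "${demo_root}/demo-build/subruns/group-1"
touch "${demo_root}/demo-build/cancelled.marker" \
      "${demo_root}/demo-build/posted.marker" \
      "${demo_root}/demo-build/state.json"

run_dir="$(reset_watcher_state "$demo_root" "demo-build")"
echo "fresh run dir: ${run_dir}"
ls -A "$run_dir"   # prints nothing: no stale markers, state.json, or subruns/
```

Stopping before deleting follows the order used in the patch: a stale watcher that is still running could otherwise write markers or state into the fresh directory. Build names containing characters that systemd does not allow in instance names would need escaping (for example with `systemd-escape`), which this sketch does not handle.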
diff --git a/atvm/watcher-service/INSTALL.md b/atvm/watcher-service/INSTALL.md
index 53fee80..c46af09 100644
--- a/atvm/watcher-service/INSTALL.md
+++ b/atvm/watcher-service/INSTALL.md
@@ -120,6 +120,7 @@ Recommended permissions:
    - if the run uses `--categorize`, also pass `--categorize` to the watcher start helper
    - confirm final Mattermost delivery for a completed run
    - confirm categorized execution sends one post per completed grouped sub-run
+   - confirm reused parent build names do not inherit stale `cancelled.marker`, `posted.marker`, or `subruns/` state from older runs
 
 ## Recommended Validation Commands
 
@@ -154,6 +155,7 @@ Once installed, the intended workflow is:
 
 1. Launch the ATVM run as usual.
 2. Start the watcher for that build name.
+   - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
 3. Let the watcher run on the controller.
 4. The watcher exits on terminal state.
 
diff --git a/atvm/watcher-service/README.md b/atvm/watcher-service/README.md
index 2689264..ca8dd5b 100644
--- a/atvm/watcher-service/README.md
+++ b/atvm/watcher-service/README.md
@@ -49,6 +49,7 @@ Typical workflow:
 1. Launch the ATVM run.
 2. Start the watcher for that run.
+   - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
 3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
 4. For non-categorized runs, when the run reaches a terminal state:
    - `COMPLETED` or `FAILED`
    - build the final ATVM status
@@ -105,6 +106,12 @@ That results in:
 - service instance:
   - `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
 
+The helper also:
+
+- stops any stale watcher instance for that same requested build name
+- removes the old watcher state directory for that requested build name
+- starts the new watcher with a clean state directory for the new run
+
 ## Cancel Example
 
 ```bash
diff --git a/atvm/watcher-service/start-atvm-run-watcher.sh b/atvm/watcher-service/start-atvm-run-watcher.sh
index a9ae715..5dcb41f 100644
--- a/atvm/watcher-service/start-atvm-run-watcher.sh
+++ b/atvm/watcher-service/start-atvm-run-watcher.sh
@@ -49,6 +49,8 @@ if [[ -z "$BUILD_NAME" ]]; then
 fi
 
 RUN_DIR="${STATE_ROOT}/${BUILD_NAME}"
+systemctl stop "atvm-run-watcher@${BUILD_NAME}.service" >/dev/null 2>&1 || true
+rm -rf "$RUN_DIR"
 mkdir -p "$RUN_DIR"
 
 cat >"${RUN_DIR}/watch.env" <