Reset reused watcher state before starting a new ATVM run

- update the watcher start helper to stop any stale watcher instance for the same requested parent build name and remove its old state directory before starting fresh
- document that reused parent build names must not inherit stale cancelled, posted, state.json, or subruns state from older runs
- update the watcher install and design docs so the controller workflow explicitly treats stale reused-build-name state as part of startup cleanup
@@ -144,6 +144,10 @@ Possible contents:
- one cancellation marker per parent run id
- optional lock file to prevent multiple watcher instances from racing

When the same requested parent build name is reused for a new run:
- the watcher start workflow must clear old watcher state for that requested build name before starting
- stale `cancelled.marker`, `posted.marker`, `state.json`, and `subruns/` contents must not be allowed to affect the new run

## Recommended Operator Workflow
Normal completion workflow:
1. ATVM run starts.

@@ -120,6 +120,7 @@ Recommended permissions:
- if the run uses `--categorize`, also pass `--categorize` to the watcher start helper
- confirm final Mattermost delivery for a completed run
- confirm categorized execution sends one post per completed grouped sub-run
- confirm reused parent build names do not inherit stale `cancelled.marker`, `posted.marker`, or `subruns/` state from older runs
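
One way to check the last item is to compare the state entries against a reference file whose timestamp marks the launch of the new run. The sketch below is an assumption, not part of the watcher tooling: the function name `find_stale_watcher_state` and the reference-file convention are hypothetical, and only the four entry names come from the watcher docs.

```shell
#!/usr/bin/env bash
# Sketch only: find_stale_watcher_state and the reference-file convention
# are hypothetical and are not provided by the watcher helper.
find_stale_watcher_state() {
  local run_dir="$1" ref_file="$2"

  # Both the run directory and the reference file must exist.
  [[ -d "$run_dir" && -e "$ref_file" ]] || return 2

  # Print known state entries whose mtime is not newer than the reference file.
  find "$run_dir" -mindepth 1 -maxdepth 1 \
    \( -name 'cancelled.marker' -o -name 'posted.marker' \
       -o -name 'state.json' -o -name 'subruns' \) \
    ! -newer "$ref_file" -print
}
```

Empty output means no listed entry predates the reference file. Any printed path is a candidate for stale state carried over from an older run.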

## Recommended Validation Commands

@@ -154,6 +155,7 @@ Once installed, the intended workflow is:

1. Launch the ATVM run as usual.
2. Start the watcher for that build name.
   - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
3. Let the watcher run on the controller.
4. The watcher exits on terminal state.

@@ -49,6 +49,7 @@ Typical workflow:
1. Launch the ATVM run.
2. Start the watcher for that run.
3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
   - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
4. For non-categorized runs, when the run reaches a terminal state:
   - `COMPLETED` or `FAILED`
   - build the final ATVM status

@@ -105,6 +106,12 @@ That results in:
- service instance:
- `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`

The helper also:

- stops any stale watcher instance for that same requested build name
- removes the old watcher state directory for that requested build name
- starts the new watcher with a clean state root for the new run
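
The three steps above can be sketched as a single shell function. This is a minimal illustration, not the helper's actual code: `reset_watcher_state` and its two positional arguments are hypothetical, and the guards against empty or path-like build names are an added precaution. Only the unit name pattern `atvm-run-watcher@<build name>.service` is taken from this document.

```shell
#!/usr/bin/env bash
# Sketch only: reset_watcher_state is a hypothetical name, not the real helper.
reset_watcher_state() {
  local state_root="$1" build_name="$2"

  # Refuse empty or path-like names so rm -rf cannot escape the state root.
  if [[ -z "$state_root" || -z "$build_name" \
        || "$build_name" == */* || "$build_name" == .* ]]; then
    echo "refusing to reset watcher state for build name '${build_name}'" >&2
    return 1
  fi

  local run_dir="${state_root}/${build_name}"

  # Stop a stale instance if systemctl is available; a missing or inactive
  # unit is not treated as an error.
  if command -v systemctl >/dev/null 2>&1; then
    systemctl --no-ask-password stop "atvm-run-watcher@${build_name}.service" \
      >/dev/null 2>&1 || true
  fi

  # Drop every old marker, state file, and sub-run record, then recreate
  # an empty directory for the new run.
  rm -rf -- "$run_dir"
  mkdir -p -- "$run_dir"
}
```

Called as `reset_watcher_state "$STATE_ROOT" "$BUILD_NAME"`, it leaves an empty per-build directory for the new watcher to populate.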

## Cancel Example

```bash
@@ -49,6 +49,8 @@ if [[ -z "$BUILD_NAME" ]]; then
fi

RUN_DIR="${STATE_ROOT}/${BUILD_NAME}"
systemctl stop "atvm-run-watcher@${BUILD_NAME}.service" >/dev/null 2>&1 || true
rm -rf "$RUN_DIR"
mkdir -p "$RUN_DIR"

cat >"${RUN_DIR}/watch.env" <<EOF