fix atvm watcher-backed run launch sequence

Execute the template step before starting watcher-backed ATVM runs.

- run --template-command synchronously in start-atvm-run.sh
- write template output to /tmp/<build>.launch.log
- stop before watcher/runner startup if template generation fails
- document the corrected wrapper behavior in watcher-service docs
- record the stale specPattern failure mode in automation run learnings
2026-04-29 12:14:55 -04:00
parent 2832ea4175
commit 9673d769e2
4 changed files with 46 additions and 9 deletions
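The launch ordering described in the commit message (template step first and synchronous, its output captured in the launch log, and a fail-fast exit before the watcher or runner starts) can be sketched as a small shell function. This is a minimal sketch, not the actual `start-atvm-run.sh`. `launch_run`, `start_watcher`, `start_runner`, and the `TRACE` file are hypothetical stand-ins. The template command runs under `bash -c` rather than the script's `bash -lc`, which keeps the example free of login-profile side effects.

```shell
#!/usr/bin/env bash
# Sketch only: the template-first launch ordering described in the commit message.
# HYPOTHETICAL: start_watcher/start_runner stand in for the real watcher_cmd
# and runner_cmd invocations; here they only append to a trace file.

TRACE="$(mktemp)"
start_watcher() { echo "watcher:$1" >>"${TRACE}"; }
start_runner()  { echo "runner:$1" >>"${TRACE}"; }

# launch_run BUILD_NAME TEMPLATE_COMMAND WORKDIR
launch_run() {
  local build_name="$1" template_command="$2" workdir="$3"
  local launch_log="/tmp/${build_name}.launch.log"

  # Start every launch with an empty launch log.
  mkdir -p "$(dirname "${launch_log}")"
  : >"${launch_log}"

  if [ -n "${template_command}" ]; then
    printf 'Running template command:\n%s\n\n' "${template_command}" >>"${launch_log}"
    # Run synchronously in a subshell so the cd does not leak into the caller;
    # stdout and stderr both go to the launch log, not the runner log.
    if ! (cd "${workdir}" && bash -c "${template_command}") >>"${launch_log}" 2>&1; then
      echo "Template command failed for ${build_name}. See ${launch_log}" >&2
      return 1  # fail fast: the watcher and runner are never started
    fi
  fi

  start_watcher "${build_name}"
  start_runner "${build_name}"
}
```

Chaining `cd` and the template command with `&&` means a missing working directory aborts the template step instead of running the command from the wrong directory. Inside an `if !` condition, `set -e` is ignored, so it would not stop the subshell on its own.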


@@ -9,6 +9,15 @@ This file stores run-specific examples only when a run produced a new learning r
 ## Current State
 - No run-learning entries recorded yet from `guide.md` source material.
+## Run Learning: 2026-04-29 (Combined watcher wrapper must execute template generation before runner startup)
+- Observed failure mode:
+- A watcher-backed `start-atvm-run.sh` launch for `cmc-migrateops-compute-migration` started `run-sorry-cypress.py` without ever running the approved `cmc-templates.py` command.
+- The wrapper passed `--template-command` into watcher metadata only, so the runner consumed stale controller config state and started against a previous `specPattern` pointing at `atvm121-ubuntu24.04`.
+- Action for future runs:
+- The combined watcher wrapper must execute `--template-command` synchronously before watcher and runner startup.
+- Write the template phase output to `/tmp/<build>.launch.log` so template activity is preserved separately from the live runner log.
+- If the template step fails, stop immediately and do not start the watcher or the runner.
 ## Run Learning: 2026-04-24 (Categorized watcher false-PASS guardrail)
 - Observed failure mode:
 - A categorized compute-migration run was incorrectly reported as `PASS` for `atvm121-ubuntu24.04` even though the actual Ubuntu grouped sub-run failed.


@@ -183,11 +183,13 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
 Once installed, the intended workflow is:
-1. Start the watcher for that build name.
+1. Run the approved `cmc-templates.py` command for that build name.
+- when using `start-atvm-run.sh`, the wrapper should execute `--template-command` synchronously and stop immediately if that step fails
+2. Start the watcher for that build name.
 - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
-2. Start the runner service for that build name.
-3. Let the runner and watcher run on the controller.
-4. The watcher exits on terminal state.
+3. Start the runner service for that build name.
+4. Let the runner and watcher run on the controller.
+5. The watcher exits on terminal state.
 Example:


@@ -57,11 +57,12 @@ Each watcher instance is tied to one requested build name.
 Typical workflow:
-1. Start the watcher for that run.
-2. Start the runner service for that run.
-3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
+1. Run the approved `cmc-templates.py` command for that run when one is provided.
+2. Start the watcher for that run.
+3. Start the runner service for that run.
+4. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
 - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
-4. For non-categorized runs, when the run reaches a terminal state:
+5. For non-categorized runs, when the run reaches a terminal state:
 - `COMPLETED` or `FAILED`
 - build the final ATVM status
 - send the status to Mattermost
@@ -72,7 +73,7 @@ Typical workflow:
 - do not post
 - mark the final state
 - exit
-5. For categorized runs:
+6. For categorized runs:
 - detect each grouped sub-run in sequence from the parent run log
 - wait for that grouped sub-run to finish
 - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
@@ -154,6 +155,9 @@ That results in:
 The helper also:
+- runs `--template-command` synchronously first when one is provided
+- writes the template phase output to `/tmp/<build-name>.launch.log`
+- exits before watcher/runner startup if the template step fails
 - stops any stale watcher instance for that same requested build name
 - removes the old watcher state directory for that requested build name
 - starts the new watcher with a clean state root for the new run
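The stale-state reset in the helper list above can be sketched as follows. This is a hypothetical illustration. The diff does not show how the helper finds or stops the old watcher, so the function name `reset_watcher_state`, the `<state-root>/<build-name>` directory layout, and the `watcher.pid` file are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: clear per-build watcher state before starting a new watcher.
# HYPOTHETICAL layout (not taken from the real helper):
#   - per-build state lives in "${state_root}/${build_name}"
#   - a running watcher records its PID in "<state dir>/watcher.pid"

# reset_watcher_state STATE_ROOT BUILD_NAME
reset_watcher_state() {
  local state_root="$1" build_name="$2"

  # Refuse an empty build name so the rm below can never target the whole root.
  if [ -z "${build_name}" ]; then
    echo "reset_watcher_state: build name is required" >&2
    return 1
  fi

  local state_dir="${state_root}/${build_name}"
  local pid_file="${state_dir}/watcher.pid"

  # Stop a stale watcher for this same build name if it is still alive.
  if [ -f "${pid_file}" ]; then
    local pid
    pid="$(cat "${pid_file}")"
    if [ -n "${pid}" ] && kill -0 "${pid}" 2>/dev/null; then
      kill "${pid}"
    fi
  fi

  # Drop old markers (for example cancellation or posted flags) and recreate
  # an empty state directory for the new run.
  rm -rf -- "${state_dir}"
  mkdir -p -- "${state_dir}"
}
```

With the wrapper's default state root, a call would look like `reset_watcher_state /var/lib/atvm-run-watcher "${BUILD_NAME}"`. State for other build names under the same root is left untouched.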


@@ -37,6 +37,7 @@ SCOPE_DESCRIPTION=""
 WATCHER_CATEGORIZED="false"
 RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
 RUNNER_LOG=""
+LAUNCH_LOG=""
 STATE_ROOT="/var/lib/atvm-run-watcher"
 while [[ $# -gt 0 ]]; do
@@ -76,6 +77,8 @@ if [[ -z "$RUNNER_LOG" ]]; then
   RUNNER_LOG="/tmp/${BUILD_NAME}.log"
 fi
+LAUNCH_LOG="/tmp/${BUILD_NAME}.launch.log"
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 watcher_cmd=(
@@ -112,6 +115,25 @@ runner_cmd=(
   --state-root "${STATE_ROOT}"
 )
+mkdir -p "$(dirname "${LAUNCH_LOG}")"
+: > "${LAUNCH_LOG}"
+if [[ -n "${TEMPLATE_COMMAND}" ]]; then
+  {
+    echo "Running template command:"
+    echo "${TEMPLATE_COMMAND}"
+    echo
+  } >>"${LAUNCH_LOG}"
+  if ! (
+    cd "${RUNNER_WORKDIR}"
+    bash -lc "${TEMPLATE_COMMAND}"
+  ) >>"${LAUNCH_LOG}" 2>&1; then
+    echo "Template command failed for ${BUILD_NAME}. See ${LAUNCH_LOG}" >&2
+    exit 1
+  fi
+fi
 "${watcher_cmd[@]}"
 for _ in {1..15}; do