fix atvm watcher-backed run launch sequence
Execute the template step before starting watcher-backed ATVM runs.

- run --template-command synchronously in start-atvm-run.sh
- write template output to /tmp/<build>.launch.log
- stop before watcher/runner startup if template generation fails
- document the corrected wrapper behavior in watcher-service docs
- record the stale specPattern failure mode in automation run learnings
@@ -9,6 +9,15 @@ This file stores run-specific examples only when a run produced a new learning r
 ## Current State
 - No run-learning entries recorded yet from `guide.md` source material.
 
+## Run Learning: 2026-04-29 (Combined watcher wrapper must execute template generation before runner startup)
+- Observed failure mode:
+  - A watcher-backed `start-atvm-run.sh` launch for `cmc-migrateops-compute-migration` started `run-sorry-cypress.py` without ever running the approved `cmc-templates.py` command.
+  - The wrapper passed `--template-command` into watcher metadata only, so the runner consumed stale controller config state and started against a previous `specPattern` pointing at `atvm121-ubuntu24.04`.
+- Action for future runs:
+  - The combined watcher wrapper must execute `--template-command` synchronously before watcher and runner startup.
+  - Write the template phase output to `/tmp/<build>.launch.log` so template activity is preserved separately from the live runner log.
+  - If the template step fails, stop immediately and do not start the watcher or the runner.
+
 ## Run Learning: 2026-04-24 (Categorized watcher false-PASS guardrail)
 - Observed failure mode:
   - A categorized compute-migration run was incorrectly reported as `PASS` for `atvm121-ubuntu24.04` even though the actual Ubuntu grouped sub-run failed.
@@ -183,11 +183,13 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
 
 Once installed, the intended workflow is:
 
-1. Start the watcher for that build name.
+1. Run the approved `cmc-templates.py` command for that build name.
+   - when using `start-atvm-run.sh`, the wrapper should execute `--template-command` synchronously and stop immediately if that step fails
+2. Start the watcher for that build name.
    - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
-2. Start the runner service for that build name.
-3. Let the runner and watcher run on the controller.
-4. The watcher exits on terminal state.
+3. Start the runner service for that build name.
+4. Let the runner and watcher run on the controller.
+5. The watcher exits on terminal state.
 
 Example:
 
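The ordering above can be sketched as a small bash function. This is a minimal sketch and not the real `start-atvm-run.sh`. `launch_run` and its three command-string arguments are hypothetical stand-ins for the template command, the watcher start, and the runner start. The sketch shows only the control flow: the template step runs first and synchronously, and a non-zero exit stops the launch before the watcher or runner starts.

```bash
# Minimal sketch of the template-first launch order.
# Hypothetical: launch_run and its three command strings stand in for the real
# template, watcher, and runner commands used by start-atvm-run.sh.
launch_run() {
  local template_cmd="$1" watcher_cmd="$2" runner_cmd="$3"

  # Step 1: template generation runs synchronously; an empty command is skipped.
  if [[ -n "$template_cmd" ]] && ! bash -c "$template_cmd"; then
    echo "template step failed; not starting watcher or runner" >&2
    return 1
  fi

  # Steps 2 and 3 are reached only after template generation succeeded.
  bash -c "$watcher_cmd" || return 1
  bash -c "$runner_cmd"
}
```

This sketch runs each step in the foreground. The diff does not show whether the real watcher and runner commands block or detach.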
@@ -57,11 +57,12 @@ Each watcher instance is tied to one requested build name.
 
 Typical workflow:
 
-1. Start the watcher for that run.
-2. Start the runner service for that run.
-3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
+1. Run the approved `cmc-templates.py` command for that run when one is provided.
+2. Start the watcher for that run.
+3. Start the runner service for that run.
+4. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
    - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
-4. For non-categorized runs, when the run reaches a terminal state:
+5. For non-categorized runs, when the run reaches a terminal state:
    - `COMPLETED` or `FAILED`
      - build the final ATVM status
      - send the status to Mattermost
@@ -72,7 +73,7 @@ Typical workflow:
      - do not post
      - mark the final state
      - exit
-5. For categorized runs:
+6. For categorized runs:
    - detect each grouped sub-run in sequence from the parent run log
    - wait for that grouped sub-run to finish
    - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
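The terminal-state handling described above reduces to one decision. `COMPLETED` and `FAILED` produce a Mattermost post. The other terminal branch, whose state name lies outside these hunks, marks the final state and exits without posting. The same rule applies to each grouped sub-run in categorized mode. This sketch assumes the state arrives as a plain string. `should_post_final_status` is a hypothetical helper name, not part of the watcher.

```bash
# Hypothetical helper: decides whether a terminal state gets a Mattermost post.
# Only COMPLETED and FAILED come from the documented workflow; every other
# terminal state is treated as "do not post" in this sketch.
should_post_final_status() {
  case "$1" in
    COMPLETED|FAILED) return 0 ;;  # build the final ATVM status and post it
    *) return 1 ;;                 # mark the final state and exit without posting
  esac
}
```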
@@ -154,6 +155,9 @@ That results in:
 
 The helper also:
 
+- runs `--template-command` synchronously first when one is provided
+- writes the template phase output to `/tmp/<build-name>.launch.log`
+- exits before watcher/runner startup if the template step fails
 - stops any stale watcher instance for that same requested build name
 - removes the old watcher state directory for that requested build name
 - starts the new watcher with a clean state root for the new run
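The three reset steps in the helper list can be sketched as follows. This sketch rests on a hypothetical layout: each watcher keeps its state under `${STATE_ROOT}/<build-name>` and records its PID in a `watcher.pid` file there. This diff does not show the real helper's state layout or stop mechanism. `reset_watcher_state` is a hypothetical name.

```bash
# Sketch of the stale-watcher reset under an ASSUMED state layout:
#   ${state_root}/${build_name}/watcher.pid   (hypothetical)
reset_watcher_state() {
  local state_root="$1" build_name="$2"

  # Refuse empty or path-like names so rm -rf can never hit the whole state root.
  [[ -n "$state_root" && -n "$build_name" ]] || return 1
  [[ "$build_name" != */* && "$build_name" != "." && "$build_name" != ".." ]] || return 1

  local state_dir="${state_root}/${build_name}"
  local pid_file="${state_dir}/watcher.pid"

  # Stop a stale watcher for the same build name if it is still alive.
  if [[ -f "$pid_file" ]]; then
    local pid
    pid="$(<"$pid_file")"
    if [[ -n "$pid" ]] && kill -0 "$pid" 2>/dev/null; then
      kill "$pid" 2>/dev/null || true
    fi
  fi

  # Drop old markers (e.g. cancelled/posted) and recreate an empty directory
  # (assumption: the new watcher expects it to exist).
  rm -rf -- "$state_dir"
  mkdir -p -- "$state_dir"
}
```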
@@ -37,6 +37,7 @@ SCOPE_DESCRIPTION=""
 WATCHER_CATEGORIZED="false"
 RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
 RUNNER_LOG=""
+LAUNCH_LOG=""
 STATE_ROOT="/var/lib/atvm-run-watcher"
 
 while [[ $# -gt 0 ]]; do
@@ -76,6 +77,8 @@ if [[ -z "$RUNNER_LOG" ]]; then
   RUNNER_LOG="/tmp/${BUILD_NAME}.log"
 fi
 
+LAUNCH_LOG="/tmp/${BUILD_NAME}.launch.log"
+
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
 watcher_cmd=(
@@ -112,6 +115,25 @@ runner_cmd=(
   --state-root "${STATE_ROOT}"
 )
 
+mkdir -p "$(dirname "${LAUNCH_LOG}")"
+: > "${LAUNCH_LOG}"
+
+if [[ -n "${TEMPLATE_COMMAND}" ]]; then
+  {
+    echo "Running template command:"
+    echo "${TEMPLATE_COMMAND}"
+    echo
+  } >>"${LAUNCH_LOG}"
+
+  if ! (
+    cd "${RUNNER_WORKDIR}" || exit 1
+    bash -lc "${TEMPLATE_COMMAND}"
+  ) >>"${LAUNCH_LOG}" 2>&1; then
+    echo "Template command failed for ${BUILD_NAME}. See ${LAUNCH_LOG}" >&2
+    exit 1
+  fi
+fi
+
 "${watcher_cmd[@]}"
 
 for _ in {1..15}; do
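The template phase added in this hunk can also be factored into a function, which makes the fail-fast behavior easy to exercise in isolation. This is a sketch and not the script's actual structure. `run_template_phase` is a hypothetical name. It uses `bash -c` rather than the script's `bash -lc` so that it does not depend on login-shell profile files. The `cd ... || exit 1` inside the subshell matters because the subshell sits in an `if !` condition, where `set -e` is not applied. Without that guard, a missing work directory would let the template command run in whatever directory the launcher started from.

```bash
# Sketch of the template phase as a function (hypothetical name).
# Writes a header plus all template output to launch_log and returns
# non-zero on failure so the caller can stop before watcher/runner startup.
run_template_phase() {
  local build_name="$1" workdir="$2" template_cmd="$3" launch_log="$4"

  mkdir -p "$(dirname "$launch_log")"
  : > "$launch_log"                      # truncate the log for this launch

  [[ -n "$template_cmd" ]] || return 0   # no template command: nothing to do

  {
    echo "Running template command:"
    echo "$template_cmd"
    echo
  } >>"$launch_log"

  # A failed cd must abort the subshell; set -e is ignored inside an if condition.
  if ! (
    cd "$workdir" || exit 1
    bash -c "$template_cmd"
  ) >>"$launch_log" 2>&1; then
    echo "Template command failed for ${build_name}. See ${launch_log}" >&2
    return 1
  fi
}
```

A caller would run `run_template_phase "$BUILD_NAME" "$RUNNER_WORKDIR" "$TEMPLATE_COMMAND" "$LAUNCH_LOG" || exit 1` before starting anything else.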