fix atvm watcher-backed run launch sequence

Execute the template step before starting watcher-backed ATVM runs.

- run --template-command synchronously in start-atvm-run.sh
- write template output to /tmp/<build>.launch.log
- stop before watcher/runner startup if template generation fails
- document the corrected wrapper behavior in watcher-service docs
- record the stale specPattern failure mode in automation run learnings
@@ -9,6 +9,15 @@ This file stores run-specific examples only when a run produced a new learning r
 ## Current State
 - No run-learning entries recorded yet from `guide.md` source material.
 
+## Run Learning: 2026-04-29 (Combined watcher wrapper must execute template generation before runner startup)
+- Observed failure mode:
+  - A watcher-backed `start-atvm-run.sh` launch for `cmc-migrateops-compute-migration` started `run-sorry-cypress.py` without ever running the approved `cmc-templates.py` command.
+  - The wrapper passed `--template-command` into watcher metadata only, so the runner consumed stale controller config state and started against a previous `specPattern` pointing at `atvm121-ubuntu24.04`.
+- Action for future runs:
+  - The combined watcher wrapper must execute `--template-command` synchronously before watcher and runner startup.
+  - Write the template phase output to `/tmp/<build>.launch.log` so template activity is preserved separately from the live runner log.
+  - If the template step fails, stop immediately and do not start the watcher or the runner.
+
 ## Run Learning: 2026-04-24 (Categorized watcher false-PASS guardrail)
 - Observed failure mode:
   - A categorized compute-migration run was incorrectly reported as `PASS` for `atvm121-ubuntu24.04` even though the actual Ubuntu grouped sub-run failed.
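The 2026-04-29 learning describes a runner starting against a stale `specPattern`. One possible extra guard, sketched here under assumptions, is to check the generated config before runner startup. The function name `assert_spec_pattern`, the idea of a single config file, and the plain-text match are all hypothetical; the commit itself only adds the synchronous template step.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of start-atvm-run.sh.
# Succeeds only if some line containing "specPattern" in the given
# config file also contains the expected target string.
assert_spec_pattern() {
  local config_file="$1" expected="$2"

  # First grep selects specPattern lines; second grep looks for the target.
  # A missing file makes the pipeline fail, which is reported as stale.
  if ! grep -F "specPattern" "${config_file}" 2>/dev/null | grep -qF -- "${expected}"; then
    echo "specPattern in ${config_file} does not reference ${expected}" >&2
    return 1
  fi
}
```

A wrapper could call this after the template step and before starting the watcher, and exit non-zero on failure, in the same way the template step itself fails fast.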
@@ -183,11 +183,13 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
 
 Once installed, the intended workflow is:
 
-1. Start the watcher for that build name.
+1. Run the approved `cmc-templates.py` command for that build name.
+   - when using `start-atvm-run.sh`, the wrapper should execute `--template-command` synchronously and stop immediately if that step fails
+2. Start the watcher for that build name.
    - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
-2. Start the runner service for that build name.
-3. Let the runner and watcher run on the controller.
-4. The watcher exits on terminal state.
+3. Start the runner service for that build name.
+4. Let the runner and watcher run on the controller.
+5. The watcher exits on terminal state.
 
 Example:
 
@@ -57,11 +57,12 @@ Each watcher instance is tied to one requested build name.
 
 Typical workflow:
 
-1. Start the watcher for that run.
-2. Start the runner service for that run.
-3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
+1. Run the approved `cmc-templates.py` command for that run when one is provided.
+2. Start the watcher for that run.
+3. Start the runner service for that run.
+4. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
    - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
-4. For non-categorized runs, when the run reaches a terminal state:
+5. For non-categorized runs, when the run reaches a terminal state:
    - `COMPLETED` or `FAILED`
    - build the final ATVM status
    - send the status to Mattermost
@@ -72,7 +73,7 @@ Typical workflow:
    - do not post
    - mark the final state
    - exit
-5. For categorized runs:
+6. For categorized runs:
    - detect each grouped sub-run in sequence from the parent run log
    - wait for that grouped sub-run to finish
    - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
@@ -154,6 +155,9 @@ That results in:
 
 The helper also:
 
+- runs `--template-command` synchronously first when one is provided
+- writes the template phase output to `/tmp/<build-name>.launch.log`
+- exits before watcher/runner startup if the template step fails
 - stops any stale watcher instance for that same requested build name
 - removes the old watcher state directory for that requested build name
 - starts the new watcher with a clean state root for the new run
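The helper's stale-state cleanup can be illustrated with a short sketch. It assumes one state directory per build name directly under the state root; that layout and the function name `reset_watcher_state` are hypothetical and are not taken from the helper's source. Stopping a stale watcher process is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of clearing per-build watcher state.
# Assumes state lives in "<state_root>/<build_name>"; the real layout may differ.
reset_watcher_state() {
  local state_root="$1" build_name="$2"

  # Refuse an empty build name so the whole state root is never removed.
  if [[ -z "${build_name}" ]]; then
    echo "build name required" >&2
    return 1
  fi

  local state_dir="${state_root}/${build_name}"

  # Drop old markers (for example cancellation or posted flags) and
  # recreate an empty directory for the new run.
  rm -rf -- "${state_dir}"
  mkdir -p -- "${state_dir}"
}
```

Scoping the removal to the requested build name keeps state for other builds under the same state root untouched.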
@@ -37,6 +37,7 @@ SCOPE_DESCRIPTION=""
 WATCHER_CATEGORIZED="false"
 RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
 RUNNER_LOG=""
+LAUNCH_LOG=""
 STATE_ROOT="/var/lib/atvm-run-watcher"
 
 while [[ $# -gt 0 ]]; do
@@ -76,6 +77,8 @@ if [[ -z "$RUNNER_LOG" ]]; then
   RUNNER_LOG="/tmp/${BUILD_NAME}.log"
 fi
 
+LAUNCH_LOG="/tmp/${BUILD_NAME}.launch.log"
+
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
 watcher_cmd=(
@@ -112,6 +115,25 @@ runner_cmd=(
   --state-root "${STATE_ROOT}"
 )
 
+mkdir -p "$(dirname "${LAUNCH_LOG}")"
+: > "${LAUNCH_LOG}"
+
+if [[ -n "${TEMPLATE_COMMAND}" ]]; then
+  {
+    echo "Running template command:"
+    echo "${TEMPLATE_COMMAND}"
+    echo
+  } >>"${LAUNCH_LOG}"
+
+  if ! (
+    cd "${RUNNER_WORKDIR}"
+    bash -lc "${TEMPLATE_COMMAND}"
+  ) >>"${LAUNCH_LOG}" 2>&1; then
+    echo "Template command failed for ${BUILD_NAME}. See ${LAUNCH_LOG}" >&2
+    exit 1
+  fi
+fi
+
 "${watcher_cmd[@]}"
 
 for _ in {1..15}; do
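The template step in this commit runs `cd` and `bash -lc` as two separate statements in a subshell that is tested by `if !`, so a failed `cd` does not by itself prevent the template command from running in the wrong directory. A possible variant is sketched below as a standalone function; the name `run_template_step` is hypothetical, and the sketch uses `bash -c` rather than the wrapper's `bash -lc` so that it does not depend on login profiles.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the template step, not the committed code.
# Runs the template command synchronously in the given workdir, appends
# its output to the launch log, and returns non-zero on any failure.
run_template_step() {
  local workdir="$1" launch_log="$2" template_command="$3"

  # Nothing to do when no template command was supplied.
  [[ -n "${template_command}" ]] || return 0

  mkdir -p "$(dirname "${launch_log}")"
  {
    echo "Running template command:"
    echo "${template_command}"
    echo
  } >>"${launch_log}"

  # "cd ... &&" makes a missing workdir count as a template failure
  # instead of running the command from the current directory.
  (
    cd "${workdir}" && bash -c "${template_command}"
  ) >>"${launch_log}" 2>&1
}
```

A caller could then write `run_template_step "${RUNNER_WORKDIR}" "${LAUNCH_LOG}" "${TEMPLATE_COMMAND}" || exit 1` before starting the watcher and the runner.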