fix atvm watcher-backed run launch sequence

Execute the template step before starting watcher-backed ATVM runs.

- run --template-command synchronously in start-atvm-run.sh
- write template output to /tmp/<build>.launch.log
- stop before watcher/runner startup if template generation fails
- document the corrected wrapper behavior in watcher-service docs
- record the stale specPattern failure mode in automation run learnings
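
The intended ordering can be sketched roughly as below. This is a minimal illustration, not the actual wrapper: `run_template_step` and `launch_run` are hypothetical names, the watcher and runner starts are stubbed with `echo`, and `bash -c` is used in place of the wrapper's `bash -lc` to keep the sketch free of login-profile side effects.

```shell
# Hypothetical sketch of a template-first launch order (names are illustrative).

# run_template_step CMD LOG
# Truncates LOG, then runs CMD synchronously with stdout and stderr appended
# to LOG. Returns 0 when CMD is empty, otherwise CMD's exit status.
run_template_step() {
  local template_command="$1" launch_log="$2"
  : > "${launch_log}"
  if [[ -z "${template_command}" ]]; then
    return 0
  fi
  printf 'Running template command:\n%s\n\n' "${template_command}" >>"${launch_log}"
  bash -c "${template_command}" >>"${launch_log}" 2>&1
}

# launch_run CMD LOG
# Runs the template step first and returns 1 before any watcher or runner
# startup if that step fails.
launch_run() {
  local template_command="$1" launch_log="$2"
  if ! run_template_step "${template_command}" "${launch_log}"; then
    echo "Template command failed. See ${launch_log}" >&2
    return 1
  fi
  echo "starting watcher"  # placeholder for the real watcher start
  echo "starting runner"   # placeholder for the real runner start
}
```

With this shape, a failing template command leaves its output in the launch log and nothing after it is started, which is the behavior the bullets above describe.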
2026-04-29 12:14:55 -04:00
parent 2832ea4175
commit 9673d769e2
4 changed files with 46 additions and 9 deletions


@@ -183,11 +183,13 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
 Once installed, the intended workflow is:
-1. Start the watcher for that build name.
+1. Run the approved `cmc-templates.py` command for that build name.
+   - when using `start-atvm-run.sh`, the wrapper should execute `--template-command` synchronously and stop immediately if that step fails
+2. Start the watcher for that build name.
    - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
-2. Start the runner service for that build name.
-3. Let the runner and watcher run on the controller.
-4. The watcher exits on terminal state.
+3. Start the runner service for that build name.
+4. Let the runner and watcher run on the controller.
+5. The watcher exits on terminal state.
 Example:


@@ -57,11 +57,12 @@ Each watcher instance is tied to one requested build name.
 Typical workflow:
-1. Start the watcher for that run.
-2. Start the runner service for that run.
-3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
+1. Run the approved `cmc-templates.py` command for that run when one is provided.
+2. Start the watcher for that run.
+3. Start the runner service for that run.
+4. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
    - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
-4. For non-categorized runs, when the run reaches a terminal state:
+5. For non-categorized runs, when the run reaches a terminal state:
    - `COMPLETED` or `FAILED`
    - build the final ATVM status
    - send the status to Mattermost
@@ -72,7 +73,7 @@ Typical workflow:
    - do not post
    - mark the final state
    - exit
-5. For categorized runs:
+6. For categorized runs:
    - detect each grouped sub-run in sequence from the parent run log
    - wait for that grouped sub-run to finish
    - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
@@ -154,6 +155,9 @@ That results in:
 The helper also:
+- runs `--template-command` synchronously first when one is provided
+- writes the template phase output to `/tmp/<build-name>.launch.log`
+- exits before watcher/runner startup if the template step fails
 - stops any stale watcher instance for that same requested build name
 - removes the old watcher state directory for that requested build name
 - starts the new watcher with a clean state root for the new run


@@ -37,6 +37,7 @@ SCOPE_DESCRIPTION=""
 WATCHER_CATEGORIZED="false"
 RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
 RUNNER_LOG=""
+LAUNCH_LOG=""
 STATE_ROOT="/var/lib/atvm-run-watcher"
 while [[ $# -gt 0 ]]; do
@@ -76,6 +77,8 @@ if [[ -z "$RUNNER_LOG" ]]; then
   RUNNER_LOG="/tmp/${BUILD_NAME}.log"
 fi
+LAUNCH_LOG="/tmp/${BUILD_NAME}.launch.log"
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 watcher_cmd=(
@@ -112,6 +115,25 @@ runner_cmd=(
   --state-root "${STATE_ROOT}"
 )
+mkdir -p "$(dirname "${LAUNCH_LOG}")"
+: > "${LAUNCH_LOG}"
+if [[ -n "${TEMPLATE_COMMAND}" ]]; then
+  {
+    echo "Running template command:"
+    echo "${TEMPLATE_COMMAND}"
+    echo
+  } >>"${LAUNCH_LOG}"
+  if ! (
+    cd "${RUNNER_WORKDIR}"
+    bash -lc "${TEMPLATE_COMMAND}"
+  ) >>"${LAUNCH_LOG}" 2>&1; then
+    echo "Template command failed for ${BUILD_NAME}. See ${LAUNCH_LOG}" >&2
+    exit 1
+  fi
+fi
 "${watcher_cmd[@]}"
 for _ in {1..15}; do