fix atvm watcher-backed run launch sequence

Execute the template step before starting watcher-backed ATVM runs.

- run --template-command synchronously in start-atvm-run.sh
- write template output to /tmp/<build>.launch.log
- stop before watcher/runner startup if template generation fails
- document the corrected wrapper behavior in watcher-service docs
- record the stale specPattern failure mode in automation run learnings
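
For reviewers, the template-first gate described above can be sketched in isolation roughly as follows. This is a minimal illustration, not the wrapper itself: the function name `run_template_step` and its optional fourth `log_dir` argument are hypothetical, and it uses `bash -c` rather than the wrapper's `bash -lc` so that the sketch does not depend on login profiles.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the gate; not part of start-atvm-run.sh.
# Usage: run_template_step BUILD_NAME WORKDIR TEMPLATE_COMMAND [LOG_DIR]
run_template_step() {
  local build_name="$1"
  local workdir="$2"
  local template_command="$3"
  local log_dir="${4:-/tmp}"   # assumption: overridable only for this sketch
  local launch_log="${log_dir}/${build_name}.launch.log"

  # Start every launch with an empty log so old output cannot be misread.
  : > "${launch_log}"

  # No template command supplied: nothing to gate on.
  if [[ -z "${template_command}" ]]; then
    return 0
  fi

  printf 'Running template command:\n%s\n\n' "${template_command}" >>"${launch_log}"

  # The subshell keeps the cd local to this step. Chaining with && makes a
  # failed cd count as a template failure instead of running in the wrong dir.
  if ! ( cd "${workdir}" && bash -c "${template_command}" ) >>"${launch_log}" 2>&1; then
    echo "Template command failed for ${build_name}. See ${launch_log}" >&2
    return 1
  fi
}
```

A caller would start the watcher and runner only after the function returns zero, for example `run_template_step "$BUILD_NAME" "$RUNNER_WORKDIR" "$TEMPLATE_COMMAND" || exit 1`.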
2026-04-29 12:14:55 -04:00
parent 2832ea4175
commit 9673d769e2
4 changed files with 46 additions and 9 deletions
--- a/atvm/watcher-service/INSTALL.md
+++ b/atvm/watcher-service/INSTALL.md
@@ -183,11 +183,13 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help

 Once installed, the intended workflow is:

-1. Start the watcher for that build name.
+1. Run the approved `cmc-templates.py` command for that build name.
+   - when using `start-atvm-run.sh`, the wrapper should execute `--template-command` synchronously and stop immediately if that step fails
+2. Start the watcher for that build name.
   - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
-2. Start the runner service for that build name.
-3. Let the runner and watcher run on the controller.
-4. The watcher exits on terminal state.
+3. Start the runner service for that build name.
+4. Let the runner and watcher run on the controller.
+5. The watcher exits on terminal state.

 Example:

--- a/atvm/watcher-service/README.md
+++ b/atvm/watcher-service/README.md
@@ -57,11 +57,12 @@ Each watcher instance is tied to one requested build name.

 Typical workflow:

-1. Start the watcher for that run.
-2. Start the runner service for that run.
-3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
+1. Run the approved `cmc-templates.py` command for that run when one is provided.
+2. Start the watcher for that run.
+3. Start the runner service for that run.
+4. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
   - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
-4. For non-categorized runs, when the run reaches a terminal state:
+5. For non-categorized runs, when the run reaches a terminal state:
   - `COMPLETED` or `FAILED`
     - build the final ATVM status
     - send the status to Mattermost
@@ -72,7 +73,7 @@ Typical workflow:
     - do not post
     - mark the final state
     - exit
-5. For categorized runs:
+6. For categorized runs:
  - detect each grouped sub-run in sequence from the parent run log
  - wait for that grouped sub-run to finish
  - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
@@ -154,6 +155,9 @@ That results in:

 The helper also:

+- runs `--template-command` synchronously first when one is provided
+- writes the template phase output to `/tmp/<build-name>.launch.log`
+- exits before watcher/runner startup if the template step fails
 - stops any stale watcher instance for that same requested build name
 - removes the old watcher state directory for that requested build name
 - starts the new watcher with a clean state root for the new run
--- a/atvm/watcher-service/start-atvm-run.sh
+++ b/atvm/watcher-service/start-atvm-run.sh
@@ -37,6 +37,7 @@ SCOPE_DESCRIPTION=""
 WATCHER_CATEGORIZED="false"
 RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
 RUNNER_LOG=""
+LAUNCH_LOG=""
 STATE_ROOT="/var/lib/atvm-run-watcher"

 while [[ $# -gt 0 ]]; do
@@ -76,6 +77,8 @@ if [[ -z "$RUNNER_LOG" ]]; then
  RUNNER_LOG="/tmp/${BUILD_NAME}.log"
 fi

+LAUNCH_LOG="/tmp/${BUILD_NAME}.launch.log"
+
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

 watcher_cmd=(
@@ -112,6 +115,25 @@ runner_cmd=(
  --state-root "${STATE_ROOT}"
 )

+mkdir -p "$(dirname "${LAUNCH_LOG}")"
+: > "${LAUNCH_LOG}"
+
+if [[ -n "${TEMPLATE_COMMAND}" ]]; then
+  {
+    echo "Running template command:"
+    echo "${TEMPLATE_COMMAND}"
+    echo
+  } >>"${LAUNCH_LOG}"
+
+  if ! (
+    cd "${RUNNER_WORKDIR}" || exit 1
+    bash -lc "${TEMPLATE_COMMAND}"
+  ) >>"${LAUNCH_LOG}" 2>&1; then
+    echo "Template command failed for ${BUILD_NAME}. See ${LAUNCH_LOG}" >&2
+    exit 1
+  fi
+fi
+
 "${watcher_cmd[@]}"

 for _ in {1..15}; do