Add ATVM systemd runner service

2026-04-14 09:49:49 -04:00
parent a22ff8edf1
commit 7cdcbf8cf1
8 changed files with 220 additions and 14 deletions

@@ -1,13 +1,14 @@
# ATVM Watcher Service Install Plan
This document describes how to deploy the ATVM per-run watcher service to the ATVM Cypress controller at `192.168.3.190`.
This document describes how to deploy the ATVM per-run watcher and runner services to the ATVM Cypress controller at `192.168.3.190`.
This is a deployment plan only. It does not perform the installation.
## Goal
Install the local watcher package so the controller can:
Install the local watcher/runner package so the controller can:
- start one requested ATVM Cypress runner per service instance
- watch one requested ATVM run per watcher instance
- for non-categorized runs, send one final Mattermost status only for `COMPLETED` or `FAILED`
- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
@@ -20,6 +21,8 @@ Recommended controller paths:
- package root:
- `/opt/atvm-watcher-service`
- runner service unit:
- `/etc/systemd/system/atvm-runner@.service`
- watcher service unit:
- `/etc/systemd/system/atvm-run-watcher@.service`
- global environment file:
@@ -40,6 +43,10 @@ Best-practice rule:
From the local workspace:
- `/home/aw/code/cds/atvm/watcher-service/atvm_run_watcher.py`
- `/home/aw/code/cds/atvm/watcher-service/run-atvm-runner.sh`
- `/home/aw/code/cds/atvm/watcher-service/atvm-runner@.service`
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-runner.sh`
- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-runner.sh`
- `/home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service`
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh`
- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh`
@@ -84,12 +91,18 @@ Recommended permissions:
- `/var/lib/atvm-run-watcher`
2. Copy package files to the controller.
- copy the runner wrapper
- copy the runner `systemd` unit file
- copy the runner helper scripts
- copy the Python watcher
- copy the watcher `systemd` unit file
- copy the helper scripts
- copy `vm-inventory.md`
3. Set executable permissions.
- `run-atvm-runner.sh`
- `start-atvm-runner.sh`
- `cancel-atvm-runner.sh`
- `atvm_run_watcher.py`
- `start-atvm-run-watcher.sh`
- `cancel-atvm-run-watcher.sh`
@@ -99,6 +112,7 @@ Recommended permissions:
- keep permissions restricted
5. Install the `systemd` unit files.
- copy the runner unit to `/etc/systemd/system/atvm-runner@.service`
- copy the watcher unit to `/etc/systemd/system/atvm-run-watcher@.service`
6. Reload `systemd`.
@@ -132,6 +146,9 @@ mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
```
```bash
chmod 755 /opt/atvm-watcher-service/run-atvm-runner.sh
chmod 755 /opt/atvm-watcher-service/start-atvm-runner.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-runner.sh
chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
@@ -139,6 +156,7 @@ chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
```bash
systemctl daemon-reload
systemctl cat atvm-runner@.service
systemctl cat atvm-run-watcher@.service
```
@@ -146,6 +164,10 @@ systemctl cat atvm-run-watcher@.service
python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
```
```bash
/opt/atvm-watcher-service/start-atvm-runner.sh --help
```
```bash
/opt/atvm-watcher-service/start-atvm-run-watcher.sh --help
```
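After the permission step, a quick check can confirm that every helper is present and executable before the first run. The following is a minimal sketch and not part of the package: the function name `check_atvm_helpers` is hypothetical, and the file list is copied from the permission step in this plan.

```bash
#!/usr/bin/env bash
# Hypothetical post-install check; not part of the watcher package.
# Prints each missing or non-executable helper to stderr and returns
# non-zero if any helper fails the check.
check_atvm_helpers() {
  local root="${1:-/opt/atvm-watcher-service}"
  local rc=0 f
  for f in run-atvm-runner.sh start-atvm-runner.sh cancel-atvm-runner.sh \
           atvm_run_watcher.py start-atvm-run-watcher.sh cancel-atvm-run-watcher.sh; do
    if [ ! -x "${root}/${f}" ]; then
      echo "not executable or missing: ${root}/${f}" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_atvm_helpers /opt/atvm-watcher-service && echo "helpers OK"` prints the confirmation only when all six files pass.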
@@ -154,10 +176,10 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
Once installed, the intended workflow is:
1. Launch the ATVM run as usual.
2. Start the watcher for that build name.
1. Start the watcher for that build name.
- the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
3. Let the watcher run on the controller.
2. Start the runner service for that build name.
3. Let the runner and watcher run on the controller.
4. The watcher exits on terminal state.
Example:
@@ -173,10 +195,19 @@ Example:
--integration-plugin "pure with fc" \
--categorize \
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
/opt/atvm-watcher-service/start-atvm-runner.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
```
Cancel example:
```bash
/opt/atvm-watcher-service/cancel-atvm-runner.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```
```bash
/opt/atvm-watcher-service/cancel-atvm-run-watcher.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
@@ -192,7 +223,9 @@ The cancel helper should:
## Operational Notes
- Neither service is a long-running daemon; each instance exists only for the duration of one run.
- One runner instance is started per ATVM run.
- One watcher instance is started per ATVM run.
- Prefer the `atvm-runner@...` service over detached SSH background launch patterns for `run-sorry-cypress.py`.
- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
- In categorized execution, the watcher must remain alive until the parent request has actually gone inactive past the grace window, even if one grouped sub-run already completed.
- The watcher exits after the run reaches a terminal state.
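Because each run gets one runner instance and one watcher instance, both named after the build, the state of a run can be inspected from the build name alone. The sketch below is illustrative only: `atvm_units_for` and `atvm_run_status` are hypothetical helper names, and the instance names follow the template units described in this plan.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting per-run instances; not part of the package.

# Print the two template instance names for a build name.
atvm_units_for() {
  local build_name="$1"
  printf 'atvm-run-watcher@%s.service\n' "$build_name"
  printf 'atvm-runner@%s.service\n' "$build_name"
}

# Report the state of each instance. `systemctl is-active` prints the state
# (for example "active" or "inactive") and exits non-zero when not active.
# Assumes the build name contains no whitespace.
atvm_run_status() {
  local unit state
  for unit in $(atvm_units_for "$1"); do
    state="$(systemctl is-active "$unit" 2>/dev/null || true)"
    printf '%s: %s\n' "$unit" "${state:-unknown}"
  done
}
```

Calling `atvm_run_status e2e-redhat9.6-ubuntu24.04-w2k25-fc` on the controller prints one line per instance.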

@@ -19,10 +19,18 @@ The watcher does not run indefinitely. It is designed for one run per service in
## Files
- `atvm-runner@.service`
- `systemd` template unit for one runner instance per build name
- `atvm_run_watcher.py`
- main watcher implementation
- `atvm-run-watcher@.service`
- `systemd` template unit for one watcher instance per build name
- `run-atvm-runner.sh`
- runner wrapper used by the `systemd` runner unit
- `start-atvm-runner.sh`
- helper to write per-run runner environment data and start a runner instance
- `cancel-atvm-runner.sh`
- helper to stop a runner instance
- `start-atvm-run-watcher.sh`
- helper to write per-run environment data and start a watcher instance
- `cancel-atvm-run-watcher.sh`
@@ -33,6 +41,7 @@ The watcher does not run indefinitely. It is designed for one run per service in
These are the default install targets assumed by the included unit files:
- service package root: `/opt/atvm-watcher-service`
- runner unit: `/etc/systemd/system/atvm-runner@.service`
- watcher state root: `/var/lib/atvm-run-watcher`
- controller ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`
- watcher environment file: `/etc/atvm-run-watcher.env`
@@ -46,9 +55,9 @@ Each watcher instance is tied to one requested build name.
Typical workflow:
1. Launch the ATVM run.
2. Start the watcher for that run.
3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
1. Start the watcher for that run.
2. Start the runner service for that run.
3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
- before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
4. For non-categorized runs, when the run reaches a terminal state:
- `COMPLETED` or `FAILED`
@@ -88,9 +97,18 @@ Optional metadata for better status formatting:
- `ATVM_WATCHER_SCOPE_DESCRIPTION`
- `ATVM_WATCHER_CATEGORIZED`
Runner environment required per run:
- `ATVM_RUNNER_COMMAND`
Runner environment optional per run:
- `ATVM_RUNNER_WORKDIR`
- `ATVM_RUNNER_LOG`
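The optional runner variables fall back to defaults when unset. The sketch below mirrors the fallback logic in `run-atvm-runner.sh` so the effective values can be previewed without starting a run; the function name `resolve_runner_env` is hypothetical and the function only prints values.

```bash
#!/usr/bin/env bash
# Sketch of how the runner wrapper resolves its settings for a build name.
# resolve_runner_env is a hypothetical helper used only for illustration.
resolve_runner_env() {
  local build_name="$1"
  # ATVM_RUNNER_COMMAND has no default; the wrapper refuses to start without it.
  if [ -z "${ATVM_RUNNER_COMMAND:-}" ]; then
    echo "ATVM_RUNNER_COMMAND is required" >&2
    return 1
  fi
  printf 'command=%s\n' "$ATVM_RUNNER_COMMAND"
  printf 'workdir=%s\n' "${ATVM_RUNNER_WORKDIR:-/root/cdc-e2e-cyp-12.17.4}"
  printf 'log=%s\n' "${ATVM_RUNNER_LOG:-/tmp/${build_name}.log}"
}
```

With only `ATVM_RUNNER_COMMAND` set, the working directory resolves to the controller ATVM automation root and the log resolves to `/tmp/<build-name>.log`.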
## Start Example
This helper writes a per-run environment file and starts the matching instance:
These helpers write per-run environment files and start the matching instances:
```bash
./start-atvm-run-watcher.sh \
@@ -103,6 +121,10 @@ This helper writes a per-run environment file and starts the matching instance:
--integration-plugin "pure with fc" \
--categorize \
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
./start-atvm-runner.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
```
That results in:
@@ -111,6 +133,7 @@ That results in:
- `/var/lib/atvm-run-watcher/e2e-redhat9.6-ubuntu24.04-w2k25-fc`
- service instance:
- `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
- `atvm-runner@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
The helper also:
@@ -126,9 +149,16 @@ The helper also:
This writes a cancellation marker, updates `state.json` to `CANCELLED`, and stops the watcher instance. The watcher will not send Mattermost results for that run.
Runner cancel example:
```bash
./cancel-atvm-runner.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```
## Notes
- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
- Prefer the controller-local `atvm-runner@...` service over ad hoc `nohup` or detached SSH launch patterns for `run-sorry-cypress.py`.
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
- Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
- In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
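Since the runner service writes its output to a log file rather than to a terminal, it can help to resolve the log path for a build name before tailing it. The following is a minimal sketch, assuming `run.env` contains shell-quoted `KEY=value` lines as written by `start-atvm-runner.sh`; the function name `runner_log_for` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of the watcher package.
# Prints the runner log path for a build name, preferring the value recorded
# in the per-run run.env and falling back to the wrapper's /tmp default.
# The body runs in a subshell so sourced variables do not leak to the caller.
runner_log_for() (
  build_name="$1"
  state_root="${2:-/var/lib/atvm-run-watcher}"
  env_file="${state_root}/${build_name}/run.env"
  ATVM_RUNNER_LOG=""
  if [ -r "$env_file" ]; then
    # Assumption: run.env holds shell-quoted KEY=value lines, so the shell
    # can read it back directly.
    . "$env_file"
  fi
  printf '%s\n' "${ATVM_RUNNER_LOG:-/tmp/${build_name}.log}"
)
```

A typical use would be `tail -f "$(runner_log_for e2e-redhat9.6-ubuntu24.04-w2k25-fc)"` on the controller.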

@@ -0,0 +1,14 @@
[Unit]
Description=ATVM Cypress runner for %i
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
WorkingDirectory=/opt/atvm-watcher-service
EnvironmentFile=-/var/lib/atvm-run-watcher/%i/run.env
ExecStart=/opt/atvm-watcher-service/run-atvm-runner.sh %i
Restart=no
[Install]
WantedBy=multi-user.target

@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'EOF'
Usage:
cancel-atvm-runner.sh --build-name <name>
EOF
}
BUILD_NAME=""
while [[ $# -gt 0 ]]; do
case "$1" in
--build-name) BUILD_NAME="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
esac
done
if [[ -z "$BUILD_NAME" ]]; then
echo "--build-name is required" >&2
usage >&2
exit 1
fi
systemctl stop "atvm-runner@${BUILD_NAME}.service" || true

@@ -0,0 +1,37 @@
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'EOF'
Usage:
run-atvm-runner.sh <build-name>
This script is intended to be launched by systemd for one ATVM run.
It expects environment variables from the runner unit/environment files:
ATVM_RUNNER_COMMAND
ATVM_RUNNER_WORKDIR
ATVM_RUNNER_LOG
EOF
}
BUILD_NAME="${1:-}"
if [[ -z "$BUILD_NAME" ]]; then
echo "build name is required" >&2
usage >&2
exit 1
fi
RUNNER_COMMAND="${ATVM_RUNNER_COMMAND:-}"
RUNNER_WORKDIR="${ATVM_RUNNER_WORKDIR:-/root/cdc-e2e-cyp-12.17.4}"
RUNNER_LOG="${ATVM_RUNNER_LOG:-/tmp/${BUILD_NAME}.log}"
if [[ -z "$RUNNER_COMMAND" ]]; then
echo "ATVM_RUNNER_COMMAND is required" >&2
exit 1
fi
mkdir -p "$(dirname "$RUNNER_LOG")"
: > "$RUNNER_LOG"
cd "$RUNNER_WORKDIR"
exec bash -lc "$RUNNER_COMMAND" >>"$RUNNER_LOG" 2>&1

@@ -0,0 +1,63 @@
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'EOF'
Usage:
start-atvm-runner.sh --build-name <name> --runner-command <text> [options]
Options:
--build-name <name>
--runner-command <text>
--workdir <path> Default: /root/cdc-e2e-cyp-12.17.4
--log-path <path> Default: /tmp/<build-name>.log
--state-root <path> Default: /var/lib/atvm-run-watcher (the runner unit's EnvironmentFile path assumes this default)
EOF
}
BUILD_NAME=""
RUNNER_COMMAND=""
RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
RUNNER_LOG=""
STATE_ROOT="/var/lib/atvm-run-watcher"
while [[ $# -gt 0 ]]; do
case "$1" in
--build-name) BUILD_NAME="${2:-}"; shift 2 ;;
--runner-command) RUNNER_COMMAND="${2:-}"; shift 2 ;;
--workdir) RUNNER_WORKDIR="${2:-}"; shift 2 ;;
--log-path) RUNNER_LOG="${2:-}"; shift 2 ;;
--state-root) STATE_ROOT="${2:-}"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
esac
done
if [[ -z "$BUILD_NAME" ]]; then
echo "--build-name is required" >&2
usage >&2
exit 1
fi
if [[ -z "$RUNNER_COMMAND" ]]; then
echo "--runner-command is required" >&2
usage >&2
exit 1
fi
if [[ -z "$RUNNER_LOG" ]]; then
RUNNER_LOG="/tmp/${BUILD_NAME}.log"
fi
RUN_DIR="${STATE_ROOT}/${BUILD_NAME}"
mkdir -p "$RUN_DIR"
cat >"${RUN_DIR}/run.env" <<EOF
ATVM_RUNNER_COMMAND=${RUNNER_COMMAND@Q}
ATVM_RUNNER_WORKDIR=${RUNNER_WORKDIR@Q}
ATVM_RUNNER_LOG=${RUNNER_LOG@Q}
EOF
systemctl stop "atvm-runner@${BUILD_NAME}.service" >/dev/null 2>&1 || true
systemctl start "atvm-runner@${BUILD_NAME}.service"
systemctl status --no-pager "atvm-runner@${BUILD_NAME}.service" || true