Add ATVM systemd runner service
@@ -1,13 +1,14 @@
 # ATVM Watcher Service Install Plan
 
-This document describes how to deploy the ATVM per-run watcher service to the ATVM Cypress controller at `192.168.3.190`.
+This document describes how to deploy the ATVM per-run watcher and runner services to the ATVM Cypress controller at `192.168.3.190`.
 
 This is a deployment plan only. It does not perform the installation.
 
 ## Goal
 
-Install the local watcher package so the controller can:
+Install the local watcher/runner package so the controller can:
 
+- start one requested ATVM Cypress runner per service instance
 - watch one requested ATVM run per watcher instance
 - for non-categorized runs, send one final Mattermost status only for `COMPLETED` or `FAILED`
 - for categorized runs, send one final Mattermost status per completed categorized sub-run/group
@@ -20,6 +21,8 @@ Recommended controller paths:
 
 - package root:
   - `/opt/atvm-watcher-service`
+- runner service unit:
+  - `/etc/systemd/system/atvm-runner@.service`
 - service unit:
   - `/etc/systemd/system/atvm-run-watcher@.service`
 - global environment file:
@@ -40,6 +43,10 @@ Best-practice rule:
 From the local workspace:
 
 - `/home/aw/code/cds/atvm/watcher-service/atvm_run_watcher.py`
+- `/home/aw/code/cds/atvm/watcher-service/run-atvm-runner.sh`
+- `/home/aw/code/cds/atvm/watcher-service/atvm-runner@.service`
+- `/home/aw/code/cds/atvm/watcher-service/start-atvm-runner.sh`
+- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-runner.sh`
 - `/home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service`
 - `/home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh`
 - `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh`
@@ -84,12 +91,18 @@ Recommended permissions:
    - `/var/lib/atvm-run-watcher`
 
 2. Copy package files to the controller.
+   - copy the runner wrapper
+   - copy the runner `systemd` unit file
+   - copy the runner helper scripts
    - copy the Python watcher
    - copy the `systemd` unit file
    - copy the helper scripts
    - copy `vm-inventory.md`
 
 3. Set executable permissions.
+   - `run-atvm-runner.sh`
+   - `start-atvm-runner.sh`
+   - `cancel-atvm-runner.sh`
    - `atvm_run_watcher.py`
    - `start-atvm-run-watcher.sh`
    - `cancel-atvm-run-watcher.sh`
@@ -99,6 +112,7 @@ Recommended permissions:
    - keep permissions restricted
 
 5. Install the `systemd` unit file.
+   - copy the runner unit to `/etc/systemd/system/atvm-runner@.service`
    - copy to `/etc/systemd/system/atvm-run-watcher@.service`
 
 6. Reload `systemd`.
@@ -132,6 +146,9 @@ mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
 ```
 
 ```bash
+chmod 755 /opt/atvm-watcher-service/run-atvm-runner.sh
+chmod 755 /opt/atvm-watcher-service/start-atvm-runner.sh
+chmod 755 /opt/atvm-watcher-service/cancel-atvm-runner.sh
 chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
 chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
 chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
@@ -139,6 +156,7 @@ chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
 
 ```bash
 systemctl daemon-reload
+systemctl cat atvm-runner@.service
 systemctl cat atvm-run-watcher@.service
 ```
 
@@ -146,6 +164,10 @@ systemctl cat atvm-run-watcher@.service
 python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
 ```
 
+```bash
+/opt/atvm-watcher-service/start-atvm-runner.sh --help
+```
+
 ```bash
 /opt/atvm-watcher-service/start-atvm-run-watcher.sh --help
 ```
@@ -154,10 +176,10 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
 
 Once installed, the intended workflow is:
 
-1. Launch the ATVM run as usual.
-2. Start the watcher for that build name.
+1. Start the watcher for that build name.
    - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
-3. Let the watcher run on the controller.
+2. Start the runner service for that build name.
+3. Let the runner and watcher run on the controller.
 4. The watcher exits on terminal state.
 
 Example:
@@ -173,10 +195,19 @@ Example:
   --integration-plugin "pure with fc" \
   --categorize \
   --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
+
+/opt/atvm-watcher-service/start-atvm-runner.sh \
+  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
+  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
 ```
 
 Cancel example:
 
 ```bash
+/opt/atvm-watcher-service/cancel-atvm-runner.sh \
+  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
+```
+
+```bash
 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh \
   --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
@@ -192,7 +223,9 @@ The cancel helper should:
 ## Operational Notes
 
 - This is not a daemon.
+- One runner instance is started per ATVM run.
 - One watcher instance is started per ATVM run.
+- Prefer the `atvm-runner@...` service over detached SSH background launch patterns for `run-sorry-cypress.py`.
 - Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
 - In categorized execution, the watcher must remain alive until the parent request has actually gone inactive past the grace window, even if one grouped sub-run already completed.
 - The watcher exits after the run reaches a terminal state.
@@ -19,10 +19,18 @@ The watcher does not run indefinitely. It is designed for one run per service in
 
 ## Files
 
+- `atvm-runner@.service`
+  - `systemd` template unit for one runner instance per build name
 - `atvm_run_watcher.py`
   - main watcher implementation
 - `atvm-run-watcher@.service`
   - `systemd` template unit for one watcher instance per build name
+- `run-atvm-runner.sh`
+  - runner wrapper used by the `systemd` runner unit
+- `start-atvm-runner.sh`
+  - helper to write per-run runner environment data and start a runner instance
+- `cancel-atvm-runner.sh`
+  - helper to stop a runner instance
 - `start-atvm-run-watcher.sh`
   - helper to write per-run environment data and start a watcher instance
 - `cancel-atvm-run-watcher.sh`
@@ -33,6 +41,7 @@ The watcher does not run indefinitely. It is designed for one run per service in
 These are the default install targets assumed by the included unit file:
 
 - service package root: `/opt/atvm-watcher-service`
+- runner unit: `/etc/systemd/system/atvm-runner@.service`
 - watcher state root: `/var/lib/atvm-run-watcher`
 - controller ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`
 - watcher environment file: `/etc/atvm-run-watcher.env`
@@ -46,9 +55,9 @@ Each watcher instance is tied to one requested build name.
 
 Typical workflow:
 
-1. Launch the ATVM run.
-2. Start the watcher for that run.
-3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
+1. Start the watcher for that run.
+2. Start the runner service for that run.
+3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
    - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
 4. For non-categorized runs, when the run reaches a terminal state:
    - `COMPLETED` or `FAILED`
@@ -88,9 +97,18 @@ Optional metadata for better status formatting:
 - `ATVM_WATCHER_SCOPE_DESCRIPTION`
 - `ATVM_WATCHER_CATEGORIZED`
 
+Runner environment required per run:
+
+- `ATVM_RUNNER_COMMAND`
+
+Runner environment optional per run:
+
+- `ATVM_RUNNER_WORKDIR`
+- `ATVM_RUNNER_LOG`
+
 ## Start Example
 
-This helper writes a per-run environment file and starts the matching instance:
+These helpers write per-run environment files and start the matching instances:
 
 ```bash
 ./start-atvm-run-watcher.sh \
@@ -103,6 +121,10 @@ This helper writes a per-run environment file and starts the matching instance:
   --integration-plugin "pure with fc" \
   --categorize \
   --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
+
+./start-atvm-runner.sh \
+  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
+  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
 ```
 
 That results in:
@@ -111,6 +133,7 @@ That results in:
   - `/var/lib/atvm-run-watcher/e2e-redhat9.6-ubuntu24.04-w2k25-fc`
 - service instance:
   - `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
+  - `atvm-runner@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
 
 The helper also:
 
@@ -126,9 +149,16 @@ The helper also:
 
 This writes a cancellation marker, updates `state.json` to `CANCELLED`, and stops the watcher instance. The watcher will not send Mattermost results for that run.
 
+Runner cancel example:
+
+```bash
+./cancel-atvm-runner.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
+```
+
 ## Notes
 
 - The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
+- Prefer the controller-local `atvm-runner@...` service over ad hoc `nohup` or detached SSH launch patterns for `run-sorry-cypress.py`.
 - Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
 - Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
 - In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
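
The README's Start Example maps one build name to a state directory and two `systemd` instances. A minimal sketch of that mapping follows; the helper function names are hypothetical and not part of the package, and the state root is the README default.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Default state root from the README; override STATE_ROOT if the install differs.
STATE_ROOT="${STATE_ROOT:-/var/lib/atvm-run-watcher}"

# Hypothetical helpers: map a build name to the per-run state directory
# and to the two systemd template instances.
run_dir()      { printf '%s/%s\n' "$STATE_ROOT" "$1"; }
runner_unit()  { printf 'atvm-runner@%s.service\n' "$1"; }
watcher_unit() { printf 'atvm-run-watcher@%s.service\n' "$1"; }

build="e2e-redhat9.6-ubuntu24.04-w2k25-fc"
run_dir "$build"
runner_unit "$build"
watcher_unit "$build"
```

On the controller, the unit names can then be passed to standard commands such as `systemctl is-active "$(runner_unit "$build")"`.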
14
atvm/watcher-service/atvm-runner@.service
Normal file
@@ -0,0 +1,14 @@
+[Unit]
+Description=ATVM Cypress runner for %i
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+WorkingDirectory=/opt/atvm-watcher-service
+EnvironmentFile=-/var/lib/atvm-run-watcher/%i/run.env
+ExecStart=/opt/atvm-watcher-service/run-atvm-runner.sh %i
+Restart=no
+
+[Install]
+WantedBy=multi-user.target
27
atvm/watcher-service/cancel-atvm-runner.sh
Normal file
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'EOF'
+Usage:
+  cancel-atvm-runner.sh --build-name <name>
+EOF
+}
+
+BUILD_NAME=""
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --build-name) BUILD_NAME="${2:-}"; shift 2 ;;
+    -h|--help) usage; exit 0 ;;
+    *) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
+  esac
+done
+
+if [[ -z "$BUILD_NAME" ]]; then
+  echo "--build-name is required" >&2
+  usage >&2
+  exit 1
+fi
+
+systemctl stop "atvm-runner@${BUILD_NAME}.service" || true
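
One edge case in the option loop above: when `--build-name` is the last argument, `shift 2` fails, and under `set -e` the script exits without printing a message. A minimal sketch of one way to guard against that follows. `require_value` and `parse_build_name` are hypothetical names used only for illustration; they are not part of this commit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fail with a clear message when an option has no value.
# $1 is the option name, $2 is the number of remaining arguments ($#).
require_value() {
  local opt="$1" remaining="$2"
  if (( remaining < 2 )); then
    echo "${opt} requires a value" >&2
    return 1
  fi
}

# Hypothetical parser with the same option shape as the helper scripts.
parse_build_name() {
  local build_name=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --build-name)
        require_value "$1" "$#" || return 1
        build_name="$2"
        shift 2
        ;;
      *)
        echo "Unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
  printf '%s\n' "$build_name"
}

parse_build_name --build-name demo
```

With this guard, `parse_build_name --build-name` returns non-zero and reports which option is missing its value.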
37
atvm/watcher-service/run-atvm-runner.sh
Normal file
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'EOF'
+Usage:
+  run-atvm-runner.sh <build-name>
+
+This script is intended to be launched by systemd for one ATVM run.
+It expects environment variables from the runner unit/environment files:
+  ATVM_RUNNER_COMMAND
+  ATVM_RUNNER_WORKDIR
+  ATVM_RUNNER_LOG
+EOF
+}
+
+BUILD_NAME="${1:-}"
+if [[ -z "$BUILD_NAME" ]]; then
+  echo "build name is required" >&2
+  usage >&2
+  exit 1
+fi
+
+RUNNER_COMMAND="${ATVM_RUNNER_COMMAND:-}"
+RUNNER_WORKDIR="${ATVM_RUNNER_WORKDIR:-/root/cdc-e2e-cyp-12.17.4}"
+RUNNER_LOG="${ATVM_RUNNER_LOG:-/tmp/${BUILD_NAME}.log}"
+
+if [[ -z "$RUNNER_COMMAND" ]]; then
+  echo "ATVM_RUNNER_COMMAND is required" >&2
+  exit 1
+fi
+
+mkdir -p "$(dirname "$RUNNER_LOG")"
+: > "$RUNNER_LOG"
+
+cd "$RUNNER_WORKDIR"
+exec bash -lc "$RUNNER_COMMAND" >>"$RUNNER_LOG" 2>&1
63
atvm/watcher-service/start-atvm-runner.sh
Normal file
@@ -0,0 +1,63 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'EOF'
+Usage:
+  start-atvm-runner.sh --build-name <name> --runner-command <text> [options]
+
+Options:
+  --build-name <name>
+  --runner-command <text>
+  --workdir <path>      Default: /root/cdc-e2e-cyp-12.17.4
+  --log-path <path>     Default: /tmp/<build-name>.log
+  --state-root <path>   Default: /var/lib/atvm-run-watcher
+EOF
+}
+
+BUILD_NAME=""
+RUNNER_COMMAND=""
+RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
+RUNNER_LOG=""
+STATE_ROOT="/var/lib/atvm-run-watcher"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --build-name) BUILD_NAME="${2:-}"; shift 2 ;;
+    --runner-command) RUNNER_COMMAND="${2:-}"; shift 2 ;;
+    --workdir) RUNNER_WORKDIR="${2:-}"; shift 2 ;;
+    --log-path) RUNNER_LOG="${2:-}"; shift 2 ;;
+    --state-root) STATE_ROOT="${2:-}"; shift 2 ;;
+    -h|--help) usage; exit 0 ;;
+    *) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
+  esac
+done
+
+if [[ -z "$BUILD_NAME" ]]; then
+  echo "--build-name is required" >&2
+  usage >&2
+  exit 1
+fi
+
+if [[ -z "$RUNNER_COMMAND" ]]; then
+  echo "--runner-command is required" >&2
+  usage >&2
+  exit 1
+fi
+
+if [[ -z "$RUNNER_LOG" ]]; then
+  RUNNER_LOG="/tmp/${BUILD_NAME}.log"
+fi
+
+RUN_DIR="${STATE_ROOT}/${BUILD_NAME}"
+mkdir -p "$RUN_DIR"
+
+cat >"${RUN_DIR}/run.env" <<EOF
+ATVM_RUNNER_COMMAND=${RUNNER_COMMAND@Q}
+ATVM_RUNNER_WORKDIR=${RUNNER_WORKDIR@Q}
+ATVM_RUNNER_LOG=${RUNNER_LOG@Q}
+EOF
+
+systemctl stop "atvm-runner@${BUILD_NAME}.service" >/dev/null 2>&1 || true
+systemctl start "atvm-runner@${BUILD_NAME}.service"
+systemctl status --no-pager "atvm-runner@${BUILD_NAME}.service" || true
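
The helper above writes `run.env` using bash's `${var@Q}` expansion, which emits the value in a form the shell can re-read. A short sketch of what such a line looks like, with a shell round-trip check, follows. The sample command string is made up for illustration, and `@Q` requires bash 4.4 or newer.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative value containing spaces and single quotes (not from the commit).
cmd="python3 ./run-sorry-cypress.py --build_name 'demo build'"

# Same shape as the ATVM_RUNNER_COMMAND line written to run.env by the helper.
line="ATVM_RUNNER_COMMAND=${cmd@Q}"
printf '%s\n' "$line"

# Round-trip through a fresh shell to confirm the quoting is lossless there.
restored="$(bash -c "${line}; printf '%s' \"\$ATVM_RUNNER_COMMAND\"")"
[[ "$restored" == "$cmd" ]] && echo "round-trip ok"
```

This only checks the shell side. `systemd` reads `EnvironmentFile=` with its own parser rather than a shell, so values with embedded quotes are worth verifying on the controller before relying on them.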