Add ATVM systemd runner service

2026-04-14 09:49:49 -04:00
parent a22ff8edf1
commit 7cdcbf8cf1
8 changed files with 220 additions and 14 deletions


@@ -1,13 +1,14 @@
# ATVM Watcher Service Install Plan
-This document describes how to deploy the ATVM per-run watcher service to the ATVM Cypress controller at `192.168.3.190`.
+This document describes how to deploy the ATVM per-run watcher and runner services to the ATVM Cypress controller at `192.168.3.190`.
This is a deployment plan only. It does not perform the installation.
## Goal
-Install the local watcher package so the controller can:
+Install the local watcher/runner package so the controller can:
+- start one requested ATVM Cypress runner per service instance
- watch one requested ATVM run per watcher instance
- for non-categorized runs, send one final Mattermost status only for `COMPLETED` or `FAILED`
- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
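The notification rule for non-categorized runs can be illustrated with a small shell predicate. This is a minimal sketch, not the watcher's actual logic: the function name `should_send_final_status` is hypothetical, and it assumes run states arrive as exact uppercase strings such as `COMPLETED` and `FAILED`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the watcher package.
# Returns 0 when a non-categorized run state should trigger the single
# final Mattermost status, and 1 for every other state.
should_send_final_status() {
    local state="$1"
    case "$state" in
        COMPLETED|FAILED) return 0 ;;
        *) return 1 ;;
    esac
}
```

Any other state would leave the watcher polling without sending a message.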
@@ -20,6 +21,8 @@ Recommended controller paths:
- package root:
- `/opt/atvm-watcher-service`
+- runner service unit:
+- `/etc/systemd/system/atvm-runner@.service`
- service unit:
- `/etc/systemd/system/atvm-run-watcher@.service`
- global environment file:
@@ -40,6 +43,10 @@ Best-practice rule:
From the local workspace:
- `/home/aw/code/cds/atvm/watcher-service/atvm_run_watcher.py`
+- `/home/aw/code/cds/atvm/watcher-service/run-atvm-runner.sh`
+- `/home/aw/code/cds/atvm/watcher-service/atvm-runner@.service`
+- `/home/aw/code/cds/atvm/watcher-service/start-atvm-runner.sh`
+- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-runner.sh`
- `/home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service`
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh`
- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh`
@@ -84,12 +91,18 @@ Recommended permissions:
- `/var/lib/atvm-run-watcher`
2. Copy package files to the controller.
+- copy the runner wrapper
+- copy the runner `systemd` unit file
+- copy the runner helper scripts
- copy the Python watcher
- copy the `systemd` unit file
- copy the helper scripts
- copy `vm-inventory.md`
3. Set executable permissions.
+- `run-atvm-runner.sh`
+- `start-atvm-runner.sh`
+- `cancel-atvm-runner.sh`
- `atvm_run_watcher.py`
- `start-atvm-run-watcher.sh`
- `cancel-atvm-run-watcher.sh`
@@ -99,6 +112,7 @@ Recommended permissions:
- keep permissions restricted
5. Install the `systemd` unit file.
+- copy the runner unit to `/etc/systemd/system/atvm-runner@.service`
- copy to `/etc/systemd/system/atvm-run-watcher@.service`
6. Reload `systemd`.
@@ -132,6 +146,9 @@ mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
```
```bash
+chmod 755 /opt/atvm-watcher-service/run-atvm-runner.sh
+chmod 755 /opt/atvm-watcher-service/start-atvm-runner.sh
+chmod 755 /opt/atvm-watcher-service/cancel-atvm-runner.sh
chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
@@ -139,6 +156,7 @@ chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
```bash
systemctl daemon-reload
+systemctl cat atvm-runner@.service
systemctl cat atvm-run-watcher@.service
```
@@ -146,6 +164,10 @@ systemctl cat atvm-run-watcher@.service
python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
```
+```bash
+/opt/atvm-watcher-service/start-atvm-runner.sh --help
+```
```bash
/opt/atvm-watcher-service/start-atvm-run-watcher.sh --help
```
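The executable-permission step could be verified after install with a short check like the one below. This is a sketch under assumptions: the function name `check_executables` is hypothetical and is not one of the package's helper scripts.

```bash
#!/usr/bin/env bash
# Hypothetical post-install check; the function name is an assumption.
# Reports every listed file under DIR that is missing or not executable
# and returns non-zero if any such file was found.
check_executables() {
    local dir="$1"; shift
    local missing=0 name
    for name in "$@"; do
        if [ ! -x "$dir/$name" ]; then
            echo "not executable: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call on the controller (left commented so the sketch has no side effects):
# check_executables /opt/atvm-watcher-service \
#     run-atvm-runner.sh start-atvm-runner.sh cancel-atvm-runner.sh \
#     atvm_run_watcher.py start-atvm-run-watcher.sh cancel-atvm-run-watcher.sh
```

A non-zero return would indicate that the `chmod 755` step was skipped or incomplete for at least one file.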
@@ -154,10 +176,10 @@ python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
Once installed, the intended workflow is:
-1. Launch the ATVM run as usual.
-2. Start the watcher for that build name.
+1. Start the watcher for that build name.
- the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
-3. Let the watcher run on the controller.
+2. Start the runner service for that build name.
+3. Let the runner and watcher run on the controller.
4. The watcher exits on terminal state.
Example:
@@ -173,10 +195,19 @@ Example:
--integration-plugin "pure with fc" \
--categorize \
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
+/opt/atvm-watcher-service/start-atvm-runner.sh \
+--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
+--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
```
Cancel example:
+```bash
+/opt/atvm-watcher-service/cancel-atvm-runner.sh \
+--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
+```
```bash
/opt/atvm-watcher-service/cancel-atvm-run-watcher.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
@@ -192,7 +223,9 @@ The cancel helper should:
## Operational Notes
- This is not a daemon.
+- One runner instance is started per ATVM run.
- One watcher instance is started per ATVM run.
+- Prefer the `atvm-runner@...` service over detached SSH background launch patterns for `run-sorry-cypress.py`.
- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
- In categorized execution, the watcher must remain alive until the parent request has actually gone inactive past the grace window, even if one grouped sub-run already completed.
- The watcher exits after the run reaches a terminal state.
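Because both units are `systemd` templates, each run maps to one instance of `atvm-runner@.service` and one of `atvm-run-watcher@.service`. The sketch below shows how the instance unit names might be composed. It assumes the instance name is the build name itself, which the plan does not confirm; the function name `atvm_unit_name` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Assumes the template instance name is the build
# name itself; the real start helpers may derive it differently.
atvm_unit_name() {
    local kind="$1" build_name="$2"
    case "$kind" in
        runner)  printf 'atvm-runner@%s.service\n' "$build_name" ;;
        watcher) printf 'atvm-run-watcher@%s.service\n' "$build_name" ;;
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}
```

Under that assumption, `systemctl status "$(atvm_unit_name runner e2e-redhat9.6-ubuntu24.04-w2k25-fc)"` would inspect the runner instance for the example build. Note that if a unit file references `%I` rather than `%i`, `systemd` unescapes hyphens in the instance name to `/`, so build names containing `-` may need `systemd-escape` first.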