Add ATVM watcher service and explicit watcher approval flow
- add the per-run ATVM watcher service package under atvm/watcher-service, including the Python watcher, systemd template unit, helper scripts, and deployment docs
- document the watcher-service install and operating model, including one-run-per-instance behavior, Mattermost posting rules, and the best-practice /opt/atvm-watcher-service install path
- clarify ATVM run approval semantics so `approve` means run without watcher and `approve with watcher` means run and start the watcher
- update the ATVM automation guide and AGENTS rules so watcher usage and approval behavior are explicit and consistent
220
atvm/watcher-service/INSTALL.md
Normal file
@@ -0,0 +1,220 @@
# ATVM Watcher Service Install Plan

This document describes how to deploy the ATVM per-run watcher service to the ATVM Cypress controller at `192.168.3.190`.

This is a deployment plan only. It does not perform the installation.

## Goal

Install the local watcher package so the controller can:

- watch one ATVM run per watcher instance
- send final Mattermost status only for `COMPLETED` or `FAILED`
- suppress Mattermost posts for `CANCELLED`, `TERMINATED`, `HUNG`, and `UNKNOWN`
- stop automatically after the watched run reaches a terminal state

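The posting rule in this list can be sketched as two small shell functions. This is an illustration only: `should_post_status` and `is_terminal_status` are hypothetical names, and the real decision is made inside `atvm_run_watcher.py`, whose internals are not shown in this plan.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether a terminal state gets a Mattermost post.
# Returns 0 (post) only for COMPLETED and FAILED; every other state is suppressed.
should_post_status() {
    case "$1" in
        COMPLETED|FAILED) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: is this one of the terminal states named in this plan?
is_terminal_status() {
    case "$1" in
        COMPLETED|FAILED|CANCELLED|TERMINATED|HUNG|UNKNOWN) return 0 ;;
        *) return 1 ;;
    esac
}
```

Keeping the two checks separate mirrors the plan: every terminal state stops the watcher, but only two of them produce a post.
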
## Controller Target Layout

Recommended controller paths:

- package root:
  - `/opt/atvm-watcher-service`
- service unit:
  - `/etc/systemd/system/atvm-run-watcher@.service`
- global environment file:
  - `/etc/atvm-run-watcher.env`
- state root:
  - `/var/lib/atvm-run-watcher`
- ATVM automation root:
  - `/root/cdc-e2e-cyp-12.17.4`

Best-practice rule:

- install the watcher service package under `/opt/atvm-watcher-service`
- do not use `/root/atvm-watcher-service` as the standard install location
- if a temporary `/root/atvm-watcher-service` install exists, replace it with a clean `/opt/atvm-watcher-service` install

## Files To Install

From the local workspace:

- `/home/aw/code/cds/atvm/watcher-service/atvm_run_watcher.py`
- `/home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service`
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh`
- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh`
- `/home/aw/code/cds/atvm/inventory/vm-inventory.md`

Optional reference docs:

- `/home/aw/code/cds/atvm/watcher-service/README.md`
- `/home/aw/code/cds/atvm/watcher-service/INSTALL.md`

## Required Controller Environment

The controller must have:

- `python3`
- `systemd`
- outbound network access to the Mattermost webhook
- read access to:
  - `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`
  - `/tmp/<build-name>.log`

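A preflight check for these requirements might look like the sketch below. The function names are hypothetical and not part of the watcher package. Webhook reachability and the per-run `/tmp/<build-name>.log` file are deliberately left out, because they depend on a live webhook and a specific run.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; names are illustrative only.

# Succeeds if the named command is on PATH, otherwise reports it on stderr.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing command: $1" >&2
        return 1
    fi
}

# Succeeds if the path exists and is readable by the current user.
require_readable() {
    if [ ! -r "$1" ]; then
        echo "not readable: $1" >&2
        return 1
    fi
}

# Run every check and return non-zero if any of them failed.
preflight() {
    local rc=0
    require_cmd python3 || rc=1
    require_cmd systemctl || rc=1
    require_readable /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter || rc=1
    return "$rc"
}
```

Running `preflight` on the controller prints one line per missing prerequisite instead of stopping at the first one.
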
## Required Secrets

The controller needs a watcher environment file with:

- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`

Recommended file:

- `/etc/atvm-run-watcher.env`

Recommended permissions:

- owner: `root`
- mode: `0600`

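One way to create the environment file with the recommended mode is sketched below. `write_watcher_env` is a hypothetical helper, and the `KEY=value` layout is an assumption about how the watcher reads the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the watcher environment file with mode 0600.
# Arguments: <env-file-path> <webhook-url> <channel>
write_watcher_env() {
    local env_file="$1" webhook="$2" channel="$3"
    (
        # Tighten the umask first so a newly created file is never
        # briefly readable by other users.
        umask 077
        printf 'MATTERMOST_ATVM_WEBHOOK=%s\nMATTERMOST_ATVM_CHANNEL=%s\n' \
            "$webhook" "$channel" > "$env_file"
    ) || return 1
    # Also covers the case where the file already existed with looser permissions.
    chmod 600 "$env_file"
}
```

When run as root against `/etc/atvm-run-watcher.env`, the resulting file is owned by `root`, which matches the recommended ownership.
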
## Deployment Steps

1. Create controller directories.
   - `/opt/atvm-watcher-service`
   - `/var/lib/atvm-run-watcher`

2. Copy package files to the controller.
   - copy the Python watcher
   - copy the `systemd` unit file
   - copy the helper scripts
   - copy `vm-inventory.md`

3. Set executable permissions.
   - `atvm_run_watcher.py`
   - `start-atvm-run-watcher.sh`
   - `cancel-atvm-run-watcher.sh`

4. Create `/etc/atvm-run-watcher.env`.
   - add Mattermost webhook/channel
   - keep permissions restricted

5. Install the `systemd` unit file.
   - copy to `/etc/systemd/system/atvm-run-watcher@.service`

6. Reload `systemd`.
   - `systemctl daemon-reload`

7. Run a syntax/smoke validation.
   - check Python import/launch
   - check helper script usage
   - verify the unit resolves

8. Do a non-production test.
   - start a watcher for a fake or completed build name
   - confirm state directory creation
   - confirm the watcher exits as expected

9. Do a real ATVM run test.
   - launch a real run
   - start the watcher for that build name
   - confirm final Mattermost delivery for a completed run

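Steps 1, 2, 3, and 5 can be rehearsed with a staging helper like the sketch below. `stage_watcher` and its optional destination-root argument are assumptions added for illustration and are not part of the package. `vm-inventory.md` is omitted because this plan does not state where it belongs on the controller. The environment file (step 4) and `systemctl daemon-reload` (step 6) are also left out so the function can run without root against a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical staging helper. Arguments: <source-dir> [dest-root]
# An empty dest-root targets the real controller paths; a scratch directory
# lets the layout be rehearsed without touching the system.
stage_watcher() {
    local src="$1" dest_root="${2:-}"
    local pkg="$dest_root/opt/atvm-watcher-service"
    local state="$dest_root/var/lib/atvm-run-watcher"
    local unit_dir="$dest_root/etc/systemd/system"

    # Step 1: create controller directories.
    mkdir -p "$pkg" "$state" "$unit_dir" || return 1

    # Steps 2 and 3: copy the watcher and helper scripts as executables.
    install -m 755 \
        "$src/atvm_run_watcher.py" \
        "$src/start-atvm-run-watcher.sh" \
        "$src/cancel-atvm-run-watcher.sh" \
        "$pkg/" || return 1

    # Step 5: install the systemd template unit.
    install -m 644 "$src/atvm-run-watcher@.service" "$unit_dir/" || return 1
}
```

For example, `stage_watcher /home/aw/code/cds/atvm/watcher-service /tmp/stage` builds the target layout under `/tmp/stage` for review before anything is installed on the controller.
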
## Recommended Validation Commands

Examples for later execution on the controller:

```bash
mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
```

```bash
chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
```

```bash
systemctl daemon-reload
systemctl cat atvm-run-watcher@.service
```

```bash
python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
```

```bash
/opt/atvm-watcher-service/start-atvm-run-watcher.sh --help
```

## Per-Run Usage After Install

Once installed, the intended workflow is:

1. Launch the ATVM run as usual.
2. Start the watcher for that build name.
3. Let the watcher run on the controller.
4. The watcher exits on terminal state.

Example:

```bash
/opt/atvm-watcher-service/start-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
```

Cancel example:

```bash
/opt/atvm-watcher-service/cancel-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

## Operational Notes

- The watcher is not a long-running daemon.
- One watcher instance is started per ATVM run.
- The watcher exits after the run reaches a terminal state.
- The watcher writes state under `/var/lib/atvm-run-watcher/<build-name>`.
- The watcher prevents duplicate Mattermost posts by writing a posted marker.

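The posted-marker idea can be sketched as follows. The marker file name (`posted`) and the `post_once` function are assumptions for illustration; the actual watcher may name and store its marker differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a posted marker.
# Arguments: <state-dir> <command> [args...]
post_once() {
    local state_dir="$1"
    shift
    local marker="$state_dir/posted"

    # A marker from an earlier attempt means the post already went out.
    if [ -e "$marker" ]; then
        return 0
    fi

    mkdir -p "$state_dir" || return 1

    # Run the post command; only record the marker if it succeeded.
    "$@" || return 1
    : > "$marker"
}
```

Writing the marker only after a successful post means a failed delivery can be retried, while a restarted watcher does not post the same result twice.
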
## Failure Handling

Expected terminal behavior:

- `COMPLETED`
  - post to Mattermost
  - verify `ok`
  - exit
- `FAILED`
  - post to Mattermost
  - verify `ok`
  - exit
- `CANCELLED`
  - do not post
  - exit
- `TERMINATED`
  - do not post
  - exit
- `HUNG`
  - do not post
  - exit
- `UNKNOWN`
  - do not post
  - exit

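The "post, then verify `ok`" behavior for `COMPLETED` and `FAILED` might be sketched like this. It assumes the webhook replies with a plain `ok` body on success, which is what the `verify ok` step implies. `post_and_verify` is a hypothetical name, and the payload is expected to be JSON that the caller has already built.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: send a pre-built JSON payload to the webhook and treat
# anything other than a literal "ok" response body as a failure.
# Arguments: <webhook-url> <json-payload>
post_and_verify() {
    local webhook="$1" payload="$2" response
    response=$(curl -sS -X POST \
        -H 'Content-Type: application/json' \
        --data "$payload" \
        "$webhook") || return 1
    [ "$response" = "ok" ]
}
```

A caller would pass `"$MATTERMOST_ATVM_WEBHOOK"` as the first argument and stop before exiting cleanly if the function returns non-zero.
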
## Answer To "Do We Need An Installer README?"

Not strictly, but yes, it is useful.

Why:

- it gives a repeatable controller deployment procedure
- it separates local package design from controller installation steps
- it makes later install/reinstall safer
- it gives you a review checkpoint before anything is installed on `192.168.3.190`

That is the purpose of this file.