Add ATVM watcher service and explicit watcher approval flow

- add the per-run ATVM watcher service package under atvm/watcher-service, including the Python watcher, systemd template unit, helper scripts, and deployment docs
- document the watcher-service install and operating model, including one-run-per-instance behavior, Mattermost posting rules, and the best-practice /opt/atvm-watcher-service install path
- clarify ATVM run approval semantics so `approve` means run without watcher and `approve with watcher` means run and start the watcher
- update the ATVM automation guide and AGENTS rules so watcher usage and approval behavior are explicit and consistent
2026-03-25 17:41:50 -04:00
parent fe228ff0e9
commit ba8354b95c
9 changed files with 962 additions and 8 deletions


@@ -0,0 +1,109 @@
# ATVM Watcher Service
This folder contains a per-run ATVM watcher service package. Review it locally first; install it on the ATVM Cypress controller only when explicitly requested.
## Purpose
Watch a single ATVM automation run until it reaches a terminal state, then:
- post the final status to Mattermost if the run state is `COMPLETED` or `FAILED`
- verify the Mattermost post succeeded
- write durable watcher state
- exit cleanly so the service stops
The watcher does not run indefinitely. It is designed for one run per service instance.
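
The "post, then verify" steps above can be sketched as follows. This is a minimal illustration, not the code in `atvm_run_watcher.py`: the function names are hypothetical, and it assumes the webhook is a standard Mattermost incoming webhook that accepts a JSON body with `channel` and `text` and answers a successful post with HTTP 200 and the plain-text body `ok`.

```python
import json
import urllib.request


def build_payload(channel, text):
    """Encode a Mattermost incoming-webhook payload (hypothetical helper)."""
    return json.dumps({"channel": channel, "text": text}).encode("utf-8")


def response_is_ok(status, body):
    """Treat the post as verified only on HTTP 200 with a body of 'ok'."""
    return status == 200 and body.strip() == "ok"


def post_status(webhook_url, channel, text, timeout=10.0):
    """Send the final status and report whether Mattermost confirmed it."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(channel, text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises on HTTP errors; callers may want to catch URLError.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response_is_ok(response.status, response.read().decode("utf-8"))
```

Keeping `response_is_ok` separate from the network call makes the verification rule easy to check without a live webhook.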
## Files
- `atvm_run_watcher.py`
  - main watcher implementation
- `atvm-run-watcher@.service`
  - `systemd` template unit for one watcher instance per build name
- `start-atvm-run-watcher.sh`
  - helper to write per-run environment data and start a watcher instance
- `cancel-atvm-run-watcher.sh`
  - helper to mark a run cancelled and stop the watcher instance
## Intended Controller Paths
These are the default install targets assumed by the included unit file:
- service package root: `/opt/atvm-watcher-service`
- watcher state root: `/var/lib/atvm-run-watcher`
- controller ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`
- watcher environment file: `/etc/atvm-run-watcher.env`
Use `/opt/atvm-watcher-service` as the controller install root for future installs and reinstalls.
Do not treat `/root/atvm-watcher-service` as the preferred long-term install location.
## Per-Run Behavior
Each watcher instance is tied to one build name.
Typical workflow:
1. Launch the ATVM run.
2. Start the watcher for that run.
3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
4. When the run reaches a terminal state:
   - `COMPLETED` or `FAILED`
     - build the final ATVM status
     - send the status to Mattermost
     - verify Mattermost returned `ok`
     - mark the run as posted
     - exit
   - `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`
     - do not post
     - mark the final state
     - exit
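
The terminal-state branching in step 4 can be sketched as a small dispatch function. This is an illustrative sketch only: `RunState.RUNNING`, `handle_terminal_state`, and the `post_status` / `mark_state` callables are hypothetical names, and recording `posted=False` when Mattermost does not confirm the post is an assumption rather than documented behavior.

```python
from enum import Enum


class RunState(Enum):
    RUNNING = "RUNNING"  # assumed non-terminal state, not named in this README
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    TERMINATED = "TERMINATED"
    HUNG = "HUNG"
    UNKNOWN = "UNKNOWN"


# Terminal states that post the final status to Mattermost.
POSTING_STATES = {RunState.COMPLETED, RunState.FAILED}
# Terminal states that only record the final state.
SILENT_STATES = {
    RunState.CANCELLED,
    RunState.TERMINATED,
    RunState.HUNG,
    RunState.UNKNOWN,
}


def handle_terminal_state(state, post_status, mark_state):
    """Return True when the watcher should exit, False to keep polling.

    post_status(state) is expected to return True only if Mattermost
    confirmed the post; mark_state(state, posted=...) persists the result.
    """
    if state in POSTING_STATES:
        posted = bool(post_status(state))
        mark_state(state, posted=posted)
        return True
    if state in SILENT_STATES:
        mark_state(state, posted=False)
        return True
    return False
```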
## Required Environment
The service expects these values from the local credentials file to be available on the controller through the service environment:
- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`
Optional metadata for better status formatting:
- `ATVM_WATCHER_TEMPLATE`
- `ATVM_WATCHER_CONFIG_FAMILY`
- `ATVM_WATCHER_MIGRATION_STYLE`
- `ATVM_WATCHER_INTEGRATION_PLUGIN`
- `ATVM_WATCHER_SCOPE_DESCRIPTION`
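
A watcher could validate this environment at startup along the following lines. This is a sketch under assumptions: `load_watcher_env` is a hypothetical helper, and failing fast on a missing required variable is one reasonable policy, not necessarily what the packaged watcher does.

```python
import os

REQUIRED_VARS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")
OPTIONAL_VARS = (
    "ATVM_WATCHER_TEMPLATE",
    "ATVM_WATCHER_CONFIG_FAMILY",
    "ATVM_WATCHER_MIGRATION_STYLE",
    "ATVM_WATCHER_INTEGRATION_PLUGIN",
    "ATVM_WATCHER_SCOPE_DESCRIPTION",
)


def load_watcher_env(environ=None):
    """Collect watcher settings, failing fast if a required value is absent."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional metadata defaults to an empty string when unset.
    for name in OPTIONAL_VARS:
        settings[name] = env.get(name, "")
    return settings
```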
## Start Example
This helper writes a per-run environment file and starts the matching instance:
```bash
./start-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
```
That results in:
- state dir:
  - `/var/lib/atvm-run-watcher/e2e-redhat9.6-ubuntu24.04-w2k25-fc`
- service instance:
  - `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
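
The mapping from build name to state directory and unit instance shown above can be expressed as a small function. This is a hedged sketch: `watcher_targets` is a hypothetical name, the state root is the default from "Intended Controller Paths", and no `systemd` escaping is applied, so it only suits build names that are already valid instance names.

```python
from pathlib import PurePosixPath

STATE_ROOT = PurePosixPath("/var/lib/atvm-run-watcher")
UNIT_TEMPLATE = "atvm-run-watcher@{instance}.service"


def watcher_targets(build_name):
    """Return (state_dir, unit_name) for one build name."""
    # Reject names that would escape the state root or break the unit name.
    if not build_name or "/" in build_name or build_name in (".", ".."):
        raise ValueError(f"unsupported build name: {build_name!r}")
    state_dir = str(STATE_ROOT / build_name)
    unit_name = UNIT_TEMPLATE.format(instance=build_name)
    return state_dir, unit_name
```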
## Cancel Example
```bash
./cancel-atvm-run-watcher.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```
This writes a cancellation marker and stops the watcher instance. The watcher will not send Mattermost results for that run.
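
One way a cancellation marker like this could work is a sentinel file in the run's state directory that the watcher checks before posting. The sketch below is an assumption about the mechanism, not a description of the shipped scripts: the marker file name `cancel.marker` and both function names are hypothetical.

```python
from pathlib import Path

CANCEL_MARKER_NAME = "cancel.marker"  # hypothetical file name


def write_cancel_marker(state_dir):
    """Create the marker inside the run's state directory and return its path."""
    state_path = Path(state_dir)
    state_path.mkdir(parents=True, exist_ok=True)
    marker = state_path / CANCEL_MARKER_NAME
    marker.write_text("cancelled\n", encoding="utf-8")
    return marker


def is_cancelled(state_dir):
    """Report whether the run has been marked as cancelled."""
    return (Path(state_dir) / CANCEL_MARKER_NAME).is_file()
```

A watcher that calls `is_cancelled` before building the final status would skip the Mattermost post for a cancelled run, matching the behavior described above.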
## Notes
- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
- Best-practice controller install path: `/opt/atvm-watcher-service`.
- This package is local-only right now. Nothing here is installed on the controller yet.