# ATVM Watcher Service Install Plan

This document describes how to deploy the ATVM per-run watcher service to the ATVM Cypress controller at `192.168.3.190`.

This is a deployment plan only. It does not perform the installation.

## Goal

Install the local watcher package so the controller can:

- watch one requested ATVM run per watcher instance
- for non-categorized runs, send one final Mattermost status only for `COMPLETED` or `FAILED`
- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
- suppress Mattermost posts for `CANCELLED`, `TERMINATED`, `HUNG`, and `UNKNOWN`
- stop automatically after the watched run reaches a terminal state

## Controller Target Layout

Recommended controller paths:

- package root:
  - `/opt/atvm-watcher-service`
- service unit:
  - `/etc/systemd/system/atvm-run-watcher@.service`
- global environment file:
  - `/etc/atvm-run-watcher.env`
- state root:
  - `/var/lib/atvm-run-watcher`
- ATVM automation root:
  - `/root/cdc-e2e-cyp-12.17.4`

Best-practice rule:

- install the watcher service package under `/opt/atvm-watcher-service`
- do not use `/root/atvm-watcher-service` as the standard install location
- if a temporary `/root/atvm-watcher-service` install exists, replace it with a clean `/opt/atvm-watcher-service` install

## Files To Install

From the local workspace:

- `/home/aw/code/cds/atvm/watcher-service/atvm_run_watcher.py`
- `/home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service`
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh`
- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh`
- `/home/aw/code/cds/atvm/inventory/vm-inventory.md`

Optional reference docs:

- `/home/aw/code/cds/atvm/watcher-service/README.md`
- `/home/aw/code/cds/atvm/watcher-service/INSTALL.md`

## Required Controller Environment

The controller must have:

- `python3`
- `systemd`
- outbound network access to the Mattermost webhook
- read access to:
  - `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`
  - `/tmp/.log`

## Required Secrets

The controller needs a watcher environment file with:

- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`

Recommended file:

- `/etc/atvm-run-watcher.env`

Recommended permissions:

- owner: `root`
- mode: `0600`

## Deployment Steps

1. Create controller directories.
   - `/opt/atvm-watcher-service`
   - `/var/lib/atvm-run-watcher`
2. Copy package files to the controller.
   - copy the Python watcher
   - copy the `systemd` unit file
   - copy the helper scripts
   - copy `vm-inventory.md`
3. Set executable permissions.
   - `atvm_run_watcher.py`
   - `start-atvm-run-watcher.sh`
   - `cancel-atvm-run-watcher.sh`
4. Create `/etc/atvm-run-watcher.env`.
   - add Mattermost webhook/channel
   - keep permissions restricted
5. Install the `systemd` unit file.
   - copy to `/etc/systemd/system/atvm-run-watcher@.service`
6. Reload `systemd`.
   - `systemctl daemon-reload`
7. Run a syntax/smoke validation.
   - check Python import/launch
   - check helper script usage
   - verify the unit resolves
8. Do a non-production test.
   - start a watcher for a fake or completed build name
   - confirm state directory creation
   - confirm the watcher exits as expected
9. Do a real ATVM run test.
   - launch a real run
   - start the watcher for that build name
   - if the run uses `--categorize`, also pass `--categorize` to the watcher start helper
   - confirm final Mattermost delivery for a completed run
   - confirm categorized execution sends one post per completed grouped sub-run

## Recommended Validation Commands

Examples for later execution on the controller:

```bash
# Create the package root and the state root.
mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
```

```bash
# Make the watcher and both helper scripts executable.
chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
```

```bash
# Reload unit files, then confirm systemd resolves the template unit.
systemctl daemon-reload
systemctl cat atvm-run-watcher@.service
```

```bash
# Confirm the Python watcher launches and parses its arguments.
python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
```

```bash
# Confirm the start helper prints its usage.
/opt/atvm-watcher-service/start-atvm-run-watcher.sh --help
```

## Per-Run Usage After Install

Once installed, the intended workflow is:

1. Launch the ATVM run as usual.
2. Start the watcher for that build name.
3. Let the watcher run on the controller.
4. The watcher exits on terminal state.

Example:

```bash
/opt/atvm-watcher-service/start-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
```

Cancel example:

```bash
/opt/atvm-watcher-service/cancel-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

The cancel helper should:

- write `cancelled.marker`
- update `state.json` so the final watcher state is `CANCELLED`
- stop the watcher instance
- avoid any Mattermost post for that run

## Operational Notes

- The watcher is not a long-running daemon.
- One watcher instance is started per ATVM run.
- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
- The watcher exits after the run reaches a terminal state.
- The watcher writes state under `/var/lib/atvm-run-watcher/`.
- The watcher prevents duplicate Mattermost posts by writing posted markers.
- Categorized sub-run state is written under `/var/lib/atvm-run-watcher//subruns//`.

## Failure Handling

Expected terminal behavior:

- `COMPLETED`
  - post to Mattermost
  - verify the webhook response is `ok`
  - exit
- `FAILED`
  - post to Mattermost
  - verify the webhook response is `ok`
  - exit
- categorized `COMPLETED` / `FAILED`
  - post once for that grouped sub-run
  - verify the webhook response is `ok`
  - continue until the parent request finishes
- `CANCELLED`
  - write final `CANCELLED` state to `state.json`
  - do not post
  - exit
- `TERMINATED`
  - do not post
  - exit
- `HUNG`
  - do not post
  - exit
- `UNKNOWN`
  - do not post
  - exit

## Answer To "Do We Need An Installer README?"

Not strictly, but it is useful.

Why:

- it gives a repeatable controller deployment procedure
- it separates local package design from controller installation steps
- it makes later install/reinstall safer
- it gives you a review checkpoint before anything is installed on `192.168.3.190`

That is the purpose of this file.
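One concrete review checkpoint is confirming the Required Secrets before any watcher is started: owner `root`, mode `0600`, and both Mattermost variables set. The function below is a minimal, hypothetical sketch and not part of the watcher package. `check_watcher_env` is an illustrative name, and it assumes GNU coreutils `stat` and plain `KEY=value` lines in the environment file. It compares the owner by numeric UID (`0` for `root`) so the check does not depend on user-name lookups.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the watcher environment file.
# Not part of the ATVM watcher package. Assumes GNU coreutils `stat`
# and plain KEY=value lines (no `export`, no quoting).

# Usage: check_watcher_env [ENV_FILE] [EXPECTED_UID]
# Prints one line per problem found and returns non-zero if any were found.
# Variable values are never printed, so the webhook URL stays out of logs.
check_watcher_env() {
  local env_file="${1:-/etc/atvm-run-watcher.env}"
  local expected_uid="${2:-0}"   # 0 = root, as recommended in Required Secrets
  local problems=0 mode uid var

  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file"
    return 1
  fi

  # Recommended permissions: owner root, mode 0600.
  mode="$(stat -c '%a' "$env_file")"
  uid="$(stat -c '%u' "$env_file")"
  if [ "$mode" != "600" ]; then
    echo "bad mode: $mode (want 600)"
    problems=1
  fi
  if [ "$uid" != "$expected_uid" ]; then
    echo "bad owner uid: $uid (want $expected_uid)"
    problems=1
  fi

  # Both Mattermost variables must be present with a non-empty value.
  for var in MATTERMOST_ATVM_WEBHOOK MATTERMOST_ATVM_CHANNEL; do
    if ! grep -Eq "^${var}=.+" "$env_file"; then
      echo "missing or empty: $var"
      problems=1
    fi
  done

  return "$problems"
}
```

On the controller this would be run as `check_watcher_env /etc/atvm-run-watcher.env` after step 4 of the deployment steps. The optional second argument exists only so the check can be exercised against a temporary file owned by a non-root user.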