# ATVM Watcher Service

This folder contains a per-run ATVM watcher service package that is intended to be reviewed locally first and installed on the ATVM Cypress controller later only when explicitly requested.

## Purpose

Watch an ATVM automation request until it reaches a terminal state, then:

- for non-categorized runs:
  - post one final status to Mattermost if the run state is `COMPLETED` or `FAILED`
- for categorized runs:
  - detect each sequential categorized sub-run
  - post one final status per completed categorized sub-run if that grouped run state is `COMPLETED` or `FAILED`
- verify each Mattermost post succeeded
- write durable watcher state
- exit cleanly so the service stops

The watcher does not run indefinitely. It is designed for one run per service instance.

## Files

- `atvm-runner@.service`
  - `systemd` template unit for one runner instance per build name
- `atvm_run_watcher.py`
  - main watcher implementation
- `atvm-run-watcher@.service`
  - `systemd` template unit for one watcher instance per build name
- `run-atvm-runner.sh`
  - runner wrapper used by the `systemd` runner unit
- `start-atvm-runner.sh`
  - helper to write per-run runner environment data and start a runner instance
- `cancel-atvm-runner.sh`
  - helper to stop a runner instance
- `start-atvm-run.sh`
  - wrapper that starts watcher first, waits for it to be active, then starts the runner
- `start-atvm-run-watcher.sh`
  - helper to write per-run environment data and start a watcher instance
- `cancel-atvm-run-watcher.sh`
  - helper to mark a run cancelled and stop the watcher instance

## Intended Controller Paths

These are the default install targets assumed by the included unit file:

- service package root: `/opt/atvm-watcher-service`
- runner unit: `/etc/systemd/system/atvm-runner@.service`
- watcher state root: `/var/lib/atvm-run-watcher`
- controller ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`
- watcher environment file: `/etc/atvm-run-watcher.env`

Use `/opt/atvm-watcher-service` as the controller install root for future installs and reinstalls. Do not treat `/root/atvm-watcher-service` as the preferred long-term install location.

## Per-Run Behavior

Each watcher instance is tied to one requested build name.

Typical workflow:

1. Run the approved `cmc-templates.py` command for that run when one is provided.
2. Start the watcher for that run.
3. Start the runner service for that run.
4. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
   - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
5. For non-categorized runs, when the run reaches a terminal state:
   - `COMPLETED` or `FAILED`
     - build the final ATVM status
     - send the status to Mattermost
     - verify Mattermost returned `ok`
     - mark the run as posted
     - exit
   - `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`
     - do not post
     - mark the final state
     - exit
6. For categorized runs:
   - detect each grouped sub-run in sequence from the parent run log
   - wait for that grouped sub-run to finish
   - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
   - keep the watcher alive while the parent categorized runner or related child Cypress process is still active
   - do not treat one completed grouped sub-run as proof that the whole parent request is finished
   - continue to the next grouped sub-run
   - exit after the parent request reaches a terminal state

## Required Environment

The service expects the local credentials file values to be made available on the controller through the service environment:

- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`

Optional metadata for better status formatting:

- `ATVM_WATCHER_TEMPLATE`
- `ATVM_WATCHER_CONFIG_FAMILY`
- `ATVM_WATCHER_MIGRATION_STYLE`
- `ATVM_WATCHER_INTEGRATION_PLUGIN`
- `ATVM_WATCHER_TEMPLATE_COMMAND`
- `ATVM_WATCHER_RUNNER_COMMAND`
- `ATVM_WATCHER_SCOPE_DESCRIPTION`
- `ATVM_WATCHER_CATEGORIZED`

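As a rough illustration of the two required Mattermost variables above, a pre-flight check could look like the sketch below. It is not taken from `atvm_run_watcher.py`; the function name `missing_required` and the example webhook URL are hypothetical, and only the variable names come from this README.

```python
# Sketch only: the variable names come from this README; the helper name
# and the example values are assumptions made for illustration.
REQUIRED_VARS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")


def missing_required(env):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with a plain dict standing in for the service environment.
example_env = {"MATTERMOST_ATVM_WEBHOOK": "https://example.invalid/hooks/abc"}
print(missing_required(example_env))  # ['MATTERMOST_ATVM_CHANNEL']
```

In a real service the same function could be passed `os.environ`, with the watcher refusing to start when the returned list is not empty.
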
Runner environment required per run:

- `ATVM_RUNNER_COMMAND`

Runner environment optional per run:

- `ATVM_RUNNER_WORKDIR`
- `ATVM_RUNNER_LOG`

## Start Example

These helpers write per-run environment files and start the matching instances:

```bash
./start-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"

./start-atvm-runner.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
```

Preferred one-shot wrapper:

```bash
./start-atvm-run.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
  --config-family gold \
  --config-file cypress.atvm-config-gold.ts \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize
```

That results in:

- state dir:
  - `/var/lib/atvm-run-watcher/e2e-redhat9.6-ubuntu24.04-w2k25-fc`
- service instance:
  - `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
  - `atvm-runner@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`

The helper also:

- runs `--template-command` synchronously first when one is provided
- writes the template phase output to `/tmp/.launch.log`
- exits before watcher/runner startup if the template step fails
- stops any stale watcher instance for that same requested build name
- removes the old watcher state directory for that requested build name
- starts the new watcher with a clean state root for the new run

## Cancel Example

```bash
./cancel-atvm-run-watcher.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

This writes a cancellation marker, updates `state.json` to `CANCELLED`, and stops the watcher instance. The watcher will not send Mattermost results for that run.

Runner cancel example:

```bash
./cancel-atvm-runner.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

## Notes

- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
- Prefer the controller-local `atvm-runner@...` service over ad hoc `nohup` or detached SSH launch patterns for `run-sorry-cypress.py`.
- Prefer `start-atvm-run.sh` when launching both services together because it prevents the watcher/runner log-path race by enforcing watcher-first ordering.
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
- Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
- In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
- In categorized mode, if the child build id label does not match the host/spec actually being executed, the watcher reports the grouped run using the inferred host-based group name instead of trusting the raw child build id label.
- In categorized mode, grouped XML can finish with only `check-xml-files.ts`; when that happens, the watcher must recover per-host results from the matching host reporter artifacts.
- Do not infer `PASS completed` from host artifact presence alone. Parse the per-host reporter result and preserve real `FAIL` and `RUN/pending` state when reconstructing grouped results.
- When the repo copy of the watcher changes, the controller install under `/opt/atvm-watcher-service` must be updated before expecting the new reporting behavior from live runs.
- Best-practice controller install path: `/opt/atvm-watcher-service`.
- This package is local-only right now. Nothing here is installed on the controller yet.
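
The posting rule described under Per-Run Behavior and Cancel Example can be summarized in a few lines of Python. This is a minimal sketch of that rule as this README states it, not the code in `atvm_run_watcher.py`; the names `POSTABLE_STATES`, `SILENT_STATES`, and `should_post` are hypothetical, and treating a cancellation marker as an override is an assumption drawn from the Cancel Example text.

```python
# Sketch only: state names come from this README; the helper and constant
# names are assumptions made for illustration.
POSTABLE_STATES = frozenset({"COMPLETED", "FAILED"})
SILENT_STATES = frozenset({"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN"})


def should_post(state, cancelled=False):
    """Decide whether a terminal run or sub-run state is posted to Mattermost.

    A cancellation marker suppresses posting regardless of the state.
    Non-terminal states raise ValueError so callers keep polling instead
    of silently treating them as final.
    """
    if state not in POSTABLE_STATES and state not in SILENT_STATES:
        raise ValueError(f"not a terminal state: {state!r}")
    if cancelled:
        return False
    return state in POSTABLE_STATES


for state in ("COMPLETED", "FAILED", "HUNG"):
    print(state, should_post(state))
```

For categorized runs the same decision would apply once per grouped sub-run, while the watcher keeps running until the parent request itself reaches a terminal state.
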