# ATVM Watcher Service

This folder contains a per-run ATVM watcher service package. It is intended to be reviewed locally first and installed on the ATVM Cypress controller only when explicitly requested.

## Purpose

Watch an ATVM automation request until it reaches a terminal state, then:

- for non-categorized runs:
  - post one final status to Mattermost if the run state is `COMPLETED` or `FAILED`
- for categorized runs:
  - detect each sequential categorized sub-run
  - post one final status per completed categorized sub-run if that grouped run state is `COMPLETED` or `FAILED`
- verify each Mattermost post succeeded
- write durable watcher state
- exit cleanly so the service stops

The watcher does not run indefinitely. It is designed for one run per service instance.

## Files

- `atvm-runner@.service`
  - `systemd` template unit for one runner instance per build name
- `atvm_run_watcher.py`
  - main watcher implementation
- `atvm-run-watcher@.service`
  - `systemd` template unit for one watcher instance per build name
- `run-atvm-runner.sh`
  - runner wrapper used by the `systemd` runner unit
- `start-atvm-runner.sh`
  - helper to write per-run runner environment data and start a runner instance
- `cancel-atvm-runner.sh`
  - helper to stop a runner instance
- `start-atvm-run.sh`
  - wrapper that starts the watcher first, waits for it to be active, then starts the runner
- `start-atvm-run-watcher.sh`
  - helper to write per-run environment data and start a watcher instance
- `cancel-atvm-run-watcher.sh`
  - helper to mark a run cancelled and stop the watcher instance

## Intended Controller Paths

These are the default install targets assumed by the included unit files:

- service package root: `/opt/atvm-watcher-service`
- runner unit: `/etc/systemd/system/atvm-runner@.service`
- watcher state root: `/var/lib/atvm-run-watcher`
- controller ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`
- watcher environment file: `/etc/atvm-run-watcher.env`

Use `/opt/atvm-watcher-service` as the controller install root for future installs and reinstalls. Do not treat `/root/atvm-watcher-service` as the preferred long-term install location.

## Per-Run Behavior

Each watcher instance is tied to one requested build name.

Typical workflow:

1. Start the watcher for that run.
   - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
2. Start the runner service for that run.
3. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
4. For non-categorized runs, when the run reaches a terminal state:
   - `COMPLETED` or `FAILED`:
     - build the final ATVM status
     - send the status to Mattermost
     - verify Mattermost returned `ok`
     - mark the run as posted
     - exit
   - `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`:
     - do not post
     - mark the final state
     - exit
5. For categorized runs:
   - detect each grouped sub-run in sequence from the parent run log
   - wait for that grouped sub-run to finish
   - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
   - keep the watcher alive while the parent categorized runner or related child Cypress process is still active
   - do not treat one completed grouped sub-run as proof that the whole parent request is finished
   - continue to the next grouped sub-run
   - exit after the parent request reaches a terminal state

## Required Environment

The service expects the values from the local credentials file to be made available on the controller through the service environment:

- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`

Optional metadata for better status formatting:

- `ATVM_WATCHER_TEMPLATE`
- `ATVM_WATCHER_CONFIG_FAMILY`
- `ATVM_WATCHER_MIGRATION_STYLE`
- `ATVM_WATCHER_INTEGRATION_PLUGIN`
- `ATVM_WATCHER_TEMPLATE_COMMAND`
- `ATVM_WATCHER_RUNNER_COMMAND`
- `ATVM_WATCHER_SCOPE_DESCRIPTION`
- `ATVM_WATCHER_CATEGORIZED`

Runner environment required per run:

- `ATVM_RUNNER_COMMAND`

Runner environment optional per run:

- `ATVM_RUNNER_WORKDIR`
- `ATVM_RUNNER_LOG`

## Start Example

These helpers write per-run environment files and start the matching instances:

```bash
./start-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"

./start-atvm-runner.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
```

Preferred one-shot wrapper:

```bash
./start-atvm-run.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
  --config-family gold \
  --config-file cypress.atvm-config-gold.ts \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize
```

That results in:

- state dir:
  - `/var/lib/atvm-run-watcher/e2e-redhat9.6-ubuntu24.04-w2k25-fc`
- service instances:
  - `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
  - `atvm-runner@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`

The helper also:

- stops any stale watcher instance for that same requested build name
- removes the old watcher state directory for that requested build name
- starts the new watcher with a clean state root for the new run

## Cancel Example

```bash
./cancel-atvm-run-watcher.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

This writes a cancellation marker, updates `state.json` to `CANCELLED`, and stops the watcher instance. The watcher will not send Mattermost results for that run.

Runner cancel example:

```bash
./cancel-atvm-runner.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

## Notes

- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
- Prefer the controller-local `atvm-runner@...` service over ad hoc `nohup` or detached SSH launch patterns for `run-sorry-cypress.py`.
- Prefer `start-atvm-run.sh` when launching both services together: it prevents the watcher/runner log-path race by enforcing watcher-first ordering.
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
- Categorized execution is treated as a sequence of grouped ATVM sub-runs, not as one parent run with internal phases.
- In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
- In categorized mode, if the child build id label does not match the host/spec actually being executed, the watcher reports the grouped run using the inferred host-based group name instead of trusting the raw child build id label.
- In categorized mode, grouped XML can finish with only `check-xml-files.ts`; when that happens, the watcher must recover per-host results from the matching host reporter artifacts.
- Do not infer `PASS completed` from host artifact presence alone. Parse the per-host reporter result and preserve real `FAIL` and `RUN/pending` state when reconstructing grouped results.
- When the repo copy of the watcher changes, update the controller install under `/opt/atvm-watcher-service` before expecting the new reporting behavior from live runs.
- Best-practice controller install path: `/opt/atvm-watcher-service`.
- This package is local-only right now. Nothing here is installed on the controller yet.
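
The terminal-state rules under Per-Run Behavior amount to a small decision table. The shell sketch below is illustrative only: the function name `atvm_terminal_action` and its `post`, `record`, and `wait` results are hypothetical and are not taken from `atvm_run_watcher.py`. Any state outside the documented terminal set (for example a hypothetical `RUNNING`) is assumed to mean "keep polling".

```bash
# Hypothetical sketch: map a run state to the watcher action described under
# "Per-Run Behavior". This is not the logic in atvm_run_watcher.py.
atvm_terminal_action() {
  case "$1" in
    COMPLETED | FAILED)
      # Reportable terminal states: post, verify, mark posted, exit.
      echo "post"
      ;;
    CANCELLED | TERMINATED | HUNG | UNKNOWN)
      # Non-reportable terminal states: record the final state, exit without posting.
      echo "record"
      ;;
    *)
      # Any other state is assumed to be still in progress: keep polling.
      echo "wait"
      ;;
  esac
}
```

For example, `atvm_terminal_action HUNG` prints `record`. In categorized mode the same table would apply to each grouped sub-run, while the parent request's own terminal state decides when the watcher exits.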
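
The "verify Mattermost returned `ok`" step from Per-Run Behavior can be sketched with `curl`. This assumes the JSON payload has already been built and written to a file, and that a successful incoming-webhook call returns the literal body `ok`, as the workflow states. The function name `atvm_post_and_verify` and the 30-second timeout are hypothetical choices for illustration.

```bash
# Hypothetical helper: POST a prebuilt JSON payload to the webhook and
# succeed only if Mattermost answered with the body "ok".
atvm_post_and_verify() {
  local webhook_url="$1" payload_file="$2" body
  # --fail makes curl exit non-zero on HTTP error statuses (400 and above).
  body="$(curl -sS --fail --max-time 30 \
    -H 'Content-Type: application/json' \
    --data-binary @"$payload_file" \
    "$webhook_url")" || return 1
  if [[ "$body" == "ok" ]]; then
    return 0
  fi
  echo "unexpected webhook response: $body" >&2
  return 1
}
```

Treating both a transport failure and an unexpected body as failure keeps the "mark the run as posted" step from running on a post that Mattermost did not accept.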
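
The variables listed under Required Environment can be checked before a launch so a misconfigured run fails fast. The following is a minimal preflight sketch, assuming the variables are set in the shell that runs the check. The helper name `atvm_require_env` is hypothetical, and the included start scripts may perform different checks or none.

```bash
# Hypothetical preflight helper: print each listed variable that is unset or
# empty, and return non-zero if any of them are missing.
atvm_require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" ]]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A watcher launch would check `atvm_require_env MATTERMOST_ATVM_WEBHOOK MATTERMOST_ATVM_CHANNEL`, and a runner launch `atvm_require_env ATVM_RUNNER_COMMAND`. The optional metadata variables are left out because they only improve status formatting.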
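
The watcher-first ordering enforced by `start-atvm-run.sh`, together with the stale-state reset described under Start Example, could be outlined as follows. This is a hypothetical sketch, not the contents of the shipped scripts: the function names `run` and `atvm_launch`, the `DRY_RUN` switch, and the 30-second activation wait are assumptions, and writing the per-run environment files is omitted. It uses only the unit names and state root given in this README, and by default it prints commands instead of executing them.

```bash
# Hypothetical sketch of a watcher-first launch with a stale-state reset.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

atvm_launch() {
  local build="$1"
  local state_root="${2:-/var/lib/atvm-run-watcher}"

  # Reject names that could point outside the state root once used as a path.
  if [[ -z "$build" || "$build" == *"/"* || "$build" == "."* ]]; then
    echo "invalid build name: $build" >&2
    return 2
  fi

  local watcher="atvm-run-watcher@${build}.service"
  local runner="atvm-runner@${build}.service"

  # 1. Stop any stale watcher and clear its old state for this build name.
  run systemctl stop "$watcher" || true
  run rm -rf -- "${state_root}/${build}"

  # 2. Start the watcher and wait (up to about 30 s) until systemd reports it active.
  run systemctl start "$watcher" || return 1
  if [[ "$DRY_RUN" != "1" ]]; then
    local i
    for ((i = 0; i < 30; i++)); do
      systemctl is-active --quiet "$watcher" && break
      sleep 1
    done
    if ! systemctl is-active --quiet "$watcher"; then
      echo "watcher did not become active: $watcher" >&2
      return 1
    fi
  fi

  # 3. Only then start the runner, so the watcher is already polling.
  run systemctl start "$runner"
}
```

`DRY_RUN=1 atvm_launch e2e-redhat9.6-ubuntu24.04-w2k25-fc` prints the stop, remove, and two start commands in order. Rejecting names that contain `/` or start with `.` keeps the `rm -rf` confined to the state root.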
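
The cancellation step in Cancel Example (write a marker, set `state.json` to `CANCELLED`) can be sketched as a durable write. Everything named here is hypothetical: the function `atvm_mark_cancelled`, the marker file name `cancelled`, and the JSON keys `state` and `cancelled_at`. The sketch replaces `state.json` wholesale rather than merging into existing fields, which a real helper that must preserve other fields would need a JSON-aware tool for. The write goes through a temporary file in the same directory and `mv`, so a reader polling the state never sees a half-written file.

```bash
# Hypothetical sketch: mark a run cancelled in its per-run state directory.
atvm_mark_cancelled() {
  local state_dir="$1" tmp
  mkdir -p -- "$state_dir" || return 1
  # Marker file; the real helper's file name may differ.
  : > "${state_dir}/cancelled"
  # Write to a temp file in the same directory, then rename it into place.
  tmp="$(mktemp "${state_dir}/state.json.XXXXXX")" || return 1
  printf '{"state": "CANCELLED", "cancelled_at": "%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$tmp" || return 1
  mv -f -- "$tmp" "${state_dir}/state.json"
}
```

Keeping the temporary file in the state directory itself keeps source and destination on the same filesystem, which is what lets the rename replace the old file in one step.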