# ATVM Watcher Service

This folder contains a per-run ATVM watcher service package that is intended to be reviewed locally first and installed on the ATVM Cypress controller later only when explicitly requested.

## Purpose

Watch an ATVM automation request until it reaches a terminal state, then:

- for non-categorized runs:
  - post one final status to Mattermost if the run state is `COMPLETED` or `FAILED`
- for categorized runs:
  - detect each sequential categorized sub-run
  - post one final status per completed categorized sub-run if that grouped run state is `COMPLETED` or `FAILED`
- verify each Mattermost post succeeded
- write durable watcher state
- exit cleanly so the service stops

The watcher does not run indefinitely. It is designed for one run per service instance.

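
The "write durable watcher state" step can be sketched as an atomic file write. This is a minimal illustration, not the code in `atvm_run_watcher.py`: the function name `write_state` and the keys stored in the dictionary are hypothetical. Only the `state.json` file name comes from this document.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(state_dir: Path, state: dict) -> Path:
    """Atomically write state.json (hypothetical helper, for illustration only)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / "state.json"
    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader never sees a half-written state file.
    fd, tmp_name = tempfile.mkstemp(dir=str(state_dir), prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Renaming within one directory keeps the replace atomic on POSIX filesystems, which matters when the watcher can be stopped by `systemd` at any moment.
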
## Files

- `atvm-runner@.service`
  - `systemd` template unit for one runner instance per build name
- `atvm_run_watcher.py`
  - main watcher implementation
- `atvm-run-watcher@.service`
  - `systemd` template unit for one watcher instance per build name
- `run-atvm-runner.sh`
  - runner wrapper used by the `systemd` runner unit
- `start-atvm-runner.sh`
  - helper to write per-run runner environment data and start a runner instance
- `cancel-atvm-runner.sh`
  - helper to stop a runner instance
- `start-atvm-run.sh`
  - wrapper that runs the template command first when one is provided, then starts the watcher, waits for it to be active, and starts the runner
- `start-atvm-run-watcher.sh`
  - helper to write per-run environment data and start a watcher instance
- `cancel-atvm-run-watcher.sh`
  - helper to mark a run cancelled and stop the watcher instance

## Intended Controller Paths

These are the default install targets assumed by the included unit files:

- service package root: `/opt/atvm-watcher-service`
- runner unit: `/etc/systemd/system/atvm-runner@.service`
- watcher state root: `/var/lib/atvm-run-watcher`
- controller ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`
- watcher environment file: `/etc/atvm-run-watcher.env`

Use `/opt/atvm-watcher-service` as the controller install root for future installs and reinstalls.
Do not treat `/root/atvm-watcher-service` as the preferred long-term install location.

## Per-Run Behavior

Each watcher instance is tied to one requested build name.

Typical workflow:

1. Run the approved `cmc-templates.py` command for that run when one is provided.
2. Start the watcher for that run.
   - before starting, the helper resets any prior watcher state for the same requested build name so stale cancellation or posted markers do not leak into a new run
3. Start the runner service for that run.
4. The watcher polls the runner log, process state, and `cmcReporter` artifacts.
5. For non-categorized runs, when the run reaches a terminal state:
   - `COMPLETED` or `FAILED`
     - build the final ATVM status
     - send the status to Mattermost
     - verify Mattermost returned `ok`
     - mark the run as posted
     - exit
   - `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`
     - do not post
     - mark the final state
     - exit
6. For categorized runs:
   - detect each grouped sub-run in sequence from the parent run log
   - wait for that grouped sub-run to finish
   - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
   - keep the watcher alive while the parent categorized runner or related child Cypress process is still active
   - do not treat one completed grouped sub-run as proof that the whole parent request is finished
   - continue to the next grouped sub-run
   - exit after the parent request reaches a terminal state

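
The terminal-state handling and the "verify Mattermost returned `ok`" step can be sketched as follows. This is an illustrative outline under stated assumptions, not the watcher's actual code: `should_post` and `post_status` are hypothetical names, and the payload uses only the `channel` and `text` fields of a Mattermost incoming webhook. The `opener` parameter exists only so the HTTP call can be swapped out in a dry run.

```python
import json
import urllib.request

# Terminal states taken from the workflow above.
POSTABLE_STATES = {"COMPLETED", "FAILED"}
SILENT_STATES = {"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN"}


def should_post(state: str) -> bool:
    """Return True when a terminal state warrants a Mattermost post."""
    if state in POSTABLE_STATES:
        return True
    if state in SILENT_STATES:
        return False
    raise ValueError(f"not a terminal state: {state!r}")


def post_status(webhook_url, channel, text, opener=urllib.request.urlopen, timeout=30):
    """Send one status post and report whether Mattermost answered `ok`."""
    payload = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace").strip()
        # Treat the post as verified only on HTTP 200 with a body of "ok".
        return response.status == 200 and body == "ok"
```

A caller would mark the run as posted only when `post_status(...)` returns `True`, which keeps a failed webhook call from being recorded as a delivered result.
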
## Required Environment

The service expects the local credentials file values to be made available on the controller through the service environment:

- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`

Optional metadata for better status formatting:

- `ATVM_WATCHER_TEMPLATE`
- `ATVM_WATCHER_CONFIG_FAMILY`
- `ATVM_WATCHER_MIGRATION_STYLE`
- `ATVM_WATCHER_INTEGRATION_PLUGIN`
- `ATVM_WATCHER_TEMPLATE_COMMAND`
- `ATVM_WATCHER_RUNNER_COMMAND`
- `ATVM_WATCHER_SCOPE_DESCRIPTION`
- `ATVM_WATCHER_CATEGORIZED`

Runner environment required per run:

- `ATVM_RUNNER_COMMAND`

Runner environment optional per run:

- `ATVM_RUNNER_WORKDIR`
- `ATVM_RUNNER_LOG`

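
A startup check for the required variables could look like the sketch below. The variable names come from the lists above. The function `missing_vars` and the choice to treat blank values as missing are assumptions made for illustration.

```python
import os

# Required names as listed in this section.
REQUIRED_WATCHER_VARS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")
REQUIRED_RUNNER_VARS = ("ATVM_RUNNER_COMMAND",)


def missing_vars(required, environ=None):
    """Return the required names that are unset or blank (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]
```

Failing fast on a non-empty result from `missing_vars(REQUIRED_WATCHER_VARS)` avoids a watcher that tracks a whole run and only discovers at post time that it has no webhook.
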
## Start Example

These helpers write per-run environment files and start the matching instances:

```bash
./start-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"

./start-atvm-runner.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
```

Preferred one-shot wrapper:

```bash
./start-atvm-run.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
  --config-family gold \
  --config-file cypress.atvm-config-gold.ts \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize
```

That results in:

- state dir:
  - `/var/lib/atvm-run-watcher/e2e-redhat9.6-ubuntu24.04-w2k25-fc`
- service instance:
  - `atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`
  - `atvm-runner@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service`

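
The mapping from a build name to the state directory and unit instance names can be written out as below. This is a small illustration: `instance_names` is a hypothetical helper, and the character check is an added assumption so that the build name is safe to use both as a directory name and as a `systemd` instance name.

```python
import re
from pathlib import PurePosixPath

# Watcher state root from "Intended Controller Paths".
STATE_ROOT = PurePosixPath("/var/lib/atvm-run-watcher")

# Assumption: build names use only letters, digits, dot, underscore, and hyphen.
_SAFE_BUILD_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def instance_names(build_name: str) -> dict:
    """Derive per-run paths and unit names from one build name."""
    if not _SAFE_BUILD_NAME.match(build_name):
        raise ValueError(f"unsafe build name: {build_name!r}")
    return {
        "state_dir": str(STATE_ROOT / build_name),
        "watcher_unit": f"atvm-run-watcher@{build_name}.service",
        "runner_unit": f"atvm-runner@{build_name}.service",
    }
```
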
The helper also:

- runs `--template-command` synchronously first when one is provided
- writes the template phase output to `/tmp/<build-name>.launch.log`
- exits before watcher/runner startup if the template step fails
- stops any stale watcher instance for that same requested build name
- removes the old watcher state directory for that requested build name
- starts the new watcher with a clean state root for the new run

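
The template-first ordering in the first three bullets can be sketched in Python. The real logic lives in the shell helpers, so this only illustrates the control flow. `run_template_step` is a hypothetical name, and overwriting (rather than appending to) the launch log is an assumption.

```python
import shlex
import subprocess
from pathlib import Path


def run_template_step(build_name, template_command, workdir=".", log_dir="/tmp"):
    """Run the template command synchronously; return True if startup may proceed."""
    if not template_command:
        # No template command was provided, so there is nothing to gate on.
        return True
    log_path = Path(log_dir) / f"{build_name}.launch.log"
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(
            shlex.split(template_command),
            cwd=workdir,
            stdout=log,
            stderr=subprocess.STDOUT,  # keep stdout and stderr in one log
            check=False,
        )
    # A non-zero exit means the caller must stop before starting watcher or runner.
    return result.returncode == 0
```

Gating on the exit code before any service starts is what keeps a failed template generation from launching a run against stale generated files.
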
## Cancel Example

```bash
./cancel-atvm-run-watcher.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

This writes a cancellation marker, updates `state.json` to `CANCELLED`, and stops the watcher instance. The watcher will not send Mattermost results for that run.

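
The marker-plus-state update can be pictured with the sketch below. It is an assumption-heavy illustration: the marker file name `cancelled`, the `state` key inside `state.json`, and both function names are hypothetical. Only the `state.json` file name and the `CANCELLED` value come from this document.

```python
import json
import os
from pathlib import Path


def mark_cancelled(state_dir: Path, marker_name: str = "cancelled") -> dict:
    """Write a cancellation marker and set the recorded state to CANCELLED."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / marker_name).touch()
    state_path = state_dir / "state.json"
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}
    state["state"] = "CANCELLED"
    # Write to a sibling temp file, then rename, so readers never see a partial file.
    tmp_path = state_path.with_suffix(".json.tmp")
    tmp_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp_path, state_path)
    return state


def is_cancelled(state_dir: Path, marker_name: str = "cancelled") -> bool:
    """Check the marker before posting, so a cancelled run stays silent."""
    return (state_dir / marker_name).exists()
```
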
Runner cancel example:

```bash
./cancel-atvm-runner.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
```

## Notes

- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
- Prefer the controller-local `atvm-runner@...` service over ad hoc `nohup` or detached SSH launch patterns for `run-sorry-cypress.py`.
- Prefer `start-atvm-run.sh` when launching both services together because it prevents the watcher/runner log-path race by enforcing watcher-first ordering.
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
- Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
- In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
- In categorized mode, if the child build id label does not match the host/spec actually being executed, the watcher reports the grouped run using the inferred host-based group name instead of trusting the raw child build id label.
- In categorized mode, grouped XML can finish with only `check-xml-files.ts`; when that happens, the watcher must recover per-host results from the matching host reporter artifacts.
- Do not infer `PASS completed` from host artifact presence alone. Parse the per-host reporter result and preserve real `FAIL` and `RUN/pending` state when reconstructing grouped results.
- When the repo copy of the watcher changes, the controller install under `/opt/atvm-watcher-service` must be updated before expecting the new reporting behavior from live runs.
- Best-practice controller install path: `/opt/atvm-watcher-service`.
- This package is local-only right now. Nothing here is installed on the controller yet.