ATVM Watcher Service Install Plan
This document describes how to deploy the ATVM per-run watcher service to the ATVM Cypress controller at 192.168.3.190.
This is a deployment plan only. It does not perform the installation.
Goal
Install the local watcher package so the controller can:
- watch one requested ATVM run per watcher instance
- for non-categorized runs, send one final Mattermost status only for COMPLETED or FAILED
- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
- suppress Mattermost posts for CANCELLED, TERMINATED, HUNG, and UNKNOWN
- stop automatically after the watched run reaches a terminal state
Controller Target Layout
Recommended controller paths:
- package root: /opt/atvm-watcher-service
- service unit: /etc/systemd/system/atvm-run-watcher@.service
- global environment file: /etc/atvm-run-watcher.env
- state root: /var/lib/atvm-run-watcher
- ATVM automation root: /root/cdc-e2e-cyp-12.17.4
Best-practice rule:
- install the watcher service package under /opt/atvm-watcher-service
- do not use /root/atvm-watcher-service as the standard install location
- if a temporary /root/atvm-watcher-service install exists, replace it with a clean /opt/atvm-watcher-service install
Files To Install
From the local workspace:
- /home/aw/code/cds/atvm/watcher-service/atvm_run_watcher.py
- /home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service
- /home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh
- /home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh
- /home/aw/code/cds/atvm/inventory/vm-inventory.md
Optional reference docs:
- /home/aw/code/cds/atvm/watcher-service/README.md
- /home/aw/code/cds/atvm/watcher-service/INSTALL.md
Required Controller Environment
The controller must have:
- python3
- systemd
- outbound network access to the Mattermost webhook
- read access to: /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/tmp/<build-name>.log
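The requirements above could be checked before installation with a small pre-flight script. This is only a sketch: the function name `preflight` and its list-of-problems return value are hypothetical, it assumes that finding `systemctl` on `PATH` is a good enough signal that `systemd` is present, and it does not test connectivity to the Mattermost webhook.

```python
import os
import shutil


def preflight(log_path, required_tools=("python3", "systemctl")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"missing executable: {tool}")
    # os.access with R_OK reports whether the current user can read the file.
    if not os.access(log_path, os.R_OK):
        problems.append(f"cannot read log: {log_path}")
    return problems
```

On the controller, `log_path` would be the `<build-name>.log` path listed above for a build that has already been launched.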
Required Secrets
The controller needs a watcher environment file with:
- MATTERMOST_ATVM_WEBHOOK
- MATTERMOST_ATVM_CHANNEL
Recommended file:
/etc/atvm-run-watcher.env
Recommended permissions:
- owner: root
- mode: 0600
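One way to confirm that the environment file matches these recommendations is a check like the sketch below. It assumes the file uses plain `KEY=value` lines, which is the format a `systemd` `EnvironmentFile=` directive reads; the function name `check_env_file` and its parameters are hypothetical and not part of the watcher package.

```python
import os
import stat

REQUIRED_KEYS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")


def check_env_file(path, expected_uid=0, expected_mode=0o600):
    """Return a list of problems found with the watcher environment file."""
    problems = []
    st = os.stat(path)
    if st.st_uid != expected_uid:
        problems.append(f"owner uid is {st.st_uid}, expected {expected_uid}")
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    mode = stat.S_IMODE(st.st_mode)
    if mode != expected_mode:
        problems.append(f"mode is {mode:04o}, expected {expected_mode:04o}")
    keys = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blanks and comments; keep the name left of the first '='.
            if line and not line.startswith("#") and "=" in line:
                keys.add(line.split("=", 1)[0].strip())
    for key in REQUIRED_KEYS:
        if key not in keys:
            problems.append(f"missing key: {key}")
    return problems
```

The check reports only key names and never prints values, so the webhook URL stays out of terminal output and logs.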
Deployment Steps
1. Create controller directories.
   - /opt/atvm-watcher-service
   - /var/lib/atvm-run-watcher
2. Copy package files to the controller.
   - copy the Python watcher
   - copy the systemd unit file
   - copy the helper scripts
   - copy vm-inventory.md
3. Set executable permissions.
   - atvm_run_watcher.py
   - start-atvm-run-watcher.sh
   - cancel-atvm-run-watcher.sh
4. Create /etc/atvm-run-watcher.env.
   - add Mattermost webhook/channel
   - keep permissions restricted
5. Install the systemd unit file.
   - copy to /etc/systemd/system/atvm-run-watcher@.service
6. Reload systemd.
   - systemctl daemon-reload
7. Run a syntax/smoke validation.
   - check Python import/launch
   - check helper script usage
   - verify the unit resolves
8. Do a non-production test.
   - start a watcher for a fake or completed build name
   - confirm state directory creation
   - confirm the watcher exits as expected
9. Do a real ATVM run test.
   - launch a real run
   - start the watcher for that build name
   - if the run uses --categorize, also pass --categorize to the watcher start helper
   - confirm final Mattermost delivery for a completed run
   - confirm categorized execution sends one post per completed grouped sub-run
Recommended Validation Commands
Examples for later execution on the controller:
mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
systemctl daemon-reload
systemctl cat atvm-run-watcher@.service
python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
/opt/atvm-watcher-service/start-atvm-run-watcher.sh --help
Per-Run Usage After Install
Once installed, the intended workflow is:
- Launch the ATVM run as usual.
- Start the watcher for that build name.
- Let the watcher run on the controller.
- The watcher exits on terminal state.
Example:
/opt/atvm-watcher-service/start-atvm-run-watcher.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
--template cmc-e2e \
--config-family gold \
--migration-style "ATVM end-to-end migration validation" \
--integration-plugin "pure with fc" \
--categorize \
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
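When automation invokes the start helper, it can assemble the argument list so that `--categorize` is included only for categorized runs. The sketch below uses only the flags shown in the example above; the function name `build_start_command` is hypothetical, and it assumes the helper does not care about flag order.

```python
START_HELPER = "/opt/atvm-watcher-service/start-atvm-run-watcher.sh"


def build_start_command(build_name, template, config_family, migration_style,
                        integration_plugin, scope_description, categorize=False):
    """Build the argv list for the watcher start helper."""
    cmd = [
        START_HELPER,
        "--build-name", build_name,
        "--template", template,
        "--config-family", config_family,
        "--migration-style", migration_style,
        "--integration-plugin", integration_plugin,
    ]
    # Categorized mode is explicit: the flag is present only when requested.
    if categorize:
        cmd.append("--categorize")
    cmd += ["--scope-description", scope_description]
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems for values that contain spaces, such as the migration style and scope description.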
Cancel example:
/opt/atvm-watcher-service/cancel-atvm-run-watcher.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
The cancel helper should:
- write cancelled.marker
- update state.json so the final watcher state is CANCELLED
- stop the watcher instance
- avoid any Mattermost post for that run
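The first two cancel steps can be sketched as follows. The file names `cancelled.marker` and `state.json` come from this plan, but the layout of `state.json` is not specified here, so the `"state"` key is an assumption, as is the function name `mark_cancelled`. Stopping the watcher instance is left out of the sketch.

```python
import json
from pathlib import Path


def mark_cancelled(run_dir):
    """Record a cancellation inside one watched run's state directory."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    # The marker records that the stop was requested, not a run failure.
    (run_dir / "cancelled.marker").touch()
    state_path = run_dir / "state.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state["state"] = "CANCELLED"  # hypothetical key name
    # Write to a temporary file and rename it so readers never see a partial file.
    tmp_path = state_path.with_name(state_path.name + ".tmp")
    tmp_path.write_text(json.dumps(state, indent=2))
    tmp_path.replace(state_path)
    return state
```

Writing the marker before stopping the watcher gives the watcher a chance to see the cancellation and skip the Mattermost post, even if it notices the stop first.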
Operational Notes
- This is not a long-running daemon.
- One watcher instance is started per ATVM run.
- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
- The watcher exits after the run reaches a terminal state.
- The watcher writes state under /var/lib/atvm-run-watcher/<build-name>.
- The watcher prevents duplicate Mattermost posts by writing posted markers.
- Categorized sub-run state is written under /var/lib/atvm-run-watcher/<build-name>/subruns/<subrun-key>/.
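The state layout and the posted-marker idea can be illustrated with a short sketch. The directory structure follows the paths above; the marker file name `posted.marker` and the helper names `state_dir` and `claim_post` are hypothetical, and the real watcher may order the marker write and the post differently.

```python
from pathlib import Path

STATE_ROOT = Path("/var/lib/atvm-run-watcher")


def state_dir(build_name, subrun_key=None, root=STATE_ROOT):
    """Return the state directory for a run, or for one categorized sub-run."""
    base = Path(root) / build_name
    return base if subrun_key is None else base / "subruns" / subrun_key


def claim_post(directory, marker_name="posted.marker"):
    """Return True only for the first caller; later callers get False."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" creates the file and fails if it already exists.
        with open(directory / marker_name, "x"):
            return True
    except FileExistsError:
        return False
```

Claiming the marker before posting guarantees at most one post per run or sub-run, at the cost of not retrying a post that fails. Writing the marker only after the post is verified would instead favor delivery over strict de-duplication.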
Failure Handling
Expected terminal behavior:
- COMPLETED
  - post to Mattermost
  - verify ok
  - exit
- FAILED
  - post to Mattermost
  - verify ok
  - exit
- categorized COMPLETED / FAILED
  - post once for that grouped sub-run
  - verify ok
  - continue until the parent request finishes
- CANCELLED
  - write final CANCELLED state to state.json
  - do not post
  - exit
- TERMINATED
  - do not post
  - exit
- HUNG
  - do not post
  - exit
- UNKNOWN
  - do not post
  - exit
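The terminal behavior listed above can be written as a small decision function. This is a sketch rather than the watcher's actual code: the step names (`post`, `verify_ok`, `continue`, `write_cancelled_state`, `exit`) and the function name `plan_actions` are hypothetical labels. This plan does not say how a non-posting state on a categorized sub-run is handled, so the sketch assumes it is treated like the parent run.

```python
POST_STATES = {"COMPLETED", "FAILED"}
SILENT_STATES = {"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN"}


def plan_actions(state, categorized_subrun=False):
    """Return the ordered steps for a terminal state; no "post" step means no post."""
    if state in POST_STATES:
        # A categorized sub-run posts once, then the watcher keeps going
        # until the parent request finishes.
        last = "continue" if categorized_subrun else "exit"
        return ["post", "verify_ok", last]
    if state == "CANCELLED":
        return ["write_cancelled_state", "exit"]
    if state in SILENT_STATES:
        return ["exit"]
    raise ValueError(f"not a terminal state: {state}")
```

For example, `plan_actions("FAILED", categorized_subrun=True)` yields a post followed by `continue`, while `plan_actions("HUNG")` yields only `exit`.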
Answer To "Do We Need An Installer README?"
Not strictly required, but it is useful.
Why:
- it gives a repeatable controller deployment procedure
- it separates local package design from controller installation steps
- it makes later install/reinstall safer
- it gives you a review checkpoint before anything is installed on 192.168.3.190
That is the purpose of this file.