ATVM Watcher Service Install Plan
This document describes how to deploy the ATVM per-run watcher and runner services to the ATVM Cypress controller at 192.168.3.190.
This is a deployment plan only. It does not perform the installation.
Goal
Install the local watcher/runner package so the controller can:
- start one requested ATVM Cypress runner per service instance
- watch one requested ATVM run per watcher instance
- for non-categorized runs, send one final Mattermost status only for COMPLETED or FAILED
- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
- suppress Mattermost posts for CANCELLED, TERMINATED, HUNG, and UNKNOWN
- stop automatically after the watched run reaches a terminal state
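The posting rules above can be sketched as a small policy table. This is only an illustration, not the code in atvm_run_watcher.py: the names RunState, is_terminal, and should_post are hypothetical, and RUNNING is an assumed non-terminal state that this plan does not name.

```python
from enum import Enum


class RunState(Enum):
    """States named in this plan, plus an assumed non-terminal RUNNING."""
    RUNNING = "RUNNING"  # assumption: not named in the plan
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    TERMINATED = "TERMINATED"
    HUNG = "HUNG"
    UNKNOWN = "UNKNOWN"


# Terminal states that produce one final Mattermost status.
POSTING_STATES = frozenset({RunState.COMPLETED, RunState.FAILED})

# Terminal states that end the watcher without any post.
SILENT_STATES = frozenset(
    {RunState.CANCELLED, RunState.TERMINATED, RunState.HUNG, RunState.UNKNOWN}
)


def is_terminal(state: RunState) -> bool:
    """True when the watcher should stop watching this run."""
    return state in POSTING_STATES or state in SILENT_STATES


def should_post(state: RunState) -> bool:
    """True when the final state warrants a Mattermost post."""
    return state in POSTING_STATES
```

Keeping the two sets separate makes it easy to review the suppression list on its own when a new state is introduced.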
Controller Target Layout
Recommended controller paths:
- package root: /opt/atvm-watcher-service
- runner service unit: /etc/systemd/system/atvm-runner@.service
- watcher service unit: /etc/systemd/system/atvm-run-watcher@.service
- global environment file: /etc/atvm-run-watcher.env
- state root: /var/lib/atvm-run-watcher
- ATVM automation root: /root/cdc-e2e-cyp-12.17.4
Best-practice rule:
- install the watcher service package under /opt/atvm-watcher-service
- do not use /root/atvm-watcher-service as the standard install location
- if a temporary /root/atvm-watcher-service install exists, replace it with a clean /opt/atvm-watcher-service install
Files To Install
From the local workspace:
- /home/aw/code/cds/atvm/watcher-service/atvm_run_watcher.py
- /home/aw/code/cds/atvm/watcher-service/run-atvm-runner.sh
- /home/aw/code/cds/atvm/watcher-service/atvm-runner@.service
- /home/aw/code/cds/atvm/watcher-service/start-atvm-runner.sh
- /home/aw/code/cds/atvm/watcher-service/cancel-atvm-runner.sh
- /home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service
- /home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh
- /home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh
- /home/aw/code/cds/atvm/inventory/vm-inventory.md
Optional reference docs:
- /home/aw/code/cds/atvm/watcher-service/README.md
- /home/aw/code/cds/atvm/watcher-service/INSTALL.md
Required Controller Environment
The controller must have:
- python3
- systemd
- outbound network access to the Mattermost webhook
- read access to: /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/tmp/<build-name>.log
Required Secrets
The controller needs a watcher environment file with:
- MATTERMOST_ATVM_WEBHOOK
- MATTERMOST_ATVM_CHANNEL
Recommended file:
- /etc/atvm-run-watcher.env
Recommended permissions:
- owner: root
- mode: 0600
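A check along these lines could confirm the environment file before the first run. It is a sketch under assumptions: the file is taken to hold plain KEY=VALUE lines, and the helper names parse_env_file and check_env_file are hypothetical rather than part of the watcher package.

```python
import os
import stat

REQUIRED_KEYS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env_file(path):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)
    if mode != 0o600:
        problems.append(f"mode is {mode:04o}, expected 0600")
    if info.st_uid != 0:
        problems.append(f"owner uid is {info.st_uid}, expected 0 (root)")
    values = parse_env_file(path)
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"missing or empty {key}")
    return problems
```

On the controller this would be called as check_env_file("/etc/atvm-run-watcher.env"); the function only reports problems and never prints the secret values.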
Deployment Steps
- Create controller directories.
  - /opt/atvm-watcher-service
  - /var/lib/atvm-run-watcher
- Copy package files to the controller.
  - copy the runner wrapper
  - copy the runner systemd unit file
  - copy the runner helper scripts
  - copy the Python watcher
  - copy the watcher systemd unit file
  - copy the watcher helper scripts
  - copy vm-inventory.md
- Set executable permissions.
  - run-atvm-runner.sh
  - start-atvm-runner.sh
  - cancel-atvm-runner.sh
  - atvm_run_watcher.py
  - start-atvm-run-watcher.sh
  - cancel-atvm-run-watcher.sh
- Create /etc/atvm-run-watcher.env.
  - add Mattermost webhook/channel
  - keep permissions restricted
- Install the systemd unit files.
  - copy the runner unit to /etc/systemd/system/atvm-runner@.service
  - copy the watcher unit to /etc/systemd/system/atvm-run-watcher@.service
- Reload systemd.
  - systemctl daemon-reload
- Run a syntax/smoke validation.
  - check Python import/launch
  - check helper script usage
  - verify the unit resolves
- Do a non-production test.
  - start a watcher for a fake or completed build name
  - confirm state directory creation
  - confirm the watcher exits as expected
- Do a real ATVM run test.
  - launch a real run
  - start the watcher for that build name
  - if the run uses --categorize, also pass --categorize to the watcher start helper
  - confirm final Mattermost delivery for a completed run
  - confirm categorized execution sends one post per completed grouped sub-run
  - confirm the watcher stays alive between categorized grouped runs while the parent request is still active
  - confirm reused parent build names do not inherit stale cancelled.marker, posted.marker, or subruns/ state from older runs
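As a complement to the copy and permission steps, a post-install check might look like the following. It is a sketch only: the function name missing_or_not_executable is hypothetical, and the file list simply mirrors the executables named in this plan.

```python
import os

PACKAGE_ROOT = "/opt/atvm-watcher-service"
EXECUTABLES = (
    "run-atvm-runner.sh",
    "start-atvm-runner.sh",
    "cancel-atvm-runner.sh",
    "atvm_run_watcher.py",
    "start-atvm-run-watcher.sh",
    "cancel-atvm-run-watcher.sh",
)


def missing_or_not_executable(package_root=PACKAGE_ROOT, names=EXECUTABLES):
    """Return the names that are absent or not executable by this user."""
    bad = []
    for name in names:
        path = os.path.join(package_root, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            bad.append(name)
    return bad
```

An empty return value means every listed file is present and executable; anything else names the files to recopy or chmod.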
Recommended Validation Commands
Examples for later execution on the controller:
mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
chmod 755 /opt/atvm-watcher-service/run-atvm-runner.sh
chmod 755 /opt/atvm-watcher-service/start-atvm-runner.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-runner.sh
chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
systemctl daemon-reload
systemctl cat atvm-runner@.service
systemctl cat atvm-run-watcher@.service
python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
/opt/atvm-watcher-service/start-atvm-runner.sh --help
/opt/atvm-watcher-service/start-atvm-run-watcher.sh --help
Per-Run Usage After Install
Once installed, the intended workflow is:
- Start the watcher for the requested build name.
  - the start helper must clear any stale watcher state for that same requested build name before starting the new watcher instance
- Start the runner service for the same build name.
- Let the runner and watcher run on the controller.
- The watcher exits when the run reaches a terminal state.
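The stale-state cleanup required of the start helper could be sketched as below. This is an assumption-laden illustration, not the helper's actual code: clear_stale_state is a hypothetical name, and only the leftovers named in this plan (cancelled.marker, posted.marker, subruns/) are removed. Whether state.json should also be reset is not stated here, so the sketch leaves it alone.

```python
import shutil
from pathlib import Path

STATE_ROOT = Path("/var/lib/atvm-run-watcher")
STALE_FILES = ("cancelled.marker", "posted.marker")
STALE_DIRS = ("subruns",)


def clear_stale_state(build_name, state_root=STATE_ROOT):
    """Remove leftovers from an earlier run that reused this build name."""
    run_dir = Path(state_root) / build_name
    removed = []
    for name in STALE_FILES:
        target = run_dir / name
        if target.is_file():
            target.unlink()
            removed.append(name)
    for name in STALE_DIRS:
        target = run_dir / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    # Leave an existing (or freshly created) directory for the new watcher.
    run_dir.mkdir(parents=True, exist_ok=True)
    return removed
```

Returning the list of removed names gives the helper something to log, which helps when confirming that a reused build name did not inherit older state.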
Example:
/opt/atvm-watcher-service/start-atvm-run-watcher.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
--template cmc-e2e \
--template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
--config-family gold \
--migration-style "ATVM end-to-end migration validation" \
--integration-plugin "pure with fc" \
--categorize \
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
/opt/atvm-watcher-service/start-atvm-runner.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
Cancel example:
/opt/atvm-watcher-service/cancel-atvm-runner.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
/opt/atvm-watcher-service/cancel-atvm-run-watcher.sh \
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc
The cancel helper should:
- write cancelled.marker
- update state.json so the final watcher state is CANCELLED
- stop the watcher instance
- avoid any Mattermost post for that run
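The first two cancel steps might be sketched as follows. The layout of state.json is not described in this plan, so the "state" key used here is an assumption, as is the function name mark_cancelled. Stopping the watcher instance is left out because the systemd instance naming is not given here.

```python
import json
import os
from pathlib import Path


def mark_cancelled(run_dir):
    """Write cancelled.marker, then record CANCELLED in state.json."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "cancelled.marker").touch()

    state_path = run_dir / "state.json"
    state = {}
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    state["state"] = "CANCELLED"  # assumption: key name is illustrative

    # Write to a temp file and rename so readers never see a partial file.
    tmp_path = state_path.with_name("state.json.tmp")
    tmp_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp_path, state_path)
    return state
```

Writing the marker before touching state.json means a watcher that polls in between still sees the cancellation and skips the post.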
Operational Notes
- The watcher and runner are not long-running daemons; each instance exists for one run only.
- One runner instance is started per ATVM run.
- One watcher instance is started per ATVM run.
- Prefer the atvm-runner@...service over detached SSH background launch patterns for run-sorry-cypress.py.
- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
- In categorized execution, the watcher must remain alive until the parent request has actually gone inactive past the grace window, even if one grouped sub-run already completed.
- The watcher exits after the run reaches a terminal state.
- The watcher writes state under /var/lib/atvm-run-watcher/<build-name>.
- The watcher prevents duplicate Mattermost posts by writing posted markers.
- Categorized sub-run state is written under /var/lib/atvm-run-watcher/<build-name>/subruns/<subrun-key>/.
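One way a posted marker can guard against duplicate posts is exclusive file creation, sketched below. This is not necessarily how atvm_run_watcher.py does it: the function names are hypothetical, and placing a per-sub-run posted.marker inside the subruns/<subrun-key>/ directory is an assumption.

```python
import os
from pathlib import Path

STATE_ROOT = Path("/var/lib/atvm-run-watcher")


def marker_path(build_name, subrun_key=None, state_root=STATE_ROOT):
    """Locate posted.marker for a run, or for one categorized sub-run."""
    run_dir = Path(state_root) / build_name
    if subrun_key is not None:
        run_dir = run_dir / "subruns" / subrun_key
    return run_dir / "posted.marker"


def claim_post(build_name, subrun_key=None, state_root=STATE_ROOT):
    """Return True exactly once per run or sub-run, False afterwards."""
    path = marker_path(build_name, subrun_key, state_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail if the marker already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

Claiming the marker before posting favors never posting twice; a watcher that prefers to retry failed deliveries would instead write the marker only after the post is verified.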
Failure Handling
Expected terminal behavior:
- COMPLETED
  - post to Mattermost
  - verify ok
  - exit
- FAILED
  - post to Mattermost
  - verify ok
  - exit
- categorized COMPLETED / FAILED
  - post once for that grouped sub-run
  - verify ok
  - continue until the parent request finishes
- CANCELLED
  - write final CANCELLED state to state.json
  - do not post
  - exit
- TERMINATED
  - do not post
  - exit
- HUNG
  - do not post
  - exit
- UNKNOWN
  - do not post
  - exit
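The "post to Mattermost, verify ok" pair could be sketched with the standard library as follows. It assumes that "verify ok" refers to the incoming webhook answering HTTP 200 with a body of ok, and that the payload carries text plus an optional channel; the function names are hypothetical.

```python
import json
import urllib.request


def build_payload(text, channel=None):
    """Build the JSON body for an incoming-webhook post."""
    payload = {"text": text}
    if channel:
        payload["channel"] = channel
    return json.dumps(payload).encode("utf-8")


def response_is_ok(status, body):
    """Treat HTTP 200 with a body of 'ok' as a verified delivery."""
    return status == 200 and body.strip().lower() == "ok"


def post_status(webhook_url, text, channel=None, timeout=10):
    """POST the status; return True only if the reply verifies as ok."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(text, channel),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx replies.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response_is_ok(response.status, body)
```

The webhook URL and channel would come from MATTERMOST_ATVM_WEBHOOK and MATTERMOST_ATVM_CHANNEL. Keeping response_is_ok separate lets the verification rule be checked without network access.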
Answer To "Do We Need An Installer README?"
Not strictly, but yes, it is useful.
Why:
- it gives a repeatable controller deployment procedure
- it separates local package design from controller installation steps
- it makes later install/reinstall safer
- it gives you a review checkpoint before anything is installed on 192.168.3.190
That is the purpose of this file.