- update the watcher design and automation guide to treat --categorize as sequential ATVM sub-runs rather than one parent run with internal phases
- document that categorized runs should send one Mattermost status per completed grouped sub-run instead of one parent-only final post
- add a --categorize option to the watcher start helper so categorized mode is explicit in watcher startup
- update the watcher implementation to track categorized sub-runs separately, write per-subrun state, and post each completed grouped run once
ATVM Mattermost Watcher Design
Purpose
Design a controller-local watcher on the ATVM Cypress machine (192.168.3.190) that monitors an ATVM automation run and posts final run status to Mattermost only after the watched scope has fully completed.
This watcher must continue working even if the local operator machine is offline.
Implementation Approach
Use a systemd-managed watcher on the ATVM Cypress controller.
Recommended structure:
- one watcher script that evaluates a specific ATVM run request
- one systemd service to execute the watcher
- no always-on daemon
- for categorized ATVM runs, one watcher instance tracks the parent request and posts each categorized sub-run separately as those grouped runs complete
Preferred deployment target:
- controller host: 192.168.3.190
- ATVM automation root: /root/cdc-e2e-cyp-12.17.4
Mattermost Destination
Use the local credential file in this workspace as the source of defaults:
/home/aw/code/cds/.env.credentials.local
Expected variables:
- MATTERMOST_ATVM_WEBHOOK
- MATTERMOST_ATVM_CHANNEL
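A minimal sketch of reading those two variables, assuming the credential file uses plain KEY=VALUE lines (the design does not state the file format). The function names `parse_env_text` and `load_mattermost_settings` are hypothetical.

```python
from pathlib import Path

# Variable names come from the design; the parsing rules are an assumption.
REQUIRED_KEYS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop optional surrounding single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_mattermost_settings(path: str) -> dict:
    """Return the two expected settings, or raise if either is missing."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise KeyError("missing Mattermost settings: " + ", ".join(missing))
    return {key: values[key] for key in REQUIRED_KEYS}
```

Failing loudly on a missing variable keeps a misconfigured watcher from silently running without a destination.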
Run Completion Rule
The watcher must send Mattermost results only after the watched scope has fully completed.
A non-categorized run is considered fully completed only when:
- there are no active runner processes for the run
- the expected machine scope has final result artifacts
- no machine remains in RUNNING or NOT STARTED
- final reporter artifacts confirm the run has ended
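The four conditions above can be sketched as a single predicate. This is an illustration only: `RunSnapshot` and its field names are hypothetical, and how each field is filled from processes and reporter artifacts is left open. The machine status values other than RUNNING and NOT STARTED are also assumed.

```python
from dataclasses import dataclass

# Machine statuses named in the design that block completion.
NOT_FINAL = {"RUNNING", "NOT STARTED"}


@dataclass
class RunSnapshot:
    """One poll of a run; every field name here is hypothetical."""
    active_runner_pids: list   # runner processes still alive for this run
    expected_machines: set     # machine scope the run was asked to cover
    machine_status: dict       # machine -> status read from result artifacts
    reporter_finished: bool    # final reporter artifacts say the run ended


def is_fully_completed(snap: RunSnapshot) -> bool:
    """True only when all four completion conditions hold."""
    if snap.active_runner_pids:
        return False
    if snap.expected_machines - set(snap.machine_status):
        return False  # at least one expected machine has no result yet
    if any(snap.machine_status[m] in NOT_FINAL for m in snap.expected_machines):
        return False
    return snap.reporter_finished
```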
A categorized run must be treated differently:
- --categorize splits the request into sequential ATVM sub-runs
- each categorized group is its own run/job
- the watcher must detect each grouped sub-run in order
- the watcher must wait for that grouped sub-run to complete
- then send that grouped sub-run's final Mattermost status
- then continue watching for the next grouped sub-run
- the watcher must not wait until the very end to send one single parent-only post
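A rough sketch of that loop follows, with detection, waiting, and posting passed in as callables so the control flow is visible on its own. All names are hypothetical. Continuing to the next group after a non-postable state is an assumption; the design does not say what should happen to later groups in that case.

```python
from typing import Callable, Iterable, List, Set


def watch_categorized(
    subruns: Iterable[str],
    wait_for_completion: Callable[[str], str],
    post_status: Callable[[str, str], None],
    already_posted: Set[str],
) -> List[str]:
    """Handle grouped sub-runs in order, posting each completed one once.

    `subruns` yields sub-run names in the order they are detected.
    `wait_for_completion` blocks until a sub-run is closed and returns
    its final state as a string.
    """
    posted_now = []
    for name in subruns:
        if name in already_posted:
            continue  # duplicate-post prevention across watcher restarts
        state = wait_for_completion(name)
        if state in ("COMPLETED", "FAILED"):
            post_status(name, state)
            already_posted.add(name)
            posted_now.append(name)
        # Assumption: any other state is closed without a post and the
        # watcher moves on to the next detected group.
    return posted_now
```

Because `subruns` can be a generator, the watcher can discover each group lazily instead of needing the full list up front.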
Evidence sources:
- live runner processes on 192.168.3.190
- /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/
- /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/xml/
- /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/mochawesome/
Required Run States
The watcher must distinguish these run-level states:
- COMPLETED
- FAILED
- CANCELLED
- TERMINATED
- HUNG
- UNKNOWN
- RUNNING
Definitions:
- COMPLETED
  - the run finished normally
  - all machines have final results
  - no run-level failure state blocks completion
- FAILED
  - the run finished, but one or more hosts failed
  - this is still a completed run
- CANCELLED
  - the run was intentionally cancelled through an explicit cancellation path
- TERMINATED
  - the run was manually killed or stopped before normal completion
- HUNG
  - the run appears stuck and does not meet completion rules within the expected policy window
- UNKNOWN
  - the watcher cannot safely determine the true state
- RUNNING
  - the run is still active and not yet complete
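One way to encode these states is an enum plus a small classifier. The sketch below is illustrative: the input flags, the precedence between them, and the one-hour default for the policy window are assumptions, not values taken from the design.

```python
from enum import Enum


class RunState(Enum):
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    TERMINATED = "TERMINATED"
    HUNG = "HUNG"
    UNKNOWN = "UNKNOWN"
    RUNNING = "RUNNING"


def classify(
    *,
    finished: bool,
    failed_hosts: int = 0,
    cancel_marker: bool = False,
    killed: bool = False,
    idle_seconds: float = 0.0,
    hang_after_seconds: float = 3600.0,  # hypothetical policy window
    evidence_ok: bool = True,
) -> RunState:
    """Map observed evidence to one run-level state (assumed precedence)."""
    if not evidence_ok:
        return RunState.UNKNOWN      # cannot safely determine the state
    if cancel_marker:
        return RunState.CANCELLED    # explicit cancellation path
    if killed:
        return RunState.TERMINATED   # stopped before normal completion
    if finished:
        return RunState.FAILED if failed_hosts else RunState.COMPLETED
    if idle_seconds > hang_after_seconds:
        return RunState.HUNG
    return RunState.RUNNING
```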
Mattermost Posting Rule
Post to Mattermost only when the watched scope has fully completed.
Send Mattermost status for:
- COMPLETED
- FAILED
Do not send Mattermost status for:
- CANCELLED
- TERMINATED
- HUNG
- UNKNOWN
- RUNNING
Important clarification:
- a completed run with failed hosts should still be posted
- a cancelled, terminated, hung, or unknown run should not be posted
- for categorized execution, this rule applies per categorized sub-run
- one categorized group completion should produce one Mattermost post
- do not send one parent-level aggregate post in place of the per-group posts
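The posting rule and its per-group scope can be sketched as below. The helper names and the use of plain strings for states are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Only these two states produce a Mattermost post.
POSTABLE_STATES = frozenset({"COMPLETED", "FAILED"})


def should_post(state: str) -> bool:
    return state in POSTABLE_STATES


def scopes_to_post(
    parent_name: str,
    parent_state: str,
    subrun_states: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the names of the scopes that get a post.

    Non-categorized: at most the parent run itself.
    Categorized (`subrun_states` given): each postable sub-run, and
    never a parent-level aggregate.
    """
    if subrun_states is not None:
        return [name for name, state in subrun_states.items() if should_post(state)]
    return [parent_name] if should_post(parent_state) else []
```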
Required Cancellation / Termination Handling
If a run is cancelled or terminated, the watcher must:
- detect that the run was cancelled or manually killed
- stop waiting for normal completion
- mark the run as closed without posting final Mattermost status
- prevent any later success/failure post for that same run
State Tracking Requirements
The watcher must track each monitored run by run id or build name.
For each run, keep durable state such as:
- tracked run id / build name
- controller-side watcher state
- completion marker
- cancellation / termination marker
- Mattermost posted marker
- last observed machine summary
- timestamps for first seen, last seen, closed
For categorized runs, keep durable state for:
- the parent request build name
- each detected categorized sub-run
- whether each categorized sub-run has already been posted
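A possible shape for that durable state is sketched here as a dataclass saved as JSON. The field names and the JSON format are assumptions. The write goes to a temporary file first and is then renamed, so a crash mid-write does not leave a half-written state file.

```python
import json
import os
import tempfile
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Durable per-run state; all field names are hypothetical."""
    build_name: str
    watcher_state: str = "RUNNING"
    completed: bool = False
    cancelled_or_terminated: bool = False
    posted: bool = False
    machine_summary: dict = field(default_factory=dict)
    first_seen: float = field(default_factory=time.time)
    last_seen: float = 0.0
    closed_at: Optional[float] = None
    # Categorized runs only: sub-run name -> already-posted flag.
    subruns_posted: dict = field(default_factory=dict)


def save_record(path: str, record: RunRecord) -> None:
    """Write the record atomically: temp file in the same dir, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(record), handle, indent=2, sort_keys=True)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_record(path: str) -> RunRecord:
    with open(path, encoding="utf-8") as handle:
        return RunRecord(**json.load(handle))
```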
Duplicate-Post Prevention
The watcher must prevent duplicate Mattermost posts.
Required behavior:
- for non-categorized execution, only one final post per run
- for categorized execution, only one final post per categorized sub-run
- if a watched scope is already marked as posted, do not send again
- if a run or categorized sub-run is marked CANCELLED, TERMINATED, HUNG, or UNKNOWN, do not later convert it into a posted completion unless explicitly reset by an operator workflow
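One way to enforce a single post per watched scope is an exclusively created marker file, sketched below. The function name and the claim-before-send ordering are assumptions.

```python
import os


def claim_post(marker_path: str) -> bool:
    """Atomically create the posted marker.

    Returns True only for the first caller. Any later caller, including
    a restarted watcher, gets False and must not post again.
    """
    try:
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

Claiming before sending means a failed send leaves the scope marked as posted, which favors no duplicates over guaranteed delivery. Writing the marker only after a successful send makes the opposite trade.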
Recommended State Files
Use a durable controller-local state directory, for example:
/var/lib/atvm-run-watcher/
Possible contents:
- one parent state file per requested build name
- one posted marker per non-categorized run
- one subdirectory per categorized sub-run with its own state and posted marker
- one cancellation marker per parent run id
- optional lock file to prevent multiple watcher instances from racing
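The optional lock file could be implemented with an advisory `flock`, which is available on the Linux controller through Python's standard `fcntl` module. This sketch assumes one lock file per watched request; the function names are hypothetical.

```python
import fcntl
import os
from typing import Optional


def acquire_watcher_lock(lock_path: str) -> Optional[int]:
    """Try to take an exclusive, non-blocking lock.

    Returns the open file descriptor on success, or None when another
    watcher instance already holds the lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_watcher_lock(fd: int) -> None:
    """Closing the descriptor releases the lock."""
    os.close(fd)
```

The kernel drops the lock when the holding process exits, so a crashed watcher does not leave a stale lock behind.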
Recommended Operator Workflow
Normal completion workflow:
- ATVM run starts.
- Watcher tracks the requested build name.
- Watcher polls run state and artifacts.
- For non-categorized execution:
  - wait for the run to fully complete
  - build one final status summary
  - post one final Mattermost status
- For categorized execution:
  - detect each grouped sub-run in order
  - wait for that grouped sub-run to fully complete
  - build that grouped sub-run's final status summary
  - post that grouped sub-run's final Mattermost status
  - continue to the next grouped sub-run
- Watcher marks the completed watched scope as posted and closed.
Cancellation / termination workflow:
- Operator stops the ATVM run.
- Watcher detects cancellation / termination, or an explicit cancellation marker is written.
- Watcher marks the run CANCELLED or TERMINATED.
- Watcher exits cleanly without posting to Mattermost.
- Watcher prevents later duplicate or misleading final-post behavior.
Failure Semantics
Host-level failures do not suppress Mattermost posting.
If:
- the run has fully completed
- and one or more hosts failed
Then:
- final Mattermost status should still be sent
- final run-level state should be treated as completed-with-failures (FAILED)
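A sketch of building and sending such a status through a Mattermost incoming webhook, which accepts a JSON body with `text` and an optional `channel`. The summary wording, the per-machine "FAILED" value, and the function names are assumptions.

```python
import json
import urllib.request
from typing import Dict


def build_payload(build_name: str, machine_results: Dict[str, str], channel: str) -> dict:
    """Summarize a fully completed run; failed hosts do not suppress the post."""
    failed = sorted(m for m, result in machine_results.items() if result == "FAILED")
    state = "FAILED" if failed else "COMPLETED"
    lines = [
        f"ATVM run {build_name}: {state}",
        f"machines: {len(machine_results)}, failed: {len(failed)}",
    ]
    if failed:
        lines.append("failed hosts: " + ", ".join(failed))
    return {"channel": channel, "text": "\n".join(lines)}


def post_to_mattermost(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```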
Hang / Unknown Semantics
If the run cannot be safely classified as completed, failed, cancelled, or terminated:
- classify it as HUNG or UNKNOWN
- do not post to Mattermost
- require operator review
Logging Requirements
The watcher should log:
- the run id / build name being monitored
- each state transition
- posting decisions
- reasons for suppressing a Mattermost post
- duplicate-post prevention decisions
- final closed state
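A small logging sketch for these events, using Python's standard `logging` module. Under systemd, output written to stderr is captured by the journal. The logger name and message layout are assumptions.

```python
import logging
import sys


def make_logger(name: str = "atvm-run-watcher") -> logging.Logger:
    """Return a logger that writes to stderr, configured only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def log_transition(logger: logging.Logger, build_name: str, old: str, new: str, reason: str) -> str:
    """Record one state transition and return the logged message."""
    message = f"run={build_name} state {old} -> {new} reason={reason}"
    logger.info(message)
    return message


def log_post_decision(logger: logging.Logger, build_name: str, will_post: bool, reason: str) -> str:
    """Record why a Mattermost post was sent or suppressed."""
    verb = "posting" if will_post else "suppressing post"
    message = f"run={build_name} {verb}: {reason}"
    logger.info(message)
    return message
```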
Summary
This watcher design must satisfy all of the following:
- run on the ATVM Cypress controller
- survive local operator machine downtime
- use systemd
- distinguish run states clearly
- send Mattermost only after full completion of the watched scope
- send completion results whether hosts passed or failed
- never send Mattermost for cancelled, terminated, hung, or unknown runs
- prevent duplicate or misleading posts
- treat --categorize as sequential ATVM sub-runs, not as one parent run with internal phases
- send one Mattermost post per completed categorized sub-run