- change ATVM status formatting to the approved Markdown-table template with SUMMARY:, HOSTS:, TIMING:, and NOTES:
- document that normal status requests print locally only unless explicitly asked to send to Mattermost
- document Mattermost defaults and posting rules, including only sending after full run completion
- document the controller-side systemd watcher design for future automation
- add the secrets migration/cleanup review doc
- ignore .env.credentials.local in git and reflect the move toward using that local credentials file instead of hardcoded secrets
ATVM Mattermost Watcher Design
Purpose
Design a controller-local watcher on the ATVM Cypress machine (192.168.3.190) that monitors an ATVM automation run and posts the final run status to Mattermost only after the run has fully completed.
This watcher must continue working even if the local operator machine is offline.
Implementation Approach
Use a systemd-managed watcher on the ATVM Cypress controller.
Recommended structure:
- one watcher script that evaluates the state of a specific ATVM run
- one systemd service to execute the watcher
- optionally one systemd timer for periodic polling if the watcher is not implemented as a long-running process
Preferred deployment target:
- controller host: 192.168.3.190
- ATVM automation root: /root/cdc-e2e-cyp-12.17.4
Mattermost Destination
Use the local credential file in this workspace as the source of defaults:
/home/aw/code/cds/.env.credentials.local
Expected variables:
- MATTERMOST_ATVM_WEBHOOK
- MATTERMOST_ATVM_CHANNEL
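As an illustration of reading those defaults, the sketch below parses a credentials file and checks that both expected variables are present. It assumes the file uses plain KEY=VALUE lines (optionally quoted or prefixed with export), which the design does not state; the function names are hypothetical.

```python
from pathlib import Path

# The two variable names come from this design; everything else is illustrative.
REQUIRED_KEYS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (assumed format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding quotes so both KEY="x" and KEY=x work.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_mattermost_defaults(path: str) -> dict:
    """Return the two Mattermost settings, or raise if either is missing."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise KeyError("missing Mattermost settings: " + ", ".join(missing))
    return {key: values[key] for key in REQUIRED_KEYS}
```

Failing loudly on a missing variable keeps the watcher from silently posting nowhere.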
Run Completion Rule
The watcher must send Mattermost results only after the ATVM run has fully completed.
A run is considered fully completed only when:
- there are no active runner processes for the run
- the expected machine scope has final result artifacts
- no machine remains in RUNNING or NOT STARTED
- final reporter artifacts confirm the run has ended
Evidence sources:
- live runner processes on 192.168.3.190
- /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/
- /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/xml/
- /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/mochawesome/
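The completion rule can be expressed as a pure check over evidence that has already been collected. The sketch below assumes the watcher has gathered that evidence into a small record; the RunEvidence type, its field names, and the PASSED and FAILED machine states used in the usage note are hypothetical. How evidence is actually read from processes and reporter directories is left open here.

```python
from dataclasses import dataclass

# Machine states that block completion, as named in the rule above.
NON_FINAL_MACHINE_STATES = {"RUNNING", "NOT STARTED"}


@dataclass
class RunEvidence:
    """Hypothetical snapshot of what the watcher observed for one run."""
    active_runner_count: int             # live runner processes for this run
    expected_machines: set               # machine scope the run should cover
    machine_states: dict                 # machine name -> last observed state
    machines_with_final_artifacts: set   # machines that have final results
    reporter_confirms_end: bool          # final reporter artifacts say the run ended


def is_fully_completed(evidence: RunEvidence) -> bool:
    """Apply all four completion conditions; every one must hold."""
    if evidence.active_runner_count > 0:
        return False
    if not evidence.expected_machines <= evidence.machines_with_final_artifacts:
        return False
    for machine in evidence.expected_machines:
        # A machine with no recorded state is treated as NOT STARTED.
        state = evidence.machine_states.get(machine, "NOT STARTED")
        if state in NON_FINAL_MACHINE_STATES:
            return False
    return evidence.reporter_confirms_end
```

For example, a run whose two machines both report PASSED or FAILED, with artifacts for both and no live runners, is complete; the same run with one machine still RUNNING is not.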
Required Run States
The watcher must distinguish these run-level states:
- COMPLETED
- FAILED
- CANCELLED
- TERMINATED
- HUNG
- UNKNOWN
- RUNNING
Definitions:
- COMPLETED
  - the run finished normally
  - all machines have final results
  - no run-level failure state blocks completion
- FAILED
  - the run finished, but one or more hosts failed
  - this is still a completed run
- CANCELLED
  - the run was intentionally cancelled through an explicit cancellation path
- TERMINATED
  - the run was manually killed or stopped before normal completion
- HUNG
  - the run appears stuck and does not meet completion rules within the expected policy window
- UNKNOWN
  - the watcher cannot safely determine the true state
- RUNNING
  - the run is still active and not yet complete
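One way to keep these states distinct in code is an enumeration plus a single classification function. The sketch below is illustrative only: the input flags and the order in which they are checked (unreadable evidence first, then cancellation, then termination, then completion, then hang) are assumptions, not part of this design.

```python
from enum import Enum


class RunState(Enum):
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    TERMINATED = "TERMINATED"
    HUNG = "HUNG"
    UNKNOWN = "UNKNOWN"
    RUNNING = "RUNNING"


def classify_run(*, evidence_readable: bool, cancel_requested: bool,
                 killed: bool, fully_completed: bool,
                 failed_host_count: int, stalled: bool) -> RunState:
    """Map observed facts to exactly one run-level state (assumed precedence)."""
    if not evidence_readable:
        return RunState.UNKNOWN      # cannot safely determine the true state
    if cancel_requested:
        return RunState.CANCELLED    # explicit cancellation path was used
    if killed:
        return RunState.TERMINATED   # stopped before normal completion
    if fully_completed:
        # A finished run with failed hosts is FAILED but still a completed run.
        return RunState.FAILED if failed_host_count > 0 else RunState.COMPLETED
    if stalled:
        return RunState.HUNG         # no completion within the policy window
    return RunState.RUNNING
```

Checking cancellation and termination before completion means a run that was stopped cannot later be reported as a normal finish.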
Mattermost Posting Rule
Post to Mattermost only when the run has fully completed.
Send Mattermost status for:
- COMPLETED
- FAILED
Do not send Mattermost status for:
- CANCELLED
- TERMINATED
- HUNG
- UNKNOWN
- RUNNING
Important clarification:
- a completed run with failed hosts should still be posted
- a cancelled, terminated, hung, or unknown run should not be posted
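The posting rule reduces to a small lookup. The sketch below returns both the decision and a reason string so that suppressed posts can be explained in the logs; the function name and the reason wording are hypothetical.

```python
# States taken from the posting rule above.
POSTABLE_STATES = frozenset({"COMPLETED", "FAILED"})
SUPPRESSED_STATES = frozenset({"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN", "RUNNING"})


def posting_decision(state: str):
    """Return (should_post, reason) for a run-level state."""
    if state in POSTABLE_STATES:
        return True, state + ": run fully completed"
    if state in SUPPRESSED_STATES:
        return False, state + ": posting suppressed by policy"
    # Anything unrecognized is handled conservatively, like UNKNOWN.
    return False, state + ": unrecognized state, not posting"
```

Treating unrecognized input as not postable keeps a typo or a new state from producing an unintended Mattermost message.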
Required Cancellation / Termination Handling
If a run is cancelled or terminated, the watcher must:
- detect that the run was cancelled or manually killed
- stop waiting for normal completion
- mark the run as closed without posting final Mattermost status
- prevent any later success/failure post for that same run
State Tracking Requirements
The watcher must track each monitored run by run id or build name.
For each run, keep durable state such as:
- tracked run id / build name
- controller-side watcher state
- completion marker
- cancellation / termination marker
- Mattermost posted marker
- last observed machine summary
- timestamps for first seen, last seen, closed
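A minimal sketch of such a durable record follows, stored as one JSON file per run and written atomically so that a crash mid-write cannot leave a truncated file. The field names, the JSON layout, and the assumption that the run id is safe to use as a file name are all illustrative.

```python
import json
import os
import tempfile
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical per-run state mirroring the fields listed above."""
    run_id: str
    watcher_state: str = "RUNNING"
    completed: bool = False
    cancelled_or_terminated: bool = False
    mattermost_posted: bool = False
    last_machine_summary: dict = field(default_factory=dict)
    first_seen: float = field(default_factory=time.time)
    last_seen: float = field(default_factory=time.time)
    closed_at: Optional[float] = None


def save_record(state_dir: Path, record: RunRecord) -> Path:
    """Write the record to a temp file, then rename it over the target."""
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / (record.run_id + ".json")
    fd, tmp_name = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(record), handle, indent=2, sort_keys=True)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, target)  # atomic when both paths share a filesystem
    return target


def load_record(state_dir: Path, run_id: str) -> Optional[RunRecord]:
    """Return the stored record, or None if this run has not been seen."""
    path = state_dir / (run_id + ".json")
    if not path.exists():
        return None
    return RunRecord(**json.loads(path.read_text(encoding="utf-8")))
```

Because the state lives on disk, a restarted watcher can pick up where it left off instead of re-evaluating a run it already closed.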
Duplicate-Post Prevention
The watcher must prevent duplicate Mattermost posts.
Required behavior:
- only one final post per run
- if a run is already marked as posted, do not send again
- if a run is marked CANCELLED, TERMINATED, HUNG, or UNKNOWN, do not later convert it into a posted completion unless explicitly reset by an operator workflow
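One possible way to enforce these rules is with marker files created exclusively, so that only the first caller can ever claim the post for a run. The sketch below is an assumption about mechanism, not part of this design: the .posted and .suppressed file names are hypothetical, and the run id is assumed to be safe as a file name.

```python
import os
from pathlib import Path


def mark_suppressed(state_dir: Path, run_id: str, state: str) -> None:
    """Record that this run was closed as CANCELLED, TERMINATED, HUNG, or UNKNOWN."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / (run_id + ".suppressed")).write_text(state + "\n", encoding="utf-8")


def claim_post(state_dir: Path, run_id: str) -> bool:
    """Return True only for the single caller allowed to post for this run."""
    state_dir.mkdir(parents=True, exist_ok=True)
    if (state_dir / (run_id + ".suppressed")).exists():
        return False  # an operator must remove the marker to reset the run
    marker = state_dir / (run_id + ".posted")
    try:
        # O_EXCL makes creation fail if the marker already exists.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

Claiming the marker before sending gives at-most-once posting: if the send then fails, the run stays unposted until an operator intervenes. Claiming after a successful send would instead risk a duplicate if the watcher crashed in between.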
Recommended State Files
Use a durable controller-local state directory, for example:
/var/lib/atvm-run-watcher/
Possible contents:
- one state file per run id
- one posted marker per run id
- one cancellation marker per run id
- optional lock file to prevent multiple watcher instances from racing
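For the optional lock file, an advisory lock taken with flock is one common approach on Linux. The sketch below assumes a Linux controller (the fcntl module is not available on Windows); the context manager name and the lock path used in the usage note are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance_lock(lock_path: str):
    """Yield True if this process holds the watcher lock, False otherwise."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fail immediately if another holder exists.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the descriptor also releases the lock if it was held.
        os.close(fd)
```

A watcher could wrap each polling pass in `with single_instance_lock("/var/lib/atvm-run-watcher/watcher.lock") as held:` and exit early when `held` is False. The kernel releases the lock when the process dies, so a crashed watcher does not leave a stale lock behind.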
Recommended Operator Workflow
Normal completion workflow:
- ATVM run starts.
- Watcher tracks the run id / build name.
- Watcher polls run state and artifacts.
- Run fully completes.
- Watcher builds final status summary.
- Watcher posts final status to Mattermost once.
- Watcher marks the run as posted and closed.
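The posting step in this workflow could be sketched with the standard library alone. Mattermost incoming webhooks accept a JSON body with a text field and an optional channel override; the function names, the timeout value, and the choice to treat any 2xx response as success are assumptions.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, channel: str, text: str) -> urllib.request.Request:
    """Build the POST request for a Mattermost incoming webhook."""
    payload = {"channel": channel, "text": text}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_final_status(webhook_url: str, channel: str, text: str,
                      timeout: float = 10.0) -> bool:
    """Send the final status; return True on a 2xx response.

    urlopen raises on HTTP errors and network failures, so callers should
    catch urllib.error.URLError and leave the run unposted in that case.
    """
    request = build_webhook_request(webhook_url, channel, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return 200 <= response.status < 300
```

Separating request construction from sending lets the payload be checked without any network access.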
Cancellation / termination workflow:
- Operator stops the ATVM run.
- Watcher detects cancellation / termination, or an explicit cancellation marker is written.
- Watcher marks the run CANCELLED or TERMINATED.
- Watcher exits cleanly without posting to Mattermost.
- Watcher prevents later duplicate or misleading final-post behavior.
Failure Semantics
Host-level failures do not suppress Mattermost posting.
If:
- the run has fully completed
- and one or more hosts failed
Then:
- final Mattermost status should still be sent
- final run-level state should be FAILED, which still counts as a completed run
Hang / Unknown Semantics
If the run cannot be safely classified as completed, failed, cancelled, or terminated:
- classify it as HUNG or UNKNOWN
- do not post to Mattermost
- require operator review
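The design leaves the policy window for hang detection unspecified. As a sketch under that caveat, the function below separates HUNG from UNKNOWN for a run that has not completed: unreadable evidence yields UNKNOWN, and no observed progress for longer than the window yields HUNG. The parameter names and the idea of measuring from the last observed progress are assumptions.

```python
def classify_incomplete(now: float, last_progress_at: float,
                        policy_window_seconds: float,
                        evidence_readable: bool) -> str:
    """Classify a run that has not met the completion rule.

    Timestamps are seconds since the epoch. The window length is a policy
    choice that this design does not fix.
    """
    if not evidence_readable:
        return "UNKNOWN"
    if now - last_progress_at > policy_window_seconds:
        return "HUNG"
    return "RUNNING"
```

Either non-running result should stop further automatic handling of the run and leave it for operator review.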
Logging Requirements
The watcher should log:
- the run id / build name being monitored
- each state transition
- posting decisions
- reasons for suppressing a Mattermost post
- duplicate-post prevention decisions
- final closed state
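A small sketch of how these log lines might look, using the standard logging module. The logger name and the key=value message format are assumptions. When the watcher runs as a systemd service, anything written to standard error is normally captured by the journal.

```python
import logging

log = logging.getLogger("atvm_run_watcher")  # hypothetical logger name


def format_transition(run_id: str, old_state: str, new_state: str, reason: str) -> str:
    """Render one state transition as a single greppable line."""
    return "run=%s transition=%s->%s reason=%s" % (run_id, old_state, new_state, reason)


def format_post_decision(run_id: str, will_post: bool, reason: str) -> str:
    """Render a posting or suppression decision, including why."""
    action = "post" if will_post else "suppress"
    return "run=%s mattermost=%s reason=%s" % (run_id, action, reason)


def log_transition(run_id: str, old_state: str, new_state: str, reason: str) -> None:
    log.info(format_transition(run_id, old_state, new_state, reason))


def log_post_decision(run_id: str, will_post: bool, reason: str) -> None:
    log.info(format_post_decision(run_id, will_post, reason))


if __name__ == "__main__":
    # basicConfig writes to standard error by default.
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    log_transition("build-42", "RUNNING", "FAILED", "all hosts final, 1 failed")
    log_post_decision("build-42", True, "run fully completed")
```

Keeping the run id in every line makes it possible to reconstruct one run's history from a shared log.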
Summary
This watcher design must satisfy all of the following:
- run on the ATVM Cypress controller
- survive local operator machine downtime
- use systemd
- distinguish run states clearly
- send Mattermost only after full completion
- send completion results whether hosts passed or failed
- never send Mattermost for cancelled, terminated, hung, or unknown runs
- prevent duplicate or misleading posts