Update ATVM watcher for categorized sub-run posting

- update the watcher design and automation guide to treat --categorize as sequential ATVM sub-runs rather than one parent run with internal phases
- document that categorized runs should send one Mattermost status per completed grouped sub-run instead of one parent-only final post
- add a --categorize option to the watcher start helper so categorized mode is explicit in watcher startup
- update the watcher implementation to track categorized sub-runs separately, write per-subrun state, and post each completed grouped run once
@@ -1,7 +1,7 @@
 # ATVM Mattermost Watcher Design
 
 ## Purpose
-Design a controller-local watcher on the ATVM Cypress machine (`192.168.3.190`) that monitors an ATVM automation run and posts the final run status to Mattermost only after the run has fully completed.
+Design a controller-local watcher on the ATVM Cypress machine (`192.168.3.190`) that monitors an ATVM automation run and posts final run status to Mattermost only after the watched scope has fully completed.
 
 This watcher must continue working even if the local operator machine is offline.
 
@@ -9,9 +9,10 @@ This watcher must continue working even if the local operator machine is offline
 Use a `systemd`-managed watcher on the ATVM Cypress controller.
 
 Recommended structure:
-- one watcher script that evaluates the state of a specific ATVM run
+- one watcher script that evaluates a specific ATVM run request
 - one `systemd` service to execute the watcher
 - optionally one `systemd` timer for periodic polling if the watcher is not implemented as a long-running process
 - no always-on daemon
+- for categorized ATVM runs, one watcher instance tracks the parent request and posts each categorized sub-run separately as those grouped runs complete
 
 Preferred deployment target:
 - controller host: `192.168.3.190`
@@ -26,14 +27,23 @@ Expected variables:
 - `MATTERMOST_ATVM_CHANNEL`
 
 ## Run Completion Rule
-The watcher must send Mattermost results only after the ATVM run has fully completed.
+The watcher must send Mattermost results only after the watched scope has fully completed.
 
-A run is considered fully completed only when:
+A non-categorized run is considered fully completed only when:
 - there are no active runner processes for the run
 - the expected machine scope has final result artifacts
 - no machine remains in `RUNNING` or `NOT STARTED`
 - final reporter artifacts confirm the run has ended
 
+A categorized run must be treated differently:
+- `--categorize` splits the request into sequential ATVM sub-runs
+- each categorized group is its own run/job
+- the watcher must detect each grouped sub-run in order
+- the watcher must wait for that grouped sub-run to complete
+- then send that grouped sub-run's final Mattermost status
+- then continue watching for the next grouped sub-run
+- the watcher must not wait until the very end to send one single parent-only post
+
 Evidence sources:
 - live runner processes on `192.168.3.190`
 - `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/`
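
The four conditions for a non-categorized run in the hunk above can be read as a single predicate. The following is a minimal sketch under stated assumptions, not the watcher's actual code: the `RunSnapshot` type, its field names, and the way each value would be collected from processes and reporter artifacts are all hypothetical. Only the state names `RUNNING` and `NOT STARTED` come from the design text.

```python
from dataclasses import dataclass

# Machine states that mean the run is not finished yet (names from the rule above).
UNFINISHED_STATES = {"RUNNING", "NOT STARTED"}


@dataclass
class RunSnapshot:
    """One poll of a single run; every field name here is hypothetical."""
    active_runner_processes: int   # live runner processes for this run
    machine_states: dict           # host -> last known state
    expected_machines: set         # hosts the run was asked to cover
    machines_with_results: set     # hosts that have final result artifacts
    reporter_confirms_end: bool    # final reporter artifacts say the run ended


def is_fully_completed(snap: RunSnapshot) -> bool:
    """Return True only when all four completion conditions hold."""
    if snap.active_runner_processes > 0:
        return False
    # Every expected machine must have a final result artifact.
    if not snap.expected_machines <= snap.machines_with_results:
        return False
    if any(state in UNFINISHED_STATES for state in snap.machine_states.values()):
        return False
    return snap.reporter_confirms_end
```

For a categorized request, the same predicate would apply to each grouped sub-run in turn rather than to the parent request as a whole.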
@@ -70,7 +80,7 @@ Definitions:
 - the run is still active and not yet complete
 
 ## Mattermost Posting Rule
-Post to Mattermost only when the run has fully completed.
+Post to Mattermost only when the watched scope has fully completed.
 
 Send Mattermost status for:
 - `COMPLETED`
@@ -86,6 +96,9 @@ Do not send Mattermost status for:
 Important clarification:
 - a completed run with failed hosts should still be posted
 - a cancelled, terminated, hung, or unknown run should not be posted
+- for categorized execution, this rule applies per categorized sub-run
+- one categorized group completion should produce one Mattermost post
+- do not send one parent-level aggregate post in place of the per-group posts
 
 ## Required Cancellation / Termination Handling
 If a run is cancelled or terminated, the watcher must:
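
The posting rule in the two hunks above amounts to a per-scope decision: post only for `COMPLETED`, never for cancelled, terminated, hung, or unknown scopes, and never twice. A rough sketch follows, assuming statuses are plain strings and each categorized sub-run is represented as a `(name, status, already_posted)` tuple. Both representations and both function names are hypothetical, and only `COMPLETED` is treated as postable because it is the only entry visible in the "send" list shown here.

```python
# Statuses that must never produce a Mattermost post (from the rule above).
NEVER_POST = {"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN"}


def should_post(status: str, already_posted: bool) -> bool:
    """Decide whether one watched scope (a run or one categorized sub-run) gets a post."""
    if already_posted or status in NEVER_POST:
        return False
    # Host failures inside a completed run do not change the decision.
    return status == "COMPLETED"


def subruns_to_post(subruns: list) -> list:
    """Return the names of categorized sub-runs that should each get their own post.

    `subruns` is a list of (name, status, already_posted) tuples in detection order.
    """
    return [name for name, status, posted in subruns if should_post(status, posted)]
```

Applying the decision per sub-run, rather than once for the parent request, is what yields one post per categorized group instead of a single aggregate post.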
@@ -106,33 +119,47 @@ For each run, keep durable state such as:
 - last observed machine summary
 - timestamps for first seen, last seen, closed
 
+For categorized runs, keep durable state for:
+- the parent request build name
+- each detected categorized sub-run
+- whether each categorized sub-run has already been posted
+
 ## Duplicate-Post Prevention
 The watcher must prevent duplicate Mattermost posts.
 
 Required behavior:
-- only one final post per run
-- if a run is already marked as posted, do not send again
-- if a run is marked `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`, do not later convert it into a posted completion unless explicitly reset by an operator workflow
+- for non-categorized execution, only one final post per run
+- for categorized execution, only one final post per categorized sub-run
+- if a watched scope is already marked as posted, do not send again
+- if a run or categorized sub-run is marked `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`, do not later convert it into a posted completion unless explicitly reset by an operator workflow
 
 ## Recommended State Files
 Use a durable controller-local state directory, for example:
 - `/var/lib/atvm-run-watcher/`
 
 Possible contents:
-- one state file per run id
-- one posted marker per run id
-- one cancellation marker per run id
+- one parent state file per requested build name
+- one posted marker per non-categorized run
+- one subdirectory per categorized sub-run with its own state and posted marker
+- one cancellation marker per parent run id
 - optional lock file to prevent multiple watcher instances from racing
 
 ## Recommended Operator Workflow
 Normal completion workflow:
 1. ATVM run starts.
-2. Watcher tracks the run id / build name.
+2. Watcher tracks the requested build name.
 3. Watcher polls run state and artifacts.
-4. Run fully completes.
-5. Watcher builds final status summary.
-6. Watcher posts final status to Mattermost once.
-7. Watcher marks the run as posted and closed.
+4. For non-categorized execution:
+- wait for the run to fully complete
+- build one final status summary
+- post one final Mattermost status
+5. For categorized execution:
+- detect each grouped sub-run in order
+- wait for that grouped sub-run to fully complete
+- build that grouped sub-run's final status summary
+- post that grouped sub-run's final Mattermost status
+- continue to the next grouped sub-run
+6. Watcher marks the completed watched scope as posted and closed.
 
 Cancellation / termination workflow:
 1. Operator stops the ATVM run.
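
One way to realize the posted markers and per-sub-run subdirectories described in the hunk above is an exclusive file create, which also guards against two watcher instances racing. This is a sketch under assumptions: the directory layout (`<state_dir>/<build_name>/[<subrun>/]posted`), the marker file name, and both function names are hypothetical. Only the idea of a per-scope posted marker under a state directory comes from the design text.

```python
import os
from pathlib import Path
from typing import Optional


def marker_path(state_dir: Path, build_name: str, subrun: Optional[str] = None) -> Path:
    """Location of the posted marker; the layout and file name are hypothetical."""
    base = state_dir / build_name
    if subrun is not None:
        base = base / subrun  # one subdirectory per categorized sub-run
    return base / "posted"


def mark_posted(state_dir: Path, build_name: str, subrun: Optional[str] = None) -> bool:
    """Create the posted marker atomically.

    Returns True if this call created the marker, False if it already existed.
    """
    path = marker_path(state_dir, build_name, subrun)
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail if the marker is already present.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

Whether the marker is written just before or just after the Mattermost call is a design choice. Writing it first rules out duplicates but can lose a post if the call then fails, while writing it afterwards matches the workflow order above but needs the optional lock file to stay duplicate-free.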
@@ -173,7 +200,9 @@ This watcher design must satisfy all of the following:
 - survive local operator machine downtime
 - use `systemd`
 - distinguish run states clearly
-- send Mattermost only after full completion
+- send Mattermost only after full completion of the watched scope
 - send completion results whether hosts passed or failed
 - never send Mattermost for cancelled, terminated, hung, or unknown runs
 - prevent duplicate or misleading posts
+- treat `--categorize` as sequential ATVM sub-runs, not as one parent run with internal phases
+- send one Mattermost post per completed categorized sub-run
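
The last two requirements above (sequential sub-runs, one post per completed sub-run) can be illustrated with a small polling loop. This is only a sketch: the shape of a poll result, the `post` callable standing in for the Mattermost call, and the function name are hypothetical, and the in-memory `posted` set would be replaced by durable state in a real watcher.

```python
from typing import Callable, Iterable, List, Set, Tuple


def post_completed_subruns(
    polls: Iterable[List[Tuple[str, str]]],
    post: Callable[[str], None],
) -> List[str]:
    """Post each categorized sub-run once, as soon as a poll shows it COMPLETED.

    `polls` yields one snapshot per polling cycle: a list of (subrun_name, status)
    pairs in the order the sub-runs were detected.
    """
    posted: Set[str] = set()
    order: List[str] = []
    for snapshot in polls:
        for name, status in snapshot:
            if status == "COMPLETED" and name not in posted:
                post(name)  # one post per completed sub-run, never repeated
                posted.add(name)
                order.append(name)
    return order
```

Because each sub-run is posted when it completes, the first group's status reaches Mattermost while later groups are still running, instead of being held for a single parent-level post.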