Update ATVM status reporting and credential handling docs

- change ATVM status formatting to the approved Markdown-table template with SUMMARY:, HOSTS:, TIMING:, and NOTES:
- document that normal status requests print locally only unless explicitly asked to send to Mattermost
- document Mattermost defaults and posting rules, including only sending after full run completion
- document the controller-side systemd watcher design for future automation
- add the secrets migration/cleanup review doc
- ignore .env.credentials.local in git and reflect the move toward using that local credentials file instead of hardcoded secrets
@@ -174,6 +174,19 @@ When asked for one VM or a VM set:
- If monitoring was not requested, run commands and report execution success/failure and any errors.
- If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.

## Mattermost Status Posting
- Treat a normal ATVM status request as local-only output by default.
- When the operator asks to send ATVM automation run status to Mattermost, use the local defaults from `/home/aw/code/cds/.env.credentials.local`.
- Default Mattermost variables:
  - `MATTERMOST_ATVM_WEBHOOK`
  - `MATTERMOST_ATVM_CHANNEL`
- Treat these as the default destination for ATVM automation run-status posts unless the operator explicitly overrides them.
- Send the final ATVM run status only after the run has fully completed, regardless of whether the run passed or failed.
- Do not send interim or in-progress ATVM run status updates to Mattermost unless the operator explicitly asks for that.
- When posting to Mattermost, use the same ATVM status layout that would be shown to the operator locally.
- Default status template: `/home/aw/code/cds/atvm/docs/automation/status-template.md`
- Do not post to Mattermost unless the operator explicitly asks for the run status to be sent there.

## Status Reporting Format
When the operator asks for the status of an ATVM automation run, report in this order:
1. Heading/title using the run `build_name`.
@@ -193,8 +206,11 @@ When the operator asks for the status of an ATVM automation run, report in this

Status-report expectations:
- Use the same display layout for every ATVM automation status response regardless of test type (`e2e`, `systemOS`, `reboot`, `migrateops`, and others).
- Use `/home/aw/code/cds/atvm/docs/automation/status-template.md` as the default template for both local status output and Mattermost status posts.
- The default ATVM status template uses Markdown tables for `SUMMARY:`, `HOSTS:`, and `TIMING:` and uses `NOTES:` for flat operator-facing notes.
- Treat references to the "ATVM automation run" or "automation run" as referring to this ATVM folder workflow and the automation VM at `192.168.3.190`, not to Cirrus project operations such as the `atvm - cypress` project.
- Treat a status request as a request for live status by default.
- Unless the operator explicitly asks to send the status to Mattermost, print the status only in the local terminal response.
- Use the live automation VM state when available.
- If no automation is currently running, fall back to the most recent historical run artifacts and logs.
- Prefer local automation evidence in this order: active runner processes, live automation-VM files, shell history for the last launch command, then historical reporter artifacts.
@@ -228,3 +244,4 @@ Status-report expectations:
- Use `Notes` for extra context beyond the machine-specific same-line failure description.
- Base the completion estimate on the full remaining machine count and recent per-machine runtime visible in the run log.
- Make the estimate explicitly refer to completion of the entire remaining run, not only the current machine/spec.
- When the operator also asks to send the status to Mattermost, send this same final status output to the configured Mattermost destination only after the run has fully completed.
179
atvm/docs/automation/mattermost-watcher-design.md
Normal file
@@ -0,0 +1,179 @@
# ATVM Mattermost Watcher Design

## Purpose
Design a controller-local watcher on the ATVM Cypress machine (`192.168.3.190`) that monitors an ATVM automation run and posts the final run status to Mattermost only after the run has fully completed.

This watcher must continue working even if the local operator machine is offline.

## Implementation Approach
Use a `systemd`-managed watcher on the ATVM Cypress controller.

Recommended structure:
- one watcher script that evaluates the state of a specific ATVM run
- one `systemd` service to execute the watcher
- optionally one `systemd` timer for periodic polling if the watcher is not implemented as a long-running process

Preferred deployment target:
- controller host: `192.168.3.190`
- ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`

## Mattermost Destination
Use the local credential file in this workspace as the source of defaults:
- `/home/aw/code/cds/.env.credentials.local`

Expected variables:
- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`
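
A minimal sketch of how a watcher might read these two variables and prepare a webhook request. It assumes the credentials file holds plain `KEY=VALUE` lines, which this design does not confirm, and that the webhook is a Mattermost incoming webhook accepting a JSON body with `text` and an optional `channel`. The helper names `load_env_file`, `build_payload`, and `post_status` are hypothetical.

```python
import json
import urllib.request
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped.

    Assumption: the credentials file uses this plain dotenv-style layout.
    """
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_payload(env, text):
    """Build the JSON body; the channel override is added only when configured."""
    payload = {"text": text}
    channel = env.get("MATTERMOST_ATVM_CHANNEL")
    if channel:
        payload["channel"] = channel
    return payload


def post_status(env, text):
    """POST the status text to the configured webhook and return the HTTP status."""
    request = urllib.request.Request(
        env["MATTERMOST_ATVM_WEBHOOK"],
        data=json.dumps(build_payload(env, text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction separate from the network call lets the payload be checked without contacting Mattermost.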

## Run Completion Rule
The watcher must send Mattermost results only after the ATVM run has fully completed.

A run is considered fully completed only when:
- there are no active runner processes for the run
- the expected machine scope has final result artifacts
- no machine remains in `RUNNING` or `NOT STARTED`
- final reporter artifacts confirm the run has ended

Evidence sources:
- live runner processes on `192.168.3.190`
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/`
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/xml/`
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/mochawesome/`
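
The four completion conditions can be expressed as a single predicate. The sketch below is illustrative only: `RunEvidence` and its field names are hypothetical, and how each field is populated from the process list and reporter directories is left open by this design.

```python
from dataclasses import dataclass, field

# Machine-level states that mean a host has not produced a final result yet.
NON_FINAL_HOST_STATES = {"RUNNING", "NOT STARTED"}


@dataclass
class RunEvidence:
    """Hypothetical snapshot assembled from the evidence sources."""
    active_runner_pids: list = field(default_factory=list)
    expected_hosts: set = field(default_factory=set)
    host_results: dict = field(default_factory=dict)  # host name -> observed state
    reporter_finished: bool = False


def is_fully_completed(evidence):
    """Return True only when all four completion conditions hold."""
    no_active_runners = not evidence.active_runner_pids
    # Every host in the expected scope must have a recorded result.
    scope_has_results = bool(evidence.expected_hosts) and (
        evidence.expected_hosts <= set(evidence.host_results)
    )
    none_pending = all(
        state not in NON_FINAL_HOST_STATES
        for state in evidence.host_results.values()
    )
    return (
        no_active_runners
        and scope_has_results
        and none_pending
        and evidence.reporter_finished
    )
```

Requiring all conditions together means a run with no live processes but missing host results is not treated as complete.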

## Required Run States
The watcher must distinguish these run-level states:

- `COMPLETED`
- `FAILED`
- `CANCELLED`
- `TERMINATED`
- `HUNG`
- `UNKNOWN`
- `RUNNING`

Definitions:
- `COMPLETED`
  - the run finished normally
  - all machines have final results
  - no run-level failure state blocks completion
- `FAILED`
  - the run finished, but one or more hosts failed
  - this is still a completed run
- `CANCELLED`
  - the run was intentionally cancelled through an explicit cancellation path
- `TERMINATED`
  - the run was manually killed or stopped before normal completion
- `HUNG`
  - the run appears stuck and does not meet completion rules within the expected policy window
- `UNKNOWN`
  - the watcher cannot safely determine the true state
- `RUNNING`
  - the run is still active and not yet complete
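
One way to keep these states unambiguous in code is an enumeration plus a single classification function. The sketch below is an assumption-heavy illustration: the input flags, the `hang_after_seconds` policy window, and the precedence order (cancellation first, then termination, then completion) are not specified by this design.

```python
from enum import Enum


class RunState(Enum):
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    TERMINATED = "TERMINATED"
    HUNG = "HUNG"
    UNKNOWN = "UNKNOWN"
    RUNNING = "RUNNING"


def classify_run(*, cancelled, killed, fully_completed, failed_hosts,
                 runners_active, idle_seconds, hang_after_seconds):
    """Map observed evidence to exactly one run-level state.

    All parameters are hypothetical inputs the watcher would gather itself.
    """
    if cancelled:
        return RunState.CANCELLED
    if killed:
        return RunState.TERMINATED
    if fully_completed:
        # A finished run with failed hosts is FAILED, which still counts as completed.
        return RunState.FAILED if failed_hosts > 0 else RunState.COMPLETED
    if runners_active:
        # Still active: HUNG only once the idle time exceeds the policy window.
        if idle_seconds > hang_after_seconds:
            return RunState.HUNG
        return RunState.RUNNING
    # Not complete and nothing running: the true state cannot be determined safely.
    return RunState.UNKNOWN
```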

## Mattermost Posting Rule
Post to Mattermost only when the run has fully completed.

Send Mattermost status for:
- `COMPLETED`
- `FAILED`

Do not send Mattermost status for:
- `CANCELLED`
- `TERMINATED`
- `HUNG`
- `UNKNOWN`
- `RUNNING`

Important clarification:
- a completed run with failed hosts should still be posted
- a cancelled, terminated, hung, or unknown run should not be posted
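
The posting rule reduces to a small decision function. This is a sketch with a hypothetical function name, and the `already_posted` flag is an added assumption that anticipates the one-post-per-run requirement.

```python
POSTABLE_STATES = frozenset({"COMPLETED", "FAILED"})
SUPPRESSED_STATES = frozenset(
    {"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN", "RUNNING"}
)


def should_post(state, already_posted=False):
    """Return True only for a completed run that has not been posted yet."""
    if state not in POSTABLE_STATES | SUPPRESSED_STATES:
        # Refuse to guess for states outside the seven defined run-level states.
        raise ValueError(f"unrecognised run state: {state!r}")
    return state in POSTABLE_STATES and not already_posted
```

Raising on an unrecognised state, rather than silently suppressing, makes a typo in a state name visible in the watcher log.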

## Required Cancellation / Termination Handling
If a run is cancelled or terminated, the watcher must:
- detect that the run was cancelled or manually killed
- stop waiting for normal completion
- mark the run as closed without posting final Mattermost status
- prevent any later success/failure post for that same run

## State Tracking Requirements
The watcher must track each monitored run by run id or build name.

For each run, keep durable state such as:
- tracked run id / build name
- controller-side watcher state
- completion marker
- cancellation / termination marker
- Mattermost posted marker
- last observed machine summary
- timestamps for first seen, last seen, closed
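
A sketch of what such a per-run record could look like when persisted as JSON. The `RunRecord` field names and the one-file-per-run layout are assumptions, and the run id is assumed to be safe to use as a file name. The write goes through a temporary file and `os.replace` so that a crash mid-write does not leave a truncated state file.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical durable state for one monitored run."""
    run_id: str
    watcher_state: str = "RUNNING"
    completed: bool = False
    cancelled_or_terminated: bool = False
    posted: bool = False
    last_machine_summary: dict = field(default_factory=dict)
    first_seen: Optional[str] = None
    last_seen: Optional[str] = None
    closed: Optional[str] = None


def save_record(state_dir, record):
    """Write the record atomically: temp file in the same directory, then rename."""
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / f"{record.run_id}.json"
    fd, tmp_name = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(asdict(record), handle, indent=2)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, target)
    return target


def load_record(state_dir, run_id):
    """Read a previously saved record back into a RunRecord."""
    path = Path(state_dir) / f"{run_id}.json"
    return RunRecord(**json.loads(path.read_text()))
```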

## Duplicate-Post Prevention
The watcher must prevent duplicate Mattermost posts.

Required behavior:
- only one final post per run
- if a run is already marked as posted, do not send again
- if a run is marked `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`, do not later convert it into a posted completion unless explicitly reset by an operator workflow
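
One possible way to enforce a single post per run is to create the posted marker with an exclusive create, so that only the first attempt succeeds. The `claim_post` helper and the `<run_id>.posted` naming are hypothetical, and the run id is assumed to be usable as a file name.

```python
import os
from pathlib import Path


def claim_post(state_dir, run_id):
    """Return True only for the first caller that claims the post for run_id."""
    marker = Path(state_dir) / f"{run_id}.posted"
    marker.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail if the marker already exists.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

Claiming before sending favors "at most one post": if the send itself fails after the claim, the run stays marked and needs operator attention rather than an automatic retry. Whether that trade-off is acceptable is a policy decision this design leaves open.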

## Recommended State Files
Use a durable controller-local state directory, for example:
- `/var/lib/atvm-run-watcher/`

Possible contents:
- one state file per run id
- one posted marker per run id
- one cancellation marker per run id
- optional lock file to prevent multiple watcher instances from racing
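
For the optional lock file, an advisory lock taken with `fcntl.flock` is one common approach on Linux. The sketch below uses hypothetical helper names and assumes the lock file lives inside the state directory; a non-blocking exclusive lock lets a second watcher instance detect the first and exit instead of racing it.

```python
import fcntl
import os


def acquire_lock(lock_path):
    """Return an fd holding an exclusive lock, or None if another holder exists."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

The kernel releases the lock when the holding process exits, so a crashed watcher does not leave a stale lock behind even though the file itself remains.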

## Recommended Operator Workflow
Normal completion workflow:
1. ATVM run starts.
2. Watcher tracks the run id / build name.
3. Watcher polls run state and artifacts.
4. Run fully completes.
5. Watcher builds final status summary.
6. Watcher posts final status to Mattermost once.
7. Watcher marks the run as posted and closed.

Cancellation / termination workflow:
1. Operator stops the ATVM run.
2. Watcher detects cancellation / termination, or an explicit cancellation marker is written.
3. Watcher marks the run `CANCELLED` or `TERMINATED`.
4. Watcher exits cleanly without posting to Mattermost.
5. Watcher prevents later duplicate or misleading final-post behavior.
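
Both workflows can be sketched as one polling loop, assuming the long-running-process variant of the watcher. The callables `probe_state`, `build_summary`, and `post` are hypothetical injection points for logic this design does not define, and treating `HUNG` and `UNKNOWN` as closing states for the loop is an assumption.

```python
import time

POSTABLE = {"COMPLETED", "FAILED"}
CLOSED_WITHOUT_POST = {"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN"}


def watch_run(run_id, probe_state, build_summary, post,
              interval=60.0, sleep=time.sleep):
    """Poll until the run closes; post once for COMPLETED or FAILED only.

    Returns a (final_state, posted) tuple.
    """
    while True:
        state = probe_state(run_id)
        if state in POSTABLE:
            # Normal completion workflow: build the summary, post once, close.
            post(build_summary(run_id))
            return state, True
        if state in CLOSED_WITHOUT_POST:
            # Cancellation / termination workflow: close without posting.
            return state, False
        # Any other state (for example RUNNING): wait and poll again.
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting, for example with `sleep=lambda seconds: None`.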

## Failure Semantics
Host-level failures do not suppress Mattermost posting.

If:
- the run has fully completed
- and one or more hosts failed

Then:
- final Mattermost status should still be sent
- final run-level state should be treated as `FAILED` (completed with failures)

## Hang / Unknown Semantics
If the run cannot be safely classified as completed, failed, cancelled, or terminated:
- classify it as `HUNG` or `UNKNOWN`
- do not post to Mattermost
- require operator review

## Logging Requirements
The watcher should log:
- the run id / build name being monitored
- each state transition
- posting decisions
- reasons for suppressing a Mattermost post
- duplicate-post prevention decisions
- final closed state

## Summary
This watcher design must satisfy all of the following:
- run on the ATVM Cypress controller
- survive local operator machine downtime
- use `systemd`
- distinguish run states clearly
- send Mattermost only after full completion
- send completion results whether hosts passed or failed
- never send Mattermost for cancelled, terminated, hung, or unknown runs
- prevent duplicate or misleading posts
62
atvm/docs/automation/status-template.md
Normal file
@@ -0,0 +1,62 @@
# ATVM Status Template

Use this as the default ATVM automation run-status template for:
- local status responses in the terminal
- Mattermost status posts after a completed run

## Layout

```md
## ATVM Run Status
### <build_name>

**SUMMARY:**

| Metric | Value |
|---|---:|
| finished | <n> |
| passed | <n> |
| failed | <n> |
| skipped | <n> |

**HOSTS:**

| Host | Status | Detail |
|---|---|---|
| <host-name> | ✅ PASS | completed |
| <host-name> | ⚠️ FAIL | <useful failure description> |
| <host-name> | ⏳ RUN | in progress |
| <host-name> | ⏭️ SKIP | <skip reason> |

**TIMING:**

| Metric | Value |
|---|---|
| start | <start time> |
| end | <end time or n/a> |
| total | <total or elapsed runtime> |
| quickest | <host> - <runtime> or n/a |
| longest | <host> - <runtime> or n/a |
| average | <runtime> or n/a |

**NOTES:**
- <note>
- <note>
```

## Rules
- Keep `SUMMARY:`, `HOSTS:`, `TIMING:`, and `NOTES:` in that order.
- Use the title format:
  - `## ATVM Run Status`
  - `### <build_name>`
- Use Markdown tables for `SUMMARY:`, `HOSTS:`, and `TIMING:`.
- Use one host per row in the `HOSTS:` section.
- For completed hosts, prefer:
  - `✅ PASS`
  - `⚠️ FAIL`
- For in-progress or skipped hosts, use:
  - `⏳ RUN`
  - `⏭️ SKIP`
- Keep `Detail` concise.
- Put broader context under `NOTES:`, not in the host table.
- Use the same template for Mattermost and local operator-visible status output.
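
A small renderer is one way to keep local output and Mattermost posts identical, since both would come from the same function. The sketch below follows the layout and rules above; the function names and the shape of the input data (dicts for summary and timing, `(host, status, detail)` tuples for hosts) are assumptions.

```python
STATUS_LABELS = {
    "PASS": "✅ PASS",
    "FAIL": "⚠️ FAIL",
    "RUN": "⏳ RUN",
    "SKIP": "⏭️ SKIP",
}


def _table(headers, align, rows):
    """Render a Markdown table: header row, alignment row, then data rows."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join(align) + "|",
    ]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join(lines)


def render_status(build_name, summary, hosts, timing, notes):
    """Build the status text with SUMMARY, HOSTS, TIMING, NOTES in that order."""
    parts = [
        "## ATVM Run Status",
        f"### {build_name}",
        "",
        "**SUMMARY:**",
        "",
        _table(["Metric", "Value"], ["---", "---:"], summary.items()),
        "",
        "**HOSTS:**",
        "",
        # One host per row; the status key is mapped to its icon label.
        _table(["Host", "Status", "Detail"], ["---", "---", "---"],
               [(host, STATUS_LABELS[status], detail)
                for host, status, detail in hosts]),
        "",
        "**TIMING:**",
        "",
        _table(["Metric", "Value"], ["---", "---"], timing.items()),
        "",
        "**NOTES:**",
    ]
    parts += [f"- {note}" for note in notes]
    return "\n".join(parts)
```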