Update ATVM status reporting and credential handling docs

- change ATVM status formatting to the approved Markdown-table template with SUMMARY:, HOSTS:, TIMING:, and NOTES:
- document that normal status requests print locally only unless explicitly asked to send to Mattermost
- document Mattermost defaults and posting rules, including only sending after full run completion
- document the controller-side systemd watcher design for future automation
- add the secrets migration/cleanup review doc
- ignore .env.credentials.local in git and reflect the move toward using that local credentials file instead of hardcoded secrets
1
.gitignore
vendored
@@ -1 +1,2 @@
log/
.env.credentials.local
@@ -47,6 +47,7 @@ This file defines how to operate and maintain the ATVM workspace in `/home/aw/co
- Controller IP: `192.168.3.190`
- Controller credentials: `root / atvmcdsi2012`
- Detailed test artifact root on controller: `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`
- Default Mattermost status destination config: `/home/aw/code/cds/.env.credentials.local`
- Default plugin: `--use_specified_plugin iscsi`
- Always include `--ignore_force_shutdown` unless explicitly told not to.
- Default config family: `gold`
@@ -58,6 +59,8 @@ This file defines how to operate and maintain the ATVM workspace in `/home/aw/co
- Always show exact planned ATVM commands before execution.
- Never execute setup or automation commands that require approval until the operator explicitly approves them.
- For host-level test detail and failed-test investigation, use `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`, especially `logs/`, `xml/`, and `mochawesome/`.
- If the operator asks for ATVM run status without mentioning Mattermost, respond locally only and do not post externally.
- If the operator asks to send ATVM run status to Mattermost, use `MATTERMOST_ATVM_WEBHOOK` and `MATTERMOST_ATVM_CHANNEL` from `/home/aw/code/cds/.env.credentials.local` by default and send the final status only after the run has fully completed, whether the run passed or failed.
- Treat `docs/automation/examples.md` as reference-only, not default operator intent.
- Put reusable workflow rules in `guide.md` files.
- Put dated lessons only in `run-learnings.md` files.

@@ -174,6 +174,19 @@ When asked for one VM or a VM set:
- If monitoring was not requested, run commands and report execution success/failure and any errors.
- If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.

## Mattermost Status Posting
- Treat a normal ATVM status request as local-only output by default.
- When the operator asks to send ATVM automation run status to Mattermost, use the local defaults from `/home/aw/code/cds/.env.credentials.local`.
- Default Mattermost variables:
  - `MATTERMOST_ATVM_WEBHOOK`
  - `MATTERMOST_ATVM_CHANNEL`
- Treat these as the default destination for ATVM automation run-status posts unless the operator explicitly overrides them.
- Send the final ATVM run status only after the run has fully completed, regardless of whether the run passed or failed.
- Do not send interim or in-progress ATVM run status updates to Mattermost unless the operator explicitly asks for that.
- Use the same ATVM status layout that would be shown to the operator locally when posting to Mattermost.
- Default status template: `/home/aw/code/cds/atvm/docs/automation/status-template.md`
- Do not post to Mattermost unless the operator explicitly asks for the run status to be sent there.
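
A minimal sketch of the explicit send path described by the rules above, assuming the credentials file uses plain `KEY=value` shell syntax and that `curl` and `jq` (1.6 or newer) are installed. The function names and the `CREDS_FILE` override are illustrative and not part of the existing ATVM tooling; `text` and `channel` are fields accepted by a Mattermost incoming webhook.

```bash
# Sketch only: send an already-rendered final status to Mattermost.
# CREDS_FILE is a hypothetical override; the default is the documented path.
CREDS_FILE="${CREDS_FILE:-/home/aw/code/cds/.env.credentials.local}"

# Succeeds only when both default Mattermost variables are non-empty.
mattermost_config_ok() {
  [ -n "${MATTERMOST_ATVM_WEBHOOK:-}" ] && [ -n "${MATTERMOST_ATVM_CHANNEL:-}" ]
}

# post_final_status <status-file>
# Posts the file contents as a single message. Call it only after the run
# has fully completed and only when the operator asked for a Mattermost post.
post_final_status() {
  status_file=$1
  # Load the defaults if the credentials file is readable.
  if [ -r "$CREDS_FILE" ]; then
    . "$CREDS_FILE"
  fi
  if ! mattermost_config_ok; then
    echo "MATTERMOST_ATVM_WEBHOOK or MATTERMOST_ATVM_CHANNEL is not set" >&2
    return 1
  fi
  # jq builds the JSON payload so the Markdown status text is escaped safely.
  jq -n --arg channel "$MATTERMOST_ATVM_CHANNEL" --rawfile text "$status_file" \
    '{channel: $channel, text: $text}' |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$MATTERMOST_ATVM_WEBHOOK"
}
```

A call such as `post_final_status /tmp/atvm-status.md` (a placeholder path) would send the same text that was printed locally, which keeps the Mattermost post and the terminal output on one layout.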

## Status Reporting Format
When the operator asks for the status of an ATVM automation run, report in this order:
1. Heading/title using the run `build_name`.
@@ -193,8 +206,11 @@ When the operator asks for the status of an ATVM automation run, report in this
Status-report expectations:
- Use the same display layout for every ATVM automation status response regardless of test type (`e2e`, `systemOS`, `reboot`, `migrateops`, and others).
- Use `/home/aw/code/cds/atvm/docs/automation/status-template.md` as the default template for both local status output and Mattermost status posts.
- The default ATVM status template uses Markdown tables for `SUMMARY:`, `HOSTS:`, and `TIMING:` and uses `NOTES:` for flat operator-facing notes.
- Treat references to the "ATVM automation run" or "automation run" as referring to this ATVM folder workflow and the automation VM at `192.168.3.190`, not to Cirrus project operations such as the `atvm - cypress` project.
- Treat a status request as a request for live status by default.
- Unless the operator explicitly asks to send the status to Mattermost, print the status only in the local terminal response.
- Use the live automation VM state when available.
- If no automation is currently running, fall back to the most recent historical run artifacts and logs.
- Prefer local automation evidence in this order: active runner processes, live automation-VM files, shell history for the last launch command, then historical reporter artifacts.
@@ -228,3 +244,4 @@ Status-report expectations:
- Use `Notes` for extra context beyond the machine-specific same-line failure description.
- Base the completion estimate on the full remaining machine count and recent per-machine runtime visible in the run log.
- Make the estimate explicitly refer to completion of the entire remaining run, not only the current machine/spec.
- When the operator also asks to send the status to Mattermost, send this same final status output to the configured Mattermost destination only after the run has fully completed.
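
The completion estimate described above is simple arithmetic, sketched here under the assumption that the remaining machines run one after another and that recent per-machine runtimes have already been read from the run log as whole seconds. The function name is illustrative.

```bash
# estimate_remaining_seconds <remaining_machines> <runtime_1> [<runtime_2> ...]
# Prints remaining_machines multiplied by the average of the given runtimes.
# Integer division is used, so the average is truncated to whole seconds.
estimate_remaining_seconds() {
  remaining=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "need at least one recent per-machine runtime" >&2
    return 1
  fi
  total=0
  for runtime in "$@"; do
    total=$((total + runtime))
  done
  average=$((total / $#))
  echo $((remaining * average))
}
```

For example, `estimate_remaining_seconds 4 600 660 540` prints `2400`: the three recent runtimes average 600 seconds, so four remaining machines need roughly 40 minutes for the entire remaining run.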
179
atvm/docs/automation/mattermost-watcher-design.md
Normal file
@@ -0,0 +1,179 @@
# ATVM Mattermost Watcher Design

## Purpose
Design a controller-local watcher on the ATVM Cypress machine (`192.168.3.190`) that monitors an ATVM automation run and posts the final run status to Mattermost only after the run has fully completed.

This watcher must continue working even if the local operator machine is offline.

## Implementation Approach
Use a `systemd`-managed watcher on the ATVM Cypress controller.

Recommended structure:
- one watcher script that evaluates the state of a specific ATVM run
- one `systemd` service to execute the watcher
- optionally one `systemd` timer for periodic polling if the watcher is not implemented as a long-running process

Preferred deployment target:
- controller host: `192.168.3.190`
- ATVM automation root: `/root/cdc-e2e-cyp-12.17.4`

## Mattermost Destination
Use the local credential file in this workspace as the source of defaults:
- `/home/aw/code/cds/.env.credentials.local`

Expected variables:
- `MATTERMOST_ATVM_WEBHOOK`
- `MATTERMOST_ATVM_CHANNEL`

## Run Completion Rule
The watcher must send Mattermost results only after the ATVM run has fully completed.

A run is considered fully completed only when:
- there are no active runner processes for the run
- the expected machine scope has final result artifacts
- no machine remains in `RUNNING` or `NOT STARTED`
- final reporter artifacts confirm the run has ended

Evidence sources:
- live runner processes on `192.168.3.190`
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/`
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/xml/`
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/mochawesome/`
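
The completion rule can be sketched as a single predicate over values the watcher has already collected from the evidence sources. How those values are gathered is left open here; the function name, the argument order, and the extra guard that the expected machine count is greater than zero are assumptions of this sketch.

```bash
# run_fully_completed <active_runners> <expected_hosts> <hosts_with_final_results> \
#                     <hosts_running_or_not_started> <reporter_confirms_end>
# Succeeds only when every condition of the completion rule holds.
run_fully_completed() {
  active_runners=$1    # runner processes still alive for this run
  expected_hosts=$2    # size of the expected machine scope
  hosts_final=$3       # machines that have final result artifacts
  hosts_pending=$4     # machines still RUNNING or NOT STARTED
  reporter_ended=$5    # "yes" when final reporter artifacts confirm the end
  [ "$active_runners" -eq 0 ] &&
    [ "$expected_hosts" -gt 0 ] &&
    [ "$hosts_final" -eq "$expected_hosts" ] &&
    [ "$hosts_pending" -eq 0 ] &&
    [ "$reporter_ended" = "yes" ]
}
```

Keeping the rule in one predicate means a single missing condition, such as one machine without final artifacts, is enough to keep the run out of the completed state.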

## Required Run States
The watcher must distinguish these run-level states:
- `COMPLETED`
- `FAILED`
- `CANCELLED`
- `TERMINATED`
- `HUNG`
- `UNKNOWN`
- `RUNNING`

Definitions:
- `COMPLETED`
  - the run finished normally
  - all machines have final results
  - no run-level failure state blocks completion
- `FAILED`
  - the run finished, but one or more hosts failed
  - this is still a completed run
- `CANCELLED`
  - the run was intentionally cancelled through an explicit cancellation path
- `TERMINATED`
  - the run was manually killed or stopped before normal completion
- `HUNG`
  - the run appears stuck and does not meet completion rules within the expected policy window
- `UNKNOWN`
  - the watcher cannot safely determine the true state
- `RUNNING`
  - the run is still active and not yet complete

## Mattermost Posting Rule
Post to Mattermost only when the run has fully completed.

Send Mattermost status for:
- `COMPLETED`
- `FAILED`

Do not send Mattermost status for:
- `CANCELLED`
- `TERMINATED`
- `HUNG`
- `UNKNOWN`
- `RUNNING`

Important clarification:
- a completed run with failed hosts should still be posted
- a cancelled, terminated, hung, or unknown run should not be posted
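
The posting rule maps directly onto a small decision function. This is a sketch with an illustrative function name; the distinct exit code for an unrecognized state is an assumption so that a typo never results in a post.

```bash
# should_post_to_mattermost <run_state>
# Exit 0: post the final status. Exit 1: suppress the post.
# Exit 2: the state is not one of the documented run-level states.
should_post_to_mattermost() {
  case "$1" in
    COMPLETED|FAILED)
      return 0 ;;
    CANCELLED|TERMINATED|HUNG|UNKNOWN|RUNNING)
      return 1 ;;
    *)
      echo "unrecognized run state: $1" >&2
      return 2 ;;
  esac
}
```

A watcher loop could then use `if should_post_to_mattermost "$state"; then ...; fi` and log the suppression reason in the other branch.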

## Required Cancellation / Termination Handling
If a run is cancelled or terminated, the watcher must:
- detect that the run was cancelled or manually killed
- stop waiting for normal completion
- mark the run as closed without posting final Mattermost status
- prevent any later success/failure post for that same run

## State Tracking Requirements
The watcher must track each monitored run by run id or build name.

For each run, keep durable state such as:
- tracked run id / build name
- controller-side watcher state
- completion marker
- cancellation / termination marker
- Mattermost posted marker
- last observed machine summary
- timestamps for first seen, last seen, closed
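
One possible shape for the durable per-run record is a small `key=value` file that is rewritten on every poll. The sketch below covers only a subset of the fields listed above; the file name pattern, the field names, and the `STATE_DIR` variable are hypothetical.

```bash
# Hypothetical state directory; override STATE_DIR when experimenting.
STATE_DIR="${STATE_DIR:-/var/lib/atvm-run-watcher}"

# record_run_state <run_id> <state> <machine_summary>
# Rewrites <run_id>.state, keeping first_seen from any earlier record.
record_run_state() {
  run_id=$1
  state=$2
  summary=$3
  mkdir -p "$STATE_DIR" || return 1
  state_file="$STATE_DIR/$run_id.state"
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  first_seen=""
  if [ -f "$state_file" ]; then
    first_seen=$(sed -n 's/^first_seen=//p' "$state_file")
  fi
  [ -n "$first_seen" ] || first_seen=$now
  # Write to a temporary file and rename it, so a crash mid-write
  # does not leave a truncated state file behind.
  tmp="$state_file.tmp.$$"
  {
    printf 'run_id=%s\n' "$run_id"
    printf 'state=%s\n' "$state"
    printf 'machine_summary=%s\n' "$summary"
    printf 'first_seen=%s\n' "$first_seen"
    printf 'last_seen=%s\n' "$now"
  } > "$tmp" && mv "$tmp" "$state_file"
}
```

Because the file is plain text, an operator can inspect a run with `cat` and the watcher can log each state transition by comparing the previous `state=` line with the new one.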

## Duplicate-Post Prevention
The watcher must prevent duplicate Mattermost posts.

Required behavior:
- only one final post per run
- if a run is already marked as posted, do not send again
- if a run is marked `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`, do not later convert it into a posted completion unless explicitly reset by an operator workflow
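
A minimal sketch of the required behavior, assuming one marker file per run in a state directory. The marker names (`<run_id>.posted` and `<run_id>.closed-unposted`) and the `STATE_DIR` variable are hypothetical; the sketch relies on the shell's `noclobber` option, which makes a `>` redirection fail when the target file already exists.

```bash
# Hypothetical state directory; override STATE_DIR when experimenting.
STATE_DIR="${STATE_DIR:-/var/lib/atvm-run-watcher}"

# claim_final_post <run_id>
# Succeeds at most once per run id. The caller posts to Mattermost only
# when this function succeeds.
claim_final_post() {
  run_id=$1
  # A run closed as CANCELLED, TERMINATED, HUNG, or UNKNOWN must not be
  # posted later unless an operator removes this marker.
  if [ -e "$STATE_DIR/$run_id.closed-unposted" ]; then
    return 1
  fi
  # With noclobber (set -C) the redirection fails if the marker exists,
  # so a second claim for the same run id is rejected.
  ( set -C; : > "$STATE_DIR/$run_id.posted" ) 2>/dev/null
}
```

Claiming the marker before sending errs on the side of a missed post rather than a duplicate one if the watcher crashes between the claim and the send; the opposite ordering would make the opposite trade-off.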

## Recommended State Files
Use a durable controller-local state directory, for example:
- `/var/lib/atvm-run-watcher/`

Possible contents:
- one state file per run id
- one posted marker per run id
- one cancellation marker per run id
- optional lock file to prevent multiple watcher instances from racing
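
The optional lock can be sketched with a lock directory instead of a lock file, because `mkdir` fails when the directory already exists and so acts as a test-and-set. The function names and the `<run_id>.lock` naming are assumptions of this sketch.

```bash
# Hypothetical state directory; override STATE_DIR when experimenting.
STATE_DIR="${STATE_DIR:-/var/lib/atvm-run-watcher}"

# acquire_run_lock <run_id>
# Succeeds only for the first watcher instance that asks for this run.
acquire_run_lock() {
  mkdir -p "$STATE_DIR" || return 1
  # No -p here: mkdir must fail if another instance already holds the lock.
  mkdir "$STATE_DIR/$1.lock" 2>/dev/null
}

# release_run_lock <run_id>
release_run_lock() {
  rmdir "$STATE_DIR/$1.lock"
}
```

A watcher that crashes while holding the lock leaves the directory behind, so an operator workflow for clearing stale locks would be needed alongside this approach.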

## Recommended Operator Workflow
Normal completion workflow:
1. ATVM run starts.
2. Watcher tracks the run id / build name.
3. Watcher polls run state and artifacts.
4. Run fully completes.
5. Watcher builds final status summary.
6. Watcher posts final status to Mattermost once.
7. Watcher marks the run as posted and closed.

Cancellation / termination workflow:
1. Operator stops the ATVM run.
2. Watcher detects cancellation / termination, or an explicit cancellation marker is written.
3. Watcher marks the run `CANCELLED` or `TERMINATED`.
4. Watcher exits cleanly without posting to Mattermost.
5. Watcher prevents later duplicate or misleading final-post behavior.

## Failure Semantics
Host-level failures do not suppress Mattermost posting.

If:
- the run has fully completed
- and one or more hosts failed

Then:
- final Mattermost status should still be sent
- final run-level state should be `FAILED`, which still counts as a completed run

## Hang / Unknown Semantics
If the run cannot be safely classified as completed, failed, cancelled, or terminated:
- classify it as `HUNG` or `UNKNOWN`
- do not post to Mattermost
- require operator review

## Logging Requirements
The watcher should log:
- the run id / build name being monitored
- each state transition
- posting decisions
- reasons for suppressing a Mattermost post
- duplicate-post prevention decisions
- final closed state

## Summary
This watcher design must satisfy all of the following:
- run on the ATVM Cypress controller
- survive local operator machine downtime
- use `systemd`
- distinguish run states clearly
- send Mattermost only after full completion
- send completion results whether hosts passed or failed
- never send Mattermost for cancelled, terminated, hung, or unknown runs
- prevent duplicate or misleading posts
62
atvm/docs/automation/status-template.md
Normal file
@@ -0,0 +1,62 @@
# ATVM Status Template

Use this as the default ATVM automation run-status template for:
- local status responses in the terminal
- Mattermost status posts after a completed run

## Layout

```md
## ATVM Run Status
### <build_name>

**SUMMARY:**

| Metric | Value |
|---|---:|
| finished | <n> |
| passed | <n> |
| failed | <n> |
| skipped | <n> |

**HOSTS:**

| Host | Status | Detail |
|---|---|---|
| <host-name> | ✅ PASS | completed |
| <host-name> | ⚠️ FAIL | <useful failure description> |
| <host-name> | ⏳ RUN | in progress |
| <host-name> | ⏭️ SKIP | <skip reason> |

**TIMING:**

| Metric | Value |
|---|---|
| start | <start time> |
| end | <end time or n/a> |
| total | <total or elapsed runtime> |
| quickest | <host> - <runtime> or n/a |
| longest | <host> - <runtime> or n/a |
| average | <runtime> or n/a |

**NOTES:**
- <note>
- <note>
```

## Rules
- Keep `SUMMARY:`, `HOSTS:`, `TIMING:`, and `NOTES:` in that order.
- Use the title format:
  - `## ATVM Run Status`
  - `### <build_name>`
- Use Markdown tables for `SUMMARY:`, `HOSTS:`, and `TIMING:`.
- Use one host per row in the `HOSTS:` section.
- For completed hosts, prefer:
  - `✅ PASS`
  - `⚠️ FAIL`
- For in-progress or skipped hosts, use:
  - `⏳ RUN`
  - `⏭️ SKIP`
- Keep `Detail` concise.
- Put broader context under `NOTES:`, not in the host table.
- Use the same template for Mattermost and local operator-visible status output.
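
As an illustration of filling in the template, the sketch below renders the `SUMMARY:` table and a single `HOSTS:` row with `printf`. The function names and the short status codes passed as arguments are assumptions of the sketch; the emitted labels are the ones defined in the rules above.

```bash
# render_summary <finished> <passed> <failed> <skipped>
render_summary() {
  printf '**SUMMARY:**\n\n'
  printf '| Metric | Value |\n'
  printf '|---|---:|\n'
  printf '| finished | %s |\n' "$1"
  printf '| passed | %s |\n' "$2"
  printf '| failed | %s |\n' "$3"
  printf '| skipped | %s |\n' "$4"
}

# render_host_row <host> <PASS|FAIL|RUN|SKIP> <detail>
# Prints one table row; unknown status codes are rejected.
render_host_row() {
  case "$2" in
    PASS) label='✅ PASS' ;;
    FAIL) label='⚠️ FAIL' ;;
    RUN)  label='⏳ RUN' ;;
    SKIP) label='⏭️ SKIP' ;;
    *) echo "unknown host status: $2" >&2; return 1 ;;
  esac
  printf '| %s | %s | %s |\n' "$1" "$label" "$3"
}
```

Generating rows through one function keeps the status labels identical between the local output and any Mattermost post built from the same text.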
188
atvm/docs/workflow/secrets-migration-and-cleanup.md
Normal file
@@ -0,0 +1,188 @@
# Secrets Migration And Cleanup

## Purpose
This document explains:
- whether the workspace can be cleaned up to stop storing credentials and tokens in tracked files
- how `.env.credentials.local` should be used
- what has to happen to remove already-committed secrets from git history and the remote repository

## 1. Can the workspace be cleaned up to stop referencing raw secrets in tracked files?
Yes.

The intended cleanup is:
- remove hardcoded credentials, API tokens, webhook URLs, and similar secrets from tracked docs and files
- replace those values with references to `/home/aw/code/cds/.env.credentials.local`
- keep only non-secret metadata in tracked files, such as:
  - hostnames
  - IPs
  - usernames when acceptable
  - variable names
  - usage instructions

Examples of what tracked docs should say instead of storing raw values:
- `Use ATVM_CONTROLLER_PASSWORD from /home/aw/code/cds/.env.credentials.local`
- `Use VCENTER_USER and VCENTER_PASSWORD from /home/aw/code/cds/.env.credentials.local`
- `Use MATTERMOST_ATVM_WEBHOOK from /home/aw/code/cds/.env.credentials.local`

Recommended scope of cleanup:
- `atvm/inventory/accounts-and-credentials.md`
- `atvm/inventory/infrastructure.md`
- any other tracked docs or scripts that contain:
  - passwords
  - API tokens
  - TOTP secrets
  - webhook URLs
  - install codes or secrets that should not remain in git

## 2. What do I need to do for the assistant to use `.env.credentials.local`?
The file exists on disk, but the assistant does not automatically import shell environment files unless one of the following is done:

### Option A: Explicitly source it in the shell session
Example:

```bash
source /home/aw/code/cds/.env.credentials.local
```

This is the simplest and most reliable option for interactive terminal work.

### Option B: Scripts explicitly read it
A script can do:

```bash
source /home/aw/code/cds/.env.credentials.local
```

before using any secret-backed variables.

### Option C: The workflow documentation tells the assistant to load it
The workspace docs can instruct the assistant to use `/home/aw/code/cds/.env.credentials.local` when credentials are required, but the assistant still needs an execution path that actually loads those variables into the shell or reads them directly from the file.

## Practical rule
If you want the assistant to reliably use these values during execution, the safest approach is:
- either explicitly source the file first
- or instruct the assistant to source it as part of the command/script it runs

## Important limitation
The existence of `.env.credentials.local` does not automatically make every shell command aware of those variables.

The assistant needs one of these:
- the current shell environment already contains the exported variables
- the command explicitly sources the file
- the script being executed explicitly sources the file
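
One detail worth a sketch: sourcing a file of plain `KEY=value` lines sets shell variables but does not export them, so child processes that read their environment would not see the values. Assuming the credentials file has that plain form (this document does not show its contents), a loader can turn on auto-export while reading it. The function name is illustrative; `.` is the POSIX spelling of `source`.

```bash
# load_credentials [file]
# Reads the file in the current shell and exports every variable it assigns.
load_credentials() {
  creds_file=${1:-/home/aw/code/cds/.env.credentials.local}
  if [ ! -r "$creds_file" ]; then
    echo "cannot read $creds_file" >&2
    return 1
  fi
  set -a            # auto-export variables assigned from here on
  . "$creds_file"
  set +a            # restore the default behavior
}
```

After `load_credentials`, both the current shell and any command it launches can read the variables by name, which satisfies the first item in the list above.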

## 3. What do I need to do if secrets were already committed and pushed to the remote repository?
If secrets were already committed to git history and pushed, `.gitignore` does not fix that.

You need to treat those secrets as exposed.

## Required response
Do these in this order:

### Step 1: Rotate or revoke the exposed secrets
This is the most important step.

Examples:
- regenerate Mattermost webhook URLs
- replace API tokens
- rotate passwords
- regenerate TOTP/shared secrets if applicable
- replace any service registration or install tokens that should be considered exposed

Even if you later remove them from git history, assume they were already copied.

### Step 2: Remove secrets from the current tracked files
Edit the tracked docs and scripts so they no longer contain raw secrets.

Replace them with:
- references to `.env.credentials.local`
- redacted placeholders
- variable names

### Step 3: Rewrite git history to remove the secrets from all commits
This is a history-rewrite operation.

Typical tools:
- `git filter-repo` (preferred)
- BFG Repo-Cleaner

High-level workflow:
1. identify all tracked files and literal secrets that must be removed
2. rewrite repository history to remove or replace them
3. verify the secrets no longer exist in any commit
4. force-push the rewritten history to the remote
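
The first three workflow steps could look roughly like the sketch below, assuming `git filter-repo` is installed and the commands run inside a fresh clone. The helper names and file paths are hypothetical, and nothing here should run before the exact commands have been reviewed. `--replace-text` rewrites file contents only, and `git filter-repo` removes the `origin` remote after rewriting, so the remote has to be added again before any push.

```bash
# write_replacements <literals-file> <expressions-file>
# Converts one literal secret per line into the "old==>new" lines that
# git filter-repo --replace-text reads. Blank lines are skipped.
# A secret that itself contains "==>" would need manual handling.
write_replacements() {
  while IFS= read -r secret; do
    if [ -n "$secret" ]; then
      printf '%s==>***REMOVED***\n' "$secret"
    fi
  done < "$1" > "$2"
}

# rewrite_history <expressions-file>
# Destructive: rewrites every commit that contains a listed literal.
rewrite_history() {
  git filter-repo --replace-text "$1"
}

# secret_in_history <literal>
# Succeeds if any commit on any ref adds or removes the literal.
secret_in_history() {
  [ -n "$(git log --all --oneline -S"$1")" ]
}
```

After a rewrite, `secret_in_history` should fail for every literal in the list; any literal that is still found means the expressions file missed a variant of that secret.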

### Step 4: Force-push the cleaned history
After rewriting history, the remote must be updated with a force push.

That usually means:
- `git push --force-with-lease origin <branch>`

### Step 5: Coordinate with anyone else using the repo
Anyone with an old clone will still have the old history unless they reset or reclone.

They need instructions to:
- stop using the old history
- fetch the rewritten branch
- hard reset or reclone as appropriate

## Important caution about remote cleanup
Cleaning the git remote history does not guarantee that every copy is gone.

Secrets may still exist in:
- old clones
- forks
- CI logs
- code review systems
- backups
- screenshots or pasted chat logs

That is why secret rotation must happen first.

## Recommended cleanup policy for this workspace
For this workspace, the correct policy should be:
- keep real secrets only in `/home/aw/code/cds/.env.credentials.local`
- keep that file gitignored
- remove raw secrets from tracked docs
- document variable names and usage instead of values
- rotate any secrets that were ever committed
- rewrite history if the repository should no longer retain those secret values

## Proposed next implementation work
When approved, the cleanup work would likely be:
1. inventory all tracked files containing secrets
2. patch those files to reference `.env.credentials.local`
3. update docs so the credential source is explicit
4. prepare a history-rewrite plan
5. prepare exact git commands for review before any destructive git action

## Git-history cleanup note
History rewriting is disruptive and should not be done casually.

Before doing it, prepare:
- the list of files and secrets to purge
- the exact rewrite tool and command
- the exact verification commands
- the exact force-push command
- the operator communication plan for other users of the repo

## Summary
Answers to the three direct questions:

### Question 1
Yes, the workspace can be cleaned up to stop storing secrets in tracked files and instead reference `/home/aw/code/cds/.env.credentials.local`.

### Question 2
To have the assistant reliably use `.env.credentials.local`, either:
- explicitly source it
- or ensure the script/command being run sources it

The assistant does not automatically inherit its contents just because the file exists.

### Question 3
If secrets were already committed and pushed:
- rotate them first
- remove them from current files
- rewrite git history
- force-push the cleaned history
- coordinate with anyone else who has a clone