Reorganize ATVM workspace into scripts, docs, inventory, and archive

Restructure the ATVM folder to separate executable scripts from workflow documentation and long-form environment reference material. Move setup and automation scripts into scripts/, move setup and automation guides into docs/, add top-level README and workflow conventions, and organize durable environment details into inventory/ while preserving the original long-form ATVM notes under archive/imported-notes/. Update internal documentation paths to match the new layout and remove the archived Zone.Identifier metadata file.
2026-03-21 20:39:23 -04:00
parent 08b2ab3104
commit 274b920b40
17 changed files with 332 additions and 191 deletions
--- a/atvm/docs/automation/examples.md
+++ b/atvm/docs/automation/examples.md
@@ -0,0 +1,85 @@
+## Examples
+
+### E2E: Pure iscsi+fc with specific VMs
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --test_partition --integration_type pure --use_specified_plugin both --specify_vms atvm3-ubuntu18.04 atvm109-w2k12R2; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-e2e-pure-plugin
+```
+
+### E2E: Infinibox fc with specific VMs
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --test_partition --integration_type infinibox --use_specified_plugin fc --specify_vms atvm51-redhat6.10 atvm110-w2k16; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-e2e-infinibox-plugin
+```
+
+### E2E: Regular cutover
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --test_partition --integration_type pure --use_specified_plugin fc --specify_vms atvm93-oracle7.9 atvm111-w2k19 --regular_cutover; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-e2e-regular-cutover
+```
+
+### Reboot test
+```bash
+python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm37-rocky8.8 atvm112-w2k22 --wait_for_power_on 120; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-reboot
+```
+
+### SystemOS test
+```bash
+python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --specify_vms atvm118-oracle9.3 atvm145-w2k25; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-systemOS
+```
+
+### MigrateOPS test
+```bash
+python3 cmc-templates.py --template cmc-migrateops --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm139-redhat9.5 atvm112-w2k22; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-migrateOPS
+```
+
+### Compute MigrateOPS: vmware
+```bash
+python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --vm_platforms vmware --test_partition --specify_vms atvm138-oracle9.4-opt atvm112-w2k22 --set_static_ip_dest; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-computeMigrateOPS-vmware
+```
+
+### Compute MigrateOPS: ovirt
+```bash
+python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --vm_platforms ovirt --test_partition --specify_vms atvm124-redhat8.8 atvm111-w2k19 --set_static_ip_dest; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-computeMigrateOPS-ovirt
+```
+
+### Group consistency
+```bash
+python3 cmc-templates.py --template cmc-group-consistency --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm4-ubuntu20.04 atvm112-w2k22 --enable_uuid; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-consistencyGroup
+```
+
+### H2H same platform
+```bash
+python3 cmc-templates.py --template cmc-h2h-same-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm38-rocky9.0 atvm112-w2k22; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-h2hSamePlatform
+```
+
+### H2H different platform
+```bash
+python3 cmc-templates.py --template cmc-h2h-diff-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm65-redhat8.3 atvm112-w2k22; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name nightly-h2hDifferentPlatform
+```
+
+### Randomized reboot sanity
+```bash
+python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --test_partition --integration_type pure --use_specified_plugin fc --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 --wait_for_power_on 120; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name sanity-reboot-iscsi
+```
+
+### Randomized e2e sanity
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --test_partition --integration_type pure --use_specified_plugin both --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name sanity-e2e
+```
+
+### Randomized systemOS sanity
+```bash
+python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config-gold.ts --randomize 1 --exclude_partial_match suse15.0 fedora34 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name sanity-systemOS
+```
--- a/atvm/docs/automation/guide.md
+++ b/atvm/docs/automation/guide.md
@@ -0,0 +1,216 @@
+# Run ATVM Automation Guide
+
+This file is guide-only documentation for operating ATVM CMC automation.
+Do not put specific run examples here.
+For reusable command examples and common option combinations, use `examples.md`.
+Treat `examples.md` as reference-only.
+Do not assume the operator wants the extra options shown in examples unless they explicitly request them.
+
+## Purpose
+Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.
+
+## ATVM Cypress Automation Controller Client
+- Hostname: `atvm-cypres-vm-1`
+- IP: `192.168.3.190`
+- Credentials: `root / atvmcdsi2012`
+
+## ATVM Target Host Default
+- Treat `192.168.3.191` as the default ATVM target host reference.
+- For SSH to `192.168.3.191`, ignore host key mismatch by default with `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null`.
+- For SSH to `192.168.3.191`, use default credentials `root / cdsi2012` unless the operator explicitly overrides them.
+
+## Operating Constraints
+- Run only scripts/commands explicitly requested.
+- Do not make manual system configuration changes on the client.
+- Do not edit client files unless explicitly requested.
+
+## Operator Preferences
+- Do not include Gold Disk identifiers in `--build_name`.
+- `--build_name` must not contain spaces; use `-` between words.
+- For multiple VMs in the same distro, use distro-scoped filtering (`--containsVm`) instead of long explicit VM lists.
+- Always include `--ignore_force_shutdown` on `cmc-templates.py` commands unless the operator explicitly asks not to.
+- Default to `--use_specified_plugin iscsi` unless the operator explicitly requests a different plugin.
+- Before preparing a new run, always check whether automation is already running.
+- Always report whether automation is currently running.
+- If running, ask whether to terminate; terminate only with explicit approval.
+- After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
+- Before any run, always show exact planned command(s) exactly as they will be executed and wait for explicit approval.
+- Never execute `cmc-templates.py`, `run-sorry-cypress.py`, or any other ATVM run command until the operator explicitly approves the displayed command(s).
+- Approval is required even for preparation-only steps such as template generation.
+- If the operator changes any part of the request after commands are displayed, rebuild the commands, show the updated commands, and wait for fresh approval before executing anything.
+- Execute only after explicit approval (for example `approve`).
+- After execution, report immediate success/failure only.
+- Do not actively monitor completion unless explicitly requested.
+- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
+- Report command errors immediately.
+- `sshpass` may be used where password-based SSH automation is required.
+
+## Core Scripts
+- Template prep: `/root/cdc-e2e-cyp-12.17.4/cmc-templates.py`
+- Test execution: `./run-sorry-cypress.py`
+
+Typical sequence:
+1. Build the exact `cmc-templates.py` and `run-sorry-cypress.py` commands for the request.
+2. Show those exact commands to the operator.
+3. Wait for explicit approval.
+4. Run `cmc-templates.py` with the approved options.
+5. Wait for `cmc-templates.py` to fully finish and confirm success.
+6. Run `run-sorry-cypress.py` with the matching approved config and build name.
+
+## Config File / Gold Disk Mapping
+- `cypress.atvm-config-gold.ts` -> Gold Disk 1
+- `cypress.atvm-config-gold-2.ts` -> Gold Disk 2
+- Additional numbered config variants map to corresponding Gold Disks.
+- Do not default to `cypress.atvm-config.ts`.
+- Unless the operator explicitly requests another config, use a config file with `gold` in the filename.
+- If the operator-specified config file is missing, stop immediately and report the missing file.
+- Do not search for substitute ATVM config files and do not switch to another config unless the operator explicitly instructs it.
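A minimal sketch of the mapping and the fail-fast rule above, assuming the numbered variants follow the `cypress.atvm-config-gold-<N>.ts` naming shown for Gold Disk 2. The helper names `gold_disk_for` and `require_config` are hypothetical and are not part of the ATVM scripts.

```python
import re
from pathlib import Path

# Assumption: numbered variants are named cypress.atvm-config-gold-<N>.ts.
GOLD_CONFIG_RE = re.compile(r"^cypress\.atvm-config-gold(?:-(\d+))?\.ts$")


def gold_disk_for(config_file):
    """Return the Gold Disk number for a gold config filename, or None."""
    match = GOLD_CONFIG_RE.match(config_file)
    if match is None:
        return None  # not a gold-named config, e.g. cypress.atvm-config.ts
    return int(match.group(1)) if match.group(1) else 1


def require_config(config_file, base_dir="."):
    """Stop immediately if the requested config is missing; never substitute."""
    path = Path(base_dir) / config_file
    if not path.is_file():
        raise FileNotFoundError(
            f"Requested ATVM config file is missing: {config_file}"
        )
    return path
```

The check deliberately has no fallback branch: a missing file surfaces as an error for the operator to act on.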
+
+## Available Templates
+- `cmc-e2e`
+- `cmc-group-consistency`
+- `cmc-h2h-diff-platf`
+- `cmc-h2h-same-platf`
+- `cmc-migrateops`
+- `cmc-migrateops-compute-migration`
+- `cmc-reboot`
+- `cmc-systemOS`
+
+## Command Pattern
+```bash
+python3 cmc-templates.py --template <template> --ignore_force_shutdown --config_file_path ./<config-file> --use_specified_plugin iscsi [template options or explicit plugin override...]; \
+python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces> [--categorize]
+```
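The command pattern and the operator defaults can be sketched as a small builder that only produces display-ready strings for the approval step and never executes anything. The function names are hypothetical, and treating any `gold` substring as a Gold Disk identifier is an assumption; the flags themselves are the ones used in this guide.

```python
import shlex

DEFAULT_CONFIG = "cypress.atvm-config-gold.ts"


def validate_build_name(build_name):
    """Reject build names that break the guide's naming rules."""
    if not build_name or any(ch.isspace() for ch in build_name):
        raise ValueError("build_name must be non-empty with no spaces; use '-'")
    # Assumption: a Gold Disk identifier is any 'gold' substring.
    if "gold" in build_name.lower():
        raise ValueError("build_name must not contain Gold Disk identifiers")


def build_commands(template, build_name, config_file=DEFAULT_CONFIG,
                   plugin="iscsi", extra_template_args=(), categorize=False):
    """Return (template command, runner command) as strings for review."""
    validate_build_name(build_name)
    template_cmd = [
        "python3", "cmc-templates.py",
        "--template", template,
        "--ignore_force_shutdown",
        "--config_file_path", f"./{config_file}",
    ]
    if plugin is not None:  # pass plugin=None when the template has no plugin selection
        template_cmd += ["--use_specified_plugin", plugin]
    template_cmd += list(extra_template_args)

    runner_cmd = [
        "python3", "./run-sorry-cypress.py",
        "--config_file", config_file,
        "--build_name", build_name,
    ]
    if categorize:
        runner_cmd.append("--categorize")
    return shlex.join(template_cmd), shlex.join(runner_cmd)
```

For example, `build_commands("cmc-e2e", "nightly-e2e", extra_template_args=["--containsVm", "redhat9"])` yields the two command strings to show the operator before any run.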
+
+## Examples Reference
+- Commonly used command examples: `examples.md`
+- Keep this guide focused on run-control rules and workflow constraints.
+- Use examples as reference material only, not as default intent for new operator requests.
+- Keep `examples.md` limited to reusable example commands; keep workflow rules, defaults, blacklist policy, and reporting rules in this guide or `run-learnings.md`.
+
+## Example Option Patterns (Guide-Only)
+- Distro-scoped VM selection:
+  - `--containsVm redhat`
+  - `--containsVm redhat9`
+- Explicit VM selection:
+  - `--specify_vms <vm1> <vm2> ...`
+- Compute migrateops platform:
+  - `--vm_platforms vmware|ovirt|openshift|proxmox`
+
+## Blacklisted Machines
+Always exclude these machines from ATVM automation runs by adding them to `--exclude_partial_match`.
+
+Permanently blacklisted because CMC cannot compile:
+- `atvm6-centos6.0`
+- `atvm41-redhat6.0`
+- `atvm73-oracle6.0`
+
+Temporarily blacklisted because the run crashes when creating a migration session:
+- `atvm144-suse15.0`
+
+Temporarily blacklisted while support requests are waiting:
+- `atvm113-debian9.0.0`
+- `atvm115-debian9.1.0`
+- `atvm116-debian9.2.0`
+
+Temporarily blacklisted because re-creation might be needed:
+- `atvm156-debian9.3.0`
+
+Preferred exclude list:
+- `--exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0`
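One way to keep the blacklist and the preferred exclude list in sync is to derive the flag from a single table. This is a sketch only: the machine names and reasons are copied from the section above, while the dict layout and the helper names are assumptions.

```python
# Reasons and machines copied from the blacklist section; layout is illustrative.
BLACKLIST = {
    "CMC cannot compile": [
        "atvm6-centos6.0", "atvm41-redhat6.0", "atvm73-oracle6.0",
    ],
    "run crashes when creating a migration session": [
        "atvm144-suse15.0",
    ],
    "support requests are waiting": [
        "atvm113-debian9.0.0", "atvm115-debian9.1.0", "atvm116-debian9.2.0",
    ],
    "re-creation might be needed": [
        "atvm156-debian9.3.0",
    ],
}


def exclude_args():
    """Return the --exclude_partial_match flag followed by every blacklisted machine."""
    machines = [m for group in BLACKLIST.values() for m in group]
    return ["--exclude_partial_match"] + machines


def skip_reason(machine):
    """Return the blacklist reason for a machine, or None if it is not blacklisted."""
    for reason, machines in BLACKLIST.items():
        if machine in machines:
            return reason
    return None
```

`" ".join(exclude_args())` reproduces the preferred exclude list, and `skip_reason` gives the text to show next to a skipped machine.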
+
+## Running-Automation Check (Mandatory)
+Before any new automation request:
+1. SSH to `root@192.168.3.190`.
+2. Check for active automation processes (for example `run-sorry-cypress.py`, `cmc-templates.py`, and related Cypress runners).
+3. Report:
+   - `Running` with process details, or
+   - `Not running`.
+4. If `Running`, ask the operator whether to terminate.
+5. If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
+6. If termination is not approved, do not start a new run.
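Steps 1 to 3 could be sketched as below, assuming the process list comes from `ps -eo pid,args` on the automation VM. The script names are from this guide; matching any command line that contains `cypress` as a "related Cypress runner" is an assumption, and authentication (for example via `sshpass`) is left out.

```python
import subprocess

# Script names from the guide; the bare 'cypress' pattern is an assumption.
PATTERNS = ("run-sorry-cypress.py", "cmc-templates.py", "cypress")


def fetch_ps_output(host="root@192.168.3.190"):
    """Return the remote process list (authentication is not handled here)."""
    result = subprocess.run(
        ["ssh", host, "ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_automation_processes(ps_output):
    """Return (pid, command line) pairs that look like automation processes."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header or blank line
        pid, args = parts
        if any(pattern in args for pattern in PATTERNS):
            matches.append((int(pid), args))
    return matches


def report(matches):
    """Render the Running / Not running report from step 3."""
    if not matches:
        return "Not running"
    return "\n".join(["Running"] + [f"  {pid}  {args}" for pid, args in matches])
```

The sketch stops at reporting; termination stays a separate, operator-approved action.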
+
+## Approval Workflow (Mandatory)
+1. Build exact command(s) for the request.
+2. Present them verbatim as planned commands before running anything.
+3. Wait for explicit approval.
+4. Run only the approved command(s), with no extra options and no silent substitutions.
+5. When both template generation and the Cypress runner are requested, run them sequentially, not in parallel.
+6. Do not launch `run-sorry-cypress.py` until `cmc-templates.py` has exited successfully and finished updating the intended config/spec files.
+7. Treat displayed commands as a review gate: do not execute either command until the operator has had a chance to review them and explicitly approve.
+8. If the operator asks to change plugin, config, filters, build name, Gold Disk, or scope after commands are shown, discard the old plan, show the revised commands, and wait for new approval.
+9. If monitoring was not requested, report immediate success/failure for each command.
+10. If monitoring was requested, keep monitoring until completion and report final outcome.
+
+## Requested Test Style
+When asked for one VM or a VM set:
+- choose requested template/options,
+- choose correct config file for intended Gold Disk,
+- default to a config filename containing `gold` unless the operator explicitly says otherwise,
+- always include `--ignore_force_shutdown` on the template-generation command unless the operator explicitly overrides that default,
+- default to `--use_specified_plugin iscsi` unless the operator explicitly requests another plugin or the template does not use plugin selection,
+- use a descriptive `--build_name` without Gold Disk IDs.
+
+## Update Rule
+- After each run, update this guide only for workflow/rule/default changes.
+- Update `examples.md` for reusable command/option examples.
+- Add run-specific learnings only to `run-learnings.md` when the run produced new information.
+
+## Monitoring Policy
+- Monitor only when the operator explicitly asks to monitor.
+- If monitoring was not requested, run commands and report execution success/failure and any errors.
+- If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.
+
+## Status Reporting Format
+When the operator asks for the status of an ATVM automation run, report in this order:
+1. Heading/title using the run `build_name`.
+2. Completed machines with machine name first and status second for each machine.
+3. Notes.
+4. Skipped machines with reason.
+5. Remaining machines still to run.
+6. Summary counts for finished, passed, failed, and skipped machines.
+7. Timing details:
+   - start time
+   - end time if complete
+   - total run time if complete, or elapsed run time if still running
+   - quickest completed test runtime
+   - longest completed test runtime
+   - average completed test runtime
+8. Estimated completion time.
+
+Status-report expectations:
+- Use the same display layout for every ATVM automation status response regardless of test type (`e2e`, `systemOS`, `reboot`, `migrateops`, and others).
+- Treat references to the "ATVM automation run" or "automation run" as referring to this ATVM folder workflow and the automation VM at `192.168.3.190`, not to Cirrus project operations such as the `atvm - cypress` project.
+- Treat a status request as a request for live status by default.
+- Use the live automation VM state when available.
+- If no automation is currently running, fall back to the most recent historical run artifacts and logs.
+- Prefer local automation evidence in this order: active runner processes, live automation-VM files, shell history for the last launch command, then historical reporter artifacts.
+- Derive the heading/title from the run `build_name` when available.
+- Format every machine entry as `machine-name - STATUS`.
+- Put each machine on its own line; never combine multiple machines into one paragraph or comma-separated line.
+- Use a separate `Notes` section for failure reasons, anomalies, or operator-relevant context rather than cramming those details into the completed-machine list.
+- For categorized runs, reconstruct the whole run across all category batches; do not treat the current live category batch as the full run scope.
+- For categorized runs with no active automation, reconstruct the status from the full historical run across all category batches, not only the most recent category batch.
+- Always report the status of the entire requested run, even when the runner split execution into multiple category batches or cloud sub-runs.
+- Derive completed-machine status from completed spec results already written during the same run.
+- Parse all same-run `test-result-*.xml` files, not only machine-named `test-result-atvm*.xml` files.
+- When XML filenames are hash-named, extract the machine name from XML contents such as `testsuite file=`, `testsuite name=`, or `testcase name=`.
+- Ignore `check-xml-files.ts` XML outputs when counting machine completion because they are bookkeeping steps, not machine runs.
+- When multiple same-run XML files exist for one machine, use the most recently written XML for that machine.
+- Include the run start time in every status response when it can be derived from the run log.
+- If the run is complete, include the end time and total run time.
+- If the run is still active, include the elapsed run time so far.
+- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
+- Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
+- For skipped machines, include the reason category:
+  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`
+  - `BLACKLISTED: SUPPORT REQUEST - WAITING`
+  - `BLACKLISTED: RE-CREATE MIGHT BE NEEDED`
+  - `BLACKLISTED: RE-CREATE NEEDED`
+- If a machine is currently in progress, show it under remaining machines as `RUNNING`.
+- If a machine has not started yet, show it under remaining machines as `NOT STARTED`.
+- If no failures are present in completed spec results, report those completed machines as `PASS`.
+- If a completed spec result shows a failure, report that machine as `FAIL` in the completed list and append a longer same-line failure description when the extra detail is useful to the operator.
+- Use `Notes` for extra context beyond the machine-specific same-line failure description.
+- Base the completion estimate on the full remaining machine count and recent per-machine runtime visible in the run log.
+- Make the estimate explicitly refer to completion of the entire remaining run, not only the current machine/spec.
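The reporting order and the whole-run estimate can be illustrated with a small formatter. This is a sketch under stated assumptions: the section labels (`Completed:`, `Notes:`, and so on) are illustrative, runtimes are given in minutes, and the estimate multiplies the remaining machine count by the mean completed runtime, which simplifies the "recent per-machine runtime" rule.

```python
from statistics import mean


def format_status(build_name, completed, notes, skipped, remaining,
                  runtimes_min, start_time, end_time=None, run_time=None):
    """Render a status report in the section order listed in the guide.

    completed: list of (machine, "PASS" | "FAIL", failure description or None)
    skipped:   list of (machine, reason category)
    remaining: list of (machine, "RUNNING" | "NOT STARTED")
    """
    out = [build_name, "", "Completed:"]
    for machine, status, detail in completed:
        line = f"{machine} - {status}"
        if status == "FAIL" and detail:
            line += f" - {detail}"
        out.append(line)
    out += ["", "Notes:"] + [f"- {note}" for note in notes]
    out += ["", "Skipped:"] + [f"{m} - {reason}" for m, reason in skipped]
    out += ["", "Remaining:"] + [f"{m} - {state}" for m, state in remaining]

    passed = sum(1 for _, status, _ in completed if status == "PASS")
    failed = sum(1 for _, status, _ in completed if status == "FAIL")
    out += ["", "Summary:",
            f"finished={len(completed)} passed={passed} "
            f"failed={failed} skipped={len(skipped)}"]

    out += ["", "Timing:", f"start: {start_time}"]
    if end_time:
        out.append(f"end: {end_time}")
    if run_time:
        out.append(f"{'total' if end_time else 'elapsed'}: {run_time}")
    if runtimes_min:
        out.append(f"quickest: {min(runtimes_min):.1f} min")
        out.append(f"longest: {max(runtimes_min):.1f} min")
        out.append(f"average: {mean(runtimes_min):.1f} min")

    out.append("")
    if not remaining:
        out.append("Estimated completion: run complete")
    elif runtimes_min:
        eta = len(remaining) * mean(runtimes_min)
        out.append("Estimated completion of the entire remaining run: "
                   f"about {eta:.0f} min")
    else:
        out.append("Estimated completion: not enough data")
    return "\n".join(out)
```

Every machine lands on its own line as `machine-name - STATUS`, and the estimate line names the entire remaining run rather than the current machine.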
--- a/atvm/docs/automation/run-learnings.md
+++ b/atvm/docs/automation/run-learnings.md
@@ -0,0 +1,176 @@
+# ATVM Automation Run Learnings
+
+This file stores run-specific examples only when a run produced a new learning relevant to future automation tasks.
+
+## Entry Rule
+- Add an entry only when a run changed workflow behavior, exposed a failure mode, or confirmed a required new check.
+- Do not add routine runs with no new learning.
+
+## Current State
+- No run-learning entries recorded yet from `guide.md` source material.
+
+## Run Learning: 2026-03-08 (E2E redhat9.7, pure/fc)
+- Request:
+  - template: `cmc-e2e`
+  - filter: `--containsVm redhat9.7`
+  - integration: `--integration_type pure`
+  - plugin: `--use_specified_plugin fc`
+- Observed result:
+  - Cypress spec execution passed (`1` test, `1` passing, `0` failing).
+  - Cloud run URL was produced and marked uploaded.
+  - `run-sorry-cypress.py` remained running afterward with a defunct `npm exec cypress-cloud` child process and did not exit cleanly on its own.
+- Action for future runs:
+  - If pass/upload is confirmed but `run-sorry-cypress.py` does not exit, treat it as a runner hang condition.
+  - Capture run URL and pass/fail status first, then terminate the stuck runner process cleanly.
+
+## Run Learning: 2026-03-09 (Blacklist handling and status format)
+- Observed requirement:
+  - Some ATVM machines must be skipped even when a broad selector such as `--containsVm` or `--randomize` would otherwise include them.
+- Machines to blacklist via `--exclude_partial_match`:
+  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`:
+    - `atvm6-centos6.0`
+    - `atvm41-redhat6.0`
+    - `atvm73-oracle6.0`
+  - `BLACKLISTED: SUPPORT REQUEST - WAITING`:
+    - `atvm113-debian9.0.0`
+    - `atvm115-debian9.1.0`
+    - `atvm116-debian9.2.0`
+  - `BLACKLISTED: RE-CREATE MIGHT BE NEEDED`:
+    - `atvm156-debian9.3.0`
+- Action for future runs:
+  - Add these machine names to `--exclude_partial_match` when building broad-scope automation commands.
+  - When reporting run status, include skipped blacklisted machines separately with their reason, in addition to completed and remaining machines.
+  - Use the run `build_name` as the heading/title for status responses so the test type is obvious.
+  - For failed machines in status responses, include the failure reason taken from the run log.
+  - Include timing details in status responses: start time, end time when complete, and total or elapsed runtime.
+  - Also include timing stats in status responses: quickest completed test runtime, longest completed test runtime, and average completed test runtime.
+
+## Run Learning: 2026-03-11 (Machine-first status lines and whole-run ETA)
+- Observed requirement:
+  - Status output must list each machine first and then its status, rather than leading with the status label.
+  - Estimated completion time must refer to the entire remaining automation run, not only the currently running machine.
+- Action for future runs:
+  - Format machine entries as `machine-name - STATUS`.
+  - Keep failure reasons after the machine/status entry when a machine failed.
+  - When giving ETA, explicitly state it is the estimate for completion of the full remaining run.
+
+## Run Learning: 2026-03-11 (Categorized run status must be reconstructed across batches)
+- Observed failure mode:
+  - `run-sorry-cypress.py --categorize` mutates the active config to the current category batch, so live files such as `specPattern`, `current_vm`, and the newest `/tmp` Cypress JSON only describe the current category, not the full automation run.
+  - Answering from only the current live batch underreports the run and misses already-finished machines from earlier category batches.
+- Action for future runs:
+  - Reconstruct whole-run status from the generated machine scope plus all machine result artifacts written since the run start time.
+  - Use the current batch only to identify the live `RUNNING` machine and immediate next machine(s), not as the full run scope.
+  - Do not answer status requests for categorized runs until earlier category results have been checked as part of the same run.
+
+## Run Learning: 2026-03-11 (Hash-named XML files still belong to machine runs)
+- Observed failure mode:
+  - Same-run JUnit output is not consistently named `test-result-atvm...xml`.
+  - Many machine results for the same automation run were written as hash-named files such as `test-result-01fe412894862398d06d9cc4bc7e81a0.xml`.
+  - Limiting status reconstruction to machine-named XML files causes major undercounting of completed machines.
+- Action for future runs:
+  - Parse all `test-result-*.xml` files written since the run start time, not only `test-result-atvm*.xml`.
+  - Extract the machine name from XML contents such as `testsuite file=`, `testsuite name=`, or `testcase name=` when the filename does not include the machine name.
+  - Treat `check-xml-files.ts` XML outputs as bookkeeping steps, not machine results.
+  - Prefer the most recently written same-run XML per machine when multiple XML files exist for that machine.
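The actions above can be sketched as follows, assuming JUnit-style XML in which a failed test carries a `failure` or `error` element, and assuming the caller already knows the machine names in the run scope. The function names are hypothetical, and the real reporter output may differ in structure.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_records(results_dir, run_start_ts):
    """Read every test-result-*.xml written since the run start, hash-named or not."""
    records = []
    for path in Path(results_dir).glob("test-result-*.xml"):
        mtime = path.stat().st_mtime
        if mtime >= run_start_ts:
            records.append((mtime, path.read_text()))
    return records


def parse_result(xml_text, known_machines):
    """Return (machine, 'PASS' | 'FAIL'), or None for bookkeeping or unknown files."""
    root = ET.fromstring(xml_text)
    values = []
    for elem in root.iter():
        if elem.tag == "testsuite":
            values += [elem.get("file", ""), elem.get("name", "")]
        elif elem.tag == "testcase":
            values.append(elem.get("name", ""))
    haystack = " ".join(values)
    if "check-xml-files.ts" in haystack:
        return None  # bookkeeping step, not a machine run
    hits = [m for m in known_machines if m in haystack]
    if not hits:
        return None
    # Assumption: JUnit-style failure/error elements mark a failed spec.
    failed = (root.find(".//failure") is not None
              or root.find(".//error") is not None)
    # Longest match wins so 'name-opt' is not mistaken for 'name'.
    return max(hits, key=len), ("FAIL" if failed else "PASS")


def latest_status(records, known_machines):
    """Keep the most recently written result per machine."""
    latest = {}
    for _, xml_text in sorted(records, key=lambda record: record[0]):
        parsed = parse_result(xml_text, known_machines)
        if parsed is not None:
            machine, status = parsed
            latest[machine] = status  # later files overwrite earlier ones
    return latest
```

Because the machine name is read from the XML contents, the filename can be a hash without affecting the count of completed machines.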
+
+## Run Learning: 2026-03-12 (Status output must be one machine per line with notes separated)
+- Observed requirement:
+  - Listing multiple completed machines on one line makes run status harder to scan and does not meet the expected reporting format.
+  - Failure reasons and extra context should be separated from the machine status list so the list stays clean.
+- Action for future runs:
+  - Under completed, skipped, and remaining sections, put exactly one machine status on each line.
+  - Add a `Notes` section after completed machines for failure reasons, anomalies, and other operator-relevant context.
+  - Keep completed machine lines in the form `machine-name - STATUS` and avoid appending long explanations inline.
+
+## Run Learning: 2026-03-12 (Add suse15.0 machine to blacklist)
+- Observed requirement:
+  - `atvm144-suse15.0` must be excluded from automation runs because it crashes while creating the migration session.
+- Action for future runs:
+  - Add `atvm144-suse15.0` to the maintained blacklist.
+  - Record the reason as `CRASHES WHEN CREATING MIGRATION SESSION - BUG`.
+  - Include it in reusable `--exclude_partial_match` command examples.
+
+## Run Learning: 2026-03-12 (Default to gold-named ATVM config files)
+- Observed requirement:
+  - The automation VM does not reliably have `cypress.atvm-config.ts`, and defaulting to that filename can break runs before they start.
+  - Operator preference is to use ATVM config files with `gold` in the filename unless explicitly told otherwise.
+- Action for future runs:
+  - Do not reference `cypress.atvm-config.ts` by default in commands or examples.
+  - Default to `cypress.atvm-config-gold.ts` unless the operator explicitly requests another config.
+
+## Run Learning: 2026-03-12 (Examples are reference-only, not default intent)
+- Observed requirement:
+  - Reusable examples may contain extra excludes or options that the operator did not ask for.
+  - Carrying those example details into a new run without confirmation can change the requested scope.
+- Action for future runs:
+  - Treat `examples.md` as reference-only.
+  - Use only the options the operator explicitly requested, plus maintained mandatory blacklist handling.
+  - Do not assume extra example exclusions such as distro filters are desired unless the operator asks for them.
+
+## Run Learning: 2026-03-12 (Use one status format for all automation run types)
+- Observed requirement:
+  - The operator wants the same ATVM run status display every time, regardless of whether the run is `e2e`, `systemOS`, `reboot`, or another template.
+  - Changing the display style between run types makes the status harder to scan and compare.
+- Action for future runs:
+  - Use one consistent ATVM status layout for all automation status responses.
+  - Keep the order the same: build name, completed machines, notes, skipped machines, remaining machines, summary, timing, estimated completion time.
+  - Keep machine entries one per line as `machine-name - STATUS` regardless of test type.
+
+## Run Learning: 2026-03-13 (Put longer failure description on failed machine line)
+- Observed requirement:
+  - Failed machines are easier to scan when the failure description appears directly on the same line as the machine status.
+  - A longer same-line description works better than a very short label when the extra detail helps explain what actually failed.
+- Action for future runs:
+  - Format failed machine lines as `machine-name - FAIL - <failure description>`.
+  - Prefer the longer same-line description when it adds useful operator-facing context.
+  - Keep `Notes` for broader context, anomalies, and extra follow-up detail beyond the machine-specific failure description.
+
+## Run Learning: 2026-03-14 (Missing requested ATVM config must fail fast)
+- Observed requirement:
+  - If the operator asks for a specific ATVM config file and that file is missing on the automation VM, looking for other config files or substituting a different one creates the wrong next step.
+  - The operator wants to decide what to do after a missing-config failure.
+- Action for future runs:
+  - If the requested config file is missing, stop immediately and report the missing filename.
+  - Do not search the automation VM for alternate config files.
+  - Do not switch to another config unless the operator explicitly instructs it.
+
+## Run Learning: 2026-03-16 (Status requests default to live view with whole-run historical fallback)
+- Observed requirement:
+  - When the operator asks for ATVM automation run status, they want live status by default.
+  - If no automation is currently running, the status response must fall back to the most recent historical run.
+  - For categorized runs, the response must still cover the entire run rather than only the latest category batch or cloud sub-run.
+- Action for future runs:
+  - Treat every ATVM status request as a request for live run status unless the operator explicitly asks for something else.
+  - If no automation is active, reconstruct status from the most recent historical run artifacts and logs.
+  - For categorized runs, always aggregate all same-run category batches so the response covers the full run scope.
+
+## Run Learning: 2026-03-17 (Default ignore-force-shutdown and iscsi plugin)
+- Observed requirement:
+  - The operator wants `--ignore_force_shutdown` included on every ATVM automation run by default.
+  - The operator wants plugin selection to default to `--use_specified_plugin iscsi` unless a different plugin is explicitly requested.
+- Action for future runs:
+  - Add `--ignore_force_shutdown` to every `cmc-templates.py` command unless the operator explicitly asks not to use it.
+  - Default plugin-bearing ATVM automation commands to `--use_specified_plugin iscsi`.
+  - Only switch away from `iscsi` when the operator explicitly requests `fc`, `both`, or another applicable override.
+
+## Run Learning: 2026-03-18 (ATVM status requests must resolve from the local ATVM workflow, not Cirrus project operations)
+- Observed failure mode:
+  - Interpreting "status of the ATVM automation run" as a request about Cirrus project operations can return the wrong source entirely.
+  - The operator uses "ATVM automation" to mean the automation contained in the local `atvm` folder and the corresponding automation VM workflow.
+- Action for future runs:
+  - Resolve ATVM status requests from the local ATVM workflow first.
+  - Check the automation VM at `192.168.3.190` for live runner processes and live files before looking at historical artifacts.
+  - If no automation is active, reconstruct the most recent historical run from the automation VM shell history and reporter artifacts.
+  - Do not use Cirrus project operations such as `atvm - cypress` as the source for ATVM automation status unless the operator explicitly asks for project-operation status.
+
+## Run Learning: 2026-03-20 (Display exact ATVM commands and wait for approval before any execution)
+- Observed failure mode:
+  - ATVM run commands were executed before the operator had a chance to review and approve them.
+  - This happened even though the operator expects a review gate before any ATVM automation command is launched.
+- Action for future runs:
+  - Always display the exact planned ATVM commands before execution.
+  - Do not run `cmc-templates.py` until the operator explicitly approves the displayed commands.
+  - Do not run `run-sorry-cypress.py` until the operator explicitly approves the displayed commands.
+  - Treat template generation as execution that also requires operator approval.
+  - If any requested option changes after commands are displayed, rebuild and redisplay the commands and wait for fresh approval.
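+
+The review gate above could be sketched as follows, assuming a hypothetical wrapper named `confirm_and_run` (not an existing ATVM script). It prints the exact command, reads one line of operator input, and executes only on a literal `yes`.
+```bash
+# Hypothetical approval gate (illustration only).
+# Displays the planned command and runs it only after explicit approval.
+confirm_and_run() {
+  printf 'Planned command:\n  %s\n' "$*"
+  local reply=""
+  read -r -p "Approve? Type 'yes' to run: " reply || true
+  if [ "$reply" = "yes" ]; then
+    "$@"
+  else
+    echo "Not approved; nothing was executed."
+    return 1
+  fi
+}
+```
+If any option changes after the command is displayed, the wrapper would be called again with the rebuilt command so that approval always applies to the text the operator actually saw.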
--- a/atvm/docs/setup/guide.md
+++ b/atvm/docs/setup/guide.md
@@ -0,0 +1,168 @@
+# ATVM Setup Script Guide
+
+This file is guide-only documentation for running and maintaining the ATVM setup workflow.
+Do not put dated run examples here.
+
+## Scope
+- Client setup script: `/home/aw/code/cds/atvm/scripts/atvm-setup-script.sh`
+- Controller wrapper: `/home/aw/code/cds/atvm/scripts/run-atvm-setup-and-collect-log.sh`
+- Run-learnings log: `/home/aw/code/cds/atvm/docs/setup/run-learnings.md`
+
+## Purpose
+The setup flow performs a controlled bootstrap across supported Linux distributions:
+1. Validate target host identity using expected IP + expected hostname before any configuration.
+2. Fix repositories (especially CD/DVD media repo entries).
+3. On Ubuntu, configure root SSH password-login workflow (`root/cdsi2012`) for follow-on root operations.
+4. On Oracle Linux, set default boot kernel to non-UEK when available.
+5. Disable unattended auto-upgrades on Ubuntu.
+6. Remove specific storage-related packages and install base tooling.
+7. Disable SELinux on Red Hat-family systems.
+8. Configure static IP as the final step.
+9. Print final summary and write logs to `atvm_setup_script.log`.
+10. On SELinux-capable distros, reboot and verify runtime SELinux status post-reboot.
+11. Keep client powered on after successful setup so controller-side log collection + SHA256 verification can complete.
+12. Power off from controller only after successful verification and no setup errors.
+
+## Execution Model
+- Shell safety flags: `set -euo pipefail`
+- Logging: colorized console + plain text log file
+- Entry point: `main "$@"`
+- Default operator assumption for setup access: `root / cdsi2012` unless explicitly overridden.
+- When the operator refers to `192.168.3.191`, treat it as the default ATVM target host.
+- For SSH to `192.168.3.191`, ignore host key mismatch by default with `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null`.
+- For SSH to `192.168.3.191`, use `root / cdsi2012` unless the operator explicitly provides different credentials.
+
+## Mandatory Identity Gate
+Setup must not start unless operator explicitly provides both values:
+- `--expected-ip <ip>`
+- `--expected-hostname <hostname>`
+
+Rules:
+- Connect to the operator-provided target IP directly.
+- Do not pre-scan alternate candidate IPs.
+- Do not infer hostname from target.
+- If hostname is missing from request, stop and ask for it.
+- If detected hostname does not exactly match expected hostname, stop immediately.
+- If expected IP is not assigned on target, stop immediately.
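+
+The rules above can be sketched as a pure check, assuming a hypothetical function `check_identity`; the real `validate_target_host_identity` step may differ. The detected hostname and assigned IPs are passed in as arguments so the logic can be exercised without touching a host.
+```bash
+# Hypothetical identity gate (illustration only).
+# Args: expected IP, expected hostname, detected hostname,
+#       space-separated list of IPs assigned on the target.
+check_identity() {
+  local expected_ip="$1" expected_hostname="$2"
+  local actual_hostname="$3" assigned_ips="$4"
+  if [ -z "$expected_ip" ] || [ -z "$expected_hostname" ]; then
+    echo "[ERROR] expected IP and expected hostname are both required" >&2
+    return 2
+  fi
+  # Exact match only; no inference or normalization of the hostname.
+  if [ "$actual_hostname" != "$expected_hostname" ]; then
+    echo "[ERROR] hostname mismatch: got '$actual_hostname'" >&2
+    return 1
+  fi
+  local ip
+  for ip in $assigned_ips; do
+    if [ "$ip" = "$expected_ip" ]; then
+      return 0
+    fi
+  done
+  echo "[ERROR] expected IP $expected_ip is not assigned on target" >&2
+  return 1
+}
+```
+On a Linux target the last two arguments might be supplied as `"$(hostname)"` and `"$(hostname -I)"`, which is an assumption about the host tooling rather than something this guide specifies.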
+
+## Canonical Run Order
+1. `parse_args`
+2. `validate_target_host_identity`
+3. `check_sudo`
+4. `fix_repositories`
+5. `configure_ubuntu_root_ssh_access` (Ubuntu only)
+6. `install_sudo_if_needed`
+7. `configure_oracle_non_uek_kernel` (Oracle Linux only)
+8. `disable_ubuntu_auto_upgrades` (Ubuntu only)
+9. `run_package_installation`
+10. `disable_selinux` (RHEL-family only)
+11. `configure_static_ip` (final configuration step)
+12. `print_final_summary`
+13. `reboot_and_verify_selinux_if_needed`
+14. `poweroff_client_if_successful` (controller-driven after verification)
+
+## Core Behavior By Step
+
+### Repository Fix
+- Debian/Ubuntu: comment `cdrom` entries in apt lists and run `apt-get update`.
+- RHEL-family/Oracle: disable media/cdrom/dvd repo entries and run `yum clean all && yum makecache`.
+- Fedora: same model via `dnf clean all && dnf makecache`.
+- openSUSE/SLES: disable CD/DVD repos with `zypper mr -d` and refresh.
+
+### Oracle Linux Kernel Handling
+- Oracle Linux only.
+- Select first non-UEK kernel via `grubby --info=ALL` and set GRUB default.
+- Track whether default changed and whether reboot is required.
+
+### Ubuntu Root SSH Workflow
+- Ubuntu only.
+- Set root password `cdsi2012`, unlock root account.
+- Write `/etc/ssh/sshd_config.d/99-atvm-root-login.conf` enabling root + password auth.
+- Validate config and restart SSH service.
+
+### Ubuntu Auto-Upgrade Disable
+- Ubuntu only.
+- Update `/etc/apt/apt.conf.d/20auto-upgrades` to disable periodic update/upgrade actions.
+
+### Package Installation
+- Package manager detection order: `apt-get`, `dnf`, `yum`, `zypper`, `pacman`, `apk`.
+- Pre-cleanup removes multipath/iSCSI packages where applicable.
+- Installs kernel headers per distro.
+- Base package set includes:
+  `curl wget git vim perl gdb scsitools net-tools parted fio ca-certificates python3 elfutils-libelf-devel`
+
+### SELinux Disable
+- RHEL-family only.
+- If SELinux is enforcing or permissive, back up `/etc/selinux/config` and rewrite it to disabled.
+- Marks reboot recommendation/requirement in summary.
+
+### Static IP Configuration (Final Step)
+Hardcoded target values:
+- IP: `192.168.3.191`
+- Prefix: `22`
+- Gateway: `192.168.0.1`
+- DNS: `8.8.8.8`, `8.8.4.4`
+
+Interface detection priority:
+1. default-route interface
+2. first non-loopback interface with IPv4
+3. first non-loopback interface from link list
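+
+The priority order above amounts to "first usable candidate wins". A minimal sketch, assuming a hypothetical function `pick_interface` that receives the three candidates in priority order; the commented `ip` invocations are one possible way to produce them, not necessarily what `configure_static_ip` does.
+```bash
+# Hypothetical selector (illustration only): print the first candidate
+# that is non-empty and not the loopback device.
+pick_interface() {
+  local candidate
+  for candidate in "$@"; do
+    if [ -n "$candidate" ] && [ "$candidate" != "lo" ]; then
+      printf '%s\n' "$candidate"
+      return 0
+    fi
+  done
+  return 1
+}
+
+# Possible candidate sources (assumption, requires iproute2):
+# by_route="$(ip -o -4 route show default | awk '{for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }}')"
+# by_addr="$(ip -o -4 addr show | awk '$2 != "lo" { print $2; exit }')"
+# by_link="$(ip -o link show | awk -F': ' '$2 != "lo" { print $2; exit }')"
+# pick_interface "$by_route" "$by_addr" "$by_link"
+```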
+
+Network-stack handling includes `netplan`, `NetworkManager`/`nmcli`, `wicked`, and legacy `ifcfg` fallback patterns.
+
+### SELinux Reboot Verification
+- Applies to `rhel`, `centos`, `rocky`, `almalinux`, `fedora`, `ol` when the SELinux configuration was changed during the run.
+- Creates one-time systemd verifier service before reboot.
+- Post-reboot service records runtime `getenforce` and self-removes.
+- On success with no real error lines, keeps the client on so the controller can copy the log and verify its hash before powering the client off.
+- On errors, leaves client on for manual inspection.
+
+## Power-State Rules
+- After successful setup, keep client powered on until controller log collection + SHA256 verification completes.
+- If verification succeeds and no log line matches `^\[ERROR\]` (a real error line), the controller powers off the client.
+- If any real error lines exist, keep client powered on.
+
+## Logging and Verification
+- Client log filename: `atvm_setup_script.log`
+- Common client log path when run as root: `/root/atvm_setup_script.log`
+- Controller collected log naming: `atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log`
+
+Required post-run validation:
+1. Copy client log to controller `atvm/log/` path.
+2. Compare SHA256 between client and copied controller log.
+3. Require exact match.
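+
+A minimal sketch of the hash comparison and the power-off gate, assuming two hypothetical helpers, `logs_match` and `has_real_errors`. Both files are local paths here; in the actual workflow the client-side hash would be obtained from the client, and the wrapper's own implementation may differ.
+```bash
+# Hypothetical helpers (illustration only).
+
+# Succeeds only when both files have the same non-empty SHA256 digest.
+logs_match() {
+  local a b
+  a="$(sha256sum "$1" | awk '{print $1}')"
+  b="$(sha256sum "$2" | awk '{print $1}')"
+  [ -n "$a" ] && [ "$a" = "$b" ]
+}
+
+# Succeeds when the log has at least one line that starts with [ERROR].
+# Text that merely mentions [ERROR] mid-line does not count.
+has_real_errors() {
+  grep -q '^\[ERROR\]' "$1"
+}
+```
+Under these assumptions, the controller would power off the client only when `logs_match` succeeds and `has_real_errors` fails for the collected log.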
+
+## Preferred Execution Commands
+Direct client execution:
+```bash
+sudo bash /home/cirrususer/atvm-setup-script.sh \
+  --expected-ip <current-client-ip> \
+  --expected-hostname <exact-hostname>
+```
+
+Controller run + collect:
+```bash
+EXPECTED_IP_ARG=<current-client-ip> EXPECTED_HOSTNAME_ARG=<exact-hostname> \
+/home/aw/code/cds/atvm/scripts/run-atvm-setup-and-collect-log.sh
+```
+
+Controller collect-only after client run:
+```bash
+/home/aw/code/cds/atvm/scripts/run-atvm-setup-and-collect-log.sh --collect-after-complete
+```
+
+## Troubleshooting
+- If local collected log is missing, do not rerun full setup just for log recovery.
+- Use collect-only mode and verify SHA256 after copy.
+- If wrapper appears stuck after IP/reboot transition, stop older wrapper sessions and run one fresh collect-only session.
+- If `sshpass` is missing on controller, wrapper can still run but may require repeated interactive password prompts.
+
+## Operational Caveats
+- The setup script is not fully idempotent on every path; repeated runs may rewrite network configs and create multiple backups.
+- Static IP values are hardcoded; adjust before use in other environments.
+- Run in maintenance windows because network changes can interrupt active sessions.
+- Preserve host identity gating; do not weaken expected IP/hostname checks.
+
+## Update Rule
+- After each run, update this file only for guide/rule/checklist/default behavior changes.
+- Put run-specific outcomes in `run-learnings.md` only when the run produced a new learning.
--- a/atvm/docs/setup/run-learnings.md
+++ b/atvm/docs/setup/run-learnings.md
@@ -0,0 +1,40 @@
+# ATVM Setup Script Runs
+
+This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
+
+## Entry Rule
+- Add an entry only when the run changed workflow behavior, exposed a new failure mode, or confirmed a new required check.
+- Do not add routine runs with no new learning.
+
+## Run Learning: 2026-03-03 (Ubuntu 24.04)
+- Environment:
+  - Initial IP: `192.168.0.89`
+  - Final static IP: `192.168.3.191`
+  - Hostname: `atvm-codextest-vm-1`
+- Learning:
+  - Root SSH password workflow (`root/cdsi2012`) and log copy/hash verification path are valid end-to-end.
+  - Wrapper must enforce identity arguments for run-and-collect mode.
+- Action for future runs:
+  - Require `EXPECTED_IP_ARG` and `EXPECTED_HOSTNAME_ARG` for wrapper run-and-collect.
+
+## Run Learning: 2026-03-05 (RHEL 9)
+- Environment:
+  - Initial IP: `192.168.3.212`
+  - Final static IP: `192.168.3.191`
+  - Hostname: `atvm-codextest-vm-2`
+- Learning:
+  - SELinux disable path with reboot + post-reboot verifier worked.
+  - Auto power-off can race controller-side log collection if done too early.
+- Action for future runs:
+  - Keep client powered on until controller log copy + SHA256 verification completes.
+  - Only then perform controller-side power-off when no real error lines are present.
+
+## Run Learning: 2026-03-06 (Oracle Linux 9)
+- Environment:
+  - Initial IP: `192.168.0.121`
+  - Final static IP: `192.168.3.191`
+  - Hostname: `atvm-codextest-vm`
+- Learning:
+  - Wrapper auto power-off was blocked by false-positive error detection from instructional text.
+- Action for future runs:
+  - Match only real error log lines using `^\[ERROR\]` for power-off gating.
--- a/atvm/docs/workflow/conventions.md
+++ b/atvm/docs/workflow/conventions.md
@@ -0,0 +1,30 @@
+# ATVM Workspace Conventions
+
+## File Roles
+- `guide.md`
+  - authoritative workflow rules, defaults, and checklists
+- `examples.md`
+  - reusable command examples only
+- `run-learnings.md`
+  - dated lessons captured only when a run produced a new lasting insight
+- `inventory/*.md`
+  - durable environment reference and listings
+- `archive/imported-notes/*.md`
+  - preserved source material kept for completeness and traceability
+
+## Update Policy
+- Update `docs/setup/guide.md` when setup/bootstrap workflow behavior changes.
+- Update `docs/automation/guide.md` when automation workflow behavior changes.
+- Update `docs/automation/examples.md` when a reusable example pattern changes.
+- Update `run-learnings.md` files only when a run created a new lesson worth preserving.
+- Keep inventory details in `inventory/` rather than mixing them into workflow guides.
+
+## Path Conventions
+- Prefer actual repo paths under `/home/aw/code/cds/atvm/...` in documentation.
+- Keep top-level navigation in `README.md` and `AGENTS.md`.
+- Keep executable assets under `scripts/`.
+
+## Archive Policy
+- Preserve imported long-form notes under `archive/imported-notes/`.
+- Do not rely on archived notes as the primary operational runbook when a current guide exists.
+- Keep detailed listings available; reorganization should improve navigation, not remove information.