Initial commit

2026-03-11 15:19:25 -04:00
commit 93b6d7acb8
16 changed files with 4454 additions and 0 deletions
--- a/atvm/AGENTS.md
+++ b/atvm/AGENTS.md
@@ -0,0 +1,162 @@
+# ATVM AGENTS Guide
+
+This file defines how to operate and maintain the ATVM folder workflows.
+It is rebuilt from current files in `/home/aw/code/cds/atvm`.
+
+## Scope
+Two operational tracks exist in this folder:
+- Setup/bootstrap track:
+  - `atvm-setup-script.sh`
+  - `run-atvm-setup-and-collect-log.sh`
+  - `atvm-setup-script-guide.md`
+  - `atvm-setup-script-runs.md`
+- Cypress automation track:
+  - `atvm-automation-guide.md`
+  - `atvm-automation-examples.md`
+  - `atvm-automation-runs.md`
+
+Reference/inventory material:
+- `cypress-automation-for-cmc.md`
+- `cypress-automation-for-cmc.md:Zone.Identifier` (Windows download-zone metadata for the file above; not documentation)
+
+## File Roles
+- `*-guide.md` files:
+  - Guide-only procedures, rules, defaults, and checklists.
+  - No dated or one-off run examples.
+- `*-runs.md` files:
+  - Run-specific learnings only when a run introduces new information.
+  - No routine/no-change run logs.
+- `*-examples.md` files:
+  - Reusable command examples and commonly used option combinations.
+  - Keep generic; avoid dated one-off run outcomes.
+
+## Setup Track: Required Behavior
+Use `atvm-setup-script-guide.md` as the procedure source and keep behavior aligned with `atvm-setup-script.sh`.
+
+### Safety-Critical Rules
+1. Never run setup without operator-provided `--expected-ip` and `--expected-hostname`.
+2. Never infer expected hostname from target host output.
+3. Stop immediately on hostname mismatch or expected-IP-not-assigned.
+4. Keep static IP configuration as a final step to avoid mid-run connection loss.
+
+### Canonical Setup Order
+1. Parse args.
+2. Validate host identity.
+3. Check sudo/privileges.
+4. Fix repositories.
+5. Configure Ubuntu root SSH/password workflow (Ubuntu only).
+6. Install sudo if needed.
+7. Configure Oracle default non-UEK kernel (Oracle Linux only).
+8. Disable Ubuntu auto-upgrades (Ubuntu only).
+9. Run package cleanup/install.
+10. Disable SELinux (RHEL-family).
+11. Configure static IP.
+12. Print summary.
+13. Reboot + post-reboot SELinux verifier when applicable.
+14. Keep client on until controller log copy + SHA256 verification completes.
+15. Power off only after verified success and no real error log lines.
+
+### Setup Defaults
+- ATVM static IP target: `192.168.3.191/22`
+- Gateway: `192.168.0.1`
+- DNS: `8.8.8.8`, `8.8.4.4`
+- Ubuntu root SSH workflow credential in docs/script: `root / cdsi2012`
+- Client log file: `atvm_setup_script.log` (typically `/root/atvm_setup_script.log` when run as root)
+
+### Setup Controller Wrapper Rules
+- Wrapper supports:
+  - run-and-collect (default)
+  - `--collect-after-complete`
+- `run-and-collect` requires env vars:
+  - `EXPECTED_IP_ARG`
+  - `EXPECTED_HOSTNAME_ARG`
+- Wrapper validates success marker and SHA256 before success.
+- Wrapper powers off only when log has no lines matching `^\[ERROR\]`.
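+
+A minimal sketch of the error gate described above, assuming the collected log is a local file. `log_has_no_errors` is a hypothetical helper name, not a function taken from the wrapper; only the `^\[ERROR\]` pattern comes from this guide.
+
+```bash
+# Hypothetical helper: succeeds only when the log contains no real error lines.
+log_has_no_errors() {
+  local log_file="$1"
+  # An unreadable log must never count as "clean".
+  [ -r "$log_file" ] || return 2
+  # grep -q exits 0 on a match, so invert it: no match means no errors.
+  ! grep -Eq '^\[ERROR\]' "$log_file"
+}
+```
+
+Anchoring the pattern at the start of the line keeps instructional text that merely mentions `[ERROR]` from blocking power-off.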
+
+## Cypress Automation Track: Required Behavior
+Use `atvm-automation-guide.md` as the execution source.
+Use `atvm-automation-examples.md` as the common options/command reference.
+
+### Controller Client
+- Hostname: `atvm-cypres-vm-1`
+- IP: `192.168.3.190`
+- Credentials: `root / atvmcdsi2012`
+
+### Mandatory Run Control
+1. Before planning a new run, check for active automation processes.
+2. Report running/not-running status.
+3. If running, ask before termination; terminate only with explicit approval.
+4. Always show exact planned command(s) before execution.
+5. Execute only after explicit approval.
+6. If monitoring is not requested, report immediate command success/failure and any errors.
+7. Monitor completion only when explicitly requested by the operator.
+8. For monitored runs, allow long runtime windows (15-30 minutes or longer) and continue until completion unless the operator instructs otherwise.
+9. Do not terminate monitored runs unless the operator explicitly instructs termination.
+
+### Status Request Format
+When the operator asks for run status, report in this order:
+1. Heading/title using the run `build_name`.
+2. Completed machines with pass/fail state for each machine.
+3. Skipped machines with reason.
+4. Remaining machines still to run.
+5. Summary counts for finished, passed, failed, and skipped machines.
+6. Timing details:
+   - start time
+   - end time if complete
+   - total run time if complete, or elapsed run time if still running
+   - quickest completed test runtime
+   - longest completed test runtime
+   - average completed test runtime
+7. Estimated completion time.
+
+Status details:
+- Use the live run log on the automation VM when available.
+- Use the run `build_name` as the heading/title when available.
+- Show blacklisted machines under skipped machines when they are part of the requested scope.
+- Show in-progress machines under remaining machines as `RUNNING`.
+- Show not-yet-started machines as `NOT STARTED`.
+- Use completed spec results already recorded in the log to determine machine pass/fail state.
+- For failed machines, include the failure reason from the run log in the status output.
+- Include start time in status output when it can be derived from the log.
+- Include end time and total runtime for completed runs, or elapsed runtime for active runs.
+- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the log.
+
+### Automation Blacklist
+Always exclude these machines with `--exclude_partial_match` when building ATVM automation commands.
+
+CMC install blacklist (`BLACKLISTED: CMC INSTALL - CAN'T COMPILE`):
+- `atvm6-centos6.0`
+- `atvm41-redhat6.0`
+- `atvm73-oracle6.0`
+
+Support-request blacklist (`BLACKLISTED: SUPPORT REQUEST - WAITING`):
+- `atvm113-debian9.0.0`
+- `atvm115-debian9.1.0`
+- `atvm116-debian9.2.0`
+- `atvm156-debian9.3.0`
+
+Re-create blacklist (`BLACKLISTED: RE-CREATE NEEDED`):
+- `atvm157-debian13.0.0`
+
+### Operator Preferences
+- Do not include Gold Disk IDs in `--build_name`.
+- `--build_name` must not contain spaces; use `-` between words.
+- Prefer distro-scoped filtering (for example `--containsVm redhat9`) when possible.
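+
+The `--build_name` spacing rule can be checked before a command is presented for approval. The sketch below is illustrative only: `is_valid_build_name` is a hypothetical helper, and it does not attempt to detect Gold Disk IDs, which still need a manual check.
+
+```bash
+# Hypothetical pre-flight check for a --build_name value.
+is_valid_build_name() {
+  local name="$1"
+  # Reject empty names.
+  [ -n "$name" ] || return 1
+  # Reject any whitespace; words should be joined with '-'.
+  case "$name" in
+    *[[:space:]]*) return 1 ;;
+  esac
+  return 0
+}
+```
+
+For example, `is_valid_build_name nightly-e2e-pure-plugin` succeeds, while `is_valid_build_name "nightly e2e"` fails.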
+
+## Update Policy (Both Tracks)
+After each run:
+- Update corresponding `*-guide.md` only if workflow/rules/default behavior changed.
+- Update corresponding `*-examples.md` when common command patterns/options change.
+- Update corresponding `*-runs.md` only if the run produced new learning.
+
+## Path and Naming Consistency Note
+Current repo filenames use hyphen style, but some script text/defaults still show underscore-style paths (for example `atvm_setup_script.sh`, `run_atvm_setup_and_collect_log.sh`, `/home/aw/code/atvm`).
+
+When operating:
+1. Use actual filesystem paths in this repo first (`/home/aw/code/cds/atvm/...`).
+2. If script defaults are used, verify they match existing files before execution.
+3. If changing path conventions, update scripts and guides in the same change.
+
+## Non-Goals
+- Do not treat `cypress-automation-for-cmc.md` as executable runbook logic.
+- Do not record secrets/tokens into new guide or runs entries.
--- a/atvm/atvm-automation-examples.md
+++ b/atvm/atvm-automation-examples.md
@@ -0,0 +1,97 @@
+## Examples
+
+- `--build_name` values must not include spaces; use `-` between words.
+- Add the maintained blacklist to `--exclude_partial_match` for runs that use broad selection or randomization.
+- Maintained blacklist:
+  - `atvm6-centos6.0`
+  - `atvm41-redhat6.0`
+  - `atvm73-oracle6.0`
+  - `atvm113-debian9.0.0`
+  - `atvm115-debian9.1.0`
+  - `atvm116-debian9.2.0`
+  - `atvm156-debian9.3.0`
+  - `atvm157-debian13.0.0`
+
+### E2E: Pure iscsi+fc with specific VMs
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --specify_vms atvm3-ubuntu18.04 atvm109-w2k12R2; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-pure-plugin
+```
+
+### E2E: Infinibox fc with specific VMs
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type infinibox --use_specified_plugin fc --specify_vms atvm51-redhat6.10 atvm110-w2k16; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-infinibox-plugin
+```
+
+### E2E: Regular cutover
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --specify_vms atvm93-oracle7.9 atvm111-w2k19 --regular_cutover; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-regular-cutover
+```
+
+### Reboot test
+```bash
+python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm37-rocky8.8 atvm112-w2k22 --wait_for_power_on 120; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-reboot
+```
+
+### SystemOS test
+```bash
+python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --specify_vms atvm118-oracle9.3 atvm145-w2k25; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-systemOS
+```
+
+### MigrateOPS test
+```bash
+python3 cmc-templates.py --template cmc-migrateops --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm139-redhat9.5 atvm112-w2k22; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-migrateOPS
+```
+
+### Compute MigrateOPS: vmware
+```bash
+python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms vmware --test_partition --specify_vms atvm138-oracle9.4-opt atvm112-w2k22 --set_static_ip_dest; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-vmware
+```
+
+### Compute MigrateOPS: ovirt
+```bash
+python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms ovirt --test_partition --specify_vms atvm124-redhat8.8 atvm111-w2k19 --set_static_ip_dest; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-ovirt
+```
+
+### Group consistency
+```bash
+python3 cmc-templates.py --template cmc-group-consistency --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm4-ubuntu20.04 atvm112-w2k22 --enable_uuid; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-consistencyGroup
+```
+
+### H2H same platform
+```bash
+python3 cmc-templates.py --template cmc-h2h-same-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm38-rocky9.0 atvm112-w2k22; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hSamePlatform
+```
+
+### H2H different platform
+```bash
+python3 cmc-templates.py --template cmc-h2h-diff-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm65-redhat8.3 atvm112-w2k22; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hDifferentPlatform
+```
+
+### Randomized reboot sanity
+```bash
+python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0 --wait_for_power_on 120; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-reboot-iscsi
+```
+
+### Randomized e2e sanity
+```bash
+python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-e2e
+```
+
+### Randomized systemOS sanity
+```bash
+python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --randomize 1 --exclude_partial_match suse15.0 fedora34 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
+python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-systemOS
+```
--- a/atvm/atvm-automation-guide.md
+++ b/atvm/atvm-automation-guide.md
@@ -0,0 +1,166 @@
+# Run ATVM Automation Guide
+
+This file is guide-only documentation for operating ATVM CMC automation.
+Do not put specific run examples here.
+For reusable command examples and common option combinations, use `atvm-automation-examples.md`.
+
+## Purpose
+Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.
+
+## ATVM Cypress Automation Controller Client
+- Hostname: `atvm-cypres-vm-1`
+- IP: `192.168.3.190`
+- Credentials: `root / atvmcdsi2012`
+
+## Operating Constraints
+- Run only scripts/commands explicitly requested.
+- Do not make manual system configuration changes on the client.
+- Do not edit client files unless explicitly requested.
+
+## Operator Preferences
+- Do not include Gold Disk identifiers in `--build_name`.
+- `--build_name` must not contain spaces; use `-` between words.
+- For multiple VMs in the same distro, use distro-scoped filtering (`--containsVm`) instead of long explicit VM lists.
+- Before preparing a new run, always check whether automation is already running.
+- Always report whether automation is currently running.
+- If running, ask whether to terminate; terminate only with explicit approval.
+- After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
+- Before any run, always show exact planned command(s) and wait for explicit approval.
+- Execute only after explicit approval (for example `approve`).
+- After execution, report immediate success/failure only.
+- Do not actively monitor completion unless explicitly requested.
+- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
+- Report command errors immediately.
+- `sshpass` may be used where password-based SSH automation is required.
+
+## Core Scripts
+- Template prep: `/root/cdc-e2e-cyp-12.17.4/cmc-templates.py`
+- Test execution: `./run-sorry-cypress.py`
+
+Typical sequence:
+1. Run `cmc-templates.py` with requested template/options.
+2. Run `run-sorry-cypress.py` with matching config and build name.
+
+## Config File / Gold Disk Mapping
+- `cypress.atvm-config-gold.ts` -> Gold Disk 1
+- `cypress.atvm-config-gold-2.ts` -> Gold Disk 2
+- Additional numbered config variants map to corresponding Gold Disks.
+
+## Available Templates
+- `cmc-e2e`
+- `cmc-group-consistency`
+- `cmc-h2h-diff-platf`
+- `cmc-h2h-same-platf`
+- `cmc-migrateops`
+- `cmc-migrateops-compute-migration`
+- `cmc-reboot`
+- `cmc-systemOS`
+
+## Command Pattern
+```bash
+python3 cmc-templates.py --template <template> --config_file_path ./<config-file> [template options...]; \
+python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces>
+```
+
+## Examples Reference
+- Commonly used command examples: `atvm-automation-examples.md`
+- Keep this guide focused on run-control rules and workflow constraints.
+
+## Example Option Patterns (Guide-Only)
+- Distro-scoped VM selection:
+  - `--containsVm redhat`
+  - `--containsVm redhat9`
+- Explicit VM selection:
+  - `--specify_vms <vm1> <vm2> ...`
+- Compute migrateops platform:
+  - `--vm_platforms vmware|ovirt|openshift|proxmox`
+
+## Blacklisted Machines
+Always exclude these machines from ATVM automation runs by adding them to `--exclude_partial_match`.
+
+Permanently blacklisted because CMC cannot compile:
+- `atvm6-centos6.0`
+- `atvm41-redhat6.0`
+- `atvm73-oracle6.0`
+
+Temporarily blacklisted while support requests are waiting:
+- `atvm113-debian9.0.0`
+- `atvm115-debian9.1.0`
+- `atvm116-debian9.2.0`
+- `atvm156-debian9.3.0`
+
+Temporarily blacklisted until re-created:
+- `atvm157-debian13.0.0`
+
+Preferred exclude list:
+- `--exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0`
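+
+One way to keep the exclude list consistent across commands is to build it from a single array. This is a sketch under assumptions: `ATVM_BLACKLIST` and `build_exclude_args` are hypothetical names, and only the machine names and the `--exclude_partial_match` flag come from this guide.
+
+```bash
+# Maintained blacklist, in the same order as the preferred exclude list.
+ATVM_BLACKLIST=(
+  atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0
+  atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0
+  atvm156-debian9.3.0 atvm157-debian13.0.0
+)
+
+# Hypothetical helper: prints the flag, any extra patterns passed as
+# arguments, then the maintained blacklist, all on one line.
+build_exclude_args() {
+  printf '%s' '--exclude_partial_match'
+  printf ' %s' "$@" "${ATVM_BLACKLIST[@]}"
+  printf '\n'
+}
+```
+
+Calling `build_exclude_args suse15.0` places the extra pattern ahead of the blacklist entries.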
+
+## Running-Automation Check (Mandatory)
+Before any new automation request:
+1. SSH to `root@192.168.3.190`.
+2. Check for active automation processes (for example `run-sorry-cypress.py`, `cmc-templates.py`, and related Cypress runners).
+3. Report:
+   - `Running` with process details, or
+   - `Not running`.
+4. If `Running`, ask operator whether to terminate.
+5. If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
+6. If termination is not approved, do not start a new run.
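+
+Steps 2 and 3 above could be sketched as a filter over a process listing. The helper name `automation_status` and the exact pattern list are assumptions; `cypress-cloud` is included only as an example of a related runner.
+
+```bash
+# Hypothetical helper: reads "PID COMMAND" lines on stdin and prints
+# "Running" plus the matching lines, or "Not running".
+automation_status() {
+  local matches
+  matches="$(grep -E 'run-sorry-cypress\.py|cmc-templates\.py|cypress-cloud' || true)"
+  if [ -n "$matches" ]; then
+    printf 'Running\n%s\n' "$matches"
+  else
+    printf 'Not running\n'
+  fi
+}
+
+# Assumed usage from the controller (not executed here):
+#   ssh root@192.168.3.190 'ps -eo pid,args' | automation_status
+```
+
+Filtering on the controller side means the `grep` process itself never appears in the listing being searched.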
+
+## Approval Workflow (Mandatory)
+1. Build exact command(s) for the request.
+2. Present them verbatim as planned commands.
+3. Wait for explicit approval.
+4. Run only approved command(s), no extra options.
+5. If monitoring was not requested, report immediate success/failure for each command.
+6. If monitoring was requested, keep monitoring until completion and report final outcome.
+
+## Requested Test Style
+When asked for one VM or a VM set:
+- choose requested template/options,
+- choose correct config file for intended Gold Disk,
+- use a descriptive `--build_name` without Gold Disk IDs.
+
+## Update Rule
+- After each run, update this guide only for workflow/rule/default changes.
+- Update `atvm-automation-examples.md` for reusable command/option examples.
+- Add run-specific learnings only to `atvm-automation-runs.md` when the run produced new information.
+
+## Monitoring Policy
+- Monitor only when the operator explicitly asks to monitor.
+- If monitoring was not requested, run commands and report execution success/failure and any errors.
+- If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.
+
+## Status Reporting Format
+When the operator asks for the status of an ATVM automation run, report in this order:
+1. Heading/title using the run `build_name`.
+2. Completed machines with pass/fail state for each machine.
+3. Skipped machines with reason.
+4. Remaining machines still to run.
+5. Summary counts for finished, passed, failed, and skipped machines.
+6. Timing details:
+   - start time
+   - end time if complete
+   - total run time if complete, or elapsed run time if still running
+   - quickest completed test runtime
+   - longest completed test runtime
+   - average completed test runtime
+7. Estimated completion time.
+
+Status-report expectations:
+- Use the live automation VM state when available.
+- Derive the heading/title from the run `build_name` when available.
+- Derive completed-machine status from completed spec results already written to the run log.
+- Include the run start time in every status response when it can be derived from the run log.
+- If the run is complete, include the end time and total run time.
+- If the run is still active, include the elapsed run time so far.
+- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
+- Show blacklisted machines under skipped machines when they fall within the machine family requested by the operator.
+- For skipped machines, include the reason category:
+  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`
+  - `BLACKLISTED: SUPPORT REQUEST - WAITING`
+  - `BLACKLISTED: RE-CREATE NEEDED`
+- If a machine is currently in progress, show it under remaining machines as `RUNNING`.
+- If a machine has not started yet, show it under remaining machines as `NOT STARTED`.
+- If no failures are present in completed spec results, report those completed machines as `PASS`.
+- If a completed spec result shows a failure, report that machine as `FAIL` and include the failure reason from the run log.
+- Base the completion estimate on the current remaining machine count and recent per-machine runtime visible in the run log.
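+
+The quickest, longest, and average runtimes can be derived once per-machine durations have been extracted from the run log. The sketch below assumes one duration in whole seconds per input line; the run log's real format is not described here, so the extraction step is left out and `runtime_stats` is a hypothetical helper.
+
+```bash
+# Hypothetical helper: reads one duration (seconds) per line on stdin
+# and prints quickest, longest, and average runtime.
+runtime_stats() {
+  awk '
+    NR == 1 { min = $1; max = $1 }
+    {
+      if ($1 < min) min = $1
+      if ($1 > max) max = $1
+      sum += $1
+    }
+    END {
+      # Print nothing when no completed runtimes are available.
+      if (NR > 0)
+        printf "quickest=%ds longest=%ds average=%ds\n", min, max, sum / NR
+    }
+  '
+}
+```
+
+A rough completion estimate then follows from multiplying the average by the number of remaining machines.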
--- a/atvm/atvm-automation-runs.md
+++ b/atvm/atvm-automation-runs.md
@@ -0,0 +1,47 @@
+# ATVM Automation Runs
+
+This file stores run-specific examples only when a run produced a new learning relevant to future automation tasks.
+
+## Entry Rule
+- Add an entry only when a run changed workflow behavior, exposed a failure mode, or confirmed a required new check.
+- Do not add routine runs with no new learning.
+
+## Current State
+- Two run-learning entries are recorded below: 2026-03-08 and 2026-03-09.
+
+## Run Learning: 2026-03-08 (E2E redhat9.7, pure/fc)
+- Request:
+  - template: `cmc-e2e`
+  - filter: `--containsVm redhat9.7`
+  - integration: `--integration_type pure`
+  - plugin: `--use_specified_plugin fc`
+- Observed result:
+  - Cypress spec execution passed (`1` test, `1` passing, `0` failing).
+  - Cloud run URL was produced and marked uploaded.
+  - `run-sorry-cypress.py` remained running afterward with a defunct `npm exec cypress-cloud` child process and did not exit cleanly on its own.
+- Action for future runs:
+  - If pass/upload is confirmed but `run-sorry-cypress.py` does not exit, treat it as a runner hang condition.
+  - Capture run URL and pass/fail status first, then terminate the stuck runner process cleanly.
+
+## Run Learning: 2026-03-09 (Blacklist handling and status format)
+- Observed requirement:
+  - Some ATVM machines must be skipped even when a broad selector such as `--containsVm` or `--randomize` would otherwise include them.
+- Machines to blacklist via `--exclude_partial_match`:
+  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`:
+    - `atvm6-centos6.0`
+    - `atvm41-redhat6.0`
+    - `atvm73-oracle6.0`
+  - `BLACKLISTED: SUPPORT REQUEST - WAITING`:
+    - `atvm113-debian9.0.0`
+    - `atvm115-debian9.1.0`
+    - `atvm116-debian9.2.0`
+    - `atvm156-debian9.3.0`
+  - `BLACKLISTED: RE-CREATE NEEDED`:
+    - `atvm157-debian13.0.0`
+- Action for future runs:
+  - Add these machine names to `--exclude_partial_match` when building broad-scope automation commands.
+  - When reporting run status, include skipped blacklisted machines separately with their reason, in addition to completed and remaining machines.
+  - Use the run `build_name` as the heading/title for status responses so the test type is obvious.
+  - For failed machines in status responses, include the failure reason taken from the run log.
+  - Include timing details in status responses: start time, end time when complete, and total or elapsed runtime.
+  - Also include timing stats in status responses: quickest completed test runtime, longest completed test runtime, and average completed test runtime.
--- a/atvm/atvm-setup-script-guide.md
+++ b/atvm/atvm-setup-script-guide.md
@@ -0,0 +1,165 @@
+# ATVM Setup Script Guide
+
+This file is guide-only documentation for running and maintaining the ATVM setup workflow.
+Do not put dated run examples here.
+
+## Scope
+- Client setup script: `/home/aw/code/cds/atvm/atvm-setup-script.sh`
+- Controller wrapper: `/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh`
+- Run-learnings log: `/home/aw/code/cds/atvm/atvm-setup-script-runs.md`
+
+## Purpose
+The setup flow performs a controlled bootstrap across supported Linux distributions:
+1. Validate target host identity using expected IP + expected hostname before any configuration.
+2. Fix repositories (especially CD/DVD media repo entries).
+3. On Ubuntu, configure root SSH password-login workflow (`root/cdsi2012`) for follow-on root operations.
+4. On Oracle Linux, set default boot kernel to non-UEK when available.
+5. Disable unattended auto-upgrades on Ubuntu.
+6. Remove specific storage-related packages and install base tooling.
+7. Disable SELinux on Red Hat-family systems.
+8. Configure static IP as the final step.
+9. Print final summary and write logs to `atvm_setup_script.log`.
+10. On SELinux-capable distros, reboot and verify runtime SELinux status post-reboot.
+11. Keep client powered on after successful setup so controller-side log collection + SHA256 verification can complete.
+12. Power off from controller only after successful verification and no setup errors.
+
+## Execution Model
+- Shell safety flags: `set -euo pipefail`
+- Logging: colorized console + plain text log file
+- Entry point: `main "$@"`
+- Default operator assumption for setup access: `root / cdsi2012` unless explicitly overridden.
+
+## Mandatory Identity Gate
+Setup must not start unless operator explicitly provides both values:
+- `--expected-ip <ip>`
+- `--expected-hostname <hostname>`
+
+Rules:
+- Connect to the operator-provided target IP directly.
+- Do not pre-scan alternate candidate IPs.
+- Do not infer hostname from target.
+- If hostname is missing from request, stop and ask for it.
+- If detected hostname does not exactly match expected hostname, stop immediately.
+- If expected IP is not assigned on target, stop immediately.
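+
+The gate can be pictured as a pure comparison between operator-provided values and values detected on the target. This is only a sketch: `identity_matches` is a hypothetical helper and is not the script's `validate_target_host_identity` implementation.
+
+```bash
+# Hypothetical helper. Arguments:
+#   $1 expected IP, $2 expected hostname (both operator-provided)
+#   $3 detected hostname, $4 space-separated IPs assigned on the target
+identity_matches() {
+  local expected_ip="$1" expected_hostname="$2"
+  local actual_hostname="$3" assigned_ips="$4"
+  local ip
+  # The hostname must match exactly.
+  [ "$actual_hostname" = "$expected_hostname" ] || return 1
+  # The expected IP must be one of the assigned addresses.
+  for ip in $assigned_ips; do
+    if [ "$ip" = "$expected_ip" ]; then
+      return 0
+    fi
+  done
+  return 1
+}
+
+# Assumed usage on a Linux target (not executed here):
+#   identity_matches "$EXPECTED_IP" "$EXPECTED_HOSTNAME" "$(hostname)" "$(hostname -I)"
+```
+
+Any non-zero result should stop the run before configuration starts.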
+
+## Canonical Run Order
+1. `parse_args`
+2. `validate_target_host_identity`
+3. `check_sudo`
+4. `fix_repositories`
+5. `configure_ubuntu_root_ssh_access` (Ubuntu only)
+6. `install_sudo_if_needed`
+7. `configure_oracle_non_uek_kernel` (Oracle Linux only)
+8. `disable_ubuntu_auto_upgrades` (Ubuntu only)
+9. `run_package_installation`
+10. `disable_selinux` (RHEL-family only)
+11. `configure_static_ip` (final configuration step)
+12. `print_final_summary`
+13. `reboot_and_verify_selinux_if_needed`
+14. `poweroff_client_if_successful` (controller-driven after verification)
+
+## Core Behavior By Step
+
+### Repository Fix
+- Debian/Ubuntu: comment `cdrom` entries in apt lists and run `apt-get update`.
+- RHEL-family/Oracle: disable media/cdrom/dvd repo entries and run `yum clean all && yum makecache`.
+- Fedora: same model via `dnf clean all && dnf makecache`.
+- openSUSE/SLES: disable CD/DVD repos with `zypper mr -d` and refresh.
+
+### Oracle Linux Kernel Handling
+- Oracle Linux only.
+- Select first non-UEK kernel via `grubby --info=ALL` and set GRUB default.
+- Track whether default changed and whether reboot is required.
+
+### Ubuntu Root SSH Workflow
+- Ubuntu only.
+- Set root password `cdsi2012`, unlock root account.
+- Write `/etc/ssh/sshd_config.d/99-atvm-root-login.conf` enabling root + password auth.
+- Validate config and restart SSH service.
+
+### Ubuntu Auto-Upgrade Disable
+- Ubuntu only.
+- Update `/etc/apt/apt.conf.d/20auto-upgrades` to disable periodic update/upgrade actions.
+
+### Package Installation
+- Package manager detection order: `apt-get`, `dnf`, `yum`, `zypper`, `pacman`, `apk`.
+- Pre-cleanup removes multipath/iSCSI packages where applicable.
+- Installs kernel headers per distro.
+- Base package set includes:
+  `curl wget git vim perl gdb scsitools net-tools parted fio ca-certificates python3 elfutils-libelf-devel`
+
+### SELinux Disable
+- RHEL-family only.
+- If enforcing/permissive, backup and rewrite `/etc/selinux/config` to disabled.
+- Marks reboot recommendation/requirement in summary.
+
+### Static IP Configuration (Final Step)
+Hardcoded target values:
+- IP: `192.168.3.191`
+- Prefix: `22`
+- Gateway: `192.168.0.1`
+- DNS: `8.8.8.8`, `8.8.4.4`
+
+Interface detection priority:
+1. default-route interface
+2. first non-loopback interface with IPv4
+3. first non-loopback interface from link list
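+
+The first priority, the default-route interface, might be read from `ip route` output as sketched below. `default_route_iface` is a hypothetical helper that covers only that first step and may differ from what the script actually does.
+
+```bash
+# Hypothetical helper: reads `ip route show default` output on stdin
+# and prints the interface named after "dev" on the first default route.
+default_route_iface() {
+  awk '
+    /^default/ {
+      for (i = 1; i < NF; i++) {
+        if ($i == "dev") { print $(i + 1); exit }
+      }
+    }
+  '
+}
+
+# Assumed usage on the client (not executed here):
+#   ip route show default | default_route_iface
+```
+
+Empty output means there is no default route, and detection would fall through to the next priority.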
+
+Network-stack handling includes `netplan`, `NetworkManager`/`nmcli`, `wicked`, and legacy `ifcfg` fallback patterns.
+
+### SELinux Reboot Verification
+- Applies to `rhel`, `centos`, `rocky`, `almalinux`, `fedora`, `ol` when SELinux changed.
+- Creates one-time systemd verifier service before reboot.
+- Post-reboot service records runtime `getenforce` and self-removes.
+- On success/no real errors, keeps client on for controller log copy/hash verification before controller power-off.
+- On errors, leaves client on for manual inspection.
+
+## Power-State Rules
+- After successful setup, keep client powered on until controller log collection + SHA256 verification completes.
+- If verification succeeds and no real error lines exist (`^\[ERROR\]`), controller powers off client.
+- If any real error lines exist, keep client powered on.
+
+## Logging and Verification
+- Client log filename: `atvm_setup_script.log`
+- Common client log path when run as root: `/root/atvm_setup_script.log`
+- Controller collected log naming: `atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log`
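+
+The controller-side naming pattern can be produced with `date`. This is a sketch; `collected_log_name` is a hypothetical helper name, and only the filename pattern comes from this guide.
+
+```bash
+# Hypothetical helper: prints atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log
+collected_log_name() {
+  local host="$1"
+  printf 'atvm_configuration_%s_%s.log\n' "$host" "$(date '+%Y%m%d_%H%M%S')"
+}
+```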
+
+Required post-run validation:
+1. Copy client log to controller `atvm/log/` path.
+2. Compare SHA256 between client and copied controller log.
+3. Require exact match.
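+
+The hash comparison in steps 2 and 3 could look like the sketch below. To stay runnable it treats both logs as local files; in the real workflow the client-side hash would come from running `sha256sum` on the client over SSH. `sha256_of` and `logs_match` are hypothetical helper names.
+
+```bash
+# Hypothetical helper: prints only the hex digest of a file.
+sha256_of() {
+  sha256sum "$1" 2>/dev/null | awk '{print $1}'
+}
+
+# Hypothetical helper: succeeds only when both files hash identically.
+logs_match() {
+  local client_hash local_hash
+  client_hash="$(sha256_of "$1")" || return 1
+  local_hash="$(sha256_of "$2")" || return 1
+  # An empty digest (for example a missing file) is never a match.
+  [ -n "$client_hash" ] && [ "$client_hash" = "$local_hash" ]
+}
+```
+
+A failed comparison should be treated as a failed collection, not as a reason to rerun setup.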
+
+## Preferred Execution Commands
+Direct client execution:
+```bash
+sudo bash /home/cirrususer/atvm-setup-script.sh \
+  --expected-ip <current-client-ip> \
+  --expected-hostname <exact-hostname>
+```
+
+Controller run + collect:
+```bash
+EXPECTED_IP_ARG=<current-client-ip> EXPECTED_HOSTNAME_ARG=<exact-hostname> \
+/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh
+```
+
+Controller collect-only after client run:
+```bash
+/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh --collect-after-complete
+```
+
+## Troubleshooting
+- If local collected log is missing, do not rerun full setup just for log recovery.
+- Use collect-only mode and verify SHA256 after copy.
+- If wrapper appears stuck after IP/reboot transition, stop older wrapper sessions and run one fresh collect-only session.
+- If `sshpass` is missing on controller, wrapper can still run but may require repeated interactive password prompts.
+
+## Operational Caveats
+- Not fully idempotent for all paths; repeated runs may rewrite network configs and create multiple backups.
+- Static IP values are hardcoded; adjust before use in other environments.
+- Run in maintenance windows because network changes can interrupt active sessions.
+- Preserve host identity gating; do not weaken expected IP/hostname checks.
+
+## Update Rule
+- After each run, update this file only for guide/rule/checklist/default behavior changes.
+- Put run-specific outcomes in `atvm-setup-script-runs.md` only when the run produced a new learning.
--- a/atvm/atvm-setup-script-runs.md
+++ b/atvm/atvm-setup-script-runs.md
@@ -0,0 +1,40 @@
+# ATVM Setup Script Runs
+
+This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
+
+## Entry Rule
+- Add an entry only when the run changed workflow behavior, exposed a new failure mode, or confirmed a new required check.
+- Do not add routine runs with no new learning.
+
+## Run Learning: 2026-03-03 (Ubuntu 24.04)
+- Environment:
+  - Initial IP: `192.168.0.89`
+  - Final static IP: `192.168.3.191`
+  - Hostname: `atvm-codextest-vm-1`
+- Learning:
+  - Root SSH password workflow (`root/cdsi2012`) and log copy/hash verification path are valid end-to-end.
+  - Wrapper must enforce identity arguments for run-and-collect mode.
+- Action for future runs:
+  - Require `EXPECTED_IP_ARG` and `EXPECTED_HOSTNAME_ARG` for wrapper run-and-collect.
+
+## Run Learning: 2026-03-05 (RHEL 9)
+- Environment:
+  - Initial IP: `192.168.3.212`
+  - Final static IP: `192.168.3.191`
+  - Hostname: `atvm-codextest-vm-2`
+- Learning:
+  - SELinux disable path with reboot + post-reboot verifier worked.
+  - Auto power-off can race controller-side log collection if done too early.
+- Action for future runs:
+  - Keep client powered on until controller log copy + SHA256 verification completes.
+  - Only then perform controller-side power-off when no real error lines are present.
+
+## Run Learning: 2026-03-06 (Oracle Linux 9)
+- Environment:
+  - Initial IP: `192.168.0.121`
+  - Final static IP: `192.168.3.191`
+  - Hostname: `atvm-codextest-vm`
+- Learning:
+  - Wrapper auto power-off was blocked by false-positive error detection from instructional text.
+- Action for future runs:
+  - Match only real error log lines using `^\[ERROR\]` for power-off gating.
--- a/atvm/atvm-setup-script.sh
+++ b/atvm/atvm-setup-script.sh
--- a/atvm/cypress-automation-for-cmc.md
+++ b/atvm/cypress-automation-for-cmc.md
--- a/atvm/cypress-automation-for-cmc.md:Zone.Identifier
+++ b/atvm/cypress-automation-for-cmc.md:Zone.Identifier
--- a/atvm/run-atvm-setup-and-collect-log.sh
+++ b/atvm/run-atvm-setup-and-collect-log.sh
@@ -0,0 +1,228 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+REMOTE_IP_PRIMARY="${REMOTE_IP_PRIMARY:-192.168.0.121}"
+REMOTE_IP_SECONDARY="${REMOTE_IP_SECONDARY:-192.168.3.191}"
+REMOTE_USER="${REMOTE_USER:-root}"
+PROJECT_DIR="${PROJECT_DIR:-/home/aw/code/cds/atvm}"
+LOCAL_LOG_DIR="${LOCAL_LOG_DIR:-$PROJECT_DIR/log}"
+LOCAL_SETUP_SCRIPT="${LOCAL_SETUP_SCRIPT:-$PROJECT_DIR/atvm-setup-script.sh}"
+REMOTE_SETUP_SCRIPT="${REMOTE_SETUP_SCRIPT:-/root/atvm_setup_script.sh}"
+REMOTE_LOG_FILE="${REMOTE_LOG_FILE:-/root/atvm_setup_script.log}"
+WAIT_TIMEOUT_SECONDS="${WAIT_TIMEOUT_SECONDS:-600}"
+MODE="${1:-run-and-collect}"
+EXPECTED_IP_ARG="${EXPECTED_IP_ARG:-}"
+EXPECTED_HOSTNAME_ARG="${EXPECTED_HOSTNAME_ARG:-}"
+
+SSH_OPTS=(-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5)
+
+if [[ "$MODE" == "run-and-collect" && ! -f "$LOCAL_SETUP_SCRIPT" ]]; then
+    echo "ERROR: Local setup script not found: $LOCAL_SETUP_SCRIPT" >&2
+    exit 1
+fi
+
+mkdir -p "$LOCAL_LOG_DIR"
+
+if ! command -v ssh >/dev/null 2>&1 || ! command -v scp >/dev/null 2>&1; then
+    echo "ERROR: ssh/scp is required." >&2
+    exit 1
+fi
+
+SSH_CMD=(ssh "${SSH_OPTS[@]}")
+SCP_CMD=(scp "${SSH_OPTS[@]}")
+
+if [[ -n "${ATVM_PASSWORD:-}" ]]; then
+    if command -v sshpass >/dev/null 2>&1; then
+        SSH_CMD=(sshpass -p "$ATVM_PASSWORD" ssh "${SSH_OPTS[@]}")
+        SCP_CMD=(sshpass -p "$ATVM_PASSWORD" scp "${SSH_OPTS[@]}")
+    else
+        echo "WARNING: ATVM_PASSWORD is set, but sshpass is not installed. Falling back to interactive password prompts."
+    fi
+fi
+
+run_ssh() {
+    local host="$1"
+    shift
+    "${SSH_CMD[@]}" "${REMOTE_USER}@${host}" "$@"
+}
+
+run_scp_to_remote() {
+    local src="$1"
+    local host="$2"
+    local dst="$3"
+    "${SCP_CMD[@]}" "$src" "${REMOTE_USER}@${host}:${dst}"
+}
+
+run_scp_from_remote() {
+    local host="$1"
+    local src="$2"
+    local dst="$3"
+    "${SCP_CMD[@]}" "${REMOTE_USER}@${host}:${src}" "$dst"
+}
+
+wait_for_reachable_host() {
+    local start_ts current_ts elapsed
+    start_ts="$(date +%s)"
+
+    while true; do
+        for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
+            if run_ssh "$host" "echo ready" >/dev/null 2>&1; then
+                echo "$host"
+                return 0
+            fi
+        done
+
+        current_ts="$(date +%s)"
+        elapsed=$((current_ts - start_ts))
+        if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
+            return 1
+        fi
+        sleep 5
+    done
+}
+
+pick_initial_host() {
+    for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
+        if run_ssh "$host" "echo ready" >/dev/null 2>&1; then
+            echo "$host"
+            return 0
+        fi
+    done
+    return 1
+}
+
+wait_for_completed_task() {
+    local start_ts current_ts elapsed
+    start_ts="$(date +%s)"
+
+    while true; do
+        for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
+            if run_ssh "$host" "test -f '$REMOTE_LOG_FILE' && grep -q 'SUCCESS: ATVM VM Setup Complete!' '$REMOTE_LOG_FILE'" >/dev/null 2>&1; then
+                echo "$host"
+                return 0
+            fi
+        done
+
+        current_ts="$(date +%s)"
+        elapsed=$((current_ts - start_ts))
+        if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
+            return 1
+        fi
+        sleep 5
+    done
+}
+
+wait_for_host_offline() {
+    local host="$1"
+    local start_ts current_ts elapsed
+    start_ts="$(date +%s)"
+
+    while true; do
+        if ! run_ssh "$host" "echo still-up" >/dev/null 2>&1; then
+            return 0
+        fi
+
+        current_ts="$(date +%s)"
+        elapsed=$((current_ts - start_ts))
+        if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
+            return 1
+        fi
+        sleep 5
+    done
+}
+
+if [[ "$MODE" != "run-and-collect" && "$MODE" != "--collect-after-complete" ]]; then
+    echo "Usage:"
+    echo "  $0                      # run setup on client, then collect log"
+    echo "  $0 --collect-after-complete  # wait for completed client task, then collect log only"
+    exit 1
+fi
+
+if [[ "$MODE" == "run-and-collect" ]]; then
+    if [[ -z "$EXPECTED_IP_ARG" || -z "$EXPECTED_HOSTNAME_ARG" ]]; then
+        echo "ERROR: run-and-collect requires EXPECTED_IP_ARG and EXPECTED_HOSTNAME_ARG." >&2
+        echo "Example:" >&2
+        echo "  EXPECTED_IP_ARG=192.168.0.121 EXPECTED_HOSTNAME_ARG=atvm-codextest-vm $0" >&2
+        exit 1
+    fi
+
+    INITIAL_HOST="$(pick_initial_host)" || {
+        echo "ERROR: Could not reach ${REMOTE_IP_PRIMARY} or ${REMOTE_IP_SECONDARY} for initial setup." >&2
+        exit 1
+    }
+
+    echo "Copying setup script to ${REMOTE_USER}@${INITIAL_HOST}:${REMOTE_SETUP_SCRIPT}"
+    run_scp_to_remote "$LOCAL_SETUP_SCRIPT" "$INITIAL_HOST" "$REMOTE_SETUP_SCRIPT"
+
+    echo "Running remote setup script on ${INITIAL_HOST} (disconnect is expected during IP/reboot steps)"
+    set +e
+    run_ssh "$INITIAL_HOST" "chmod +x '$REMOTE_SETUP_SCRIPT' && bash '$REMOTE_SETUP_SCRIPT' --expected-ip '$EXPECTED_IP_ARG' --expected-hostname '$EXPECTED_HOSTNAME_ARG'"
+    run_status=$?
+    set -e
+    if (( run_status != 0 )); then
+        echo "INFO: Remote run returned non-zero (${run_status}). Continuing because network reconfiguration/reboot can interrupt SSH."
+    fi
+
+    echo "Waiting for completed client task marker in ${REMOTE_LOG_FILE} (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
+    ACTIVE_HOST="$(wait_for_completed_task)" || {
+        echo "ERROR: Could not detect completed task marker in remote log within timeout." >&2
+        exit 1
+    }
+else
+    echo "Waiting for completed client task marker in ${REMOTE_LOG_FILE} (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
+    ACTIVE_HOST="$(wait_for_completed_task)" || {
+        echo "ERROR: Could not detect completed task marker in remote log within timeout." >&2
+        exit 1
+    }
+fi
+
+echo "Host reachable at: ${ACTIVE_HOST}"
+
+REMOTE_HOSTNAME="$(run_ssh "$ACTIVE_HOST" "hostname" | tr -d '\r' | tail -n1)"
+RUN_TS="$(date +%Y%m%d_%H%M%S)"
+LOCAL_LOG_FILE="${LOCAL_LOG_DIR}/atvm_configuration_${REMOTE_HOSTNAME}_${RUN_TS}.log"
+
+echo "Collecting remote log: ${REMOTE_LOG_FILE}"
+run_scp_from_remote "$ACTIVE_HOST" "$REMOTE_LOG_FILE" "$LOCAL_LOG_FILE"
+
+REMOTE_HASH="$(run_ssh "$ACTIVE_HOST" "sha256sum '$REMOTE_LOG_FILE' | awk '{print \$1}'" | tr -d '\r' | tail -n1)"
+LOCAL_HASH="$(sha256sum "$LOCAL_LOG_FILE" | awk '{print $1}')"
+
+if [[ "$REMOTE_HASH" != "$LOCAL_HASH" ]]; then
+    echo "ERROR: Hash mismatch after log copy." >&2
+    echo "Remote: $REMOTE_HASH" >&2
+    echo "Local:  $LOCAL_HASH" >&2
+    exit 1
+fi
+
+HAS_ERRORS_IN_LOG=false
+# Match only real error records (lines starting with "[ERROR]"), not instructional text; use the hash-verified local copy so an SSH failure is not read as "no errors".
+if grep -Eq '^\[ERROR\]' "$LOCAL_LOG_FILE"; then
+    HAS_ERRORS_IN_LOG=true
+fi
+
+if [[ "$HAS_ERRORS_IN_LOG" == true ]]; then
+    echo "WARNING: [ERROR] entries detected in remote log. VM will remain powered on for manual inspection."
+else
+    echo "Log indicates success with no [ERROR] entries. Powering off ${ACTIVE_HOST}."
+    set +e
+    run_ssh "$ACTIVE_HOST" "shutdown -h now"
+    shutdown_status=$?
+    set -e
+    if (( shutdown_status != 0 )); then
+        echo "INFO: Shutdown command returned non-zero (${shutdown_status}); this can occur if SSH disconnects during shutdown."
+    fi
+
+    echo "Waiting for ${ACTIVE_HOST} to go offline (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
+    if wait_for_host_offline "$ACTIVE_HOST"; then
+        echo "Power-off confirmed: ${ACTIVE_HOST} is offline."
+    else
+        echo "WARNING: Could not confirm ${ACTIVE_HOST} offline within timeout."
+    fi
+fi
+
+echo "Log collection complete"
+echo "Active host: ${ACTIVE_HOST}"
+echo "Local log: ${LOCAL_LOG_FILE}"
+echo "SHA256: ${LOCAL_HASH}"