Initial commit

2026-03-11 15:19:25 -04:00
commit 93b6d7acb8
16 changed files with 4454 additions and 0 deletions

atvm/AGENTS.md Normal file

@@ -0,0 +1,162 @@
# ATVM AGENTS Guide
This file defines how to operate and maintain the ATVM folder workflows.
It is rebuilt from current files in `/home/aw/code/cds/atvm`.
## Scope
Two operational tracks exist in this folder:
- Setup/bootstrap track:
  - `atvm-setup-script.sh`
  - `run-atvm-setup-and-collect-log.sh`
  - `atvm-setup-script-guide.md`
  - `atvm-setup-script-runs.md`
- Cypress automation track:
  - `atvm-automation-guide.md`
  - `atvm-automation-examples.md`
  - `atvm-automation-runs.md`
Reference/inventory material:
- `cypress-automation-for-cmc.md`
- `cypress-automation-for-cmc.md:Zone.Identifier`
## File Roles
- `*-guide.md` files:
  - Guide-only procedures, rules, defaults, and checklists.
  - No dated or one-off run examples.
- `*-runs.md` files:
  - Run-specific learnings only when a run introduces new information.
  - No routine/no-change run logs.
- `*-examples.md` files:
  - Reusable command examples and commonly used option combinations.
  - Keep generic; avoid dated one-off run outcomes.
## Setup Track: Required Behavior
Use `atvm-setup-script-guide.md` as the procedure source and keep behavior aligned with `atvm-setup-script.sh`.
### Safety-Critical Rules
1. Never run setup without operator-provided `--expected-ip` and `--expected-hostname`.
2. Never infer expected hostname from target host output.
3. Stop immediately on hostname mismatch or expected-IP-not-assigned.
4. Keep static IP configuration as a final step to avoid mid-run connection loss.
### Canonical Setup Order
1. Parse args.
2. Validate host identity.
3. Check sudo/privileges.
4. Fix repositories.
5. Configure Ubuntu root SSH/password workflow (Ubuntu only).
6. Install sudo if needed.
7. Configure Oracle default non-UEK kernel (Oracle Linux only).
8. Disable Ubuntu auto-upgrades (Ubuntu only).
9. Run package cleanup/install.
10. Disable SELinux (RHEL-family).
11. Configure static IP.
12. Print summary.
13. Reboot + post-reboot SELinux verifier when applicable.
14. Keep client on until controller log copy + SHA256 verification completes.
15. Power off only after verified success and no real error log lines.
### Setup Defaults
- ATVM static IP target: `192.168.3.191/22`
- Gateway: `192.168.0.1`
- DNS: `8.8.8.8`, `8.8.4.4`
- Ubuntu root SSH workflow credential in docs/script: `root / cdsi2012`
- Client log file: `atvm_setup_script.log` (typically `/root/atvm_setup_script.log` when run as root)
### Setup Controller Wrapper Rules
- Wrapper supports:
  - run-and-collect (default)
  - `--collect-after-complete`
- `run-and-collect` requires env vars:
  - `EXPECTED_IP_ARG`
  - `EXPECTED_HOSTNAME_ARG`
- Wrapper validates success marker and SHA256 before success.
- Wrapper powers off only when log has no lines matching `^\[ERROR\]`.
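The env-var requirement for run-and-collect mode can be sketched as a small pre-flight check. This is only an illustration: the function name `require_identity_args` is hypothetical, and the real wrapper may enforce the rule differently.

```bash
# Hypothetical pre-flight check (not taken from the wrapper script).
# Returns non-zero when either identity variable is unset or empty.
require_identity_args() {
    local missing=0
    if [ -z "${EXPECTED_IP_ARG:-}" ]; then
        echo "ERROR: EXPECTED_IP_ARG must be set for run-and-collect mode." >&2
        missing=1
    fi
    if [ -z "${EXPECTED_HOSTNAME_ARG:-}" ]; then
        echo "ERROR: EXPECTED_HOSTNAME_ARG must be set for run-and-collect mode." >&2
        missing=1
    fi
    return "$missing"
}
```

A caller would typically run `require_identity_args || exit 1` before opening any SSH connection, so a missing value stops the run before the target is touched.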
## Cypress Automation Track: Required Behavior
Use `atvm-automation-guide.md` as the execution source.
Use `atvm-automation-examples.md` as the common options/command reference.
### Controller Client
- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: `root / atvmcdsi2012`
### Mandatory Run Control
1. Before planning a new run, check for active automation processes.
2. Report running/not-running status.
3. If running, ask before termination; terminate only with explicit approval.
4. Always show exact planned command(s) before execution.
5. Execute only after explicit approval.
6. If monitoring is not requested, report immediate command success/failure and any errors.
7. Monitor completion only when explicitly requested by the operator.
8. For monitored runs, allow long runtime windows (15-30+ minutes) and continue until completion unless the operator instructs otherwise.
9. Do not terminate monitored runs unless the operator explicitly instructs termination.
### Status Request Format
When the operator asks for run status, report in this order:
1. Heading/title using the run `build_name`.
2. Completed machines with pass/fail state for each machine.
3. Skipped machines with reason.
4. Remaining machines still to run.
5. Summary counts for finished, passed, failed, and skipped machines.
6. Timing details:
   - start time
   - end time if complete
   - total run time if complete, or elapsed run time if still running
   - quickest completed test runtime
   - longest completed test runtime
   - average completed test runtime
7. Estimated completion time.
Status details:
- Use the live run log on the automation VM when available.
- Use the run `build_name` as the heading/title when available.
- Show blacklisted machines under skipped machines when they are part of the requested scope.
- Show in-progress machines under remaining machines as `RUNNING`.
- Show not-yet-started machines as `NOT STARTED`.
- Use completed spec results already recorded in the log to determine machine pass/fail state.
- For failed machines, include the failure reason from the run log in the status output.
- Include start time in status output when it can be derived from the log.
- Include end time and total runtime for completed runs, or elapsed runtime for active runs.
- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the log.
### Automation Blacklist
Always exclude these machines with `--exclude_partial_match` when building ATVM automation commands.
CMC install blacklist (`BLACKLISTED: CMC INSTALL - CAN'T COMPILE`):
- `atvm6-centos6.0`
- `atvm41-redhat6.0`
- `atvm73-oracle6.0`
Support-request blacklist (`BLACKLISTED: SUPPORT REQUEST - WAITING`):
- `atvm113-debian9.0.0`
- `atvm115-debian9.1.0`
- `atvm116-debian9.2.0`
- `atvm156-debian9.3.0`
Re-create blacklist:
- `atvm157-debian13.0.0`
### Operator Preferences
- Do not include Gold Disk IDs in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- Prefer distro-scoped filtering (for example `--containsVm redhat9`) when possible.
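The no-spaces rule for `--build_name` is easy to check before a command is presented for approval. The sketch below is an assumption-laden illustration: both function names are hypothetical, and Gold Disk IDs are not detected because their format is not described here.

```bash
# Hypothetical helpers (not part of the automation scripts).

# Succeeds only for a non-empty name that contains no whitespace.
is_valid_build_name() {
    case "$1" in
        ''|*[[:space:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Replaces each run of whitespace with a single hyphen.
normalize_build_name() {
    printf '%s\n' "$1" | sed 's/[[:space:]][[:space:]]*/-/g'
}
```

For example, `normalize_build_name 'nightly e2e pure'` yields a hyphenated name that then passes `is_valid_build_name`.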
## Update Policy (Both Tracks)
After each run:
- Update corresponding `*-guide.md` only if workflow/rules/default behavior changed.
- Update corresponding `*-examples.md` when common command patterns/options change.
- Update corresponding `*-runs.md` only if the run produced new learning.
## Path and Naming Consistency Note
Current repo filenames use hyphen style, but some script text/defaults still show underscore-style paths (for example `atvm_setup_script.sh`, `run_atvm_setup_and_collect_log.sh`, `/home/aw/code/atvm`).
When operating:
1. Use actual filesystem paths in this repo first (`/home/aw/code/cds/atvm/...`).
2. If script defaults are used, verify they match existing files before execution.
3. If changing path conventions, update scripts and guides in the same change.
## Non-Goals
- Do not treat `cypress-automation-for-cmc.md` as executable runbook logic.
- Do not record secrets/tokens into new guide or runs entries.


@@ -0,0 +1,97 @@
## Examples
- `--build_name` values must not include spaces; use `-` between words.
- Add the maintained blacklist to `--exclude_partial_match` for runs that use broad selection or randomization.
- Maintained blacklist:
  - `atvm6-centos6.0`
  - `atvm41-redhat6.0`
  - `atvm73-oracle6.0`
  - `atvm113-debian9.0.0`
  - `atvm115-debian9.1.0`
  - `atvm116-debian9.2.0`
  - `atvm156-debian9.3.0`
  - `atvm157-debian13.0.0`
### E2E: Pure iscsi+fc with specific VMs
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --specify_vms atvm3-ubuntu18.04 atvm109-w2k12R2; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-pure-plugin
```
### E2E: Infinibox fc with specific VMs
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type infinibox --use_specified_plugin fc --specify_vms atvm51-redhat6.10 atvm110-w2k16; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-infinibox-plugin
```
### E2E: Regular cutover
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --specify_vms atvm93-oracle7.9 atvm111-w2k19 --regular_cutover; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-regular-cutover
```
### Reboot test
```bash
python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm37-rocky8.8 atvm112-w2k22 --wait_for_power_on 120; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-reboot
```
### SystemOS test
```bash
python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --specify_vms atvm118-oracle9.3 atvm145-w2k25; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-systemOS
```
### MigrateOPS test
```bash
python3 cmc-templates.py --template cmc-migrateops --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm139-redhat9.5 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-migrateOPS
```
### Compute MigrateOPS: vmware
```bash
python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms vmware --test_partition --specify_vms atvm138-oracle9.4-opt atvm112-w2k22 --set_static_ip_dest; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-vmware
```
### Compute MigrateOPS: ovirt
```bash
python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms ovirt --test_partition --specify_vms atvm124-redhat8.8 atvm111-w2k19 --set_static_ip_dest; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-ovirt
```
### Group consistency
```bash
python3 cmc-templates.py --template cmc-group-consistency --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm4-ubuntu20.04 atvm112-w2k22 --enable_uuid; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-consistentyGroup
```
### H2H same platform
```bash
python3 cmc-templates.py --template cmc-h2h-same-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm38-rocky9.0 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hSamePlatform
```
### H2H different platform
```bash
python3 cmc-templates.py --template cmc-h2h-diff-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm65-redhat8.3 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hDifferentPlatform
```
### Randomized reboot sanity
```bash
python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0 --wait_for_power_on 120; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-reboot-iscsi
```
### Randomized e2e sanity
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-e2e
```
### Randomized systemOS sanity
```bash
python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --randomize 1 --exclude_partial_match suse15.0 fedora34 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-systemOS
```


@@ -0,0 +1,166 @@
# Run ATVM Automation Guide
This file is guide-only documentation for operating ATVM CMC automation.
Do not put specific run examples here.
For reusable command examples and common option combinations, use `atvm-automation-examples.md`.
## Purpose
Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.
## ATVM Cypress Automation Controller Client
- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: `root / atvmcdsi2012`
## Operating Constraints
- Run only scripts/commands explicitly requested.
- Do not make manual system configuration changes on the client.
- Do not edit client files unless explicitly requested.
## Operator Preferences
- Do not include Gold Disk identifiers in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- For multiple VMs in the same distro, use distro-scoped filtering (`--containsVm`) instead of long explicit VM lists.
- Before preparing a new run, always check whether automation is already running.
- Always report whether automation is currently running.
- If running, ask whether to terminate; terminate only with explicit approval.
- After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
- Before any run, always show exact planned command(s) and wait for explicit approval.
- Execute only after explicit approval (for example `approve`).
- After execution, report immediate success/failure only.
- Do not actively monitor completion unless explicitly requested.
- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
- Report command errors immediately.
- `sshpass` may be used where password-based SSH automation is required.
## Core Scripts
- Template prep: `/root/cdc-e2e-cyp-12.17.4/cmc-templates.py`
- Test execution: `./run-sorry-cypress.py`
Typical sequence:
1. Run `cmc-templates.py` with requested template/options.
2. Run `run-sorry-cypress.py` with matching config and build name.
## Config File / Gold Disk Mapping
- `cypress.atvm-config-gold.ts` -> Gold Disk 1
- `cypress.atvm-config-gold-2.ts` -> Gold Disk 2
- Additional numbered config variants map to corresponding Gold Disks.
## Available Templates
- `cmc-e2e`
- `cmc-group-consistency`
- `cmc-h2h-diff-platf`
- `cmc-h2h-same-platf`
- `cmc-migrateops`
- `cmc-migrateops-compute-migration`
- `cmc-reboot`
- `cmc-systemOS`
## Command Pattern
```bash
python3 cmc-templates.py --template <template> --config_file_path ./<config-file> [template options...]; \
python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces>
```
## Examples Reference
- Commonly used command examples: `atvm-automation-examples.md`
- Keep this guide focused on run-control rules and workflow constraints.
## Example Option Patterns (Guide-Only)
- Distro-scoped VM selection:
  - `--containsVm redhat`
  - `--containsVm redhat9`
- Explicit VM selection:
  - `--specify_vms <vm1> <vm2> ...`
- Compute migrateops platform:
  - `--vm_platforms vmware|ovirt|openshift|proxmox`
## Blacklisted Machines
Always exclude these machines from ATVM automation runs by adding them to `--exclude_partial_match`.
Permanently blacklisted because CMC cannot compile:
- `atvm6-centos6.0`
- `atvm41-redhat6.0`
- `atvm73-oracle6.0`
Temporarily blacklisted while support requests are waiting:
- `atvm113-debian9.0.0`
- `atvm115-debian9.1.0`
- `atvm116-debian9.2.0`
- `atvm156-debian9.3.0`
Temporarily blacklisted until re-created:
- `atvm157-debian13.0.0`
Preferred exclude list:
- `--exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0`
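Keeping the blacklist in one variable helps avoid drift between hand-typed commands. The following is a minimal sketch, assuming a hypothetical helper `build_exclude_args` and variable `ATVM_BLACKLIST`; neither exists in the automation scripts.

```bash
# Blacklist copied from the lists above (variable name is hypothetical).
ATVM_BLACKLIST="atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0"

# Prints the exclude option: run-specific patterns first, then the blacklist.
build_exclude_args() {
    local name
    printf '%s' "--exclude_partial_match"
    # $ATVM_BLACKLIST is intentionally unquoted so it splits into names.
    for name in "$@" $ATVM_BLACKLIST; do
        printf ' %s' "$name"
    done
    printf '\n'
}
```

Calling `build_exclude_args` with no arguments prints the preferred exclude list; calling `build_exclude_args suse15.0` prepends a run-specific pattern.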
## Running-Automation Check (Mandatory)
Before any new automation request:
1. SSH to `root@192.168.3.190`.
2. Check for active automation processes (for example `run-sorry-cypress.py`, `cmc-templates.py`, and related Cypress runners).
3. Report:
- `Running` with process details, or
- `Not running`.
4. If `Running`, ask operator whether to terminate.
5. If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
6. If termination is not approved, do not start a new run.
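Steps 2 and 3 of the check above can be sketched as a filter over a process listing. The function name `report_automation_status` and the match pattern are assumptions; in particular, the broad `cypress` pattern is only a guess at what "related Cypress runners" look like in `ps` output.

```bash
# Hypothetical helper: reads a process listing on stdin and reports
# "Running" (with the matching lines) or "Not running".
report_automation_status() {
    local matches
    matches="$(grep -E 'run-sorry-cypress\.py|cmc-templates\.py|cypress' || true)"
    if [ -n "$matches" ]; then
        echo "Running"
        printf '%s\n' "$matches"
    else
        echo "Not running"
    fi
}
```

One possible invocation is `ssh root@192.168.3.190 'ps -eo pid,args' | report_automation_status`. Reading from stdin keeps the remote command free of the search pattern, so the check does not match itself.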
## Approval Workflow (Mandatory)
1. Build exact command(s) for the request.
2. Present them verbatim as planned commands.
3. Wait for explicit approval.
4. Run only approved command(s), no extra options.
5. If monitoring was not requested, report immediate success/failure for each command.
6. If monitoring was requested, keep monitoring until completion and report final outcome.
## Requested Test Style
When asked for one VM or a VM set:
- choose requested template/options,
- choose correct config file for intended Gold Disk,
- use a descriptive `--build_name` without Gold Disk IDs.
## Update Rule
- After each run, update this guide only for workflow/rule/default changes.
- Update `atvm-automation-examples.md` for reusable command/option examples.
- Add run-specific learnings only to `atvm-automation-runs.md` when the run produced new information.
## Monitoring Policy
- Monitor only when the operator explicitly asks to monitor.
- If monitoring was not requested, run commands and report execution success/failure and any errors.
- If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.
## Status Reporting Format
When the operator asks for the status of an ATVM automation run, report in this order:
1. Heading/title using the run `build_name`.
2. Completed machines with pass/fail state for each machine.
3. Skipped machines with reason.
4. Remaining machines still to run.
5. Summary counts for finished, passed, failed, and skipped machines.
6. Timing details:
   - start time
   - end time if complete
   - total run time if complete, or elapsed run time if still running
   - quickest completed test runtime
   - longest completed test runtime
   - average completed test runtime
7. Estimated completion time.
Status-report expectations:
- Use the live automation VM state when available.
- Derive the heading/title from the run `build_name` when available.
- Derive completed-machine status from completed spec results already written to the run log.
- Include the run start time in every status response when it can be derived from the run log.
- If the run is complete, include the end time and total run time.
- If the run is still active, include the elapsed run time so far.
- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
- Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
- For skipped machines, include the reason category:
  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`
  - `BLACKLISTED: SUPPORT REQUEST - WAITING`
  - `BLACKLISTED: RE-CREATE NEEDED`
- If a machine is currently in progress, show it under remaining machines as `RUNNING`.
- If a machine has not started yet, show it under remaining machines as `NOT STARTED`.
- If no failures are present in completed spec results, report those completed machines as `PASS`.
- If a completed spec result shows a failure, report that machine as `FAIL` and include the failure reason from the run log.
- Base the completion estimate on the current remaining machine count and recent per-machine runtime visible in the run log.
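The timing statistics and completion estimate reduce to simple arithmetic once per-machine runtimes have been extracted from the run log. The sketch below assumes those runtimes are already available in seconds; the function name `summarize_runtimes` is hypothetical, and it averages all completed runtimes rather than only recent ones, which is a simplification.

```bash
# Hypothetical helper.
# $1 = remaining machine count; remaining args = completed runtimes in seconds.
summarize_runtimes() {
    local remaining="$1"
    shift
    printf '%s\n' "$@" | awk -v remaining="$remaining" '
        NF {
            v = $1 + 0                      # force numeric comparison
            n++
            sum += v
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
        }
        END {
            if (n == 0) { print "no completed tests"; exit }
            avg = sum / n
            printf "quickest=%ds longest=%ds average=%ds estimated_remaining=%ds\n",
                min, max, avg, avg * remaining
        }'
}
```

With two machines left and completed runtimes of 600, 900, and 1200 seconds, `summarize_runtimes 2 600 900 1200` reports an average of 900 seconds and an estimated 1800 seconds remaining.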


@@ -0,0 +1,47 @@
# Run ATVM Automation Runs
This file stores run-specific examples only when a run produced a new learning relevant to future automation tasks.
## Entry Rule
- Add an entry only when a run changed workflow behavior, exposed a failure mode, or confirmed a required new check.
- Do not add routine runs with no new learning.
## Current State
- No run-learning entries recorded yet from `atvm-automation-guide.md` source material.
## Run Learning: 2026-03-08 (E2E redhat9.7, pure/fc)
- Request:
  - template: `cmc-e2e`
  - filter: `--containsVm redhat9.7`
  - integration: `--integration_type pure`
  - plugin: `--use_specified_plugin fc`
- Observed result:
  - Cypress spec execution passed (`1` test, `1` passing, `0` failing).
  - Cloud run URL was produced and marked uploaded.
  - `run-sorry-cypress.py` remained running afterward with a defunct `npm exec cypress-cloud` child process and did not exit cleanly on its own.
- Action for future runs:
  - If pass/upload is confirmed but `run-sorry-cypress.py` does not exit, treat it as a runner hang condition.
  - Capture run URL and pass/fail status first, then terminate the stuck runner process cleanly.
## Run Learning: 2026-03-09 (Blacklist handling and status format)
- Observed requirement:
  - Some ATVM machines must be skipped even when a broad selector such as `--containsVm` or `--randomize` would otherwise include them.
  - Machines to blacklist via `--exclude_partial_match`:
    - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`:
      - `atvm6-centos6.0`
      - `atvm41-redhat6.0`
      - `atvm73-oracle6.0`
    - `BLACKLISTED: SUPPORT REQUEST - WAITING`:
      - `atvm113-debian9.0.0`
      - `atvm115-debian9.1.0`
      - `atvm116-debian9.2.0`
      - `atvm156-debian9.3.0`
    - Needs re-creation:
      - `atvm157-debian13.0.0`
- Action for future runs:
  - Add these machine names to `--exclude_partial_match` when building broad-scope automation commands.
  - When reporting run status, include skipped blacklisted machines separately with their reason, in addition to completed and remaining machines.
  - Use the run `build_name` as the heading/title for status responses so the test type is obvious.
  - For failed machines in status responses, include the failure reason taken from the run log.
  - Include timing details in status responses: start time, end time when complete, and total or elapsed runtime.
  - Also include timing stats in status responses: quickest completed test runtime, longest completed test runtime, and average completed test runtime.


@@ -0,0 +1,165 @@
# ATVM Setup Script Guide
This file is guide-only documentation for running and maintaining the ATVM setup workflow.
Do not put dated run examples here.
## Scope
- Client setup script: `/home/aw/code/cds/atvm/atvm-setup-script.sh`
- Controller wrapper: `/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh`
- Run-learnings log: `/home/aw/code/cds/atvm/atvm-setup-script-runs.md`
## Purpose
The setup flow performs a controlled bootstrap across supported Linux distributions:
1. Validate target host identity using expected IP + expected hostname before any configuration.
2. Fix repositories (especially CD/DVD media repo entries).
3. On Ubuntu, configure root SSH password-login workflow (`root/cdsi2012`) for follow-on root operations.
4. On Oracle Linux, set default boot kernel to non-UEK when available.
5. Disable unattended auto-upgrades on Ubuntu.
6. Remove specific storage-related packages and install base tooling.
7. Disable SELinux on Red Hat-family systems.
8. Configure static IP as the final step.
9. Print final summary and write logs to `atvm_setup_script.log`.
10. On SELinux-capable distros, reboot and verify runtime SELinux status post-reboot.
11. Keep client powered on after successful setup so controller-side log collection + SHA256 verification can complete.
12. Power off from controller only after successful verification and no setup errors.
## Execution Model
- Shell safety flags: `set -euo pipefail`
- Logging: colorized console + plain text log file
- Entry point: `main "$@"`
- Default operator assumption for setup access: `root / cdsi2012` unless explicitly overridden.
## Mandatory Identity Gate
Setup must not start unless operator explicitly provides both values:
- `--expected-ip <ip>`
- `--expected-hostname <hostname>`
Rules:
- Connect to the operator-provided target IP directly.
- Do not pre-scan alternate candidate IPs.
- Do not infer hostname from target.
- If hostname is missing from request, stop and ask for it.
- If detected hostname does not exactly match expected hostname, stop immediately.
- If expected IP is not assigned on target, stop immediately.
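The gate rules above can be sketched as a pure comparison function. This is not the script's `validate_target_host_identity`, whose body is not shown here; the name `check_identity` and its argument layout are assumptions made for illustration.

```bash
# Hypothetical sketch of the identity gate.
# $1 = expected IP, $2 = expected hostname,
# $3 = hostname detected on the target, $4 = space-separated IPs assigned on the target.
check_identity() {
    local expected_ip="$1" expected_hostname="$2"
    local detected_hostname="$3" assigned_ips="$4"
    local ip

    # Exact string match only; no prefix or case-insensitive matching.
    if [ "$detected_hostname" != "$expected_hostname" ]; then
        echo "ERROR: hostname mismatch: expected '$expected_hostname', detected '$detected_hostname'." >&2
        return 1
    fi

    # $assigned_ips is intentionally unquoted so it splits into addresses.
    for ip in $assigned_ips; do
        [ "$ip" = "$expected_ip" ] && return 0
    done

    echo "ERROR: expected IP '$expected_ip' is not assigned on the target." >&2
    return 1
}
```

On a Linux target, one plausible call is `check_identity "$ip" "$name" "$(hostname)" "$(hostname -I)"`, where `$ip` and `$name` come only from the operator and never from the target's own output.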
## Canonical Run Order
1. `parse_args`
2. `validate_target_host_identity`
3. `check_sudo`
4. `fix_repositories`
5. `configure_ubuntu_root_ssh_access` (Ubuntu only)
6. `install_sudo_if_needed`
7. `configure_oracle_non_uek_kernel` (Oracle Linux only)
8. `disable_ubuntu_auto_upgrades` (Ubuntu only)
9. `run_package_installation`
10. `disable_selinux` (RHEL-family only)
11. `configure_static_ip` (final configuration step)
12. `print_final_summary`
13. `reboot_and_verify_selinux_if_needed`
14. `poweroff_client_if_successful` (controller-driven after verification)
## Core Behavior By Step
### Repository Fix
- Debian/Ubuntu: comment `cdrom` entries in apt lists and run `apt-get update`.
- RHEL-family/Oracle: disable media/cdrom/dvd repo entries and run `yum clean all && yum makecache`.
- Fedora: same model via `dnf clean all && dnf makecache`.
- openSUSE/SLES: disable CD/DVD repos with `zypper mr -d` and refresh.
### Oracle Linux Kernel Handling
- Oracle Linux only.
- Select first non-UEK kernel via `grubby --info=ALL` and set GRUB default.
- Track whether default changed and whether reboot is required.
### Ubuntu Root SSH Workflow
- Ubuntu only.
- Set root password `cdsi2012`, unlock root account.
- Write `/etc/ssh/sshd_config.d/99-atvm-root-login.conf` enabling root + password auth.
- Validate config and restart SSH service.
### Ubuntu Auto-Upgrade Disable
- Ubuntu only.
- Update `/etc/apt/apt.conf.d/20auto-upgrades` to disable periodic update/upgrade actions.
### Package Installation
- Package manager detection order: `apt-get`, `dnf`, `yum`, `zypper`, `pacman`, `apk`.
- Pre-cleanup removes multipath/iSCSI packages where applicable.
- Installs kernel headers per distro.
- Base package set includes:
  `curl wget git vim perl gdb scsitools net-tools parted fio ca-certificates python3 elfutils-libelf-devel`
### SELinux Disable
- RHEL-family only.
- If enforcing/permissive, back up and rewrite `/etc/selinux/config` to disabled.
- Marks reboot recommendation/requirement in summary.
### Static IP Configuration (Final Step)
Hardcoded target values:
- IP: `192.168.3.191`
- Prefix: `22`
- Gateway: `192.168.0.1`
- DNS: `8.8.8.8`, `8.8.4.4`
Interface detection priority:
1. default-route interface
2. first non-loopback interface with IPv4
3. first non-loopback interface from link list
Network-stack handling includes `netplan`, `NetworkManager`/`nmcli`, `wicked`, and legacy `ifcfg` fallback patterns.
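The interface detection priority above can be sketched with the iproute2 `ip` command. This is an illustrative approximation under stated assumptions, not the script's actual logic: the function name `detect_interface` is hypothetical, and the parsing assumes the usual one-line (`-o`) output layout.

```bash
# Hypothetical sketch of the three-step interface fallback.
detect_interface() {
    local iface
    # 1. Interface that carries the default route.
    iface="$(ip -4 route show default 2>/dev/null |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }')"
    # 2. First non-loopback interface that has an IPv4 address.
    if [ -z "$iface" ]; then
        iface="$(ip -4 -o addr show 2>/dev/null | awk '$2 != "lo" { print $2; exit }')"
    fi
    # 3. First non-loopback interface in the link list (strip any "@peer" suffix).
    if [ -z "$iface" ]; then
        iface="$(ip -o link show 2>/dev/null |
            awk -F': ' '$2 != "lo" { sub(/@.*/, "", $2); print $2; exit }')"
    fi
    # Fails (non-zero) when no interface was found.
    [ -n "$iface" ] && printf '%s\n' "$iface"
}
```

Each step runs only when the previous one found nothing, so a host with a working default route never reaches the weaker fallbacks.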
### SELinux Reboot Verification
- Applies to `rhel`, `centos`, `rocky`, `almalinux`, `fedora`, `ol` when SELinux changed.
- Creates one-time systemd verifier service before reboot.
- Post-reboot service records runtime `getenforce` and self-removes.
- On success/no real errors, keeps client on for controller log copy/hash verification before controller power-off.
- On errors, leaves client on for manual inspection.
## Power-State Rules
- After successful setup, keep client powered on until controller log collection + SHA256 verification completes.
- If verification succeeds and no real error lines exist (`^\[ERROR\]`), controller powers off client.
- If any real error lines exist, keep client powered on.
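The power-off gate depends on matching only lines that start with `[ERROR]`, so instructional text that merely mentions the word does not block power-off. A minimal sketch, assuming a hypothetical function name `log_has_no_real_errors`:

```bash
# Hypothetical helper: succeeds only when the log is readable and
# contains no line that begins with the literal text "[ERROR]".
log_has_no_real_errors() {
    local log_file="$1"
    # An unreadable log must never count as "no errors".
    [ -r "$log_file" ] || return 2
    # grep -q exits 0 on a match, so the result is inverted.
    ! grep -q '^\[ERROR\]' "$log_file"
}
```

A controller could guard the power-off step with `log_has_no_real_errors "$copied_log" && ...`, where `$copied_log` is a placeholder for the verified local copy of the log.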
## Logging and Verification
- Client log filename: `atvm_setup_script.log`
- Common client log path when run as root: `/root/atvm_setup_script.log`
- Controller collected log naming: `atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log`
Required post-run validation:
1. Copy client log to controller `atvm/log/` path.
2. Compare SHA256 between client and copied controller log.
3. Require exact match.
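The hash comparison in steps 2 and 3 can be sketched as follows, assuming GNU `sha256sum` is available on the controller. The function name `verify_log_copy` is hypothetical, and how the client-side hash is obtained is left to the caller.

```bash
# Hypothetical helper.
# $1 = path of the copied log on the controller
# $2 = SHA256 hex digest computed on the client
verify_log_copy() {
    local local_log="$1" remote_hash="$2"
    local local_hash
    local_hash="$(sha256sum "$local_log" | awk '{ print $1 }')"
    # Exact match required; an empty digest on either side fails.
    [ -n "$local_hash" ] && [ "$local_hash" = "$remote_hash" ]
}
```

The client-side digest could come from something like `ssh root@<client-ip> 'sha256sum /root/atvm_setup_script.log' | awk '{ print $1 }'`, with `<client-ip>` supplied by the operator.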
## Preferred Execution Commands
Direct client execution:
```bash
sudo bash /home/cirrususer/atvm-setup-script.sh \
  --expected-ip <current-client-ip> \
  --expected-hostname <exact-hostname>
```
Controller run + collect:
```bash
EXPECTED_IP_ARG=<current-client-ip> EXPECTED_HOSTNAME_ARG=<exact-hostname> \
  /home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh
```
Controller collect-only after client run:
```bash
/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh --collect-after-complete
```
## Troubleshooting
- If local collected log is missing, do not rerun full setup just for log recovery.
- Use collect-only mode and verify SHA256 after copy.
- If wrapper appears stuck after IP/reboot transition, stop older wrapper sessions and run one fresh collect-only session.
- If `sshpass` is missing on controller, wrapper can still run but may require repeated interactive password prompts.
## Operational Caveats
- Not fully idempotent for all paths; repeated runs may rewrite network configs and create multiple backups.
- Static IP values are hardcoded; adjust before use in other environments.
- Run in maintenance windows because network changes can interrupt active sessions.
- Preserve host identity gating; do not weaken expected IP/hostname checks.
## Update Rule
- After each run, update this file only for guide/rule/checklist/default behavior changes.
- Put run-specific outcomes in `atvm-setup-script-runs.md` only when the run produced a new learning.


@@ -0,0 +1,40 @@
# ATVM Setup Script Runs
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
## Entry Rule
- Add an entry only when the run changed workflow behavior, exposed a new failure mode, or confirmed a new required check.
- Do not add routine runs with no new learning.
## Run Learning: 2026-03-03 (Ubuntu 24.04)
- Environment:
  - Initial IP: `192.168.0.89`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm-1`
- Learning:
  - Root SSH password workflow (`root/cdsi2012`) and log copy/hash verification path are valid end-to-end.
  - Wrapper must enforce identity arguments for run-and-collect mode.
- Action for future runs:
  - Require `EXPECTED_IP_ARG` and `EXPECTED_HOSTNAME_ARG` for wrapper run-and-collect.
## Run Learning: 2026-03-05 (RHEL 9)
- Environment:
  - Initial IP: `192.168.3.212`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm-2`
- Learning:
  - SELinux disable path with reboot + post-reboot verifier worked.
  - Auto power-off can race controller-side log collection if done too early.
- Action for future runs:
  - Keep client powered on until controller log copy + SHA256 verification completes.
  - Only then perform controller-side power-off when no real error lines are present.
## Run Learning: 2026-03-06 (Oracle Linux 9)
- Environment:
  - Initial IP: `192.168.0.121`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm`
- Learning:
  - Wrapper auto power-off was blocked by false-positive error detection from instructional text.
- Action for future runs:
  - Match only real error log lines using `^\[ERROR\]` for power-off gating.

atvm/atvm-setup-script.sh Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large

Binary file not shown.


@@ -0,0 +1,228 @@
#!/usr/bin/env bash
set -euo pipefail
# Connection targets and paths. Every value can be overridden from the environment.
REMOTE_IP_PRIMARY="${REMOTE_IP_PRIMARY:-192.168.0.121}"
REMOTE_IP_SECONDARY="${REMOTE_IP_SECONDARY:-192.168.3.191}"
REMOTE_USER="${REMOTE_USER:-root}"
PROJECT_DIR="${PROJECT_DIR:-/home/aw/code/atvm}"
LOCAL_LOG_DIR="${LOCAL_LOG_DIR:-$PROJECT_DIR/log}"
LOCAL_SETUP_SCRIPT="${LOCAL_SETUP_SCRIPT:-$PROJECT_DIR/atvm_setup_script.sh}"
REMOTE_SETUP_SCRIPT="${REMOTE_SETUP_SCRIPT:-/root/atvm_setup_script.sh}"
REMOTE_LOG_FILE="${REMOTE_LOG_FILE:-/root/atvm_setup_script.log}"
WAIT_TIMEOUT_SECONDS="${WAIT_TIMEOUT_SECONDS:-600}"
# The first positional argument selects the mode; identity values come from the environment.
MODE="${1:-run-and-collect}"
EXPECTED_IP_ARG="${EXPECTED_IP_ARG:-}"
EXPECTED_HOSTNAME_ARG="${EXPECTED_HOSTNAME_ARG:-}"
# Host keys are neither verified nor saved, and each connection attempt times out after 5 seconds.
SSH_OPTS=(-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5)
if [[ ! -f "$LOCAL_SETUP_SCRIPT" ]]; then
  echo "ERROR: Local setup script not found: $LOCAL_SETUP_SCRIPT" >&2
  exit 1
fi
mkdir -p "$LOCAL_LOG_DIR"
if ! command -v ssh >/dev/null 2>&1 || ! command -v scp >/dev/null 2>&1; then
  echo "ERROR: ssh/scp is required." >&2
  exit 1
fi
SSH_CMD=(ssh "${SSH_OPTS[@]}")
SCP_CMD=(scp "${SSH_OPTS[@]}")
# Use sshpass for non-interactive password auth when ATVM_PASSWORD is set and sshpass is installed.
if [[ -n "${ATVM_PASSWORD:-}" ]]; then
  if command -v sshpass >/dev/null 2>&1; then
    SSH_CMD=(sshpass -p "$ATVM_PASSWORD" ssh "${SSH_OPTS[@]}")
    SCP_CMD=(sshpass -p "$ATVM_PASSWORD" scp "${SSH_OPTS[@]}")
  else
    echo "WARNING: ATVM_PASSWORD is set, but sshpass is not installed. Falling back to interactive password prompts." >&2
  fi
fi
# Run a command on the given host as REMOTE_USER.
run_ssh() {
  local host="$1"
  shift
  "${SSH_CMD[@]}" "${REMOTE_USER}@${host}" "$@"
}
# Copy a local file to a path on the given host.
run_scp_to_remote() {
  local src="$1"
  local host="$2"
  local dst="$3"
  "${SCP_CMD[@]}" "$src" "${REMOTE_USER}@${host}:${dst}"
}
# Copy a file from the given host to a local path.
run_scp_from_remote() {
  local host="$1"
  local src="$2"
  local dst="$3"
  "${SCP_CMD[@]}" "${REMOTE_USER}@${host}:${src}" "$dst"
}
# Poll both candidate IPs until one accepts SSH; print that IP, or fail after WAIT_TIMEOUT_SECONDS.
wait_for_reachable_host() {
  local start_ts current_ts elapsed host
  start_ts="$(date +%s)"
  while true; do
    for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
      if run_ssh "$host" "echo ready" >/dev/null 2>&1; then
        echo "$host"
        return 0
      fi
    done
    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}
# Try each candidate IP once, without waiting; print the first one that accepts SSH.
pick_initial_host() {
  local host
  for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
    if run_ssh "$host" "echo ready" >/dev/null 2>&1; then
      echo "$host"
      return 0
    fi
  done
  return 1
}
# Poll both candidate IPs until the remote log contains the success marker; print the host that has it.
wait_for_completed_task() {
  local start_ts current_ts elapsed host
  start_ts="$(date +%s)"
  while true; do
    for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
      if run_ssh "$host" "test -f '$REMOTE_LOG_FILE' && grep -q 'SUCCESS: ATVM VM Setup Complete!' '$REMOTE_LOG_FILE'" >/dev/null 2>&1; then
        echo "$host"
        return 0
      fi
    done
    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}
# Poll the given host until SSH stops answering; fail after WAIT_TIMEOUT_SECONDS.
wait_for_host_offline() {
  local host="$1"
  local start_ts current_ts elapsed
  start_ts="$(date +%s)"
  while true; do
    if ! run_ssh "$host" "echo still-up" >/dev/null 2>&1; then
      return 0
    fi
    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}
if [[ "$MODE" != "run-and-collect" && "$MODE" != "--collect-after-complete" ]]; then
  echo "Usage:"
  echo " $0 # run setup on client, then collect log"
  echo " $0 --collect-after-complete # wait for completed client task, then collect log only"
  exit 1
fi
if [[ "$MODE" == "run-and-collect" ]]; then
  if [[ -z "$EXPECTED_IP_ARG" || -z "$EXPECTED_HOSTNAME_ARG" ]]; then
    echo "ERROR: run-and-collect requires EXPECTED_IP_ARG and EXPECTED_HOSTNAME_ARG." >&2
    echo "Example:" >&2
    echo " EXPECTED_IP_ARG=192.168.0.121 EXPECTED_HOSTNAME_ARG=atvm-codextest-vm $0" >&2
    exit 1
  fi
  INITIAL_HOST="$(pick_initial_host)" || {
    echo "ERROR: Could not reach ${REMOTE_IP_PRIMARY} or ${REMOTE_IP_SECONDARY} for initial setup." >&2
    exit 1
  }
  echo "Copying setup script to ${REMOTE_USER}@${INITIAL_HOST}:${REMOTE_SETUP_SCRIPT}"
  run_scp_to_remote "$LOCAL_SETUP_SCRIPT" "$INITIAL_HOST" "$REMOTE_SETUP_SCRIPT"
  echo "Running remote setup script on ${INITIAL_HOST} (disconnect is expected during IP/reboot steps)"
  set +e
  run_ssh "$INITIAL_HOST" "chmod +x '$REMOTE_SETUP_SCRIPT' && bash '$REMOTE_SETUP_SCRIPT' --expected-ip '$EXPECTED_IP_ARG' --expected-hostname '$EXPECTED_HOSTNAME_ARG'"
  run_status=$?
  set -e
  if (( run_status != 0 )); then
    echo "INFO: Remote run returned non-zero (${run_status}). Continuing because network reconfiguration/reboot can interrupt SSH."
  fi
  echo "Waiting for completed client task marker in ${REMOTE_LOG_FILE} (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
  ACTIVE_HOST="$(wait_for_completed_task)" || {
    echo "ERROR: Could not detect completed task marker in remote log within timeout." >&2
    exit 1
  }
else
  echo "Waiting for completed client task marker in ${REMOTE_LOG_FILE} (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
  ACTIVE_HOST="$(wait_for_completed_task)" || {
    echo "ERROR: Could not detect completed task marker in remote log within timeout." >&2
    exit 1
  }
fi
echo "Host reachable at: ${ACTIVE_HOST}"
REMOTE_HOSTNAME="$(run_ssh "$ACTIVE_HOST" "hostname" | tr -d '\r' | tail -n1)"
RUN_TS="$(date +%Y%m%d_%H%M%S)"
LOCAL_LOG_FILE="${LOCAL_LOG_DIR}/atvm_configuration_${REMOTE_HOSTNAME}_${RUN_TS}.log"
echo "Collecting remote log: ${REMOTE_LOG_FILE}"
run_scp_from_remote "$ACTIVE_HOST" "$REMOTE_LOG_FILE" "$LOCAL_LOG_FILE"
REMOTE_HASH="$(run_ssh "$ACTIVE_HOST" "sha256sum '$REMOTE_LOG_FILE' | awk '{print \$1}'" | tr -d '\r' | tail -n1)"
LOCAL_HASH="$(sha256sum "$LOCAL_LOG_FILE" | awk '{print $1}')"
if [[ "$REMOTE_HASH" != "$LOCAL_HASH" ]]; then
  echo "ERROR: Hash mismatch after log copy." >&2
  echo "Remote: $REMOTE_HASH" >&2
  echo "Local: $LOCAL_HASH" >&2
  exit 1
fi
HAS_ERRORS_IN_LOG=false
# Match only real error log records. Do not match instructional text that mentions "[ERROR]".
if run_ssh "$ACTIVE_HOST" "grep -Eq '^\\[ERROR\\]' '$REMOTE_LOG_FILE'"; then
  HAS_ERRORS_IN_LOG=true
fi
if [[ "$HAS_ERRORS_IN_LOG" == true ]]; then
  echo "WARNING: [ERROR] entries detected in remote log. VM will remain powered on for manual inspection."
else
  echo "Log indicates success with no [ERROR] entries. Powering off ${ACTIVE_HOST}."
  set +e
  run_ssh "$ACTIVE_HOST" "shutdown -h now"
  shutdown_status=$?
  set -e
  if (( shutdown_status != 0 )); then
    echo "INFO: Shutdown command returned non-zero (${shutdown_status}); this can occur if SSH disconnects during shutdown."
  fi
  echo "Waiting for ${ACTIVE_HOST} to go offline (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
  if wait_for_host_offline "$ACTIVE_HOST"; then
    echo "Power-off confirmed: ${ACTIVE_HOST} is offline."
  else
    echo "WARNING: Could not confirm ${ACTIVE_HOST} offline within timeout."
  fi
fi
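# Report plain success only when the collected log is free of [ERROR] records.
if [[ "$HAS_ERRORS_IN_LOG" == true ]]; then
  echo "Log collected and verified, but [ERROR] entries were found. Review the local log."
else
  echo "Success"
fi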
echo "Active host: ${ACTIVE_HOST}"
echo "Local log: ${LOCAL_LOG_FILE}"
echo "SHA256: ${LOCAL_HASH}"