Initial commit

2026-03-11 15:19:25 -04:00
commit 93b6d7acb8
16 changed files with 4454 additions and 0 deletions

1
.gitignore vendored Normal file

@@ -0,0 +1 @@
log/

162
atvm/AGENTS.md Normal file

@@ -0,0 +1,162 @@
# ATVM AGENTS Guide
This file defines how to operate and maintain the ATVM folder workflows.
It is rebuilt from current files in `/home/aw/code/cds/atvm`.
## Scope
Two operational tracks exist in this folder:
- Setup/bootstrap track:
  - `atvm-setup-script.sh`
  - `run-atvm-setup-and-collect-log.sh`
  - `atvm-setup-script-guide.md`
  - `atvm-setup-script-runs.md`
- Cypress automation track:
  - `atvm-automation-guide.md`
  - `atvm-automation-examples.md`
  - `atvm-automation-runs.md`
Reference/inventory material:
- `cypress-automation-for-cmc.md`
- `cypress-automation-for-cmc.md:Zone.Identifier`
## File Roles
- `*-guide.md` files:
  - Guide-only procedures, rules, defaults, and checklists.
  - No dated or one-off run examples.
- `*-runs.md` files:
  - Run-specific learnings only when a run introduces new information.
  - No routine/no-change run logs.
- `*-examples.md` files:
  - Reusable command examples and commonly used option combinations.
  - Keep generic; avoid dated one-off run outcomes.
## Setup Track: Required Behavior
Use `atvm-setup-script-guide.md` as the procedure source and keep behavior aligned with `atvm-setup-script.sh`.
### Safety-Critical Rules
1. Never run setup without operator-provided `--expected-ip` and `--expected-hostname`.
2. Never infer expected hostname from target host output.
3. Stop immediately on hostname mismatch or expected-IP-not-assigned.
4. Keep static IP configuration as a final step to avoid mid-run connection loss.
### Canonical Setup Order
1. Parse args.
2. Validate host identity.
3. Check sudo/privileges.
4. Fix repositories.
5. Configure Ubuntu root SSH/password workflow (Ubuntu only).
6. Install sudo if needed.
7. Configure Oracle default non-UEK kernel (Oracle Linux only).
8. Disable Ubuntu auto-upgrades (Ubuntu only).
9. Run package cleanup/install.
10. Disable SELinux (RHEL-family).
11. Configure static IP.
12. Print summary.
13. Reboot + post-reboot SELinux verifier when applicable.
14. Keep client on until controller log copy + SHA256 verification completes.
15. Power off only after verified success and no real error log lines.
### Setup Defaults
- ATVM static IP target: `192.168.3.191/22`
- Gateway: `192.168.0.1`
- DNS: `8.8.8.8`, `8.8.4.4`
- Ubuntu root SSH workflow credential in docs/script: `root / cdsi2012`
- Client log file: `atvm_setup_script.log` (typically `/root/atvm_setup_script.log` when run as root)
### Setup Controller Wrapper Rules
- Wrapper supports:
  - run-and-collect (default)
  - `--collect-after-complete`
- `run-and-collect` requires env vars:
  - `EXPECTED_IP_ARG`
  - `EXPECTED_HOSTNAME_ARG`
- Wrapper validates success marker and SHA256 before success.
- Wrapper powers off only when log has no lines matching `^\[ERROR\]`.
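The power-off gate can be illustrated with a small sketch. The function names `log_has_real_errors` and `poweroff_decision` are hypothetical and are not taken from the wrapper; only the `^\[ERROR\]` pattern comes from this guide. The success-marker and SHA256 checks would still need to pass separately.

```bash
# Hypothetical helper: succeeds when the log has at least one real error line.
log_has_real_errors() {
    # Anchoring at line start keeps instructional text that merely
    # mentions "[ERROR]" mid-line from blocking power-off.
    grep -Eq '^\[ERROR\]' "$1"
}

# Hypothetical gate: print the power-off decision for a collected log.
poweroff_decision() {
    if log_has_real_errors "$1"; then
        echo "keep-on"
    else
        echo "power-off"
    fi
}
```

Keeping the decision separate from the action makes the gate easy to check against a saved log before any power-off command is sent.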
## Cypress Automation Track: Required Behavior
Use `atvm-automation-guide.md` as the execution source.
Use `atvm-automation-examples.md` as the common options/command reference.
### Controller Client
- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: `root / atvmcdsi2012`
### Mandatory Run Control
1. Before planning a new run, check for active automation processes.
2. Report running/not-running status.
3. If running, ask before termination; terminate only with explicit approval.
4. Always show exact planned command(s) before execution.
5. Execute only after explicit approval.
6. If monitoring is not requested, report immediate command success/failure and any errors.
7. Monitor completion only when explicitly requested by the operator.
8. For monitored runs, allow long runtime windows (15-30+ minutes or longer) and continue until completion unless operator instructs otherwise.
9. Do not terminate monitored runs unless the operator explicitly instructs termination.
### Status Request Format
When the operator asks for run status, report in this order:
1. Heading/title using the run `build_name`.
2. Completed machines with pass/fail state for each machine.
3. Skipped machines with reason.
4. Remaining machines still to run.
5. Summary counts for finished, passed, failed, and skipped machines.
6. Timing details:
   - start time
   - end time if complete
   - total run time if complete, or elapsed run time if still running
   - quickest completed test runtime
   - longest completed test runtime
   - average completed test runtime
7. Estimated completion time.
Status details:
- Use the live run log on the automation VM when available.
- Use the run `build_name` as the heading/title when available.
- Show blacklisted machines under skipped machines when they are part of the requested scope.
- Show in-progress machines under remaining machines as `RUNNING`.
- Show not-yet-started machines as `NOT STARTED`.
- Use completed spec results already recorded in the log to determine machine pass/fail state.
- For failed machines, include the failure reason from the run log in the status output.
- Include start time in status output when it can be derived from the log.
- Include end time and total runtime for completed runs, or elapsed runtime for active runs.
- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the log.
### Automation Blacklist
Always exclude these machines with `--exclude_partial_match` when building ATVM automation commands.
CMC install blacklist (`BLACKLISTED: CMC INSTALL - CAN'T COMPILE`):
- `atvm6-centos6.0`
- `atvm41-redhat6.0`
- `atvm73-oracle6.0`
Support-request blacklist (`BLACKLISTED: SUPPORT REQUEST - WAITING`):
- `atvm113-debian9.0.0`
- `atvm115-debian9.1.0`
- `atvm116-debian9.2.0`
- `atvm156-debian9.3.0`
Re-create blacklist:
- `atvm157-debian13.0.0`
### Operator Preferences
- Do not include Gold Disk IDs in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- Prefer distro-scoped filtering (for example `--containsVm redhat9`) when possible.
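A minimal sketch of the `--build_name` naming rule, assuming a pre-flight check is wanted before a command is presented for approval. Both function names are hypothetical; neither exists in the automation scripts.

```bash
# Hypothetical check: succeeds only for a non-empty name without spaces.
build_name_is_valid() {
    case "$1" in
        "" | *" "*) return 1 ;;
    esac
    return 0
}

# Hypothetical normalizer: join words with "-" and trim stray hyphens.
normalize_build_name() {
    printf '%s\n' "$*" | tr -s '[:space:]' '-' | sed -e 's/^-*//' -e 's/-*$//'
}
```

For example, `normalize_build_name "nightly e2e pure plugin"` yields a hyphenated name that can be passed to `--build_name`.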
## Update Policy (Both Tracks)
After each run:
- Update corresponding `*-guide.md` only if workflow/rules/default behavior changed.
- Update corresponding `*-examples.md` when common command patterns/options change.
- Update corresponding `*-runs.md` only if the run produced new learning.
## Path and Naming Consistency Note
Current repo filenames use hyphen style, but some script text/defaults still show underscore-style paths (for example `atvm_setup_script.sh`, `run_atvm_setup_and_collect_log.sh`, `/home/aw/code/atvm`).
When operating:
1. Use actual filesystem paths in this repo first (`/home/aw/code/cds/atvm/...`).
2. If script defaults are used, verify they match existing files before execution.
3. If changing path conventions, update scripts and guides in the same change.
## Non-Goals
- Do not treat `cypress-automation-for-cmc.md` as executable runbook logic.
- Do not record secrets/tokens into new guide or runs entries.


@@ -0,0 +1,97 @@
## Examples
- `--build_name` values must not include spaces; use `-` between words.
- Add the maintained blacklist to `--exclude_partial_match` for runs that use broad selection or randomization.
- Maintained blacklist:
  - `atvm6-centos6.0`
  - `atvm41-redhat6.0`
  - `atvm73-oracle6.0`
  - `atvm113-debian9.0.0`
  - `atvm115-debian9.1.0`
  - `atvm116-debian9.2.0`
  - `atvm156-debian9.3.0`
  - `atvm157-debian13.0.0`
### E2E: Pure iscsi+fc with specific VMs
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --specify_vms atvm3-ubuntu18.04 atvm109-w2k12R2; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-pure-plugin
```
### E2E: Infinibox fc with specific VMs
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type infinibox --use_specified_plugin fc --specify_vms atvm51-redhat6.10 atvm110-w2k16; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-infinibox-plugin
```
### E2E: Regular cutover
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --specify_vms atvm93-oracle7.9 atvm111-w2k19 --regular_cutover; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-regular-cutover
```
### Reboot test
```bash
python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm37-rocky8.8 atvm112-w2k22 --wait_for_power_on 120; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-reboot
```
### SystemOS test
```bash
python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --specify_vms atvm118-oracle9.3 atvm145-w2k25; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-systemOS
```
### MigrateOPS test
```bash
python3 cmc-templates.py --template cmc-migrateops --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm139-redhat9.5 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-migrateOPS
```
### Compute MigrateOPS: vmware
```bash
python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms vmware --test_partition --specify_vms atvm138-oracle9.4-opt atvm112-w2k22 --set_static_ip_dest; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-vmware
```
### Compute MigrateOPS: ovirt
```bash
python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms ovirt --test_partition --specify_vms atvm124-redhat8.8 atvm111-w2k19 --set_static_ip_dest; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-ovirt
```
### Group consistency
```bash
python3 cmc-templates.py --template cmc-group-consistency --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm4-ubuntu20.04 atvm112-w2k22 --enable_uuid; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-consistentyGroup
```
### H2H same platform
```bash
python3 cmc-templates.py --template cmc-h2h-same-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm38-rocky9.0 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hSamePlatform
```
### H2H different platform
```bash
python3 cmc-templates.py --template cmc-h2h-diff-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm65-redhat8.3 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hDifferentPlatform
```
### Randomized reboot sanity
```bash
python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0 --wait_for_power_on 120; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-reboot-iscsi
```
### Randomized e2e sanity
```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-e2e
```
### Randomized systemOS sanity
```bash
python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --randomize 1 --exclude_partial_match suse15.0 fedora34 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-systemOS
```


@@ -0,0 +1,166 @@
# Run ATVM Automation Guide
This file is guide-only documentation for operating ATVM CMC automation.
Do not put specific run examples here.
For reusable command examples and common option combinations, use `atvm-automation-examples.md`.
## Purpose
Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.
## ATVM Cypress Automation Controller Client
- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: `root / atvmcdsi2012`
## Operating Constraints
- Run only scripts/commands explicitly requested.
- Do not make manual system configuration changes on the client.
- Do not edit client files unless explicitly requested.
## Operator Preferences
- Do not include Gold Disk identifiers in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- For multiple VMs in the same distro, use distro-scoped filtering (`--containsVm`) instead of long explicit VM lists.
- Before preparing a new run, always check whether automation is already running.
- Always report whether automation is currently running.
- If running, ask whether to terminate; terminate only with explicit approval.
- After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
- Before any run, always show exact planned command(s) and wait for explicit approval.
- Execute only after explicit approval (for example `approve`).
- After execution, report immediate success/failure only.
- Do not actively monitor completion unless explicitly requested.
- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
- Report command errors immediately.
- `sshpass` may be used where password-based SSH automation is required.
## Core Scripts
- Template prep: `/root/cdc-e2e-cyp-12.17.4/cmc-templates.py`
- Test execution: `./run-sorry-cypress.py`
Typical sequence:
1. Run `cmc-templates.py` with requested template/options.
2. Run `run-sorry-cypress.py` with matching config and build name.
## Config File / Gold Disk Mapping
- `cypress.atvm-config-gold.ts` -> Gold Disk 1
- `cypress.atvm-config-gold-2.ts` -> Gold Disk 2
- Additional numbered config variants map to corresponding Gold Disks.
## Available Templates
- `cmc-e2e`
- `cmc-group-consistency`
- `cmc-h2h-diff-platf`
- `cmc-h2h-same-platf`
- `cmc-migrateops`
- `cmc-migrateops-compute-migration`
- `cmc-reboot`
- `cmc-systemOS`
## Command Pattern
```bash
python3 cmc-templates.py --template <template> --config_file_path ./<config-file> [template options...]; \
python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces>
```
## Examples Reference
- Commonly used command examples: `atvm-automation-examples.md`
- Keep this guide focused on run-control rules and workflow constraints.
## Example Option Patterns (Guide-Only)
- Distro-scoped VM selection:
  - `--containsVm redhat`
  - `--containsVm redhat9`
- Explicit VM selection:
  - `--specify_vms <vm1> <vm2> ...`
- Compute migrateops platform:
  - `--vm_platforms vmware|ovirt|openshift|proxmox`
## Blacklisted Machines
Always exclude these machines from ATVM automation runs by adding them to `--exclude_partial_match`.
Permanently blacklisted because CMC cannot compile:
- `atvm6-centos6.0`
- `atvm41-redhat6.0`
- `atvm73-oracle6.0`
Temporarily blacklisted while support requests are waiting:
- `atvm113-debian9.0.0`
- `atvm115-debian9.1.0`
- `atvm116-debian9.2.0`
- `atvm156-debian9.3.0`
Temporarily blacklisted until re-created:
- `atvm157-debian13.0.0`
Preferred exclude list:
- `--exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0`
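One way to keep the exclude list consistent across commands is to build it from a single variable. This is only a sketch: `ATVM_BLACKLIST` and `build_exclude_args` are hypothetical names, and the machine names are copied from the list above.

```bash
# Blacklist copied from this guide; update it whenever the guide changes.
ATVM_BLACKLIST="atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 \
atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 \
atvm156-debian9.3.0 atvm157-debian13.0.0"

# Hypothetical helper: print the option, any extra patterns passed as
# arguments, and then the maintained blacklist, all on one line.
build_exclude_args() {
    printf '%s' "--exclude_partial_match"
    # Word splitting of $ATVM_BLACKLIST is intentional: names have no spaces.
    printf ' %s' "$@" $ATVM_BLACKLIST
    printf '\n'
}
```

Calling `build_exclude_args suse15.0` places the run-specific pattern first and appends the eight maintained names after it.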
## Running-Automation Check (Mandatory)
Before any new automation request:
1. SSH to `root@192.168.3.190`.
2. Check for active automation processes (for example `run-sorry-cypress.py`, `cmc-templates.py`, and related Cypress runners).
3. Report:
   - `Running` with process details, or
   - `Not running`.
4. If `Running`, ask operator whether to terminate.
5. If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
6. If termination is not approved, do not start a new run.
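The check in steps 2 and 3 could be sketched as a filter over a process listing. `classify_automation_state` is a hypothetical name, and the match patterns are an assumption based on the process names mentioned in this guide and in the run learnings; adjust them to the runners actually in use.

```bash
# Hypothetical filter: reads "PID ARGS" lines on stdin and reports the state.
# Possible use (assumption):
#   ssh root@192.168.3.190 'ps -eo pid,args' | classify_automation_state
classify_automation_state() {
    local matches
    matches="$(grep -E 'run-sorry-cypress\.py|cmc-templates\.py|cypress-cloud' || true)"
    if [ -n "$matches" ]; then
        echo "Running"
        printf '%s\n' "$matches"
    else
        echo "Not running"
    fi
}
```

The filter only reports; termination stays a separate, operator-approved step.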
## Approval Workflow (Mandatory)
1. Build exact command(s) for the request.
2. Present them verbatim as planned commands.
3. Wait for explicit approval.
4. Run only approved command(s), no extra options.
5. If monitoring was not requested, report immediate success/failure for each command.
6. If monitoring was requested, keep monitoring until completion and report final outcome.
## Requested Test Style
When asked for one VM or a VM set:
- choose the requested template/options,
- choose the correct config file for the intended Gold Disk,
- use a descriptive `--build_name` without Gold Disk IDs.
## Update Rule
- After each run, update this guide only for workflow/rule/default changes.
- Update `atvm-automation-examples.md` for reusable command/option examples.
- Add run-specific learnings only to `atvm-automation-runs.md` when the run produced new information.
## Monitoring Policy
- Monitor only when the operator explicitly asks to monitor.
- If monitoring was not requested, run commands and report execution success/failure and any errors.
- If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.
## Status Reporting Format
When the operator asks for the status of an ATVM automation run, report in this order:
1. Heading/title using the run `build_name`.
2. Completed machines with pass/fail state for each machine.
3. Skipped machines with reason.
4. Remaining machines still to run.
5. Summary counts for finished, passed, failed, and skipped machines.
6. Timing details:
   - start time
   - end time if complete
   - total run time if complete, or elapsed run time if still running
   - quickest completed test runtime
   - longest completed test runtime
   - average completed test runtime
7. Estimated completion time.
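The timing figures in items 6 and 7 are simple arithmetic once per-machine runtimes have been extracted from the run log. The sketch below assumes the runtimes are already available as whole seconds, one per line; how they are parsed from the log is not covered, and both function names are hypothetical.

```bash
# Hypothetical helper: reads one runtime in seconds per line on stdin.
runtime_stats() {
    awk '
        NF == 0 { next }
        n == 0  { min = $1; max = $1 }
        {
            if ($1 < min) min = $1
            if ($1 > max) max = $1
            sum += $1
            n++
        }
        END {
            if (n == 0) { print "no completed tests"; exit }
            printf "quickest=%ds longest=%ds average=%ds\n", min, max, int(sum / n)
        }'
}

# Hypothetical estimate: remaining machines times average runtime, in seconds.
estimate_remaining_seconds() {
    echo $(( $1 * $2 ))
}
```

With runtimes of 600, 900, and 1200 seconds and four machines remaining, the average is 900 seconds and the estimate is 3600 seconds of further runtime.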
Status-report expectations:
- Use the live automation VM state when available.
- Derive the heading/title from the run `build_name` when available.
- Derive completed-machine status from completed spec results already written to the run log.
- Include the run start time in every status response when it can be derived from the run log.
- If the run is complete, include the end time and total run time.
- If the run is still active, include the elapsed run time so far.
- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
- Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
- For skipped machines, include the reason category:
  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`
  - `BLACKLISTED: SUPPORT REQUEST - WAITING`
  - `BLACKLISTED: RE-CREATE NEEDED`
- If a machine is currently in progress, show it under remaining machines as `RUNNING`.
- If a machine has not started yet, show it under remaining machines as `NOT STARTED`.
- If no failures are present in completed spec results, report those completed machines as `PASS`.
- If a completed spec result shows a failure, report that machine as `FAIL` and include the failure reason from the run log.
- Base the completion estimate on the current remaining machine count and recent per-machine runtime visible in the run log.


@@ -0,0 +1,47 @@
# Run ATVM Automation Runs
This file stores run-specific examples only when a run produced a new learning relevant to future automation tasks.
## Entry Rule
- Add an entry only when a run changed workflow behavior, exposed a failure mode, or confirmed a required new check.
- Do not add routine runs with no new learning.
## Current State
- No run-learning entries were carried over from the `atvm-automation-guide.md` source material; entries recorded since then follow.
## Run Learning: 2026-03-08 (E2E redhat9.7, pure/fc)
- Request:
  - template: `cmc-e2e`
  - filter: `--containsVm redhat9.7`
  - integration: `--integration_type pure`
  - plugin: `--use_specified_plugin fc`
- Observed result:
  - Cypress spec execution passed (`1` test, `1` passing, `0` failing).
  - Cloud run URL was produced and marked uploaded.
  - `run-sorry-cypress.py` remained running afterward with a defunct `npm exec cypress-cloud` child process and did not exit cleanly on its own.
- Action for future runs:
  - If pass/upload is confirmed but `run-sorry-cypress.py` does not exit, treat it as a runner hang condition.
  - Capture run URL and pass/fail status first, then terminate the stuck runner process cleanly.
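A hang of this kind might be spotted by looking for defunct children of the runner. The sketch below is an assumption about how to check, not something the runner provides: `find_defunct_children` is a hypothetical name, and it expects `ps -eo pid,ppid,stat,args` output on stdin.

```bash
# Hypothetical filter: print PIDs of zombie (STAT starting with "Z")
# children of the given parent PID.
# Possible use (assumption):
#   ps -eo pid,ppid,stat,args | find_defunct_children <runner-pid>
find_defunct_children() {
    awk -v ppid="$1" '$2 == ppid && $3 ~ /^Z/ { print $1 }'
}
```

Matching on the parent PID avoids relying on the command text, which `ps` may truncate for defunct processes.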
## Run Learning: 2026-03-09 (Blacklist handling and status format)
- Observed requirement:
  - Some ATVM machines must be skipped even when a broad selector such as `--containsVm` or `--randomize` would otherwise include them.
- Machines to blacklist via `--exclude_partial_match`:
  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`:
    - `atvm6-centos6.0`
    - `atvm41-redhat6.0`
    - `atvm73-oracle6.0`
  - `BLACKLISTED: SUPPORT REQUEST - WAITING`:
    - `atvm113-debian9.0.0`
    - `atvm115-debian9.1.0`
    - `atvm116-debian9.2.0`
    - `atvm156-debian9.3.0`
  - Needs re-creation:
    - `atvm157-debian13.0.0`
- Action for future runs:
  - Add these machine names to `--exclude_partial_match` when building broad-scope automation commands.
  - When reporting run status, include skipped blacklisted machines separately with their reason, in addition to completed and remaining machines.
  - Use the run `build_name` as the heading/title for status responses so the test type is obvious.
  - For failed machines in status responses, include the failure reason taken from the run log.
  - Include timing details in status responses: start time, end time when complete, and total or elapsed runtime.
  - Also include timing stats in status responses: quickest completed test runtime, longest completed test runtime, and average completed test runtime.


@@ -0,0 +1,165 @@
# ATVM Setup Script Guide
This file is guide-only documentation for running and maintaining the ATVM setup workflow.
Do not put dated run examples here.
## Scope
- Client setup script: `/home/aw/code/cds/atvm/atvm-setup-script.sh`
- Controller wrapper: `/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh`
- Run-learnings log: `/home/aw/code/cds/atvm/atvm-setup-script-runs.md`
## Purpose
The setup flow performs a controlled bootstrap across supported Linux distributions:
1. Validate target host identity using expected IP + expected hostname before any configuration.
2. Fix repositories (especially CD/DVD media repo entries).
3. On Ubuntu, configure root SSH password-login workflow (`root/cdsi2012`) for follow-on root operations.
4. On Oracle Linux, set default boot kernel to non-UEK when available.
5. Disable unattended auto-upgrades on Ubuntu.
6. Remove specific storage-related packages and install base tooling.
7. Disable SELinux on Red Hat-family systems.
8. Configure static IP as the final step.
9. Print final summary and write logs to `atvm_setup_script.log`.
10. On SELinux-capable distros, reboot and verify runtime SELinux status post-reboot.
11. Keep client powered on after successful setup so controller-side log collection + SHA256 verification can complete.
12. Power off from controller only after successful verification and no setup errors.
## Execution Model
- Shell safety flags: `set -euo pipefail`
- Logging: colorized console + plain text log file
- Entry point: `main "$@"`
- Default operator assumption for setup access: `root / cdsi2012` unless explicitly overridden.
## Mandatory Identity Gate
Setup must not start unless operator explicitly provides both values:
- `--expected-ip <ip>`
- `--expected-hostname <hostname>`
Rules:
- Connect to the operator-provided target IP directly.
- Do not pre-scan alternate candidate IPs.
- Do not infer hostname from target.
- If hostname is missing from request, stop and ask for it.
- If detected hostname does not exactly match expected hostname, stop immediately.
- If expected IP is not assigned on target, stop immediately.
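The gate rules above can be sketched as a pure comparison function. This is not the script's `validate_target_host_identity` implementation (its diff is not shown here); `check_identity` and its argument order are hypothetical.

```bash
# Hypothetical gate. Arguments: expected IP, expected hostname, detected
# hostname, then every IPv4 address assigned on the target.
# Possible use on the target (assumption):
#   check_identity "$EXPECTED_IP" "$EXPECTED_HOSTNAME" "$(hostname)" $(hostname -I)
check_identity() {
    local expected_ip="$1" expected_host="$2" detected_host="$3" ip
    shift 3
    if [ -z "$expected_ip" ] || [ -z "$expected_host" ]; then
        echo "STOP: expected IP and expected hostname are both required" >&2
        return 2
    fi
    if [ "$detected_host" != "$expected_host" ]; then
        echo "STOP: hostname mismatch (expected $expected_host, detected $detected_host)" >&2
        return 1
    fi
    for ip in "$@"; do
        if [ "$ip" = "$expected_ip" ]; then
            return 0
        fi
    done
    echo "STOP: expected IP $expected_ip is not assigned on target" >&2
    return 1
}
```

The expected values always come from the operator; the detected values are only ever compared against them, never used to fill them in.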
## Canonical Run Order
1. `parse_args`
2. `validate_target_host_identity`
3. `check_sudo`
4. `fix_repositories`
5. `configure_ubuntu_root_ssh_access` (Ubuntu only)
6. `install_sudo_if_needed`
7. `configure_oracle_non_uek_kernel` (Oracle Linux only)
8. `disable_ubuntu_auto_upgrades` (Ubuntu only)
9. `run_package_installation`
10. `disable_selinux` (RHEL-family only)
11. `configure_static_ip` (final configuration step)
12. `print_final_summary`
13. `reboot_and_verify_selinux_if_needed`
14. `poweroff_client_if_successful` (controller-driven after verification)
## Core Behavior By Step
### Repository Fix
- Debian/Ubuntu: comment `cdrom` entries in apt lists and run `apt-get update`.
- RHEL-family/Oracle: disable media/cdrom/dvd repo entries and run `yum clean all && yum makecache`.
- Fedora: same model via `dnf clean all && dnf makecache`.
- openSUSE/SLES: disable CD/DVD repos with `zypper mr -d` and refresh.
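For the Debian/Ubuntu case, commenting out `cdrom` entries could look like the filter below. It is a sketch under the assumption that sources use the one-line `deb cdrom:` format; `comment_cdrom_entries` is a hypothetical name and not a function from the setup script.

```bash
# Hypothetical filter: prefix active "deb cdrom:" / "deb-src cdrom:" lines
# with "# "; already-commented and non-cdrom lines pass through unchanged.
# Possible use (assumption), after taking a backup of the file:
#   comment_cdrom_entries < sources.list.bak > sources.list
comment_cdrom_entries() {
    sed -E 's/^[[:space:]]*deb(-src)?[[:space:]]+cdrom:/# &/'
}
```

Running `apt-get update` afterwards refreshes the package index without the media source.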
### Oracle Linux Kernel Handling
- Oracle Linux only.
- Select first non-UEK kernel via `grubby --info=ALL` and set GRUB default.
- Track whether default changed and whether reboot is required.
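Selecting the first non-UEK kernel could be sketched as a filter over `grubby --info=ALL` output. The function name is hypothetical, and matching `uek` in the kernel path is an assumption about how UEK kernels are identified.

```bash
# Hypothetical filter: print the path of the first kernel entry whose
# path does not contain "uek".
# Possible use (assumption):
#   kernel="$(grubby --info=ALL | first_non_uek_kernel)"
#   [ -n "$kernel" ] && grubby --set-default "$kernel"
first_non_uek_kernel() {
    awk -F= '$1 == "kernel" && $2 !~ /uek/ { gsub(/"/, "", $2); print $2; exit }'
}
```

An empty result means no non-UEK kernel is installed, in which case the default should be left unchanged.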
### Ubuntu Root SSH Workflow
- Ubuntu only.
- Set root password `cdsi2012`, unlock root account.
- Write `/etc/ssh/sshd_config.d/99-atvm-root-login.conf` enabling root + password auth.
- Validate config and restart SSH service.
### Ubuntu Auto-Upgrade Disable
- Ubuntu only.
- Update `/etc/apt/apt.conf.d/20auto-upgrades` to disable periodic update/upgrade actions.
### Package Installation
- Package manager detection order: `apt-get`, `dnf`, `yum`, `zypper`, `pacman`, `apk`.
- Pre-cleanup removes multipath/iSCSI packages where applicable.
- Installs kernel headers per distro.
- Base package set includes:
`curl wget git vim perl gdb scsitools net-tools parted fio ca-certificates python3 elfutils-libelf-devel`
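The detection order above amounts to a first-match loop. The sketch below shows the idea; `detect_pkg_manager` is a hypothetical name and may differ from what the setup script actually does.

```bash
# Hypothetical helper: print the first package manager found on PATH,
# checked in the documented order; return non-zero when none is found.
detect_pkg_manager() {
    local candidate
    for candidate in apt-get dnf yum zypper pacman apk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Checking `dnf` before `yum` matters on systems where both commands exist, since the first match wins.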
### SELinux Disable
- RHEL-family only.
- If enforcing/permissive, backup and rewrite `/etc/selinux/config` to disabled.
- Marks reboot recommendation/requirement in summary.
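The rewrite of `/etc/selinux/config` can be sketched as a stream filter, assuming the standard `SELINUX=<mode>` line. `render_selinux_disabled` is a hypothetical name.

```bash
# Hypothetical filter: switch an enforcing/permissive SELINUX= line to
# disabled; SELINUXTYPE= and comment lines are left untouched.
# Possible use (assumption), after backing up the original:
#   render_selinux_disabled < /etc/selinux/config.bak > /etc/selinux/config
render_selinux_disabled() {
    sed -E 's/^SELINUX=(enforcing|permissive)[[:space:]]*$/SELINUX=disabled/'
}
```

The change only takes effect at the next boot, which is why the summary flags a reboot.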
### Static IP Configuration (Final Step)
Hardcoded target values:
- IP: `192.168.3.191`
- Prefix: `22`
- Gateway: `192.168.0.1`
- DNS: `8.8.8.8`, `8.8.4.4`
Interface detection priority:
1. default-route interface
2. first non-loopback interface with IPv4
3. first non-loopback interface from link list
Network-stack handling includes `netplan`, `NetworkManager`/`nmcli`, `wicked`, and legacy `ifcfg` fallback patterns.
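The first two detection priorities could be sketched as filters over `ip` output. Both function names are hypothetical, and the parsing assumes the usual iproute2 output layout.

```bash
# Priority 1 (hypothetical): interface named after "dev" on the default route.
# Possible use (assumption): ip -4 route show default | iface_from_default_route
iface_from_default_route() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Priority 2 (hypothetical): first non-loopback interface carrying IPv4.
# Possible use (assumption): ip -4 -o addr show | iface_first_with_ipv4
iface_first_with_ipv4() {
    awk '$2 != "lo" && $3 == "inet" { print $2; exit }'
}
```

A caller would try the first filter and fall back to the second only when it prints nothing.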
### SELinux Reboot Verification
- Applies to `rhel`, `centos`, `rocky`, `almalinux`, `fedora`, `ol` when SELinux changed.
- Creates one-time systemd verifier service before reboot.
- Post-reboot service records runtime `getenforce` and self-removes.
- On success/no real errors, keeps client on for controller log copy/hash verification before controller power-off.
- On errors, leaves client on for manual inspection.
## Power-State Rules
- After successful setup, keep client powered on until controller log collection + SHA256 verification completes.
- If verification succeeds and no real error lines exist (`^\[ERROR\]`), controller powers off client.
- If any real error lines exist, keep client powered on.
## Logging and Verification
- Client log filename: `atvm_setup_script.log`
- Common client log path when run as root: `/root/atvm_setup_script.log`
- Controller collected log naming: `atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log`
Required post-run validation:
1. Copy client log to controller `atvm/log/` path.
2. Compare SHA256 between client and copied controller log.
3. Require exact match.
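The hash comparison in steps 2 and 3 can be sketched as follows, assuming `sha256sum` is available on both sides. The function names are hypothetical; the wrapper's own verification code may differ.

```bash
# Hypothetical helper: print only the hex digest of a file.
sha256_of() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Hypothetical check: succeed only when the client-side digest is non-empty
# and equals the digest of the copied controller log.
# The client digest could be obtained with (assumption):
#   ssh root@<client-ip> 'sha256sum /root/atvm_setup_script.log' | awk '{ print $1 }'
logs_match() {
    local client_hash="$1" local_copy="$2"
    [ -n "$client_hash" ] && [ "$client_hash" = "$(sha256_of "$local_copy")" ]
}
```

Rejecting an empty client digest keeps a failed SSH call from being mistaken for a match.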
## Preferred Execution Commands
Direct client execution:
```bash
sudo bash /home/cirrususer/atvm-setup-script.sh \
--expected-ip <current-client-ip> \
--expected-hostname <exact-hostname>
```
Controller run + collect:
```bash
EXPECTED_IP_ARG=<current-client-ip> EXPECTED_HOSTNAME_ARG=<exact-hostname> \
/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh
```
Controller collect-only after client run:
```bash
/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh --collect-after-complete
```
## Troubleshooting
- If local collected log is missing, do not rerun full setup just for log recovery.
- Use collect-only mode and verify SHA256 after copy.
- If wrapper appears stuck after IP/reboot transition, stop older wrapper sessions and run one fresh collect-only session.
- If `sshpass` is missing on controller, wrapper can still run but may require repeated interactive password prompts.
## Operational Caveats
- Not fully idempotent for all paths; repeated runs may rewrite network configs and create multiple backups.
- Static IP values are hardcoded; adjust before use in other environments.
- Run in maintenance windows because network changes can interrupt active sessions.
- Preserve host identity gating; do not weaken expected IP/hostname checks.
## Update Rule
- After each run, update this file only for guide/rule/checklist/default behavior changes.
- Put run-specific outcomes in `atvm-setup-script-runs.md` only when the run produced a new learning.


@@ -0,0 +1,40 @@
# ATVM Setup Script Runs
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
## Entry Rule
- Add an entry only when the run changed workflow behavior, exposed a new failure mode, or confirmed a new required check.
- Do not add routine runs with no new learning.
## Run Learning: 2026-03-03 (Ubuntu 24.04)
- Environment:
  - Initial IP: `192.168.0.89`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm-1`
- Learning:
  - Root SSH password workflow (`root/cdsi2012`) and log copy/hash verification path are valid end-to-end.
  - Wrapper must enforce identity arguments for run-and-collect mode.
- Action for future runs:
  - Require `EXPECTED_IP_ARG` and `EXPECTED_HOSTNAME_ARG` for wrapper run-and-collect.
## Run Learning: 2026-03-05 (RHEL 9)
- Environment:
  - Initial IP: `192.168.3.212`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm-2`
- Learning:
  - SELinux disable path with reboot + post-reboot verifier worked.
  - Auto power-off can race controller-side log collection if done too early.
- Action for future runs:
  - Keep client powered on until controller log copy + SHA256 verification completes.
  - Only then perform controller-side power-off when no real error lines are present.
## Run Learning: 2026-03-06 (Oracle Linux 9)
- Environment:
  - Initial IP: `192.168.0.121`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm`
- Learning:
  - Wrapper auto power-off was blocked by false-positive error detection from instructional text.
- Action for future runs:
  - Match only real error log lines using `^\[ERROR\]` for power-off gating.

1867
atvm/atvm-setup-script.sh Normal file

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

Binary file not shown.

View File

@@ -0,0 +1,228 @@
#!/usr/bin/env bash
set -euo pipefail
REMOTE_IP_PRIMARY="${REMOTE_IP_PRIMARY:-192.168.0.121}"
REMOTE_IP_SECONDARY="${REMOTE_IP_SECONDARY:-192.168.3.191}"
REMOTE_USER="${REMOTE_USER:-root}"
PROJECT_DIR="${PROJECT_DIR:-/home/aw/code/atvm}"
LOCAL_LOG_DIR="${LOCAL_LOG_DIR:-$PROJECT_DIR/log}"
LOCAL_SETUP_SCRIPT="${LOCAL_SETUP_SCRIPT:-$PROJECT_DIR/atvm_setup_script.sh}"
REMOTE_SETUP_SCRIPT="${REMOTE_SETUP_SCRIPT:-/root/atvm_setup_script.sh}"
REMOTE_LOG_FILE="${REMOTE_LOG_FILE:-/root/atvm_setup_script.log}"
WAIT_TIMEOUT_SECONDS="${WAIT_TIMEOUT_SECONDS:-600}"
MODE="${1:-run-and-collect}"
EXPECTED_IP_ARG="${EXPECTED_IP_ARG:-}"
EXPECTED_HOSTNAME_ARG="${EXPECTED_HOSTNAME_ARG:-}"
SSH_OPTS=(-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5)
if [[ ! -f "$LOCAL_SETUP_SCRIPT" ]]; then
  echo "ERROR: Local setup script not found: $LOCAL_SETUP_SCRIPT" >&2
  exit 1
fi
mkdir -p "$LOCAL_LOG_DIR"
if ! command -v ssh >/dev/null 2>&1 || ! command -v scp >/dev/null 2>&1; then
  echo "ERROR: ssh/scp is required." >&2
  exit 1
fi
SSH_CMD=(ssh "${SSH_OPTS[@]}")
SCP_CMD=(scp "${SSH_OPTS[@]}")
if [[ -n "${ATVM_PASSWORD:-}" ]]; then
  if command -v sshpass >/dev/null 2>&1; then
    SSH_CMD=(sshpass -p "$ATVM_PASSWORD" ssh "${SSH_OPTS[@]}")
    SCP_CMD=(sshpass -p "$ATVM_PASSWORD" scp "${SSH_OPTS[@]}")
  else
    echo "WARNING: ATVM_PASSWORD is set, but sshpass is not installed. Falling back to interactive password prompts."
  fi
fi
run_ssh() {
  local host="$1"
  shift
  "${SSH_CMD[@]}" "${REMOTE_USER}@${host}" "$@"
}
run_scp_to_remote() {
  local src="$1"
  local host="$2"
  local dst="$3"
  "${SCP_CMD[@]}" "$src" "${REMOTE_USER}@${host}:${dst}"
}
run_scp_from_remote() {
  local host="$1"
  local src="$2"
  local dst="$3"
  "${SCP_CMD[@]}" "${REMOTE_USER}@${host}:${src}" "$dst"
}
wait_for_reachable_host() {
  local start_ts current_ts elapsed
  start_ts="$(date +%s)"
  while true; do
    for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
      if run_ssh "$host" "echo ready" >/dev/null 2>&1; then
        echo "$host"
        return 0
      fi
    done
    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}
pick_initial_host() {
  for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
    if run_ssh "$host" "echo ready" >/dev/null 2>&1; then
      echo "$host"
      return 0
    fi
  done
  return 1
}
wait_for_completed_task() {
  local start_ts current_ts elapsed
  start_ts="$(date +%s)"
  while true; do
    for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
      if run_ssh "$host" "test -f '$REMOTE_LOG_FILE' && grep -q 'SUCCESS: ATVM VM Setup Complete!' '$REMOTE_LOG_FILE'" >/dev/null 2>&1; then
        echo "$host"
        return 0
      fi
    done
    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}
wait_for_host_offline() {
  local host="$1"
  local start_ts current_ts elapsed
  start_ts="$(date +%s)"
  while true; do
    if ! run_ssh "$host" "echo still-up" >/dev/null 2>&1; then
      return 0
    fi
    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}
if [[ "$MODE" != "run-and-collect" && "$MODE" != "--collect-after-complete" ]]; then
  echo "Usage:"
  echo " $0 # run setup on client, then collect log"
  echo " $0 --collect-after-complete # wait for completed client task, then collect log only"
  exit 1
fi
if [[ "$MODE" == "run-and-collect" ]]; then
  if [[ -z "$EXPECTED_IP_ARG" || -z "$EXPECTED_HOSTNAME_ARG" ]]; then
    echo "ERROR: run-and-collect requires EXPECTED_IP_ARG and EXPECTED_HOSTNAME_ARG." >&2
    echo "Example:" >&2
    echo " EXPECTED_IP_ARG=192.168.0.121 EXPECTED_HOSTNAME_ARG=atvm-codextest-vm $0" >&2
    exit 1
  fi
  INITIAL_HOST="$(pick_initial_host)" || {
    echo "ERROR: Could not reach ${REMOTE_IP_PRIMARY} or ${REMOTE_IP_SECONDARY} for initial setup." >&2
    exit 1
  }
  echo "Copying setup script to ${REMOTE_USER}@${INITIAL_HOST}:${REMOTE_SETUP_SCRIPT}"
  run_scp_to_remote "$LOCAL_SETUP_SCRIPT" "$INITIAL_HOST" "$REMOTE_SETUP_SCRIPT"
  echo "Running remote setup script on ${INITIAL_HOST} (disconnect is expected during IP/reboot steps)"
  set +e
  run_ssh "$INITIAL_HOST" "chmod +x '$REMOTE_SETUP_SCRIPT' && bash '$REMOTE_SETUP_SCRIPT' --expected-ip '$EXPECTED_IP_ARG' --expected-hostname '$EXPECTED_HOSTNAME_ARG'"
  run_status=$?
  set -e
  if (( run_status != 0 )); then
    echo "INFO: Remote run returned non-zero (${run_status}). Continuing because network reconfiguration/reboot can interrupt SSH."
  fi
  echo "Waiting for completed client task marker in ${REMOTE_LOG_FILE} (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
  ACTIVE_HOST="$(wait_for_completed_task)" || {
    echo "ERROR: Could not detect completed task marker in remote log within timeout." >&2
    exit 1
  }
else
  echo "Waiting for completed client task marker in ${REMOTE_LOG_FILE} (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
  ACTIVE_HOST="$(wait_for_completed_task)" || {
    echo "ERROR: Could not detect completed task marker in remote log within timeout." >&2
    exit 1
  }
fi
echo "Host reachable at: ${ACTIVE_HOST}"
REMOTE_HOSTNAME="$(run_ssh "$ACTIVE_HOST" "hostname" | tr -d '\r' | tail -n1)"
RUN_TS="$(date +%Y%m%d_%H%M%S)"
LOCAL_LOG_FILE="${LOCAL_LOG_DIR}/atvm_configuration_${REMOTE_HOSTNAME}_${RUN_TS}.log"
echo "Collecting remote log: ${REMOTE_LOG_FILE}"
run_scp_from_remote "$ACTIVE_HOST" "$REMOTE_LOG_FILE" "$LOCAL_LOG_FILE"
REMOTE_HASH="$(run_ssh "$ACTIVE_HOST" "sha256sum '$REMOTE_LOG_FILE' | awk '{print \$1}'" | tr -d '\r' | tail -n1)"
LOCAL_HASH="$(sha256sum "$LOCAL_LOG_FILE" | awk '{print $1}')"
if [[ "$REMOTE_HASH" != "$LOCAL_HASH" ]]; then
  echo "ERROR: Hash mismatch after log copy." >&2
  echo "Remote: $REMOTE_HASH" >&2
  echo "Local: $LOCAL_HASH" >&2
  exit 1
fi
HAS_ERRORS_IN_LOG=false
# Match only real error log records. Do not match instructional text that mentions "[ERROR]".
if run_ssh "$ACTIVE_HOST" "grep -Eq '^\\[ERROR\\]' '$REMOTE_LOG_FILE'"; then
  HAS_ERRORS_IN_LOG=true
fi
if [[ "$HAS_ERRORS_IN_LOG" == true ]]; then
  echo "WARNING: [ERROR] entries detected in remote log. VM will remain powered on for manual inspection."
else
  echo "Log indicates success with no [ERROR] entries. Powering off ${ACTIVE_HOST}."
  set +e
  run_ssh "$ACTIVE_HOST" "shutdown -h now"
  shutdown_status=$?
  set -e
  if (( shutdown_status != 0 )); then
    echo "INFO: Shutdown command returned non-zero (${shutdown_status}); this can occur if SSH disconnects during shutdown."
  fi
  echo "Waiting for ${ACTIVE_HOST} to go offline (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
  if wait_for_host_offline "$ACTIVE_HOST"; then
    echo "Power-off confirmed: ${ACTIVE_HOST} is offline."
  else
    echo "WARNING: Could not confirm ${ACTIVE_HOST} offline within timeout."
  fi
fi
echo "Success"
echo "Active host: ${ACTIVE_HOST}"
echo "Local log: ${LOCAL_LOG_FILE}"
echo "SHA256: ${LOCAL_HASH}"

42
cdsmcp/AGENTS.md Normal file
View File

@@ -0,0 +1,42 @@
# AGENTS.md
This folder contains the VMware/vCenter + MigrateOps runbook for CDS MCP workflows.
## Files
- `esxvm.md`: index file only; points to guide and run-learnings docs.
- `esxvm-guide.md`: authoritative workflow/rules/checklists/default behavior.
- `esxvm-runs.md`: run-specific learnings, only when a run adds new information.
- `vmw.yaml`: base template for `MIGRATEOPS_VMWARE_COMPUTE` operations.
## Source Of Truth
- Use `esxvm-guide.md` for how to execute runs.
- Use `vmw.yaml` as the starting operation template.
- Treat `esxvm-runs.md` as evidence/history, not baseline procedure.
## Required Run Pattern
1. Confirm source VM in vCenter and power state before IP/SSH actions.
2. Prepare source host (CDC cleanup + CMC reinstall/registration) and verify source is connected in CDC.
3. Validate preflight requirements from `esxvm-guide.md` (integration, access node, destination name, datastore/host/network, source NIC).
4. Create MigrateOps from `vmw.yaml` with request-specific replacements.
5. Monitor continuously and auto-approve cutover unless user requests manual approval.
6. After terminal state:
   - validate destination login (poll guest IP if needed),
   - archive operation,
   - run offline-host cleanup loop until source/helper cleanup conditions are satisfied,
   - provide final read-only status listing for source/destination/access/helper across CDC and vCenter.
7. Ask user explicitly before deleting destination VM; never delete without same-run confirmation.
## VM Lookup Requirement
- Unless user explicitly asks otherwise, scope VM lookup/list responses to cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests, always include datastore name and VM notes/annotation in the response.
## Update Rules
- Update `esxvm-guide.md` only when workflow/rules/default behavior changes.
- Update `esxvm-runs.md` only when a run reveals a new learning/failure pattern/required check.
- Keep `esxvm.md` as a lightweight index.
## Environment Defaults
- vCenter: `192.168.0.201`
- Cluster scope: `QACL-ATVMCypressONLY` unless user overrides.
- Default CDC project: `Skidamarink`
- Default access node: `atvm-linux-h2h`

154
cdsmcp/esxvm-guide.md Normal file
View File

@@ -0,0 +1,154 @@
# ESX / vCenter Guide
This file is for workflow guidance only. Do not add specific run examples here.
## Update Rule
- After every run, update this file only when a workflow rule/checklist/default behavior changed.
- Add run-specific examples and evidence to `esxvm-runs.md` only when that run produced a new learning.
## vCenter Access
- Address: `192.168.0.201`
- Username: `administrator@qalab.cdsi.local`
- Password: `CDSi101!`
- Standard CLI path: `/home/aw/.local/bin/govc`
- Use only this standard vCenter login for vCenter actions unless explicitly instructed otherwise.
- Do not use `192.168.3.190` for vCenter actions; that machine is reserved for Cypress ATVM automation.
## IP And Power-State Policy (Mandatory)
- Before finding guest IP or attempting SSH, confirm VM power state in vCenter and power on if needed.
- Treat only these as stable references:
  - `192.168.0.201` for vCenter login only
  - `192.168.3.190` for ATVM Cypress automation only
  - `192.168.3.191` as default ATVM target reference
- Any other VM IP must be obtained live from vCenter for that run only.
- Do not carry forward ad-hoc VM IPs from previous runs in runbooks.
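
The power-state-first policy above can be sketched with `govc` as follows. This is a minimal sketch, not the runbook's tooling: it assumes `govc` is on `PATH`, that `GOVC_URL`, `GOVC_USERNAME`, and `GOVC_PASSWORD` are already exported for the standard vCenter login, and that `govc vm.info` prints a `Power state:` line in its plain-text output. The function names `vm_power_state` and `live_vm_ip` and the 5-minute wait are hypothetical choices.

```bash
# Print the VM's power state (for example poweredOn or poweredOff) by
# parsing the plain-text "Power state:" line of `govc vm.info`.
vm_power_state() {
  govc vm.info "$1" | awk -F': *' '/Power state/ {print $2}'
}

# Power the VM on if needed, then print its guest IP for this run only.
# Power-on chatter goes to stderr so stdout carries just the IP.
live_vm_ip() {
  local vm="$1"
  if [ "$(vm_power_state "$vm")" != "poweredOn" ]; then
    govc vm.power -on "$vm" >&2
  fi
  # Assumption: waiting up to 5 minutes for guest tools to report an IP is enough.
  govc vm.ip -wait 5m "$vm"
}
```

A call such as `live_vm_ip atvm-codextest-vm` would then return an address to use for the current run only, in line with the rule against carrying ad-hoc IPs forward.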
## Cluster Scope Rule
- Only work under cluster `QACL-ATVMCypressONLY` unless explicitly told otherwise.
## Ignore VMs
- `vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44`
- `vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14`
## IP Lookup Rule
- If asked about an IP address, only check powered-on VMs.
## VM Lookup Response Rule
- Unless user explicitly asks otherwise, return VM lookup/list results only from cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests (for example name/contains filters), always report:
  - VM name
  - datastore name
  - VM notes/annotation
  - include power state and IP when available
## Common VM Credentials
- Username: `root`
- Password: `cdsi2012`
## CMC Install/Uninstall Commands
### Default Project Rule
- Default project: `Skidamarink`
- Default registration code: `BZHKABCODZLIOK6RTAJ4`
- Default endpoint: `portal.gcstage.cloud.nonprod.cirrusdata.com:443`
- Use a different project code only when user explicitly requests it in that run.
### Skidamarink Install (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE
```
### Skidamarink Install (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE"
```
### Uninstall (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -uninstall
```
### Uninstall (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -uninstall"
```
### CMC Reinstall Fallback (RHEL 10)
- If installer-based reinstall fails due to repo metadata/download errors, use cached local `mtdi-daemon` and `galaxy-migrate` RPMs, start services, enforce `galaxy_complete_endpoint`, then manually register.
- Do not continue migrateops create until source host is visible as connected in CDC.
## Status Output Format (Power-Off/Revert/Power-On)
- `VM [vm name] was poweredOn, so I powered it off` (or `already poweredOff`)
- `Snapshot rollback completed`
- `VM [vm name] powered back on successfully`
- `Current IP: <ip>`
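
As a rough sketch, the status lines above could be emitted by a small helper like the one below. The function name `report_revert_status` and its arguments are hypothetical, and the exact wording of the `already poweredOff` variant is an assumption, since the guide only abbreviates it.

```bash
# Hypothetical helper: print the four status lines in the documented order.
# $1 = VM name, $2 = power state found before the run, $3 = IP after power-on
report_revert_status() {
  local vm="$1" prior_state="$2" ip="$3"
  if [ "$prior_state" = "poweredOn" ]; then
    echo "VM ${vm} was poweredOn, so I powered it off"
  else
    # Assumed phrasing for the "already poweredOff" case.
    echo "VM ${vm} was already poweredOff"
  fi
  echo "Snapshot rollback completed"
  echo "VM ${vm} powered back on successfully"
  echo "Current IP: ${ip}"
}
```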
## VMware Compute MigrateOps Defaults
- Use `/home/aw/code/cds/cdsmcp/vmw.yaml` as the starting template.
- Default sequence for requested source machine:
  - clean CDC state for that machine
  - reinstall CMC Linux on that machine
  - perform migration preflight and operation create
- If user provides a client name, replace consistently:
  - `config.system_name`
  - `migrateops_vmware_compute.compute.vm_name`
  - operation `name`
- Validate `integration_name` is active in target project before create.
- Default access node: `atvm-linux-h2h` (must be powered on in vCenter and connected in CDC).
- Always discover `source_nic` from live source host networking.
## Approval and Monitoring Defaults
- Auto-approve cutover by default.
- Start monitoring immediately after operation create.
- Approve as soon as `final-synchronization` requests input.
- Skip auto-approval only if user explicitly asks for manual approval.
- Patience rule:
  - if heartbeat/progress is advancing, keep waiting
  - allow longer waits for helper deployment/registration steps
  - intervene only for terminal failure, confirmed blocker, or prolonged no-progress
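
One way to read the patience rule is as a per-tick decision, sketched below. Everything here is illustrative: the `monitor_decision` name, the `failed` state label, the numeric progress counter, and the stall thresholds are assumptions, not values from the MigrateOps API. A confirmed blocker still needs human judgment and is not modeled.

```bash
# Hypothetical decision helper for one monitoring tick.
# $1 = operation state ("failed" is an assumed terminal-failure label)
# $2 = previous progress counter, $3 = current progress counter
# $4 = seconds since progress last advanced, $5 = allowed stall in seconds
monitor_decision() {
  local state="$1" prev="$2" now="$3" stalled="$4" max_stall="$5"
  if [ "$state" = "failed" ]; then
    echo intervene   # terminal failure
  elif [ "$now" -gt "$prev" ]; then
    echo wait        # progress is advancing, keep waiting
  elif [ "$stalled" -ge "$max_stall" ]; then
    echo intervene   # prolonged no-progress
  else
    echo wait        # stalled, but still inside the allowed window
  fi
}
```

Passing a larger allowed stall for helper deployment or registration steps is one way to express the "allow longer waits" clause.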
## Preflight Checklist
- Source host connected in CDC.
- Integration exists and is active in same project.
- `atvm-linux-h2h` powered on in vCenter.
- `atvm-linux-h2h` connected in same CDC project.
- Destination VM name does not already exist in vCenter.
- Destination datastore/host/network resolve in vCenter.
- `source_nic` discovered via SSH from source host.
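
For the last checklist item, a minimal sketch of NIC discovery is shown below. It assumes the NIC to migrate is the one carrying the source host's default route, which may not hold on multi-homed hosts. The helper name `default_nic_from_routes` is hypothetical.

```bash
# Hypothetical helper: read `ip route` output on stdin and print the
# interface named after "dev" on the default route line.
default_nic_from_routes() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++) {
      if ($i == "dev") { print $(i + 1); exit }
    }
  }'
}
```

It could be fed from the source host with something like `ssh root@<source-ip> ip -4 route show default | default_nic_from_routes`, where `<source-ip>` is the address read live from vCenter for that run.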
## Post-Migration Validation and Cleanup Pattern
- Validate destination login before cleanup:
  - get destination guest IP from vCenter
  - verify SSH/login works
  - if guest IP is empty, keep polling and do not skip validation
  - do not mark run complete before validation result is recorded
- Before deleting destination VM:
  - always prompt user for explicit confirmation
  - never delete destination VM without that confirmation in the same run
- For delete path:
  - resolve source VM ID and destination VM ID separately
  - abort if IDs match
  - power off destination if needed
  - delete destination by explicit VM ID
  - verify destination removed and source still exists
- Always run project cleanup after terminal migration state:
  - archive completed operation
  - run global offline-host cleanup
  - cleanup must target source VM named in current request only
  - if source/helper entries are still connected, disconnect them first (for example uninstall CMC or power off the helper VM), then rerun cleanup
  - if stale connected state persists after VM removal/power-off, wait for heartbeat timeout and rerun cleanup until removed
  - verify helper entry from this run (`migrateops-<opid>-<source-system-name>`) is removed
- Completion gate:
  - do not report run complete until archive + cleanup verification are done
  - always provide read-only final listing for source, destination, access node, helper:
    - CDC status (`present` or `cleaned up`)
    - vCenter status (`present` or `cleaned up`, and if present include power state + IP)
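
The ID comparison in the delete path and the helper entry naming can be sketched as below. Both function names are hypothetical, and how the IDs are resolved (for example as vCenter managed object IDs) is left to the caller.

```bash
# Hypothetical guard: succeed only when both IDs are set and differ.
# $1 = source VM ID, $2 = destination VM ID
assert_distinct_vm_ids() {
  local source_id="$1" dest_id="$2"
  if [ -z "$source_id" ] || [ -z "$dest_id" ]; then
    echo "ABORT: could not resolve both VM IDs." >&2
    return 1
  fi
  if [ "$source_id" = "$dest_id" ]; then
    echo "ABORT: source and destination resolve to the same VM ID (${source_id})." >&2
    return 1
  fi
}

# Build the helper CDC entry name to verify during cleanup, following the
# documented pattern migrateops-<opid>-<source-system-name>.
# $1 = operation ID, $2 = source system name
helper_entry_name() {
  echo "migrateops-${1}-${2}"
}
```

Running the guard immediately before the power-off and delete steps means a mis-resolved name cannot route the delete to the source VM.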
## Default Behavior Contract
- Perform automatically on every VMware compute run:
  - destination login validation
  - operation archive
  - offline-host cleanup and source/helper stale verification
- Still require explicit user confirmation before destination delete:
  - always prompt
  - if no confirmation, keep destination and record `deletion skipped by user`
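
The confirmation contract can be sketched as a prompt that defaults to keeping the destination. The function name, the prompt text, the `deletion confirmed by user` string, and the choice that only a literal `yes` counts are all assumptions for illustration. Only `deletion skipped by user` comes from the guide.

```bash
# Hypothetical prompt: reads one line from stdin and prints the decision to
# record. Anything other than an explicit "yes" keeps the destination VM.
confirm_destination_delete() {
  local vm="$1" answer=""
  printf 'Delete destination VM %s? Type "yes" to confirm: ' "$vm" >&2
  read -r answer || answer=""
  if [ "$answer" = "yes" ]; then
    echo "deletion confirmed by user"
  else
    echo "deletion skipped by user"
  fi
}
```

Treating empty input or end-of-file as a skip matches the "if no confirmation, keep destination" rule.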

47
cdsmcp/esxvm-runs.md Normal file
View File

@@ -0,0 +1,47 @@
# ESX / vCenter Run Learnings
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
## Entry Rule
- Add an entry only when the run changed workflow behavior, uncovered a new failure pattern, or confirmed a new required check.
- Do not add routine successful runs with no new learning.
## Run Learning: Operation 14208
- Learning: `wait-for-vm-registration` helper registration can be the longest early-stage step.
- Action for future runs: if step 6/7 is slow, verify helper VM existence in vCenter before remediation.
## Run Learning: Operation 14213
- Learning: completion response was sent before destination delete prompt, operation archive, and offline-host cleanup.
- Action for future runs: completion must be gated on delete prompt handling, archive, and cleanup verification.
## Run Learning: Operation 14214
- Learning: stale helper/source entries can remain and require explicit offline-host cleanup reruns.
- Action for future runs: rerun cleanup until stale entries are actually removed.
## Run Learning: Operation 14215
- Learning: helper creation can fail with vSphere `ReconfigVM` errors and recover via controlled retries.
- Action for future runs:
  - remove leftover helper artifacts before retry
  - avoid manual helper power actions during active task execution
  - keep waiting while heartbeats/progress still advance
## Run Learning: Operation 14216
- Learning: destination login validation and post-run cleanup were missed before completion reporting.
- Action for future runs: always perform destination login validation + archive + cleanup automatically before declaring completion.
## Run Learning: Operation 14218
- Learning: source/helper entries can remain `connected` with stale `last_checkin` after migration.
- Action for future runs: enforce heartbeat-timeout waits and rerun cleanup until source/helper entries are removed.
## Run Learning: Operation 14221
- Learning: source/helper CDC entries for the current request can be removed cleanly by timeout-based cleanup loop after archive, and final 4-entity status listing is effective for closure.
- Action for future runs:
  - always provide final source/destination/access/helper listing across CDC and vCenter
  - keep destination delete as explicit user-confirmed step only
## Run Learning: Operation 14223
- Learning: on RHEL 10, CMC reinstall via installer script can fail when repo metadata is unavailable; local RPM install + explicit CDC endpoint config + manual register can recover the source in-place.
- Action for future runs:
  - if Linux installer fails on repo metadata, check cached `mtdi-daemon` and `galaxy-migrate` RPMs and install directly
  - enforce `galaxy_complete_endpoint` before manual register
  - proceed with migrateops only after source host is confirmed connected in CDC

10
cdsmcp/esxvm.md Normal file
View File

@@ -0,0 +1,10 @@
# ESX / vCenter Notes Index
This file is now an index only.
- Guide-only workflow and rules: `/home/aw/code/cds/cdsmcp/esxvm-guide.md`
- Run-specific learnings log: `/home/aw/code/cds/cdsmcp/esxvm-runs.md`
Update policy:
- After each run, update `esxvm-guide.md` only for guide/rule changes.
- After each run, update `esxvm-runs.md` only if the run produced a new learning.

109
cdsmcp/vmw.yaml Normal file
View File

@@ -0,0 +1,109 @@
#
# VMware Compute MigrateOps template
# Rules:
# 1) Replace all client references consistently:
# - config.system_name
# - migrateops_vmware_compute.compute.vm_name
# - operations[].name
# - cleanup targeting must use the source VM from the current user request only
# 1a) Default CMC migration sequence for any specified machine:
# - clean up CDC project state for that machine (remove stale/offline registration context)
# - reinstall CMC Linux on that machine
# - then perform migration setup/create
# 2) Verify integration_name is valid in the target CDC project before creating operation.
# 3) Default access node is "atvm-linux-h2h":
# - VM must be powered on in vCenter
# - CMC must be installed/connected in the same CDC project
# 4) Source NIC must be discovered from the source client (do not assume ens192).
# 5) Preflight checks before create:
# - confirm source VM power state in vCenter first; power on before IP discovery/SSH steps
# - destination vm_name must not already exist
# - datastore/host/network names must resolve in vCenter
# - source client + access node must both be connected in same CDC project
# - use only standard vCenter credentials/session for vCenter actions
# (do not use 192.168.3.190 for vCenter actions; reserved for Cypress ATVM automation)
# - IP handling policy:
# * 192.168.0.201 is vCenter only
# * 192.168.3.190 is ATVM automation only
# * 192.168.3.191 is default ATVM target reference
# * any other VM IP must be read live from vCenter for the current run only
# and must not be retained/reused as a future default
# 6) Post-submit approval behavior (default):
# - start monitoring as soon as operation create succeeds
# - auto-approve cutover immediately when final-synchronization requests approval
# - only use manual approval if explicitly requested by user
# - patience rule while monitoring:
# * if heartbeat/progress is advancing, keep waiting and do not intervene
# * allow longer wait windows for helper VM deploy/registration-related steps
# * intervene only on terminal failure, confirmed blocker, or prolonged no-progress
# 7) Post-migration validation and cleanup behavior (default):
# - verify SSH login to the newly migrated VM first (using vCenter guest IP)
# - if vCenter guest IP is initially empty, keep polling until available; do not skip login validation
# - never report run completion before destination login validation is recorded
# - only target the newly migrated VM for cleanup, never the source VM
# - resolve and compare source/destination VM IDs; abort cleanup if IDs match
# - prompt user for confirmation before power-off + delete of migrated VM
# - prompt user even if they did not explicitly ask for deletion in same request
# - never delete destination VM without explicit user confirmation in that run
# - archive the completed MigrateOps operation after migration reaches terminal state
# - mandatory: run global offline-host cleanup at end of successful runbook
# even if source host is offline (remove all offline CMC host records)
# - if source/helper entries are still connected in CDC, disconnect first
# (for example uninstall CMC on source/helper or power off/delete helper VM),
# then rerun offline-host cleanup until source/helper entries are removed
# - if CDC still shows source/helper as connected but last_checkin is stale after
# source/helper are already powered off/deleted, wait for heartbeat timeout and
# rerun offline-host cleanup in a loop until those entries are removed
# - verify source host + helper host stale/offline duplicates from this run are removed
# - verify helper CMC host entries from the run are removed
# (e.g. migrateops-<operation-id>-<source-system-name>)
# - if helper entry remains, ensure helper VM is absent/powered off and rerun offline cleanup
# - mandatory: remove the source VM from the current request from CDC host list during cleanup
# (do not reuse source VM names from prior runs)
# - mandatory post-run reporting: always include a read-only status listing for
# source VM, destination VM, access node, and helper VM across both CDC and vCenter
# with explicit present/cleaned-up state
# - do not report run completion until cleanup verification is done and destination VM
# deletion is either completed or explicitly skipped by user decision
# - default autonomous behavior for every run:
# * always perform login validation + archive + offline-host cleanup automatically
# * always prompt user before deleting destination VM and record explicit keep/delete decision
#
operations:
  - recipe: "MIGRATEOPS_VMWARE_COMPUTE"
    config:
      migrateops_vmware_compute:
        access_node:
          system_name: "atvm-linux-h2h"
        compute:
          datastore: "AutomatedTest-VMBootImgComputeMigration-Gold"
          host: "192.168.1.165"
          datacenter: "CDSHQ-Eng"
          vm_name: "atvm-codextest-vm-migrated"
        migration:
          qos_level: "RELENTLESS"
          auto_resync_interval: "600s"
        cmchelper:
          network: "VM Network"
          ip_config:
            use_static_ip: true
            address: "192.168.3.195/22"
            dns_servers:
              - "8.8.8.8"
            gateway: "192.168.0.1"
          content_library: "vc-cmchelper"
          template_name: "vc-cmchelper-vm"
          install_via_access_node: true
        network:
          adapters:
            - network: "VM Network"
              # Must be discovered from source host via SSH before create.
              source_nic: "REPLACE_WITH_SOURCE_NIC"
              transfer_ip: true
              transfer_mac: false
              adapter_type: "VMXNET3"
        keep_source_powered_on: false
      system_name: "atvm-codextest-vm"
      integration_name: "vCenter201"
    name: "atvm-codextest-vm"
    notes: ""
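
Rule 1 in the template header (replace all client references consistently) could be applied with a sketch like the one below. It assumes the template still carries the placeholder client name `atvm-codextest-vm`, that the destination VM is named `<client>-migrated` as in the template's default, and that the client and NIC names contain no `/` or `&` characters, which would need escaping for `sed`. The function name `render_vmw_template` is hypothetical.

```bash
# Hypothetical helper: write a per-request copy of the template to stdout.
# $1 = client (source system) name, $2 = discovered source NIC, $3 = template path
render_vmw_template() {
  local client="$1" nic="$2" template="$3"
  # Replace the longer "-migrated" value first, then the exact quoted client
  # name, so the access node's system_name ("atvm-linux-h2h") is left alone.
  sed -e "s/\"atvm-codextest-vm-migrated\"/\"${client}-migrated\"/" \
      -e "s/\"atvm-codextest-vm\"/\"${client}\"/" \
      -e "s/\"REPLACE_WITH_SOURCE_NIC\"/\"${nic}\"/" \
      "$template"
}
```

Matching on the full quoted value updates `config.system_name`, `vm_name`, and the operation `name` together without touching unrelated fields.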