Initial commit

.gitignore

log/

atvm/AGENTS.md

# ATVM AGENTS Guide

This file defines how to operate and maintain the ATVM folder workflows.
It is rebuilt from the current files in `/home/aw/code/cds/atvm`.

## Scope

Two operational tracks exist in this folder:

- Setup/bootstrap track:
  - `atvm-setup-script.sh`
  - `run-atvm-setup-and-collect-log.sh`
  - `atvm-setup-script-guide.md`
  - `atvm-setup-script-runs.md`
- Cypress automation track:
  - `atvm-automation-guide.md`
  - `atvm-automation-examples.md`
  - `atvm-automation-runs.md`

Reference/inventory material:

- `cypress-automation-for-cmc.md`
- `cypress-automation-for-cmc.md:Zone.Identifier`

## File Roles

- `*-guide.md` files:
  - Guide-only procedures, rules, defaults, and checklists.
  - No dated or one-off run examples.
- `*-runs.md` files:
  - Run-specific learnings, added only when a run introduces new information.
  - No routine or no-change run logs.
- `*-examples.md` files:
  - Reusable command examples and commonly used option combinations.
  - Keep generic; avoid dated one-off run outcomes.

## Setup Track: Required Behavior

Use `atvm-setup-script-guide.md` as the procedure source and keep behavior aligned with `atvm-setup-script.sh`.

### Safety-Critical Rules

1. Never run setup without operator-provided `--expected-ip` and `--expected-hostname`.
2. Never infer the expected hostname from target host output.
3. Stop immediately if the detected hostname does not match the expected hostname or the expected IP is not assigned on the target.
4. Keep static IP configuration as the final step to avoid losing the connection mid-run.

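The two required flags can be enforced before any other work runs. This is a minimal sketch that assumes a hypothetical `parse_required_identity_args` function and global `EXPECTED_IP` / `EXPECTED_HOSTNAME` variables. Neither is taken from `atvm-setup-script.sh`.

```bash
# Hypothetical sketch: refuse to continue unless both identity flags are given.
# This is not the real parse_args from atvm-setup-script.sh; other options are
# skipped, and parsed values land in EXPECTED_IP / EXPECTED_HOSTNAME.
parse_required_identity_args() {
  EXPECTED_IP=""
  EXPECTED_HOSTNAME=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --expected-ip)
        EXPECTED_IP="${2:-}"
        shift $(( $# >= 2 ? 2 : 1 )) ;;
      --expected-hostname)
        EXPECTED_HOSTNAME="${2:-}"
        shift $(( $# >= 2 ? 2 : 1 )) ;;
      *)
        shift ;;   # ignore unrelated options in this sketch
    esac
  done
  if [[ -z "$EXPECTED_IP" || -z "$EXPECTED_HOSTNAME" ]]; then
    echo "[ERROR] both --expected-ip and --expected-hostname are required" >&2
    return 2
  fi
}
```

The hostname only ever comes from the arguments. The sketch never reads it from the target.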
### Canonical Setup Order

1. Parse args.
2. Validate host identity.
3. Check sudo/privileges.
4. Fix repositories.
5. Configure the Ubuntu root SSH/password workflow (Ubuntu only).
6. Install sudo if needed.
7. Set the Oracle Linux default kernel to non-UEK (Oracle Linux only).
8. Disable Ubuntu auto-upgrades (Ubuntu only).
9. Run package cleanup/install.
10. Disable SELinux (RHEL-family only).
11. Configure static IP.
12. Print summary.
13. Reboot and run the post-reboot SELinux verifier when applicable.
14. Keep the client powered on until the controller log copy and SHA256 verification complete.
15. Power off only after verified success, and only if the log has no lines matching `^\[ERROR\]`.

### Setup Defaults

- ATVM static IP target: `192.168.3.191/22`
- Gateway: `192.168.0.1`
- DNS: `8.8.8.8`, `8.8.4.4`
- Ubuntu root SSH workflow credential in docs/script: `root / cdsi2012`
- Client log file: `atvm_setup_script.log` (typically `/root/atvm_setup_script.log` when run as root)

### Setup Controller Wrapper Rules

- The wrapper supports two modes:
  - run-and-collect (default)
  - `--collect-after-complete`
- `run-and-collect` requires these environment variables:
  - `EXPECTED_IP_ARG`
  - `EXPECTED_HOSTNAME_ARG`
- Before reporting success, the wrapper validates the success marker and the SHA256 checksum.
- The wrapper powers off the client only when the log has no lines matching `^\[ERROR\]`.

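The wrapper's gates can be written as small predicates:

- checksum match
- success marker present
- no `[ERROR]` lines

This is a sketch under stated assumptions. The function names are hypothetical, and the success marker text is passed in as an argument because its actual wording is not documented here.

```bash
# Hypothetical helpers mirroring the wrapper's post-collection gates; the
# real checks live in run-atvm-setup-and-collect-log.sh and may differ.

# 0 if the collected copy's SHA256 equals the checksum computed on the client.
verify_log_checksum() {
  local log_file="$1" expected_sha="$2" actual_sha
  actual_sha="$(sha256sum "$log_file" | awk '{print $1}')"
  [[ -n "$actual_sha" && "$actual_sha" == "$expected_sha" ]]
}

# 0 if the log contains the success marker (matched as a fixed string).
log_has_success_marker() {
  local log_file="$1" marker="$2"
  grep -qF -- "$marker" "$log_file"
}

# 0 only if the log exists and no line starts with "[ERROR]"; gates power-off.
log_has_no_errors() {
  local log_file="$1"
  [[ -f "$log_file" ]] && ! grep -qE '^\[ERROR\]' "$log_file"
}
```

The expected checksum would presumably come from the client, for example `ssh root@<client-ip> sha256sum /root/atvm_setup_script.log`. Power-off should only be attempted when all three predicates return 0.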
## Cypress Automation Track: Required Behavior

Use `atvm-automation-guide.md` as the execution source.
Use `atvm-automation-examples.md` as the common options/command reference.

### Controller Client

- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: `root / atvmcdsi2012`

### Mandatory Run Control

1. Before planning a new run, check for active automation processes.
2. Report whether automation is running or not running.
3. If it is running, ask before terminating it; terminate only with explicit approval.
4. Always show the exact planned command(s) before execution.
5. Execute only after explicit approval.
6. If monitoring is not requested, report immediate command success/failure and any errors.
7. Monitor completion only when the operator explicitly requests it.
8. For monitored runs, allow long runtime windows (15 to 30+ minutes) and continue until completion unless the operator instructs otherwise.
9. Do not terminate monitored runs unless the operator explicitly instructs termination.

### Status Request Format

When the operator asks for run status, report in this order:

1. Heading/title using the run `build_name`.
2. Completed machines with the pass/fail state of each machine.
3. Skipped machines with the reason.
4. Remaining machines still to run.
5. Summary counts for finished, passed, failed, and skipped machines.
6. Timing details:
   - start time
   - end time, if complete
   - total run time if complete, or elapsed run time if still running
   - quickest completed test runtime
   - longest completed test runtime
   - average completed test runtime
7. Estimated completion time.

Status details:

- Use the live run log on the automation VM when available.
- Use the run `build_name` as the heading/title when available.
- Show blacklisted machines under skipped machines when they are part of the requested scope.
- Show in-progress machines under remaining machines as `RUNNING`.
- Show not-yet-started machines under remaining machines as `NOT STARTED`.
- Use completed spec results already recorded in the log to determine each machine's pass/fail state.
- For failed machines, include the failure reason from the run log in the status output.
- Include the start time in status output when it can be derived from the log.
- Include the end time and total runtime for completed runs, or the elapsed runtime for active runs.
- Include the quickest, longest, and average completed test runtimes under timing details when they can be derived from the log.

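The quickest, longest, and average figures are simple aggregates over completed per-machine runtimes. This minimal sketch assumes the durations have already been extracted from the run log as whole seconds. The log format is not described here, so no parser is shown. `runtime_stats` is a hypothetical name.

```bash
# Hypothetical helper: read completed test runtimes in seconds (one per line
# on stdin) and print quickest / longest / rounded average. Exits 1 when no
# runtimes were supplied.
runtime_stats() {
  awk '
    NF == 0 { next }                       # ignore blank lines
    {
      v = $1 + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) { print "no completed tests"; exit 1 }
      printf "quickest=%ds longest=%ds average=%ds\n", min, max, int(sum / n + 0.5)
    }'
}
```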
### Automation Blacklist

Always exclude these machines with `--exclude_partial_match` when building ATVM automation commands.

CMC install blacklist (`BLACKLISTED: CMC INSTALL - CAN'T COMPILE`):

- `atvm6-centos6.0`
- `atvm41-redhat6.0`
- `atvm73-oracle6.0`

Support-request blacklist (`BLACKLISTED: SUPPORT REQUEST - WAITING`):

- `atvm113-debian9.0.0`
- `atvm115-debian9.1.0`
- `atvm116-debian9.2.0`
- `atvm156-debian9.3.0`

Re-create blacklist (`BLACKLISTED: RE-CREATE NEEDED`):

- `atvm157-debian13.0.0`

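For the skipped-machines section of a status report, the blacklist above maps directly to reason labels. This sketch uses a hypothetical `blacklist_reason` helper. It matches exact machine names, whereas `--exclude_partial_match` presumably matches partially, as its name suggests.

```bash
# Hypothetical helper: print the skip-reason label for a blacklisted machine,
# or return 1 (no output) if the machine is not on the maintained blacklist.
blacklist_reason() {
  case "$1" in
    atvm6-centos6.0|atvm41-redhat6.0|atvm73-oracle6.0)
      echo "BLACKLISTED: CMC INSTALL - CAN'T COMPILE" ;;
    atvm113-debian9.0.0|atvm115-debian9.1.0|atvm116-debian9.2.0|atvm156-debian9.3.0)
      echo "BLACKLISTED: SUPPORT REQUEST - WAITING" ;;
    atvm157-debian13.0.0)
      echo "BLACKLISTED: RE-CREATE NEEDED" ;;
    *)
      return 1 ;;
  esac
}
```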
### Operator Preferences

- Do not include Gold Disk IDs in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- Prefer distro-scoped filtering (for example `--containsVm redhat9`) when possible.

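The no-spaces rule for `--build_name` is easy to enforce mechanically. This sketch uses a hypothetical `make_build_name` helper that joins words with `-`. It cannot check for Gold Disk IDs because their format is not specified here.

```bash
# Hypothetical helper: join the given words with "-" to form a --build_name
# value, and reject the result if it is empty or still contains whitespace.
make_build_name() {
  local IFS='-'          # "$*" joins arguments with the first IFS character
  local name="$*"
  if [[ -z "$name" || "$name" =~ [[:space:]] ]]; then
    echo "invalid build name: '$name'" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```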
## Update Policy (Both Tracks)

After each run:

- Update the corresponding `*-guide.md` only if workflow, rules, or default behavior changed.
- Update the corresponding `*-examples.md` when common command patterns or options change.
- Update the corresponding `*-runs.md` only if the run produced a new learning.

## Path and Naming Consistency Note

Current repo filenames use hyphen style, but some script text and defaults still show underscore-style paths (for example `atvm_setup_script.sh`, `run_atvm_setup_and_collect_log.sh`, `/home/aw/code/atvm`).

When operating:

1. Use the actual filesystem paths in this repo first (`/home/aw/code/cds/atvm/...`).
2. If script defaults are used, verify that they match existing files before execution.
3. If changing path conventions, update scripts and guides in the same change.

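The "verify defaults before execution" step can be a quick existence check over the candidate paths. This sketch uses a hypothetical `check_paths_exist` helper.

```bash
# Hypothetical pre-flight check: print OK/MISSING for each path and return 1
# if any path is missing, so hyphen- vs underscore-style defaults can be
# confirmed before a run.
check_paths_exist() {
  local missing=0 p
  for p in "$@"; do
    if [[ -e "$p" ]]; then
      echo "OK       $p"
    else
      echo "MISSING  $p"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_paths_exist /home/aw/code/cds/atvm/atvm-setup-script.sh /home/aw/code/atvm/atvm_setup_script.sh` shows which naming style is actually present.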
## Non-Goals

- Do not treat `cypress-automation-for-cmc.md` as executable runbook logic.
- Do not record secrets or tokens in new guide or runs entries.

atvm/atvm-automation-examples.md

## Examples

- `--build_name` values must not include spaces; use `-` between words.
- Add the maintained blacklist to `--exclude_partial_match` for runs that use broad selection or randomization.
- Maintained blacklist:
  - `atvm6-centos6.0`
  - `atvm41-redhat6.0`
  - `atvm73-oracle6.0`
  - `atvm113-debian9.0.0`
  - `atvm115-debian9.1.0`
  - `atvm116-debian9.2.0`
  - `atvm156-debian9.3.0`
  - `atvm157-debian13.0.0`

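Broad or randomized runs must carry the full blacklist, so a planned command can be checked before it is presented for approval. This sketch assumes a hypothetical `missing_blacklist_entries` helper that does a plain substring search of the command text.

```bash
# Hypothetical pre-approval lint: print each maintained-blacklist machine that
# does not appear anywhere in the planned command string; return 1 if any are
# missing. It does not check that the names follow --exclude_partial_match.
missing_blacklist_entries() {
  local cmd="$1" vm missing=0
  local blacklist=(
    atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0
    atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0
    atvm156-debian9.3.0 atvm157-debian13.0.0
  )
  for vm in "${blacklist[@]}"; do
    if [[ "$cmd" != *"$vm"* ]]; then
      echo "$vm"
      missing=1
    fi
  done
  return "$missing"
}
```

An empty result (exit status 0) means every maintained entry appears somewhere in the command.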
### E2E: Pure iSCSI+FC with specific VMs

```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --specify_vms atvm3-ubuntu18.04 atvm109-w2k12R2; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-pure-plugin
```

### E2E: InfiniBox FC with specific VMs

```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type infinibox --use_specified_plugin fc --specify_vms atvm51-redhat6.10 atvm110-w2k16; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-infinibox-plugin
```

### E2E: Regular cutover

```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --specify_vms atvm93-oracle7.9 atvm111-w2k19 --regular_cutover; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-e2e-regular-cutover
```

### Reboot test

```bash
python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm37-rocky8.8 atvm112-w2k22 --wait_for_power_on 120; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-reboot
```

### SystemOS test

```bash
python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --specify_vms atvm118-oracle9.3 atvm145-w2k25; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-systemOS
```

### MigrateOPS test

```bash
python3 cmc-templates.py --template cmc-migrateops --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm139-redhat9.5 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-migrateOPS
```

### Compute MigrateOPS: vmware

```bash
python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms vmware --test_partition --specify_vms atvm138-oracle9.4-opt atvm112-w2k22 --set_static_ip_dest; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-vmware
```

### Compute MigrateOPS: ovirt

```bash
python3 cmc-templates.py --template cmc-migrateops-compute-migration --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --vm_platforms ovirt --test_partition --specify_vms atvm124-redhat8.8 atvm111-w2k19 --set_static_ip_dest; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-computeMigrateOPS-ovirt
```

### Group consistency

```bash
python3 cmc-templates.py --template cmc-group-consistency --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm4-ubuntu20.04 atvm112-w2k22 --enable_uuid; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-consistentyGroup
```

### H2H same platform

```bash
python3 cmc-templates.py --template cmc-h2h-same-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm38-rocky9.0 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hSamePlatform
```

### H2H different platform

```bash
python3 cmc-templates.py --template cmc-h2h-diff-platf --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --integration_type pure --use_specified_plugin fc --specify_vms atvm65-redhat8.3 atvm112-w2k22; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name nightly-h2hDifferentPlatform
```

### Randomized reboot sanity

```bash
python3 cmc-templates.py --template cmc-reboot --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin fc --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0 --wait_for_power_on 120; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-reboot-iscsi
```

### Randomized e2e sanity

```bash
python3 cmc-templates.py --template cmc-e2e --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --test_partition --integration_type pure --use_specified_plugin both --randomize 1 --exclude_partial_match suse15.0 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-e2e
```

### Randomized systemOS sanity

```bash
python3 cmc-templates.py --template cmc-systemOS --ignore_force_shutdown --config_file_path ./cypress.atvm-config.ts --randomize 1 --exclude_partial_match suse15.0 fedora34 atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0; \
python3 ./run-sorry-cypress.py --config_file cypress.atvm-config.ts --build_name sanity-systemOS
```

atvm/atvm-automation-guide.md

# Run ATVM Automation Guide

This file is guide-only documentation for operating ATVM CMC automation.
Do not put specific run examples here.
For reusable command examples and common option combinations, use `atvm-automation-examples.md`.

## Purpose

Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.

## ATVM Cypress Automation Controller Client

- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: `root / atvmcdsi2012`

## Operating Constraints

- Run only explicitly requested scripts/commands.
- Do not make manual system configuration changes on the client.
- Do not edit client files unless explicitly requested.

## Operator Preferences

- Do not include Gold Disk identifiers in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- For multiple VMs in the same distro, use distro-scoped filtering (`--containsVm`) instead of long explicit VM lists.
- Before preparing a new run, always check whether automation is already running.
- Always report whether automation is currently running.
- If it is running, ask whether to terminate it; terminate only with explicit approval.
- After termination is approved, terminate first, then present the planned command(s), then wait for a separate execution approval.
- Before any run, always show the exact planned command(s) and wait for explicit approval.
- Execute only after explicit approval (for example `approve`).
- After execution, report immediate success/failure only.
- Do not actively monitor completion unless explicitly requested.
- If monitoring is requested, allow long runtime windows (15 to 30+ minutes) and continue until completion unless the operator instructs otherwise.
- Report command errors immediately.
- `sshpass` may be used where password-based SSH automation is required.

## Core Scripts

- Template prep: `/root/cdc-e2e-cyp-12.17.4/cmc-templates.py`
- Test execution: `./run-sorry-cypress.py`

Typical sequence:

1. Run `cmc-templates.py` with the requested template/options.
2. Run `run-sorry-cypress.py` with the matching config file and build name.

## Config File / Gold Disk Mapping

- `cypress.atvm-config-gold.ts` -> Gold Disk 1
- `cypress.atvm-config-gold-2.ts` -> Gold Disk 2
- Additional numbered config variants map to the corresponding Gold Disks.

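The Gold Disk to config-file mapping can be captured in a small lookup. This sketch uses a hypothetical `gold_disk_config` helper. Disks 1 and 2 follow the documented names. The `-gold-N` pattern for higher numbers is an assumption drawn from the numbered-variant note.

```bash
# Hypothetical helper: map a Gold Disk number to its Cypress config file.
# Disks 1 and 2 are documented; the "-gold-N" pattern for N >= 3 is assumed.
gold_disk_config() {
  local n="$1"
  if ! [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
    echo "invalid Gold Disk number: '$n'" >&2
    return 1
  fi
  if [[ "$n" -eq 1 ]]; then
    echo "cypress.atvm-config-gold.ts"
  else
    echo "cypress.atvm-config-gold-${n}.ts"
  fi
}
```

The disk number only selects the config file. Per the operator preferences, it should not appear in `--build_name`.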
## Available Templates

- `cmc-e2e`
- `cmc-group-consistency`
- `cmc-h2h-diff-platf`
- `cmc-h2h-same-platf`
- `cmc-migrateops`
- `cmc-migrateops-compute-migration`
- `cmc-reboot`
- `cmc-systemOS`

## Command Pattern

```bash
python3 cmc-templates.py --template <template> --config_file_path ./<config-file> [template options...]; \
python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces>
```

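Commands must be shown verbatim before approval. A helper that only prints the pair, and never executes it, keeps the planned text and the executed text identical. This sketch uses a hypothetical `plan_commands` function that follows the command pattern above.

```bash
# Hypothetical helper: print (never run) the two planned commands for the
# approval step. Arguments: template, config file, build name, then any
# template options, which are shell-quoted with printf %q.
plan_commands() {
  local template="$1" config="$2" build_name="$3"
  shift 3
  if [[ "$build_name" =~ [[:space:]] ]]; then
    echo "build_name must not contain spaces" >&2
    return 1
  fi
  local opts=""
  if [[ $# -gt 0 ]]; then
    opts=" $(printf '%q ' "$@")"
    opts="${opts% }"              # drop the trailing space from printf
  fi
  printf 'python3 cmc-templates.py --template %s --config_file_path ./%s%s; \\\n' \
    "$template" "$config" "$opts"
  printf 'python3 ./run-sorry-cypress.py --config_file %s --build_name %s\n' \
    "$config" "$build_name"
}
```

Options are placed after `--config_file_path`, so their order can differ from the examples. The printed text is what should be presented and, once approved, run unchanged.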
## Examples Reference

- Commonly used command examples: `atvm-automation-examples.md`
- Keep this guide focused on run-control rules and workflow constraints.

## Example Option Patterns (Guide-Only)

- Distro-scoped VM selection:
  - `--containsVm redhat`
  - `--containsVm redhat9`
- Explicit VM selection:
  - `--specify_vms <vm1> <vm2> ...`
- Compute migrateops platform:
  - `--vm_platforms vmware|ovirt|openshift|proxmox`

## Blacklisted Machines

Always exclude these machines from ATVM automation runs by adding them to `--exclude_partial_match`.

Permanently blacklisted because CMC cannot compile:

- `atvm6-centos6.0`
- `atvm41-redhat6.0`
- `atvm73-oracle6.0`

Temporarily blacklisted while support requests are waiting:

- `atvm113-debian9.0.0`
- `atvm115-debian9.1.0`
- `atvm116-debian9.2.0`
- `atvm156-debian9.3.0`

Temporarily blacklisted until re-created:

- `atvm157-debian13.0.0`

Preferred exclude list:

- `--exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0`

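Grouping the blacklist by reason makes the preferred exclude list reproducible instead of hand-typed. This sketch uses hypothetical array names and a hypothetical `exclude_arg` helper. Extra run-specific entries such as `suse15.0` go first, matching the randomized examples.

```bash
# Hypothetical sketch: the maintained blacklist as arrays grouped by reason.
CMC_INSTALL_BLACKLIST=(atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0)
SUPPORT_REQUEST_BLACKLIST=(atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0)
RECREATE_BLACKLIST=(atvm157-debian13.0.0)

# Print "--exclude_partial_match" followed by any extra entries given as
# arguments, then every maintained blacklist entry, space-separated.
exclude_arg() {
  local entries=("$@" "${CMC_INSTALL_BLACKLIST[@]}" "${SUPPORT_REQUEST_BLACKLIST[@]}" "${RECREATE_BLACKLIST[@]}")
  printf '%s' "--exclude_partial_match"
  printf ' %s' "${entries[@]}"
  printf '\n'
}
```

It can be spliced unquoted into a command, for example `python3 cmc-templates.py ... $(exclude_arg suse15.0)`. This relies on word splitting, which is safe because none of the names contain spaces.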
## Running-Automation Check (Mandatory)

Before any new automation request:

1. SSH to `root@192.168.3.190`.
2. Check for active automation processes (for example `run-sorry-cypress.py`, `cmc-templates.py`, and related Cypress runners).
3. Report:
   - `Running` with process details, or
   - `Not running`.
4. If `Running`, ask the operator whether to terminate.
5. If termination is approved, terminate the matching process(es), confirm termination, then proceed to planned-command approval.
6. If termination is not approved, do not start a new run.

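The Running / Not running decision can be made from a plain process listing collected over SSH. This sketch assumes a hypothetical `automation_status` helper that filters `ps -eo pid,args` output read on stdin.

```bash
# Hypothetical sketch: classify a process listing on stdin as Running (with
# the matching lines) or Not running. The pattern covers the two documented
# scripts plus any "cypress" process; adjust it as needed.
automation_status() {
  local matches
  matches="$(grep -E 'run-sorry-cypress\.py|cmc-templates\.py|cypress' || true)"
  if [[ -n "$matches" ]]; then
    echo "Running"
    printf '%s\n' "$matches"
  else
    echo "Not running"
  fi
}
```

For example: `ssh root@192.168.3.190 'ps -eo pid,args' | automation_status`. Filtering locally avoids matching the remote shell that would otherwise run the filter. Termination still follows the approval steps above.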
## Approval Workflow (Mandatory)

1. Build the exact command(s) for the request.
2. Present them verbatim as planned commands.
3. Wait for explicit approval.
4. Run only the approved command(s), with no extra options.
5. If monitoring was not requested, report immediate success/failure for each command.
6. If monitoring was requested, keep monitoring until completion and report the final outcome.

## Requested Test Style

When asked for one VM or a VM set:

- choose the requested template/options,
- choose the correct config file for the intended Gold Disk,
- use a descriptive `--build_name` without Gold Disk IDs.

## Update Rule

- After each run, update this guide only for workflow/rule/default changes.
- Update `atvm-automation-examples.md` for reusable command/option examples.
- Add run-specific learnings to `atvm-automation-runs.md` only when the run produced new information.

## Monitoring Policy

- Monitor only when the operator explicitly asks to monitor.
- If monitoring was not requested, run the commands and report execution success/failure and any errors.
- If monitoring was requested, do not terminate processes automatically; terminate only if the operator explicitly instructs termination.

## Status Reporting Format

When the operator asks for the status of an ATVM automation run, report in this order:

1. Heading/title using the run `build_name`.
2. Completed machines with the pass/fail state of each machine.
3. Skipped machines with the reason.
4. Remaining machines still to run.
5. Summary counts for finished, passed, failed, and skipped machines.
6. Timing details:
   - start time
   - end time, if complete
   - total run time if complete, or elapsed run time if still running
   - quickest completed test runtime
   - longest completed test runtime
   - average completed test runtime
7. Estimated completion time.

Status-report expectations:

- Use the live automation VM state when available.
- Derive the heading/title from the run `build_name` when available.
- Derive completed-machine status from completed spec results already written to the run log.
- Include the run start time in every status response when it can be derived from the run log.
- If the run is complete, include the end time and total run time.
- If the run is still active, include the elapsed run time so far.
- Include the quickest, longest, and average completed test runtimes under timing details when they can be derived from the run log.
- Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
- For skipped machines, include the reason category:
  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`
  - `BLACKLISTED: SUPPORT REQUEST - WAITING`
  - `BLACKLISTED: RE-CREATE NEEDED`
- If a machine is currently in progress, show it under remaining machines as `RUNNING`.
- If a machine has not started yet, show it under remaining machines as `NOT STARTED`.
- If no failures are present in completed spec results, report those completed machines as `PASS`.
- If a completed spec result shows a failure, report that machine as `FAIL` and include the failure reason from the run log.
- Base the completion estimate on the current remaining machine count and the recent per-machine runtime visible in the run log.

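The completion estimate described above is the remaining machine count multiplied by the recent per-machine runtime. This sketch uses a hypothetical `estimate_remaining_seconds` helper with integer seconds. It treats a `RUNNING` machine as a full machine still to go, so it errs on the long side.

```bash
# Hypothetical ETA helper. Arguments: remaining machine count, then recent
# per-machine runtimes in seconds. Prints remaining * mean(runtimes), using
# integer arithmetic; returns 1 if no runtimes are given.
estimate_remaining_seconds() {
  local remaining="$1"
  shift
  if [[ $# -eq 0 ]]; then
    echo "no completed runtimes to base an estimate on" >&2
    return 1
  fi
  local total=0 t
  for t in "$@"; do
    total=$(( total + t ))
  done
  echo $(( remaining * total / $# ))
}
```

On the controller (GNU `date` assumed), `date -d "+$(estimate_remaining_seconds 4 600 900) seconds"` turns the result into a wall-clock estimate.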
atvm/atvm-automation-runs.md

# Run ATVM Automation Runs

This file stores run-specific examples only when a run produced a new learning relevant to future automation tasks.

## Entry Rule

- Add an entry only when a run changed workflow behavior, exposed a failure mode, or confirmed a required new check.
- Do not add routine runs with no new learning.

## Current State

- No run-learning entries were carried over from the `atvm-automation-guide.md` source material; the entries below were recorded from individual runs.

## Run Learning: 2026-03-08 (E2E redhat9.7, pure/fc)

- Request:
  - template: `cmc-e2e`
  - filter: `--containsVm redhat9.7`
  - integration: `--integration_type pure`
  - plugin: `--use_specified_plugin fc`
- Observed result:
  - Cypress spec execution passed (`1` test, `1` passing, `0` failing).
  - Cloud run URL was produced and marked uploaded.
  - `run-sorry-cypress.py` remained running afterward with a defunct `npm exec cypress-cloud` child process and did not exit cleanly on its own.
- Action for future runs:
  - If pass/upload is confirmed but `run-sorry-cypress.py` does not exit, treat it as a runner hang condition.
  - Capture the run URL and pass/fail status first, then terminate the stuck runner process cleanly.

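The hang condition recorded above can be detected from a process listing: the runner is still alive while its `npm exec cypress-cloud` child is defunct. This sketch assumes a hypothetical `find_hung_runner` helper. It reads `ps -eo pid,ppid,stat,args` output on stdin and treats a STAT beginning with `Z` as defunct.

```bash
# Hypothetical hang check: print the PID of each run-sorry-cypress.py process
# that has at least one zombie (STAT starting with Z) child. Input is
# `ps -eo pid,ppid,stat,args` output on stdin. It only reports.
find_hung_runner() {
  awk '
    NR == 1 && $1 == "PID" { next }       # skip the ps header if present
    {
      pid[NR] = $1; ppid[NR] = $2; st[NR] = $3
      $1 = ""; $2 = ""; $3 = ""
      args[NR] = $0                       # remaining fields = command line
    }
    END {
      for (i in pid)
        if (args[i] ~ /run-sorry-cypress\.py/)
          runner[pid[i]] = 1
      for (i in pid)
        if (st[i] ~ /^Z/ && (ppid[i] in runner)) {
          print ppid[i]
          delete runner[ppid[i]]          # report each runner once
        }
    }'
}
```

For example: `ssh root@192.168.3.190 'ps -eo pid,ppid,stat,args' | find_hung_runner`. A printed PID is a candidate for termination only after the run URL and pass/fail status are captured and the operator approves.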
## Run Learning: 2026-03-09 (Blacklist handling and status format)

- Observed requirement:
  - Some ATVM machines must be skipped even when a broad selector such as `--containsVm` or `--randomize` would otherwise include them.
- Machines to blacklist via `--exclude_partial_match`:
  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`:
    - `atvm6-centos6.0`
    - `atvm41-redhat6.0`
    - `atvm73-oracle6.0`
  - `BLACKLISTED: SUPPORT REQUEST - WAITING`:
    - `atvm113-debian9.0.0`
    - `atvm115-debian9.1.0`
    - `atvm116-debian9.2.0`
    - `atvm156-debian9.3.0`
  - Needs re-creation:
    - `atvm157-debian13.0.0`
- Action for future runs:
  - Add these machine names to `--exclude_partial_match` when building broad-scope automation commands.
  - When reporting run status, list skipped blacklisted machines separately with their reason, in addition to completed and remaining machines.
  - Use the run `build_name` as the heading/title for status responses so the test type is obvious.
  - For failed machines in status responses, include the failure reason taken from the run log.
  - Include timing details in status responses: start time, end time when complete, and total or elapsed runtime.
  - Also include timing stats in status responses: quickest, longest, and average completed test runtime.

atvm/atvm-setup-script-guide.md

# ATVM Setup Script Guide

This file is guide-only documentation for running and maintaining the ATVM setup workflow.
Do not put dated run examples here.

## Scope

- Client setup script: `/home/aw/code/cds/atvm/atvm-setup-script.sh`
- Controller wrapper: `/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh`
- Run-learnings log: `/home/aw/code/cds/atvm/atvm-setup-script-runs.md`

## Purpose

The setup flow performs a controlled bootstrap across supported Linux distributions:

1. Validate target host identity using the expected IP and expected hostname before any configuration.
2. Fix repositories (especially CD/DVD media repo entries).
3. On Ubuntu, configure the root SSH password-login workflow (`root/cdsi2012`) for follow-on root operations.
4. On Oracle Linux, set the default boot kernel to non-UEK when available.
5. Disable unattended auto-upgrades on Ubuntu.
6. Remove specific storage-related packages and install base tooling.
7. Disable SELinux on Red Hat-family systems.
8. Configure static IP as the final step.
9. Print the final summary and write logs to `atvm_setup_script.log`.
10. On SELinux-capable distros, reboot and verify the runtime SELinux status after the reboot.
11. Keep the client powered on after successful setup so controller-side log collection and SHA256 verification can complete.
12. Power off from the controller only after successful verification and only if there were no setup errors.

## Execution Model

- Shell safety flags: `set -euo pipefail`
- Logging: colorized console + plain-text log file
- Entry point: `main "$@"`
- Default operator assumption for setup access: `root / cdsi2012` unless explicitly overridden.

## Mandatory Identity Gate

Setup must not start unless the operator explicitly provides both values:

- `--expected-ip <ip>`
- `--expected-hostname <hostname>`

Rules:

- Connect to the operator-provided target IP directly.
- Do not pre-scan alternate candidate IPs.
- Do not infer the hostname from the target.
- If the hostname is missing from the request, stop and ask for it.
- If the detected hostname does not exactly match the expected hostname, stop immediately.
- If the expected IP is not assigned on the target, stop immediately.

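The comparison part of the identity gate is an exact hostname match plus a membership test on the assigned addresses. This sketch uses a hypothetical `check_target_identity` helper. The script's real `validate_target_host_identity` may gather and compare these values differently.

```bash
# Hypothetical sketch of the identity comparison. Arguments: expected IP,
# expected hostname, detected hostname, then every IPv4 address assigned on
# the target (CIDR suffixes such as /22 are accepted and ignored).
check_target_identity() {
  local expected_ip="$1" expected_host="$2" detected_host="$3"
  shift 3
  if [[ "$detected_host" != "$expected_host" ]]; then
    echo "[ERROR] hostname mismatch: expected '$expected_host', found '$detected_host'" >&2
    return 1
  fi
  local addr
  for addr in "$@"; do
    if [[ "${addr%%/*}" == "$expected_ip" ]]; then
      return 0
    fi
  done
  echo "[ERROR] expected IP $expected_ip is not assigned on the target" >&2
  return 1
}
```

On the target, the inputs could come from `hostname` and `ip -o -4 addr show | awk '{print $4}'`. The match is exact, so a short name compared with a fully qualified name counts as a mismatch and stops the run.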
## Canonical Run Order
1. `parse_args`
2. `validate_target_host_identity`
3. `check_sudo`
4. `fix_repositories`
5. `configure_ubuntu_root_ssh_access` (Ubuntu only)
6. `install_sudo_if_needed`
7. `configure_oracle_non_uek_kernel` (Oracle Linux only)
8. `disable_ubuntu_auto_upgrades` (Ubuntu only)
9. `run_package_installation`
10. `disable_selinux` (RHEL-family only)
11. `configure_static_ip` (final configuration step)
12. `print_final_summary`
13. `reboot_and_verify_selinux_if_needed`
14. `poweroff_client_if_successful` (controller-driven after verification)

## Core Behavior By Step

### Repository Fix
- Debian/Ubuntu: comment out `cdrom` entries in apt source lists and run `apt-get update`.
- RHEL-family/Oracle: disable media/cdrom/dvd repo entries and run `yum clean all && yum makecache`.
- Fedora: same approach, using `dnf clean all && dnf makecache`.
- openSUSE/SLES: disable CD/DVD repos with `zypper mr -d` and refresh.

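For the Debian/Ubuntu path, commenting out `cdrom` entries could look like the sketch below. It assumes one-line `deb ...` entries (the format of `/etc/apt/sources.list` and `*.list` files) and GNU `sed -i`; deb822 `*.sources` files use a different layout and are not handled. The function name is hypothetical.

```bash
# Comment out active one-line apt entries that point at a cdrom: source.
# Lines that are already commented, or that do not reference cdrom:, are left unchanged.
comment_cdrom_apt_entries() {
  local list_file="$1"
  sed -i -E 's|^([[:space:]]*deb(-src)?[[:space:]].*cdrom:.*)$|# \1|' "$list_file"
}
```

Applied to `/etc/apt/sources.list` and each `/etc/apt/sources.list.d/*.list` file, followed by `apt-get update`, this covers the Debian/Ubuntu bullet. Running it twice is harmless, because already-commented lines no longer match.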
### Oracle Linux Kernel Handling
- Oracle Linux only.
- Select the first non-UEK kernel listed by `grubby --info=ALL` and set it as the GRUB default.
- Track whether the default changed and whether a reboot is required.

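Selecting the first non-UEK kernel from `grubby --info=ALL` could be sketched as a stdin filter. It assumes UEK kernels carry `uek` in their path (for example `...el9uek.x86_64`) and also skips rescue entries; the function name is hypothetical.

```bash
# Read `grubby --info=ALL` output on stdin and print the first kernel path
# that is neither a UEK kernel nor a rescue entry. Returns 1 if none is found.
first_non_uek_kernel() {
  local line path
  while IFS= read -r line; do
    [[ "$line" == kernel=* ]] || continue
    path="${line#kernel=}"
    path="${path//\"/}"   # BLS-era grubby prints the value in double quotes
    [[ "$path" == *uek* || "$path" == *rescue* ]] && continue
    printf '%s\n' "$path"
    return 0
  done
  return 1
}
```

A caller could then run `grubby --set-default "$kernel"` and compare `grubby --default-kernel` before and after to decide whether a reboot is required.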
### Ubuntu Root SSH Workflow
- Ubuntu only.
- Set the root password to `cdsi2012` and unlock the root account.
- Write `/etc/ssh/sshd_config.d/99-atvm-root-login.conf` enabling root login and password authentication.
- Validate the config and restart the SSH service.

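A sketch of the drop-in step, assuming the file holds only the two directives below (the actual script may write more). One caveat worth checking: `sshd` keeps the first value it reads for each keyword, and drop-ins are read in lexical order, so a lower-sorted file in `/etc/ssh/sshd_config.d/` that sets `PasswordAuthentication no` would win over `99-atvm-root-login.conf`. `sshd -T` prints the effective values. Function names are hypothetical.

```bash
# Write the root-login drop-in. The path argument exists only to make the
# function testable; the guide's target path is the default.
write_root_login_dropin() {
  local dropin="${1:-/etc/ssh/sshd_config.d/99-atvm-root-login.conf}"
  cat >"$dropin" <<'EOF'
PermitRootLogin yes
PasswordAuthentication yes
EOF
}

# Show the values sshd will actually use (run as root on the target).
show_effective_ssh_auth() {
  sshd -T | grep -Ei '^(permitrootlogin|passwordauthentication) '
}
```

On the target, the sequence would be roughly: `echo 'root:cdsi2012' | chpasswd`, `passwd -u root`, `write_root_login_dropin`, `sshd -t` to validate, `systemctl restart ssh`, then `show_effective_ssh_auth` to confirm both values are `yes`.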
### Ubuntu Auto-Upgrade Disable
- Ubuntu only.
- Update `/etc/apt/apt.conf.d/20auto-upgrades` to disable periodic package-list updates and unattended upgrades.

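Disabling the periodic APT actions amounts to setting both `APT::Periodic` keys to `"0"`. A minimal sketch that rewrites the file wholesale (an assumption; the script may edit existing keys in place instead), with a hypothetical function name:

```bash
# Overwrite 20auto-upgrades so the periodic apt jobs neither refresh
# package lists nor run unattended-upgrade.
disable_periodic_apt() {
  local conf="${1:-/etc/apt/apt.conf.d/20auto-upgrades}"
  cat >"$conf" <<'EOF'
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
EOF
}
```

`apt-config dump | grep -E 'APT::Periodic::(Update-Package-Lists|Unattended-Upgrade)'` shows the values APT ends up using after all `apt.conf.d` fragments are merged.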
### Package Installation
- Package manager detection order: `apt-get`, `dnf`, `yum`, `zypper`, `pacman`, `apk`.
- Pre-cleanup removes multipath/iSCSI packages where applicable.
- Installs kernel headers per distro.
- Base package set includes:
  `curl wget git vim perl gdb scsitools net-tools parted fio ca-certificates python3 elfutils-libelf-devel`

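The detection order above amounts to returning the first package manager found on `PATH`. A minimal sketch with a hypothetical function name:

```bash
# Print the first available package manager, in the guide's priority order.
detect_package_manager() {
  local pm
  for pm in apt-get dnf yum zypper pacman apk; do
    if command -v "$pm" >/dev/null 2>&1; then
      printf '%s\n' "$pm"
      return 0
    fi
  done
  return 1
}
```

Checking `dnf` before `yum` matters on Fedora and RHEL 8+, where `yum` is typically present only as a compatibility alias for `dnf`.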
### SELinux Disable
- RHEL-family only.
- If SELinux is enforcing or permissive, back up `/etc/selinux/config` and rewrite it to `disabled`.
- Flags in the final summary that a reboot is recommended or required.

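A sketch of the config rewrite, assuming the backup is a timestamped copy next to the original (the script's backup naming may differ) and using a hypothetical function name. It prints `changed` or `unchanged` so a caller can decide whether to schedule the reboot; the new mode only takes effect after that reboot, which is why runtime `getenforce` is checked afterwards.

```bash
# Switch SELINUX= to disabled, keeping a timestamped backup of the original file.
# Prints "changed" or "unchanged".
disable_selinux_config() {
  local cfg="${1:-/etc/selinux/config}"
  if [[ -f "$cfg" ]] && grep -Eq '^SELINUX=(enforcing|permissive)[[:space:]]*$' "$cfg"; then
    cp -p "$cfg" "${cfg}.bak.$(date +%Y%m%d_%H%M%S)"
    # Anchored on "SELINUX=" so SELINUXTYPE= is never touched.
    sed -i -E 's/^SELINUX=(enforcing|permissive)[[:space:]]*$/SELINUX=disabled/' "$cfg"
    echo changed
  else
    echo unchanged
  fi
}
```
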
### Static IP Configuration (Final Step)
Hardcoded target values:
- IP: `192.168.3.191`
- Prefix: `22`
- Gateway: `192.168.0.1`
- DNS: `8.8.8.8`, `8.8.4.4`

Interface detection priority:
1. default-route interface
2. first non-loopback interface with an IPv4 address
3. first non-loopback interface from the link list

Network-stack handling covers `netplan`, `NetworkManager`/`nmcli`, `wicked`, and legacy `ifcfg` fallback patterns.

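The interface detection priority can be sketched with three small parsers over iproute2 output, one per fallback level. The output formats shown in the comments are assumptions about typical `ip` output, and all function names are hypothetical.

```bash
# Each parser reads one kind of iproute2 output on stdin and prints an interface name.

# `ip route show default`, e.g. "default via 192.168.0.1 dev ens192 proto dhcp metric 100"
iface_from_default_route() {
  awk '{for (i = 1; i < NF; i++) if ($i == "dev") {print $(i + 1); exit}}'
}

# `ip -o -4 addr show`, e.g. "2: ens192    inet 192.168.0.89/22 brd ... scope global ens192"
iface_from_ipv4_addrs() {
  awk '$2 != "lo" {print $2; exit}'
}

# `ip -o link show`, e.g. "2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ..."
iface_from_links() {
  awk -F': ' '$2 != "lo" {sub(/@.*/, "", $2); print $2; exit}'
}

# Apply the documented priority: default route, then IPv4-bearing, then any link.
detect_primary_interface() {
  local iface=""
  iface="$(ip route show default 2>/dev/null | iface_from_default_route)" || true
  [[ -n "$iface" ]] || iface="$(ip -o -4 addr show 2>/dev/null | iface_from_ipv4_addrs)" || true
  [[ -n "$iface" ]] || iface="$(ip -o link show 2>/dev/null | iface_from_links)" || true
  [[ -n "$iface" ]] || return 1
  printf '%s\n' "$iface"
}
```

As a sanity check on the hardcoded values: `192.168.3.191/22` spans `192.168.0.0` to `192.168.3.255`, so the gateway `192.168.0.1` is on-link. On a NetworkManager host, applying them might look like `nmcli con mod "<connection>" ipv4.method manual ipv4.addresses 192.168.3.191/22 ipv4.gateway 192.168.0.1 ipv4.dns "8.8.8.8 8.8.4.4"` followed by `nmcli con up "<connection>"`, where `<connection>` is the profile bound to the detected interface.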
### SELinux Reboot Verification
- Applies to `rhel`, `centos`, `rocky`, `almalinux`, `fedora`, and `ol` when SELinux was changed.
- Creates a one-time systemd verifier service before rebooting.
- After the reboot, the service records the runtime `getenforce` result and removes itself.
- On success (no real error lines), keeps the client on so the controller can copy the log and verify its hash before powering the client off.
- On errors, leaves the client on for manual inspection.

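A one-time verifier could be a `Type=oneshot` systemd unit that appends the `getenforce` result to the setup log, then disables and deletes itself. The unit name and log-line format below are hypothetical. In the written unit, `$$` is systemd's escape for a literal `$`, so the shell (not systemd) evaluates `$(getenforce)`; the heredoc writes `\$\$` to produce that.

```bash
# Write a one-shot unit that logs the runtime SELinux mode on the next boot,
# then disables and removes itself. Unit name and log format are hypothetical.
write_selinux_verifier_unit() {
  local unit_dir="${1:-/etc/systemd/system}"
  local log_file="${2:-/root/atvm_setup_script.log}"
  local unit="atvm-selinux-verify.service"

  cat >"${unit_dir}/${unit}" <<EOF
[Unit]
Description=ATVM one-time post-reboot SELinux verification

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo "[INFO] Post-reboot getenforce: \$\$(getenforce)" >> ${log_file}'
ExecStartPost=/usr/bin/systemctl disable ${unit}
ExecStartPost=/bin/rm -f ${unit_dir}/${unit}

[Install]
WantedBy=multi-user.target
EOF
}
```

Before rebooting, the unit would be activated with `systemctl daemon-reload` and `systemctl enable atvm-selinux-verify.service`. The record uses an `[INFO]` prefix, so it never trips the `^\[ERROR\]` power-off gate.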
## Power-State Rules
- After a successful setup, keep the client powered on until controller log collection and SHA256 verification complete.
- If verification succeeds and no real error lines (`^\[ERROR\]`) exist, the controller powers off the client.
- If any real error lines exist, keep the client powered on.

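The power-off gate can be expressed as a line-anchored match plus a small decision helper. Both function names are hypothetical; the pattern is the documented `^\[ERROR\]`, so a line that merely mentions `[ERROR]` mid-sentence does not block power-off.

```bash
# Succeed only if the log holds a real error record: a line that starts with "[ERROR]".
log_has_real_errors() {
  grep -Eq '^\[ERROR\]' "$1"
}

# Print the controller's power decision for a collected log copy.
# The second argument is the SHA256 verification result: "true" or "false".
power_decision() {
  local log_copy="$1" hash_verified="$2"
  if [[ "$hash_verified" == true ]] && ! log_has_real_errors "$log_copy"; then
    echo poweroff
  else
    echo keep-on
  fi
}
```
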
## Logging and Verification
- Client log filename: `atvm_setup_script.log`
- Common client log path when run as root: `/root/atvm_setup_script.log`
- Controller-side collected log name: `atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log`

Required post-run validation:
1. Copy the client log to the controller's `atvm/log/` directory.
2. Compare the SHA256 of the client log with that of the copied controller log.
3. Require an exact match.

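The naming and hash rules can be sketched as two helpers with hypothetical names. The wrapper computes the client-side hash over SSH; this sketch compares two local files so the exact-match rule is easy to see.

```bash
# Build the controller-side file name for a collected log.
# The timestamp defaults to "now" in the documented yyyymmdd_hhmmss format.
collected_log_name() {
  local host="$1" ts="${2:-$(date +%Y%m%d_%H%M%S)}"
  printf 'atvm_configuration_%s_%s.log\n' "$host" "$ts"
}

# Succeed only when both files have identical SHA256 digests.
same_sha256() {
  local a b
  a="$(sha256sum "$1" | awk '{print $1}')"
  b="$(sha256sum "$2" | awk '{print $1}')"
  [[ "$a" == "$b" ]]
}
```
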
## Preferred Execution Commands
Direct client execution:
```bash
sudo bash /home/cirrususer/atvm-setup-script.sh \
  --expected-ip <current-client-ip> \
  --expected-hostname <exact-hostname>
```

Controller run + collect:
```bash
EXPECTED_IP_ARG=<current-client-ip> EXPECTED_HOSTNAME_ARG=<exact-hostname> \
  /home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh
```

Controller collect-only after client run:
```bash
/home/aw/code/cds/atvm/run-atvm-setup-and-collect-log.sh --collect-after-complete
```

## Troubleshooting
- If the locally collected log is missing, do not rerun the full setup just to recover it.
- Instead, use collect-only mode and verify SHA256 after the copy.
- If the wrapper appears stuck after the IP change or reboot, stop older wrapper sessions and run one fresh collect-only session.
- If `sshpass` is missing on the controller, the wrapper still runs but may prompt for the password repeatedly (it uses `sshpass` only when `ATVM_PASSWORD` is set).

## Operational Caveats
- Not fully idempotent on all paths: repeated runs may rewrite network configs and create multiple backups.
- Static IP values are hardcoded; adjust them before use in other environments.
- Run during maintenance windows, because network changes can interrupt active sessions.
- Preserve host identity gating; do not weaken the expected IP/hostname checks.

## Update Rule
- After each run, update this file only for guide/rule/checklist/default behavior changes.
- Put run-specific outcomes in `atvm-setup-script-runs.md` only when the run produced a new learning.
40
atvm/atvm-setup-script-runs.md
Normal file
@@ -0,0 +1,40 @@
# ATVM Setup Script Runs

This file stores run-specific examples only when a run produced a new learning relevant to future tasks.

## Entry Rule
- Add an entry only when the run changed workflow behavior, exposed a new failure mode, or confirmed a new required check.
- Do not add routine runs with no new learning.

## Run Learning: 2026-03-03 (Ubuntu 24.04)
- Environment:
  - Initial IP: `192.168.0.89`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm-1`
- Learning:
  - The root SSH password workflow (`root/cdsi2012`) and the log copy/hash verification path work end-to-end.
  - The wrapper must enforce identity arguments in run-and-collect mode.
- Action for future runs:
  - Require `EXPECTED_IP_ARG` and `EXPECTED_HOSTNAME_ARG` for wrapper run-and-collect.

## Run Learning: 2026-03-05 (RHEL 9)
- Environment:
  - Initial IP: `192.168.3.212`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm-2`
- Learning:
  - The SELinux disable path with reboot and post-reboot verifier worked.
  - Auto power-off can race controller-side log collection if done too early.
- Action for future runs:
  - Keep the client powered on until controller log copy and SHA256 verification complete.
  - Only then perform controller-side power-off, and only when no real error lines are present.

## Run Learning: 2026-03-06 (Oracle Linux 9)
- Environment:
  - Initial IP: `192.168.0.121`
  - Final static IP: `192.168.3.191`
  - Hostname: `atvm-codextest-vm`
- Learning:
  - Wrapper auto power-off was blocked by false-positive error detection on instructional text.
- Action for future runs:
  - Match only real error log lines, using `^\[ERROR\]`, for power-off gating.
1867
atvm/atvm-setup-script.sh
Normal file
File diff suppressed because it is too large
1319
atvm/cypress-automation-for-cmc.md
Normal file
File diff suppressed because it is too large
BIN
atvm/cypress-automation-for-cmc.md:Zone.Identifier
Normal file
Binary file not shown.
228
atvm/run-atvm-setup-and-collect-log.sh
Executable file
@@ -0,0 +1,228 @@
#!/usr/bin/env bash

set -euo pipefail

# Directory containing this wrapper; the setup script is committed alongside it.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

REMOTE_IP_PRIMARY="${REMOTE_IP_PRIMARY:-192.168.0.121}"
REMOTE_IP_SECONDARY="${REMOTE_IP_SECONDARY:-192.168.3.191}"
REMOTE_USER="${REMOTE_USER:-root}"
PROJECT_DIR="${PROJECT_DIR:-$SCRIPT_DIR}"
LOCAL_LOG_DIR="${LOCAL_LOG_DIR:-$PROJECT_DIR/log}"
LOCAL_SETUP_SCRIPT="${LOCAL_SETUP_SCRIPT:-$PROJECT_DIR/atvm-setup-script.sh}"
REMOTE_SETUP_SCRIPT="${REMOTE_SETUP_SCRIPT:-/root/atvm_setup_script.sh}"
REMOTE_LOG_FILE="${REMOTE_LOG_FILE:-/root/atvm_setup_script.log}"
WAIT_TIMEOUT_SECONDS="${WAIT_TIMEOUT_SECONDS:-600}"
MODE="${1:-run-and-collect}"
EXPECTED_IP_ARG="${EXPECTED_IP_ARG:-}"
EXPECTED_HOSTNAME_ARG="${EXPECTED_HOSTNAME_ARG:-}"

SSH_OPTS=(-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5)

if [[ ! -f "$LOCAL_SETUP_SCRIPT" ]]; then
  echo "ERROR: Local setup script not found: $LOCAL_SETUP_SCRIPT" >&2
  exit 1
fi

mkdir -p "$LOCAL_LOG_DIR"

if ! command -v ssh >/dev/null 2>&1 || ! command -v scp >/dev/null 2>&1; then
  echo "ERROR: ssh/scp is required." >&2
  exit 1
fi

SSH_CMD=(ssh "${SSH_OPTS[@]}")
SCP_CMD=(scp "${SSH_OPTS[@]}")

if [[ -n "${ATVM_PASSWORD:-}" ]]; then
  if command -v sshpass >/dev/null 2>&1; then
    SSH_CMD=(sshpass -p "$ATVM_PASSWORD" ssh "${SSH_OPTS[@]}")
    SCP_CMD=(sshpass -p "$ATVM_PASSWORD" scp "${SSH_OPTS[@]}")
  else
    echo "WARNING: ATVM_PASSWORD is set, but sshpass is not installed. Falling back to interactive password prompts."
  fi
fi

run_ssh() {
  local host="$1"
  shift
  "${SSH_CMD[@]}" "${REMOTE_USER}@${host}" "$@"
}

run_scp_to_remote() {
  local src="$1"
  local host="$2"
  local dst="$3"
  "${SCP_CMD[@]}" "$src" "${REMOTE_USER}@${host}:${dst}"
}

run_scp_from_remote() {
  local host="$1"
  local src="$2"
  local dst="$3"
  "${SCP_CMD[@]}" "${REMOTE_USER}@${host}:${src}" "$dst"
}

wait_for_reachable_host() {
  local host start_ts current_ts elapsed
  start_ts="$(date +%s)"

  while true; do
    for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
      if run_ssh "$host" "echo ready" >/dev/null 2>&1; then
        echo "$host"
        return 0
      fi
    done

    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}

# Reach only the operator-provided IP; never pre-scan alternate candidate IPs.
pick_initial_host() {
  if run_ssh "$EXPECTED_IP_ARG" "echo ready" >/dev/null 2>&1; then
    echo "$EXPECTED_IP_ARG"
    return 0
  fi
  return 1
}

wait_for_completed_task() {
  local host start_ts current_ts elapsed
  start_ts="$(date +%s)"

  while true; do
    for host in "$REMOTE_IP_PRIMARY" "$REMOTE_IP_SECONDARY"; do
      if run_ssh "$host" "test -f '$REMOTE_LOG_FILE' && grep -q 'SUCCESS: ATVM VM Setup Complete!' '$REMOTE_LOG_FILE'" >/dev/null 2>&1; then
        echo "$host"
        return 0
      fi
    done

    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}

wait_for_host_offline() {
  local host="$1"
  local start_ts current_ts elapsed
  start_ts="$(date +%s)"

  while true; do
    if ! run_ssh "$host" "echo still-up" >/dev/null 2>&1; then
      return 0
    fi

    current_ts="$(date +%s)"
    elapsed=$((current_ts - start_ts))
    if (( elapsed >= WAIT_TIMEOUT_SECONDS )); then
      return 1
    fi
    sleep 5
  done
}

if [[ "$MODE" != "run-and-collect" && "$MODE" != "--collect-after-complete" ]]; then
  echo "Usage:"
  echo "  EXPECTED_IP_ARG=<ip> EXPECTED_HOSTNAME_ARG=<hostname> $0   # run setup on client, then collect log"
  echo "  $0 --collect-after-complete   # wait for completed client task, then collect log only"
  exit 1
fi

if [[ "$MODE" == "run-and-collect" ]]; then
  if [[ -z "$EXPECTED_IP_ARG" || -z "$EXPECTED_HOSTNAME_ARG" ]]; then
    echo "ERROR: run-and-collect requires EXPECTED_IP_ARG and EXPECTED_HOSTNAME_ARG." >&2
    echo "Example:" >&2
    echo "  EXPECTED_IP_ARG=192.168.0.121 EXPECTED_HOSTNAME_ARG=atvm-codextest-vm $0" >&2
    exit 1
  fi

  INITIAL_HOST="$(pick_initial_host)" || {
    echo "ERROR: Could not reach expected IP ${EXPECTED_IP_ARG} for initial setup." >&2
    exit 1
  }
  # While waiting, poll the operator-provided IP and the post-setup static IP.
  REMOTE_IP_PRIMARY="$INITIAL_HOST"

  echo "Copying setup script to ${REMOTE_USER}@${INITIAL_HOST}:${REMOTE_SETUP_SCRIPT}"
  run_scp_to_remote "$LOCAL_SETUP_SCRIPT" "$INITIAL_HOST" "$REMOTE_SETUP_SCRIPT"

  echo "Running remote setup script on ${INITIAL_HOST} (disconnect is expected during IP/reboot steps)"
  set +e
  run_ssh "$INITIAL_HOST" "chmod +x '$REMOTE_SETUP_SCRIPT' && bash '$REMOTE_SETUP_SCRIPT' --expected-ip '$EXPECTED_IP_ARG' --expected-hostname '$EXPECTED_HOSTNAME_ARG'"
  run_status=$?
  set -e
  if (( run_status != 0 )); then
    echo "INFO: Remote run returned non-zero (${run_status}). Continuing because network reconfiguration/reboot can interrupt SSH."
  fi
fi

# Both modes wait for the completion marker before collecting the log.
echo "Waiting for completed client task marker in ${REMOTE_LOG_FILE} (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
ACTIVE_HOST="$(wait_for_completed_task)" || {
  echo "ERROR: Could not detect completed task marker in remote log within timeout." >&2
  exit 1
}

echo "Host reachable at: ${ACTIVE_HOST}"

REMOTE_HOSTNAME="$(run_ssh "$ACTIVE_HOST" "hostname" | tr -d '\r' | tail -n1)"
RUN_TS="$(date +%Y%m%d_%H%M%S)"
LOCAL_LOG_FILE="${LOCAL_LOG_DIR}/atvm_configuration_${REMOTE_HOSTNAME}_${RUN_TS}.log"

echo "Collecting remote log: ${REMOTE_LOG_FILE}"
run_scp_from_remote "$ACTIVE_HOST" "$REMOTE_LOG_FILE" "$LOCAL_LOG_FILE"

REMOTE_HASH="$(run_ssh "$ACTIVE_HOST" "sha256sum '$REMOTE_LOG_FILE' | awk '{print \$1}'" | tr -d '\r' | tail -n1)"
LOCAL_HASH="$(sha256sum "$LOCAL_LOG_FILE" | awk '{print $1}')"

if [[ "$REMOTE_HASH" != "$LOCAL_HASH" ]]; then
  echo "ERROR: Hash mismatch after log copy." >&2
  echo "Remote: $REMOTE_HASH" >&2
  echo "Local: $LOCAL_HASH" >&2
  exit 1
fi

HAS_ERRORS_IN_LOG=false
# Match only real error log records. Do not match instructional text that mentions "[ERROR]".
# The hashes match, so scan the verified local copy; a failed SSH call must never read as "no errors".
if grep -Eq '^\[ERROR\]' "$LOCAL_LOG_FILE"; then
  HAS_ERRORS_IN_LOG=true
fi

if [[ "$HAS_ERRORS_IN_LOG" == true ]]; then
  echo "WARNING: [ERROR] entries detected in remote log. VM will remain powered on for manual inspection."
else
  echo "Log indicates success with no [ERROR] entries. Powering off ${ACTIVE_HOST}."
  set +e
  run_ssh "$ACTIVE_HOST" "shutdown -h now"
  shutdown_status=$?
  set -e
  if (( shutdown_status != 0 )); then
    echo "INFO: Shutdown command returned non-zero (${shutdown_status}); this can occur if SSH disconnects during shutdown."
  fi

  echo "Waiting for ${ACTIVE_HOST} to go offline (timeout: ${WAIT_TIMEOUT_SECONDS}s)"
  if wait_for_host_offline "$ACTIVE_HOST"; then
    echo "Power-off confirmed: ${ACTIVE_HOST} is offline."
  else
    echo "WARNING: Could not confirm ${ACTIVE_HOST} offline within timeout."
  fi
fi

echo "Success"
echo "Active host: ${ACTIVE_HOST}"
echo "Local log: ${LOCAL_LOG_FILE}"
echo "SHA256: ${LOCAL_HASH}"
42
cdsmcp/AGENTS.md
Normal file
@@ -0,0 +1,42 @@
# AGENTS.md

This folder contains the VMware/vCenter + MigrateOps runbook for CDS MCP workflows.

## Files
- `esxvm.md`: index file only; points to the guide and run-learnings docs.
- `esxvm-guide.md`: authoritative workflow, rules, checklists, and default behavior.
- `esxvm-runs.md`: run-specific learnings, recorded only when a run adds new information.
- `vmw.yaml`: base template for `MIGRATEOPS_VMWARE_COMPUTE` operations.

## Source Of Truth
- Use `esxvm-guide.md` for how to execute runs.
- Use `vmw.yaml` as the starting operation template.
- Treat `esxvm-runs.md` as evidence/history, not baseline procedure.

## Required Run Pattern
1. Confirm the source VM in vCenter and its power state before any IP/SSH actions.
2. Prepare the source host (CDC cleanup + CMC reinstall/registration) and verify the source is connected in CDC.
3. Validate preflight requirements from `esxvm-guide.md` (integration, access node, destination name, datastore/host/network, source NIC).
4. Create the MigrateOps operation from `vmw.yaml` with request-specific replacements.
5. Monitor continuously and auto-approve cutover unless the user requests manual approval.
6. After a terminal state:
   - validate destination login (poll for the guest IP if needed),
   - archive the operation,
   - run the offline-host cleanup loop until source/helper cleanup conditions are satisfied,
   - provide a final read-only status listing for source/destination/access/helper across CDC and vCenter.
7. Ask the user explicitly before deleting the destination VM; never delete without same-run confirmation.

## VM Lookup Requirement
- Unless the user explicitly asks otherwise, scope VM lookup/list responses to cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests, always include the datastore name and VM notes/annotation in the response.

## Update Rules
- Update `esxvm-guide.md` only when workflow/rules/default behavior changes.
- Update `esxvm-runs.md` only when a run reveals a new learning/failure pattern/required check.
- Keep `esxvm.md` as a lightweight index.

## Environment Defaults
- vCenter: `192.168.0.201`
- Cluster scope: `QACL-ATVMCypressONLY` unless the user overrides it.
- Default CDC project: `Skidamarink`
- Default access node: `atvm-linux-h2h`
154
cdsmcp/esxvm-guide.md
Normal file
@@ -0,0 +1,154 @@
# ESX / vCenter Guide

This file is for workflow guidance only. Do not add specific run examples here.

## Update Rule
- After every run, update this file only when a workflow rule/checklist/default behavior changed.
- Add run-specific examples and evidence to `esxvm-runs.md` only when that run produced a new learning.

## vCenter Access
- Address: `192.168.0.201`
- Username: `administrator@qalab.cdsi.local`
- Password: `CDSi101!`
- Standard CLI path: `/home/aw/.local/bin/govc`
- Use only this standard vCenter login for vCenter actions unless explicitly instructed otherwise.
- Do not use `192.168.3.190` for vCenter actions; that machine is reserved for Cypress ATVM automation.

## IP And Power-State Policy (Mandatory)
- Before finding a guest IP or attempting SSH, confirm the VM's power state in vCenter and power it on if needed.
- Treat only these as stable references:
  - `192.168.0.201` for vCenter login only
  - `192.168.3.190` for ATVM Cypress automation only
  - `192.168.3.191` as the default ATVM target reference
- Obtain any other VM IP live from vCenter, for that run only.
- Do not carry forward ad-hoc VM IPs from previous runs into runbooks.

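The power-state-first rule could be scripted with `govc` roughly as follows. This is a sketch under stated assumptions: `govc` (the guide's `/home/aw/.local/bin/govc`) is on `PATH`; the connection comes from the standard `GOVC_URL`/`GOVC_USERNAME`/`GOVC_PASSWORD` environment variables, with the password supplied by the caller; `GOVC_INSECURE=1` is acceptable for the lab's certificate; and the state is parsed from the `Power state:` line of `govc vm.info` text output. Helper names are hypothetical.

```bash
# Connection settings read by govc. GOVC_PASSWORD must come from the caller's environment.
export GOVC_URL="${GOVC_URL:-192.168.0.201}"
export GOVC_USERNAME="${GOVC_USERNAME:-administrator@qalab.cdsi.local}"
export GOVC_INSECURE="${GOVC_INSECURE:-1}"  # assumption: lab vCenter certificate is not trusted

# Read `govc vm.info` text output on stdin and print the "Power state:" value.
power_state_from_vm_info() {
  awk -F': *' '/Power state:/ {print $2; exit}'
}

# Confirm power state first, power on only if needed, then wait for a guest IPv4 address.
ensure_on_and_get_ip() {
  local vm="$1" state
  state="$(govc vm.info "$vm" | power_state_from_vm_info)"
  if [[ "$state" != "poweredOn" ]]; then
    govc vm.power -on "$vm" >/dev/null
  fi
  # vm.ip blocks until VMware Tools reports an address or the wait expires.
  govc vm.ip -v4 -wait 5m "$vm"
}
```

The IP printed here is valid for the current run only, in line with the policy above.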
## Cluster Scope Rule
- Only work under cluster `QACL-ATVMCypressONLY` unless explicitly told otherwise.

## Ignore VMs
- `vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44`
- `vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14`

## IP Lookup Rule
- If asked about an IP address, only check powered-on VMs.

## VM Lookup Response Rule
- Unless the user explicitly asks otherwise, return VM lookup/list results only from cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests (for example, name/contains filters), always report:
  - VM name
  - datastore name
  - VM notes/annotation
  - power state and IP, when available

## Common VM Credentials
- Username: `root`
- Password: `cdsi2012`

## CMC Install/Uninstall Commands

### Default Project Rule
- Default project: `Skidamarink`
- Default registration code: `BZHKABCODZLIOK6RTAJ4`
- Default endpoint: `portal.gcstage.cloud.nonprod.cirrusdata.com:443`
- Use a different project code only when the user explicitly requests it in that run.

### Skidamarink Install (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE
```

### Skidamarink Install (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE"
```

### Uninstall (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -uninstall
```

### Uninstall (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -uninstall"
```

### CMC Reinstall Fallback (RHEL 10)
- If the installer-based reinstall fails due to repo metadata/download errors, use the cached local `mtdi-daemon` and `galaxy-migrate` RPMs, start the services, enforce `galaxy_complete_endpoint`, then register manually.
- Do not proceed with MigrateOps create until the source host is visible as connected in CDC.

## Status Output Format (Power-Off/Revert/Power-On)
- `VM [vm name] was poweredOn, so I powered it off` (or `already poweredOff`)
- `Snapshot rollback completed`
- `VM [vm name] powered back on successfully`
- `Current IP: <ip>`

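The power-off/revert/power-on sequence and its status lines could be scripted with `govc` as sketched below. Assumptions: `govc` is configured through its environment variables, the power state is read from the `Power state:` line of `govc vm.info` text output, `govc snapshot.revert` without a snapshot name reverts to the current snapshot, and `vm_power_state`/`revert_and_report` are hypothetical names. The state is re-read after the revert because reverting to a snapshot that includes memory can leave the VM running.

```bash
# Print the VM's power state from `govc vm.info` text output.
vm_power_state() {
  govc vm.info "$1" | awk -F': *' '/Power state:/ {print $2; exit}'
}

# Power off (if on), revert, power on (if needed), and print the status lines above.
revert_and_report() {
  local vm="$1" snapshot="${2:-}" state
  state="$(vm_power_state "$vm")"

  if [[ "$state" == "poweredOn" ]]; then
    govc vm.power -off "$vm" >/dev/null
    echo "VM ${vm} was poweredOn, so I powered it off"
  else
    echo "VM ${vm} was already ${state}"
  fi

  # With no snapshot name, revert to the VM's current snapshot.
  govc snapshot.revert -vm "$vm" ${snapshot:+"$snapshot"} >/dev/null
  echo "Snapshot rollback completed"

  if [[ "$(vm_power_state "$vm")" != "poweredOn" ]]; then
    govc vm.power -on "$vm" >/dev/null
  fi
  echo "VM ${vm} powered back on successfully"

  echo "Current IP: $(govc vm.ip -v4 -wait 5m "$vm")"
}
```
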
## VMware Compute MigrateOps Defaults
- Use `/home/aw/code/cds/cdsmcp/vmw.yaml` as the starting template.
- Default sequence for the requested source machine:
  - clean CDC state for that machine
  - reinstall CMC Linux on that machine
  - perform migration preflight and operation create
- If the user provides a client name, replace it consistently in:
  - `config.system_name`
  - `migrateops_vmware_compute.compute.vm_name`
  - operation `name`
- Validate that `integration_name` is active in the target project before create.
- Default access node: `atvm-linux-h2h` (must be powered on in vCenter and connected in CDC).
- Always discover `source_nic` from live source host networking.

## Approval and Monitoring Defaults
- Auto-approve cutover by default.
- Start monitoring immediately after operation create.
- Approve as soon as `final-synchronization` requests input.
- Skip auto-approval only if the user explicitly asks for manual approval.
- Patience rule:
  - if heartbeat/progress is advancing, keep waiting
  - allow longer waits for helper deployment/registration steps
  - intervene only for terminal failure, a confirmed blocker, or prolonged lack of progress

## Preflight Checklist
- Source host connected in CDC.
- Integration exists and is active in the same project.
- `atvm-linux-h2h` powered on in vCenter.
- `atvm-linux-h2h` connected in the same CDC project.
- Destination VM name does not already exist in vCenter.
- Destination datastore/host/network resolve in vCenter.
- `source_nic` discovered via SSH from the source host.

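Two of the preflight items lend themselves to small checks. This sketch assumes `govc` is already configured, that `govc find -name` matches the destination name exactly when it contains no wildcard characters, and that the source NIC is the interface carrying the source host's default route; the function names are hypothetical.

```bash
# Succeed if no VM with this name exists anywhere in the vCenter inventory.
destination_name_is_free() {
  local name="$1"
  [[ -z "$(govc find / -type m -name "$name")" ]]
}

# Print the interface that carries the source host's default route.
discover_source_nic() {
  local host="$1"
  ssh "root@${host}" ip route show default |
    awk '{for (i = 1; i < NF; i++) if ($i == "dev") {print $(i + 1); exit}}'
}
```

Discovering the NIC live, rather than assuming a name such as `ens192`, is what the `source_nic` rule asks for.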
## Post-Migration Validation and Cleanup Pattern
- Validate destination login before cleanup:
  - get the destination guest IP from vCenter
  - verify SSH/login works
  - if the guest IP is empty, keep polling; do not skip validation
  - do not mark the run complete before the validation result is recorded
- Before deleting the destination VM:
  - always prompt the user for explicit confirmation
  - never delete the destination VM without that confirmation in the same run
- For the delete path:
  - resolve the source VM ID and destination VM ID separately
  - abort if the IDs match
  - power off the destination if needed
  - delete the destination by explicit VM ID
  - verify the destination is removed and the source still exists
- Always run project cleanup after a terminal migration state:
  - archive the completed operation
  - run global offline-host cleanup
  - cleanup must target only the source VM named in the current request
  - if source/helper entries are still connected, force disconnect conditions and rerun cleanup
  - if a stale connected state persists after VM removal/power-off, wait for the heartbeat timeout and rerun cleanup until the entries are removed
  - verify the helper entry from this run (`migrateops-<opid>-<source-system-name>`) is removed
- Completion gate:
  - do not report the run complete until archive + cleanup verification are done
  - always provide a read-only final listing for source, destination, access node, and helper:
    - CDC status (`present` or `cleaned up`)
    - vCenter status (`present` or `cleaned up`; if present, include power state + IP)

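The ID-resolution and abort-on-match steps of the delete path can be sketched as two guards. Assumptions: `govc find` returns one inventory path per matching VM, and `govc ls -i` prints that path's managed object reference (for example `VirtualMachine:vm-1234`); the function names are hypothetical. The delete itself is deliberately left out: run it only after the guard passes and the user has confirmed in the same run.

```bash
# Resolve exactly one VM by name to its managed object reference.
# Fails on zero or multiple matches, so an ambiguous name can never be deleted.
resolve_vm_id() {
  local name="$1" paths count
  paths="$(govc find / -type m -name "$name")"
  count="$(grep -c . <<<"$paths" || true)"
  if [[ "$count" -ne 1 ]]; then
    echo "ERROR: expected exactly one VM named '$name', found $count" >&2
    return 1
  fi
  govc ls -i "$paths"
}

# Refuse to continue unless both IDs are known and distinct.
guard_destination_delete() {
  local src_id="$1" dst_id="$2"
  if [[ -z "$src_id" || -z "$dst_id" ]]; then
    echo "ERROR: source or destination VM ID is empty; aborting delete" >&2
    return 1
  fi
  if [[ "$src_id" == "$dst_id" ]]; then
    echo "ERROR: source and destination resolve to the same VM ($src_id); aborting delete" >&2
    return 1
  fi
}
```
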
## Default Behavior Contract
- Perform automatically on every VMware compute run:
  - destination login validation
  - operation archive
  - offline-host cleanup and source/helper stale-entry verification
- Still require explicit user confirmation before deleting the destination:
  - always prompt
  - if there is no confirmation, keep the destination and record `deletion skipped by user`
47
cdsmcp/esxvm-runs.md
Normal file
@@ -0,0 +1,47 @@
|
|||||||
|
# ESX / vCenter Run Learnings
|
||||||
|
|
||||||
|
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
|
||||||
|
|
||||||
|
## Entry Rule
|
||||||
|
- Add an entry only when the run changed workflow behavior, uncovered a new failure pattern, or confirmed a new required check.
|
||||||
|
- Do not add routine successful runs with no new learning.
|
||||||
|
|
||||||
|
## Run Learning: Operation 14208
|
||||||
|
- Learning: `wait-for-vm-registration` helper registration can be the longest early-stage step.
|
||||||
|
- Action for future runs: if step 6/7 is slow, verify helper VM existence in vCenter before remediation.
|
||||||
|
|
||||||
|
## Run Learning: Operation 14213
|
||||||
|
- Learning: completion response was sent before destination delete prompt, operation archive, and offline-host cleanup.
|
||||||
|
- Action for future runs: completion must be gated on delete prompt handling, archive, and cleanup verification.
|
||||||
|
|
||||||
|
## Run Learning: Operation 14214
|
||||||
|
- Learning: stale helper/source entries can remain and require explicit offline-host cleanup reruns.
|
||||||
|
- Action for future runs: rerun cleanup until stale entries are actually removed.
|
||||||
|
|
||||||
|
## Run Learning: Operation 14215
|
||||||
|
- Learning: helper creation can fail with vSphere `ReconfigVM` errors and recover via controlled retries.
|
||||||
|
- Action for future runs:
|
||||||
|
- remove leftover helper artifacts before retry
|
||||||
|
- avoid manual helper power actions during active task execution
|
||||||
|
- keep waiting while heartbeats/progress still advance
|
||||||
|
|
||||||
|
## Run Learning: Operation 14216
|
||||||
|
- Learning: destination login validation and post-run cleanup were missed before completion reporting.
|
||||||
|
- Action for future runs: always perform destination login validation + archive + cleanup automatically before declaring completion.
|
||||||
|
|
||||||
|
## Run Learning: Operation 14218
|
||||||
|
- Learning: source/helper entries can remain `connected` with stale `last_checkin` after migration.
|
||||||
|
- Action for future runs: enforce heartbeat-timeout waits and rerun cleanup until source/helper entries are removed.
|
||||||
|
|
||||||
|
## Run Learning: Operation 14221
|
||||||
|
- Learning: source/helper CDC entries for the current request can be removed cleanly by timeout-based cleanup loop after archive, and final 4-entity status listing is effective for closure.
|
||||||
|
- Action for future runs:
|
||||||
|
- always provide final source/destination/access/helper listing across CDC and vCenter
|
||||||
|
- keep destination delete as explicit user-confirmed step only
|
||||||

## Run Learning: Operation 14223
- Learning: on RHEL 10, CMC reinstall via the installer script can fail when repo metadata is unavailable; local RPM install + explicit CDC endpoint config + manual register can recover the source in place.
- Action for future runs:
  - if the Linux installer fails on repo metadata, check for cached `mtdi-daemon` and `galaxy-migrate` RPMs and install them directly
  - ensure `galaxy_complete_endpoint` is set before manual register
  - proceed with migrateops only after the source host is confirmed connected in CDC
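The recovery order above can be checked against what was actually run. A minimal sketch: `REQUIRED_ORDER` encodes this learning's sequence using hypothetical step labels, and `find_cached_rpms` only locates candidate files by package-name prefix; any cache paths are hypothetical, and installing the RPMs is left to the normal package tooling.

```python
import os
from typing import Dict, Iterable, List

REQUIRED_RPMS = ("mtdi-daemon", "galaxy-migrate")

# Recovery order from this learning; the step labels are hypothetical.
REQUIRED_ORDER = (
    "install_cached_rpms",
    "set_galaxy_complete_endpoint",
    "manual_register",
    "confirm_source_connected",
    "create_migrateops",
)


def find_cached_rpms(paths: Iterable[str]) -> Dict[str, str]:
    """Map each required package name to a cached .rpm file, if one is present."""
    found: Dict[str, str] = {}
    for path in paths:
        base = os.path.basename(path)
        for pkg in REQUIRED_RPMS:
            if base.startswith(pkg + "-") and base.endswith(".rpm"):
                found.setdefault(pkg, path)
    return found


def ordering_violations(performed: List[str]) -> List[str]:
    """Report steps that ran out of order or without their prerequisite."""
    position = {step: i for i, step in enumerate(performed)}
    problems = []
    for earlier, later in zip(REQUIRED_ORDER, REQUIRED_ORDER[1:]):
        if later not in position:
            continue
        if earlier not in position:
            problems.append(f"{later} ran without {earlier}")
        elif position[earlier] > position[later]:
            problems.append(f"{later} ran before {earlier}")
    return problems
```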
10
cdsmcp/esxvm.md
Normal file
@@ -0,0 +1,10 @@
# ESX / vCenter Notes Index

This file is now an index only.

- Guide-only workflow and rules: `/home/aw/code/cds/cdsmcp/esxvm-guide.md`
- Run-specific learnings log: `/home/aw/code/cds/cdsmcp/esxvm-runs.md`

Update policy:
- After each run, update `esxvm-guide.md` only for guide/rule changes.
- After each run, update `esxvm-runs.md` only if the run produced a new learning.
109
cdsmcp/vmw.yaml
Normal file
@@ -0,0 +1,109 @@
#
# VMware Compute MigrateOps template
# Rules:
# 1) Replace all client references consistently:
#    - config.system_name
#    - migrateops_vmware_compute.compute.vm_name
#    - operations[].name
#    - cleanup targeting must use the source VM from the current user request only
# 1a) Default CMC migration sequence for any specified machine:
#    - clean up CDC project state for that machine (remove stale/offline registration context)
#    - reinstall CMC Linux on that machine
#    - then perform migration setup/create
# 2) Verify integration_name is valid in the target CDC project before creating the operation.
# 3) Default access node is "atvm-linux-h2h":
#    - VM must be powered on in vCenter
#    - CMC must be installed/connected in the same CDC project
# 4) Source NIC must be discovered from the source client (do not assume ens192).
# 5) Preflight checks before create:
#    - confirm source VM power state in vCenter first; power on before IP discovery/SSH steps
#    - destination vm_name must not already exist
#    - datastore/host/network names must resolve in vCenter
#    - source client + access node must both be connected in the same CDC project
#    - use only standard vCenter credentials/session for vCenter actions
#      (do not use 192.168.3.190 for vCenter actions; it is reserved for Cypress ATVM automation)
#    - IP handling policy:
#      * 192.168.0.201 is vCenter only
#      * 192.168.3.190 is ATVM automation only
#      * 192.168.3.191 is the default ATVM target reference
#      * any other VM IP must be read live from vCenter for the current run only
#        and must not be retained/reused as a future default
# 6) Post-submit approval behavior (default):
#    - start monitoring as soon as operation create succeeds
#    - auto-approve cutover immediately when final-synchronization requests approval
#    - only use manual approval if explicitly requested by the user
#    - patience rule while monitoring:
#      * if heartbeat/progress is advancing, keep waiting and do not intervene
#      * allow longer wait windows for helper VM deploy/registration-related steps
#      * intervene only on terminal failure, confirmed blocker, or prolonged no-progress
# 7) Post-migration validation and cleanup behavior (default):
#    - verify SSH login to the newly migrated VM first (using the vCenter guest IP)
#    - if the vCenter guest IP is initially empty, keep polling until available; do not skip login validation
#    - never report run completion before destination login validation is recorded
#    - only target the newly migrated VM for cleanup, never the source VM
#    - resolve and compare source/destination VM IDs; abort cleanup if IDs match
#    - prompt the user for confirmation before power-off + delete of the migrated VM
#    - prompt the user even if they did not explicitly ask for deletion in the same request
#    - never delete the destination VM without explicit user confirmation in that run
#    - archive the completed MigrateOps operation after migration reaches a terminal state
#    - mandatory: run global offline-host cleanup at the end of a successful runbook,
#      even if the source host is offline (remove all offline CMC host records)
#    - if source/helper entries are still connected in CDC, disconnect them first
#      (for example uninstall CMC on source/helper or power off/delete the helper VM),
#      then rerun offline-host cleanup until source/helper entries are removed
#    - if CDC still shows source/helper as connected but last_checkin is stale after
#      source/helper are already powered off/deleted, wait for the heartbeat timeout and
#      rerun offline-host cleanup in a loop until those entries are removed
#    - verify source host + helper host stale/offline duplicates from this run are removed
#    - verify helper CMC host entries from the run are removed
#      (e.g. migrateops-<operation-id>-<source-system-name>)
#    - if a helper entry remains, ensure the helper VM is absent/powered off and rerun offline cleanup
#    - mandatory: remove the source VM from the current request from the CDC host list during cleanup
#      (do not reuse source VM names from prior runs)
#    - mandatory post-run reporting: always include a read-only status listing for
#      source VM, destination VM, access node, and helper VM across both CDC and vCenter
#      with explicit present/cleaned-up state
#    - do not report run completion until cleanup verification is done and destination VM
#      deletion is either completed or explicitly skipped by user decision
#    - default autonomous behavior for every run:
#      * always perform login validation + archive + offline-host cleanup automatically
#      * always prompt the user before deleting the destination VM and record an explicit keep/delete decision
#
operations:
  - recipe: "MIGRATEOPS_VMWARE_COMPUTE"
    config:
      migrateops_vmware_compute:
        access_node:
          system_name: "atvm-linux-h2h"
        compute:
          datastore: "AutomatedTest-VMBootImgComputeMigration-Gold"
          host: "192.168.1.165"
          datacenter: "CDSHQ-Eng"
          vm_name: "atvm-codextest-vm-migrated"
        migration:
          qos_level: "RELENTLESS"
          auto_resync_interval: "600s"
        cmchelper:
          network: "VM Network"
          ip_config:
            use_static_ip: true
            address: "192.168.3.195/22"
            dns_servers:
              - "8.8.8.8"
            gateway: "192.168.0.1"
          content_library: "vc-cmchelper"
          template_name: "vc-cmchelper-vm"
          install_via_access_node: true
        network:
          adapters:
            - network: "VM Network"
              # Must be discovered from source host via SSH before create.
              source_nic: "REPLACE_WITH_SOURCE_NIC"
              transfer_ip: true
              transfer_mac: false
              adapter_type: "VMXNET3"
        keep_source_powered_on: false
      system_name: "atvm-codextest-vm"
      integration_name: "vCenter201"
    name: "atvm-codextest-vm"
    notes: ""
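The IP handling policy in the `vmw.yaml` template comments can be made mechanical. A sketch in Python, assuming only the three fixed addresses listed there; `ip_role`, `require_vcenter_endpoint`, and `defaults_to_persist` are hypothetical helpers, and the `web01` entry in any example is illustrative. Any address outside the fixed table is treated as read-live-for-this-run-only and is filtered out before defaults are saved.

```python
import ipaddress
from typing import Dict

# Fixed addresses and roles from the template's IP handling policy.
FIXED_IP_ROLES = {
    "192.168.0.201": "vcenter",
    "192.168.3.190": "atvm-automation",
    "192.168.3.191": "atvm-default-target",
}


def ip_role(ip: str) -> str:
    """Return the policy role for an address; anything else is live-lookup only."""
    normalized = str(ipaddress.ip_address(ip))  # raises ValueError on malformed input
    return FIXED_IP_ROLES.get(normalized, "live-lookup-only")


def require_vcenter_endpoint(ip: str) -> None:
    """Refuse vCenter actions against any address other than the vCenter one."""
    role = ip_role(ip)
    if role != "vcenter":
        raise ValueError(f"{ip} is not the vCenter address (policy role: {role})")


def defaults_to_persist(observed: Dict[str, str]) -> Dict[str, str]:
    """Drop live-read VM IPs so they are never saved as future defaults."""
    return {name: ip for name, ip in observed.items() if ip_role(ip) != "live-lookup-only"}
```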
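Rule 1, rule 4, and the `vm_name` preflight check from the `vmw.yaml` template can be applied mechanically once the template is parsed (for example with a YAML library, not shown). This sketch assumes the parsed document nests as `operations[].config.migrateops_vmware_compute.{compute,network}` with adapters under `network.adapters[]`, and that the destination name keeps the sample's `-migrated` suffix; both are assumptions. `retarget`, `unresolved_placeholders`, and the `web01`/`eth0` values are hypothetical.

```python
import copy
from typing import Any, List, Set

PLACEHOLDER_PREFIX = "REPLACE_WITH_"


def unresolved_placeholders(node: Any) -> List[str]:
    """Collect any REPLACE_WITH_* values left anywhere in the parsed template."""
    if isinstance(node, dict):
        return [p for value in node.values() for p in unresolved_placeholders(value)]
    if isinstance(node, list):
        return [p for item in node for p in unresolved_placeholders(item)]
    if isinstance(node, str) and node.startswith(PLACEHOLDER_PREFIX):
        return [node]
    return []


def retarget(template: dict, source_name: str, source_nic: str, existing_vms: Set[str]) -> dict:
    """Point every client reference at one source VM and fill in the discovered NIC."""
    if not source_nic or source_nic.startswith(PLACEHOLDER_PREFIX):
        raise ValueError("source NIC must be discovered from the source host first")
    doc = copy.deepcopy(template)  # leave the template itself untouched
    for op in doc["operations"]:
        cfg = op["config"]
        vmc = cfg["migrateops_vmware_compute"]
        dest_name = f"{source_name}-migrated"  # naming convention assumed from sample values
        if dest_name in existing_vms:
            raise ValueError(f"destination vm_name {dest_name!r} already exists in vCenter")
        op["name"] = source_name
        cfg["system_name"] = source_name
        vmc["compute"]["vm_name"] = dest_name
        for adapter in vmc["network"]["adapters"]:
            adapter["source_nic"] = source_nic
    return doc
```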
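Rule 7's targeting constraints in the `vmw.yaml` template can be sketched as two small guards: one that selects only this run's source and helper CDC entries (using the `migrateops-<operation-id>-<source-system-name>` pattern from the comments) and one that refuses destination deletion when the VM IDs match or the user has not confirmed. The function names, the `1001` operation ID, and the `vm-101`-style VM IDs in any example are hypothetical.

```python
from typing import Iterable, List


def helper_host_name(operation_id: str, source_system_name: str) -> str:
    """Helper CMC host name pattern taken from the template comments."""
    return f"migrateops-{operation_id}-{source_system_name}"


def run_entries_to_verify(operation_id: str, source_system_name: str, cdc_hosts: Iterable[str]) -> List[str]:
    """CDC entries from this run that must be gone before cleanup counts as verified.

    Only the current request's source VM and its helper are selected; names
    from prior runs and the access node are never picked up.
    """
    wanted = {source_system_name, helper_host_name(operation_id, source_system_name)}
    return sorted(host for host in cdc_hosts if host in wanted)


def guard_destination_delete(source_vm_id: str, destination_vm_id: str, user_confirmed: bool) -> None:
    """Raise unless deleting the destination VM is safe and explicitly confirmed."""
    if source_vm_id == destination_vm_id:
        raise RuntimeError("source and destination VM IDs match; aborting cleanup")
    if not user_confirmed:
        raise PermissionError("destination VM delete needs explicit user confirmation in this run")
```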