cds-ai/cdsmcp/esxvm-guide.md
anthony.wen d989c8a071 Document default FC and disk assignment workflow for ATVM VMs
Update the VMware runbook to describe the default VM attachment workflow for assigning FC passthrough adapters vmhba7 and vmhba8 plus datastore disks atvm-DISK_1.vmdk and atvm-DISK_2.vmdk, with mandatory live verification, pre-action summary, explicit approval, and no-substitution behavior unless the operator specifies alternatives.
2026-03-18 19:40:32 -04:00


# ESX / vCenter Guide
This file is for workflow guidance only. Do not add specific run examples here.
## Update Rule
- After every run, update this file only when a workflow rule/checklist/default behavior changed.
- Add run-specific examples and evidence to `esxvm-runs.md` only when that run produced a new learning.
## vCenter Access
- Address: `192.168.0.201`
- Username: `administrator@qalab.cdsi.local`
- Password: `CDSi101!`
- Standard CLI path: `/home/aw/.local/bin/govc`
- Use only this standard vCenter login for vCenter actions unless explicitly instructed otherwise.
- Do not use `192.168.3.190` for vCenter actions; that machine is reserved for Cypress ATVM automation.
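
A minimal sketch of pointing `govc` at the standard vCenter login, assuming the password is supplied through the environment rather than written into a script. `GOVC_URL`, `GOVC_USERNAME`, `GOVC_PASSWORD`, and `GOVC_INSECURE` are govc's own environment variables. The function name `use_standard_vcenter` is hypothetical, and `GOVC_INSECURE=1` assumes the lab vCenter presents a self-signed certificate.

```bash
# Hypothetical helper: export the govc settings for the standard vCenter login.
# Assumes GOVC_PASSWORD is already set in the caller's environment.
use_standard_vcenter() {
  if [ -z "${GOVC_PASSWORD:-}" ]; then
    echo "GOVC_PASSWORD is not set" >&2
    return 1
  fi
  export GOVC_PASSWORD
  export GOVC_URL="192.168.0.201"
  export GOVC_USERNAME="administrator@qalab.cdsi.local"
  export GOVC_INSECURE=1                    # assumption: self-signed lab certificate
  export PATH="/home/aw/.local/bin:$PATH"   # standard CLI path for govc
}
```

Calling `use_standard_vcenter` once at the start of a session keeps every later `govc` call on the same login.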
## IP And Power-State Policy (Mandatory)
- Before finding guest IP or attempting SSH, confirm VM power state in vCenter and power on if needed.
- Treat only these as stable references:
  - `192.168.0.201` for vCenter login only
  - `192.168.3.190` for ATVM Cypress automation only
  - `192.168.3.191` as default ATVM target reference
- Any other VM IP must be obtained live from vCenter for that run only.
- Do not carry forward ad-hoc VM IPs from previous runs in runbooks.
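
The power-state-first rule above can be sketched as a small wrapper around `govc vm.info`, `govc vm.power -on`, and `govc vm.ip`. The function name `live_ip` is hypothetical, and the parsing assumes the `Power state:` line of `govc vm.info` text output; treat it as an illustration rather than the required implementation.

```bash
# Hypothetical helper: make sure a VM is powered on, then print its live guest IP.
# Relies on `govc vm.info` printing a "Power state:" line in its text output.
live_ip() {
  vm="$1"
  state=$(govc vm.info "$vm" | awk '/Power state:/ {print $3}')
  if [ -z "$state" ]; then
    echo "VM not found: $vm" >&2
    return 1
  fi
  if [ "$state" != "poweredOn" ]; then
    # Power on before any IP lookup or SSH attempt.
    govc vm.power -on "$vm" >&2 || return 1
  fi
  # vm.ip waits until the guest reports an address.
  govc vm.ip "$vm"
}
```

The printed address is valid for the current run only and should not be copied into a runbook.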
## Cluster Scope Rule
- Only work under cluster `QACL-ATVMCypressONLY` unless explicitly told otherwise.
## Ignore VMs
- `vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44`
- `vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14`
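
One way to apply the ignore list is a plain text filter over VM names, sketched below. The function name `drop_ignored_vms` is hypothetical; it expects one VM name per line on stdin.

```bash
# Hypothetical filter: read VM names on stdin and drop the ignored vCLS VMs.
# grep exits 1 when nothing is left to print; that case is treated as success.
drop_ignored_vms() {
  grep -vxF \
    -e 'vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44' \
    -e 'vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14' || test $? -eq 1
}
```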
## IP Lookup Rule
- If asked about an IP address, only check powered-on VMs.
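
A sketch of an IP lookup limited to powered-on VMs, using `govc find` with its `-runtime.powerState` property filter and the `IP address:` line of `govc vm.info`. The function name `powered_on_ips` and the root-path argument are hypothetical, and the sketch does not by itself restrict results to a cluster or apply the ignore list.

```bash
# Hypothetical helper: print "<inventory path><TAB><ip>" for powered-on VMs only.
# $1 is the inventory root to search from (defaults to "/").
powered_on_ips() {
  govc find "${1:-/}" -type m -runtime.powerState poweredOn |
    while IFS= read -r path; do
      ip=$(govc vm.info "$path" | awk '/IP address:/ {print $3}')
      printf '%s\t%s\n' "$path" "${ip:-unknown}"
    done
}
```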
## VM Lookup Response Rule
- Unless user explicitly asks otherwise, return VM lookup/list results only from cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests (for example name/contains filters), always report:
  - VM name
  - datastore name
  - VM notes/annotation
  - power state and IP, when available
## VM Disk And FC Assignment Workflow
- When asked to assign existing disks and PCI passthrough FC adapters to a specified VM, treat the request as a two-step workflow:
  - first gather and report findings,
  - then wait for explicit approval before making any changes.
- Always log into vCenter `192.168.0.201`.
- Find the specified VM and verify the ESXi host it is currently running on.
- Default expected ESXi host is `192.168.1.165`, but always verify the live host before planning changes.
- Always identify and report the datastore where the VM is stored before planning disk attachment.
- Unless the operator explicitly specifies alternatives, default to these PCI passthrough FC adapters:
  - `vmhba7` (`0000:85:00.0`)
  - `vmhba8` (`0000:85:00.1`)
- Do not substitute any other PCI FC passthrough adapter if a default or operator-specified adapter cannot be found.
- Unless the operator explicitly specifies alternatives, default to these existing disks from the VM's datastore under the `atvm-DISKS` directory:
  - `atvm-DISK_1.vmdk`
  - `atvm-DISK_2.vmdk`
- Do not substitute any other disk if a default or operator-specified disk cannot be found.
- If the specified adapters or specified disks cannot be found, do nothing and report that nothing will be assigned.
- Before any assignment action, always provide a summary of:
  - the VM found,
  - the ESXi host,
  - the datastore,
  - whether `vmhba7` and `vmhba8` were found and are usable,
  - whether `atvm-DISK_1.vmdk` and `atvm-DISK_2.vmdk` were found under `atvm-DISKS`,
  - exactly what would be assigned.
- Never perform the assignment step until the operator explicitly approves after seeing that summary.
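
The no-substitution and summary-before-approval rules above can be sketched as a pure decision step that takes the lists of adapters and disks actually found and reports what would happen. The names `missing_items` and `plan_assignment` are hypothetical, and the sketch deliberately performs no assignment; it only produces the summary the operator must approve.

```bash
# Defaults from this guide; an operator may override them per request.
FC_ADAPTERS="vmhba7 vmhba8"
DISKS="atvm-DISK_1.vmdk atvm-DISK_2.vmdk"

# missing_items "REQUIRED" "AVAILABLE": print each required name that is not
# in the available list. Prints nothing when everything was found.
missing_items() {
  for item in $1; do
    printf '%s\n' $2 | grep -qxF -- "$item" || printf '%s\n' "$item"
  done
}

# plan_assignment "FOUND_ADAPTERS" "FOUND_DISKS": report findings and decide.
# Never substitutes: any missing item means nothing is assigned.
plan_assignment() {
  missing=$(missing_items "$FC_ADAPTERS" "$1"; missing_items "$DISKS" "$2")
  if [ -n "$missing" ]; then
    echo "Not found: $(echo $missing). Nothing will be assigned."
    return 1
  fi
  echo "Would assign adapters: $FC_ADAPTERS; disks: $DISKS"
  echo "Waiting for explicit operator approval."
}
```

The found lists would come from live checks, for example `govc datastore.ls -ds <datastore> atvm-DISKS` for the disks. How the adapter list is gathered on the ESXi host is left open here.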
## Common VM Credentials
- Username: `root`
- Password: `cdsi2012`
## CMC Install/Uninstall Commands
### Default Project Rule
- Default project: `Skidamarink`
- Default registration code: `BZHKABCODZLIOK6RTAJ4`
- Default endpoint: `portal.gcstage.cloud.nonprod.cirrusdata.com:443`
- Use a different project code only when user explicitly requests it in that run.
### Skidamarink Install (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE
```
### Skidamarink Install (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE"
```
### Uninstall (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -uninstall
```
### Uninstall (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -uninstall"
```
### CMC Reinstall Fallback (RHEL 10)
- If installer-based reinstall fails due to repo metadata/download errors, use cached local `mtdi-daemon` and `galaxy-migrate` RPMs, start services, enforce `galaxy_complete_endpoint`, then manually register.
- Do not continue migrateops create until source host is visible as connected in CDC.
## Status Output Format (Power-Off/Revert/Power-On)
- `VM [vm name] was poweredOn, so I powered it off` (or `already poweredOff`)
- `Snapshot rollback completed`
- `VM [vm name] powered back on successfully`
- `Current IP: <ip>`
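
A small formatter for the four status lines, as a sketch. The function name `report_revert_status` is hypothetical, and the exact wording of the already-off line is an assumption based on the `already poweredOff` variant noted above.

```bash
# Hypothetical helper: print the status lines for a power-off/revert/power-on cycle.
# Args: VM name, power state before the run, current IP.
report_revert_status() {
  if [ "$2" = "poweredOn" ]; then
    echo "VM $1 was poweredOn, so I powered it off"
  else
    echo "VM $1 already poweredOff"   # assumed wording for the already-off case
  fi
  echo "Snapshot rollback completed"
  echo "VM $1 powered back on successfully"
  echo "Current IP: $3"
}
```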
## VMware Compute MigrateOps Defaults
- Use `/home/aw/code/cds/cdsmcp/vmw.yaml` as the starting template.
- Default sequence for requested source machine:
  - clean CDC state for that machine
  - reinstall CMC Linux on that machine
  - perform migration preflight and operation create
- If user provides a client name, replace consistently:
  - `config.system_name`
  - `migrateops_vmware_compute.compute.vm_name`
  - operation `name`
- Validate `integration_name` is active in target project before create.
- Default access node: `atvm-linux-h2h` (must be powered on in vCenter and connected in CDC).
- Always discover `source_nic` from live source host networking.
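
One possible way to discover `source_nic` live is to read the default route on the source host. This sketch assumes the wanted NIC is the one carrying the default route, which may not hold on multi-homed hosts; the function name `default_route_nic` is hypothetical.

```bash
# Hypothetical helper: read `ip route show default` output on stdin and print
# the interface name that follows the "dev" keyword on the first default route.
default_route_nic() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
  }'
}
```

A typical call would pipe the live output over SSH, for example `ssh root@<source-ip> ip route show default | default_route_nic`.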
## Approval and Monitoring Defaults
- Auto-approve cutover by default.
- Start monitoring immediately after operation create.
- Approve as soon as `final-synchronization` requests input.
- Skip auto-approval only if user explicitly asks for manual approval.
- Patience rule:
  - if heartbeat/progress is advancing, keep waiting
  - allow longer waits for helper deployment/registration steps
  - intervene only for terminal failure, confirmed blocker, or prolonged no-progress
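
The patience rule can be sketched as a polling loop that keeps waiting while the reported status changes and gives up only after a run of identical polls. The function name `wait_for_terminal`, the `<state> <progress>` output format, and the state names `completed` and `failed` are all hypothetical; the real status source depends on how the operation is monitored.

```bash
# wait_for_terminal CHECK_CMD MAX_IDLE_POLLS [SLEEP_SECONDS]
# CHECK_CMD is a hypothetical command that prints "<state> <progress>".
# Returns 0 on "completed", 1 on "failed", 2 after too many polls with no change.
wait_for_terminal() {
  check="$1"; max_idle="$2"; pause="${3:-30}"
  last=""; idle=0
  while :; do
    status=$($check) || return 1
    case "${status%% *}" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    if [ "$status" = "$last" ]; then
      idle=$((idle + 1))      # no progress since the previous poll
    else
      idle=0; last="$status"  # progress advanced, so keep waiting
    fi
    if [ "$idle" -ge "$max_idle" ]; then
      return 2
    fi
    sleep "$pause"
  done
}
```

Passing a larger `MAX_IDLE_POLLS` during helper deployment or registration gives those steps the longer wait the rule allows.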
## Preflight Checklist
- Source host connected in CDC.
- Integration exists and is active in same project.
- `atvm-linux-h2h` powered on in vCenter.
- `atvm-linux-h2h` connected in same CDC project.
- Destination VM name does not already exist in vCenter.
- Destination datastore/host/network resolve in vCenter.
- `source_nic` discovered via SSH from source host.
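
The destination-name check in this list could be sketched with `govc find`, which accepts `-type m` for virtual machines and a `-name` pattern. The function name `dest_name_is_free` is hypothetical, and searching from the inventory root `/` is an assumption.

```bash
# Hypothetical helper: succeed only if no VM with this name exists in vCenter.
# Returns 2 if the lookup itself fails, so a failed query never reads as "free".
dest_name_is_free() {
  matches=$(govc find / -type m -name "$1") || return 2
  [ -z "$matches" ]
}
```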
## Post-Migration Validation and Cleanup Pattern
- Validate destination login before cleanup:
  - get destination guest IP from vCenter
  - verify SSH/login works
  - if guest IP empty, keep polling and do not skip validation
  - do not mark run complete before validation result is recorded
- Before deleting destination VM:
  - always prompt user for explicit confirmation
  - never delete destination VM without that confirmation in the same run
- For delete path:
  - resolve source VM ID and destination VM ID separately
  - abort if IDs match
  - power off destination if needed
  - delete destination by explicit VM ID
  - verify destination removed and source still exists
- Always run project cleanup after terminal migration state:
  - archive completed operation
  - run global offline-host cleanup
  - cleanup must target source VM named in current request only
  - if source/helper entries still connected, force-disconnect conditions and rerun cleanup
  - if stale connected state persists after VM removal/power-off, wait heartbeat timeout and rerun cleanup until removed
  - verify helper entry from this run (`migrateops-<opid>-<source-system-name>`) is removed
- Completion gate:
  - do not report run complete until archive + cleanup verification are done
  - always provide read-only final listing for source, destination, access node, helper:
    - CDC status (`present` or `cleaned up`)
    - vCenter status (`present` or `cleaned up`, and if present include power state + IP)
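
The ID check in the delete path above can be sketched as a guard that runs before any power-off or delete call. The function name `ids_safe_for_delete` is hypothetical; how the two IDs are resolved is left to the caller.

```bash
# Hypothetical guard: refuse to proceed unless both IDs are known and different.
# Args: source VM ID, destination VM ID.
ids_safe_for_delete() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "abort: could not resolve both VM IDs" >&2
    return 1
  fi
  if [ "$1" = "$2" ]; then
    echo "abort: source and destination resolve to the same VM ID" >&2
    return 1
  fi
}
```

Treating an unresolved ID as a failure keeps an empty lookup result from passing the comparison by accident.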
## Default Behavior Contract
- Perform automatically on every VMware compute run:
  - destination login validation
  - operation archive
  - offline-host cleanup and source/helper stale verification
- Still require explicit user confirmation before destination delete:
  - always prompt
  - if no confirmation, keep destination and record `deletion skipped by user`
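
The confirmation requirement can be sketched as a prompt that defaults to keeping the destination. The function name `confirm_destination_delete` is hypothetical, and accepting only an exact `yes` is an assumption about what counts as explicit confirmation.

```bash
# Hypothetical prompt: read the operator's answer from stdin.
# Only an exact "yes" counts as confirmation; anything else keeps the VM.
confirm_destination_delete() {
  printf 'Delete destination VM %s? Type "yes" to confirm: ' "$1" >&2
  IFS= read -r answer || answer=""
  if [ "$answer" = "yes" ]; then
    return 0
  fi
  echo "deletion skipped by user"
  return 1
}
```

An empty answer or closed input is handled the same way as a refusal, so the destination is never deleted by default.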