Reorganize cdsmcp workspace into docs, templates, and artifacts

Restructure the cdsmcp folder to separate operator guidance, reusable templates, and runtime artifacts into clearer top-level areas.

Move the VMware migration guide and run learnings into docs/, move vmw.yaml into templates/, move the existing log into artifacts/logs/, replace the old index file with a README, and split the former monolithic guide into focused documents for VMware MigrateOps workflow, VM lookup and FC/disk assignment, and CMC install reference.

Update internal references so the reorganized layout remains coherent without changing the underlying operational guidance.
2026-03-21 20:57:14 -04:00
parent 274b920b40
commit 0405c09987
9 changed files with 240 additions and 135 deletions

@@ -0,0 +1,33 @@
# CMC Install Reference
This file contains the CMC install, uninstall, and reinstall fallback reference used by the CDS MCP VMware workflow.
## Default Project Rule
- Default project: `Skidamarink`
- Default registration code: `BZHKABCODZLIOK6RTAJ4`
- Default endpoint: `portal.gcstage.cloud.nonprod.cirrusdata.com:443`
- Use a different project code only when the user explicitly requests it in that run.
## Skidamarink Install (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE
```
## Skidamarink Install (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE"
```
## Uninstall (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -uninstall
```
## Uninstall (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -uninstall"
```
## CMC Reinstall Fallback (RHEL 10)
- If installer-based reinstall fails due to repo metadata/download errors, use cached local `mtdi-daemon` and `galaxy-migrate` RPMs, start services, enforce `galaxy_complete_endpoint`, then manually register.
- Do not continue MigrateOps create until the source host is visible as connected in CDC.

@@ -0,0 +1,47 @@
# ESX / vCenter Run Learnings
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
## Entry Rule
- Add an entry only when the run changed workflow behavior, uncovered a new failure pattern, or confirmed a new required check.
- Do not add routine successful runs with no new learning.
## Run Learning: Operation 14208
- Learning: `wait-for-vm-registration` helper registration can be the longest early-stage step.
- Action for future runs: if step 6/7 is slow, verify helper VM existence in vCenter before remediation.
## Run Learning: Operation 14213
- Learning: completion response was sent before destination delete prompt, operation archive, and offline-host cleanup.
- Action for future runs: completion must be gated on delete prompt handling, archive, and cleanup verification.
## Run Learning: Operation 14214
- Learning: stale helper/source entries can remain and require explicit offline-host cleanup reruns.
- Action for future runs: rerun cleanup until stale entries are actually removed.
## Run Learning: Operation 14215
- Learning: helper creation can fail with vSphere `ReconfigVM` errors and recover via controlled retries.
- Action for future runs:
  - remove leftover helper artifacts before retry
  - avoid manual helper power actions during active task execution
  - keep waiting while heartbeats/progress still advance
## Run Learning: Operation 14216
- Learning: destination login validation and post-run cleanup were missed before completion reporting.
- Action for future runs: always perform destination login validation + archive + cleanup automatically before declaring completion.
## Run Learning: Operation 14218
- Learning: source/helper entries can remain `connected` with stale `last_checkin` after migration.
- Action for future runs: enforce heartbeat-timeout waits and rerun cleanup until source/helper entries are removed.
## Run Learning: Operation 14221
- Learning: source/helper CDC entries for the current request can be removed cleanly by a timeout-based cleanup loop after archive, and a final 4-entity status listing is effective for closure.
- Action for future runs:
  - always provide final source/destination/access/helper listing across CDC and vCenter
  - keep destination delete as explicit user-confirmed step only
## Run Learning: Operation 14223
- Learning: on RHEL 10, CMC reinstall via installer script can fail when repo metadata is unavailable; local RPM install + explicit CDC endpoint config + manual register can recover the source in-place.
- Action for future runs:
  - if Linux installer fails on repo metadata, check cached `mtdi-daemon` and `galaxy-migrate` RPMs and install directly
  - enforce `galaxy_complete_endpoint` before manual register
  - proceed with migrateops only after source host is confirmed connected in CDC

@@ -0,0 +1,57 @@
# VM Lookup And Assignment
This file covers vCenter VM lookup responses and the workflow for assigning existing disks and PCI passthrough FC adapters to a VM.
## Cluster Scope Rule
- Only work under cluster `QACL-ATVMCypressONLY` unless explicitly told otherwise.
## Ignore VMs
- `vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44`
- `vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14`
## IP Lookup Rule
- If asked about an IP address, only check powered-on VMs.
## VM Lookup Response Rule
- Unless the user explicitly asks otherwise, return VM lookup/list results only from cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests, always report:
  - VM name
  - datastore name
  - VM notes/annotation
  - power state and IP, when available
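The ignore list and the powered-on-only IP rule above can be sketched as a small filter. This is a minimal sketch under stated assumptions: the three-column input (`name power_state ip`) and the function name `find_vm_by_ip` are hypothetical, chosen for illustration, and are not the output format of govc or any other tool.

```bash
# Hypothetical input format on stdin: one VM per line, "name power_state ip".
# This layout is an assumption for illustration, not real CLI output.
IGNORED_VMS="vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44 vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14"

# Print the names of powered-on, non-ignored VMs that report the given IP.
find_vm_by_ip() {
    wanted_ip="$1"
    awk -v ip="$wanted_ip" -v ignored="$IGNORED_VMS" '
        BEGIN { n = split(ignored, names, " "); for (i = 1; i <= n; i++) skip[names[i]] = 1 }
        ($1 in skip) { next }         # Ignore VMs rule
        $2 != "poweredOn" { next }    # IP Lookup Rule: powered-on VMs only
        $3 == ip { print $1 }
    '
}
```

A powered-off VM that still lists the same address is never reported, which matches the intent of checking only powered-on VMs.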
## VM Disk And FC Assignment Workflow
- When asked to assign existing disks and PCI passthrough FC adapters to a specified VM, treat the request as a two-step workflow:
  - first gather and report findings
  - then wait for explicit approval before making any changes
- Always log into vCenter `192.168.0.201`.
- Find the specified VM and verify the ESXi host it is currently running on.
- Default expected ESXi host is `192.168.1.165`, but always verify live placement before planning changes.
- Always identify and report the datastore where the VM is stored before planning disk attachment.
- Unless the operator explicitly specifies alternatives, default to these PCI passthrough FC adapters:
  - `vmhba7` (`0000:85:00.0`)
  - `vmhba8` (`0000:85:00.1`)
- Do not substitute any other PCI FC passthrough adapter if a default or operator-specified adapter cannot be found.
- Unless the operator explicitly specifies alternatives, default to these existing disks from the VM's datastore under the `atvm-DISKS` directory:
  - `atvm-DISK_1.vmdk`
  - `atvm-DISK_2.vmdk`
- Do not substitute any other disk if a default or operator-specified disk cannot be found.
- If the specified adapters or specified disks cannot be found, do nothing and report that nothing will be assigned.
- Before any assignment action, always provide a summary of:
  - the VM found
  - the ESXi host
  - the datastore
  - whether `vmhba7` and `vmhba8` were found and are usable
  - whether `atvm-DISK_1.vmdk` and `atvm-DISK_2.vmdk` were found under `atvm-DISKS`
  - exactly what would be assigned
- Never perform the assignment step until the operator explicitly approves after seeing that summary.
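The all-or-nothing rule in this workflow can be sketched as a pure check over what was found. This is only an illustration: the function names `all_required_found` and `assignment_decision` and the exact wording of the `ready:` message are hypothetical, and the sketch covers the default adapters and disks only (operator-specified alternatives would be passed in place of the defaults).

```bash
# Return 0 only when every required item appears in the list of found items.
# Both arguments are space-separated lists; nothing is substituted for a
# missing item.
all_required_found() {
    required="$1"
    found="$2"
    for item in $required; do
        case " $found " in
            *" $item "*) ;;      # present, keep checking
            *) return 1 ;;       # missing: do nothing, report instead
        esac
    done
    return 0
}

# Decide what the findings summary should say about the assignment step.
#   $1 adapters found on the ESXi host, $2 disks found under atvm-DISKS
assignment_decision() {
    adapters_found="$1"
    disks_found="$2"
    if all_required_found "vmhba7 vmhba8" "$adapters_found" &&
       all_required_found "atvm-DISK_1.vmdk atvm-DISK_2.vmdk" "$disks_found"; then
        echo "ready: awaiting explicit operator approval"
    else
        echo "nothing will be assigned"
    fi
}
```

Even the `ready` outcome changes nothing by itself; it only marks the point where the summary is shown and approval is requested.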
## Common VM Credentials
- Username: `root`
- Password: `cdsi2012`
## Status Output Format (Power-Off/Revert/Power-On)
- `VM [vm name] was poweredOn, so I powered it off` (or `already poweredOff`)
- `Snapshot rollback completed`
- `VM [vm name] powered back on successfully`
- `Current IP: <ip>`

@@ -0,0 +1,102 @@
# VMware Compute MigrateOps Guide
This file is for workflow guidance only. Do not add specific run examples here.
## Update Rule
- After every run, update this file only when a workflow rule/checklist/default behavior changed.
- Add run-specific examples and evidence to `run-learnings.md` only when that run produced a new learning.
## vCenter Access
- Address: `192.168.0.201`
- Username: `administrator@qalab.cdsi.local`
- Password: `CDSi101!`
- Standard CLI path: `/home/aw/.local/bin/govc`
- Use only this standard vCenter login for vCenter actions unless explicitly instructed otherwise.
- Do not use `192.168.3.190` for vCenter actions; that machine is reserved for Cypress ATVM automation.
## IP And Power-State Policy (Mandatory)
- Before finding guest IP or attempting SSH, confirm VM power state in vCenter and power on if needed.
- Treat only these as stable references:
  - `192.168.0.201` for vCenter login only
  - `192.168.3.190` for ATVM Cypress automation only
  - `192.168.3.191` as default ATVM target reference
- Any other VM IP must be obtained live from vCenter for that run only.
- Do not carry forward ad-hoc VM IPs from previous runs in runbooks.
- When the operator refers to `192.168.3.191`, assume ATVM target SSH access should ignore host key mismatch by default with `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null`.
- When the operator refers to `192.168.3.191`, assume default SSH credentials `root / cdsi2012` unless the operator explicitly overrides them.
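The host-key rule for the default ATVM target can be sketched as a helper that returns SSH options per host. The function name `ssh_opts_for` is hypothetical, and returning no extra options for every other host is an assumption; this guide only states the behavior for `192.168.3.191`.

```bash
ATVM_TARGET="192.168.3.191"

# Print the SSH options to use for a host. Only the default ATVM target
# gets the host-key bypass described in this guide. Leaving all other
# hosts on normal host-key checking is an assumption of this sketch.
ssh_opts_for() {
    if [ "$1" = "$ATVM_TARGET" ]; then
        echo "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
    else
        echo ""
    fi
}

# Example (not executed here); word splitting of the options is intentional:
# ssh $(ssh_opts_for "$host") "root@$host" hostname
```

Keeping the bypass tied to one address avoids silently disabling host-key checks for VM IPs that are obtained live for a single run.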
## Related References
- VM lookup, datastore reporting, and FC/disk assignment:
  - `vm-lookup-and-assignment.md`
- CMC install, uninstall, and reinstall fallback:
  - `cmc-install-reference.md`
## VMware Compute MigrateOps Defaults
- Use `/home/aw/code/cds/cdsmcp/templates/vmw.yaml` as the starting template.
- Default sequence for requested source machine:
  - clean CDC state for that machine
  - reinstall CMC Linux on that machine
  - perform migration preflight and operation create
- If user provides a client name, replace consistently:
  - `config.system_name`
  - `migrateops_vmware_compute.compute.vm_name`
  - operation `name`
- Validate `integration_name` is active in target project before create.
- Default access node: `atvm-linux-h2h` (must be powered on in vCenter and connected in CDC).
- Always discover `source_nic` from live source host networking.
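One way to discover `source_nic` from live networking is to read the interface of the default route on the source host. This is a sketch under an assumption: the guide does not say which interface counts as `source_nic`, so treating the default-route interface as the answer is a guess to confirm per environment. The function name `default_route_nic` is hypothetical; the input is the text printed by `ip route show default`.

```bash
# Print the interface named after "dev" in the first default route read
# from stdin (the text format printed by `ip route show default`).
# Assumption: the default-route interface is the right `source_nic`.
default_route_nic() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Example (not executed here), with the source IP obtained live from vCenter:
# ssh "root@$source_ip" ip route show default | default_route_nic
```

An empty result means no default route was seen, in which case the NIC should be picked manually rather than guessed.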
## Approval and Monitoring Defaults
- Auto-approve cutover by default.
- Start monitoring immediately after operation create.
- Approve as soon as `final-synchronization` requests input.
- Skip auto-approval only if user explicitly asks for manual approval.
- Patience rule:
  - if heartbeat/progress is advancing, keep waiting
  - allow longer waits for helper deployment/registration steps
  - intervene only for terminal failure, confirmed blocker, or prolonged no-progress
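The patience rule can be sketched as a small decision function that a monitoring loop calls on each poll. The status values (`running`, `failed`, `completed`), the argument layout, and the idea of a numeric idle limit are all assumptions for illustration; the guide does not define what counts as prolonged no-progress.

```bash
# Decide the next monitoring action from a few observations.
#   $1 status            hypothetical values: running | failed | completed
#   $2 progress_changed  1 if heartbeat/progress advanced since the last poll
#   $3 idle_seconds      seconds since progress last advanced
#   $4 idle_limit        threshold for "prolonged no-progress" (an assumption)
#   $5 blocker           1 if a blocker has been confirmed (optional, default 0)
next_monitor_action() {
    status="$1"; progress_changed="$2"; idle_seconds="$3"; idle_limit="$4"
    blocker="${5:-0}"
    if [ "$status" = "failed" ] || [ "$blocker" = "1" ]; then
        echo "intervene"
    elif [ "$status" = "completed" ]; then
        echo "done"
    elif [ "$progress_changed" = "1" ]; then
        echo "wait"
    elif [ "$idle_seconds" -ge "$idle_limit" ]; then
        echo "intervene"
    else
        echo "wait"
    fi
}
```

Longer waits for helper deployment and registration can be expressed by passing a larger `idle_limit` while those steps are active.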
## Preflight Checklist
- Source host connected in CDC.
- Integration exists and is active in same project.
- `atvm-linux-h2h` powered on in vCenter.
- `atvm-linux-h2h` connected in same CDC project.
- Destination VM name does not already exist in vCenter.
- Destination datastore/host/network resolve in vCenter.
- `source_nic` discovered via SSH from source host.
## Post-Migration Validation and Cleanup Pattern
- Validate destination login before cleanup:
  - get destination guest IP from vCenter
  - verify SSH/login works
  - if guest IP empty, keep polling and do not skip validation
  - do not mark run complete before validation result is recorded
- Before deleting destination VM:
  - always prompt user for explicit confirmation
  - never delete destination VM without that confirmation in the same run
- For delete path:
  - resolve source VM ID and destination VM ID separately
  - abort if IDs match
  - power off destination if needed
  - delete destination by explicit VM ID
  - verify destination removed and source still exists
- Always run project cleanup after terminal migration state:
  - archive completed operation
  - run global offline-host cleanup
  - cleanup must target source VM named in current request only
  - if source/helper entries still connected, force-disconnect conditions and rerun cleanup
  - if stale connected state persists after VM removal/power-off, wait heartbeat timeout and rerun cleanup until removed
  - verify helper entry from this run (`migrateops-<opid>-<source-system-name>`) is removed
- Completion gate:
  - do not report run complete until archive + cleanup verification are done
  - always provide read-only final listing for source, destination, access node, helper:
    - CDC status (`present` or `cleaned up`)
    - vCenter status (`present` or `cleaned up`, and if present include power state + IP)
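The delete-path safeguards above can be sketched as a guard that runs before any destructive call, plus a helper that builds the helper entry name to verify after cleanup. The function names and the literal `yes` confirmation token are hypothetical; the VM IDs are whatever identifiers were resolved separately for source and destination.

```bash
# Return 0 only when it is safe to delete the destination VM.
#   $1 source VM ID, $2 destination VM ID, $3 user confirmation ("yes" to delete)
safe_to_delete_destination() {
    src_id="$1"; dst_id="$2"; confirmed="$3"
    [ -n "$src_id" ] || return 1             # both IDs must be resolved
    [ -n "$dst_id" ] || return 1
    [ "$src_id" != "$dst_id" ] || return 1   # abort if IDs match
    [ "$confirmed" = "yes" ] || return 1     # explicit confirmation in this run
    return 0
}

# Build the helper entry name for this run: migrateops-<opid>-<source-system-name>.
helper_entry_name() {
    echo "migrateops-$1-$2"
}
```

The guard fails closed: an unresolved ID, matching IDs, or a missing confirmation all leave the destination VM in place.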
## Default Behavior Contract
- Perform automatically on every VMware compute run:
  - destination login validation
  - operation archive
  - offline-host cleanup and source/helper stale verification
- Still require explicit user confirmation before destination delete:
  - always prompt
  - if no confirmation, keep destination and record `deletion skipped by user`
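The contract above can be sketched as two small checks: one that gates the completion report on the automatic steps, and one that records the outcome of the delete prompt. The function names, the `done` marker, the `pending:` output format, and the `deletion confirmed by user` string are hypothetical; only `deletion skipped by user` comes from this guide.

```bash
# Print "complete" only when all automatic steps are recorded as done;
# otherwise list what is still pending. Each argument is "done" or anything else.
#   $1 destination login validation, $2 operation archive, $3 cleanup verification
completion_status() {
    validation="$1"; archive="$2"; cleanup="$3"
    pending=""
    [ "$validation" = "done" ] || pending="$pending login-validation"
    [ "$archive" = "done" ] || pending="$pending archive"
    [ "$cleanup" = "done" ] || pending="$pending cleanup"
    if [ -z "$pending" ]; then
        echo "complete"
    else
        echo "pending:$pending"
    fi
}

# Record the outcome of the delete prompt; anything other than "yes"
# keeps the destination VM.
delete_outcome() {
    if [ "$1" = "yes" ]; then
        echo "deletion confirmed by user"
    else
        echo "deletion skipped by user"
    fi
}
```

The delete outcome is deliberately separate from the completion gate, since a skipped deletion does not block a run from being reported complete.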