Initial commit

2026-03-11 15:19:25 -04:00
commit 93b6d7acb8
16 changed files with 4454 additions and 0 deletions

42
cdsmcp/AGENTS.md Normal file

@@ -0,0 +1,42 @@
# AGENTS.md
This folder contains the VMware/vCenter + MigrateOps runbook for CDS MCP workflows.
## Files
- `esxvm.md`: index file only; points to guide and run-learnings docs.
- `esxvm-guide.md`: authoritative workflow/rules/checklists/default behavior.
- `esxvm-runs.md`: run-specific learnings, only when a run adds new information.
- `vmw.yaml`: base template for `MIGRATEOPS_VMWARE_COMPUTE` operations.
## Source Of Truth
- Use `esxvm-guide.md` for how to execute runs.
- Use `vmw.yaml` as the starting operation template.
- Treat `esxvm-runs.md` as evidence/history, not baseline procedure.
## Required Run Pattern
1. Confirm source VM in vCenter and power state before IP/SSH actions.
2. Prepare source host (CDC cleanup + CMC reinstall/registration) and verify source is connected in CDC.
3. Validate preflight requirements from `esxvm-guide.md` (integration, access node, destination name, datastore/host/network, source NIC).
4. Create MigrateOps from `vmw.yaml` with request-specific replacements.
5. Monitor continuously and auto-approve cutover unless user requests manual approval.
6. After terminal state:
   - validate destination login (poll guest IP if needed),
   - archive operation,
   - run offline-host cleanup loop until source/helper cleanup conditions are satisfied,
   - provide final read-only status listing for source/destination/access/helper across CDC and vCenter.
7. Ask user explicitly before deleting destination VM; never delete without same-run confirmation.
## VM Lookup Requirement
- Unless user explicitly asks otherwise, scope VM lookup/list responses to cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests, always include datastore name and VM notes/annotation in the response.
## Update Rules
- Update `esxvm-guide.md` only when workflow/rules/default behavior changes.
- Update `esxvm-runs.md` only when a run reveals a new learning/failure pattern/required check.
- Keep `esxvm.md` as a lightweight index.
## Environment Defaults
- vCenter: `192.168.0.201`
- Cluster scope: `QACL-ATVMCypressONLY` unless user overrides.
- Default CDC project: `Skidamarink`
- Default access node: `atvm-linux-h2h`

154
cdsmcp/esxvm-guide.md Normal file

@@ -0,0 +1,154 @@
# ESX / vCenter Guide
This file is for workflow guidance only. Do not add specific run examples here.
## Update Rule
- After every run, update this file only when a workflow rule/checklist/default behavior changed.
- Add run-specific examples and evidence to `esxvm-runs.md` only when that run produced a new learning.
## vCenter Access
- Address: `192.168.0.201`
- Username: `administrator@qalab.cdsi.local`
- Password: `CDSi101!`
- Standard CLI path: `/home/aw/.local/bin/govc`
- Use only this standard vCenter login for vCenter actions unless explicitly instructed otherwise.
- Do not use `192.168.3.190` for vCenter actions; that machine is reserved for Cypress ATVM automation.
## IP And Power-State Policy (Mandatory)
- Before finding guest IP or attempting SSH, confirm VM power state in vCenter and power on if needed.
- Treat only these as stable references:
  - `192.168.0.201` for vCenter login only
  - `192.168.3.190` for ATVM Cypress automation only
  - `192.168.3.191` as default ATVM target reference
- Any other VM IP must be obtained live from vCenter for that run only.
- Do not carry forward ad-hoc VM IPs from previous runs in runbooks.
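The policy above could be sketched with `govc` roughly as follows. This is a minimal sketch, not the authoritative procedure: the function name `ensure_on_and_get_ip` is illustrative, and it assumes `govc` is on `PATH` with `GOVC_URL` and credentials already exported, and that the plain-text `govc vm.info` output includes a `Power state:` line.

```bash
# Sketch only: confirm power state first, power on if needed, then read the IP live.
# Assumes govc is already configured through GOVC_URL / GOVC_USERNAME / GOVC_PASSWORD.
ensure_on_and_get_ip() {
  vm="$1"
  # Parse the "Power state:" line from govc's plain-text output.
  state=$(govc vm.info "$vm" | awk -F': +' '/Power state:/ {print $2}')
  if [ -z "$state" ]; then
    echo "VM not found or power state unreadable: $vm" >&2
    return 1
  fi
  if [ "$state" != "poweredOn" ]; then
    # Power-on messages go to stderr so stdout carries only the IP.
    govc vm.power -on "$vm" >&2 || return 1
  fi
  # Bounded wait until the guest reports an IP; the value is for this run only.
  govc vm.ip -wait 5m "$vm"
}
```

A call such as `ip=$(ensure_on_and_get_ip "$VM_NAME")` (with `VM_NAME` as a hypothetical variable) keeps the IP scoped to the current run instead of a stored default.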
## Cluster Scope Rule
- Only work under cluster `QACL-ATVMCypressONLY` unless explicitly told otherwise.
## Ignore VMs
- `vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44`
- `vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14`
## IP Lookup Rule
- If asked about an IP address, only check powered-on VMs.
## VM Lookup Response Rule
- Unless user explicitly asks otherwise, return VM lookup/list results only from cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests (for example name/contains filters), always report:
  - VM name
  - datastore name
  - VM notes/annotation
  - power state and IP, when available
## Common VM Credentials
- Username: `root`
- Password: `cdsi2012`
## CMC Install/Uninstall Commands
### Default Project Rule
- Default project: `Skidamarink`
- Default registration code: `BZHKABCODZLIOK6RTAJ4`
- Default endpoint: `portal.gcstage.cloud.nonprod.cirrusdata.com:443`
- Use a different project code only when user explicitly requests it in that run.
### Skidamarink Install (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE
```
### Skidamarink Install (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -rgc BZHKABCODZLIOK6RTAJ4 -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE"
```
### Uninstall (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -uninstall
```
### Uninstall (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -uninstall"
```
### CMC Reinstall Fallback (RHEL 10)
- If installer-based reinstall fails due to repo metadata/download errors, use cached local `mtdi-daemon` and `galaxy-migrate` RPMs, start services, enforce `galaxy_complete_endpoint`, then manually register.
- Do not continue migrateops create until source host is visible as connected in CDC.
## Status Output Format (Power-Off/Revert/Power-On)
- `VM [vm name] was poweredOn, so I powered it off` (or `already poweredOff`)
- `Snapshot rollback completed`
- `VM [vm name] powered back on successfully`
- `Current IP: <ip>`
## VMware Compute MigrateOps Defaults
- Use `/home/aw/code/cds/cdsmcp/vmw.yaml` as the starting template.
- Default sequence for requested source machine:
  - clean CDC state for that machine
  - reinstall CMC Linux on that machine
  - perform migration preflight and operation create
- If user provides a client name, replace consistently:
  - `config.system_name`
  - `migrateops_vmware_compute.compute.vm_name`
  - operation `name`
- Validate `integration_name` is active in target project before create.
- Default access node: `atvm-linux-h2h` (must be powered on in vCenter and connected in CDC).
- Always discover `source_nic` from live source host networking.
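The consistent-replacement rule above can be sketched as a small filter. This is a minimal sketch under stated assumptions: the function name `render_vmw_template` is illustrative, and it assumes the template still carries the placeholder client name `atvm-codextest-vm` and the marker `REPLACE_WITH_SOURCE_NIC`, as `vmw.yaml` does in this commit.

```bash
# Sketch only: render a request-specific operation file from the base template.
# Reads the template on stdin and writes the rendered YAML to stdout.
render_vmw_template() {
  client="$1"      # source system name from the current request
  source_nic="$2"  # NIC discovered live on the source host
  if [ -z "$client" ] || [ -z "$source_nic" ]; then
    echo "usage: render_vmw_template <client> <source_nic>" >&2
    return 1
  fi
  # One substitution covers system_name, vm_name (the -migrated suffix is kept),
  # and the operation name. Values containing "/" or "&" would need escaping first.
  sed -e "s/atvm-codextest-vm/${client}/g" \
      -e "s/REPLACE_WITH_SOURCE_NIC/${source_nic}/g"
}
```

For example, `render_vmw_template web01 ens224 < /home/aw/code/cds/cdsmcp/vmw.yaml > op.yaml`, where `web01`, `ens224`, and `op.yaml` are hypothetical values. Other request-specific fields (datastore, host, helper IP) would still need review by hand.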
## Approval and Monitoring Defaults
- Auto-approve cutover by default.
- Start monitoring immediately after operation create.
- Approve as soon as `final-synchronization` requests input.
- Skip auto-approval only if user explicitly asks for manual approval.
- Patience rule:
  - if heartbeat/progress is advancing, keep waiting
  - allow longer waits for helper deployment/registration steps
  - intervene only for terminal failure, confirmed blocker, or prolonged no-progress
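The approval and patience defaults above can be read as a per-poll decision. The sketch below is illustrative only: the function name `next_action`, the state labels `running`/`failed`/`completed`, and the returned action words are assumptions for the example, not CDC API values. Only `final-synchronization` comes from this guide. A confirmed blocker is not modeled here and would be handled like a failure.

```bash
# Sketch only: decide what to do on each monitoring poll.
# $1 = step currently requesting input ("" if none)
# $2 = operation state: running | failed | completed (hypothetical labels)
# $3 = seconds since heartbeat/progress last advanced
# $4 = no-progress threshold in seconds (pick a longer one for helper deploy/registration)
# $5 = "manual" if the user asked for manual approval; anything else means auto
next_action() {
  if [ "$2" = "failed" ]; then echo intervene; return; fi
  if [ "$2" = "completed" ]; then echo done; return; fi
  if [ "$1" = "final-synchronization" ]; then
    if [ "$5" = "manual" ]; then echo ask-user; else echo approve; fi
    return
  fi
  # Still running with no input requested: wait unless progress has stalled too long.
  if [ "$3" -ge "$4" ]; then echo intervene; else echo wait; fi
}
```

Keeping the threshold as a parameter lets the caller be more patient during helper deployment/registration without changing the decision logic.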
## Preflight Checklist
- Source host connected in CDC.
- Integration exists and is active in same project.
- `atvm-linux-h2h` powered on in vCenter.
- `atvm-linux-h2h` connected in same CDC project.
- Destination VM name does not already exist in vCenter.
- Destination datastore/host/network resolve in vCenter.
- `source_nic` discovered via SSH from source host.
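One possible way to discover `source_nic` is to take the interface that carries the default route. This is a sketch under an assumption: the guide does not say which NIC to pick, so treating the default-route interface as the source NIC is a guess that fits single-NIC hosts. The function name `default_route_nic` is illustrative.

```bash
# Sketch only: pick the interface that carries the default route.
# Reads `ip -4 route show default` output on stdin and prints the device name.
default_route_nic() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++)
      if ($i == "dev") { print $(i + 1); exit }
  }'
}
```

Usage might look like `ssh root@"$SOURCE_IP" ip -4 route show default | default_route_nic`, with `SOURCE_IP` read live from vCenter for the current run. On multi-homed hosts the result should be checked by hand.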
## Post-Migration Validation and Cleanup Pattern
- Validate destination login before cleanup:
  - get destination guest IP from vCenter
  - verify SSH/login works
  - if guest IP empty, keep polling and do not skip validation
  - do not mark run complete before validation result is recorded
- Before deleting destination VM:
  - always prompt user for explicit confirmation
  - never delete destination VM without that confirmation in the same run
- For delete path:
  - resolve source VM ID and destination VM ID separately
  - abort if IDs match
  - power off destination if needed
  - delete destination by explicit VM ID
  - verify destination removed and source still exists
- Always run project cleanup after terminal migration state:
  - archive completed operation
  - run global offline-host cleanup
  - cleanup must target source VM named in current request only
  - if source/helper entries are still connected, disconnect them first (for example uninstall CMC on source/helper, or power off/delete the helper VM), then rerun cleanup
  - if stale connected state persists after VM removal/power-off, wait for heartbeat timeout and rerun cleanup until removed
  - verify helper entry from this run (`migrateops-<opid>-<source-system-name>`) is removed
- Completion gate:
  - do not report run complete until archive + cleanup verification are done
  - always provide read-only final listing for source, destination, access node, helper:
    - CDC status (`present` or `cleaned up`)
    - vCenter status (`present` or `cleaned up`, and if present include power state + IP)
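The delete-path guard above can be sketched as a pure decision step that runs before any power-off or delete. This is a minimal sketch: the function name `delete_decision` and the `vm-101`-style IDs in the usage note are illustrative, and how the IDs are resolved and how the delete itself is issued are left out.

```bash
# Sketch only: gate destination deletion on distinct IDs and same-run confirmation.
# $1 = source VM ID, $2 = destination VM ID, $3 = user's answer in this run
delete_decision() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "abort: unresolved VM ID"
    return 1
  fi
  if [ "$1" = "$2" ]; then
    echo "abort: source and destination IDs match"
    return 1
  fi
  case "$3" in
    yes) echo "delete $2" ;;               # explicit confirmation in this run
    *)   echo "deletion skipped by user" ;; # anything else keeps the destination
  esac
}
```

For instance, `delete_decision vm-101 vm-202 yes` prints `delete vm-202`, while an empty or missing answer prints `deletion skipped by user`, which matches the wording the guide asks to record.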
## Default Behavior Contract
- Perform automatically on every VMware compute run:
  - destination login validation
  - operation archive
  - offline-host cleanup and source/helper stale verification
- Still require explicit user confirmation before destination delete:
  - always prompt
  - if no confirmation, keep destination and record `deletion skipped by user`

47
cdsmcp/esxvm-runs.md Normal file

@@ -0,0 +1,47 @@
# ESX / vCenter Run Learnings
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
## Entry Rule
- Add an entry only when the run changed workflow behavior, uncovered a new failure pattern, or confirmed a new required check.
- Do not add routine successful runs with no new learning.
## Run Learning: Operation 14208
- Learning: `wait-for-vm-registration` helper registration can be the longest early-stage step.
- Action for future runs: if step 6/7 is slow, verify helper VM existence in vCenter before remediation.
## Run Learning: Operation 14213
- Learning: completion response was sent before destination delete prompt, operation archive, and offline-host cleanup.
- Action for future runs: completion must be gated on delete prompt handling, archive, and cleanup verification.
## Run Learning: Operation 14214
- Learning: stale helper/source entries can remain and require explicit offline-host cleanup reruns.
- Action for future runs: rerun cleanup until stale entries are actually removed.
## Run Learning: Operation 14215
- Learning: helper creation can fail with vSphere `ReconfigVM` errors and recover via controlled retries.
- Action for future runs:
  - remove leftover helper artifacts before retry
  - avoid manual helper power actions during active task execution
  - keep waiting while heartbeats/progress still advance
## Run Learning: Operation 14216
- Learning: destination login validation and post-run cleanup were missed before completion reporting.
- Action for future runs: always perform destination login validation + archive + cleanup automatically before declaring completion.
## Run Learning: Operation 14218
- Learning: source/helper entries can remain `connected` with stale `last_checkin` after migration.
- Action for future runs: enforce heartbeat-timeout waits and rerun cleanup until source/helper entries are removed.
## Run Learning: Operation 14221
- Learning: source/helper CDC entries for the current request can be removed cleanly by timeout-based cleanup loop after archive, and final 4-entity status listing is effective for closure.
- Action for future runs:
  - always provide final source/destination/access/helper listing across CDC and vCenter
  - keep destination delete as explicit user-confirmed step only
## Run Learning: Operation 14223
- Learning: on RHEL 10, CMC reinstall via installer script can fail when repo metadata is unavailable; local RPM install + explicit CDC endpoint config + manual register can recover the source in-place.
- Action for future runs:
  - if Linux installer fails on repo metadata, check cached `mtdi-daemon` and `galaxy-migrate` RPMs and install directly
  - enforce `galaxy_complete_endpoint` before manual register
  - proceed with migrateops only after source host is confirmed connected in CDC

10
cdsmcp/esxvm.md Normal file

@@ -0,0 +1,10 @@
# ESX / vCenter Notes Index
This file is now an index only.
- Guide-only workflow and rules: `/home/aw/code/cds/cdsmcp/esxvm-guide.md`
- Run-specific learnings log: `/home/aw/code/cds/cdsmcp/esxvm-runs.md`
Update policy:
- After each run, update `esxvm-guide.md` only for guide/rule changes.
- After each run, update `esxvm-runs.md` only if the run produced a new learning.

109
cdsmcp/vmw.yaml Normal file

@@ -0,0 +1,109 @@
#
# VMware Compute MigrateOps template
# Rules:
# 1) Replace all client references consistently:
# - config.system_name
# - migrateops_vmware_compute.compute.vm_name
# - operations[].name
# - cleanup targeting must use the source VM from the current user request only
# 1a) Default CMC migration sequence for any specified machine:
# - clean up CDC project state for that machine (remove stale/offline registration context)
# - reinstall CMC Linux on that machine
# - then perform migration setup/create
# 2) Verify integration_name is valid in the target CDC project before creating operation.
# 3) Default access node is "atvm-linux-h2h":
# - VM must be powered on in vCenter
# - CMC must be installed/connected in the same CDC project
# 4) Source NIC must be discovered from the source client (do not assume ens192).
# 5) Preflight checks before create:
# - confirm source VM power state in vCenter first; power on before IP discovery/SSH steps
# - destination vm_name must not already exist
# - datastore/host/network names must resolve in vCenter
# - source client + access node must both be connected in same CDC project
# - use only standard vCenter credentials/session for vCenter actions
# (do not use 192.168.3.190 for vCenter actions; reserved for Cypress ATVM automation)
# - IP handling policy:
# * 192.168.0.201 is vCenter only
# * 192.168.3.190 is ATVM automation only
# * 192.168.3.191 is default ATVM target reference
# * any other VM IP must be read live from vCenter for the current run only
# and must not be retained/reused as a future default
# 6) Post-submit approval behavior (default):
# - start monitoring as soon as operation create succeeds
# - auto-approve cutover immediately when final-synchronization requests approval
# - only use manual approval if explicitly requested by user
# - patience rule while monitoring:
# * if heartbeat/progress is advancing, keep waiting and do not intervene
# * allow longer wait windows for helper VM deploy/registration-related steps
# * intervene only on terminal failure, confirmed blocker, or prolonged no-progress
# 7) Post-migration validation and cleanup behavior (default):
# - verify SSH login to the newly migrated VM first (using vCenter guest IP)
# - if vCenter guest IP is initially empty, keep polling until available; do not skip login validation
# - never report run completion before destination login validation is recorded
# - only target the newly migrated VM for power-off/delete, never the source VM
# - resolve and compare source/destination VM IDs; abort the delete if IDs match
# - prompt user for confirmation before power-off + delete of migrated VM
# - prompt user even if they did not explicitly ask for deletion in same request
# - never delete destination VM without explicit user confirmation in that run
# - archive the completed MigrateOps operation after migration reaches terminal state
# - mandatory: run global offline-host cleanup at end of successful runbook
# even if source host is offline (remove all offline CMC host records)
# - if source/helper entries are still connected in CDC, disconnect first
# (for example uninstall CMC on source/helper or power off/delete helper VM),
# then rerun offline-host cleanup until source/helper entries are removed
# - if CDC still shows source/helper as connected but last_checkin is stale after
# source/helper are already powered off/deleted, wait for heartbeat timeout and
# rerun offline-host cleanup in a loop until those entries are removed
# - verify source host + helper host stale/offline duplicates from this run are removed
# - verify helper CMC host entries from the run are removed
# (e.g. migrateops-<operation-id>-<source-system-name>)
# - if helper entry remains, ensure helper VM is absent/powered off and rerun offline cleanup
# - mandatory: remove the source VM from the current request from CDC host list during cleanup
# (do not reuse source VM names from prior runs)
# - mandatory post-run reporting: always include a read-only status listing for
# source VM, destination VM, access node, and helper VM across both CDC and vCenter
# with explicit present/cleaned-up state
# - do not report run completion until cleanup verification is done and destination VM
# deletion is either completed or explicitly skipped by user decision
# - default autonomous behavior for every run:
# * always perform login validation + archive + offline-host cleanup automatically
# * always prompt user before deleting destination VM and record explicit keep/delete decision
#
operations:
  - recipe: "MIGRATEOPS_VMWARE_COMPUTE"
    config:
      migrateops_vmware_compute:
        access_node:
          system_name: "atvm-linux-h2h"
        compute:
          datastore: "AutomatedTest-VMBootImgComputeMigration-Gold"
          host: "192.168.1.165"
          datacenter: "CDSHQ-Eng"
          vm_name: "atvm-codextest-vm-migrated"
        migration:
          qos_level: "RELENTLESS"
          auto_resync_interval: "600s"
        cmchelper:
          network: "VM Network"
          ip_config:
            use_static_ip: true
            address: "192.168.3.195/22"
            dns_servers:
              - "8.8.8.8"
            gateway: "192.168.0.1"
          content_library: "vc-cmchelper"
          template_name: "vc-cmchelper-vm"
          install_via_access_node: true
        network:
          adapters:
            - network: "VM Network"
              # Must be discovered from source host via SSH before create.
              source_nic: "REPLACE_WITH_SOURCE_NIC"
              transfer_ip: true
              transfer_mac: false
              adapter_type: "VMXNET3"
        keep_source_powered_on: false
      system_name: "atvm-codextest-vm"
      integration_name: "vCenter201"
    name: "atvm-codextest-vm"
    notes: ""