docs(cds): consolidate cirrus data cloud reference docs

Cirrus Codex
2026-05-14 12:31:19 -04:00
parent 6e74a0262b
commit 4ecdcb9f80
11 changed files with 37 additions and 83 deletions


@@ -0,0 +1,31 @@
# Cirrus Data Cloud Reference
This folder contains reusable VMware/vCenter and MigrateOps reference material.
## Start Here
- VMware compute migration workflow:
  - `vmware-migrateops-guide.md`
- Direct vCenter VM hardware review/change workflow:
  - `esxvm-guide.md`
- VM lookup and FC/disk assignment workflow:
  - `vm-lookup-and-assignment.md`
- CMC install and uninstall reference:
  - `cmc-install-reference.md`
  - includes the Windows SSH + PowerShell install/reinstall path
- iSCSI target cleanup reference:
  - `iscsi-cleanup-reference.md`
- Run-specific learnings:
  - `run-learnings.md`
- Base operation template:
  - `templates/vmw.yaml`
## Layout
- `templates/`
  - reusable operation template inputs and starter skeletons
- `artifacts/logs/`
  - runtime log artifacts
## Update Policy
- Update workflow docs when rules, defaults, or checklists change.
- Update `run-learnings.md` only when a run adds a new lasting lesson.
- Keep `templates/vmw.yaml` as the starting template for VMware compute MigrateOps operations.


@@ -0,0 +1,81 @@
ATVM Local Data Disk CDC Migration Report
Host: atvm-codextest-vm
Date: 03-18-2026
VM IP during run: 192.168.3.191
Project: Skidamarink
Guest OS: Oracle Linux Server 9.7
Summary
- Powered on the VM and reinstalled CMC for Linux.
- Used /dev/sdb as the source data disk and /dev/sdc as the destination data disk.
- Created an ext4 filesystem on /dev/sdb and mounted it at /mnt/disk1.
- Created a local migration session for /dev/sdb -> /dev/sdc.
- Wrote about 1 GiB of data to /mnt/disk1/aw.dat with fio while the session was active.
- Waited for the session to resynchronize and return to TRACKING with remaining data at 0.
- Performed cutover, then final cutover.
- Unmounted the source mount after final cutover, deleted the session, mounted the destination disk at /mnt/destination, and verified the migrated file.
Disk Layout
- Source disk: /dev/sdb
- Source filesystem: ext4
- Source mountpoint: /mnt/disk1
- Destination disk: /dev/sdc
- Destination mountpoint after completion: /mnt/destination
CMC Reinstall
- Registration code: CMC_GCSTAGE_REGISTRATION_CODE from /home/aw/code/cds/.env.credentials.local
- Endpoint: portal.gcstage.cloud.nonprod.cirrusdata.com:443
- Result: successful
Migration Session
- Local session ID: 1
- Session UUID: 6d086de2-de8f-40c9-8a1c-c22712abce28
- Description: aw local ext4 validation
- Auto resync interval after explicit update: 5s
fio Write
- Command intent: write about 1 GiB to /mnt/disk1/aw.dat
- File created: /mnt/disk1/aw.dat
- File size: 1.0G
- fio runtime: 1924 ms
- fio bandwidth: 532 MiB/s
- fio write IOPS: 532
Source File Verification
- Source path: /mnt/disk1/aw.dat
- SHA-256: b695f7743705a16ef6c346d7d3a962d3608d10af4eae12e7474e2224efc0fe47
Key Session State
- Session created successfully.
- Initial file write produced changed data on the session.
- Explicit sync completed and the session returned to TRACKING with remaining = 0.
- Session then entered cutover/cmotion successfully.
Important Session Events
- SESSION_SYNCHRONIZED at 2026-03-19T01:45:52.643383535Z
- SESSION_ENTERED_CMOTION at 2026-03-19T01:45:53.440862866Z
- SESSION_FINAL_CUTOVER_EXECUTED at 2026-03-19T01:46:08.831400665Z
- SESSION_DELETED at 2026-03-19T01:46:12.798177270Z
Post-Cutover Handling
- /mnt/disk1 unmounted before session deletion.
- Migration session deleted successfully.
- No migration sessions remained after deletion.
Destination Verification
- Destination disk mounted at /mnt/destination from /dev/sdc
- Destination file: /mnt/destination/aw.dat
- Destination SHA-256: b695f7743705a16ef6c346d7d3a962d3608d10af4eae12e7474e2224efc0fe47
Verification Result
- Source and destination SHA-256 values match exactly.
- Data verification passed.
Final Guest Disk State
- /dev/sdb present and not mounted
- /dev/sdc mounted at /mnt/destination as ext4
Notes
- The session remained in TRACKING with changed data after the fio write until the auto-resync interval was set explicitly and a sync was applied.
- Cutover succeeded first; final cutover had to wait until the session transitioned from TRACKING to STANDING_IN.


@@ -0,0 +1,46 @@
# Cirrus Data Cloud CMC Install Reference
This file contains the CMC install, uninstall, and reinstall fallback reference used by the shared VMware workflow.
## Default Project Rule
- Default project: `Skidamarink`
- Source `/home/aw/code/cds/.env.credentials.local` and use `CMC_GCSTAGE_REGISTRATION_CODE`
- Default endpoint: `portal.gcstage.cloud.nonprod.cirrusdata.com:443`
- Use a different project code only when the user explicitly requests it in that run.
## Skidamarink Install (Linux)
```bash
source /home/aw/code/cds/.env.credentials.local
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -rgc "$CMC_GCSTAGE_REGISTRATION_CODE" -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE
```
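Because the installer is piped straight into `bash`, an empty registration code would only show up as an installer-side failure. A minimal pre-check sketch, assuming the credentials file exports the variable as described above; the helper name `require_var` is hypothetical and not part of the CMC tooling:

```bash
# Hypothetical helper: fail early when a required variable is unset or empty.
require_var() {
    # $1 = name of the variable to check (trusted input, it is eval'ed)
    eval "_value=\${$1:-}"
    if [ -z "$_value" ]; then
        echo "missing required variable: $1" >&2
        return 1
    fi
}

# Usage sketch (not executed here):
#   . /home/aw/code/cds/.env.credentials.local
#   require_var CMC_GCSTAGE_REGISTRATION_CODE || exit 1
```

Running the check before the `curl | bash` line keeps a missing credential from being mistaken for a network or repository problem.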
## Skidamarink Install (Windows)
Source `/home/aw/code/cds/.env.credentials.local` first so that `CMC_GCSTAGE_REGISTRATION_CODE` is present in the PowerShell environment. Use `ATVM_WINDOWS_TARGET_USER` plus `ATVM_WINDOWS_TARGET_PASSWORD` for Windows guest access unless the operator explicitly overrides them.
Prefer SSH to the Windows guest and execute the PowerShell command there instead of relying on VMware guest operations.
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -rgc $env:CMC_GCSTAGE_REGISTRATION_CODE -gce portal.gcstage.cloud.nonprod.cirrusdata.com:443 -pkg-mode PRE_RELEASE"
```
## Windows Reinstall Pattern
- Before install or reinstall, clean stale CDC project state for that machine when applicable instead of blindly reusing an existing registration.
- Before install or reinstall, connect over SSH with `ATVM_WINDOWS_TARGET_USER` and `ATVM_WINDOWS_TARGET_PASSWORD`.
- Check whether CMC is already installed before deciding on the next action.
- If CMC is already installed, uninstall first, then run the Windows install command again.
- After cleanup/reinstall, verify the host shows up as a fresh connected system in the CDC project before creating migrations.
- Use the same registration code and endpoint defaults as the Linux flow.
- Prefer direct SSH + PowerShell execution for both the install and uninstall commands.
## Uninstall (Linux)
```bash
curl https://get.cirrusdata.cloud/install-cmc | bash -s -- -uninstall
```
## Uninstall (Windows)
```powershell
iex "& { $(irm https://get.cirrusdata.cloud/install-cmc-win) } -uninstall"
```
## CMC Reinstall Fallback (RHEL 10)
- If installer-based reinstall fails due to repo metadata/download errors, use cached local `mtdi-daemon` and `galaxy-migrate` RPMs, start services, enforce `galaxy_complete_endpoint`, then manually register.
- Do not continue MigrateOps create until the source host is visible as connected in CDC.


@@ -0,0 +1,53 @@
# Cirrus Data Cloud ESX VM Guide
This file covers read-only review and approval rules for direct vCenter VM hardware changes.
## Scope
- Use this guide for direct vCenter VM hardware work such as:
  - creating a new VM disk
  - attaching existing datastore VMDKs
  - assigning PCI passthrough devices
## vCenter Access
- Always log into vCenter `192.168.0.201`.
- Source `/home/aw/code/cds/.env.credentials.local` and use `VCENTER_USER` plus `VCENTER_PASSWORD`.
- Standard CLI path: `/home/aw/.local/bin/govc`
- Default datacenter: `CDSHQ-Eng`
## Review-First Rule
- Treat direct VM hardware changes as a two-step workflow:
  - first gather and report findings
  - then wait for explicit operator approval before making any change
- Never create, attach, remove, or reassign hardware until the operator explicitly approves after seeing the summary.
## VM Identification Rule
- When the requested VM name is not present exactly as given, search for the closest live inventory name and report the mismatch before planning any change.
- Do not act on a guessed VM silently.
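The identification rule can be sketched as a small classifier over the live inventory. This is an illustration only: the helper name `match_vm_name` is hypothetical, and "closest" is approximated here by a case-insensitive substring match, which is an assumption rather than a rule stated in this guide.

```bash
# Hypothetical helper: classify a requested VM name against an inventory list.
# Reads one VM name per line on stdin; $1 is the requested name.
# Prints "exact <name>", or one "candidate <name>" line per case-insensitive
# substring match, or "none" when nothing resembles the request.
match_vm_name() {
    awk -v want="$1" '
        $0 == want { exact = $0 }
        index(tolower($0), tolower(want)) > 0 { cand[++n] = $0 }
        END {
            if (exact != "") { print "exact " exact; exit }
            if (n == 0) { print "none"; exit }
            for (i = 1; i <= n; i++) print "candidate " cand[i]
        }'
}

# Usage sketch (not executed here; assumes govc is already logged in):
#   govc find / -type m | sed 's|.*/||' | match_vm_name "atvm-codextest"
```

Any `candidate` result is a mismatch to report to the operator, never a name to act on.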
## Host And Datastore Checks
- Find the specified VM and verify the ESXi host it is currently running on.
- Default expected ESXi host is `192.168.1.165`, but always verify live placement before planning changes.
- Always identify and report the datastore where the VM is stored before planning disk work.
## New Disk Creation Rule
- When asked to create a new disk for a VM:
  - inspect the VM's current controller layout first
  - prefer the existing SCSI controller already backing the VM disks unless the operator explicitly asks for a different controller
  - report the controller label, controller type, and the next free unit number before proposing the change
  - report the target datastore and requested size before proposing the change
  - if the VM has snapshots, include that fact in the review summary before the operator approves
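Reporting the next free unit number is simple arithmetic once the unit numbers in use are known. A minimal sketch, assuming a 16-slot SCSI controller (units 0-15) and that the in-use unit numbers have already been read from the VM's device list; the helper name `next_free_scsi_unit` is hypothetical:

```bash
# Hypothetical helper: print the lowest free unit number on a SCSI controller.
# Arguments are the unit numbers already in use. Unit 7 is skipped because
# vSphere reserves it for the controller itself. Assumes units 0-15; prints
# nothing and returns 1 when no unit is free.
next_free_scsi_unit() {
    unit=0
    while [ "$unit" -le 15 ]; do
        if [ "$unit" -ne 7 ]; then
            in_use=no
            for used in "$@"; do
                if [ "$used" -eq "$unit" ]; then
                    in_use=yes
                fi
            done
            if [ "$in_use" = no ]; then
                echo "$unit"
                return 0
            fi
        fi
        unit=$((unit + 1))
    done
    return 1
}
```

For example, a controller with disks on units 0 through 6 reports 8 as the next free unit, not 7.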
## Existing Disk And FC Assignment Rule
- For existing-disk and FC-passthrough requests, use the workflow in `vm-lookup-and-assignment.md`.
- If the specified devices or disk files cannot be found exactly as requested, do nothing.
## Summary Requirement
- Before any direct VM hardware change, always report:
  - VM name found
  - guest hostname when available
  - ESXi host
  - datastore
  - current controller layout relevant to the request
  - snapshot state when relevant
  - exactly what would be changed
  - any blockers that mean nothing will be changed


@@ -0,0 +1,37 @@
# Cirrus Data Cloud iSCSI Cleanup Reference
This file contains the standard iSCSI target cleanup sequence for Linux guests.
## Scope
- Use this reference when the user asks to clean up saved iSCSI targets on a machine.
- This is a guest-level cleanup workflow and is separate from CMC install or CDC project cleanup.
## Standard Sequence
1. Log out of all current targets:
```bash
iscsiadm --mode node --logoutall=all
```
2. List saved node records and note the target IQN values:
```bash
iscsiadm -m node
```
3. Remove each target by IQN:
```bash
iscsiadm -m node -o delete -T <iqn>
```
4. Re-list nodes and confirm the list is empty:
```bash
iscsiadm -m node
```
## Notes
- One IQN can appear on multiple portals; deleting by IQN removes the saved node records for that target.
- `No records found` is an acceptable result if the machine is already clean.
- Unless the user asks otherwise, perform the logout step before deleting saved node records.
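The sequence above can be sketched as a script. This is an illustration under assumptions: the function names are hypothetical, the parser expects the usual `<ip>:<port>,<tpgt> <iqn>` record lines from `iscsiadm -m node`, and the wrapper needs root plus the open-iscsi tools.

```bash
# Extract unique target names from `iscsiadm -m node` output on stdin.
# Lines that are not node records (for example "No records found") are ignored.
unique_iqns() {
    awk 'NF >= 2 && $2 ~ /^(iqn|eui|naa)\./ { print $2 }' | sort -u
}

# Hypothetical wrapper for the full sequence (defined only, not run here).
cleanup_iscsi_targets() {
    iscsiadm --mode node --logoutall=all || true   # nothing logged in is fine
    iscsiadm -m node 2>/dev/null | unique_iqns | while read -r iqn; do
        iscsiadm -m node -o delete -T "$iqn"
    done
    iscsiadm -m node    # expect an empty list or "No records found"
}
```

Deduplicating first means each IQN is deleted once even when it was discovered through several portals.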


@@ -0,0 +1,47 @@
# Cirrus Data Cloud Run Learnings
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
## Entry Rule
- Add an entry only when the run changed workflow behavior, uncovered a new failure pattern, or confirmed a new required check.
- Do not add routine successful runs with no new learning.
## Run Learning: Operation 14208
- Learning: `wait-for-vm-registration` helper registration can be the longest early-stage step.
- Action for future runs: if step 6/7 is slow, verify helper VM existence in vCenter before remediation.
## Run Learning: Operation 14213
- Learning: completion response was sent before destination delete prompt, operation archive, and offline-host cleanup.
- Action for future runs: completion must be gated on delete prompt handling, archive, and cleanup verification.
## Run Learning: Operation 14214
- Learning: stale helper/source entries can remain and require explicit offline-host cleanup reruns.
- Action for future runs: rerun cleanup until stale entries are actually removed.
## Run Learning: Operation 14215
- Learning: helper creation can fail with vSphere `ReconfigVM` errors and recover via controlled retries.
- Action for future runs:
  - remove leftover helper artifacts before retry
  - avoid manual helper power actions during active task execution
  - keep waiting while heartbeats/progress still advance
## Run Learning: Operation 14216
- Learning: destination login validation and post-run cleanup were missed before completion reporting.
- Action for future runs: always perform destination login validation + archive + cleanup automatically before declaring completion.
## Run Learning: Operation 14218
- Learning: source/helper entries can remain `connected` with stale `last_checkin` after migration.
- Action for future runs: enforce heartbeat-timeout waits and rerun cleanup until source/helper entries are removed.
## Run Learning: Operation 14221
- Learning: source/helper CDC entries for the current request can be removed cleanly by a timeout-based cleanup loop after archive, and a final 4-entity status listing is effective for closure.
- Action for future runs:
  - always provide final source/destination/access/helper listing across CDC and vCenter
  - keep destination delete as explicit user-confirmed step only
## Run Learning: Operation 14223
- Learning: on RHEL 10, CMC reinstall via installer script can fail when repo metadata is unavailable; local RPM install + explicit CDC endpoint config + manual register can recover the source in-place.
- Action for future runs:
  - if Linux installer fails on repo metadata, check cached `mtdi-daemon` and `galaxy-migrate` RPMs and install directly
  - enforce `galaxy_complete_endpoint` before manual register
  - proceed with migrateops only after source host is confirmed connected in CDC


@@ -0,0 +1,12 @@
# Starter template for MIGRATEOPS_VMWARE_COMPUTE
#
# This is a shared starting point for VMware compute runs. Fill in the
# source-specific and integration-specific values before use.
system_name: ""
integration_name: ""
migrateops_vmware_compute:
  compute:
    vm_name: ""
    source_nic: ""


@@ -0,0 +1,63 @@
# Cirrus Data Cloud VM Lookup And Assignment
This file covers vCenter VM lookup responses and the workflow for assigning existing disks and PCI passthrough FC adapters to a VM.
## Cluster Scope Rule
- Only work under cluster `QACL-ATVMCypressONLY` unless explicitly told otherwise.
## Ignore VMs
- `vCLS-bf0ec6f6-c7e2-4383-b11e-9c97cec7ed44`
- `vCLS-e5b3c60e-6a1c-46a6-8357-191fc0ab8e14`
## IP Lookup Rule
- If asked about an IP address, only check powered-on VMs.
## VM Lookup Response Rule
- Unless the user explicitly asks otherwise, return VM lookup/list results only from cluster `QACL-ATVMCypressONLY`.
- For vCenter VM lookup requests, always report:
  - VM name
  - datastore name
  - VM notes/annotation
  - power state and IP, when available
## VM Disk And FC Assignment Workflow
- When asked to assign existing disks and PCI passthrough FC adapters to a specified VM, treat the request as a two-step workflow:
  - first gather and report findings
  - then wait for explicit approval before making any changes
- Always log into vCenter `192.168.0.201`.
- Find the specified VM and verify the ESXi host it is currently running on.
- If the requested VM name is not present exactly as given, search for the closest live inventory name and report the mismatch before planning any change.
- Default expected ESXi host is `192.168.1.165`, but always verify live placement before planning changes.
- Always identify and report the datastore where the VM is stored before planning disk attachment.
- Unless the operator explicitly specifies alternatives, default to these PCI passthrough FC adapters:
  - `vmhba7` (`0000:85:00.0`)
  - `vmhba8` (`0000:85:00.1`)
- Do not substitute any other PCI FC passthrough adapters if any default or operator-specified adapter cannot be found.
- Unless the operator explicitly specifies alternatives, default to these existing disks from the VM's datastore under the `atvm-DISKS` directory:
  - `atvm-DISK_1.vmdk`
  - `atvm-DISK_2.vmdk`
- Do not substitute any other disks if any default or operator-specified disk cannot be found.
- If the specified adapters or specified disks cannot be found, do nothing and report that nothing will be assigned.
- For these requests, never substitute a different datastore directory when `atvm-DISKS` is missing.
- Before any assignment action, always provide a summary of:
  - the VM found
  - any name mismatch between requested VM name and live inventory VM name
  - the ESXi host
  - the datastore
  - whether `vmhba7` and `vmhba8` were found and are usable
  - whether `atvm-DISK_1.vmdk` and `atvm-DISK_2.vmdk` were found under `atvm-DISKS`
  - exactly what would be assigned
- Never perform the assignment step until the operator explicitly approves after seeing that summary.
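The "do nothing if anything is missing" rule reduces to an exact-match check of the required names against what was actually found. A minimal sketch; the helper name `missing_items` is hypothetical, and how the listing is obtained (for example from `govc datastore.ls`) is an assumption:

```bash
# Hypothetical helper: print every required item that is absent from the
# newline-separated list on stdin. Empty output means everything was found.
missing_items() {
    found=$(cat)
    for item in "$@"; do
        # -F fixed string, -x whole line, so "atvm-DISK_1-flat.vmdk" never
        # satisfies a request for "atvm-DISK_1.vmdk"
        printf '%s\n' "$found" | grep -Fxq -- "$item" || echo "$item"
    done
}

# Usage sketch (not executed here; "$ds" is the VM's datastore):
#   govc datastore.ls -ds "$ds" atvm-DISKS \
#       | missing_items atvm-DISK_1.vmdk atvm-DISK_2.vmdk
```

Any output at all is a blocker: report it in the summary and assign nothing.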
## Common VM Credentials
- Source `/home/aw/code/cds/.env.credentials.local`
- Linux username: `ATVM_TARGET_USER`
- Linux password: `ATVM_TARGET_PASSWORD`
- Windows username: `ATVM_WINDOWS_TARGET_USER`
- Windows password: `ATVM_WINDOWS_TARGET_PASSWORD`
## Status Output Format (Power-Off/Revert/Power-On)
- `VM [vm name] was poweredOn, so I powered it off` (or `already poweredOff`)
- `Snapshot rollback completed`
- `VM [vm name] powered back on successfully`
- `Current IP: <ip>`


@@ -0,0 +1,112 @@
# Cirrus Data Cloud VMware Compute MigrateOps Guide
This file is for workflow guidance only. Do not add specific run examples here.
## Update Rule
- After every run, update this file only when a workflow rule/checklist/default behavior changed.
- Add run-specific examples and evidence to `run-learnings.md` only when that run produced a new learning.
## vCenter Access
- Address: `192.168.0.201`
- Source `/home/aw/code/cds/.env.credentials.local` and use `VCENTER_USER` plus `VCENTER_PASSWORD`
- Standard CLI path: `/home/aw/.local/bin/govc`
- Use only this standard vCenter login for vCenter actions unless explicitly instructed otherwise.
- Do not use `192.168.3.190` for vCenter actions; that machine is reserved for Cypress ATVM automation.
## IP And Power-State Policy (Mandatory)
- Before finding guest IP or attempting SSH, confirm VM power state in vCenter and power on if needed.
- Treat only these as stable references:
  - `192.168.0.201` for vCenter login only
  - `192.168.3.190` for ATVM Cypress automation only
  - `192.168.3.191` as default ATVM target reference
- Any other VM IP must be obtained live from vCenter for that run only.
- Do not carry forward ad-hoc VM IPs from previous runs in runbooks.
- When the operator refers to `192.168.3.191`, assume ATVM target SSH access should ignore host key mismatch by default with `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null`.
- When the operator refers to `192.168.3.191` for Linux SSH access, source `/home/aw/code/cds/.env.credentials.local` and use `ATVM_TARGET_USER` plus `ATVM_TARGET_PASSWORD` unless the operator explicitly overrides them.
- When the operator refers to `192.168.3.191` for Windows guest access, source `/home/aw/code/cds/.env.credentials.local` and use `ATVM_WINDOWS_TARGET_USER` plus `ATVM_WINDOWS_TARGET_PASSWORD` unless the operator explicitly overrides them.
- For Windows guest command execution, prefer SSH + PowerShell on the guest instead of VMware guest operations unless the operator explicitly requests otherwise.
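The host-key rule applies to the default ATVM target only, so it helps to keep the relaxed options tied to that one address. A minimal sketch; the helper name `ssh_opts_for` is hypothetical:

```bash
# Hypothetical helper: print SSH options for a target IP. Host key checks are
# relaxed only for the default ATVM target, as the policy above describes.
ssh_opts_for() {
    case "$1" in
        192.168.3.191)
            echo "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" ;;
        *)
            echo "" ;;
    esac
}

# Usage sketch (not executed here; options are left unquoted on purpose so
# the shell splits them into separate arguments):
#   ssh $(ssh_opts_for "$ip") "$ATVM_TARGET_USER@$ip" uname -a
```

Any other address gets no relaxed options, which matches the rule that other VM IPs are obtained live and not treated as stable.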
## Related References
- VM lookup, datastore reporting, and FC/disk assignment:
  - `vm-lookup-and-assignment.md`
- CMC install, uninstall, and reinstall fallback:
  - `cmc-install-reference.md`
## VMware Compute MigrateOps Defaults
- Use `templates/vmw.yaml` as the starting template.
- Default sequence for requested source machine:
  - clean CDC state for that machine
  - reinstall CMC on that machine
  - perform migration preflight and operation create
- If user provides a client name, replace consistently:
  - `config.system_name`
  - `migrateops_vmware_compute.compute.vm_name`
  - operation `name`
- Validate `integration_name` is active in target project before create.
- Default access node: `atvm-linux-h2h` (must be powered on in vCenter and connected in CDC).
- Always discover `source_nic` from live source host networking.
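One way to discover `source_nic` live is to read the interface that carries the default route on the source host. This is a sketch under an assumption: the guide does not say which interface counts as the source NIC, so treating the default-route interface as the answer is a guess to confirm per run. The helper name `default_route_nic` is hypothetical.

```bash
# Hypothetical helper: print the interface named after "dev" on the first
# default-route line read from stdin (output of `ip -o -4 route show default`).
default_route_nic() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Usage sketch (not executed here; run against the source host over SSH):
#   ssh "$ATVM_TARGET_USER@$ip" ip -o -4 route show default | default_route_nic
```

Hosts with several default routes or bonded interfaces need a manual check instead of the first match.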
## Approval and Monitoring Defaults
- Auto-approve cutover by default.
- Start monitoring immediately after operation create.
- Approve as soon as `final-synchronization` requests input.
- Skip auto-approval only if user explicitly asks for manual approval.
- Patience rule:
  - if heartbeat/progress is advancing, keep waiting
  - allow longer waits for helper deployment/registration steps
  - intervene only for terminal failure, confirmed blocker, or prolonged no-progress
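The patience rule can be expressed as a per-poll decision. A minimal sketch, assuming progress is available as an integer and that "prolonged no-progress" is modelled as a count of consecutive idle polls; the helper name and the idle budget are hypothetical, and terminal failures or confirmed blockers would be handled separately.

```bash
# Hypothetical decision helper for a monitoring loop.
# $1 = previous progress value, $2 = current progress value,
# $3 = consecutive polls without progress so far, $4 = allowed idle polls.
# Prints "wait" while progress advances or the idle budget is not exhausted,
# otherwise "intervene".
patience_decision() {
    if [ "$2" -gt "$1" ]; then
        echo wait
    elif [ $(( $3 + 1 )) -lt "$4" ]; then
        echo wait
    else
        echo intervene
    fi
}
```

Passing a larger idle budget during helper deployment and registration steps gives those stages the longer wait the rule asks for.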
## Preflight Checklist
- Source host connected in CDC.
- Integration exists and is active in same project.
- `atvm-linux-h2h` powered on in vCenter.
- `atvm-linux-h2h` connected in same CDC project.
- Destination VM name does not already exist in vCenter.
- Destination datastore/host/network resolve in vCenter.
- `source_nic` discovered via SSH from source host.
## CMC Preparation Rule
- For Linux sources, use the Linux SSH credential path and the Linux install/uninstall reference.
- For Windows sources, use SSH to the guest with `ATVM_WINDOWS_TARGET_USER` and `ATVM_WINDOWS_TARGET_PASSWORD`.
- For both Linux and Windows, clean stale CDC project state for that machine before reinstalling CMC unless the operator explicitly wants to reuse the existing registration.
- For Windows sources, check whether CMC is already installed before install.
- If Windows CMC is already present, uninstall first and then reinstall using the Windows PowerShell installer command.
- After reinstall on Windows, confirm the host reconnects in the CDC project as expected before creating local or remote migration sessions.
- Do not treat VMware guest operations as the default Windows execution path.
## Post-Migration Validation and Cleanup Pattern
- Validate destination login before cleanup:
  - get destination guest IP from vCenter
  - verify SSH/login works
  - if guest IP empty, keep polling and do not skip validation
  - do not mark run complete before validation result is recorded
- Before deleting destination VM:
  - always prompt user for explicit confirmation
  - never delete destination VM without that confirmation in the same run
- For delete path:
  - resolve source VM ID and destination VM ID separately
  - abort if IDs match
  - power off destination if needed
  - delete destination by explicit VM ID
  - verify destination removed and source still exists
- Always run project cleanup after terminal migration state:
  - archive completed operation
  - run global offline-host cleanup
  - cleanup must target source VM named in current request only
  - if source/helper entries are still connected, force the disconnect conditions and rerun cleanup
  - if stale connected state persists after VM removal/power-off, wait for the heartbeat timeout and rerun cleanup until removed
  - verify helper entry from this run (`migrateops-<opid>-<source-system-name>`) is removed
- Completion gate:
  - do not report run complete until archive + cleanup verification are done
  - always provide read-only final listing for source, destination, access node, helper:
    - CDC status (`present` or `cleaned up`)
    - vCenter status (`present` or `cleaned up`, and if present include power state + IP)
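The "wait heartbeat timeout and rerun cleanup until removed" step is a bounded retry loop. A minimal sketch; the helper name `retry_until` is hypothetical, and the check it runs (one that reruns cleanup and then verifies the stale entries are gone) is left to the operator because this guide does not name a cleanup command.

```bash
# Hypothetical helper: run a check command until it succeeds.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run. Returns 0 on the first success
# and 1 when every attempt failed.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Usage sketch (not executed here; entries_removed is a hypothetical check):
#   retry_until 10 60 entries_removed || echo "stale entries remain" >&2
```

Bounding the attempts keeps a permanently stale entry from stalling the run while still honoring the completion gate, since a failed loop means the run is not yet complete.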
## Default Behavior Contract
- Perform automatically on every VMware compute run:
  - destination login validation
  - operation archive
  - offline-host cleanup and source/helper stale verification
- Still require explicit user confirmation before destination delete:
  - always prompt
  - if no confirmation, keep destination and record `deletion skipped by user`
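The delete contract combines two guards: the source and destination IDs must differ, and the operator must confirm in the same run. A minimal sketch; the helper name `delete_decision` is hypothetical, treating only the literal answer `yes` as confirmation is an assumption, and aborting on an empty ID is an extra safety check not stated in this guide.

```bash
# Hypothetical guard: decide whether the destination VM may be deleted.
# $1 = source VM ID, $2 = destination VM ID, $3 = operator answer.
# Prints the decision; only "delete" authorizes removal.
delete_decision() {
    if [ -z "$1" ] || [ -z "$2" ] || [ "$1" = "$2" ]; then
        echo "abort: source and destination IDs missing or identical"
    elif [ "$3" = "yes" ]; then
        echo "delete"
    else
        echo "deletion skipped by user"
    fi
}
```

The ID check runs before the confirmation check, so even a confirmed request cannot delete the source VM by mistake.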