# Cirrus Data Cloud Run Learnings
This file stores run-specific examples only when a run produced a new learning relevant to future tasks.
## Entry Rule
- Add an entry only when the run changed workflow behavior, uncovered a new failure pattern, or confirmed a new required check.
- Do not add routine successful runs with no new learning.
## Run Learning: Operation 14208
- Learning: helper registration during `wait-for-vm-registration` can be the longest early-stage step.
- Action for future runs: if step 6/7 is slow, verify helper VM existence in vCenter before remediation.
## Run Learning: Operation 14213
- Learning: the completion response was sent before the destination delete prompt, operation archive, and offline-host cleanup had been handled.
- Action for future runs: completion must be gated on delete prompt handling, archive, and cleanup verification.
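
One way to make this gate explicit is a small checklist that refuses completion while any post-run step is outstanding. This is a minimal sketch: `PostRunState`, its field names, and both helper functions are hypothetical illustrations, not part of any CDC tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PostRunState:
    # Hypothetical flags, one per post-run step named in the learning above.
    delete_prompt_handled: bool = False
    operation_archived: bool = False
    offline_hosts_cleaned: bool = False


def missing_gates(state: PostRunState) -> List[str]:
    """Return the names of post-run steps that are still outstanding."""
    return [name for name, done in vars(state).items() if not done]


def can_report_completion(state: PostRunState) -> bool:
    """Completion may be reported only when no gate is outstanding."""
    return not missing_gates(state)
```

Calling `missing_gates` before sending the completion response gives a concrete list of what still has to happen, which is easier to act on than a bare yes/no.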
## Run Learning: Operation 14214
- Learning: stale helper/source entries can remain and require explicit offline-host cleanup reruns.
- Action for future runs: rerun cleanup until stale entries are actually removed.
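
The rerun rule can be sketched as a bounded loop that verifies removal after each cleanup pass instead of trusting the cleanup call itself. The callables `run_cleanup` and `list_stale` are assumed placeholders for whatever performs the offline-host cleanup and lists the remaining stale entries; the attempt limit is an arbitrary illustrative default.

```python
from typing import Callable, List


def rerun_cleanup_until_clear(
    run_cleanup: Callable[[], None],
    list_stale: Callable[[], List[str]],
    max_attempts: int = 5,
) -> int:
    """Rerun cleanup until no stale entries remain; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        run_cleanup()
        # Verify the result rather than assuming the cleanup call succeeded.
        if not list_stale():
            return attempt
    raise RuntimeError(
        f"stale entries remain after {max_attempts} cleanup attempts: {list_stale()}"
    )
```

Raising on exhaustion keeps a run from silently reporting success while stale entries are still listed.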
## Run Learning: Operation 14215
- Learning: helper creation can fail with vSphere `ReconfigVM` errors and recover via controlled retries.
- Action for future runs:
  - remove leftover helper artifacts before retry
  - avoid manual helper power actions during active task execution
  - keep waiting while heartbeats/progress still advance
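
The "keep waiting while progress advances" rule can be sketched as a wait loop whose timeout resets whenever the observed progress value changes. Everything here is an assumption for illustration: `poll` stands in for whatever returns a done flag and a progress or heartbeat counter, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, Tuple


def wait_while_advancing(
    poll: Callable[[], Tuple[bool, int]],
    stall_timeout_s: float = 600.0,
    interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once poll() reports done, False if progress stalls too long."""
    last_progress = None
    last_change = clock()
    while True:
        done, progress = poll()
        if done:
            return True
        now = clock()
        if progress != last_progress:
            # Progress moved: restart the stall timer instead of giving up.
            last_progress, last_change = progress, now
        elif now - last_change >= stall_timeout_s:
            return False
        sleep(interval_s)
```

Because the timer measures time since the last change rather than total elapsed time, a slow but advancing task is never interrupted; only a genuine stall returns `False` and hands control to the retry path.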
## Run Learning: Operation 14216
- Learning: destination login validation and post-run cleanup were missed before completion reporting.
- Action for future runs: always perform destination login validation, archive, and cleanup automatically before declaring completion.
## Run Learning: Operation 14218
- Learning: source/helper entries can remain `connected` with stale `last_checkin` after migration.
- Action for future runs: enforce heartbeat-timeout waits and rerun cleanup until source/helper entries are removed.
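
A minimal sketch of spotting this pattern: treat an entry as stale when it still reports `connected` but its `last_checkin` is older than the heartbeat timeout. The dictionary shape (`name`, `status`, `last_checkin` keys) is an assumed representation of a CDC host listing, and the timeout value is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_connected_entries(
    entries: List[Dict],
    now: datetime,
    heartbeat_timeout: timedelta,
) -> List[str]:
    """Names of entries shown as connected whose last check-in is too old."""
    stale = []
    for entry in entries:
        age = now - entry["last_checkin"]
        if entry["status"] == "connected" and age > heartbeat_timeout:
            stale.append(entry["name"])
    return stale
```

An empty result after the heartbeat-timeout wait is the signal that cleanup can stop; a non-empty one means another cleanup pass is still needed.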
## Run Learning: Operation 14221
- Learning: source/helper CDC entries for the current request can be removed cleanly by a timeout-based cleanup loop after archive, and a final 4-entity status listing is effective for closure.
- Action for future runs:
  - always provide final source/destination/access/helper listing across CDC and vCenter
  - keep destination delete as explicit user-confirmed step only
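
The final listing could be rendered with a small formatter like the one below. This is only a sketch: the two input dictionaries (role name to status string) are an assumed shape for whatever the CDC and vCenter queries return, and `unknown` is an illustrative placeholder for a role that was not reported.

```python
from typing import Dict

# The four entities named in the learning above.
ROLES = ("source", "destination", "access", "helper")


def final_status_listing(cdc: Dict[str, str], vcenter: Dict[str, str]) -> str:
    """Render one row per entity with its CDC and vCenter status."""
    lines = [f"{'entity':<14}{'CDC':<14}vCenter"]
    for role in ROLES:
        cdc_status = cdc.get(role, "unknown")
        vc_status = vcenter.get(role, "unknown")
        lines.append(f"{role:<14}{cdc_status:<14}{vc_status}")
    return "\n".join(lines)
```

Always emitting all four rows, even when a status is `unknown`, makes a missing check visible instead of letting it drop out of the closure report.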
## Run Learning: Operation 14223
- Learning: on RHEL 10, CMC reinstall via installer script can fail when repo metadata is unavailable; local RPM install + explicit CDC endpoint config + manual register can recover the source in-place.
- Action for future runs:
  - if Linux installer fails on repo metadata, check cached `mtdi-daemon` and `galaxy-migrate` RPMs and install directly
  - enforce `galaxy_complete_endpoint` before manual register
  - proceed with migrateops only after source host is confirmed connected in CDC
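
The first check in that fallback, confirming that both RPMs are actually cached before attempting a local install, can be sketched as follows. The cache directory is passed in by the caller because its location is not recorded here, and the `<package>*.rpm` file-name pattern is an assumption; the install, endpoint configuration, and register steps are deliberately left out.

```python
from pathlib import Path
from typing import Dict, List

# Package names taken from the learning above.
REQUIRED_PACKAGES = ("mtdi-daemon", "galaxy-migrate")


def find_cached_rpms(cache_dir: Path) -> Dict[str, List[Path]]:
    """Map each required package to the cached RPM files matching its name."""
    return {
        pkg: sorted(cache_dir.glob(f"{pkg}*.rpm")) for pkg in REQUIRED_PACKAGES
    }


def missing_packages(found: Dict[str, List[Path]]) -> List[str]:
    """Packages with no cached RPM; local install should not start if non-empty."""
    return [pkg for pkg, paths in found.items() if not paths]
```

If `missing_packages` returns anything, the local-install path is not viable on that host and the repo metadata problem has to be resolved first.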