Require live vCenter IP checks during ATVM test

Cirrus Codex
2026-05-14 16:50:36 -04:00
parent 4ec0e38678
commit eda18702f6


@@ -67,6 +67,9 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - After source-host kernel inspection is complete, power the source VM off and re-verify in vCenter that it is powered off before cloning.
 - Detaching the 2 FC PCI passthrough adapters from the cloned VM is mandatory before any guest boot or guest-side change.
 - Verify in vCenter that both FC passthrough devices are absent before proceeding past the clone-prep stage.
+- Always use live vCenter guest-tools data to confirm the current clone IP before any SSH or polling attempt.
+- Re-check live vCenter guest-tools IP after clone power-on, after switching networking from static to DHCP, and after any reboot before attempting SSH.
+- Do not assume the previous IP is still valid after a reboot or network change.
 - Cleanup actions that remove hosts from CMC must target only the cloned host used in the current test run.
 - Treat migration session creation failures (for either migration #1 or migration #2) as blocker-fail events.
@@ -93,51 +96,55 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 16. Detach the 2 FC PCI adapters from the cloned VM.
 17. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
 18. Power on clone.
-19. SSH to `<INITIAL_CLONE_HOST_OR_IP>` using credentials from `/home/aw/code/cds/.env.credentials.local`.
-20. Change OS hostname to clone name, replacing `.` with `-`.
-21. Convert networking from static IP to DHCP.
-22. Remove/clean static IP configuration references.
-23. Reboot clone.
-24. Find DHCP address and verify it is not `<INITIAL_CLONE_HOST_OR_IP>`.
-25. If still `<INITIAL_CLONE_HOST_OR_IP>`, fix static config cleanup and repeat reboot/verify.
-26. Continue all remaining steps using DHCP IP and credentials from `/home/aw/code/cds/.env.credentials.local`.
-27. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
-28. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
-29. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-30. Wait for initial sync completion.
-31. Check available kernels again using full candidate listing (not latest-only output).
-32. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
-33. Verify matching dev/header packages for the selected first-upgrade target are available.
-34. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
-35. Verify running kernel and installed dev/header packages match the selected first-upgrade version.
-36. If versions do not match exactly, stop as blocker-fail.
-37. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
-38. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
-39. Write sample data to source 10GB disk.
-40. Trigger sync and confirm tracking status using `cirrusdata`.
-41. Uninstall CMC.
-42. Post-uninstall cleanup checkpoint:
+19. Query vCenter guest-tools for the live clone IP.
+20. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
+21. Change OS hostname to clone name, replacing `.` with `-`.
+22. Convert networking from static IP to DHCP.
+23. Remove/clean static IP configuration references.
+24. Reboot clone.
+25. Query vCenter guest-tools again for the new live clone IP.
+26. SSH to the new live clone IP and verify the DHCP state.
+27. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
+28. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/aw/code/cds/.env.credentials.local`.
+29. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
+30. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
+31. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+32. Wait for initial sync completion.
+33. Check available kernels again using full candidate listing (not latest-only output).
+34. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
+35. Verify matching dev/header packages for the selected first-upgrade target are available.
+36. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
+37. Query vCenter guest-tools again for the live clone IP after reboot.
+38. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
+39. If versions do not match exactly, stop as blocker-fail.
+40. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
+41. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
+42. Write sample data to source 10GB disk.
+43. Trigger sync and confirm tracking status using `cirrusdata`.
+44. Uninstall CMC.
+45. Post-uninstall cleanup checkpoint:
 - Run MCP offline-host cleanup for `skidamarink`.
 - If the cloned VM is still marked online after uninstall, remove that cloned VM host entry specifically via MCP (target only this test clone host).
 - Because CMC status can lag behind VM state, poll briefly for status transition; if still online, perform targeted MCP host removal for the tested clone.
-43. Check available kernels.
-44. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
-45. Verify matching dev/header packages for the selected latest-upgrade target are available.
-46. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
-47. Verify running kernel and installed dev/header packages match the selected latest-upgrade version.
-48. If versions do not match exactly, stop as blocker-fail.
-49. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
-50. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
-51. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-52. Confirm machine is online in `skidamarink` using `cirrusdata`.
-53. SSH and verify MTDI, Galaxy Migrate services/driver are up.
-54. Success-path cleanup only: power off cloned machine.
-55. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
-56. Success-path final cleanup checkpoint:
+46. Check available kernels.
+47. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
+48. Verify matching dev/header packages for the selected latest-upgrade target are available.
+49. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
+50. Query vCenter guest-tools again for the live clone IP after reboot.
+51. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
+52. If versions do not match exactly, stop as blocker-fail.
+53. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
+54. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
+55. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+56. Confirm machine is online in `skidamarink` using `cirrusdata`.
+57. SSH and verify MTDI, Galaxy Migrate services/driver are up.
+58. Success-path cleanup only: power off cloned machine.
+59. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
+60. Success-path final cleanup checkpoint:
 - Run MCP offline-host cleanup for `skidamarink`.
 - If the cloned VM is still marked online at the end of the test, remove that cloned VM host entry specifically via MCP (target only this test clone host).
 - Because CMC status can lag behind VM deletion/power-off, wait/poll briefly first; if still online, perform targeted MCP host removal for the tested clone.
-57. Blocker-fail path after clone creation:
+61. Blocker-fail path after clone creation:
 - Stop test immediately after recording failure details.
 - Leave cloned VM powered on and present in inventory for manual inspection.
 - Do not run clone power-off/delete steps in blocker-fail path.
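The live-IP rules added in this commit reduce to one loop. After every power-on, static-to-DHCP switch, or reboot, re-read the clone's IP from vCenter guest tools, and only attempt SSH once a fresh value is reported. The following is a minimal Python sketch of that loop. It is not the harness's actual code. `fetch_guest_ip` is a hypothetical callable that stands in for the vCenter query; with pyVmomi, that query would typically read `vm.guest.ipAddress`. The names `wait_for_live_ip` and `LiveIpTimeout` and the timeout and interval defaults are illustrative assumptions.

```python
import time
from typing import Callable, Optional


class LiveIpTimeout(Exception):
    """Raised when guest tools never report a usable IP within the timeout."""


def wait_for_live_ip(
    fetch_guest_ip: Callable[[], Optional[str]],
    stale_ip: Optional[str] = None,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll live vCenter guest-tools data until it reports a fresh IP.

    fetch_guest_ip: hypothetical wrapper around the vCenter guest-tools query;
        returns the currently reported IP, or None if none is reported yet.
    stale_ip: an address that must NOT be accepted (e.g. the old static IP
        after switching to DHCP). It is treated as "not ready yet".
    """
    deadline = clock() + timeout_s
    while True:
        ip = fetch_guest_ip()  # always a live read, never a cached value
        if ip and ip != stale_ip:
            return ip
        if clock() >= deadline:
            raise LiveIpTimeout(
                f"no fresh IP within {timeout_s}s (last={ip!r}, stale={stale_ip!r})"
            )
        sleep(interval_s)
```

Pass the old static address as `stale_ip` only for the post-DHCP check. In that case, a timeout corresponds to the "clone still reports the previous static IP" branch. After the later kernel-upgrade reboots, a DHCP lease may keep the same address, so leave `stale_ip` unset there and simply re-query.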
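The kernel-selection steps can be expressed as a filter over the full candidate listing: same major required, same minor preferred, and, for the first upgrade, not the latest candidate. This Python sketch makes several assumptions:

- Candidates are kernel release strings of a single flavor, such as `5.15.0-105-generic`.
- They are ordered with a naive digit-run comparison instead of the package manager's own version ordering.
- The plan does not say which non-latest candidate to use for the first upgrade, so the sketch picks the newest non-latest one.

`select_target`, `version_key`, and the `stage` values are hypothetical names.

```python
import re
from typing import List, Optional, Tuple


def version_key(v: str) -> Tuple[int, ...]:
    # Naive ordering: every run of digits, compared as integers.
    return tuple(int(n) for n in re.findall(r"\d+", v))


def major_minor(v: str) -> Tuple[int, int]:
    key = version_key(v)
    return key[0], key[1]


def select_target(running: str, candidates: List[str], stage: str) -> Optional[str]:
    """Pick an upgrade target for stage "first" or "latest" (hypothetical API)."""
    if stage not in ("first", "latest"):
        raise ValueError(f"unknown stage: {stage!r}")
    run_key = version_key(running)
    # Only kernels newer than the running one, sorted oldest -> newest
    # (ties broken by the string itself so the order is deterministic).
    newer = sorted(
        {c for c in candidates if version_key(c) > run_key},
        key=lambda c: (version_key(c), c),
    )
    # Same major version is required.
    same_major = [c for c in newer if major_minor(c)[0] == run_key[0]]
    # Same minor is preferred when it has enough candidates for this stage.
    same_minor = [c for c in same_major if major_minor(c) == major_minor(running)]
    needed = 2 if stage == "first" else 1
    pool = same_minor if len(same_minor) >= needed else same_major
    if len(pool) < needed:
        return None  # no valid target; the caller decides how to report it
    # "first": newest candidate that is not the latest (assumed choice).
    # "latest": the latest candidate in the preferred pool.
    return pool[-2] if stage == "first" else pool[-1]
```

If no valid target exists, the sketch returns `None` and leaves the decision to the caller. The plan does not define what to do when the filtered list is empty.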
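After each kernel-upgrade reboot, the exact-match check becomes a plain string comparison once every version is in the same format. In this sketch, `running` is the `uname -r` output from the rebooted clone. `packages` maps each dev/header package to the kernel release it provides. Building that mapping is distro-specific and is assumed to be done by the caller. On RHEL-family systems, for example, an `rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n'` query typically prints the release in the same form as `uname -r`. `verify_kernel_match` and `BlockerFail` are hypothetical names.

```python
from typing import Dict, List


class BlockerFail(Exception):
    """Hypothetical marker for a blocker-fail stop."""


def verify_kernel_match(target: str, running: str, packages: Dict[str, str]) -> None:
    """Raise BlockerFail unless every version equals the selected target exactly.

    target: the kernel release selected for this upgrade stage.
    running: `uname -r` output from the rebooted clone.
    packages: dev/header package name -> kernel release it provides,
        already normalized to `uname -r` format by the caller (assumption).
    """
    running = running.strip()  # `uname -r` output ends with a newline
    mismatches: List[str] = []
    if running != target:
        mismatches.append(f"running kernel {running!r} != target {target!r}")
    if not packages:
        mismatches.append("no dev/header packages reported")
    for name, version in sorted(packages.items()):
        if version != target:
            mismatches.append(f"{name} {version!r} != target {target!r}")
    if mismatches:
        raise BlockerFail("; ".join(mismatches))
```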
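Both cleanup checkpoints follow the same shape:

1. Run offline-host cleanup.
2. Poll briefly to absorb CMC status lag.
3. If the host is still online, remove it, targeting only the one clone host from this run.

The Python sketch below accepts that single host as its only possible removal target. `run_offline_cleanup`, `is_online`, and `remove_host` are hypothetical callables standing in for the MCP operations. The 60-second grace period is an assumed value, since the plan only says to poll briefly. The second offline-cleanup pass, run when the host goes offline only after polling, is an addition of this sketch and not a step in the plan.

```python
import time
from typing import Callable


def cleanup_checkpoint(
    clone_host: str,
    project: str,
    run_offline_cleanup: Callable[[str], None],
    is_online: Callable[[str], bool],
    remove_host: Callable[[str], None],
    poll_timeout_s: float = 60.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if a targeted removal of clone_host was performed."""
    run_offline_cleanup(project)
    deadline = clock() + poll_timeout_s
    polled = False
    while is_online(clone_host):
        if clock() >= deadline:
            # Still online after the grace period: remove ONLY this test clone.
            remove_host(clone_host)
            return True
        polled = True
        sleep(poll_interval_s)
    if polled:
        # Assumption: the host went offline after the first pass, so run the
        # offline-host cleanup once more to pick up its now-offline entry.
        run_offline_cleanup(project)
    return False
```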
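The split between the success path and the blocker-fail path determines whether the clone is torn down. This Python sketch assumes that migration-session creation reports an HTTP-style status code. The plan mentions 5xx errors but does not specify the interface. `require_session_created`, `run_after_clone`, and the callables passed to it are hypothetical.

```python
from typing import Callable, Iterable


class BlockerFail(Exception):
    """Hypothetical marker for a blocker-fail stop."""


def require_session_created(status_code: int) -> None:
    # Any non-2xx status, including 5xx service errors, is a hard stop.
    if not 200 <= status_code < 300:
        raise BlockerFail(f"migration session creation failed: HTTP {status_code}")


def run_after_clone(
    steps: Iterable[Callable[[], None]],
    power_off_clone: Callable[[], None],
    delete_clone: Callable[[], None],
    record_failure: Callable[[str], None],
) -> str:
    try:
        for step in steps:
            step()
    except BlockerFail as exc:
        # Blocker-fail path: record details and stop; the clone stays powered
        # on and in inventory for manual inspection (no power-off/delete).
        record_failure(str(exc))
        return "blocker-fail"
    # Success path only: tear the clone down.
    power_off_clone()
    delete_clone()
    return "success"
```

Only the success branch calls `power_off_clone` and `delete_clone`. Any other exception therefore propagates without teardown. The plan does not say whether such exceptions should count as blocker-fail.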