diff --git a/tests/cmc-upgrade-kernel-test.md b/tests/cmc-upgrade-kernel-test.md
index 8c69f04..760fa95 100644
--- a/tests/cmc-upgrade-kernel-test.md
+++ b/tests/cmc-upgrade-kernel-test.md
@@ -67,6 +67,9 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - After source-host kernel inspection is complete, power the source VM off and re-verify in vCenter that it is powered off before cloning.
 - Detaching the 2 FC PCI passthrough adapters from the cloned VM is mandatory before any guest boot or guest-side change.
 - Verify in vCenter that both FC passthrough devices are absent before proceeding past the clone-prep stage.
+- Always use live vCenter guest-tools data to confirm the current clone IP before any SSH or polling attempt.
+- Re-check live vCenter guest-tools IP after clone power-on, after switching networking from static to DHCP, and after any reboot before attempting SSH.
+- Do not assume the previous IP is still valid after a reboot or network change.
 - Cleanup actions that remove hosts from CMC must target only the cloned host used in the current test run.
 - Treat migration session creation failures (for either migration #1 or migration #2) as blocker-fail events.
 
@@ -93,51 +96,55 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 16. Detach the 2 FC PCI adapters from the cloned VM.
 17. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
 18. Power on clone.
-19. SSH to `` using credentials from `/home/aw/code/cds/.env.credentials.local`.
-20. Change OS hostname to clone name, replacing `.` with `-`.
-21. Convert networking from static IP to DHCP.
-22. Remove/clean static IP configuration references.
-23. Reboot clone.
-24. Find DHCP address and verify it is not ``.
-25. If still ``, fix static config cleanup and repeat reboot/verify.
-26. Continue all remaining steps using DHCP IP and credentials from `/home/aw/code/cds/.env.credentials.local`.
-27. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
-28. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
-29. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-30. Wait for initial sync completion.
-31. Check available kernels again using full candidate listing (not latest-only output).
-32. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
-33. Verify matching dev/header packages for the selected first-upgrade target are available.
-34. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
-35. Verify running kernel and installed dev/header packages match the selected first-upgrade version.
-36. If versions do not match exactly, stop as blocker-fail.
-37. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
-38. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
-39. Write sample data to source 10GB disk.
-40. Trigger sync and confirm tracking status using `cirrusdata`.
-41. Uninstall CMC.
-42. Post-uninstall cleanup checkpoint:
+19. Query vCenter guest-tools for the live clone IP.
+20. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
+21. Change OS hostname to clone name, replacing `.` with `-`.
+22. Convert networking from static IP to DHCP.
+23. Remove/clean static IP configuration references.
+24. Reboot clone.
+25. Query vCenter guest-tools again for the new live clone IP.
+26. SSH to the new live clone IP and verify the DHCP state.
+27. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
+28. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/aw/code/cds/.env.credentials.local`.
+29. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
+30. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
+31. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+32. Wait for initial sync completion.
+33. Check available kernels again using full candidate listing (not latest-only output).
+34. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
+35. Verify matching dev/header packages for the selected first-upgrade target are available.
+36. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
+37. Query vCenter guest-tools again for the live clone IP after reboot.
+38. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
+39. If versions do not match exactly, stop as blocker-fail.
+40. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
+41. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
+42. Write sample data to source 10GB disk.
+43. Trigger sync and confirm tracking status using `cirrusdata`.
+44. Uninstall CMC.
+45. Post-uninstall cleanup checkpoint:
     - Run MCP offline-host cleanup for `skidamarink`.
     - If the cloned VM is still marked online after uninstall, remove that cloned VM host entry specifically via MCP (target only this test clone host).
     - Because CMC status can lag behind VM state, poll briefly for status transition; if still online, perform targeted MCP host removal for the tested clone.
-43. Check available kernels.
-44. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
-45. Verify matching dev/header packages for the selected latest-upgrade target are available.
-46. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
-47. Verify running kernel and installed dev/header packages match the selected latest-upgrade version.
-48. If versions do not match exactly, stop as blocker-fail.
-49. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
-50. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
-51. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-52. Confirm machine is online in `skidamarink` using `cirrusdata`.
-53. SSH and verify MTDI, Galaxy Migrate services/driver are up.
-54. Success-path cleanup only: power off cloned machine.
-55. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
-56. Success-path final cleanup checkpoint:
+46. Check available kernels.
+47. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
+48. Verify matching dev/header packages for the selected latest-upgrade target are available.
+49. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
+50. Query vCenter guest-tools again for the live clone IP after reboot.
+51. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
+52. If versions do not match exactly, stop as blocker-fail.
+53. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
+54. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
+55. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+56. Confirm machine is online in `skidamarink` using `cirrusdata`.
+57. SSH and verify MTDI, Galaxy Migrate services/driver are up.
+58. Success-path cleanup only: power off cloned machine.
+59. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
+60. Success-path final cleanup checkpoint:
     - Run MCP offline-host cleanup for `skidamarink`.
     - If the cloned VM is still marked online at the end of the test, remove that cloned VM host entry specifically via MCP (target only this test clone host).
     - Because CMC status can lag behind VM deletion/power-off, wait/poll briefly first; if still online, perform targeted MCP host removal for the tested clone.
-57. Blocker-fail path after clone creation:
+61. Blocker-fail path after clone creation:
     - Stop test immediately after recording failure details.
     - Leave cloned VM powered on and present in inventory for manual inspection.
     - Do not run clone power-off/delete steps in blocker-fail path.
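The live-IP rule this change adds (re-query vCenter guest-tools after power-on, after the static-to-DHCP switch, and after every reboot, instead of reusing a remembered address) can be sketched as a small polling helper. This is a minimal sketch under stated assumptions: `wait_for_live_ip` and its `fetch_ip` callable are hypothetical names, and `fetch_ip` stands in for whatever actually reads the guest-reported IP from vCenter (in pyVmomi, for instance, the guest-reported primary address is exposed as `vm.guest.ipAddress`). The sketch assumes `fetch_ip` returns `None` until VMware Tools reports an address.

```python
import time
from typing import Callable, Optional


def wait_for_live_ip(
    fetch_ip: Callable[[], Optional[str]],
    previous_ip: Optional[str] = None,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a (hypothetical) guest-tools IP source until it reports a usable IP.

    fetch_ip: stand-in for a live vCenter guest-tools query; returns None
        while no address has been reported yet.
    previous_ip: if given (e.g. the old static IP), that value is treated
        as stale and polling continues until a different address appears.
    clock/sleep: injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        ip = fetch_ip()  # always re-read; never reuse a cached address
        if ip and ip != previous_ip:
            return ip
        if clock() >= deadline:
            raise TimeoutError(
                f"no new live IP within {timeout_s}s (last seen: {ip!r})"
            )
        sleep(interval_s)
```

For the static-to-DHCP check, passing the old static IP as `previous_ip` keeps polling while the clone still reports it, so a timeout there corresponds to the "still reports the previous static IP" case. After the kernel-upgrade reboots, calling it without `previous_ip` fits better: DHCP may legitimately hand back the same lease, and the point there is only to re-read the address rather than assume it.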
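The kernel-target rules in the procedure (same major required, same minor preferred, and the first-upgrade target must not be the latest candidate) can likewise be sketched. Assumptions: `pick_upgrade_targets` and `version_key` are hypothetical helpers, the candidate strings are taken from the full kernel listing the steps call for, and `version_key` uses a simplified digit-run ordering rather than rpm's or dpkg's real version comparison. The sketch also picks both targets up front, whereas the procedure re-checks available kernels before the second upgrade.

```python
import re
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    # Simplified ordering (assumption): compare every run of digits numerically.
    # This is NOT rpm's or dpkg's real version-comparison algorithm.
    return tuple(int(part) for part in re.findall(r"\d+", version))


def pick_upgrade_targets(running: str, candidates: List[str]) -> Tuple[str, str]:
    """Return (first_upgrade, latest_upgrade) from a full candidate list."""
    running_key = version_key(running)
    major, minor = running_key[0], running_key[1]

    # Only kernels newer than the running one count as upgrades.
    newer = [c for c in candidates if version_key(c) > running_key]

    # Same major version is required.
    same_major = [c for c in newer if version_key(c)[0] == major]
    if not same_major:
        raise ValueError("no newer kernel with the same major version")

    # Same minor is preferred, but only if it leaves room for two distinct upgrades.
    same_minor = [c for c in same_major if version_key(c)[1] == minor]
    pool = sorted(same_minor if len(same_minor) >= 2 else same_major, key=version_key)
    if len(pool) < 2:
        raise ValueError("need two distinct candidates for a staged upgrade")

    # Any non-latest entry satisfies the first-upgrade rule; this sketch picks the oldest.
    return pool[0], pool[-1]
```

Falling back to the whole same-major pool when fewer than two same-minor candidates exist is a design choice of this sketch, made so a distinct first-upgrade target always remains; the test text does not say how that case should be resolved.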