Clarify source VM power-off gate before cloning

Cirrus Codex
2026-05-13 20:51:23 -04:00
parent 0238257a55
commit a96284ea1c


@@ -62,6 +62,7 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - If the clone is missing or identity is uncertain, stop and do not delete any other VM.
 - If any blocker occurs after clone creation, stop the test and leave the cloned VM powered on for manual inspection.
 - Do not delete or power off the clone on blocker-fail outcomes.
+- After source-host kernel inspection is complete, power the source VM off and re-verify in vCenter that it is powered off before cloning.
 - Detaching the 2 FC PCI passthrough adapters from the cloned VM is mandatory before any guest boot or guest-side change.
 - Verify in vCenter that both FC passthrough devices are absent before proceeding past the clone-prep stage.
 - Cleanup actions that remove hosts from CMC must target only the cloned host used in the current test run.
@@ -69,7 +70,7 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 ## Test Procedure
 1. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
-2. Confirm source host is powered on. If it is powered off, power it on.
+2. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.
 3. SSH to the source host and check available kernel versions on the source before cloning.
 4. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from `check-update`).
 5. Candidate scope rule:
@@ -80,60 +81,61 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 8. Gate check:
 - If step 7 triggered a stop condition, execute no further steps.
 - If no stop condition was triggered, continue with the next step.
-9. Confirm source host is powered off (required pre-clone state).
-10. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
-11. Before cloning, check whether that clone name already exists in vCenter.
-12. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
-13. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only.
-14. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
-15. Detach the 2 FC PCI adapters from the cloned VM.
-16. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
-17. Power on clone.
-18. SSH to `<INITIAL_CLONE_HOST_OR_IP>` using credentials from `/home/aw/code/cds/.env.credentials.local`.
-19. Change OS hostname to clone name, replacing `.` with `-`.
-20. Convert networking from static IP to DHCP.
-21. Remove/clean static IP configuration references.
-22. Reboot clone.
-23. Find DHCP address and verify it is not `<INITIAL_CLONE_HOST_OR_IP>`.
-24. If still `<INITIAL_CLONE_HOST_OR_IP>`, fix static config cleanup and repeat reboot/verify.
-25. Continue all remaining steps using DHCP IP and credentials from `/home/aw/code/cds/.env.credentials.local`.
-26. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
-27. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
-28. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-29. Wait for initial sync completion.
-30. Check available kernels again using full candidate listing (not latest-only output).
-31. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
-32. Verify matching dev/header packages for the selected first-upgrade target are available.
-33. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
-34. Verify running kernel and installed dev/header packages match the selected first-upgrade version.
-35. If versions do not match exactly, stop as blocker-fail.
-36. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
-37. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
-38. Write sample data to source 10GB disk.
-39. Trigger sync and confirm tracking status using `cirrusdata`.
-40. Uninstall CMC.
-41. Post-uninstall cleanup checkpoint:
+9. After source-host inspection is complete, power the source VM off.
+10. Confirm in vCenter that the source host is powered off before cloning.
+11. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
+12. Before cloning, check whether that clone name already exists in vCenter.
+13. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
+14. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only.
+15. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
+16. Detach the 2 FC PCI adapters from the cloned VM.
+17. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
+18. Power on clone.
+19. SSH to `<INITIAL_CLONE_HOST_OR_IP>` using credentials from `/home/aw/code/cds/.env.credentials.local`.
+20. Change OS hostname to clone name, replacing `.` with `-`.
+21. Convert networking from static IP to DHCP.
+22. Remove/clean static IP configuration references.
+23. Reboot clone.
+24. Find DHCP address and verify it is not `<INITIAL_CLONE_HOST_OR_IP>`.
+25. If still `<INITIAL_CLONE_HOST_OR_IP>`, fix static config cleanup and repeat reboot/verify.
+26. Continue all remaining steps using DHCP IP and credentials from `/home/aw/code/cds/.env.credentials.local`.
+27. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
+28. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
+29. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+30. Wait for initial sync completion.
+31. Check available kernels again using full candidate listing (not latest-only output).
+32. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
+33. Verify matching dev/header packages for the selected first-upgrade target are available.
+34. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
+35. Verify running kernel and installed dev/header packages match the selected first-upgrade version.
+36. If versions do not match exactly, stop as blocker-fail.
+37. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
+38. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
+39. Write sample data to source 10GB disk.
+40. Trigger sync and confirm tracking status using `cirrusdata`.
+41. Uninstall CMC.
+42. Post-uninstall cleanup checkpoint:
 - Run MCP offline-host cleanup for `skidamarink`.
 - If the cloned VM is still marked online after uninstall, remove that cloned VM host entry specifically via MCP (target only this test clone host).
 - Because CMC status can lag behind VM state, poll briefly for status transition; if still online, perform targeted MCP host removal for the tested clone.
-42. Check available kernels.
-43. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
-44. Verify matching dev/header packages for the selected latest-upgrade target are available.
-45. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
-46. Verify running kernel and installed dev/header packages match the selected latest-upgrade version.
-47. If versions do not match exactly, stop as blocker-fail.
-48. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
-49. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
-50. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-51. Confirm machine is online in `skidamarink` using `cirrusdata`.
-52. SSH and verify MTDI, Galaxy Migrate services/driver are up.
-53. Success-path cleanup only: power off cloned machine.
-54. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
-55. Success-path final cleanup checkpoint:
+43. Check available kernels.
+44. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
+45. Verify matching dev/header packages for the selected latest-upgrade target are available.
+46. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
+47. Verify running kernel and installed dev/header packages match the selected latest-upgrade version.
+48. If versions do not match exactly, stop as blocker-fail.
+49. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
+50. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
+51. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+52. Confirm machine is online in `skidamarink` using `cirrusdata`.
+53. SSH and verify MTDI, Galaxy Migrate services/driver are up.
+54. Success-path cleanup only: power off cloned machine.
+55. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
+56. Success-path final cleanup checkpoint:
 - Run MCP offline-host cleanup for `skidamarink`.
 - If the cloned VM is still marked online at the end of the test, remove that cloned VM host entry specifically via MCP (target only this test clone host).
 - Because CMC status can lag behind VM deletion/power-off, wait/poll briefly first; if still online, perform targeted MCP host removal for the tested clone.
-56. Blocker-fail path after clone creation:
+57. Blocker-fail path after clone creation:
 - Stop test immediately after recording failure details.
 - Leave cloned VM powered on and present in inventory for manual inspection.
 - Do not run clone power-off/delete steps in blocker-fail path.
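
Reviewer note: the clone-naming rule (base name, then `-1`, `-2`, `-N`) and the hostname rule (replace `.` with `-`) in the revised procedure can be sketched in Python as below. This is only an illustration under assumptions: the function names `resolve_clone_name` and `hostname_for_clone` are hypothetical, and `existing_names` stands in for whatever list of VM names is read from vCenter, which the procedure does not specify how to obtain.

```python
def resolve_clone_name(base: str, existing_names: set[str]) -> str:
    """Return base if it is unused, else the first free base-1, base-2, ... name.

    existing_names is assumed to hold every VM name already present in
    vCenter; how that set is fetched is outside this sketch.
    """
    if base not in existing_names:
        return base
    suffix = 1
    while f"{base}-{suffix}" in existing_names:
        suffix += 1
    return f"{base}-{suffix}"


def hostname_for_clone(clone_name: str) -> str:
    """Derive the guest OS hostname by replacing every '.' with '-'."""
    return clone_name.replace(".", "-")


if __name__ == "__main__":
    # Hypothetical inventory: the base name and the -1 suffix are already taken.
    taken = {"aw999-ubuntu24.04", "aw999-ubuntu24.04-1"}
    name = resolve_clone_name("aw999-ubuntu24.04", taken)
    print(name)                      # aw999-ubuntu24.04-2
    print(hostname_for_clone(name))  # aw999-ubuntu24-04-2
```

The resolved value is a bare VM name, which matches the rule that the clone destination must not be an inventory path.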