From eacaaf0ac0c3736fe09e7d49d6aebd47799e4837 Mon Sep 17 00:00:00 2001
From: Cirrus Codex
Date: Mon, 18 May 2026 19:15:49 -0400
Subject: [PATCH] Add full execution checklist and artifact recording

---
 tests/cmc-upgrade-kernel-test.md | 85 +++++++++++++++++++++++++++-----
 1 file changed, 74 insertions(+), 11 deletions(-)

diff --git a/tests/cmc-upgrade-kernel-test.md b/tests/cmc-upgrade-kernel-test.md
index f528ebf..25fe66b 100644
--- a/tests/cmc-upgrade-kernel-test.md
+++ b/tests/cmc-upgrade-kernel-test.md
@@ -76,19 +76,82 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - If the clone is missing or identity is uncertain, stop and do not delete any other VM.
 - If any blocker occurs after clone creation, stop the test and leave the cloned VM powered on for manual inspection.
 - Do not delete or power off the clone on blocker-fail outcomes.
-- After source-host kernel inspection is complete, issue the power-off request, then wait for vCenter to report the source VM as `poweredOff` before cloning.
-- Detaching the 2 FC PCI passthrough adapters from the cloned VM is mandatory before any guest boot or guest-side change.
-- Verify in vCenter that both FC passthrough devices are absent before proceeding past the clone-prep stage.
-- Use `CDS1-ESX165` / `192.168.1.165` as the default ESXi host for the clone unless the operator explicitly specifies a different placement host.
-- Always use live vCenter guest-tools data to confirm the current clone IP before any SSH or polling attempt.
-- Re-check live vCenter guest-tools IP after clone power-on, after switching networking from static to DHCP, and after any reboot before attempting SSH.
-- Do not assume the previous IP is still valid after a reboot or network change.
-- Cleanup actions that remove hosts from CMC must target only the cloned host used in the current test run.
-- Treat migration session creation failures (for either migration #1 or migration #2) as blocker-fail events.
-- Do not power off, delete, or otherwise tear down the clone until the entire latest-kernel validation path is complete, including the second migration session, online check, and MTDI/Galaxy Migrate service/driver verification on the latest kernel.
-- Do not treat the latest-kernel reboot or reinstall as the end of the test. The run is only complete after steps 55-59 have succeeded and been recorded.
+- Do not power off, delete, or otherwise tear down the clone until the final latest-kernel migration/session validation is complete and recorded. The latest-kernel reboot or reinstall is not the end of the test.
+
+## Execution Checklist
+- Treat this checklist as the run ledger for the test. Check off each item only after the supporting evidence has been gathered.
+- Do not skip ahead, collapse, or reorder checklist items.
+- Do not begin teardown until every item below is checked complete.
+- If any checklist item cannot be checked, stop the test and record the blocker.
+
+- [ ] 1. Confirm the requested source host is not a SUSE/SLES machine.
+- [ ] 2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
+- [ ] 3. Confirm source host is powered on for the inspection phase.
+- [ ] 4. SSH to the source host and check available kernel versions on the source before cloning.
+- [ ] 5. Build source-host kernel candidate list from all available versions and refresh package metadata first.
+- [ ] 6. Apply the candidate scope rule: same major OS family only, with same minor stream preferred.
+- [ ] 7. Verify at least 2 upgrade candidates exist in the filtered candidate list.
+- [ ] 8. If fewer than 2 candidates, hard stop and end run before clone creation.
+- [ ] 9. Perform the gate check before continuing.
+- [ ] 10. Issue the source-host power-off request and wait for `poweredOff`.
+- [ ] 11. Confirm the source host is still `poweredOff` immediately before cloning.
+- [ ] 12. Determine the base clone name `aw999-[source-without-atvmxxx-]`.
+- [ ] 13. Check whether the base clone name already exists in vCenter.
+- [ ] 14. If needed, choose the next available suffixed clone name.
+- [ ] 15. Clone the source VM on `AutomatedTest-UnitTesting` and default it to `CDS1-ESX165` / `192.168.1.165` unless overridden.
+- [ ] 16. Pass only the clone VM name to the clone command destination.
+- [ ] 17. Detach the 2 FC PCI adapters from the cloned VM.
+- [ ] 18. Verify both FC passthrough devices are no longer present on the clone.
+- [ ] 19. Power on the clone.
+- [ ] 20. Query vCenter guest-tools for the live clone IP.
+- [ ] 21. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
+- [ ] 22. Change the OS hostname to the clone name with `.` replaced by `-`.
+- [ ] 23. Convert networking from static IP to DHCP.
+- [ ] 24. Remove/clean static IP configuration references.
+- [ ] 25. Reboot the clone.
+- [ ] 26. Query vCenter guest-tools again for the new live clone IP.
+- [ ] 27. SSH to the new live clone IP and verify DHCP state.
+- [ ] 28. If the clone still reports the previous static IP, fix config cleanup and repeat reboot/verify.
+- [ ] 29. Continue all remaining steps using the live DHCP IP from vCenter.
+- [ ] 30. Wipe `/dev/sdb` once and verify no filesystem or partition signatures remain.
+- [ ] 31. Reinstall CMC on the clone with `-no-prebuilt-mtdi-nexus`.
+- [ ] 32. Create the first local migration from 10 GB to 11 GB.
+- [ ] 33. If migration session creation fails, hard stop as blocker-fail.
+- [ ] 34. Wait for initial sync completion.
+- [ ] 35. Check available kernels again using full candidate listing.
+- [ ] 36. Select the first-upgrade target from the filtered candidate list.
+- [ ] 37. Verify matching dev/header packages are available for the first-upgrade target.
+- [ ] 38. Install the first-upgrade kernel and matching dev/header packages, then reboot.
+- [ ] 39. Query vCenter guest-tools again for the live clone IP after reboot.
+- [ ] 40. SSH to the rebooted clone and verify kernel plus dev/header package versions match the selected first-upgrade version.
+- [ ] 41. If versions do not match exactly, stop as blocker-fail.
+- [ ] 42. Verify the clone is online in `skidamarink` using `cirrusdata`.
+- [ ] 43. SSH to the clone and verify MTDI and Galaxy Migrate services/driver are up.
+- [ ] 44. Write sample data to the source 10 GB disk.
+- [ ] 45. Trigger sync and confirm tracking status using `cirrusdata`.
+- [ ] 46. Uninstall CMC.
+- [ ] 47. Run MCP host cleanup for `skidamarink` and remove the cloned host entry for this test clone only, regardless of online/offline status.
+- [ ] 48. Check available kernels again.
+- [ ] 49. Select the latest-upgrade target kernel from the filtered candidate list.
+- [ ] 50. Verify matching dev/header packages are available for the latest-upgrade target.
+- [ ] 51. Install the latest-upgrade kernel and matching dev/header packages, then reboot.
+- [ ] 52. Query vCenter guest-tools again for the live clone IP after reboot.
+- [ ] 53. SSH to the rebooted clone and verify kernel plus dev/header package versions match the selected latest-upgrade version.
+- [ ] 54. If versions do not match exactly, stop as blocker-fail.
+- [ ] 55. Reinstall CMC via `cirrusdata` with `-no-prebuilt-mtdi-nexus` on the latest kernel.
+- [ ] 56. Create the second local migration from 10 GB to 11 GB and wait for initial sync completion.
+- [ ] 57. If migration session creation fails, hard stop as blocker-fail.
+- [ ] 58. Confirm the machine is online in `skidamarink` using `cirrusdata`.
+- [ ] 59. SSH and verify MTDI and Galaxy Migrate services/driver are up.
+- [ ] 60. Only after steps 55-59 all pass, begin success-path cleanup.
+- [ ] 61. Power off the cloned machine.
+- [ ] 62. Delete the cloned VM and its disks from vCenter inventory.
+- [ ] 63. Run final MCP host cleanup for `skidamarink` and remove the cloned host entry for this test clone only.
+- [ ] 64. If a blocker-fail occurred after clone creation, leave the cloned VM powered on and present in inventory for manual inspection.
+- [ ] 65. Append the current run to the summary and results files with the required host metadata, kernel progression, execution summary, final outcome, and total test duration.
 
 ## Test Procedure
+The `Execution Checklist` above is the authoritative run ledger. Use the procedure below as the detailed action reference for each checklist item.
 1. Confirm the requested source host is not a SUSE/SLES machine. If it is SUSE/SLES, hard stop immediately and do not power on, inspect, or clone the machine.
 2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
 3. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.