Add full execution checklist and artifact recording

Cirrus Codex
2026-05-18 19:15:49 -04:00
parent f4060dd324
commit eacaaf0ac0

@@ -76,19 +76,82 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
- If the clone is missing or identity is uncertain, stop and do not delete any other VM.
- If any blocker occurs after clone creation, stop the test and leave the cloned VM powered on for manual inspection.
- Do not delete or power off the clone on blocker-fail outcomes.
-- After source-host kernel inspection is complete, issue the power-off request, then wait for vCenter to report the source VM as `poweredOff` before cloning.
-- Detaching the 2 FC PCI passthrough adapters from the cloned VM is mandatory before any guest boot or guest-side change.
-- Verify in vCenter that both FC passthrough devices are absent before proceeding past the clone-prep stage.
-- Use `CDS1-ESX165` / `192.168.1.165` as the default ESXi host for the clone unless the operator explicitly specifies a different placement host.
-- Always use live vCenter guest-tools data to confirm the current clone IP before any SSH or polling attempt.
-- Re-check live vCenter guest-tools IP after clone power-on, after switching networking from static to DHCP, and after any reboot before attempting SSH.
-- Do not assume the previous IP is still valid after a reboot or network change.
-- Cleanup actions that remove hosts from CMC must target only the cloned host used in the current test run.
-- Treat migration session creation failures (for either migration #1 or migration #2) as blocker-fail events.
-- Do not power off, delete, or otherwise tear down the clone until the entire latest-kernel validation path is complete, including the second migration session, online check, and MTDI/Galaxy Migrate service/driver verification on the latest kernel.
-- Do not treat the latest-kernel reboot or reinstall as the end of the test. The run is only complete after steps 55-59 have succeeded and been recorded.
- Do not power off, delete, or otherwise tear down the clone until the final latest-kernel migration/session validation is complete and recorded. The latest-kernel reboot or reinstall is not the end of the test.

## Execution Checklist
- Treat this checklist as the run ledger for the test. Check off each item only after the supporting evidence has been gathered.
- Do not skip ahead, collapse, or reorder checklist items.
- Do not begin teardown until every item below is checked complete.
- If any checklist item cannot be checked, stop the test and record the blocker.

- [ ] 1. Confirm the requested source host is not a SUSE/SLES machine.
- [ ] 2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
- [ ] 3. Confirm source host is powered on for the inspection phase.
- [ ] 4. SSH to the source host and check available kernel versions on the source before cloning.
- [ ] 5. Build source-host kernel candidate list from all available versions and refresh package metadata first.
- [ ] 6. Apply the candidate scope rule: same major OS family only, with same minor stream preferred.
- [ ] 7. Verify at least 2 upgrade candidates exist in the filtered candidate list.
- [ ] 8. If fewer than 2 candidates, hard stop and end run before clone creation.
- [ ] 9. Perform the gate check before continuing.
- [ ] 10. Issue the source-host power-off request and wait for `poweredOff`.
- [ ] 11. Confirm the source host is still `poweredOff` immediately before cloning.
- [ ] 12. Determine the base clone name `aw999-[source-without-atvmxxx-]`.
- [ ] 13. Check whether the base clone name already exists in vCenter.
- [ ] 14. If needed, choose the next available suffixed clone name.
- [ ] 15. Clone the source VM on `AutomatedTest-UnitTesting` and default it to `CDS1-ESX165` / `192.168.1.165` unless overridden.
- [ ] 16. Pass only the clone VM name to the clone command destination.
- [ ] 17. Detach the 2 FC PCI adapters from the cloned VM.
- [ ] 18. Verify both FC passthrough devices are no longer present on the clone.
- [ ] 19. Power on the clone.
- [ ] 20. Query vCenter guest-tools for the live clone IP.
- [ ] 21. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
- [ ] 22. Change the OS hostname to the clone name with `.` replaced by `-`.
- [ ] 23. Convert networking from static IP to DHCP.
- [ ] 24. Remove/clean static IP configuration references.
- [ ] 25. Reboot the clone.
- [ ] 26. Query vCenter guest-tools again for the new live clone IP.
- [ ] 27. SSH to the new live clone IP and verify DHCP state.
- [ ] 28. If the clone still reports the previous static IP, fix config cleanup and repeat reboot/verify.
- [ ] 29. Continue all remaining steps using the live DHCP IP from vCenter.
- [ ] 30. Wipe `/dev/sdb` once and verify no filesystem or partition signatures remain.
- [ ] 31. Reinstall CMC on the clone with `-no-prebuilt-mtdi-nexus`.
- [ ] 32. Create the first local migration from 10 GB to 11 GB.
- [ ] 33. If migration session creation fails, hard stop as blocker-fail.
- [ ] 34. Wait for initial sync completion.
- [ ] 35. Check available kernels again using full candidate listing.
- [ ] 36. Select the first-upgrade target from the filtered candidate list.
- [ ] 37. Verify matching dev/header packages are available for the first-upgrade target.
- [ ] 38. Install the first-upgrade kernel and matching dev/header packages, then reboot.
- [ ] 39. Query vCenter guest-tools again for the live clone IP after reboot.
- [ ] 40. SSH to the rebooted clone and verify kernel plus dev/header package versions match the selected first-upgrade version.
- [ ] 41. If versions do not match exactly, stop as blocker-fail.
- [ ] 42. Verify the clone is online in `skidamarink` using `cirrusdata`.
- [ ] 43. SSH to the clone and verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 44. Write sample data to the source 10 GB disk.
- [ ] 45. Trigger sync and confirm tracking status using `cirrusdata`.
- [ ] 46. Uninstall CMC.
- [ ] 47. Run MCP host cleanup for `skidamarink` and remove the cloned host entry for this test clone only, regardless of online/offline status.
- [ ] 48. Check available kernels again.
- [ ] 49. Select the latest-upgrade target kernel from the filtered candidate list.
- [ ] 50. Verify matching dev/header packages are available for the latest-upgrade target.
- [ ] 51. Install the latest-upgrade kernel and matching dev/header packages, then reboot.
- [ ] 52. Query vCenter guest-tools again for the live clone IP after reboot.
- [ ] 53. SSH to the rebooted clone and verify kernel plus dev/header package versions match the selected latest-upgrade version.
- [ ] 54. If versions do not match exactly, stop as blocker-fail.
- [ ] 55. Reinstall CMC via `cirrusdata` with `-no-prebuilt-mtdi-nexus` on the latest kernel.
- [ ] 56. Create the second local migration from 10 GB to 11 GB and wait for initial sync completion.
- [ ] 57. If migration session creation fails, hard stop as blocker-fail.
- [ ] 58. Confirm the machine is online in `skidamarink` using `cirrusdata`.
- [ ] 59. SSH and verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 60. Only after steps 55-59 all pass, begin success-path cleanup.
- [ ] 61. Power off the cloned machine.
- [ ] 62. Delete the cloned VM and its disks from vCenter inventory.
- [ ] 63. Run final MCP host cleanup for `skidamarink` and remove the cloned host entry for this test clone only.
- [ ] 64. If a blocker-fail occurred after clone creation, leave the cloned VM powered on and present in inventory for manual inspection.
- [ ] 65. Append the current run to the summary and results files with the required host metadata, kernel progression, execution summary, final outcome, and total test duration.
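
The ledger rules at the top of this checklist (evidence before check-off, no skipping or reordering, stop on a blocker, no teardown until everything is checked) could be enforced mechanically. The following is a minimal Python sketch under stated assumptions, not part of the documented tooling: `RunLedger`, `LedgerError`, and all method names are hypothetical, and the sketch glosses over conditional items (such as 8, 14, or 64) and the fact that the teardown steps are themselves checklist items.

```python
from dataclasses import dataclass, field
from typing import Optional


class LedgerError(RuntimeError):
    """Raised when an action would violate the ledger rules."""


@dataclass
class RunLedger:
    """Hypothetical in-memory run ledger (illustration only)."""

    items: list[str]                                   # checklist item texts, in order
    evidence: dict[int, str] = field(default_factory=dict)  # item number -> evidence note
    blocker: Optional[str] = None                      # set once the run is stopped

    def next_item(self) -> Optional[int]:
        """Return the 1-based number of the next unchecked item, or None when done."""
        n = len(self.evidence) + 1
        return n if n <= len(self.items) else None

    def check(self, number: int, evidence: str) -> None:
        """Check off one item; refuse out-of-order or evidence-free check-offs."""
        if self.blocker is not None:
            raise LedgerError(f"run stopped by blocker: {self.blocker}")
        if number != self.next_item():
            raise LedgerError(
                f"item {number} is out of order; expected {self.next_item()}"
            )
        if not evidence.strip():
            raise LedgerError(f"item {number} needs supporting evidence")
        self.evidence[number] = evidence

    def record_blocker(self, reason: str) -> None:
        """Stop the run; no further items can be checked afterwards."""
        self.blocker = reason

    def teardown_allowed(self) -> bool:
        """Teardown is allowed only when every item is checked and no blocker exists."""
        return self.blocker is None and len(self.evidence) == len(self.items)


if __name__ == "__main__":
    ledger = RunLedger(items=["Confirm source is not SUSE/SLES", "Remove offline hosts"])
    ledger.check(1, "os-release output saved to run notes")
    print("next item:", ledger.next_item())
    print("teardown allowed:", ledger.teardown_allowed())
```

Because `check` only accepts the next unchecked number, skipping ahead and reordering are rejected by construction, and `record_blocker` freezes the ledger so a blocker-fail run leaves the clone untouched.
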
## Test Procedure
The `Execution Checklist` above is the authoritative run ledger. Use the procedure below as the detailed action reference for each checklist item.
1. Confirm the requested source host is not a SUSE/SLES machine. If it is SUSE/SLES, hard stop immediately and do not power on, inspect, or clone the machine.
2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
3. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.