Tighten CMC kernel upgrade test procedure

Cirrus Codex
2026-05-21 00:26:46 -04:00
parent 511c3b7401
commit f9e401997b


@@ -1,7 +1,7 @@
# CMC Upgrade Kernel Test Template
## Purpose
Validate CMC behavior across staged kernel upgrades on a cloned VM, including reinstall, migration health, service health, and cleanup.
Validate CMC behavior across staged kernel upgrades on a cloned VM, including CMC install, uninstall, second install, migration health, service health, and cleanup.
## Scope
- Run per source host provided by operator.
@@ -26,12 +26,13 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
## CMC Tooling
- For all CMC-related actions in this test, use the `cirrusdata` skill/CLI path.
- Exception: host cleanup is not handled by that skill yet; use the Cirrus Data MCP tools for offline-host cleanup and cloned-host cleanup.
- For every CMC install/reinstall command in this test, always include installer option: `-no-prebuilt-mtdi-nexus`.
- For every CMC install command in this test, always include installer option: `-no-prebuilt-mtdi-nexus`.
## Kernel Package Rules
- For every planned kernel upgrade, verify matching development/header packages are available for the exact target kernel version before installing that kernel.
- On Red Hat-family systems, verify `kernel-devel-<target>` and `kernel-headers-<target>` availability (or documented distro-equivalent package names where applicable).
- The first kernel upgrade attempt must not use the latest kernel in the filtered candidate list; reserve the latest kernel for the final kernel-upgrade stage.
- Prefer candidates from the same minor OS stream, or the same OS release where the distro does not use RHEL-style minor streams. Use cross-minor/cross-release candidates only when fewer than 2 same-stream upgrade candidates are available and the candidates remain in the same major OS family.
- When upgrading kernel versions, also upgrade/install the matching development/header packages for that same version.
- On Red Hat-family systems that use `grubby` (including Oracle Linux), explicitly set the selected kernel as the default before rebooting, then verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path. If the default does not match, stop before reboot as blocker-fail.
- After each kernel upgrade and reboot, verify running kernel version and installed dev/header package versions all match.
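
The target-selection rules above can be sketched as a pair of shell helpers. This is a minimal illustration, not the procedure's actual tooling: the function names are hypothetical, the input is assumed to be a newline-separated list of kernel versions already filtered by the scope rule, and GNU `sort -V` is used only as an approximation of rpm/dpkg version ordering. The rules say only that the first upgrade must not be the latest candidate; picking the second-newest is an assumption made here.

```shell
# Hypothetical helpers: choose upgrade targets from a newline-separated list
# of kernel versions that has already been filtered by the scope rule.

# Sort candidates oldest to newest (GNU sort -V approximates package ordering).
sort_candidates() {
    printf '%s\n' "$1" | sed '/^$/d' | sort -V -u
}

# Latest-upgrade target: the newest candidate, reserved for the final stage.
pick_latest_target() {
    sort_candidates "$1" | tail -n 1
}

# First-upgrade target: here, the newest candidate that is NOT the latest.
# Returns 1 when fewer than 2 candidates exist, which the rules treat as a stop.
pick_first_target() {
    sorted=$(sort_candidates "$1")
    count=$(printf '%s\n' "$sorted" | sed '/^$/d' | wc -l)
    if [ "$count" -lt 2 ]; then
        return 1
    fi
    printf '%s\n' "$sorted" | tail -n 2 | head -n 1
}
```

Whatever the helpers return should still be confirmed against the live package metadata and the matching development/header packages before anything is installed.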
@@ -39,6 +40,7 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
- Before any kernel candidate discovery step, force a fresh package metadata refresh on the live host before evaluating available kernel builds. Use the distro command set in the checklist for RHEL-family and APT-based hosts. If the refreshed view differs from a prior result, trust the refreshed live metadata and record that the earlier view was stale.
## Red Hat Preflight
- Checklist step 1 is authoritative for this test; keep this section synchronized with that step.
- Apply this section only when the test target is an actual Red Hat subscription-managed machine and the run is manually executed.
- Do not apply this section to CentOS, Oracle Linux, Rocky, Alma, or other RHEL-derived distributions unless the operator explicitly says the machine should be treated as Red Hat-managed for this run.
- If the target is not actual RHEL, skip this preflight entirely and do not attempt `subscription-manager`.
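
One way to tell actual RHEL apart from RHEL-derived distributions is the `ID` field in `/etc/os-release`. The sketch below assumes that field is present and uses hypothetical function names. It covers only the distro check; whether the machine is subscription-managed and whether the run is manual still has to be confirmed by the operator.

```shell
# Hypothetical helper: read the ID= value from an os-release style file
# without sourcing it (sourcing would execute the file's contents).
os_release_id() {
    sed -n 's/^ID=//p' "$1" | head -n 1 | tr -d '"'
}

# True only when the distro ID is exactly "rhel". CentOS, Oracle Linux,
# Rocky, and Alma report other IDs, so they are rejected here.
is_actual_rhel() {
    [ "$(os_release_id "${1:-/etc/os-release}")" = "rhel" ]
}
```

A caller would run `is_actual_rhel` on the target and skip the preflight, including any `subscription-manager` call, when it returns non-zero.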
@@ -57,8 +59,8 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
## Execution Mode
- Run this test in continuous execution mode.
- Do not pause for additional operator prompts between steps.
- Keep monitoring and continue automatically until the test reaches a terminal outcome (`PASS`, `FAIL`, or operator-directed `PARTIAL`) and all required cleanup/reporting steps are completed.
- Only stop early if a true blocker prevents safe continuation, and still complete required cleanup/reporting before returning control.
- Keep monitoring and continue automatically until the test reaches a terminal outcome (`PASS`, `FAIL`, or operator-directed `PARTIAL`), then complete only the checklist actions allowed for that outcome before returning control.
- Only stop early if a true blocker prevents safe continuation; for blocker-fail after clone creation, preserve the clone for manual inspection and complete reporting only.
- Time every step explicitly.
- If any single step takes longer than 10 minutes, hard stop the test and treat it as a blocker-fail.
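
The per-step timing rule could be wrapped around each command roughly as follows. This is a sketch under stated assumptions: `run_timed_step` and the return code 124 are hypothetical conventions, and the wrapper measures elapsed time only after the command returns, so it does not interrupt a step that hangs.

```shell
STEP_LIMIT_SECONDS=600   # 10-minute ceiling from the rule above

# Run one step, log its duration, and flag a blocker-fail when it ran too long.
# Usage: run_timed_step "<label>" <command> [args...]
# Returns the command's own status, or 124 (hypothetical convention) when the
# step exceeded the limit.
run_timed_step() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    step_rc=$?
    elapsed=$(( $(date +%s) - start ))
    echo "step=\"$label\" elapsed=${elapsed}s status=$step_rc" >&2
    if [ "$elapsed" -gt "$STEP_LIMIT_SECONDS" ]; then
        echo "BLOCKER-FAIL: step \"$label\" exceeded ${STEP_LIMIT_SECONDS}s" >&2
        return 124
    fi
    return "$step_rc"
}
```

To enforce the limit on a step that never returns, the command could additionally be run under coreutils `timeout`, for example `run_timed_step "reboot wait" timeout 600 <command>`.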
@@ -71,10 +73,10 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
## Safety Rules
- Delete only the clone created for this test.
- If the clone is missing or identity is uncertain, stop and do not delete any other VM.
- If the clone is missing, stop and do not delete any other VM.
- If any blocker occurs after clone creation, stop the test and leave the cloned VM powered on for manual inspection.
- Do not delete or power off the clone on blocker-fail outcomes.
- Do not power off, delete, or otherwise tear down the clone until the final latest-kernel migration/session validation is complete and recorded. The latest-kernel reboot or reinstall is not the end of the test.
- Do not power off, delete, or otherwise tear down the clone until the final latest-kernel migration/session validation is complete and recorded. The latest-kernel reboot or second CMC install is not the end of the test.
## Execution Checklist
- Treat this checklist as the run ledger for the test; check each item as it is completed and confirmed.
@@ -82,93 +84,101 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
- Do not begin teardown until every item below is checked complete.
- If any checklist item cannot be checked, stop the test and record the blocker.
- [ ] 0. Source `/home/cirrus/cds/.env.credentials.local` and verify required credential variables are present without printing secret values.
- [ ] 1. Confirm the requested source host is not a SUSE/SLES machine; if it is SUSE/SLES, hard stop before source power-on or clone creation.
- [ ] 2. Remove offline hosts in `skidamarink` using Cirrus Data MCP tools for offline-host cleanup.
- [ ] 3. From vCenter, confirm source host is powered on for the inspection phase; power it on if it is not already powered on.
- [ ] 4. From vCenter, query guest-tools for the live source host IP address.
- [ ] 5. SSH to the source host IP address found in step 4 using credentials from `/home/cirrus/cds/.env.credentials.local`.
- [ ] 6. On the source host, inspect distro repository files before listing available kernel builds and hard stop if any enabled/source repo points at `192.168.3.199` (`/etc/yum.repos.d/*.repo`, `/etc/apt/sources.list`, `/etc/apt/sources.list.d/*`, `/etc/zypp/repos.d/*.repo`, or equivalent files present on the host).
- [ ] 7. On the source host, record the current OS version and running kernel version before cloning.
- [ ] 8. On the source host, refresh package metadata and build the kernel candidate list from all available versions using the distro command set: RHEL/Oracle/Rocky/Alma: `dnf makecache; dnf list --showduplicates kernel kernel-devel kernel-headers`; older RHEL/CentOS: `yum makecache; yum list --showduplicates kernel kernel-devel kernel-headers`; Debian/Ubuntu: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-generic linux-headers-generic; apt list -a linux-image-generic linux-headers-generic`. Do not run this test for SUSE/SLES; step 1 must stop those hosts before this point. On Ubuntu, inspect the generic track first, then confirm candidate availability with alternate package listing methods if needed before deciding whether the generic track is usable.
- [ ] 9. Apply the candidate scope rule: same major OS family only, with same minor stream preferred.
- [ ] 10. Verify at least 2 upgrade candidates exist in the filtered candidate list.
- [ ] 11. If fewer than 2 candidates exist, hard stop and end the run before clone creation.
- [ ] 12. Confirm steps 6-11 passed; if any stop condition was hit, do not clone.
- [ ] 13. From vCenter, issue the source-host power-off request and wait for `poweredOff`.
- [ ] 14. From vCenter, confirm the source host is still `poweredOff` immediately before cloning.
- [ ] 15. Determine the base clone name `aw999-[source-without-atvmxxx-]`.
- [ ] 16. From vCenter, check whether the base clone name already exists.
- [ ] 17. If needed, choose the next available suffixed clone name using `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
- [ ] 18. From vCenter, clone the source VM on `AutomatedTest-UnitTesting` using the resolved clone VM name from steps 15-17, pass only the clone VM name as the destination, and default it to `CDS1-ESX165` / `192.168.1.165` unless overridden.
- [ ] 19. From vCenter, detach the 2 FC PCI adapters from the cloned VM.
- [ ] 20. From vCenter, verify both FC passthrough devices are no longer present on the clone.
- [ ] 21. From vCenter, power on the clone.
- [ ] 22. From vCenter, query guest-tools for the live clone IP.
- [ ] 23. SSH to the live clone IP found in step 22 using credentials from `/home/cirrus/cds/.env.credentials.local`.
- [ ] 24. On the clone, change the OS hostname to the clone name with `.` replaced by `-`.
- [ ] 25. On the clone, convert networking from static IP to DHCP.
- [ ] 26. On the clone, remove/clean static IP configuration references.
- [ ] 27. On the clone, reboot the machine.
- [ ] 28. From vCenter, query guest-tools again for the new live clone IP.
- [ ] 29. SSH to the new live clone IP found in step 28.
- [ ] 30. On the clone, verify DHCP state.
- [ ] 31. If the clone still reports the previous static IP, fix config cleanup and repeat steps 26-30.
- [ ] 32. Continue all remaining steps using the live DHCP IP confirmed in step 30.
- [ ] 33. On the clone, wipe `/dev/sdb` once and verify no filesystem or partition signatures remain.
- [ ] 34. Using the cirrusdata skill, install CMC on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus`.
- [ ] 35. Using the cirrusdata skill, create the first local migration from the 10 GB source disk to the 11 GB destination disk in the `skidamarink` project.
- [ ] 36. If migration session creation fails, hard stop as blocker-fail.
- [ ] 37. Using the cirrusdata skill, wait for initial sync completion in the `skidamarink` project.
- [ ] 38. SSH to the live DHCP clone IP confirmed in step 30, refresh package metadata, and check available kernels again using the full distro candidate listing: RHEL/Oracle/Rocky/Alma: `dnf makecache; dnf list --showduplicates kernel kernel-devel kernel-headers`; older RHEL/CentOS: `yum makecache; yum list --showduplicates kernel kernel-devel kernel-headers`; Debian/Ubuntu: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-generic linux-headers-generic; apt list -a linux-image-generic linux-headers-generic`.
- [ ] 39. Select the first-upgrade target from the filtered candidate list; it must stay in the same major OS family and must not be the latest candidate. If no valid non-latest first-upgrade target exists, hard stop as blocker-fail.
- [ ] 40. On the clone, verify matching dev/header packages are available for the exact first-upgrade target.
- [ ] 41. On the clone, install the first-upgrade kernel and matching dev/header packages without rebooting yet.
- [ ] 42. On Red Hat-family systems with `grubby` including Oracle Linux, set the first-upgrade kernel as the grubby default and verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path before reboot.
- [ ] 43. On the clone, reboot into the first-upgrade kernel.
- [ ] 44. From vCenter, query guest-tools again for the live clone IP after reboot.
- [ ] 45. SSH to the rebooted clone IP found in step 44.
- [ ] 46. On the clone, verify kernel plus dev/header package versions match the selected first-upgrade version.
- [ ] 47. If versions do not match exactly, stop as blocker-fail.
- [ ] 48. Using the cirrusdata skill, verify the clone is online in the `skidamarink` project.
- [ ] 49. On the clone, verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 50. On the clone, write sample data to the source 10 GB disk.
- [ ] 51. Using the cirrusdata skill, trigger sync and confirm tracking status in the `skidamarink` project.
- [ ] 52. Using the cirrusdata skill, uninstall CMC from the clone in the `skidamarink` project.
- [ ] 53. Using Cirrus Data MCP tools, run host cleanup for `skidamarink` and remove the cloned host entry for this test clone only, regardless of online/offline status.
- [ ] 54. Using Cirrus Data MCP tools, verify the cloned host entry and all migration sessions for the cloned host are gone from `skidamarink` before continuing.
- [ ] 55. SSH to the live DHCP clone IP confirmed in step 30, refresh package metadata, and check available kernels again using the full distro candidate listing: RHEL/Oracle/Rocky/Alma: `dnf makecache; dnf list --showduplicates kernel kernel-devel kernel-headers`; older RHEL/CentOS: `yum makecache; yum list --showduplicates kernel kernel-devel kernel-headers`; Debian/Ubuntu: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-generic linux-headers-generic; apt list -a linux-image-generic linux-headers-generic`.
- [ ] 56. Select the latest-upgrade target kernel from the filtered candidate list; it must stay in the same major OS family and should use the latest available candidate in that scope. If no valid latest-upgrade target exists, hard stop as blocker-fail.
- [ ] 57. On the clone, verify matching dev/header packages are available for the exact latest-upgrade target.
- [ ] 58. On the clone, install the latest-upgrade kernel and matching dev/header packages without rebooting yet.
- [ ] 59. On Red Hat-family systems with `grubby` including Oracle Linux, set the latest-upgrade kernel as the grubby default and verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path before reboot.
- [ ] 60. On the clone, reboot into the latest-upgrade kernel.
- [ ] 61. From vCenter, query guest-tools again for the live clone IP after reboot.
- [ ] 62. SSH to the rebooted clone IP found in step 61.
- [ ] 63. On the clone, verify kernel plus dev/header package versions match the selected latest-upgrade version.
- [ ] 64. If versions do not match exactly, stop as blocker-fail.
- [ ] 65. Using the cirrusdata skill, install CMC again on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus` on the latest kernel.
- [ ] 66. Using the cirrusdata skill, create the second local migration from the 10 GB source disk to the 11 GB destination disk in the `skidamarink` project and wait for initial sync completion.
- [ ] 67. If migration session creation fails, hard stop as blocker-fail.
- [ ] 68. Using the cirrusdata skill, confirm the machine is online in the `skidamarink` project.
- [ ] 69. SSH to the live clone IP currently reported by vCenter and verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 70. Only after steps 65-69 all pass, begin success-path cleanup.
- [ ] 71. From vCenter, power off the cloned machine.
- [ ] 72. From vCenter, delete the cloned VM and its disks from inventory.
- [ ] 73. Using Cirrus Data MCP tools, run final host cleanup for `skidamarink`, remove the cloned host entry for this test clone only, and verify the cloned host entry plus all migration sessions for the cloned host are gone.
- [ ] 74. Blocker-fail path after clone creation, as an alternate to steps 70-73: leave the cloned VM powered on and present in inventory for manual inspection.
- [ ] 75. Append the current run to the summary and results files with the required host metadata, kernel progression, execution summary, final outcome, and total test duration.
- [ ] 0. Source `/home/cirrus/cds/.env.credentials.local` and verify required credential variables are present without printing secret values; re-source it in any new shell/session before vCenter, SSH, Red Hat subscription, or CMC actions; start per-step timing and hard stop any step that exceeds 10 minutes.
- [ ] 1. If this is a manual run against an actual Red Hat subscription-managed machine, perform the Red Hat preflight commands `subscription-manager remove --all`, `subscription-manager unregister`, `subscription-manager clean`, and `subscription-manager register --username "$REDHAT_SUBSCRIPTION_USER" --password "$REDHAT_SUBSCRIPTION_PASSWORD"`; otherwise explicitly mark this item skipped.
- [ ] 2. Confirm the requested source host is not a SUSE/SLES machine; if it is SUSE/SLES, hard stop before source power-on or clone creation.
- [ ] 3. Remove offline hosts in `skidamarink` using Cirrus Data MCP tools for offline-host cleanup.
- [ ] 4. From vCenter, confirm source host is powered on for the inspection phase; power it on if it is not already powered on.
- [ ] 5. From vCenter, query guest-tools for the live source host IP address.
- [ ] 6. SSH to the source host IP address found in step 5 using credentials from `/home/cirrus/cds/.env.credentials.local`.
- [ ] 7. On the source host, inspect distro repository files before listing available kernel builds and hard stop if any enabled/source repo points at `192.168.3.199` (`/etc/yum.repos.d/*.repo`, `/etc/apt/sources.list`, `/etc/apt/sources.list.d/*`, `/etc/zypp/repos.d/*.repo`, or equivalent files present on the host).
- [ ] 8. On the source host, record the current OS version and running kernel version before cloning.
- [ ] 9. On the source host, refresh package metadata and build the kernel candidate list from all available versions using the distro command set: RHEL/Oracle/Rocky/Alma: `dnf makecache; dnf list --showduplicates kernel kernel-devel kernel-headers`; older RHEL/CentOS: `yum makecache; yum list --showduplicates kernel kernel-devel kernel-headers`; Ubuntu: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-generic linux-headers-generic; apt list -a linux-image-generic linux-headers-generic`; Debian: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-amd64 linux-headers-amd64; apt list -a linux-image-amd64 linux-headers-amd64`. Do not run this test for SUSE/SLES; step 2 must stop those hosts before this point. On Ubuntu, inspect the generic track first. On Debian, inspect the amd64 meta-package track first. Confirm candidate availability with alternate package listing methods if needed before deciding whether the distro default track is usable. If this refreshed list differs from any earlier package view, trust the refreshed live metadata and record that the earlier view was stale.
- [ ] 10. Apply the candidate scope rule: same major OS family only, same minor stream preferred, same OS release preferred where the distro does not use RHEL-style minor streams, and cross-minor/cross-release candidates allowed only when fewer than 2 same-stream upgrade candidates are available.
- [ ] 11. Verify at least 2 upgrade candidates exist in the filtered candidate list.
- [ ] 12. If fewer than 2 candidates exist, hard stop and end the run before clone creation.
- [ ] 13. Confirm steps 7-12 passed; if any stop condition was hit, do not clone.
- [ ] 14. From vCenter, issue the source-host power-off request and wait for `poweredOff`.
- [ ] 15. From vCenter, confirm the source host is still `poweredOff` immediately before cloning.
- [ ] 16. Determine the base clone name `aw999-[source-without-atvmxxx-]`.
- [ ] 17. From vCenter, check whether the base clone name already exists.
- [ ] 18. If needed, choose the next available suffixed clone name using `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
- [ ] 19. From vCenter, clone the source VM on `AutomatedTest-UnitTesting` using the resolved clone VM name from steps 16-18, pass only the clone VM name as the destination, and default it to `CDS1-ESX165` / `192.168.1.165` unless overridden.
- [ ] 20. From vCenter, detach the 2 FC PCI adapters from the cloned VM.
- [ ] 21. From vCenter, verify both FC passthrough devices are no longer present on the clone.
- [ ] 22. From vCenter, power on the clone.
- [ ] 23. From vCenter, query guest-tools for the live clone IP.
- [ ] 24. SSH to the live clone IP found in step 23 using credentials from `/home/cirrus/cds/.env.credentials.local`.
- [ ] 25. On the clone, change the OS hostname to the clone name with `.` replaced by `-`.
- [ ] 26. On the clone, convert networking from static IP to DHCP.
- [ ] 27. On the clone, remove/clean static IP configuration references.
- [ ] 28. On the clone, reboot the machine.
- [ ] 29. From vCenter, query guest-tools again for the new live clone IP.
- [ ] 30. SSH to the new live clone IP found in step 29.
- [ ] 31. On the clone, verify DHCP state.
- [ ] 32. If the clone still reports the previous static IP, fix config cleanup and repeat steps 27-31.
- [ ] 33. Continue all remaining steps using the live DHCP IP confirmed in step 31.
- [ ] 34. On the clone, verify `/dev/sdb` is the intended 10 GB source disk, identify the intended 11 GB destination disk, then wipe `/dev/sdb` once and verify no filesystem or partition signatures remain.
- [ ] 35. Using the cirrusdata skill, install CMC on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus`.
- [ ] 36. Using the cirrusdata skill, create the first local migration from the verified 10 GB source disk to the verified 11 GB destination disk in the `skidamarink` project.
- [ ] 37. If migration session creation fails, hard stop as blocker-fail.
- [ ] 38. Using the cirrusdata skill, wait for initial sync completion in the `skidamarink` project.
- [ ] 39. SSH to the live DHCP clone IP confirmed in step 31, refresh package metadata, and check available kernels again using the full distro candidate listing: RHEL/Oracle/Rocky/Alma: `dnf makecache; dnf list --showduplicates kernel kernel-devel kernel-headers`; older RHEL/CentOS: `yum makecache; yum list --showduplicates kernel kernel-devel kernel-headers`; Ubuntu: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-generic linux-headers-generic; apt list -a linux-image-generic linux-headers-generic`; Debian: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-amd64 linux-headers-amd64; apt list -a linux-image-amd64 linux-headers-amd64`. If this refreshed list differs from step 9, trust the refreshed live metadata and record that the earlier view was stale.
- [ ] 40. Select the first-upgrade target from the filtered candidate list; it must stay in the same major OS family, must not be the latest candidate, and must follow the candidate scope rule from step 10. If no valid non-latest first-upgrade target exists, hard stop as blocker-fail.
- [ ] 41. On the clone, verify matching dev/header packages are available for the exact first-upgrade target; on Debian/Ubuntu, resolve meta packages to the exact versioned `linux-image-*` and `linux-headers-*` packages before installing.
- [ ] 42. On the clone, install the first-upgrade kernel and matching dev/header packages without rebooting yet.
- [ ] 43. On Red Hat-family systems with `grubby` including Oracle Linux, set the first-upgrade kernel as the grubby default and verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path before reboot.
- [ ] 44. On the clone, reboot into the first-upgrade kernel.
- [ ] 45. From vCenter, query guest-tools again for the live clone IP after reboot.
- [ ] 46. SSH to the rebooted clone IP found in step 45.
- [ ] 47. On the clone, verify kernel plus dev/header package versions match the selected first-upgrade version exactly; on Debian/Ubuntu, verify the running kernel maps to the exact installed versioned `linux-image-*` and `linux-headers-*` packages.
- [ ] 48. If versions do not match exactly, stop as blocker-fail.
- [ ] 49. Using the cirrusdata skill, verify the clone is online in the `skidamarink` project.
- [ ] 50. On the clone, verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 51. On the clone, write sample data to the verified 10 GB source disk.
- [ ] 52. Using the cirrusdata skill, trigger sync and confirm tracking status in the `skidamarink` project.
- [ ] 53. Using the cirrusdata skill, uninstall CMC from the clone in the `skidamarink` project.
- [ ] 54. Using Cirrus Data MCP tools, run host cleanup for `skidamarink` and remove the cloned host entry for this test clone only, regardless of online/offline status.
- [ ] 55. Using Cirrus Data MCP tools, verify the cloned host entry and all migration sessions for the cloned host are gone from `skidamarink` before continuing.
- [ ] 56. SSH to the rebooted clone IP found in step 45, refresh package metadata, and check available kernels again using the full distro candidate listing: RHEL/Oracle/Rocky/Alma: `dnf makecache; dnf list --showduplicates kernel kernel-devel kernel-headers`; older RHEL/CentOS: `yum makecache; yum list --showduplicates kernel kernel-devel kernel-headers`; Ubuntu: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-generic linux-headers-generic; apt list -a linux-image-generic linux-headers-generic`; Debian: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-amd64 linux-headers-amd64; apt list -a linux-image-amd64 linux-headers-amd64`. If this refreshed list differs from step 39, trust the refreshed live metadata and record that the earlier view was stale.
- [ ] 57. Select the latest-upgrade target kernel from the filtered candidate list; it must stay in the same major OS family, should use the latest available candidate in that scope, and must follow the candidate scope rule from step 10. If no valid latest-upgrade target exists, hard stop as blocker-fail.
- [ ] 58. On the clone, verify matching dev/header packages are available for the exact latest-upgrade target; on Debian/Ubuntu, resolve meta packages to the exact versioned `linux-image-*` and `linux-headers-*` packages before installing.
- [ ] 59. On the clone, install the latest-upgrade kernel and matching dev/header packages without rebooting yet.
- [ ] 60. On Red Hat-family systems with `grubby` including Oracle Linux, set the latest-upgrade kernel as the grubby default and verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path before reboot.
- [ ] 61. On the clone, reboot into the latest-upgrade kernel.
- [ ] 62. From vCenter, query guest-tools again for the live clone IP after reboot.
- [ ] 63. SSH to the rebooted clone IP found in step 62.
- [ ] 64. On the clone, verify kernel plus dev/header package versions match the selected latest-upgrade version exactly; on Debian/Ubuntu, verify the running kernel maps to the exact installed versioned `linux-image-*` and `linux-headers-*` packages.
- [ ] 65. If versions do not match exactly, stop as blocker-fail.
- [ ] 66. Using the cirrusdata skill, run the CMC installer again after the prior uninstall and host cleanup to install CMC on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus` on the latest kernel.
- [ ] 67. Using the cirrusdata skill, create the second local migration from the verified 10 GB source disk to the verified 11 GB destination disk in the `skidamarink` project and wait for initial sync completion.
- [ ] 68. If migration session creation fails, hard stop as blocker-fail.
- [ ] 69. Using the cirrusdata skill, confirm the machine is online in the `skidamarink` project.
- [ ] 70. SSH to the live clone IP currently reported by vCenter and verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 71. Only after steps 66-70 all pass, begin success-path cleanup.
- [ ] 72. From vCenter, power off the cloned machine.
- [ ] 73. From vCenter, delete the cloned VM and its disks from inventory.
- [ ] 74. Using Cirrus Data MCP tools, run final host cleanup for `skidamarink`, remove the cloned host entry for this test clone only, and verify the cloned host entry plus all migration sessions for the cloned host are gone.
- [ ] 75. Blocker-fail path after clone creation, as an alternate to steps 71-74: leave the cloned VM powered on and present in inventory for manual inspection, then continue to step 76.
- [ ] 76. Append the current run to the summary and results files with the required host metadata, kernel progression, execution summary, final outcome, and total test duration; keep result artifacts under `tmp/` local-only and do not commit them.
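
The clone-name resolution in steps 16-18 and the hostname rule in step 25 can be sketched as below. The function names are hypothetical, the list of existing VM names is assumed to have been fetched from vCenter beforehand as newline-separated text, and `aw999-demo` is a placeholder rather than a real clone name.

```shell
# Hypothetical helper: return the base name if it is free, otherwise the first
# free suffixed name (base-1, base-2, ...). $2 is a newline-separated list of
# VM names that already exist.
next_clone_name() {
    base=$1
    existing=$2
    candidate=$base
    n=0
    # -F fixed string, -x whole-line match, -q quiet
    while printf '%s\n' "$existing" | grep -Fxq -- "$candidate"; do
        n=$((n + 1))
        candidate="$base-$n"
    done
    printf '%s\n' "$candidate"
}

# Hypothetical helper: derive the OS hostname by replacing "." with "-".
clone_hostname() {
    printf '%s\n' "$1" | tr '.' '-'
}
```

For example, with `aw999-demo` and `aw999-demo-1` already present, `next_clone_name aw999-demo "$existing"` yields `aw999-demo-2`.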
## Stop Conditions
Stop immediately and record a blocker if any of these occur:
- Requested source host is a SUSE/SLES machine.
- Cannot verify clone identity.
- Red Hat preflight fails on an actual Red Hat subscription-managed manual run.
- Any enabled/source repo points at `192.168.3.199`.
- Cannot verify the intended 10 GB source disk or intended 11 GB destination disk before wipe or migration creation.
- Cannot detach required FC PCI adapters.
- Clone cannot be created on datastore `AutomatedTest-UnitTesting`.
- FC passthrough adapters remain attached after the detach/verification step.
- Any single step exceeds the 10-minute timeout.
- DHCP transition cannot be completed because the clone still reports the previous static IP after cleanup and retry.
- Kernel upgrade candidate criteria are not met.
- Matching kernel development/header packages are unavailable for the exact selected target kernel.
- The selected grubby default does not match the target kernel on a Red Hat-family system that uses `grubby`.
- Running kernel and installed development/header package versions do not match after a kernel upgrade and reboot.
- CMC install, CMC uninstall, or required cloned-host/session cleanup fails.
- Migration session creation fails (including API/service errors such as HTTP 5xx or equivalent backend unavailability).
- Any critical migration/service validation failure that blocks continuation.
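
The repository stop condition could be checked with a conservative scan like the one below. It is a sketch, not the procedure's actual check: the function name is hypothetical, and it flags any non-comment line that mentions the address, so it may also flag a repo that is present but disabled (for example `enabled=0` in a yum repo file). Such hits still need a manual look.

```shell
BLOCKED_REPO_HOST='192.168.3.199'

# Hypothetical helper: print every non-comment line in the given files or
# directories that mentions the blocked host. Returns 0 when at least one
# such line is found, non-zero otherwise.
repo_points_at_blocked_host() {
    grep -rHn -F -- "$BLOCKED_REPO_HOST" "$@" 2>/dev/null |
        grep -v -E '^[^:]*:[0-9]+:[[:space:]]*#'
}
```

A run on a Red Hat-family host might call it as `repo_points_at_blocked_host /etc/yum.repos.d` and hard stop when it returns 0.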
@@ -189,11 +199,14 @@ Use one cumulative results file and append one new section per tested host. Keep
- Start OS version:
- Start kernel version:
- Kernel list before first upgrade (full candidate list, filtered by scope rule):
- Package metadata refresh/stale-view notes:
- First-upgrade candidate scope decision (`same-stream|cross-stream`) and reason:
- Kernel selected for step-up upgrade:
- Matching dev/header packages for step-up target (availability check):
- Kernel after step-up reboot:
- Installed dev/header package versions after step-up:
- Kernel list before latest upgrade (full candidate list, filtered by scope rule):
- Latest-upgrade candidate scope decision (`same-stream|cross-stream`) and reason:
- Kernel selected for latest upgrade:
- Matching dev/header packages for latest target (availability check):
- Kernel after latest reboot:
@@ -201,9 +214,11 @@ Use one cumulative results file and append one new section per tested host. Keep
### Execution Summary (Short Bullets)
- Clone created / FC PCI detached: `PASS|FAIL` - notes
- Red Hat preflight: `PASS|SKIPPED|FAIL` - notes
- SUSE exclusion check: `PASS|FAIL` - notes
- Hostname/IP DHCP conversion: `PASS|FAIL` - notes
- 10 GB source disk prep before first CMC install: `PASS|FAIL` - notes
- CMC reinstall #1: `PASS|FAIL` - notes
- 10 GB source disk and 11 GB destination disk verification/prep before first CMC install: `PASS|FAIL` - notes
- CMC install #1: `PASS|FAIL` - notes
- Local migration #1 (10GB -> 11GB) initial sync: `PASS|FAIL` - notes
- Step-up kernel upgrade: `PASS|FAIL` - notes
- Step-up dev/header package match check: `PASS|FAIL` - notes
@@ -213,16 +228,17 @@ Use one cumulative results file and append one new section per tested host. Keep
- CMC uninstall: `PASS|FAIL` - notes
- Latest kernel upgrade: `PASS|FAIL` - notes
- Latest dev/header package match check: `PASS|FAIL` - notes
- CMC reinstall #2: `PASS|FAIL` - notes
- CMC install #2: `PASS|FAIL` - notes
- Local migration #2 (10GB -> 11GB) initial sync: `PASS|FAIL` - notes
- Online in skidamarink after latest upgrade: `PASS|FAIL` - notes
- MTDI/Galaxy Migrate service+driver health after latest upgrade: `PASS|FAIL` - notes
- Clone power off and deletion (success path only): `PASS|FAIL|N/A` - notes
- Final cloned-host/session cleanup: `PASS|FAIL|N/A` - notes
### Final Outcome
- Overall result: `PASS|FAIL|PARTIAL`
- Outcome interpretation:
- `PASS`: full planned test flow completed and core validation goals passed (CMC install/uninstall/reinstall, kernel step-up/latest upgrade, and post-upgrade service/driver health checks), even if non-blocking warnings occurred.
- `PASS`: full planned test flow completed and core validation goals passed (CMC install, uninstall, second install, kernel step-up/latest upgrade, and post-upgrade service/driver health checks), even if non-blocking warnings occurred.
- `FAIL`: a true blocker prevented completion of required validation goals.
- `PARTIAL`: use only when execution stops early by operator choice or scope is intentionally reduced, not for non-blocking warnings in a completed run.
- Blocking issue summary:
@@ -236,7 +252,7 @@ Use one cumulative results file and append one new section per tested host. Keep
- Do not leave a completed test run only in conversation; the artifact files are the source of record.
- All recorded timestamps must use UTC format: `YYYY-MM-DD HH:MM UTC`.
- Record the UTC start time when the run begins.
- Record the UTC end time when the run reaches a terminal outcome and cleanup/reporting is complete.
- Record the UTC end time when the run reaches a terminal outcome and the allowed final checklist actions for that outcome are complete.
- Compute `Test duration` from the recorded start/end timestamps and include it in both files.
- If a run is still in progress when first recorded, update the runtime once the run reaches its terminal outcome.
- Use the `Per-Host Test Result Record` format for the results file.
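
The timestamp and duration rules above can be produced with two small helpers. Both names are hypothetical, and the `Hh MMm` duration layout is an assumption, since the template does not prescribe one.

```shell
# Current time in the required `YYYY-MM-DD HH:MM UTC` format.
utc_now() {
    date -u '+%Y-%m-%d %H:%M UTC'
}

# Duration between two epoch-second values, printed as "Hh MMm".
# Usage: format_duration <start_epoch> <end_epoch>
format_duration() {
    total=$(( $2 - $1 ))
    printf '%dh %02dm\n' $(( total / 3600 )) $(( (total % 3600) / 60 ))
}
```

Capturing `date +%s` alongside `utc_now` at the start and end of the run keeps the recorded timestamps and the computed `Test duration` consistent.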
@@ -247,6 +263,7 @@ Summary file requirements:
- Include test start time, test end time, and total test duration for the run
- Include a short run summary (current kernel -> first CMC install phase -> kernel upgrade -> CMC uninstall -> kernel upgrade -> second CMC install phase)
- Include host tested, kernel progression (start, step-up, latest), and overall result
- Include package metadata stale-view notes, candidate scope decisions, and final cloned-host/session cleanup status when present.
- Start each run section with a `##` heading that includes the OS family and the final outcome, for example: `## Amazon Linux 2023 - PASS`.
- Put the OS version and the rest of the run details under that heading so the heading stays the visible OS label above the test snippet.
- Backfill `Test duration` into the summary and results artifacts for any run where both timestamps are known.