diff --git a/tests/cmc-upgrade-kernel-test.md b/tests/cmc-upgrade-kernel-test.md
index 7f95191..58aa9a5 100644
--- a/tests/cmc-upgrade-kernel-test.md
+++ b/tests/cmc-upgrade-kernel-test.md
@@ -49,6 +49,12 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - `subscription-manager register --username "$REDHAT_SUBSCRIPTION_USER" --password "$REDHAT_SUBSCRIPTION_PASSWORD"`
 - Source credentials from `/home/aw/code/cds/.env.credentials.local`.
 
+## SUSE Exclusion Rule (Global)
+- Do not run this test against SUSE/SLES ATVM machines.
+- SUSE ATVM machines use a local offline DVD/vault repository for packages.
+- Kernel upgrade discovery is not valid for this test unless the machine can access official SUSE repositories, which requires a SUSE subscription.
+- If the operator requests this test against any SUSE/SLES machine, stop immediately before source power-on or clone creation and report that SUSE is excluded for this test because it uses the local offline repository.
+
 ## Execution Mode (Global)
 - Run this test in continuous execution mode.
 - Do not pause for additional operator prompts between steps.
@@ -80,84 +86,86 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - Treat migration session creation failures (for either migration #1 or migration #2) as blocker-fail events.
 
 ## Test Procedure
-1. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
-2. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.
-3. SSH to the source host and check available kernel versions on the source before cloning.
-4. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from `check-update`).
+1. Confirm the requested source host is not a SUSE/SLES machine. If it is SUSE/SLES, hard stop immediately and do not power on, inspect, or clone the machine.
+2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
+3. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.
+4. SSH to the source host and check available kernel versions on the source before cloning.
+5. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from `check-update`).
    - On Ubuntu, prefer the generic kernel track first: use `apt-cache madison` and/or `apt list -a` for `linux-image-generic` and `linux-headers-generic` after `apt update`.
    - If the generic track returns no usable upgrade candidates, pause and ask the operator whether to continue with alternate Ubuntu kernel tracks.
    - When prompting for alternate Ubuntu tracks, display the other available kernel candidates, including `linux-image-generic-hwe-24.04` / `linux-headers-generic-hwe-24.04` when present, and wait for explicit operator confirmation before proceeding.
-5. Candidate scope rule:
+6. Candidate scope rule:
    - Include only kernels in the same major OS family as the current machine (no major-version upgrades).
    - Prefer candidates within the same minor stream as current OS/kernel when available.
-6. Verify at least 2 upgrade candidates exist in the filtered candidate list.
-7. If fewer than 2 candidates: hard stop and end run before clone creation.
-8. Gate check:
+7. Verify at least 2 upgrade candidates exist in the filtered candidate list.
+8. If fewer than 2 candidates: hard stop and end run before clone creation.
+9. Gate check:
-   - If step 7 triggered a stop condition, execute no further steps.
+   - If step 8 triggered a stop condition, execute no further steps.
    - If no stop condition was triggered, continue with the next step.
-9. After source-host inspection is complete, issue the power-off request and wait for vCenter to report the source host as `poweredOff`.
-10. Confirm the source host is still `poweredOff` in vCenter immediately before cloning. Do not start the clone while the source VM is transitioning or pending power-off.
-11. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
-12. Before cloning, check whether that clone name already exists in vCenter.
-13. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
-14. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only, and place the clone on `CDS1-ESX165` / `192.168.1.165` by default unless the operator explicitly specifies a different ESXi host.
-15. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
-16. Detach the 2 FC PCI adapters from the cloned VM.
-17. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
-18. Power on clone.
-19. Query vCenter guest-tools for the live clone IP.
-20. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
-21. Change OS hostname to clone name, replacing `.` with `-`.
-22. Convert networking from static IP to DHCP.
-23. Remove/clean static IP configuration references.
-24. Reboot clone.
-25. Query vCenter guest-tools again for the new live clone IP.
-26. SSH to the new live clone IP and verify the DHCP state.
-27. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
-28. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/aw/code/cds/.env.credentials.local`.
-29. Before the first CMC install, wipe the 10GB source disk with `dd if=/dev/zero of=/dev/sdb bs=1M count=32 status=progress conv=fsync`, then verify that no filesystem or partition signatures remain (`wipefs -n /dev/sdb`, `blkid /dev/sdb`, `file -s /dev/sdb`, `lsblk -f /dev/sdb`). This disk prep is one-time only and must not be repeated in later stages of the test.
-30. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
-31. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
-32. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-33. Wait for initial sync completion.
-34. Check available kernels again using full candidate listing (not latest-only output).
-35. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
-36. Verify matching dev/header packages for the selected first-upgrade target are available.
-37. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
-38. Query vCenter guest-tools again for the live clone IP after reboot.
-39. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
-40. If versions do not match exactly, stop as blocker-fail.
-41. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
-42. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
-43. Write sample data to source 10GB disk.
-44. Trigger sync and confirm tracking status using `cirrusdata`.
-45. Uninstall CMC.
-46. Post-uninstall cleanup checkpoint:
+10. After source-host inspection is complete, issue the power-off request and wait for vCenter to report the source host as `poweredOff`.
+11. Confirm the source host is still `poweredOff` in vCenter immediately before cloning. Do not start the clone while the source VM is transitioning or pending power-off.
+12. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
+13. Before cloning, check whether that clone name already exists in vCenter.
+14. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
+15. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only, and place the clone on `CDS1-ESX165` / `192.168.1.165` by default unless the operator explicitly specifies a different ESXi host.
+16. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
+17. Detach the 2 FC PCI adapters from the cloned VM.
+18. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
+19. Power on clone.
+20. Query vCenter guest-tools for the live clone IP.
+21. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
+22. Change OS hostname to clone name, replacing `.` with `-`.
+23. Convert networking from static IP to DHCP.
+24. Remove/clean static IP configuration references.
+25. Reboot clone.
+26. Query vCenter guest-tools again for the new live clone IP.
+27. SSH to the new live clone IP and verify the DHCP state.
+28. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
+29. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/aw/code/cds/.env.credentials.local`.
+30. Before the first CMC install, wipe the 10GB source disk with `dd if=/dev/zero of=/dev/sdb bs=1M count=32 status=progress conv=fsync`, then verify that no filesystem or partition signatures remain (`wipefs -n /dev/sdb`, `blkid /dev/sdb`, `file -s /dev/sdb`, `lsblk -f /dev/sdb`). This disk prep is one-time only and must not be repeated in later stages of the test.
+31. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
+32. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
+33. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+34. Wait for initial sync completion.
+35. Check available kernels again using full candidate listing (not latest-only output).
+36. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
+37. Verify matching dev/header packages for the selected first-upgrade target are available.
+38. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
+39. Query vCenter guest-tools again for the live clone IP after reboot.
+40. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
+41. If versions do not match exactly, stop as blocker-fail.
+42. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
+43. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
+44. Write sample data to source 10GB disk.
+45. Trigger sync and confirm tracking status using `cirrusdata`.
+46. Uninstall CMC.
+47. Post-uninstall cleanup checkpoint:
    - Run MCP host cleanup for `skidamarink`.
    - Remove the cloned VM host entry specifically via MCP (target only this test clone host), regardless of whether CDC currently reports it as online or offline.
-47. Check available kernels.
-48. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
-49. Verify matching dev/header packages for the selected latest-upgrade target are available.
-50. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
-51. Query vCenter guest-tools again for the live clone IP after reboot.
-52. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
-53. If versions do not match exactly, stop as blocker-fail.
-54. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
-55. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
-56. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
-57. Confirm machine is online in `skidamarink` using `cirrusdata`.
-58. SSH and verify MTDI, Galaxy Migrate services/driver are up.
-59. Success-path cleanup only: power off cloned machine.
-60. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
-61. Success-path final cleanup checkpoint:
+48. Check available kernels.
+49. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
+50. Verify matching dev/header packages for the selected latest-upgrade target are available.
+51. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
+52. Query vCenter guest-tools again for the live clone IP after reboot.
+53. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
+54. If versions do not match exactly, stop as blocker-fail.
+55. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
+56. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
+57. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
+58. Confirm machine is online in `skidamarink` using `cirrusdata`.
+59. SSH and verify MTDI, Galaxy Migrate services/driver are up.
+60. Success-path cleanup only: power off cloned machine.
+61. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
+62. Success-path final cleanup checkpoint:
    - Run MCP host cleanup for `skidamarink`.
    - Remove the cloned VM host entry specifically via MCP (target only this test clone host), regardless of whether CDC currently reports it as online or offline.
-62. Blocker-fail path after clone creation:
+63. Blocker-fail path after clone creation:
    - Stop test immediately after recording failure details.
    - Leave cloned VM powered on and present in inventory for manual inspection.
    - Do not run clone power-off/delete steps in blocker-fail path.
 
 ## Stop Conditions
+- Requested source host is a SUSE/SLES machine.
 - Cannot verify clone identity.
 - Cannot detach required FC PCI adapters.
 - Clone cannot be created on datastore `AutomatedTest-UnitTesting`.