Exclude SUSE from kernel upgrade test

Cirrus Codex
2026-05-15 17:04:55 -04:00
parent 6996397985
commit 95cb9efebc

@@ -49,6 +49,12 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
- `subscription-manager register --username "$REDHAT_SUBSCRIPTION_USER" --password "$REDHAT_SUBSCRIPTION_PASSWORD"`
- Source credentials from `/home/aw/code/cds/.env.credentials.local`.
## SUSE Exclusion Rule (Global)
- Do not run this test against SUSE/SLES ATVM machines.
- SUSE ATVM machines use a local offline DVD/vault repository for packages.
- Kernel upgrade discovery is not valid for this test unless the machine can access official SUSE repositories, which requires a SUSE subscription.
- If the operator requests this test against any SUSE/SLES machine, stop immediately before source power-on or clone creation and report that SUSE is excluded for this test because it uses the local offline repository.
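The rule above requires the SUSE check to happen before the source is powered on, so it has to rely on inventory metadata rather than anything read from the guest. The following is a minimal sketch under that assumption: `is_suse_host` and `suse_exclusion_message` are hypothetical helper names, and the idea that the VM name or a vCenter guest OS identifier contains `sles` or `suse` is an assumption of this sketch, not something the runbook specifies.

```python
import re

# Assumption: "sles" or "suse" anywhere in an inventory string marks a SUSE machine.
# This is deliberately broad so that openSUSE and SLES variants are both caught.
_SUSE_MARKER = re.compile(r"sles|suse", re.IGNORECASE)


def is_suse_host(vm_name, guest_id="", guest_full_name=""):
    """Return True if any inventory string suggests a SUSE/SLES machine."""
    candidates = (vm_name, guest_id, guest_full_name)
    return any(_SUSE_MARKER.search(text or "") for text in candidates)


def suse_exclusion_message(vm_name, guest_id="", guest_full_name=""):
    """Return the stop message for an excluded host, or None if the test may proceed."""
    if is_suse_host(vm_name, guest_id, guest_full_name):
        return (
            f"{vm_name}: SUSE is excluded for this test because it uses "
            "the local offline repository."
        )
    return None
```

A caller would evaluate `suse_exclusion_message(...)` as the very first action and stop if it returns a message. The host names and guest identifiers used to exercise it here are illustrative only.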
## Execution Mode (Global)
- Run this test in continuous execution mode.
- Do not pause for additional operator prompts between steps.
@@ -80,84 +86,86 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
- Treat migration session creation failures (for either migration #1 or migration #2) as blocker-fail events.
## Test Procedure
1. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
2. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.
3. SSH to the source host and check available kernel versions on the source before cloning.
4. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from `check-update`).
1. Confirm the requested source host is not a SUSE/SLES machine. If it is SUSE/SLES, hard stop immediately and do not power on, inspect, or clone the machine.
2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
3. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.
4. SSH to the source host and check available kernel versions on the source before cloning.
5. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from `check-update`).
- On Ubuntu, prefer the generic kernel track first: use `apt-cache madison` and/or `apt list -a` for `linux-image-generic` and `linux-headers-generic` after `apt update`.
- If the generic track returns no usable upgrade candidates, pause and ask the operator whether to continue with alternate Ubuntu kernel tracks.
- When prompting for alternate Ubuntu tracks, display the other available kernel candidates, including `linux-image-generic-hwe-24.04` / `linux-headers-generic-hwe-24.04` when present, and wait for explicit operator confirmation before proceeding.
5. Candidate scope rule:
6. Candidate scope rule:
- Include only kernels in the same major OS family as the current machine (no major-version upgrades).
- Prefer candidates within the same minor stream as current OS/kernel when available.
6. Verify at least 2 upgrade candidates exist in the filtered candidate list.
7. If fewer than 2 candidates: hard stop and end run before clone creation.
8. Gate check:
7. Verify at least 2 upgrade candidates exist in the filtered candidate list.
8. If fewer than 2 candidates: hard stop and end run before clone creation.
9. Gate check:
- If the candidate-count check (fewer than 2 candidates) triggered a stop condition, execute no further steps.
- If no stop condition was triggered, continue with the next step.
9. After source-host inspection is complete, issue the power-off request and wait for vCenter to report the source host as `poweredOff`.
10. Confirm the source host is still `poweredOff` in vCenter immediately before cloning. Do not start the clone while the source VM is transitioning or pending power-off.
11. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
12. Before cloning, check whether that clone name already exists in vCenter.
13. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
14. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only, and place the clone on `CDS1-ESX165` / `192.168.1.165` by default unless the operator explicitly specifies a different ESXi host.
15. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
16. Detach the 2 FC PCI adapters from the cloned VM.
17. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
18. Power on clone.
19. Query vCenter guest-tools for the live clone IP.
20. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
21. Change OS hostname to clone name, replacing `.` with `-`.
22. Convert networking from static IP to DHCP.
23. Remove/clean static IP configuration references.
24. Reboot clone.
25. Query vCenter guest-tools again for the new live clone IP.
26. SSH to the new live clone IP and verify the DHCP state.
27. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
28. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/aw/code/cds/.env.credentials.local`.
29. Before the first CMC install, wipe the 10GB source disk with `dd if=/dev/zero of=/dev/sdb bs=1M count=32 status=progress conv=fsync`, then verify that no filesystem or partition signatures remain (`wipefs -n /dev/sdb`, `blkid /dev/sdb`, `file -s /dev/sdb`, `lsblk -f /dev/sdb`). This disk prep is one-time only and must not be repeated in later stages of the test.
30. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
31. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
32. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
33. Wait for initial sync completion.
34. Check available kernels again using full candidate listing (not latest-only output).
35. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
36. Verify matching dev/header packages for the selected first-upgrade target are available.
37. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
38. Query vCenter guest-tools again for the live clone IP after reboot.
39. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
40. If versions do not match exactly, stop as blocker-fail.
41. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
42. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
43. Write sample data to source 10GB disk.
44. Trigger sync and confirm tracking status using `cirrusdata`.
45. Uninstall CMC.
46. Post-uninstall cleanup checkpoint:
10. After source-host inspection is complete, issue the power-off request and wait for vCenter to report the source host as `poweredOff`.
11. Confirm the source host is still `poweredOff` in vCenter immediately before cloning. Do not start the clone while the source VM is transitioning or pending power-off.
12. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
13. Before cloning, check whether that clone name already exists in vCenter.
14. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
15. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only, and place the clone on `CDS1-ESX165` / `192.168.1.165` by default unless the operator explicitly specifies a different ESXi host.
16. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
17. Detach the 2 FC PCI adapters from the cloned VM.
18. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
19. Power on clone.
20. Query vCenter guest-tools for the live clone IP.
21. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
22. Change OS hostname to clone name, replacing `.` with `-`.
23. Convert networking from static IP to DHCP.
24. Remove/clean static IP configuration references.
25. Reboot clone.
26. Query vCenter guest-tools again for the new live clone IP.
27. SSH to the new live clone IP and verify the DHCP state.
28. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
29. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/aw/code/cds/.env.credentials.local`.
30. Before the first CMC install, wipe the 10GB source disk with `dd if=/dev/zero of=/dev/sdb bs=1M count=32 status=progress conv=fsync`, then verify that no filesystem or partition signatures remain (`wipefs -n /dev/sdb`, `blkid /dev/sdb`, `file -s /dev/sdb`, `lsblk -f /dev/sdb`). This disk prep is one-time only and must not be repeated in later stages of the test.
31. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
32. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
33. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
34. Wait for initial sync completion.
35. Check available kernels again using full candidate listing (not latest-only output).
36. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
37. Verify matching dev/header packages for the selected first-upgrade target are available.
38. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
39. Query vCenter guest-tools again for the live clone IP after reboot.
40. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
41. If versions do not match exactly, stop as blocker-fail.
42. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
43. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
44. Write sample data to source 10GB disk.
45. Trigger sync and confirm tracking status using `cirrusdata`.
46. Uninstall CMC.
47. Post-uninstall cleanup checkpoint:
- Run MCP host cleanup for `skidamarink`.
- Remove the cloned VM host entry specifically via MCP (target only this test clone host), regardless of whether CDC currently reports it as online or offline.
47. Check available kernels.
48. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
49. Verify matching dev/header packages for the selected latest-upgrade target are available.
50. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
51. Query vCenter guest-tools again for the live clone IP after reboot.
52. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
53. If versions do not match exactly, stop as blocker-fail.
54. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
55. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
56. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
57. Confirm machine is online in `skidamarink` using `cirrusdata`.
58. SSH and verify MTDI, Galaxy Migrate services/driver are up.
59. Success-path cleanup only: power off cloned machine.
60. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
61. Success-path final cleanup checkpoint:
48. Check available kernels.
49. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
50. Verify matching dev/header packages for the selected latest-upgrade target are available.
51. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
52. Query vCenter guest-tools again for the live clone IP after reboot.
53. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
54. If versions do not match exactly, stop as blocker-fail.
55. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
56. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
57. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
58. Confirm machine is online in `skidamarink` using `cirrusdata`.
59. SSH and verify MTDI, Galaxy Migrate services/driver are up.
60. Success-path cleanup only: power off cloned machine.
61. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
62. Success-path final cleanup checkpoint:
- Run MCP host cleanup for `skidamarink`.
- Remove the cloned VM host entry specifically via MCP (target only this test clone host), regardless of whether CDC currently reports it as online or offline.
62. Blocker-fail path after clone creation:
63. Blocker-fail path after clone creation:
- Stop test immediately after recording failure details.
- Leave cloned VM powered on and present in inventory for manual inspection.
- Do not run clone power-off/delete steps in blocker-fail path.
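Three of the steps in the procedure above are pure naming or selection logic and can be sketched without touching vCenter or the guest. The helpers below are hypothetical and rest on stated assumptions: the source name is taken to start with an `atvm<something>-` prefix, the candidate list is taken to be already filtered (same major OS family) and ordered oldest to newest, and the oldest candidate is used as the first-upgrade target, since the procedure only requires that it not be the latest.

```python
import re


def resolve_clone_name(source_name, existing_names):
    """Build aw999-<source without atvm prefix>, adding -1, -2, ... -N on collision."""
    # Assumption: the prefix to strip looks like "atvm123-".
    stripped = re.sub(r"^atvm[^-]*-", "", source_name, flags=re.IGNORECASE)
    base = "aw999-" + stripped
    existing = set(existing_names)
    if base not in existing:
        return base
    suffix = 1
    while f"{base}-{suffix}" in existing:
        suffix += 1
    return f"{base}-{suffix}"


def clone_hostname(clone_name):
    """OS hostname for the clone: the clone name with '.' replaced by '-'."""
    return clone_name.replace(".", "-")


def pick_upgrade_targets(candidates):
    """Return (first_target, latest_target) from an oldest-to-newest candidate list.

    Raises ValueError when fewer than 2 candidates exist, mirroring the
    hard stop before clone creation.
    """
    if len(candidates) < 2:
        raise ValueError("fewer than 2 upgrade candidates: hard stop before clone creation")
    # Assumption: oldest candidate first; any non-latest candidate would satisfy the rule.
    return candidates[0], candidates[-1]
```

For a source named `atvm123-ubuntu24.04` with `aw999-ubuntu24.04` already present in inventory, `resolve_clone_name` yields `aw999-ubuntu24.04-1`, and `clone_hostname` turns that into `aw999-ubuntu24-04-1`. Version ordering is distribution-specific and is intentionally left to whatever produces the candidate list.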
## Stop Conditions
- Requested source host is a SUSE/SLES machine.
- Cannot verify clone identity.
- Cannot detach required FC PCI adapters.
- Clone cannot be created on datastore `AutomatedTest-UnitTesting`.