# CMC Upgrade Kernel Test Template

## Purpose

Validate CMC behavior across staged kernel upgrades on a cloned VM, including reinstall, migration health, service health, and cleanup.

## Scope

- Run per source host provided by operator.
- Work only on the cloned VM created for this test.

## Inputs

- Source VM hostname: ``
- vCenter target/source location: ``
- Required clone datastore: `AutomatedTest-UnitTesting`
- Initial clone access host/IP: ``
- SSH username variable: ``
- SSH password variable: ``
- Cirrus profile/project: `gcstage` / `skidamarink`

## Credential Source

- Use credentials from: `/home/aw/code/cds/.env.credentials.local`
- Do not hardcode usernames/passwords in test records or commands.

## CMC Tooling Rule (Global)

- For all CMC-related actions in this test, use the `cirrusdata` skill/CLI path.
- Exception: offline-host cleanup is not handled by that skill yet; use the MCP connection for offline-host removal.
- Apply this rule to every relevant step in this procedure.
- For every CMC install/reinstall command in this test, always include installer option: `-no-prebuilt-mtdi-nexus`.

## Kernel Package Matching Rule (Global)

- For every planned kernel upgrade, verify matching development/header packages are available for the exact target kernel version before installing that kernel.
- On Red Hat-family systems, verify `kernel-devel-` and `kernel-headers-` availability (or documented distro-equivalent package names where applicable).
- The first kernel upgrade attempt must not use the latest kernel in the filtered candidate list; reserve the latest kernel for the final kernel-upgrade stage.
- When upgrading kernel versions, also upgrade/install the matching development/header packages for that same version.
- After each kernel upgrade and reboot, verify running kernel version and installed dev/header package versions all match.
- If kernel and dev/header package versions are mismatched at any point, stop immediately as blocker-fail and do not continue with remediation by assumption.

## Red Hat Preflight (Global, Manual Tasks Only)

- Apply this section only when the test target is an actual Red Hat subscription-managed machine and the run is manually executed.
- Do not apply this section to CentOS, Oracle Linux, Rocky, Alma, or other RHEL-derived distributions unless the operator explicitly says the machine should be treated as Red Hat-managed for this run.
- If the target is not actual RHEL, skip this preflight entirely and do not attempt `subscription-manager`.
- Do not apply this section to ATVM automation runs that already handle subscription flow.
- Before running test steps on Red Hat, run:
    - `subscription-manager remove --all`
    - `subscription-manager unregister`
    - `subscription-manager clean`
    - `subscription-manager register --username "$REDHAT_SUBSCRIPTION_USER" --password "$REDHAT_SUBSCRIPTION_PASSWORD"`
- Source credentials from `/home/aw/code/cds/.env.credentials.local`.

## Execution Mode (Global)

- Run this test in continuous execution mode.
- Do not pause for additional operator prompts between steps.
- Keep monitoring and continue automatically until the test reaches a terminal outcome (`PASS` or `FAIL`) and all required cleanup/reporting steps are completed.
- Only stop early if a true blocker prevents safe continuation, and still complete required cleanup/reporting before returning control.

## Naming Rule

- Base clone VM name in vCenter: `aw999-[source hostname without atvmxxx- prefix]`
- Before cloning, verify the clone VM name is not already in use.
- If already in use, append a numeric suffix to the base name: `-1`, `-2`, ... `-N` until an unused name is found.
- Use plain VM name only (no `/CDSHQ-Eng/vm/` prefix) for clone destination name, and set folder separately if needed.
- OS hostname on clone: same clone name but replace `.` with `-`

## Safety Rules

- Delete only the clone created for this test.
- If the clone is missing or identity is uncertain, stop and do not delete any other VM.
- If any blocker occurs after clone creation, stop the test and leave the cloned VM powered on for manual inspection.
- Do not delete or power off the clone on blocker-fail outcomes.
- After source-host kernel inspection is complete, power the source VM off and re-verify in vCenter that it is powered off before cloning.
- Detaching the 2 FC PCI passthrough adapters from the cloned VM is mandatory before any guest boot or guest-side change.
- Verify in vCenter that both FC passthrough devices are absent before proceeding past the clone-prep stage.
- Always use live vCenter guest-tools data to confirm the current clone IP before any SSH or polling attempt.
- Re-check live vCenter guest-tools IP after clone power-on, after switching networking from static to DHCP, and after any reboot before attempting SSH.
- Do not assume the previous IP is still valid after a reboot or network change.
- Cleanup actions that remove hosts from CMC must target only the cloned host used in the current test run.
- Treat migration session creation failures (for either migration #1 or migration #2) as blocker-fail events.

## Test Procedure

1. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
2. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.
3. SSH to the source host and check available kernel versions on the source before cloning.
4. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from `check-update`).
5. Candidate scope rule:
    - Include only kernels in the same major OS family as the current machine (no major-version upgrades).
    - Prefer candidates within the same minor stream as current OS/kernel when available.
6. Verify at least 2 upgrade candidates exist in the filtered candidate list.
7. If fewer than 2 candidates: hard stop and end run before clone creation.
8. Gate check:
    - If step 7 triggered a stop condition, execute no further steps.
    - If no stop condition was triggered, continue with the next step.
9. After source-host inspection is complete, power the source VM off.
10. Confirm in vCenter that the source host is powered off before cloning.
11. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
12. Before cloning, check whether that clone name already exists in vCenter.
13. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
14. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only.
15. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
16. Detach the 2 FC PCI adapters from the cloned VM.
17. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
18. Power on clone.
19. Query vCenter guest-tools for the live clone IP.
20. SSH to the live clone IP using credentials from `/home/aw/code/cds/.env.credentials.local`.
21. Change OS hostname to clone name, replacing `.` with `-`.
22. Convert networking from static IP to DHCP.
23. Remove/clean static IP configuration references.
24. Reboot clone.
25. Query vCenter guest-tools again for the new live clone IP.
26. SSH to the new live clone IP and verify the DHCP state.
27. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
28. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/aw/code/cds/.env.credentials.local`.
29. Before the first CMC install, wipe the 10GB source disk so it has no filesystem, partition table, or other residual content.
    This disk prep is one-time only and must not be repeated in later stages of the test.
30. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
31. Create local migration from 10GB source disk to 11GB destination disk using `cirrusdata`.
32. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
33. Wait for initial sync completion.
34. Check available kernels again using full candidate listing (not latest-only output).
35. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
36. Verify matching dev/header packages for the selected first-upgrade target are available.
37. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
38. Query vCenter guest-tools again for the live clone IP after reboot.
39. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
40. If versions do not match exactly, stop as blocker-fail.
41. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
42. SSH to clone and verify that the MTDI and Galaxy Migrate services/driver are up.
43. Write sample data to source 10GB disk.
44. Trigger sync and confirm tracking status using `cirrusdata`.
45. Uninstall CMC.
46. Post-uninstall cleanup checkpoint:
    - Run MCP offline-host cleanup for `skidamarink`.
    - If the cloned VM is still marked online after uninstall, remove that cloned VM host entry specifically via MCP (target only this test clone host).
    - Because CMC status can lag behind VM state, poll briefly for status transition; if still online, perform targeted MCP host removal for the tested clone.
47. Check available kernels again using full candidate listing (not latest-only output).
48. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
49. Verify matching dev/header packages for the selected latest-upgrade target are available.
50. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
51. Query vCenter guest-tools again for the live clone IP after reboot.
52. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
53. If versions do not match exactly, stop as blocker-fail.
54. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
55. Create a local migration (10GB -> 11GB) via `cirrusdata` and wait for initial sync completion.
56. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
57. Confirm machine is online in `skidamarink` using `cirrusdata`.
58. SSH to clone and verify that the MTDI and Galaxy Migrate services/driver are up.
59. Success-path cleanup only: power off cloned machine.
60. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
61. Success-path final cleanup checkpoint:
    - Run MCP offline-host cleanup for `skidamarink`.
    - If the cloned VM is still marked online at the end of the test, remove that cloned VM host entry specifically via MCP (target only this test clone host).
    - Because CMC status can lag behind VM deletion/power-off, wait/poll briefly first; if still online, perform targeted MCP host removal for the tested clone.
62. Blocker-fail path after clone creation:
    - Stop test immediately after recording failure details.
    - Leave cloned VM powered on and present in inventory for manual inspection.
    - Do not run clone power-off/delete steps in blocker-fail path.

## Stop Conditions

- Cannot verify clone identity.
- Cannot detach required FC PCI adapters.
- Clone cannot be created on datastore `AutomatedTest-UnitTesting`.
- FC passthrough adapters remain attached after the detach/verification step.
- DHCP transition cannot be completed (clone remains static at ``).
- Kernel upgrade candidate criteria not met.
- Migration session creation failed (including API/service errors such as HTTP 5xx or equivalent backend unavailability).
- Any critical migration/service validation failure that blocks continuation.

## Per-Host Test Result Record

Use one cumulative results file and append one new section per tested host.

### Host Metadata

- Test date/time (UTC):
- Operator:
- Source VM:
- Cloned VM name:
- Clone origin (vCenter path/folder/cluster):
- Final DHCP IP of clone:

### Kernel / OS Tracking

- Start OS version:
- Start kernel version:
- Kernel list before first upgrade (full candidate list, filtered by scope rule):
- Kernel selected for step-up upgrade:
- Matching dev/header packages for step-up target (availability check):
- Kernel after step-up reboot:
- Installed dev/header package versions after step-up:
- Kernel list before latest upgrade (full candidate list, filtered by scope rule):
- Kernel selected for latest upgrade:
- Matching dev/header packages for latest target (availability check):
- Kernel after latest reboot:
- Installed dev/header package versions after latest upgrade:

### Execution Summary (Short Bullets)

- Clone created / FC PCI detached: `PASS|FAIL` - notes
- Hostname/IP DHCP conversion: `PASS|FAIL` - notes
- CMC reinstall #1: `PASS|FAIL` - notes
- 10 GB source disk prep before first CMC install: `PASS|FAIL` - notes
- Local migration #1 (10GB -> 11GB) initial sync: `PASS|FAIL` - notes
- Step-up kernel upgrade: `PASS|FAIL` - notes
- Step-up dev/header package match check: `PASS|FAIL` - notes
- Online in skidamarink after step-up: `PASS|FAIL` - notes
- MTDI/Galaxy Migrate service+driver health after step-up: `PASS|FAIL` - notes
- Write data + tracking status: `PASS|FAIL` - notes
- CMC uninstall: `PASS|FAIL` - notes
- Latest kernel upgrade: `PASS|FAIL` - notes
- Latest dev/header package match check: `PASS|FAIL` - notes
- CMC reinstall #2: `PASS|FAIL` - notes
- Local migration #2 (10GB -> 11GB) initial sync: `PASS|FAIL` - notes
- Online in skidamarink after latest upgrade: `PASS|FAIL` - notes
- MTDI/Galaxy Migrate service+driver health after latest upgrade: `PASS|FAIL` - notes
- Clone power off and deletion (success path only): `PASS|FAIL|N/A` - notes

### Final Outcome

- Overall result: `PASS|FAIL|PARTIAL`
- Outcome interpretation:
    - `PASS`: full planned test flow completed and core validation goals passed (CMC install/uninstall/reinstall, kernel step-up/latest upgrade, and post-upgrade service/driver health checks), even if non-blocking warnings occurred.
    - `FAIL`: a true blocker prevented completion of required validation goals.
    - `PARTIAL`: use only when execution stops early by operator choice or scope is intentionally reduced, not for non-blocking warnings in a completed run.
- Blocking issue summary:
- Follow-up actions:

## Timestamp Standard

- All recorded test timestamps must use UTC.
- Format: `YYYY-MM-DD HH:MM UTC`

## Result Storage Location

Store and append all per-host results in:

- `/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-results.md`

Also generate a run summary file in the same directory:

- `/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-summary.md`

Summary file requirements:

- Title: `CMC Upgrade Kernel Test Summary`
- Include UTC date/time for the run
- Include a short workflow summary (current kernel -> install CMC -> kernel upgrade -> uninstall CMC -> kernel upgrade -> install CMC)
- Include host tested, kernel progression (start, step-up, latest), and overall result
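
## Illustrative Sketch: Naming and First-Upgrade Selection

The Naming Rule and the first-upgrade selection rule in this template can be expressed as a few pure functions. The following is a minimal sketch, not part of the `cirrusdata` tooling or any vCenter API. All function names are hypothetical. It assumes the `atvmxxx-` prefix means `atvm` followed by digits and a hyphen, that the caller supplies the set of VM names already present in vCenter, and that the filtered kernel candidate list is already sorted oldest to newest. Picking the second-newest candidate as the first-upgrade target is also an assumption; the template only requires that the first target is not the latest.

```python
from __future__ import annotations

import re


def base_clone_name(source_hostname: str) -> str:
    """Build the base clone name: 'aw999-' plus the source hostname
    without its 'atvm<digits>-' prefix (prefix format is an assumption)."""
    stripped = re.sub(r"^atvm\d+-", "", source_hostname)
    return f"aw999-{stripped}"


def resolve_clone_name(base: str, existing_names: set[str]) -> str:
    """Return the base name if unused, else the first unused '-1', '-2', ... '-N' variant."""
    if base not in existing_names:
        return base
    n = 1
    while f"{base}-{n}" in existing_names:
        n += 1
    return f"{base}-{n}"


def os_hostname(clone_name: str) -> str:
    """OS hostname on the clone: same as the clone name with '.' replaced by '-'."""
    return clone_name.replace(".", "-")


def pick_first_upgrade_target(candidates: list[str]) -> str:
    """Pick a first-upgrade kernel that is not the latest candidate.

    'candidates' is the filtered list, assumed sorted oldest to newest.
    Fewer than 2 candidates is a hard stop before clone creation.
    """
    if len(candidates) < 2:
        raise ValueError("fewer than 2 upgrade candidates: hard stop")
    # Assumption: use the second-newest so the newest stays reserved
    # for the final kernel-upgrade stage.
    return candidates[-2]
```

For example, with a source host named `atvm123-ubuntu24.04` (a made-up hostname) and an existing VM already named `aw999-ubuntu24.04`, `resolve_clone_name(base_clone_name("atvm123-ubuntu24.04"), {"aw999-ubuntu24.04"})` yields `aw999-ubuntu24.04-1`, and `os_hostname` turns that into `aw999-ubuntu24-04-1`.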