diff --git a/tests/cmc-upgrade-kernel-test.md b/tests/cmc-upgrade-kernel-test.md
index f12da2c..ac3b4c6 100644
--- a/tests/cmc-upgrade-kernel-test.md
+++ b/tests/cmc-upgrade-kernel-test.md
@@ -6,33 +6,29 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 ## Scope
 - Run per source host provided by operator.
 - Work only on the cloned VM created for this test.
-- If the operator asks to run `tests/cmc-upgrade-kernel-test.md` or any variation of the "cmc upgrade kernel test" for an ATVM host, treat that request as referring to this file and this workflow only.
-- Treat this file as the source of truth for this test and ignore unrelated workflow references unless the operator explicitly asks to incorporate them for the current request.
+- If the operator asks to run `tests/cmc-upgrade-kernel-test.md` or any variation of the "cmc upgrade kernel test" for an ATVM host, treat that request as referring to this file only.
+- Treat this file as the source of truth for this test and ignore unrelated procedure references unless the operator explicitly asks to incorporate them for the current request.
 
 ## Inputs
 - Source VM hostname: ``
-- vCenter target/source location: ``
+- vCenter source VM inventory location: ``
 - Required clone datastore: `AutomatedTest-UnitTesting`
 - Default clone ESXi host: `CDS1-ESX165` / `192.168.1.165` unless the operator explicitly specifies otherwise
-- Initial clone access host/IP: ``
-- SSH username variable: ``
-- SSH password variable: ``
 - Cirrus profile/project: `gcstage` / `skidamarink`
 
-## Credential Source
+## Credentials
 - Use credentials from: `/home/cirrus/cds/.env.credentials.local`
 - Do not hardcode usernames/passwords in test records or commands.
 - Before any vCenter, SSH, Red Hat subscription, or CMC action, load credentials with `set -a; source /home/cirrus/cds/.env.credentials.local; set +a`.
-- Verify required credential variable names are present without printing secret values.
+- Verify required credential variable names are present without printing secret values: `VCENTER_HOST`, `VCENTER_USER`, `VCENTER_PASSWORD`, `ATVM_TARGET_USER`, `ATVM_TARGET_PASSWORD`, `REDHAT_SUBSCRIPTION_USER`, `REDHAT_SUBSCRIPTION_PASSWORD`, `CMC_GCSTAGE_URL`, `CMC_GCSTAGE_REGISTRATION_CODE`, and `CIRRUS_API_TOKEN`.
 - Do not parse the credential file with `grep`/`awk` as the authority; source it and inspect the environment because entries may use `export KEY=...`.
 
-## CMC Tooling Rule (Global)
+## CMC Tooling
 - For all CMC-related actions in this test, use the `cirrusdata` skill/CLI path.
-- Exception: offline-host cleanup is not handled by that skill yet; use the MCP connection for offline-host removal.
-- Apply this rule to every relevant step in this procedure.
+- Exception: host cleanup is not handled by that skill yet; use the Cirrus Data MCP tools for offline-host cleanup and cloned-host cleanup.
 - For every CMC install/reinstall command in this test, always include installer option: `-no-prebuilt-mtdi-nexus`.
 
-## Kernel Package Matching Rule (Global)
+## Kernel Package Rules
 - For every planned kernel upgrade, verify matching development/header packages are available for the exact target kernel version before installing that kernel.
 - On Red Hat-family systems, verify `kernel-devel-` and `kernel-headers-` availability (or documented distro-equivalent package names where applicable).
 - The first kernel upgrade attempt must not use the latest kernel in the filtered candidate list; reserve the latest kernel for the final kernel-upgrade stage.
@@ -40,35 +36,33 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - On Red Hat-family systems that use `grubby` (including Oracle Linux), explicitly set the selected kernel as the default before rebooting, then verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-` path. If the default does not match, stop before reboot as blocker-fail.
 - After each kernel upgrade and reboot, verify running kernel version and installed dev/header package versions all match.
 - If kernel and dev/header package versions are mismatched at any point, stop immediately as blocker-fail and do not continue with remediation by assumption.
-- Before any kernel candidate discovery step on any distro, force a fresh package metadata refresh on the live host before evaluating available kernel builds. Use the distro's normal refresh command for the installed package manager (for example `dnf makecache`, `yum makecache`, or `zypper refresh`). For APT-based distros, use a hard APT refresh so stale or empty package-list files are rebuilt: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update`. If the refreshed view differs from a prior result, trust the refreshed live metadata and record that the earlier view was stale.
+- Before any kernel candidate discovery step, force a fresh package metadata refresh on the live host before evaluating available kernel builds. Use the distro command set in the checklist for RHEL-family and APT-based hosts. If the refreshed view differs from a prior result, trust the refreshed live metadata and record that the earlier view was stale.
 
-## Red Hat Preflight (Global, Manual Tasks Only)
+## Red Hat Preflight
 - Apply this section only when the test target is an actual Red Hat subscription-managed machine and the run is manually executed.
 - Do not apply this section to CentOS, Oracle Linux, Rocky, Alma, or other RHEL-derived distributions unless the operator explicitly says the machine should be treated as Red Hat-managed for this run.
 - If the target is not actual RHEL, skip this preflight entirely and do not attempt `subscription-manager`.
 - Do not apply this section to ATVM automation runs that already handle subscription flow.
-- Before running test steps on Red Hat, run:
+- After sourcing credentials and before running test steps on Red Hat, run:
   - `subscription-manager remove --all`
   - `subscription-manager unregister`
   - `subscription-manager clean`
   - `subscription-manager register --username "$REDHAT_SUBSCRIPTION_USER" --password "$REDHAT_SUBSCRIPTION_PASSWORD"`
-- Source credentials from `/home/cirrus/cds/.env.credentials.local`.
 
-## SUSE Exclusion Rule (Global)
-- Do not run this test against SUSE/SLES ATVM machines.
+## SUSE Exclusion
+- Do not run this test against SUSE/SLES ATVM machines; stop before source power-on or clone creation and report that SUSE is excluded for this test.
 - SUSE ATVM machines use a local offline DVD/vault repository for packages.
 - Kernel upgrade discovery is not valid for this test unless the machine can access official SUSE repositories, which requires a SUSE subscription.
-- If the operator requests this test against any SUSE/SLES machine, stop immediately before source power-on or clone creation and report that SUSE is excluded for this test because it uses the local offline repository.
 
-## Execution Mode (Global)
+## Execution Mode
 - Run this test in continuous execution mode.
 - Do not pause for additional operator prompts between steps.
-- Keep monitoring and continue automatically until the test reaches a terminal outcome (`PASS` or `FAIL`) and all required cleanup/reporting steps are completed.
+- Keep monitoring and continue automatically until the test reaches a terminal outcome (`PASS`, `FAIL`, or operator-directed `PARTIAL`) and all required cleanup/reporting steps are completed.
 - Only stop early if a true blocker prevents safe continuation, and still complete required cleanup/reporting before returning control.
 - Time every step explicitly.
 - If any single step takes longer than 10 minutes, hard stop the test and treat it as a blocker-fail.
-## Naming Rule
+## Naming
 - Base clone VM name in vCenter: `aw999-[source hostname without atvmxxx- prefix]`
 - Before cloning, verify the clone VM name is not already in use.
 - If already in use, append a numeric suffix to the base name: `-1`, `-2`, ... `-N` until an unused name is found.
@@ -83,14 +77,14 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - Do not power off, delete, or otherwise tear down the clone until the final latest-kernel migration/session validation is complete and recorded. The latest-kernel reboot or reinstall is not the end of the test.
 
 ## Execution Checklist
-- Treat this checklist as the run ledger for the test. Figuratively check off the items in the checklist to ensure we do and confirm each step.
+- Treat this checklist as the run ledger for the test; check each item as it is completed and confirmed.
 - Do not skip ahead, collapse, or reorder checklist items.
 - Do not begin teardown until every item below is checked complete.
 - If any checklist item cannot be checked, stop the test and record the blocker.
 
 - [ ] 0. Source `/home/cirrus/cds/.env.credentials.local` and verify required credential variables are present without printing secret values.
 - [ ] 1. Confirm the requested source host is not a SUSE/SLES machine; if it is SUSE/SLES, hard stop before source power-on or clone creation.
-- [ ] 2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
+- [ ] 2. Remove offline hosts in `skidamarink` using Cirrus Data MCP tools for offline-host cleanup.
 - [ ] 3. From vCenter, confirm source host is powered on for the inspection phase; power it on if it is not already powered on.
 - [ ] 4. From vCenter, query guest-tools for the live source host IP address.
 - [ ] 5. SSH to the source host IP address found in step 4 using credentials from `/home/cirrus/cds/.env.credentials.local`.
@@ -122,7 +116,7 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - [ ] 31. If the clone still reports the previous static IP, fix config cleanup and repeat steps 26-30.
 - [ ] 32. Continue all remaining steps using the live DHCP IP confirmed in step 30.
 - [ ] 33. On the clone, wipe `/dev/sdb` once and verify no filesystem or partition signatures remain.
-- [ ] 34. Using the cirrusdata skill, reinstall CMC on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus`.
+- [ ] 34. Using the cirrusdata skill, install CMC on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus`.
 - [ ] 35. Using the cirrusdata skill, create the first local migration from the 10 GB source disk to the 11 GB destination disk in the `skidamarink` project.
 - [ ] 36. If migration session creation fails, hard stop as blocker-fail.
 - [ ] 37. Using the cirrusdata skill, wait for initial sync completion in the `skidamarink` project.
@@ -141,8 +135,8 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - [ ] 50. On the clone, write sample data to the source 10 GB disk.
 - [ ] 51. Using the cirrusdata skill, trigger sync and confirm tracking status in the `skidamarink` project.
 - [ ] 52. Using the cirrusdata skill, uninstall CMC from the clone in the `skidamarink` project.
-- [ ] 53. Using MCP, run host cleanup for `skidamarink` and remove the cloned host entry for this test clone only, regardless of online/offline status.
-- [ ] 54. Using MCP, verify the cloned host entry and all migration sessions for the cloned host are gone from `skidamarink` before continuing.
+- [ ] 53. Using Cirrus Data MCP tools, run host cleanup for `skidamarink` and remove the cloned host entry for this test clone only, regardless of online/offline status.
+- [ ] 54. Using Cirrus Data MCP tools, verify the cloned host entry and all migration sessions for the cloned host are gone from `skidamarink` before continuing.
 - [ ] 55. SSH to the live DHCP clone IP confirmed in step 30, refresh package metadata, and check available kernels again using the full distro candidate listing: RHEL/Oracle/Rocky/Alma: `dnf makecache; dnf list --showduplicates kernel kernel-devel kernel-headers`; older RHEL/CentOS: `yum makecache; yum list --showduplicates kernel kernel-devel kernel-headers`; Debian/Ubuntu: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update; apt-cache madison linux-image-generic linux-headers-generic; apt list -a linux-image-generic linux-headers-generic`.
 - [ ] 56. Select the latest-upgrade target kernel from the filtered candidate list; it must stay in the same major OS family and should use the latest available candidate in that scope. If no valid latest-upgrade target exists, hard stop as blocker-fail.
 - [ ] 57. On the clone, verify matching dev/header packages are available for the exact latest-upgrade target.
@@ -153,7 +147,7 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - [ ] 62. SSH to the rebooted clone IP found in step 61.
 - [ ] 63. On the clone, verify kernel plus dev/header package versions match the selected latest-upgrade version.
 - [ ] 64. If versions do not match exactly, stop as blocker-fail.
-- [ ] 65. Using the cirrusdata skill, reinstall CMC on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus` on the latest kernel.
+- [ ] 65. Using the cirrusdata skill, install CMC again on the clone in the `skidamarink` project with `-no-prebuilt-mtdi-nexus` on the latest kernel.
 - [ ] 66. Using the cirrusdata skill, create the second local migration from the 10 GB source disk to the 11 GB destination disk in the `skidamarink` project and wait for initial sync completion.
 - [ ] 67. If migration session creation fails, hard stop as blocker-fail.
 - [ ] 68. Using the cirrusdata skill, confirm the machine is online in the `skidamarink` project.
@@ -161,23 +155,25 @@ Validate CMC behavior across staged kernel upgrades on a cloned VM, including re
 - [ ] 70. Only after steps 65-69 all pass, begin success-path cleanup.
 - [ ] 71. From vCenter, power off the cloned machine.
 - [ ] 72. From vCenter, delete the cloned VM and its disks from inventory.
-- [ ] 73. Using MCP, run final host cleanup for `skidamarink`, remove the cloned host entry for this test clone only, and verify the cloned host entry plus all migration sessions for the cloned host are gone.
+- [ ] 73. Using Cirrus Data MCP tools, run final host cleanup for `skidamarink`, remove the cloned host entry for this test clone only, and verify the cloned host entry plus all migration sessions for the cloned host are gone.
 - [ ] 74. Blocker-fail path after clone creation, as an alternate to steps 70-73: leave the cloned VM powered on and present in inventory for manual inspection.
 - [ ] 75. Append the current run to the summary and results files with the required host metadata, kernel progression, execution summary, final outcome, and total test duration.
 
 ## Stop Conditions
+Stop immediately and record a blocker if any of these occur:
+
 - Requested source host is a SUSE/SLES machine.
 - Cannot verify clone identity.
 - Cannot detach required FC PCI adapters.
 - Clone cannot be created on datastore `AutomatedTest-UnitTesting`.
 - FC passthrough adapters remain attached after the detach/verification step.
-- DHCP transition cannot be completed (clone remains static at ``).
+- DHCP transition cannot be completed because the clone still reports the previous static IP after cleanup and retry.
 - Kernel upgrade candidate criteria not met.
 - Migration session creation failed (including API/service errors such as HTTP 5xx or equivalent backend unavailability).
 - Any critical migration/service validation failure that blocks continuation.
 
 ## Per-Host Test Result Record
-Use one cumulative results file and append one new section per tested host.
+Use one cumulative results file and append one new section per tested host. Keep the record concise but complete enough to reconstruct the run.
 
 ### Host Metadata
 - Test start time (UTC):
@@ -206,8 +202,8 @@ Use one cumulative results file and append one new section per tested host.
 ### Execution Summary (Short Bullets)
 - Clone created / FC PCI detached: `PASS|FAIL` - notes
 - Hostname/IP DHCP conversion: `PASS|FAIL` - notes
-- CMC reinstall #1: `PASS|FAIL` - notes
 - 10 GB source disk prep before first CMC install: `PASS|FAIL` - notes
+- CMC reinstall #1: `PASS|FAIL` - notes
 - Local migration #1 (10GB -> 11GB) initial sync: `PASS|FAIL` - notes
 - Step-up kernel upgrade: `PASS|FAIL` - notes
 - Step-up dev/header package match check: `PASS|FAIL` - notes
@@ -232,35 +228,25 @@ Use one cumulative results file and append one new section per tested host.
 - Blocking issue summary:
 - Follow-up actions:
 
-## Timestamp Standard
-- All recorded test timestamps must use UTC.
-- Format: `YYYY-MM-DD HH:MM UTC`
-
-## Result Storage Location
-Store and append all per-host results in:
-- `/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-results.md`
-
-Also generate a run summary file in the same directory:
-- `/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-summary.md`
-
-## Artifact Recording Rule
-- Always append the latest run outcome to the results file and summary file at the end of each run.
-- Do this for `PASS`, `FAIL`, and `PARTIAL` outcomes.
+## Result Artifacts
+- Results file: `/home/cirrus/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-results.md`
+- Summary file: `/home/cirrus/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-summary.md`
+- Result artifacts under `tmp/` are local run records only and must not be committed.
+- Always append the latest run outcome to both files for `PASS`, `FAIL`, and `PARTIAL` outcomes.
 - Do not leave a completed test run only in conversation; the artifact files are the source of record.
-- Include the total test runtime in both artifact files for every run.
+- All recorded timestamps must use UTC format: `YYYY-MM-DD HH:MM UTC`.
+- Record the UTC start time when the run begins.
+- Record the UTC end time when the run reaches a terminal outcome and cleanup/reporting is complete.
+- Compute `Test duration` from the recorded start/end timestamps and include it in both files.
 - If a run is still in progress when first recorded, update the runtime once the run reaches its terminal outcome.
+- Use the `Per-Host Test Result Record` format for the results file.
 
 Summary file requirements:
 - Start the file with the test file name line: `Test file: cmc-upgrade-kernel-test.md`
 - Title: `CMC Upgrade Kernel Test Summary`
 - Include test start time, test end time, and total test duration for the run
-- Include a short workflow summary (current kernel -> install CMC -> kernel upgrade -> uninstall CMC -> kernel upgrade -> install CMC)
+- Include a short run summary (current kernel -> first CMC install phase -> kernel upgrade -> CMC uninstall -> kernel upgrade -> second CMC install phase)
 - Include host tested, kernel progression (start, step-up, latest), and overall result
 - Start each run section with a `##` heading that includes the OS family and the final outcome, for example: `## Amazon Linux 2023 - PASS`.
 - Put the OS version and the rest of the run details under that heading so the heading stays the visible OS label above the test snippet.
-
-### Duration Rule
-- Record the UTC start time when the run begins.
-- Record the UTC end time when the run reaches a terminal outcome and cleanup/reporting is complete.
-- Compute `Test duration` from the recorded start/end timestamps.
 - Backfill `Test duration` into the summary and results artifacts for any run where both timestamps are known.
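The patched `Credentials` section asks the runner to confirm that each required variable is set without printing any secret. A minimal POSIX `sh` sketch of that check follows. The helper name `check_required_vars` and the `REQUIRED_VARS` list variable are hypothetical, and the sketch assumes the credential file has already been loaded into the same shell with `set -a; source /home/cirrus/cds/.env.credentials.local; set +a`.

```sh
#!/bin/sh
# Hypothetical helper: report which credential variables are unset or empty.
# Only variable NAMES are printed; values are never echoed.
# Assumes the credential file was already sourced into this shell.
check_required_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name. The names come from the
    # fixed list below, so eval is not fed untrusted input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf 'missing credential variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  unset value
  if [ "$missing" -ne 0 ]; then
    return 1
  fi
  printf 'all %d required credential variables present\n' "$#"
}

# Variable names listed in the "Credentials" section of the test file.
REQUIRED_VARS="VCENTER_HOST VCENTER_USER VCENTER_PASSWORD ATVM_TARGET_USER
ATVM_TARGET_PASSWORD REDHAT_SUBSCRIPTION_USER REDHAT_SUBSCRIPTION_PASSWORD
CMC_GCSTAGE_URL CMC_GCSTAGE_REGISTRATION_CODE CIRRUS_API_TOKEN"
```

Typical use would be `check_required_vars $REQUIRED_VARS || exit 1`, leaving `$REQUIRED_VARS` unquoted so the list splits into one argument per name. The `eval` lookup keeps the sketch portable to plain `sh`; in Bash, `${!name:-}` does the same without `eval`.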
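The `grubby` default check and the kernel/dev/header match check in `Kernel Package Rules` both reduce to exact string comparisons. The sketch below keeps those comparisons pure so they can be exercised offline; the function names are hypothetical. The live inputs shown in the comments assume a Red Hat-family host whose packages are named `kernel-devel` and `kernel-headers`, and where `uname -r` prints the same `VERSION-RELEASE.ARCH` string that `rpm --queryformat` reports. Oracle Linux UEK kernels or other distro-equivalent package names would need different inputs.

```sh
#!/bin/sh
# Hypothetical helpers for the kernel package rules. Each takes already
# collected strings so the comparison logic can be exercised offline.

# Pre-reboot: the grubby default must be /boot/vmlinuz-<target>.
# Live input (assumed): grubby_default_ok "$target" "$(grubby --default-kernel)"
grubby_default_ok() {
  expected="/boot/vmlinuz-$1"
  if [ "$2" != "$expected" ]; then
    printf 'blocker-fail: grubby default is %s, expected %s\n' "$2" "$expected" >&2
    return 1
  fi
}

# Post-reboot: running kernel, kernel-devel and kernel-headers must all
# equal the selected target version exactly.
# Live inputs (assumed), with qf='%{VERSION}-%{RELEASE}.%{ARCH}\n':
#   versions_match "$target" "$(uname -r)" \
#     "$(rpm -q --qf "$qf" "kernel-devel-$target")" \
#     "$(rpm -q --qf "$qf" "kernel-headers-$target")"
versions_match() {
  target=$1
  ok=0
  for pair in "running kernel=$2" "kernel-devel=$3" "kernel-headers=$4"; do
    label=${pair%%=*}   # text before the first "="
    value=${pair#*=}    # text after the first "="
    if [ "$value" != "$target" ]; then
      printf 'blocker-fail: %s is %s, expected %s\n' "$label" "$value" "$target" >&2
      ok=1
    fi
  done
  return "$ok"
}
```

Keeping the pre-reboot `grubby` check separate from the post-reboot version check mirrors the order the rules require: the first must pass before rebooting, and the second only makes sense afterwards. A non-zero return from either maps to the blocker-fail stop.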
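The `Naming` rule (base name `aw999-` plus the source hostname without its `atvmxxx-` prefix, then `-1`, `-2`, ... `-N` until an unused name is found) can be sketched as follows. This assumes the prefix is `atvm` followed by anything up to the first hyphen, and that the caller already holds the list of VM names present in vCenter; how that list is fetched is not shown, and the helper names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for the clone naming rule.

# Base name: aw999-<source hostname without its atvmxxx- prefix>.
# Assumption: the prefix is "atvm" plus any characters up to the first
# hyphen; hostnames without that prefix are used unchanged.
clone_base_name() {
  printf 'aw999-%s\n' "${1#atvm*-}"
}

# Succeeds if $1 equals any of the remaining arguments.
name_in_use() {
  wanted=$1
  shift
  for existing in "$@"; do
    if [ "$existing" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# First unused name: BASE, then BASE-1, BASE-2, ... BASE-N.
# Remaining arguments are VM names already present in vCenter.
next_clone_name() {
  base=$1
  shift
  candidate=$base
  n=0
  while name_in_use "$candidate" "$@"; do
    n=$((n + 1))
    candidate="$base-$n"
  done
  printf '%s\n' "$candidate"
}
```

For example, `next_clone_name "$(clone_base_name atvm123-rhel9-db)" aw999-rhel9-db aw999-rhel9-db-1` would pick `aw999-rhel9-db-2`.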
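For the UTC timestamp and `Test duration` rules in `Result Artifacts`, here is a small sketch assuming GNU coreutils `date`, whose `-d` option parses strings such as `2024-05-01 10:15 UTC` (BSD and macOS `date` do not accept `-d` this way). The helper names and the `<h>h <mm>m` duration format are assumptions; the test file only names the field.

```sh
#!/bin/sh
# Hypothetical helpers for the UTC timestamp and duration rules.
# Assumes GNU coreutils `date` for parsing with -d.

# Current time in the required `YYYY-MM-DD HH:MM UTC` format.
utc_now() {
  date -u '+%Y-%m-%d %H:%M UTC'
}

# Elapsed time between two recorded timestamps, printed as "<h>h <mm>m".
run_duration() {
  start_s=$(date -u -d "$1" +%s) || return 1
  end_s=$(date -u -d "$2" +%s) || return 1
  if [ "$end_s" -lt "$start_s" ]; then
    echo "end time is earlier than start time" >&2
    return 1
  fi
  mins=$(( (end_s - start_s) / 60 ))
  printf '%dh %02dm\n' "$((mins / 60))" "$((mins % 60))"
}
```

Because both inputs are the already-recorded start and end strings, the same helper can backfill `Test duration` for older runs where both timestamps are known.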