# CMC Upgrade Kernel Test Template

## Purpose
Validate CMC behavior across staged kernel upgrades on a cloned VM, including reinstall, migration health, service health, and cleanup.

## Scope
- Run per source host provided by operator.
- Work only on the cloned VM created for this test.
- If the operator asks to run `tests/cmc-upgrade-kernel-test.md` or any variation of the "cmc upgrade kernel test" for an ATVM host, treat that request as referring to this file and this workflow only.
- Treat this file as the source of truth for this test and ignore unrelated workflow references unless the operator explicitly asks to incorporate them for the current request.

## Inputs
- Source VM hostname: `<atvmxxx-...>`
- vCenter target/source location: `<cluster/datastore/folder>`
- Required clone datastore: `AutomatedTest-UnitTesting`
- Default clone ESXi host: `CDS1-ESX165` / `192.168.1.165` unless the operator explicitly specifies otherwise
- Initial clone access host/IP: `<INITIAL_CLONE_HOST_OR_IP>`
- SSH username variable: `<SSH_USER_VAR>`
- SSH password variable: `<SSH_PASSWORD_VAR>`
- Cirrus profile/project: `gcstage` / `skidamarink`

## Credential Source
- Use credentials from: `/home/cirrus/cds/.env.credentials.local`
- Do not hardcode usernames/passwords in test records or commands.
- Before any vCenter, SSH, Red Hat subscription, or CMC action, load credentials with `set -a; source /home/cirrus/cds/.env.credentials.local; set +a`.
- Verify required credential variable names are present without printing secret values.
- Do not parse the credential file with `grep`/`awk` as the authority; source it and inspect the environment because entries may use `export KEY=...`.

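The presence check described above can be sketched as a small shell function. This is a minimal illustration, not part of the test tooling: the function name `check_required_vars` is hypothetical, and it assumes the credential file has already been sourced into the current shell. It reports missing variables by name only and never echoes a value.

```sh
# Report which required variables are unset or empty, by name only.
# Values are never printed.
check_required_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable called $name.
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    unset value
    if [ -n "$missing" ]; then
        echo "missing credential variables:$missing" >&2
        return 1
    fi
    echo "all required credential variables are present"
}
```

A typical call after sourcing the file would be `check_required_vars VCENTER_HOST VCENTER_USER VCENTER_PASSWORD`, extended with the remaining variable names required by the credential gate.
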
## CMC Tooling Rule (Global)
- For all CMC-related actions in this test, use the `cirrusdata` skill/CLI path.
- Exception: offline-host cleanup is not handled by that skill yet; use the MCP connection for offline-host removal.
- Apply this rule to every relevant step in this procedure.
- For every CMC install/reinstall command in this test, always include installer option: `-no-prebuilt-mtdi-nexus`.

## Kernel Package Matching Rule (Global)
- For every planned kernel upgrade, verify matching development/header packages are available for the exact target kernel version before installing that kernel.
- On Red Hat-family systems, verify `kernel-devel-<target>` and `kernel-headers-<target>` availability (or documented distro-equivalent package names where applicable).
- The first kernel upgrade attempt must not use the latest kernel in the filtered candidate list; reserve the latest kernel for the final kernel-upgrade stage.
- When upgrading kernel versions, also upgrade/install the matching development/header packages for that same version.
- On Red Hat-family systems that use `grubby` (including Oracle Linux), explicitly set the selected kernel as the default before rebooting, then verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path. If the default does not match, stop before reboot as blocker-fail.
- After each kernel upgrade and reboot, verify running kernel version and installed dev/header package versions all match.
- If kernel and dev/header package versions are mismatched at any point, stop immediately as blocker-fail and do not continue with remediation by assumption.
- Before any kernel candidate discovery step on any distro, force a fresh package metadata refresh on the live host before evaluating available kernel builds. Use the distro's normal refresh command for the installed package manager (for example `dnf makecache`, `yum makecache`, or `zypper refresh`). For APT-based distros, use a hard APT refresh so stale or empty package-list files are rebuilt: `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update`. If the refreshed view differs from a prior result, trust the refreshed live metadata and record that the earlier view was stale.

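The refresh and match rules above reduce to a few string comparisons, sketched here under stated assumptions. The function names (`refresh_command`, `default_kernel_matches`, `versions_match`) are hypothetical helpers, not existing tooling; the refresh commands are the ones named in this section.

```sh
# Map a package manager name to the refresh command named in this section.
refresh_command() {
    case "$1" in
        dnf)    echo "dnf makecache" ;;
        yum)    echo "yum makecache" ;;
        zypper) echo "zypper refresh" ;;
        apt)    echo "rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update" ;;
        *)      echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Succeed only when the reported default is exactly /boot/vmlinuz-<target>.
default_kernel_matches() {
    target=$1
    default_path=$2
    [ "$default_path" = "/boot/vmlinuz-$target" ]
}

# Succeed only when every version passed after the target equals the target
# (for example: running kernel, devel package, headers package).
versions_match() {
    target=$1
    shift
    for v in "$@"; do
        [ "$v" = "$target" ] || return 1
    done
}
```

On a `grubby` system the pre-reboot check could then read `default_kernel_matches "$target" "$(grubby --default-kernel)"`, and the post-reboot check could pass `"$(uname -r)"` as the first version to `versions_match`. How the installed dev/header package versions are queried is distro-specific and is left out of this sketch.
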
## Red Hat Preflight (Global, Manual Tasks Only)
- Apply this section only when the test target is an actual Red Hat subscription-managed machine and the run is manually executed.
- Do not apply this section to CentOS, Oracle Linux, Rocky, Alma, or other RHEL-derived distributions unless the operator explicitly says the machine should be treated as Red Hat-managed for this run.
- If the target is not actual RHEL, skip this preflight entirely and do not attempt `subscription-manager`.
- Do not apply this section to ATVM automation runs that already handle subscription flow.
- Before running test steps on Red Hat, run:
  - `subscription-manager remove --all`
  - `subscription-manager unregister`
  - `subscription-manager clean`
  - `subscription-manager register --username "$REDHAT_SUBSCRIPTION_USER" --password "$REDHAT_SUBSCRIPTION_PASSWORD"`
- Source credentials from `/home/cirrus/cds/.env.credentials.local`.

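One way to decide whether a running target is actual RHEL is to read the `ID` and `ID_LIKE` fields of its os-release file. The sketch below is an assumption about how that decision could be automated; `classify_os` is a hypothetical name. It deliberately reports RHEL-derived distributions (whose `ID` is not `rhel`) as `other`, in line with the rule above.

```sh
# Classify a host from an os-release file (default: /etc/os-release).
# Prints "suse", "rhel", or "other".
classify_os() {
    file=${1:-/etc/os-release}
    # Source inside a subshell so ID / ID_LIKE do not leak into the caller.
    (
        ID=""
        ID_LIKE=""
        . "$file" || exit 1
        case " $ID $ID_LIKE " in
            *suse*|*" sles "*) echo suse ;;
            *)
                if [ "$ID" = "rhel" ]; then echo rhel; else echo other; fi
                ;;
        esac
    )
}
```

Only a result of `rhel` would trigger the `subscription-manager` sequence. The same classification also flags SUSE/SLES hosts, but it needs a readable os-release file, so it cannot by itself satisfy a check that must happen before the source host is powered on.
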
## SUSE Exclusion Rule (Global)
- Do not run this test against SUSE/SLES ATVM machines.
- SUSE ATVM machines use a local offline DVD/vault repository for packages.
- Kernel upgrade discovery is not valid for this test unless the machine can access official SUSE repositories, which requires a SUSE subscription.
- If the operator requests this test against any SUSE/SLES machine, stop immediately before source power-on or clone creation and report that SUSE is excluded for this test because it uses the local offline repository.

## Execution Mode (Global)
- Run this test in continuous execution mode.
- Do not pause for additional operator prompts between steps.
- Keep monitoring and continue automatically until the test reaches a terminal outcome (`PASS` or `FAIL`) and all required cleanup/reporting steps are completed.
- Only stop early if a true blocker prevents safe continuation, and still complete required cleanup/reporting before returning control.
- Time every step explicitly.
- If any single step takes longer than 10 minutes, hard stop the test and treat it as a blocker-fail.

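Explicit step timing with the 10-minute limit could look like the following sketch. The helper names and the return code `99` are illustrative assumptions. Note that this version measures a step after it finishes; it does not interrupt a step that hangs, which would need an external watchdog such as the coreutils `timeout` command.

```sh
STEP_LIMIT_SECONDS=600   # 10 minutes, from the rule above

# Succeed when a step that ran from $1 to $2 (epoch seconds) stayed within the limit.
step_within_limit() {
    start=$1
    end=$2
    limit=${3:-$STEP_LIMIT_SECONDS}
    [ $((end - start)) -le "$limit" ]
}

# Run a command as one timed step, print its duration, and flag overruns.
run_timed_step() {
    label=$1
    shift
    started=$(date +%s)
    status=0
    "$@" || status=$?
    finished=$(date +%s)
    echo "step '$label' took $((finished - started))s (exit $status)"
    if ! step_within_limit "$started" "$finished"; then
        echo "blocker-fail: step '$label' exceeded ${STEP_LIMIT_SECONDS}s" >&2
        return 99
    fi
    return "$status"
}
```

For example, `run_timed_step "reboot wait" sleep 30` prints the elapsed time and returns the wrapped command's exit status unless the limit was exceeded.
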
## Naming Rule
- Base clone VM name in vCenter: `aw999-[source hostname without atvmxxx- prefix]`
- Before cloning, verify the clone VM name is not already in use.
- If already in use, append a numeric suffix to the base name: `-1`, `-2`, ... `-N` until an unused name is found.
- Use plain VM name only (no `/CDSHQ-Eng/vm/` prefix) for clone destination name, and set folder separately if needed.
- OS hostname on clone: same clone name but replace `.` with `-`

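The naming rule can be expressed as three small string helpers. This is a sketch under assumptions: the function names are hypothetical, the `atvmxxx-` prefix is assumed to contain no hyphen before its trailing `-`, and the list of existing vCenter VM names is assumed to arrive on stdin, one name per line, from whatever inventory query is in use.

```sh
# Base clone name: drop the leading "atvm<id>-" token and prefix "aw999-".
clone_base_name() {
    printf 'aw999-%s\n' "$(printf '%s' "$1" | sed 's/^atvm[^-]*-//')"
}

# First unused name: base, then base-1, base-2, ...
# Existing VM names are read from stdin, one per line.
resolve_clone_name() {
    base=$1
    existing=$(cat)
    candidate=$base
    n=0
    while printf '%s\n' "$existing" | grep -Fqx -- "$candidate"; do
        n=$((n + 1))
        candidate="$base-$n"
    done
    printf '%s\n' "$candidate"
}

# OS hostname: the clone name with every "." replaced by "-".
clone_hostname() {
    printf '%s\n' "$1" | tr '.' '-'
}
```

With a hypothetical source host `atvm123-ubuntu24.04`, the base name is `aw999-ubuntu24.04`; if that name and `aw999-ubuntu24.04-1` already exist, the resolved name is `aw999-ubuntu24.04-2` and the OS hostname becomes `aw999-ubuntu24-04-2`.
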
## Safety Rules
- Delete only the clone created for this test.
- If the clone is missing or identity is uncertain, stop and do not delete any other VM.
- If any blocker occurs after clone creation, stop the test and leave the cloned VM powered on for manual inspection.
- Do not delete or power off the clone on blocker-fail outcomes.
- Do not power off, delete, or otherwise tear down the clone until the final latest-kernel migration/session validation is complete and recorded. The latest-kernel reboot or reinstall is not the end of the test.

## Execution Checklist
- Treat this checklist as the run ledger for the test. Check off each item only after the step has been performed and its result confirmed.
- Do not skip ahead, collapse, or reorder checklist items.
- Do not begin teardown until every item below is checked complete.
- If any checklist item cannot be checked, stop the test and record the blocker.

- [ ] 0. Source `/home/cirrus/cds/.env.credentials.local` and verify required credential variables are present without printing secret values.
- [ ] 1. Confirm the requested source host is not a SUSE/SLES machine.
- [ ] 2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
- [ ] 3. Confirm source host is powered on for the inspection phase.
- [ ] 3a. Before listing available kernel builds, inspect source repository files and hard stop if any enabled/source repo points at `192.168.3.199`.
- [ ] 4. SSH to the source host and check available kernel versions on the source before cloning.
- [ ] 5. Build source-host kernel candidate list from all available versions after a fresh package metadata refresh. On Ubuntu, inspect the generic track first, then confirm candidate availability with alternate package listing methods if needed before deciding whether the generic track is usable.
- [ ] 6. Apply the candidate scope rule: same major OS family only, with same minor stream preferred.
- [ ] 7. Verify at least 2 upgrade candidates exist in the filtered candidate list.
- [ ] 8. If fewer than 2 candidates, hard stop and end run before clone creation.
- [ ] 9. Perform the gate check before continuing.
- [ ] 10. Issue the source-host power-off request and wait for `poweredOff`.
- [ ] 11. Confirm the source host is still `poweredOff` immediately before cloning.
- [ ] 12. Determine the base clone name `aw999-[source-without-atvmxxx-]`.
- [ ] 13. Check whether the base clone name already exists in vCenter.
- [ ] 14. If needed, choose the next available suffixed clone name.
- [ ] 15. Clone the source VM on `AutomatedTest-UnitTesting` and default it to `CDS1-ESX165` / `192.168.1.165` unless overridden.
- [ ] 16. Pass only the clone VM name to the clone command destination.
- [ ] 17. Detach the 2 FC PCI adapters from the cloned VM.
- [ ] 18. Verify both FC passthrough devices are no longer present on the clone.
- [ ] 19. Power on the clone.
- [ ] 20. Query vCenter guest-tools for the live clone IP.
- [ ] 21. SSH to the live clone IP using credentials from `/home/cirrus/cds/.env.credentials.local`.
- [ ] 22. Change the OS hostname to the clone name with `.` replaced by `-`.
- [ ] 23. Convert networking from static IP to DHCP.
- [ ] 24. Remove/clean static IP configuration references.
- [ ] 25. Reboot the clone.
- [ ] 26. Query vCenter guest-tools again for the new live clone IP.
- [ ] 27. SSH to the new live clone IP and verify DHCP state.
- [ ] 28. If the clone still reports the previous static IP, fix config cleanup and repeat reboot/verify.
- [ ] 29. Continue all remaining steps using the live DHCP IP from vCenter.
- [ ] 30. Wipe `/dev/sdb` once and verify no filesystem or partition signatures remain.
- [ ] 31. Reinstall CMC on the clone with `-no-prebuilt-mtdi-nexus`.
- [ ] 32. Create the first local migration from 10 GB to 11 GB.
- [ ] 33. If migration session creation fails, hard stop as blocker-fail.
- [ ] 34. Wait for initial sync completion.
- [ ] 35. Check available kernels again using full candidate listing.
- [ ] 36. Select the first-upgrade target from the filtered candidate list.
- [ ] 37. Verify matching dev/header packages are available for the first-upgrade target.
- [ ] 38. Install the first-upgrade kernel and matching dev/header packages, then reboot. Where item 38a applies, complete it before the reboot.
- [ ] 38a. On Red Hat-family systems with `grubby` including Oracle Linux, set the first-upgrade kernel as the grubby default and verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path before reboot.
- [ ] 39. Query vCenter guest-tools again for the live clone IP after reboot.
- [ ] 40. SSH to the rebooted clone and verify kernel plus dev/header package versions match the selected first-upgrade version.
- [ ] 41. If versions do not match exactly, stop as blocker-fail.
- [ ] 42. Verify the clone is online in `skidamarink` using `cirrusdata`.
- [ ] 43. SSH to the clone and verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 44. Write sample data to the source 10 GB disk.
- [ ] 45. Trigger sync and confirm tracking status using `cirrusdata`.
- [ ] 46. Uninstall CMC.
- [ ] 47. Run MCP host cleanup for `skidamarink` and remove the cloned host entry for this test clone only, regardless of online/offline status.
- [ ] 48. Check available kernels again.
- [ ] 49. Select the latest-upgrade target kernel from the filtered candidate list.
- [ ] 50. Verify matching dev/header packages are available for the latest-upgrade target.
- [ ] 51. Install the latest-upgrade kernel and matching dev/header packages, then reboot. Where item 51a applies, complete it before the reboot.
- [ ] 51a. On Red Hat-family systems with `grubby` including Oracle Linux, set the latest-upgrade kernel as the grubby default and verify `grubby --default-kernel` returns the selected `/boot/vmlinuz-<target>` path before reboot.
- [ ] 52. Query vCenter guest-tools again for the live clone IP after reboot.
- [ ] 53. SSH to the rebooted clone and verify kernel plus dev/header package versions match the selected latest-upgrade version.
- [ ] 54. If versions do not match exactly, stop as blocker-fail.
- [ ] 55. Reinstall CMC via `cirrusdata` with `-no-prebuilt-mtdi-nexus` on the latest kernel.
- [ ] 56. Create the second local migration from 10 GB to 11 GB and wait for initial sync completion.
- [ ] 57. If migration session creation fails, hard stop as blocker-fail.
- [ ] 58. Confirm the machine is online in `skidamarink` using `cirrusdata`.
- [ ] 59. SSH and verify MTDI and Galaxy Migrate services/driver are up.
- [ ] 60. Only after steps 55-59 all pass, begin success-path cleanup.
- [ ] 61. Power off the cloned machine.
- [ ] 62. Delete the cloned VM and its disks from vCenter inventory.
- [ ] 63. Run final MCP host cleanup for `skidamarink` and remove the cloned host entry for this test clone only.
- [ ] 64. If a blocker-fail occurred after clone creation, leave the cloned VM powered on and present in inventory for manual inspection.
- [ ] 65. Append the current run to the summary and results files with the required host metadata, kernel progression, execution summary, final outcome, and total test duration.

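The repository inspection in checklist item 3a can be approximated with a text search over the repo source files. The sketch below is an assumption about how to automate it: `find_offline_repo_refs` is a hypothetical name, and it only skips commented-out lines. It does not interpret `enabled=0` stanzas in `.repo` files, so any hit still needs a quick manual look before the run is failed.

```sh
OFFLINE_REPO_IP="192.168.3.199"

# Print every non-comment line in the given files that references the
# offline repository address. Exit status 0 means at least one hit.
find_offline_repo_refs() {
    grep -H -F -- "$OFFLINE_REPO_IP" "$@" 2>/dev/null |
        grep -v -E '^[^:]*:[[:space:]]*#'
}
```

A call such as `find_offline_repo_refs /etc/yum.repos.d/*.repo /etc/apt/sources.list /etc/apt/sources.list.d/* /etc/zypp/repos.d/*.repo` covers the usual locations; paths that do not exist on a given distro are ignored.
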
## Test Procedure
The `Execution Checklist` above is the authoritative run ledger. Use the procedure below as the detailed action reference for each checklist item.

0. Credential gate:
   - Before any MCP, vCenter, SSH, Red Hat subscription, or CMC action, load credentials with `set -a; source /home/cirrus/cds/.env.credentials.local; set +a`.
   - Verify required variables are present without printing secret values: `VCENTER_HOST`, `VCENTER_USER`, `VCENTER_PASSWORD`, `ATVM_TARGET_USER`, `ATVM_TARGET_PASSWORD`, `REDHAT_SUBSCRIPTION_USER`, `REDHAT_SUBSCRIPTION_PASSWORD`, `CMC_GCSTAGE_URL`, `CMC_GCSTAGE_REGISTRATION_CODE`, and `CIRRUS_API_TOKEN`.
   - If any required variable is missing, hard stop before powering on or modifying any VM.
   - Do not parse the credential file with `grep`/`awk` as the authority; source it and inspect the environment because entries may use `export KEY=...`.
1. Confirm the requested source host is not a SUSE/SLES machine. If it is SUSE/SLES, hard stop immediately and do not power on, inspect, or clone the machine.
2. Remove offline hosts in `skidamarink` using MCP offline-host cleanup.
3. Confirm source host is powered on for the inspection phase. If it is powered off, power it on.
3a. Before listing available kernel builds, inspect the source host's package repository configuration files.
   - Check the distro-appropriate repo sources, such as `/etc/yum.repos.d/*.repo`, `/etc/apt/sources.list`, `/etc/apt/sources.list.d/*`, `/etc/zypp/repos.d/*.repo`, and equivalent package-manager source files that exist on the machine.
   - If any enabled/source repository contains `192.168.3.199`, treat the machine as using the local offline DVD/vault repository.
   - In that case, hard stop before kernel candidate discovery, fail the test, and do not clone the machine.
   - Power off the source VM after inspection and confirm vCenter reports `poweredOff`.
   - Append the FAIL result to both result files and explicitly note that the run was skipped because the source package repositories point at the local offline DVD/vault repository at `192.168.3.199`.
4. SSH to the source host and check available kernel versions on the source before cloning.
5. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from `check-update`).
   - On package-managed distros, first force a fresh metadata refresh using the package manager's normal refresh command, then use the distro-appropriate tooling to list all available kernel candidates before selecting a target.
   - On APT-based distros, use `rm -rf /var/lib/apt/lists/* && apt-get clean && apt-get update` instead of a plain `apt update` so stale or empty package-list files are rebuilt before kernel discovery.
   - On Ubuntu, inspect the generic kernel track first after the hard APT refresh, using more than one discovery method before concluding what is available. Use `apt-cache madison`, `apt list -a`, and `apt-cache policy` for `linux-image-generic` and `linux-headers-generic`; if needed, repeat those checks for `linux-generic` and the relevant HWE meta packages.
   - If the refreshed generic-track view still returns no usable upgrade candidates, pause and ask the operator whether to continue with alternate Ubuntu kernel tracks.
   - When prompting for alternate Ubuntu tracks, display the other available kernel candidates, including `linux-image-generic-hwe-24.04` / `linux-headers-generic-hwe-24.04` when present, and wait for explicit operator confirmation before proceeding.
6. Candidate scope rule:
   - Include only kernels in the same major OS family as the current machine (no major-version upgrades).
   - Prefer candidates within the same minor stream as current OS/kernel when available.
7. Verify at least 2 upgrade candidates exist in the filtered candidate list.
8. If fewer than 2 candidates: hard stop and end run before clone creation.
9. Gate check:
   - If any earlier step triggered a stop condition (steps 1, 3a, or 7-8), execute no further steps.
   - If no stop condition was triggered, continue with the next step.
10. After source-host inspection is complete, issue the power-off request and wait for vCenter to report the source host as `poweredOff`.
11. Confirm the source host is still `poweredOff` in vCenter immediately before cloning. Do not start the clone while the source VM is transitioning or pending power-off.
12. Determine base clone name: `aw999-[source-without-atvmxxx-]`.
13. Before cloning, check whether that clone name already exists in vCenter.
14. If the name exists, choose the next available suffixed name: `aw999-[source-without-atvmxxx-]-1`, then `-2`, then `-N` as needed.
15. Clone source VM using the resolved unique clone name on datastore `AutomatedTest-UnitTesting` only, and place the clone on `CDS1-ESX165` / `192.168.1.165` by default unless the operator explicitly specifies a different ESXi host.
16. For the clone command destination name, pass only the VM name (for example `aw999-ubuntu24.04-1`), not an inventory path like `/CDSHQ-Eng/vm/...`; set folder separately if needed.
17. Detach the 2 FC PCI adapters from the cloned VM.
18. Verify in vCenter that both FC passthrough devices are no longer present on the clone.
19. Power on clone.
20. Query vCenter guest-tools for the live clone IP.
21. SSH to the live clone IP using credentials from `/home/cirrus/cds/.env.credentials.local`.
22. Change OS hostname to clone name, replacing `.` with `-`.
23. Convert networking from static IP to DHCP.
24. Remove/clean static IP configuration references.
25. Reboot clone.
26. Query vCenter guest-tools again for the new live clone IP.
27. SSH to the new live clone IP and verify the DHCP state.
28. If the clone still reports the previous static IP, fix static config cleanup and repeat reboot/verify.
29. Continue all remaining steps using the live DHCP IP from vCenter and credentials from `/home/cirrus/cds/.env.credentials.local`.
30. Before the first CMC install, wipe the 10 GB source disk with `dd if=/dev/zero of=/dev/sdb bs=1M count=32 status=progress conv=fsync`, then verify that no filesystem or partition signatures remain (`wipefs -n /dev/sdb`, `blkid /dev/sdb`, `file -s /dev/sdb`, `lsblk -f /dev/sdb`). This disk prep is one-time only and must not be repeated in later stages of the test.
31. Using `cirrusdata` (`gcstage`, project `skidamarink`), reinstall CMC on clone, always adding `-no-prebuilt-mtdi-nexus`.
32. Create local migration from 10 GB source disk to 11 GB destination disk using `cirrusdata`.
33. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
34. Wait for initial sync completion.
35. Check available kernels again using full candidate listing (not latest-only output). Refresh package metadata first; on APT-based distros, use the hard APT refresh command from step 5.
36. Select first-upgrade target from filtered candidate list (same major; same minor preferred), ensuring it is not the latest candidate.
37. Verify matching dev/header packages for the selected first-upgrade target are available.
38. Install selected first-upgrade kernel and matching dev/header packages, then reboot.
    - On Red Hat-family systems with `grubby`, run `grubby --set-default /boot/vmlinuz-<selected-kernel>` and verify `grubby --default-kernel` returns that exact path before rebooting.
39. Query vCenter guest-tools again for the live clone IP after reboot.
40. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected first-upgrade version.
41. If versions do not match exactly, stop as blocker-fail.
42. After reboot, verify clone is online in `skidamarink` using `cirrusdata`.
43. SSH to clone and verify MTDI and Galaxy Migrate services/driver are up.
44. Write sample data to source 10 GB disk.
45. Trigger sync and confirm tracking status using `cirrusdata`.
46. Uninstall CMC.
47. Post-uninstall cleanup checkpoint:
    - Run MCP host cleanup for `skidamarink`.
    - Remove the cloned VM host entry specifically via MCP (target only this test clone host), regardless of whether CDC currently reports it as online or offline.
48. Check available kernels. Refresh package metadata first; on APT-based distros, use the hard APT refresh command from step 5.
49. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
50. Verify matching dev/header packages for the selected latest-upgrade target are available.
51. Install selected latest-upgrade kernel and matching dev/header packages, then reboot.
    - On Red Hat-family systems with `grubby`, run `grubby --set-default /boot/vmlinuz-<selected-kernel>` and verify `grubby --default-kernel` returns that exact path before rebooting.
52. Query vCenter guest-tools again for the live clone IP after reboot.
53. SSH to the rebooted clone via the live vCenter IP and verify running kernel and installed dev/header package versions match the selected latest-upgrade version.
54. If versions do not match exactly, stop as blocker-fail.
55. Reinstall CMC via `cirrusdata` (`gcstage`, `skidamarink`), always adding `-no-prebuilt-mtdi-nexus`.
56. Create a local migration (10 GB -> 11 GB) via `cirrusdata` and wait for initial sync completion.
57. If migration session creation fails (including API/service errors such as 5xx), hard stop as blocker-fail.
58. Confirm machine is online in `skidamarink` using `cirrusdata`.
59. SSH and verify MTDI and Galaxy Migrate services/driver are up.
60. Only after steps 55-59 all pass, begin success-path cleanup.
61. Success-path cleanup only: power off cloned machine.
62. Success-path cleanup only: delete cloned VM and its disks from vCenter inventory.
63. Success-path final cleanup checkpoint:
    - Run MCP host cleanup for `skidamarink`.
    - Remove the cloned VM host entry specifically via MCP (target only this test clone host), regardless of whether CDC currently reports it as online or offline.
64. Blocker-fail path after clone creation:
    - Stop test immediately after recording failure details.
    - Leave cloned VM powered on and present in inventory for manual inspection.
    - Do not run clone power-off/delete steps in blocker-fail path.

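The candidate handling in steps 5-8, 36, and 49 can be sketched as list operations on version strings. The helper names below are hypothetical, the ordering relies on GNU `sort -V` as an approximation of the package manager's own version comparison, and the choice of the oldest candidate as the first-upgrade target is an assumption: the procedure only requires that it is not the latest. Same-major and same-minor filtering is expected to happen before the list reaches these helpers.

```sh
# Read kernel versions on stdin (one per line) and print, oldest first,
# only those that sort after the running version given as $1.
upgrade_candidates() {
    current=$1
    { cat; printf '%s\n' "$current"; } |
        sort -V -u |
        awk -v cur="$current" 'seen { print } $0 == cur { seen = 1 }'
}

# First-upgrade target: requires at least 2 candidates and never returns
# the latest one. This sketch picks the oldest candidate.
first_upgrade_target() {
    count=$(printf '%s\n' "$1" | awk 'NF { n++ } END { print n + 0 }')
    [ "$count" -ge 2 ] || return 1
    printf '%s\n' "$1" | head -n 1
}

# Latest-upgrade target: the newest candidate in the list.
latest_upgrade_target() {
    printf '%s\n' "$1" | tail -n 1
}
```

A caller would capture the filtered list once, for example `list=$(some_listing_command | upgrade_candidates "$(uname -r)")`, where `some_listing_command` stands for the distro-specific listing, and treat a failing `first_upgrade_target "$list"` as the fewer-than-2-candidates stop condition.
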
## Stop Conditions
- Requested source host is a SUSE/SLES machine.
- Cannot verify clone identity.
- Cannot detach required FC PCI adapters.
- Clone cannot be created on datastore `AutomatedTest-UnitTesting`.
- FC passthrough adapters remain attached after the detach/verification step.
- DHCP transition cannot be completed (clone remains static at `<INITIAL_CLONE_HOST_OR_IP>`).
- Kernel upgrade candidate criteria not met.
- Migration session creation failed (including API/service errors such as HTTP 5xx or equivalent backend unavailability).
- Any critical migration/service validation failure that blocks continuation.

## Per-Host Test Result Record
Use one cumulative results file and append one new section per tested host.

### Host Metadata
- Test start time (UTC):
- Test end time (UTC):
- Test duration:
- Operator:
- Source VM:
- Cloned VM name:
- Clone origin (vCenter path/folder/cluster):
- Final DHCP IP of clone:

### Kernel / OS Tracking
- Start OS version:
- Start kernel version:
- Kernel list before first upgrade (full candidate list, filtered by scope rule):
- Kernel selected for step-up upgrade:
- Matching dev/header packages for step-up target (availability check):
- Kernel after step-up reboot:
- Installed dev/header package versions after step-up:
- Kernel list before latest upgrade (full candidate list, filtered by scope rule):
- Kernel selected for latest upgrade:
- Matching dev/header packages for latest target (availability check):
- Kernel after latest reboot:
- Installed dev/header package versions after latest upgrade:

### Execution Summary (Short Bullets)
- Clone created / FC PCI detached: `PASS|FAIL` - notes
- Hostname/IP DHCP conversion: `PASS|FAIL` - notes
- 10 GB source disk prep before first CMC install: `PASS|FAIL` - notes
- CMC reinstall #1: `PASS|FAIL` - notes
- Local migration #1 (10 GB -> 11 GB) initial sync: `PASS|FAIL` - notes
- Step-up kernel upgrade: `PASS|FAIL` - notes
- Step-up dev/header package match check: `PASS|FAIL` - notes
- Online in skidamarink after step-up: `PASS|FAIL` - notes
- MTDI/Galaxy Migrate service+driver health after step-up: `PASS|FAIL` - notes
- Write data + tracking status: `PASS|FAIL` - notes
- CMC uninstall: `PASS|FAIL` - notes
- Latest kernel upgrade: `PASS|FAIL` - notes
- Latest dev/header package match check: `PASS|FAIL` - notes
- CMC reinstall #2: `PASS|FAIL` - notes
- Local migration #2 (10 GB -> 11 GB) initial sync: `PASS|FAIL` - notes
- Online in skidamarink after latest upgrade: `PASS|FAIL` - notes
- MTDI/Galaxy Migrate service+driver health after latest upgrade: `PASS|FAIL` - notes
- Clone power off and deletion (success path only): `PASS|FAIL|N/A` - notes

### Final Outcome
- Overall result: `PASS|FAIL|PARTIAL`
- Outcome interpretation:
  - `PASS`: full planned test flow completed and core validation goals passed (CMC install/uninstall/reinstall, kernel step-up/latest upgrade, and post-upgrade service/driver health checks), even if non-blocking warnings occurred.
  - `FAIL`: a true blocker prevented completion of required validation goals.
  - `PARTIAL`: use only when execution stops early by operator choice or scope is intentionally reduced, not for non-blocking warnings in a completed run.
- Blocking issue summary:
- Follow-up actions:

## Timestamp Standard
- All recorded test timestamps must use UTC.
- Format: `YYYY-MM-DD HH:MM UTC`

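A timestamp in this format can be produced directly with `date`. The wrapper name `utc_timestamp` is only an illustration.

```sh
# Print the current time as "YYYY-MM-DD HH:MM UTC".
utc_timestamp() {
    date -u '+%Y-%m-%d %H:%M UTC'
}
```

Capturing the value once at the start of the run, for example `run_start=$(utc_timestamp)`, keeps the recorded start time stable across later steps.
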
## Result Storage Location
Store and append all per-host results in:
- `/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-results.md`

Also generate a run summary file in the same directory:
- `/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-summary.md`

## Artifact Recording Rule
- Always append the latest run outcome to the results file and summary file at the end of each run.
- Do this for `PASS`, `FAIL`, and `PARTIAL` outcomes.
- Do not leave a completed test run only in conversation; the artifact files are the source of record.
- Include the total test runtime in both artifact files for every run.
- If a run is still in progress when first recorded, update the runtime once the run reaches its terminal outcome.

Summary file requirements:
- Start the file with the test file name line: `Test file: cmc-upgrade-kernel-test.md`
- Title: `CMC Upgrade Kernel Test Summary`
- Include test start time, test end time, and total test duration for the run
- Include a short workflow summary (current kernel -> install CMC -> kernel upgrade -> uninstall CMC -> kernel upgrade -> install CMC)
- Include host tested, kernel progression (start, step-up, latest), and overall result
- Start each run section with a `##` heading that includes the OS family and the final outcome, for example: `## Amazon Linux 2023 - PASS`.
- Put the OS version and the rest of the run details under that heading so the heading stays the visible OS label above the test snippet.

### Duration Rule
- Record the UTC start time when the run begins.
- Record the UTC end time when the run reaches a terminal outcome and cleanup/reporting is complete.
- Compute `Test duration` from the recorded start/end timestamps.
- Backfill `Test duration` into the summary and results artifacts for any run where both timestamps are known.

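Computing the duration from two recorded stamps might look like the sketch below. It assumes GNU `date` (for the `-d` parsing option), and both the function name `test_duration` and the `<h>h <m>m` output format are assumptions, since this template does not prescribe a duration format.

```sh
# Elapsed time between two "YYYY-MM-DD HH:MM UTC" stamps, as "<h>h <m>m".
# Fails if either stamp cannot be parsed or the end precedes the start.
test_duration() {
    start_s=$(date -u -d "$1" +%s) || return 1
    end_s=$(date -u -d "$2" +%s) || return 1
    elapsed=$((end_s - start_s))
    [ "$elapsed" -ge 0 ] || return 1
    printf '%dh %dm\n' $((elapsed / 3600)) $((elapsed % 3600 / 60))
}
```

Because the recorded stamps have minute resolution, the computed duration is accurate to the minute, which matches what the artifacts can show.
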