CMC Upgrade Test Template
Purpose
Validate CMC behavior across staged kernel upgrades on a cloned VM, including reinstall, migration health, service health, and cleanup.
Scope
- Run per source host provided by operator.
- Work only on the cloned VM created for this test.
Inputs
- Source VM hostname: <atvmxxx-...>
- vCenter target/source location: <cluster/datastore/folder>
- Required clone datastore: AutomatedTest-UnitTesting
- Initial clone access host/IP: <INITIAL_CLONE_HOST_OR_IP>
- SSH username variable: <SSH_USER_VAR>
- SSH password variable: <SSH_PASSWORD_VAR>
- Cirrus profile/project: gcstage/skidamarink
Credential Source
- Use credentials from: /home/aw/code/cds/.env.credentials.local
- Do not hardcode usernames/passwords in test records or commands.
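One way to keep credentials out of test records is to load them once and redact their values from anything that gets logged. The sketch below is illustrative only: it assumes the credentials file uses plain KEY=VALUE lines (the actual file format is not stated here), and the names parse_env_text, redact, SSH_USER and SSH_PASSWORD are hypothetical.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines; blank lines and # comments are skipped.

    Assumption: the credentials file is a simple dotenv-style file.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def redact(text: str, secrets: dict) -> str:
    """Replace every non-empty secret value in text with ***."""
    for value in secrets.values():
        if value:
            text = text.replace(value, "***")
    return text


# Hypothetical variable names, used only for illustration.
creds = parse_env_text('# comment\nSSH_USER="tester"\nSSH_PASSWORD=s3cret\n')
print(redact("login tester with s3cret", creds))
```

In a real run the text would come from reading the credentials file, and redact would be applied to any command line or note before it is written to the results file.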
CMC Tooling Rule (Global)
- For all CMC-related actions in this test, use the cirrusdata skill/CLI path.
- Exception: offline-host cleanup is not handled by that skill yet; use the MCP connection for offline-host removal.
- Apply this rule to every relevant step in this procedure.
Red Hat Preflight (Global, Manual Tasks Only)
- Apply this section when the test target is a Red Hat machine and the run is manually executed.
- Do not apply this section to ATVM automation runs that already handle subscription flow.
- Before running test steps on Red Hat, run:
    subscription-manager remove --all
    subscription-manager unregister
    subscription-manager clean
    subscription-manager register --username "$REDHAT_SUBSCRIPTION_USER" --password "$REDHAT_SUBSCRIPTION_PASSWORD"
- Source credentials from /home/aw/code/cds/.env.credentials.local.
Execution Mode (Global)
- Run this test in continuous execution mode.
- Do not pause for additional operator prompts between steps.
- Keep monitoring and continue automatically until the test reaches a terminal outcome (PASS or FAIL) and all required cleanup/reporting steps are completed.
- Only stop early if a true blocker prevents safe continuation, and still complete required cleanup/reporting before returning control.
Naming Rule
- Base clone VM name in vCenter: aw999-[source hostname without atvmxxx- prefix]
- Before cloning, verify the clone VM name is not already in use.
- If already in use, append a numeric suffix to the base name: -1, -2, ... -N until an unused name is found.
- Use plain VM name only (no /CDSHQ-Eng/vm/ prefix) for clone destination name, and set folder separately if needed.
- OS hostname on clone: same clone name but replace . with -
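The naming rule can be sketched as three small helpers. This is a minimal illustration, not the project's tooling: the function names are hypothetical, the set of existing VM names is assumed to have been fetched from vCenter beforehand, and the prefix pattern (atvm followed by letters or digits and a hyphen) is an assumption about what atvmxxx- stands for.

```python
import re


def base_clone_name(source_hostname: str) -> str:
    """Drop the leading atvmxxx- prefix and prepend aw999-."""
    # Assumption: the prefix is "atvm" + letters/digits + "-".
    stripped = re.sub(r"^atvm[0-9a-z]*-", "", source_hostname, flags=re.IGNORECASE)
    return f"aw999-{stripped}"


def resolve_clone_name(base: str, existing_names: set) -> str:
    """Return base if unused, otherwise base-1, base-2, ... until unused."""
    if base not in existing_names:
        return base
    suffix = 1
    while f"{base}-{suffix}" in existing_names:
        suffix += 1
    return f"{base}-{suffix}"


def os_hostname(clone_name: str) -> str:
    """OS hostname is the clone name with every . replaced by -."""
    return clone_name.replace(".", "-")


base = base_clone_name("atvm123-ubuntu24.04")          # hypothetical source host
clone = resolve_clone_name(base, {"aw999-ubuntu24.04"})  # base name already taken
print(clone, os_hostname(clone))
```

The resolved name is a plain VM name, so it can be passed directly as the clone destination with the folder supplied separately.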
Safety Rules
- Delete only the clone created for this test.
- If the clone is missing or identity is uncertain, stop and do not delete any other VM.
- If kernel availability checks do not meet criteria, stop, power off clone, and remove clone/disks.
Test Procedure
1. Remove offline hosts in skidamarink using MCP offline-host cleanup.
2. Confirm source host is powered on. If it is powered off, power it on.
3. SSH to the source host and check available kernel versions on the source before cloning.
4. Build source-host kernel candidate list from all available versions (include intermediate versions, not just the latest from check-update).
5. Candidate scope rule:
   - Include only kernels in the same major OS family as the current machine (no major-version upgrades).
   - Prefer candidates within the same minor stream as current OS/kernel when available.
6. Verify at least 2 upgrade candidates exist in the filtered candidate list.
7. If fewer than 2 candidates: hard stop and end run before clone creation.
8. Gate check:
   - If step 7 triggered a stop condition, execute no further steps.
   - If no stop condition was triggered, continue with the next step.
9. Confirm source host is powered off (required pre-clone state).
10. Determine base clone name: aw999-[source-without-atvmxxx-].
11. Before cloning, check whether that clone name already exists in vCenter.
12. If the name exists, choose the next available suffixed name: aw999-[source-without-atvmxxx-]-1, then -2, then -N as needed.
13. Clone source VM using the resolved unique clone name on datastore AutomatedTest-UnitTesting only.
14. For the clone command destination name, pass only the VM name (for example aw999-ubuntu24.04-1), not an inventory path like /CDSHQ-Eng/vm/...; set folder separately if needed.
15. Detach the 2 FC PCI adapters from the cloned VM.
16. Power on clone.
17. SSH to <INITIAL_CLONE_HOST_OR_IP> using credentials from /home/aw/code/cds/.env.credentials.local.
18. Change OS hostname to clone name, replacing . with -.
19. Convert networking from static IP to DHCP.
20. Remove/clean static IP configuration references.
21. Reboot clone.
22. Find DHCP address and verify it is not <INITIAL_CLONE_HOST_OR_IP>.
23. If still <INITIAL_CLONE_HOST_OR_IP>, fix static config cleanup and repeat reboot/verify.
24. Continue all remaining steps using DHCP IP and credentials from /home/aw/code/cds/.env.credentials.local.
25. Using cirrusdata (gcstage, project skidamarink), reinstall CMC on clone.
26. Create local migration from 10GB source disk to 11GB destination disk using cirrusdata.
27. Wait for initial sync completion.
28. Check available kernels again using full candidate listing (not latest-only output).
29. Select upgrade target one step above current kernel from the filtered candidate list (same major; same minor preferred).
30. Install selected kernel and reboot.
31. After reboot, verify clone is online in skidamarink using cirrusdata.
32. SSH to clone and verify MTDI, Galaxy Migrate services/driver are up.
33. Write sample data to source 10GB disk.
34. Trigger sync and confirm tracking status using cirrusdata.
35. Uninstall CMC.
36. Post-uninstall cleanup checkpoint:
   - Run MCP offline-host cleanup for skidamarink.
   - If the cloned VM is still marked online after uninstall, remove that cloned VM host entry specifically.
37. Check available kernels.
38. Select latest-upgrade target kernel from the filtered candidate list (same major required; same minor preferred).
39. Upgrade to selected latest target kernel and reboot.
40. Reinstall CMC via cirrusdata (gcstage, skidamarink).
41. Create a local migration (10GB -> 11GB) via cirrusdata and wait for initial sync completion.
42. Confirm machine is online in skidamarink using cirrusdata.
43. SSH and verify MTDI, Galaxy Migrate services/driver are up.
44. Power off cloned machine.
45. Delete cloned VM and its disks from vCenter inventory.
46. Final cleanup checkpoint:
   - Run MCP offline-host cleanup for skidamarink.
   - If the cloned VM is still marked online at the end of the test, remove that cloned VM host entry specifically.
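The candidate scope rule, the two-candidate minimum, and the step-up/latest target selection in the procedure above can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, the version strings are invented, "major" and "minor" are taken to be the first two numeric components of the kernel version (distro-specific rules may differ), and "prefer same minor" is interpreted as "use the same-minor stream when it alone provides at least 2 candidates, otherwise fall back to all same-major candidates".

```python
import re


def version_key(kernel: str) -> tuple:
    """Sort key built from the numeric components of a kernel version string."""
    return tuple(int(n) for n in re.findall(r"\d+", kernel))


def upgrade_candidates(current: str, available: list) -> list:
    """Newer kernels in the same major as current, same minor preferred."""
    cur = version_key(current)
    newer = sorted(
        {k for k in available if version_key(k) > cur and version_key(k)[0] == cur[0]},
        key=version_key,
    )
    same_minor = [k for k in newer if version_key(k)[:2] == cur[:2]]
    # Assumption: fall back to the wider same-major list only when needed.
    return same_minor if len(same_minor) >= 2 else newer


def plan_upgrades(current: str, available: list) -> tuple:
    """Return (step-up target, latest target) or raise to signal a hard stop."""
    candidates = upgrade_candidates(current, available)
    if len(candidates) < 2:
        raise RuntimeError("fewer than 2 upgrade candidates: stop before clone creation")
    return candidates[0], candidates[-1]


# Invented version strings, for illustration only.
available = ["6.8.0-31", "6.8.0-35", "6.8.0-40", "6.11.0-8", "7.0.0-1"]
print(plan_upgrades("6.8.0-31", available))
```

The procedure re-checks available kernels before each upgrade, so in practice the selection would be recomputed at each of those points rather than planned once.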
Stop Conditions
- Cannot verify clone identity.
- Cannot detach required FC PCI adapters.
- Clone cannot be created on datastore AutomatedTest-UnitTesting.
- DHCP transition cannot be completed (clone remains static at <INITIAL_CLONE_HOST_OR_IP>).
- Kernel upgrade candidate criteria not met.
- Any critical migration/service validation failure that blocks continuation.
Per-Host Test Result Record
Use one cumulative results file and append one new section per tested host.
Host Metadata
- Test date/time (UTC):
- Operator:
- Source VM:
- Cloned VM name:
- Clone origin (vCenter path/folder/cluster):
- Final DHCP IP of clone:
Kernel / OS Tracking
- Start OS version:
- Start kernel version:
- Kernel list before first upgrade (full candidate list, filtered by scope rule):
- Kernel selected for step-up upgrade:
- Kernel after step-up reboot:
- Kernel list before latest upgrade (full candidate list, filtered by scope rule):
- Kernel selected for latest upgrade:
- Kernel after latest reboot:
Execution Summary (Short Bullets)
- Clone created / FC PCI detached: PASS|FAIL - notes
- Hostname/IP DHCP conversion: PASS|FAIL - notes
- CMC reinstall #1: PASS|FAIL - notes
- Local migration #1 (10GB -> 11GB) initial sync: PASS|FAIL - notes
- Step-up kernel upgrade: PASS|FAIL - notes
- Online in skidamarink after step-up: PASS|FAIL - notes
- MTDI/Galaxy Migrate service+driver health after step-up: PASS|FAIL - notes
- Write data + tracking status: PASS|FAIL - notes
- CMC uninstall: PASS|FAIL - notes
- Latest kernel upgrade: PASS|FAIL - notes
- CMC reinstall #2: PASS|FAIL - notes
- Local migration #2 (10GB -> 11GB) initial sync: PASS|FAIL - notes
- Online in skidamarink after latest upgrade: PASS|FAIL - notes
- MTDI/Galaxy Migrate service+driver health after latest upgrade: PASS|FAIL - notes
- Clone power off and deletion: PASS|FAIL - notes
Final Outcome
- Overall result: PASS|FAIL|PARTIAL
- Outcome interpretation:
  - PASS: full planned test flow completed and core validation goals passed (CMC install/uninstall/reinstall, kernel step-up/latest upgrade, and post-upgrade service/driver health checks), even if non-blocking warnings occurred.
  - FAIL: a true blocker prevented completion of required validation goals.
  - PARTIAL: use only when execution stops early by operator choice or scope is intentionally reduced, not for non-blocking warnings in a completed run.
- Blocking issue summary:
- Follow-up actions:
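The outcome interpretation can be expressed as a small decision function. This is one possible reading, not a definitive rule: the function and parameter names are hypothetical, and the precedence (a blocker wins over an operator-chosen early stop) is an assumption, since the text does not say which applies when both occur.

```python
def overall_result(
    blocker_hit: bool,
    stopped_by_choice_or_reduced_scope: bool,
    core_goals_passed: bool,
    non_blocking_warnings: int = 0,
) -> str:
    """Map run facts to PASS, FAIL or PARTIAL."""
    # non_blocking_warnings is accepted but deliberately ignored:
    # warnings alone never change the outcome of a completed run.
    if blocker_hit:
        return "FAIL"
    if stopped_by_choice_or_reduced_scope:
        return "PARTIAL"
    return "PASS" if core_goals_passed else "FAIL"


# A completed run with warnings but no blocker still counts as PASS.
print(overall_result(False, False, True, non_blocking_warnings=3))
```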
Timestamp Standard
- All recorded test timestamps must use UTC.
- Format: YYYY-MM-DD HH:MM UTC
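A minimal sketch of producing timestamps in this format with the Python standard library; the helper name format_utc is hypothetical. It rejects naive datetimes so that a local time is never recorded as UTC by accident.

```python
from datetime import datetime, timedelta, timezone


def format_utc(ts: datetime) -> str:
    """Render a timezone-aware datetime as YYYY-MM-DD HH:MM UTC."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


# 09:30 at UTC+02:00 is recorded as 07:30 UTC.
example = format_utc(datetime(2025, 1, 15, 9, 30, tzinfo=timezone(timedelta(hours=2))))
print(example)  # 2025-01-15 07:30 UTC
```

For the current time, format_utc(datetime.now(timezone.utc)) gives a value ready to paste into the result record.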
Result Storage Location
Store and append all per-host results in:
/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-results.md
Also generate a run summary file in the same directory:
/home/aw/code/cds/tmp/tests/cmc upgrade test/cmc-upgrade-kernel-test-summary.md
Summary file requirements:
- Title: CMC Upgrade Kernel Test Summary
- Include UTC date/time for the run
- Include a short workflow summary (current kernel -> install CMC -> kernel upgrade -> uninstall CMC -> kernel upgrade -> install CMC)
- Include host tested, kernel progression (start, step-up, latest), and overall result
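The summary requirements above could be rendered along these lines. This is a sketch only: the function name, the exact line layout, and the use of a Markdown heading for the title are assumptions, and the host and kernel values in the example are invented.

```python
WORKFLOW = (
    "current kernel -> install CMC -> kernel upgrade -> "
    "uninstall CMC -> kernel upgrade -> install CMC"
)


def render_summary(run_time_utc, host, start_kernel, step_up_kernel, latest_kernel, result):
    """Build the summary file text from the required fields."""
    if result not in ("PASS", "FAIL", "PARTIAL"):
        raise ValueError(f"unexpected result: {result}")
    lines = [
        "# CMC Upgrade Kernel Test Summary",
        "",
        f"- Run date/time: {run_time_utc}",
        f"- Workflow: {WORKFLOW}",
        f"- Host tested: {host}",
        f"- Kernel progression: {start_kernel} (start) -> "
        f"{step_up_kernel} (step-up) -> {latest_kernel} (latest)",
        f"- Overall result: {result}",
    ]
    return "\n".join(lines) + "\n"


# Invented values, for illustration only.
summary = render_summary(
    "2025-01-15 07:30 UTC", "aw999-ubuntu24.04", "6.8.0-31", "6.8.0-35", "6.8.0-40", "PASS"
)
print(summary)
```

Because the storage directory name contains spaces, passing the path as a single string or pathlib.Path object (rather than splicing it into a shell command) avoids quoting problems when writing the file.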