
anthony.wen 9024d5cadb Trim internal fallback details from ATVM notes

2026-03-27 16:12:14 -04:00


Run ATVM Automation Guide

This file is guide-only documentation for operating ATVM CMC automation. Do not put specific run examples here. For reusable command examples and common option combinations, use examples.md. Treat examples.md as reference-only. Do not assume the operator wants the extra options shown in examples unless they explicitly request them.

Purpose

Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.

ATVM Cypress Automation Controller Client

Hostname: atvm-cypres-vm-1
IP: 192.168.3.190
Credentials: source /home/aw/code/cds/.env.credentials.local and use ATVM_CONTROLLER_USER plus ATVM_CONTROLLER_PASSWORD

ATVM Target Host Default

Treat 192.168.3.191 as the default ATVM target host reference.
For SSH to 192.168.3.191, ignore host key mismatch by default with -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null.
For Linux SSH access to 192.168.3.191, source /home/aw/code/cds/.env.credentials.local and use ATVM_TARGET_USER plus ATVM_TARGET_PASSWORD unless the operator explicitly overrides them.
ATVM_LINUX_TARGET_HOST, ATVM_LINUX_TARGET_USER, and ATVM_LINUX_TARGET_PASSWORD mirror the Linux default values when an OS-specific reference is clearer.
For Windows guest access to 192.168.3.191, source /home/aw/code/cds/.env.credentials.local and use ATVM_WINDOWS_TARGET_USER plus ATVM_WINDOWS_TARGET_PASSWORD unless the operator explicitly overrides them.

Operating Constraints

Run only scripts/commands explicitly requested.
Do not make manual system configuration changes on the client.
Do not edit client files unless explicitly requested.

Operator Preferences

Do not include Gold Disk identifiers in --build_name.
--build_name must not contain spaces; use - between words.
For multiple VMs in the same distro, use distro-scoped filtering (--containsVm) instead of long explicit VM lists.
Always include --ignore_force_shutdown on cmc-templates.py commands unless the operator explicitly asks not to.
Default to --use_specified_plugin iscsi unless the operator explicitly requests a different plugin.
Before preparing a new run, always check whether automation is already running.
Always report whether automation is currently running.
If running, ask whether to terminate; terminate only with explicit approval.
After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
Before any run, always show exact planned command(s) exactly as they will be executed and wait for explicit approval.
Never execute cmc-templates.py, run-sorry-cypress.py, or any other ATVM run command until the operator explicitly approves the displayed command(s).
Approval is required even for preparation-only steps such as template generation.
If the operator changes any part of the request after commands are displayed, rebuild the commands, show the updated commands, and wait for fresh approval before executing anything.
Execute ATVM run commands only after explicit approval.
Treat approve as approval to run and also start the per-run watcher service for that build.
Treat approve without watcher as approval to execute the ATVM run without starting the watcher.
When --categorize is used with watcher enabled, treat the watcher as a sequential grouped-run watcher:
- it must post one final Mattermost status per completed categorized group/sub-run
- it must stay active between grouped sub-runs while the parent categorized request is still running
- it must not stop after the first grouped run simply because one grouped run completed
- if the child build id label does not match the actual host/spec being executed, report the grouped run using the inferred host-based group instead of the raw child build id label
- it must not hold back the per-group posts and replace them with a single parent-only post
After execution, report immediate success/failure only.
Do not include expected, harmless systemctl reset-failed ... unit not loaded output in routine run-start confirmations.
Mention reset-failed output only when it prevents watcher startup or becomes relevant to debugging.
Do not actively monitor completion unless explicitly requested.
If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless the operator instructs otherwise.
Report command errors immediately.
sshpass may be used where password-based SSH automation is required.
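
The --build_name rules above can be sketched as a small helper. This is a minimal illustration, not part of the ATVM tooling: the function name is hypothetical, and the check that treats any occurrence of "gold" as a Gold Disk identifier is an assumption, since this guide does not define the identifier format.

```python
import re

# ASSUMPTION: any occurrence of "gold" is treated as a Gold Disk identifier.
# The guide does not define the identifier format; adjust as needed.
GOLD_DISK_PATTERN = re.compile(r"gold", re.IGNORECASE)


def normalize_build_name(description):
    """Return a --build_name value with whitespace runs replaced by '-'.

    Raises ValueError for an empty name or one that appears to contain
    a Gold Disk identifier.
    """
    name = re.sub(r"\s+", "-", description.strip())
    if not name:
        raise ValueError("build name must not be empty")
    if GOLD_DISK_PATTERN.search(name):
        raise ValueError("build name must not include Gold Disk identifiers")
    return name
```

The helper only normalizes the name; the resulting command still has to be shown to the operator and approved before anything runs.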

Core Scripts

Template prep: /root/cdc-e2e-cyp-12.17.4/cmc-templates.py
Test execution: ./run-sorry-cypress.py
Detailed host-level test artifacts: /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter

Detailed Test Artifacts

Use /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter on the automation controller for detailed per-host test evidence.
Reporter subdirectories of interest:
- logs/
  - per-host text and JSON logs for the executed tests
- xml/
  - machine result XML files and the final check-xml-files.ts bookkeeping output
- mochawesome/
  - per-run HTML reports
When a machine fails, use the matching logs/ entry first to capture the detailed failure context for that host.
When reconstructing historical status, prefer cmcReporter artifacts over less-specific runner output because they preserve per-host results after the live run has ended.

Typical sequence:

Build the exact cmc-templates.py and run-sorry-cypress.py commands for the request.
Show those exact commands to the operator.
Wait for explicit approval.
Run cmc-templates.py with the approved options.
Wait for cmc-templates.py to fully finish and confirm success.
Verify the generated .ts files and the config specPattern include every requested VM before starting the runner.
If the watcher is approved, start the watcher before launching run-sorry-cypress.py.
Run run-sorry-cypress.py with the matching approved config and build name.
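
The verification step in the sequence above could be sketched as follows. This is a hedged illustration only: the function name is hypothetical, and it assumes that each generated .ts filename and the specPattern text contain the VM name as a plain substring, which this guide does not confirm.

```python
def missing_vms(requested_vms, spec_filenames, spec_pattern_text):
    """Return the requested VMs that are not covered by both the generated
    spec files and the config specPattern.

    ASSUMPTION: coverage is detected by substring match on the VM name.
    """
    missing = []
    for vm in requested_vms:
        in_files = any(vm in name for name in spec_filenames)
        in_pattern = vm in spec_pattern_text
        if not (in_files and in_pattern):
            missing.append(vm)
    return missing
```

A non-empty result means the runner must not be launched; report the mismatch instead.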

Config File / Gold Disk Mapping

cypress.atvm-config-gold.ts -> Gold Disk 1
cypress.atvm-config-gold-2.ts -> Gold Disk 2
Additional numbered config variants map to corresponding Gold Disks.
Do not default to cypress.atvm-config.ts.
Unless the operator explicitly requests another config, use a config file with gold in the filename.
If the operator-specified config file is missing, stop immediately and report the missing file.
Do not search for substitute ATVM config files and do not switch to another config unless the operator explicitly instructs it.
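
The mapping and the missing-file rule above can be sketched like this. The function names are hypothetical, and the -N suffix for Gold Disks beyond 2 is an assumption extrapolated from cypress.atvm-config-gold-2.ts; confirm the actual filenames on the controller.

```python
from pathlib import Path


def gold_config_name(gold_disk):
    """Map a Gold Disk number to its config filename.

    ASSUMPTION: variants above 2 follow the same '-N' suffix as Gold Disk 2.
    """
    if gold_disk < 1:
        raise ValueError("Gold Disk numbers start at 1")
    if gold_disk == 1:
        return "cypress.atvm-config-gold.ts"
    return f"cypress.atvm-config-gold-{gold_disk}.ts"


def require_config(directory, config_name):
    """Return the config path, or stop with an error if it is missing.

    Deliberately performs no search for substitute config files.
    """
    path = Path(directory) / config_name
    if not path.is_file():
        raise FileNotFoundError(f"ATVM config file not found: {path}")
    return path
```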

Available Templates

cmc-e2e
cmc-group-consistency
cmc-h2h-diff-platf
cmc-h2h-same-platf
cmc-migrateops
cmc-migrateops-compute-migration
cmc-reboot
cmc-systemOS

Command Pattern

python3 cmc-templates.py --template <template> --ignore_force_shutdown --config_file_path ./<config-file> --use_specified_plugin iscsi [template options or explicit plugin override...] && \
python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces> [--categorize]
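
As a sketch of how the pattern above could be assembled for display to the operator, the helper below builds the two commands as separate strings so each can be reviewed and approved. The function and parameter names are hypothetical; only the flags already listed in this guide are used.

```python
import shlex


def build_commands(template, config_file, build_name, plugin="iscsi",
                   template_options=(), categorize=False):
    """Return (template_command, runner_command) as display-ready strings.

    Nothing is executed here; the strings exist only to be shown for approval.
    """
    if any(ch.isspace() for ch in build_name):
        raise ValueError("--build_name must not contain spaces")
    prep = [
        "python3", "cmc-templates.py",
        "--template", template,
        "--ignore_force_shutdown",
        "--config_file_path", f"./{config_file}",
        "--use_specified_plugin", plugin,
        *template_options,
    ]
    run = [
        "python3", "./run-sorry-cypress.py",
        "--config_file", config_file,
        "--build_name", build_name,
    ]
    if categorize:
        run.append("--categorize")
    return shlex.join(prep), shlex.join(run)
```

Keeping the two commands separate leaves room for the mandatory spec-file verification between template generation and the runner.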

Examples Reference

Commonly used command examples: examples.md
Keep this guide focused on run-control rules and workflow constraints.
Use examples as reference material only, not as default intent for new operator requests.
Keep examples.md limited to reusable example commands; keep workflow rules, defaults, blacklist policy, and reporting rules in this guide or run-learnings.md.

Example Option Patterns (Guide-Only)

Distro-scoped VM selection:
- --containsVm redhat
- --containsVm redhat9
Explicit VM selection:
- --specify_vms <vm1> <vm2> ...
Compute migrateops platform:
- --vm_platforms vmware|ovirt|openshift|proxmox

Blacklisted Machines

Always exclude these machines from broad-scope ATVM automation runs by adding them to --exclude_partial_match. If the operator explicitly targets one or more named VMs with --specify_vms, do not add the maintained --exclude_partial_match list unless the operator also explicitly asks for it. Even for explicit --specify_vms requests, first check whether any requested VM is on the maintained blacklist and stop instead of launching the run if one is included.

Permanently blacklisted because CMC cannot compile:

atvm6-centos6.0
atvm41-redhat6.0
atvm73-oracle6.0

Temporarily blacklisted because the run crashes when creating a migration session:

atvm144-suse15.0

Temporarily blacklisted while support requests are waiting:

atvm113-debian9.0.0
atvm115-debian9.1.0
atvm116-debian9.2.0

Temporarily blacklisted because re-creation might be needed:

atvm156-debian9.3.0

Preferred exclude list:

--exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0
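
The blacklist policy above can be sketched as a single decision function. The function name is hypothetical; the machine list is copied from this section and must be kept in sync with it by hand.

```python
BLACKLIST = (
    "atvm6-centos6.0",
    "atvm41-redhat6.0",
    "atvm73-oracle6.0",
    "atvm144-suse15.0",
    "atvm113-debian9.0.0",
    "atvm115-debian9.1.0",
    "atvm116-debian9.2.0",
    "atvm156-debian9.3.0",
)


def exclude_options(specified_vms=()):
    """Return the exclude arguments to add to the template command.

    Broad-scope request (no explicit VMs): the full maintained exclude list.
    Explicit --specify_vms request: no exclude list, but stop if any
    requested VM is blacklisted.
    """
    hits = [vm for vm in specified_vms if vm in BLACKLIST]
    if hits:
        raise ValueError("blacklisted VM(s) requested: " + ", ".join(hits))
    if specified_vms:
        return []
    return ["--exclude_partial_match", *BLACKLIST]
```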

Running-Automation Check (Mandatory)

Before any new automation request:

SSH to root@192.168.3.190.
Check for active automation processes (for example run-sorry-cypress.py, cmc-templates.py, and related Cypress runners).
Report:
- Running with process details, or
- Not running.
If Running, ask the operator whether to terminate.
If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
If termination is not approved, do not start a new run.
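
The reporting step of this check could look like the sketch below. It assumes the process listing has already been captured on the controller as text (for example, the output of a ps command) and only filters it; the function names and the broad "cypress" marker for related runners are assumptions.

```python
# ASSUMPTION: "cypress" is used as a broad marker for related Cypress runners.
PROCESS_MARKERS = ("run-sorry-cypress.py", "cmc-templates.py", "cypress")


def find_automation_processes(ps_output):
    """Return the process-listing lines that look like ATVM automation."""
    matches = []
    for line in ps_output.splitlines():
        lowered = line.lower()
        if any(marker in lowered for marker in PROCESS_MARKERS):
            matches.append(line.strip())
    return matches


def running_report(ps_output):
    """Return 'Not running', or 'Running' followed by the process details."""
    matches = find_automation_processes(ps_output)
    if not matches:
        return "Not running"
    return "Running:\n" + "\n".join(matches)
```

The sketch never terminates anything; termination still requires explicit operator approval.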

Approval Workflow (Mandatory)

Build exact command(s) for the request.
Present them verbatim as planned commands before running anything.
Wait for explicit approval.
When the watcher is available, present the watcher-start command separately from the core run commands.
Treat approve as approval to execute the ATVM run and start the watcher for that build.
Treat approve without watcher as approval to execute the ATVM run without starting the watcher.
If the run uses --categorize and the watcher is requested, include --categorize on the watcher start command too so the watcher tracks sequential categorized sub-runs correctly.
Run only approved command(s), no extra options and no silent substitutions.
When both template generation and the Cypress runner are requested, run them sequentially, not in parallel.
Do not launch run-sorry-cypress.py until cmc-templates.py has exited successfully and finished updating the intended config/spec files.
After cmc-templates.py, always verify that the generated spec files on disk and the config specPattern both contain the full requested VM set before launching run-sorry-cypress.py.
If any requested VM is missing from the generated files or specPattern, stop and report the mismatch instead of launching the runner.
Treat displayed commands as a review gate: do not execute either command until the operator has had a chance to review them and explicitly approve.
If the operator asks to change plugin, config, filters, build name, Gold Disk, or scope after commands are shown, discard the old plan, show the revised commands, and wait for new approval.
If monitoring was not requested, report immediate success/failure for each command.
If monitoring was requested, keep monitoring until completion and report final outcome.
When the watcher is requested, launch the watcher before run-sorry-cypress.py.
Do not start the runner before the watcher, because the watcher helper clears stale /tmp/<build-name>.log and can delete the fresh live runner log if the runner starts first.
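
The two approval phrases above can be sketched as a strict parser. This is an illustration under the assumption that only the exact phrases count as approval; the function name and the return shape are hypothetical.

```python
def parse_approval(reply):
    """Return (run_approved, start_watcher) for an operator reply.

    ASSUMPTION: only the exact phrases 'approve' and
    'approve without watcher' (case-insensitive) count as approval.
    """
    text = " ".join(reply.lower().split())
    if text == "approve":
        return (True, True)
    if text == "approve without watcher":
        return (True, False)
    # Anything else, including a change request, is not approval.
    return (False, False)
```

Treating every other reply as non-approval matches the rule that a changed request needs rebuilt commands and fresh approval.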

Requested Test Style

When asked for one VM or a VM set:

choose requested template/options,
choose correct config file for intended Gold Disk,
default to a config filename containing gold unless the operator explicitly says otherwise,
always include --ignore_force_shutdown on the template-generation command unless the operator explicitly overrides that default,
default to --use_specified_plugin iscsi unless the operator explicitly requests another plugin or the template does not use plugin selection,
use a descriptive --build_name without Gold Disk IDs.

Update Rule

After each run, update this guide only for workflow/rule/default changes.
Update examples.md for reusable command/option examples.
Add run-specific learnings only to run-learnings.md when the run produced new information.

Monitoring Policy

Monitor only when the operator explicitly asks to monitor.
If monitoring was not requested, run commands and report execution success/failure and any errors.
If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.

Mattermost Status Posting

Treat a normal ATVM status request as local-only output by default.
When the operator asks to send ATVM automation run status to Mattermost, use the local defaults from /home/aw/code/cds/.env.credentials.local.
Default Mattermost variables:
- MATTERMOST_ATVM_WEBHOOK
- MATTERMOST_ATVM_CHANNEL
Treat these as the default destination for ATVM automation run-status posts unless the operator explicitly overrides them.
Send the final ATVM run status only after the run has fully completed, regardless of whether the run passed or failed.
Do not send interim or in-progress ATVM run status updates to Mattermost unless the operator explicitly asks for that.
Use the same ATVM status layout that would be shown to the operator locally when posting to Mattermost.
Default status template: /home/aw/code/cds/atvm/docs/automation/status-template.md
Do not post to Mattermost unless the operator explicitly asks for the run status to be sent there.
For categorized execution with watcher enabled, send one Mattermost status per completed categorized sub-run/group after that grouped run fully finishes.
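
A minimal sketch of a final-status post, assuming MATTERMOST_ATVM_WEBHOOK holds a full Mattermost incoming-webhook URL and that both variables are present in the environment after sourcing the credentials file. The function names are hypothetical; the JSON body uses the text and channel fields that Mattermost incoming webhooks accept.

```python
import json
import os
import urllib.request


def build_payload(status_text, channel=None):
    """Build the JSON body for a Mattermost incoming webhook."""
    payload = {"text": status_text}
    if channel:
        payload["channel"] = channel
    return payload


def post_final_status(status_text, run_complete, env=None):
    """Post the final status; refuse to post for a run that is still active."""
    if not run_complete:
        raise RuntimeError("final status is posted only after the run completes")
    env = os.environ if env is None else env
    body = build_payload(status_text, env.get("MATTERMOST_ATVM_CHANNEL"))
    request = urllib.request.Request(
        env["MATTERMOST_ATVM_WEBHOOK"],  # ASSUMPTION: a full webhook URL
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Call post_final_status only when the operator has explicitly asked for the status to be sent to Mattermost.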

Status Reporting Format

When the operator asks for the status of an ATVM automation run, report in this order:

Heading/title using the run build_name.
SUMMARY: section with finished, passed, failed, and skipped counts.
HOSTS: section with the machine rows.
TIMING: section with start, end, total, quickest, longest, and average.
COVERAGE: section describing what the run was intended to cover, excluding the target-host list.
TEST FLOW: section describing the template-specific numbered run flow for the test.
NOTES: section for broader context and anomalies.
Remaining machines still to run.
Summary counts for finished, passed, failed, and skipped machines.
Timing details:
- start time
- end time if complete
- total run time if complete, or elapsed run time if still running
- quickest completed test runtime
- longest completed test runtime
- average completed test runtime
Estimated completion time.
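
The timing details and the completion estimate can be derived from per-machine runtimes roughly as follows. This is a sketch with a hypothetical function name; it assumes runtimes are already extracted from the run log in seconds and that remaining machines run one at a time, so the estimate is the average runtime multiplied by the remaining machine count.

```python
from statistics import mean


def timing_summary(completed_runtimes_s, remaining_machines):
    """Return quickest, longest, and average runtimes plus an estimate of
    the time left for the entire remaining run, all in seconds.

    Returns None when no machine has completed yet.
    ASSUMPTION: remaining machines execute sequentially.
    """
    if not completed_runtimes_s:
        return None
    average = mean(completed_runtimes_s)
    return {
        "quickest": min(completed_runtimes_s),
        "longest": max(completed_runtimes_s),
        "average": average,
        "estimated_remaining": average * remaining_machines,
    }
```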

Status-report expectations:

Use the same display layout for every ATVM automation status response regardless of test type (e2e, systemOS, reboot, migrateops, and others).
Use /home/aw/code/cds/atvm/docs/automation/status-template.md as the default template for both local status output and Mattermost status posts.
The default ATVM status template uses flat bullet-list sections for COVERAGE: and TEST FLOW:, Markdown tables for SUMMARY:, HOSTS:, and TIMING:, and uses NOTES: for flat operator-facing notes.
Order the status sections as SUMMARY:, HOSTS:, TIMING:, COVERAGE:, TEST FLOW:, then NOTES:.
Keep NOTES: focused on operator-facing value such as the Currents run URL, real anomalies, failure context, or material fallback behavior.
Do not include generic watcher bookkeeping messages in NOTES: such as artifact-detection confirmations.
Do not include internal watcher fallback notes in NOTES: such as check-xml-files.ts validation confirmations or reporter-artifact recovery details.
The HOSTS: table includes Host, Kernel, Status, and Detail columns in that order.
In COVERAGE:, describe the template, datastore/config family, migration style, and plugin/integration path, but do not list target hosts there.
In TEST FLOW:, show the template-specific numbered run flow once for the whole test, not per host.
Resolve the flow from the run template name.
cmc-e2e currently uses the 22-step migration flow documented in /home/aw/code/cds/atvm/docs/automation/status-template.md.
For the Kernel column, cross-reference the host name against /home/aw/code/cds/atvm/inventory/vm-inventory.md.
If the hostname is not present in vm-inventory.md, report the kernel value as unknown.
Treat references to the "ATVM automation run" or "automation run" as referring to this ATVM folder workflow and the automation VM at 192.168.3.190, not to Cirrus project operations such as the atvm - cypress project.
Treat a status request as a request for live status by default.
Unless the operator explicitly asks to send the status to Mattermost, print the status only in the local terminal response.
Use the live automation VM state when available.
If no automation is currently running, fall back to the most recent historical run artifacts and logs.
Prefer local automation evidence in this order: active runner processes, live automation-VM files, shell history for the last launch command, then historical reporter artifacts.
For detailed machine-level failure information, use /root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/ on the automation VM.
Derive the heading/title from the run build_name when available.
Format every machine entry as machine-name - STATUS.
Put each machine on its own line; never combine multiple machines into one paragraph or comma-separated line.
Use a separate Notes section for failure reasons, anomalies, or operator-relevant context rather than cramming those details into the completed-machine list.
For categorized runs, reconstruct the whole run across all category batches; do not treat the current live category batch as the full run scope.
For categorized runs with no active automation, reconstruct the status from the full historical run across all category batches, not only the most recent category batch.
Always report the status of the entire requested run, even when the runner split execution into multiple category batches or cloud sub-runs.
Derive completed-machine status from completed spec results already written during the same run.
Parse all same-run test-result-*.xml files, not only machine-named test-result-atvm*.xml files.
When XML filenames are hash-named, extract the machine name from XML contents such as testsuite file=, testsuite name=, or testcase name=.
Ignore check-xml-files.ts XML outputs when counting machine completion because they are bookkeeping steps, not machine runs.
When multiple same-run XML files exist for one machine, use the most recently written XML for that machine.
Include the run start time in every status response when it can be derived from the run log.
If the run is complete, include the end time and total run time.
If the run is still active, include the elapsed run time so far.
Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
For skipped machines, include the reason category:
- BLACKLISTED: CMC INSTALL - CAN'T COMPILE
- BLACKLISTED: SUPPORT REQUEST - WAITING
- BLACKLISTED: RE-CREATE MIGHT BE NEEDED
- BLACKLISTED: RE-CREATE NEEDED
If a machine is currently in progress, show it under remaining machines as RUNNING.
If a machine has not started yet, show it under remaining machines as NOT STARTED.
If no failures are present in completed spec results, report those completed machines as PASS.
If a completed spec result shows a failure, report that machine as FAIL in the completed list and append a longer same-line failure description when the extra detail is useful to the operator.
Use Notes for extra context beyond the machine-specific same-line failure description.
Base the completion estimate on the full remaining machine count and recent per-machine runtime visible in the run log.
Make the estimate explicitly refer to completion of the entire remaining run, not only the current machine/spec.
When the operator also asks to send the status to Mattermost, send this same final status output to the configured Mattermost destination only after the run has fully completed.
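
The XML rules above (hash-named files, bookkeeping exclusion, most-recent-wins) can be sketched as shown below. This is an illustration, not the watcher's implementation: the function names and the machine-name pattern are assumptions, the directory is expected to hold only same-run files, and a machine is counted as FAIL when its XML contains any failure or error element.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# ASSUMPTION: machine names look like atvm<N>-<distro><version>.
MACHINE_RE = re.compile(r"atvm\d+-[A-Za-z]+\d+(?:\.\d+)*")


def _attribute_values(root):
    """Yield testsuite file=/name= values first, then testcase name= values."""
    for suite in root.iter("testsuite"):
        yield suite.get("file", "")
        yield suite.get("name", "")
    for case in root.iter("testcase"):
        yield case.get("name", "")


def machine_statuses(xml_dir):
    """Return {machine: 'PASS' | 'FAIL'} from test-result-*.xml files."""
    results = {}
    # Oldest first, so the most recently written XML for a machine wins.
    files = sorted(Path(xml_dir).glob("test-result-*.xml"),
                   key=lambda p: p.stat().st_mtime)
    for path in files:
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # unreadable file: skip rather than guess
        values = list(_attribute_values(root))
        if any("check-xml-files" in value for value in values):
            continue  # bookkeeping step, not a machine run
        match = next((m for m in map(MACHINE_RE.search, values) if m), None)
        if match is None:
            continue
        failed = (next(root.iter("failure"), None) is not None
                  or next(root.iter("error"), None) is not None)
        results[match.group(0)] = "FAIL" if failed else "PASS"
    return results
```

Blacklisted and not-yet-started machines never appear in this result; they have to be added from the requested scope when the status is assembled.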
