cds-ai/atvm/atvm-automation-guide.md
anthony.wen c18b42549c Update ATVM blacklist and enforce sequential runner launch
Adjust the maintained ATVM blacklist so atvm156-debian9.3.0 uses the reason RE-CREATE MIGHT BE NEEDED and remove atvm157-debian13.0.0 from blacklist entries and reusable exclude examples. Also clarify the ATVM automation workflow so cmc-templates.py must finish successfully before run-sorry-cypress.py is started.
2026-03-20 14:45:05 -04:00


Run ATVM Automation Guide

This file is guide-only documentation for operating ATVM CMC automation. Do not put specific run examples here. For reusable command examples and common option combinations, use atvm-automation-examples.md. Treat atvm-automation-examples.md as reference-only. Do not assume the operator wants the extra options shown in examples unless they explicitly request them.

Purpose

Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.

ATVM Cypress Automation Controller Client

  • Hostname: atvm-cypres-vm-1
  • IP: 192.168.3.190
  • Credentials: root / atvmcdsi2012

Operating Constraints

  • Run only scripts/commands explicitly requested.
  • Do not make manual system configuration changes on the client.
  • Do not edit client files unless explicitly requested.

Operator Preferences

  • Do not include Gold Disk identifiers in --build_name.
  • --build_name must not contain spaces; use - between words.
  • For multiple VMs in the same distro, use distro-scoped filtering (--containsVm) instead of long explicit VM lists.
  • Always include --ignore_force_shutdown on cmc-templates.py commands unless the operator explicitly asks not to.
  • Default to --use_specified_plugin iscsi unless the operator explicitly requests a different plugin.
  • Before preparing a new run, always check whether automation is already running.
  • Always report whether automation is currently running.
  • If running, ask whether to terminate; terminate only with explicit approval.
  • After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
  • Before any run, always show exact planned command(s) and wait for explicit approval.
  • Execute only after explicit approval (for example, the reply "approve").
  • After execution, report immediate success/failure only.
  • Do not actively monitor completion unless explicitly requested.
  • If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless the operator instructs otherwise.
  • Report command errors immediately.
  • sshpass may be used where password-based SSH automation is required.
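
The --build_name rules above can be checked mechanically before a command is presented for approval. The following is a minimal sketch, not part of the ATVM scripts: both function names are hypothetical, and matching the substring "gold" is only an assumed stand-in for "Gold Disk identifier", since this guide does not define the exact form of those identifiers.

```python
import re


def normalize_build_name(description: str) -> str:
    """Join the words of a description with '-' so no whitespace remains."""
    words = description.split()
    if not words:
        raise ValueError("build name must not be empty")
    return "-".join(words)


def check_build_name(name: str) -> None:
    """Raise ValueError if the name breaks the --build_name rules."""
    if re.search(r"\s", name):
        raise ValueError("--build_name must not contain spaces; use '-' between words")
    # ASSUMPTION: any occurrence of "gold" is treated as a Gold Disk identifier.
    if re.search(r"gold", name, re.IGNORECASE):
        raise ValueError("--build_name must not include Gold Disk identifiers")
```

A name that passes check_build_name still needs operator approval as part of the planned command; the check only catches formatting mistakes early.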

Core Scripts

  • Template prep: /root/cdc-e2e-cyp-12.17.4/cmc-templates.py
  • Test execution: ./run-sorry-cypress.py

Typical sequence:

  1. Run cmc-templates.py with requested template/options.
  2. Wait for cmc-templates.py to fully finish and confirm success.
  3. Run run-sorry-cypress.py with matching config and build name.
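
The sequence above amounts to "start the next command only if the previous one exited with status 0". A minimal sketch of that gate follows; run_in_order is a hypothetical helper, and it treats exit status 0 as the only success signal, which may be weaker than the full confirmation this guide asks for.

```python
import subprocess


def run_in_order(commands):
    """Run argv lists one at a time; stop at the first non-zero exit status.

    Returns a list of (argv, returncode) for every command that was started.
    Commands after a failure are never launched.
    """
    started = []
    for argv in commands:
        # subprocess.run blocks until the command has fully finished.
        result = subprocess.run(argv)
        started.append((argv, result.returncode))
        if result.returncode != 0:
            break
    return started
```

Passing the approved cmc-templates.py command first and the approved run-sorry-cypress.py command second gives the ordering described above: if the returned list has fewer entries than the input, the runner was never started.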

Config File / Gold Disk Mapping

  • cypress.atvm-config-gold.ts -> Gold Disk 1
  • cypress.atvm-config-gold-2.ts -> Gold Disk 2
  • Additional numbered config variants map to corresponding Gold Disks.
  • Do not default to cypress.atvm-config.ts.
  • Unless the operator explicitly requests another config, use a config file with gold in the filename.
  • If the operator-specified config file is missing, stop immediately and report the missing file.
  • Do not search for substitute ATVM config files and do not switch to another config unless the operator explicitly instructs it.
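
The config rules above can be read as a small lookup that fails fast and never substitutes. The sketch below is illustrative only: resolve_config and its operator_explicit parameter are hypothetical names, and the directory to search is whatever the operator's command implies.

```python
from pathlib import Path


def resolve_config(config_name, base_dir, operator_explicit=False):
    """Return the path of the requested config file, or raise.

    A name without "gold" is accepted only when the operator asked for it
    explicitly. A missing file is reported; no substitute is searched for.
    """
    if "gold" not in config_name and not operator_explicit:
        raise ValueError(f"{config_name}: default to a config file with 'gold' in the filename")
    path = Path(base_dir) / config_name
    if not path.is_file():
        raise FileNotFoundError(f"missing config file: {path}")
    return path
```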

Available Templates

  • cmc-e2e
  • cmc-group-consistency
  • cmc-h2h-diff-platf
  • cmc-h2h-same-platf
  • cmc-migrateops
  • cmc-migrateops-compute-migration
  • cmc-reboot
  • cmc-systemOS

Command Pattern

python3 cmc-templates.py --template <template> --ignore_force_shutdown --config_file_path ./<config-file> --use_specified_plugin iscsi [template options or explicit plugin override...] && \
python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces> [--categorize]
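
For presenting planned commands verbatim, the pattern can be assembled from its parts. This is a sketch under stated assumptions: planned_commands is a hypothetical helper, the flags and template names are the ones listed in this guide, and passing selection options such as --containsVm through template_options is an assumption about where those options belong.

```python
import shlex

# Template names as listed under "Available Templates".
TEMPLATES = {
    "cmc-e2e", "cmc-group-consistency", "cmc-h2h-diff-platf", "cmc-h2h-same-platf",
    "cmc-migrateops", "cmc-migrateops-compute-migration", "cmc-reboot", "cmc-systemOS",
}


def planned_commands(template, config_file, build_name, plugin="iscsi",
                     template_options=(), categorize=False):
    """Return the two command strings, in launch order, for operator review."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template}")
    if any(ch.isspace() for ch in build_name):
        raise ValueError("--build_name must not contain spaces")
    prep = ["python3", "cmc-templates.py", "--template", template,
            "--ignore_force_shutdown", "--config_file_path", f"./{config_file}",
            "--use_specified_plugin", plugin, *template_options]
    run = ["python3", "./run-sorry-cypress.py", "--config_file", config_file,
           "--build_name", build_name]
    if categorize:
        run.append("--categorize")
    # shlex.join quotes any argument that the shell would otherwise split.
    return [shlex.join(prep), shlex.join(run)]
```

The helper only builds text; nothing is executed until the operator approves the exact strings it returns.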

Examples Reference

  • Commonly used command examples: atvm-automation-examples.md
  • Keep this guide focused on run-control rules and workflow constraints.
  • Use examples as reference material only, not as default intent for new operator requests.
  • Keep atvm-automation-examples.md limited to reusable example commands; keep workflow rules, defaults, blacklist policy, and reporting rules in this guide or atvm-automation-runs.md.

Example Option Patterns (Guide-Only)

  • Distro-scoped VM selection:
    • --containsVm redhat
    • --containsVm redhat9
  • Explicit VM selection:
    • --specify_vms <vm1> <vm2> ...
  • Compute migrateops platform:
    • --vm_platforms vmware|ovirt|openshift|proxmox

Blacklisted Machines

Always exclude these machines from ATVM automation runs by adding them to --exclude_partial_match.

Permanently blacklisted because CMC cannot compile:

  • atvm6-centos6.0
  • atvm41-redhat6.0
  • atvm73-oracle6.0

Temporarily blacklisted because the run crashes when creating a migration session:

  • atvm144-suse15.0

Temporarily blacklisted while support requests are waiting:

  • atvm113-debian9.0.0
  • atvm115-debian9.1.0
  • atvm116-debian9.2.0

Temporarily blacklisted because re-creation might be needed:

  • atvm156-debian9.3.0

Preferred exclude list:

  • --exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0
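
Keeping the blacklist as one table grouped by reason makes it possible to derive both the exclude option and the skip reason from a single place. A minimal sketch, assuming the table is maintained by hand to mirror this section; BLACKLIST, exclude_args, and skip_reason are hypothetical names, not part of the ATVM scripts.

```python
# Machines grouped by the reason given in this guide (order is preserved).
BLACKLIST = {
    "CMC cannot compile": [
        "atvm6-centos6.0", "atvm41-redhat6.0", "atvm73-oracle6.0",
    ],
    "run crashes when creating a migration session": [
        "atvm144-suse15.0",
    ],
    "support requests are waiting": [
        "atvm113-debian9.0.0", "atvm115-debian9.1.0", "atvm116-debian9.2.0",
    ],
    "re-creation might be needed": [
        "atvm156-debian9.3.0",
    ],
}


def exclude_args(blacklist=BLACKLIST):
    """Return the --exclude_partial_match option followed by every blacklisted machine."""
    machines = [m for group in blacklist.values() for m in group]
    return ["--exclude_partial_match", *machines]


def skip_reason(machine, blacklist=BLACKLIST):
    """Return the blacklist reason for a machine, or None if it is not blacklisted."""
    for reason, machines in blacklist.items():
        if machine in machines:
            return reason
    return None
```

Joining exclude_args() with spaces reproduces the preferred exclude list above.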

Running-Automation Check (Mandatory)

Before any new automation request:

  1. SSH to root@192.168.3.190.
  2. Check for active automation processes (for example run-sorry-cypress.py, cmc-templates.py, and related Cypress runners).
  3. Report:
    • Running with process details, or
    • Not running.
  4. If Running, ask the operator whether to terminate.
  5. If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
  6. If termination is not approved, do not start a new run.
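
Step 2 can be sketched as a filter over a process listing captured on the automation VM. The example assumes the listing comes from something like `ps -eo pid,args` (the exact command is an assumption, as is the helper name find_automation_processes) and matches only the two script names given in this guide; related Cypress runner processes would need additional markers.

```python
# Script names taken from this guide; extend for related Cypress runners.
MARKERS = ("run-sorry-cypress.py", "cmc-templates.py")


def find_automation_processes(ps_output):
    """Return (pid, command line) for each listed process that matches a marker."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header row and any line without a numeric PID and a command.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = parts
        if any(marker in args for marker in MARKERS):
            matches.append((int(pid), args))
    return matches
```

An empty result corresponds to "Not running"; a non-empty result supplies the process details to report before asking about termination.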

Approval Workflow (Mandatory)

  1. Build exact command(s) for the request.
  2. Present them verbatim as planned commands.
  3. Wait for explicit approval.
  4. Run only approved command(s), no extra options.
  5. When both template generation and the Cypress runner are requested, run them sequentially, not in parallel.
  6. Do not launch run-sorry-cypress.py until cmc-templates.py has exited successfully and finished updating the intended config/spec files.
  7. If monitoring was not requested, report immediate success/failure for each command.
  8. If monitoring was requested, keep monitoring until completion and report final outcome.
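
Steps 3 and 4 can be expressed as a gate that releases the planned commands unchanged or not at all. This is a minimal sketch: approved_commands is a hypothetical helper, and accepting only the single word "approve" is an assumption based on the example reply mentioned in this guide.

```python
def approved_commands(planned, reply, accepted=("approve",)):
    """Return the planned commands verbatim if the reply is an explicit approval.

    Any other reply, including an empty one, releases nothing.
    """
    if reply.strip().lower() in accepted:
        return list(planned)
    return []
```

Because the function returns the planned list as-is, nothing can be added between approval and execution.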

Requested Test Style

When asked for one VM or a VM set:

  • choose requested template/options,
  • choose correct config file for intended Gold Disk,
  • default to a config filename containing gold unless the operator explicitly says otherwise,
  • always include --ignore_force_shutdown on the template-generation command unless the operator explicitly overrides that default,
  • default to --use_specified_plugin iscsi unless the operator explicitly requests another plugin or the template does not use plugin selection,
  • use a descriptive --build_name without Gold Disk IDs.

Update Rule

  • After each run, update this guide only for workflow/rule/default changes.
  • Update atvm-automation-examples.md for reusable command/option examples.
  • Add run-specific learnings only to atvm-automation-runs.md when the run produced new information.

Monitoring Policy

  • Monitor only when the operator explicitly asks to monitor.
  • If monitoring was not requested, run commands and report execution success/failure and any errors.
  • If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.

Status Reporting Format

When the operator asks for the status of an ATVM automation run, report in this order:

  1. Heading/title using the run build_name.
  2. Completed machines with machine name first and status second for each machine.
  3. Notes.
  4. Skipped machines with reason.
  5. Remaining machines still to run.
  6. Summary counts for finished, passed, failed, and skipped machines.
  7. Timing details:
    • start time
    • end time if complete
    • total run time if complete, or elapsed run time if still running
    • quickest completed test runtime
    • longest completed test runtime
    • average completed test runtime
  8. Estimated completion time.
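
The timing details and the completion estimate reduce to a few aggregates over per-machine runtimes. A minimal sketch, assuming the runtimes have already been extracted from the run log; timing_details is a hypothetical helper, and using the plain average of all completed runtimes is a simplification of "recent per-machine runtime".

```python
from datetime import datetime, timedelta
from statistics import mean


def timing_details(runtimes, remaining, now):
    """Summarize completed runtimes and estimate when the whole run finishes.

    runtimes:  list of timedelta, one per completed machine
    remaining: number of machines still to run (including any in progress)
    now:       datetime used as the base of the estimate
    """
    if not runtimes:
        return None  # nothing completed yet, so no estimate can be derived
    average = timedelta(seconds=mean(r.total_seconds() for r in runtimes))
    return {
        "quickest": min(runtimes),
        "longest": max(runtimes),
        "average": average,
        # Estimate covers the entire remaining run, not just the current machine.
        "estimated_completion": now + average * remaining,
    }
```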

Status-report expectations:

  • Use the same display layout for every ATVM automation status response regardless of test type (e2e, systemOS, reboot, migrateops, and others).
  • Treat references to the "ATVM automation run" or "automation run" as referring to this ATVM folder workflow and the automation VM at 192.168.3.190, not to Cirrus project operations such as the atvm - cypress project.
  • Treat a status request as a request for live status by default.
  • Use the live automation VM state when available.
  • If no automation is currently running, fall back to the most recent historical run artifacts and logs.
  • Prefer local automation evidence in this order: active runner processes, live automation-VM files, shell history for the last launch command, then historical reporter artifacts.
  • Derive the heading/title from the run build_name when available.
  • Format every machine entry as machine-name - STATUS.
  • Put each machine on its own line; never combine multiple machines into one paragraph or comma-separated line.
  • Use a separate Notes section for failure reasons, anomalies, or operator-relevant context rather than cramming those details into the completed-machine list.
  • For categorized runs, reconstruct the whole run across all category batches; do not treat the current live category batch as the full run scope.
  • For categorized runs with no active automation, reconstruct the status from the full historical run across all category batches, not only the most recent category batch.
  • Always report the status of the entire requested run, even when the runner split execution into multiple category batches or cloud sub-runs.
  • Derive completed-machine status from completed spec results already written during the same run.
  • Parse all same-run test-result-*.xml files, not only machine-named test-result-atvm*.xml files.
  • When XML filenames are hash-named, extract the machine name from XML contents such as testsuite file=, testsuite name=, or testcase name=.
  • Ignore check-xml-files.ts XML outputs when counting machine completion because they are bookkeeping steps, not machine runs.
  • When multiple same-run XML files exist for one machine, use the most recently written XML for that machine.
  • Include the run start time in every status response when it can be derived from the run log.
  • If the run is complete, include the end time and total run time.
  • If the run is still active, include the elapsed run time so far.
  • Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
  • Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
  • For skipped machines, include the reason category:
    • BLACKLISTED: CMC INSTALL - CAN'T COMPILE
    • BLACKLISTED: SUPPORT REQUEST - WAITING
    • BLACKLISTED: RE-CREATE MIGHT BE NEEDED
    • BLACKLISTED: RE-CREATE NEEDED
  • If a machine is currently in progress, show it under remaining machines as RUNNING.
  • If a machine has not started yet, show it under remaining machines as NOT STARTED.
  • If no failures are present in completed spec results, report those completed machines as PASS.
  • If a completed spec result shows a failure, report that machine as FAIL in the completed list and append a longer same-line failure description when the extra detail is useful to the operator.
  • Use Notes for extra context beyond the machine-specific same-line failure description.
  • Base the completion estimate on the full remaining machine count and recent per-machine runtime visible in the run log.
  • Make the estimate explicitly refer to completion of the entire remaining run, not only the current machine/spec.
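
The XML-related expectations above (read the machine name from the XML contents, ignore check-xml-files.ts outputs, let the most recently written file win) can be sketched as follows. Several things here are assumptions rather than facts about the real result files: the JUnit-style layout with failure or error child elements, the machine-name pattern, and every function name. The machine names used when trying it out should be treated as hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: machine names look like atvm<number>-<distro><version>.
MACHINE_RE = re.compile(r"atvm\d+-[a-z]+\d+(?:\.\d+)*")


def machine_status(xml_text):
    """Return (machine, "PASS" or "FAIL"), or None for bookkeeping or unidentified XML."""
    root = ET.fromstring(xml_text)
    candidates = []
    for suite in root.iter("testsuite"):
        candidates += [suite.get("file"), suite.get("name")]
    candidates += [case.get("name") for case in root.iter("testcase")]
    candidates = [c for c in candidates if c]
    if any("check-xml-files.ts" in c for c in candidates):
        return None  # bookkeeping step, not a machine run
    for text in candidates:
        match = MACHINE_RE.search(text)
        if match:
            failed = any(True for _ in root.iter("failure")) or \
                     any(True for _ in root.iter("error"))
            return match.group(0), "FAIL" if failed else "PASS"
    return None


def latest_status_per_machine(results):
    """results: iterable of (mtime, xml_text); the latest-written XML wins per machine."""
    latest = {}
    for _, xml_text in sorted(results, key=lambda item: item[0]):
        parsed = machine_status(xml_text)
        if parsed:
            latest[parsed[0]] = parsed[1]
    return latest


def format_lines(statuses):
    """One 'machine-name - STATUS' line per machine."""
    return [f"{machine} - {status}" for machine, status in sorted(statuses.items())]
```

Feeding it the modification time and contents of every same-run test-result-*.xml file yields the completed-machine list; failure descriptions and Notes would still need to be added by hand or by further parsing.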