cds-ai/atvm/atvm-automation-guide.md
anthony.wen eb66f93432 Update ATVM status reporting to use longer inline failure descriptions
- show failed machines with a longer failure description on the same status line

- keep Notes for broader context beyond the machine-specific failure reason

- update the ATVM automation guide and AGENTS rules to match

- record the reporting preference in atvm-automation-runs.md
2026-03-12 21:13:54 -04:00

Run ATVM Automation Guide

This file is guide-only documentation for operating ATVM CMC automation. Do not put specific run examples here. For reusable command examples and common option combinations, see atvm-automation-examples.md, and treat that file as reference-only: do not assume the operator wants the extra options shown in examples unless they explicitly request them.

Purpose

Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.

ATVM Cypress Automation Controller Client

  • Hostname: atvm-cypres-vm-1
  • IP: 192.168.3.190
  • Credentials: root / atvmcdsi2012

Operating Constraints

  • Run only scripts/commands explicitly requested.
  • Do not make manual system configuration changes on the client.
  • Do not edit client files unless explicitly requested.

Operator Preferences

  • Do not include Gold Disk identifiers in --build_name.
  • --build_name must not contain spaces; use - between words.
  • For multiple VMs in the same distro, use distro-scoped filtering (--containsVm) instead of long explicit VM lists.
  • Before preparing a new run, always check whether automation is already running.
  • Always report whether automation is currently running.
  • If running, ask whether to terminate; terminate only with explicit approval.
  • After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
  • Before any run, always show exact planned command(s) and wait for explicit approval.
  • Execute only after explicit approval (for example, "approve").
  • After execution, report immediate success/failure only.
  • Do not actively monitor completion unless explicitly requested.
  • If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
  • Report command errors immediately.
  • sshpass may be used where password-based SSH automation is required.
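
The two --build_name rules above (no spaces, no Gold Disk identifiers) can be checked mechanically before a command is presented for approval. The following is a minimal sketch, not part of the ATVM tooling: the helper name make_build_name and the GOLD_DISK_PATTERN regex are assumptions, since this guide does not define what a Gold Disk identifier looks like.

```python
import re

# Hypothetical pattern: the guide does not define the exact form of a
# Gold Disk identifier, so adjust this to the local naming convention.
GOLD_DISK_PATTERN = re.compile(r"gold[-_ ]?disk|\bgd[-_]?\d+\b", re.IGNORECASE)


def make_build_name(description: str) -> str:
    """Turn a free-text description into a hyphenated build name."""
    # Collapse any run of whitespace into a single hyphen.
    name = "-".join(description.split())
    if not name:
        raise ValueError("build name must not be empty")
    if GOLD_DISK_PATTERN.search(name):
        raise ValueError(f"build name must not include Gold Disk identifiers: {name!r}")
    return name
```

For instance, make_build_name("redhat9 e2e nightly") yields a name with hyphens in place of the spaces, while a description containing "gold disk 2" is rejected.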

Core Scripts

  • Template prep: /root/cdc-e2e-cyp-12.17.4/cmc-templates.py
  • Test execution: ./run-sorry-cypress.py

Typical sequence:

  1. Run cmc-templates.py with requested template/options.
  2. Run run-sorry-cypress.py with matching config and build name.

Config File / Gold Disk Mapping

  • cypress.atvm-config-gold.ts -> Gold Disk 1
  • cypress.atvm-config-gold-2.ts -> Gold Disk 2
  • Additional numbered config variants map to corresponding Gold Disks.
  • Do not default to cypress.atvm-config.ts.
  • Unless the operator explicitly requests another config, use a config file with gold in the filename.
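
The mapping above can be expressed as a small lookup. This sketch assumes that every numbered variant follows the -N suffix shown for Gold Disk 2; the guide confirms only the first two filenames, and config_for_gold_disk is a hypothetical helper name.

```python
def config_for_gold_disk(disk: int = 1) -> str:
    """Return the gold config filename for a Gold Disk number.

    Assumption: variants beyond Gold Disk 2 follow the same "-N" suffix.
    """
    if disk < 1:
        raise ValueError("Gold Disk numbers start at 1")
    suffix = "" if disk == 1 else f"-{disk}"
    return f"cypress.atvm-config-gold{suffix}.ts"
```

Defaulting to Gold Disk 1 keeps the result on a filename containing gold, which matches the rule that cypress.atvm-config.ts is never the default.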

Available Templates

  • cmc-e2e
  • cmc-group-consistency
  • cmc-h2h-diff-platf
  • cmc-h2h-same-platf
  • cmc-migrateops
  • cmc-migrateops-compute-migration
  • cmc-reboot
  • cmc-systemOS

Command Pattern

python3 cmc-templates.py --template <template> --config_file_path ./<config-file> [template options...]; \
python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces>
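
As an illustration of how the pattern above could be assembled programmatically, the sketch below builds both command strings from their parts so they can be shown verbatim for approval. The function name build_commands is hypothetical; the flags are the ones in the pattern above, and nothing is executed.

```python
import shlex


def build_commands(template, config_file, build_name, template_options=()):
    """Return the template-prep and test-run commands as display strings."""
    if any(ch.isspace() for ch in build_name):
        raise ValueError("--build_name must not contain spaces")
    prep = [
        "python3", "cmc-templates.py",
        "--template", template,
        "--config_file_path", f"./{config_file}",
        *template_options,
    ]
    run = [
        "python3", "./run-sorry-cypress.py",
        "--config_file", config_file,
        "--build_name", build_name,
    ]
    # shlex.join quotes each argument so the string is safe to paste into a shell.
    return [shlex.join(prep), shlex.join(run)]
```

Returning strings rather than running anything keeps this step compatible with the approval workflow: the operator sees exactly what would run.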

Examples Reference

  • Commonly used command examples: atvm-automation-examples.md
  • Keep this guide focused on run-control rules and workflow constraints.
  • Use examples as reference material only, not as default intent for new operator requests.
  • Keep atvm-automation-examples.md limited to reusable example commands; keep workflow rules, defaults, blacklist policy, and reporting rules in this guide or atvm-automation-runs.md.

Example Option Patterns (Guide-Only)

  • Distro-scoped VM selection:
    • --containsVm redhat
    • --containsVm redhat9
  • Explicit VM selection:
    • --specify_vms <vm1> <vm2> ...
  • Compute migrateops platform:
    • --vm_platforms vmware|ovirt|openshift|proxmox

Blacklisted Machines

Always exclude these machines from ATVM automation runs by adding them to --exclude_partial_match.

Permanently blacklisted because CMC cannot compile:

  • atvm6-centos6.0
  • atvm41-redhat6.0
  • atvm73-oracle6.0

Temporarily blacklisted because the run crashes when creating a migration session:

  • atvm144-suse15.0

Temporarily blacklisted while support requests are waiting:

  • atvm113-debian9.0.0
  • atvm115-debian9.1.0
  • atvm116-debian9.2.0
  • atvm156-debian9.3.0

Temporarily blacklisted until re-created:

  • atvm157-debian13.0.0

Preferred exclude list:

  • --exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0
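
One way to keep the preferred exclude list in sync with the blacklist is to generate it from a single table. The sketch below is illustrative only: the BLACKLIST structure and the exclude_args helper are assumptions, while the machine names and reasons are copied from the lists above.

```python
# Reason text -> machines, in the same order as the lists in this guide.
BLACKLIST = {
    "CMC cannot compile": [
        "atvm6-centos6.0", "atvm41-redhat6.0", "atvm73-oracle6.0",
    ],
    "run crashes when creating a migration session": [
        "atvm144-suse15.0",
    ],
    "support requests are waiting": [
        "atvm113-debian9.0.0", "atvm115-debian9.1.0",
        "atvm116-debian9.2.0", "atvm156-debian9.3.0",
    ],
    "needs to be re-created": [
        "atvm157-debian13.0.0",
    ],
}


def exclude_args():
    """Return the --exclude_partial_match flag followed by every blacklisted machine."""
    machines = [name for group in BLACKLIST.values() for name in group]
    return ["--exclude_partial_match", *machines]
```

Joining the result with spaces reproduces the preferred exclude list shown above.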

Running-Automation Check (Mandatory)

Before any new automation request:

  1. SSH to root@192.168.3.190.
  2. Check for active automation processes (for example run-sorry-cypress.py, cmc-templates.py, and related Cypress runners).
  3. Report:
    • Running with process details, or
    • Not running.
  4. If Running, ask operator whether to terminate.
  5. If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
  6. If termination is not approved, do not start a new run.
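
Steps 2 and 3 can be sketched as a filter over process-list text. The example assumes the text was captured on the client with ps -eo pid,args; the function names and the MARKERS tuple are hypothetical, and "related Cypress runners" would need extra markers that this guide does not name.

```python
# Script names taken from this guide; extend with local Cypress runner names.
MARKERS = ("run-sorry-cypress.py", "cmc-templates.py")


def find_automation_processes(ps_output: str):
    """Return (pid, command) pairs whose command line contains a marker."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the header row and blank lines
        pid, command = int(parts[0]), parts[1]
        if any(marker in command for marker in MARKERS):
            matches.append((pid, command))
    return matches


def running_report(matches) -> str:
    """Format the result as 'Running' with process details, or 'Not running'."""
    if not matches:
        return "Not running"
    details = "\n".join(f"  {pid} {command}" for pid, command in matches)
    return "Running\n" + details
```

The sketch only reports; termination stays a separate, explicitly approved step.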

Approval Workflow (Mandatory)

  1. Build exact command(s) for the request.
  2. Present them verbatim as planned commands.
  3. Wait for explicit approval.
  4. Run only approved command(s), no extra options.
  5. If monitoring was not requested, report immediate success/failure for each command.
  6. If monitoring was requested, keep monitoring until completion and report final outcome.
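
The approval gate in steps 1 to 4 can be modelled as a small state object. This is a sketch under assumptions: the RunPlan class is hypothetical, and it treats the literal reply "approve" (the example given under Operator Preferences) as the only approval token.

```python
class RunPlan:
    """Holds the exact planned commands and whether the operator approved them."""

    def __init__(self, commands):
        self.commands = tuple(commands)
        self.approved = False

    def present(self) -> str:
        # Step 2: show the commands verbatim.
        return "Planned commands:\n" + "\n".join(self.commands)

    def record_reply(self, reply: str) -> None:
        # Step 3: anything other than an explicit approval leaves the plan blocked.
        self.approved = reply.strip().lower() == "approve"

    def may_run(self, command: str) -> bool:
        # Step 4: only approved commands, byte-for-byte, with no extra options.
        return self.approved and command in self.commands
```

Comparing the full command string means that adding an option after approval makes the command ineligible until a new plan is presented.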

Requested Test Style

When asked for one VM or a VM set:

  • choose requested template/options,
  • choose correct config file for intended Gold Disk,
  • default to a config filename containing gold unless the operator explicitly says otherwise,
  • use a descriptive --build_name without Gold Disk IDs.

Update Rule

  • After each run, update this guide only for workflow/rule/default changes.
  • Update atvm-automation-examples.md for reusable command/option examples.
  • Add run-specific learnings only to atvm-automation-runs.md when the run produced new information.

Monitoring Policy

  • Monitor only when the operator explicitly asks to monitor.
  • If monitoring was not requested, run commands and report execution success/failure and any errors.
  • If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.

Status Reporting Format

When the operator asks for the status of an ATVM automation run, report in this order:

  1. Heading/title using the run build_name.
  2. Completed machines with machine name first and status second for each machine.
  3. Notes.
  4. Skipped machines with reason.
  5. Remaining machines still to run.
  6. Summary counts for finished, passed, failed, and skipped machines.
  7. Timing details:
    • start time
    • end time if complete
    • total run time if complete, or elapsed run time if still running
    • quickest completed test runtime
    • longest completed test runtime
    • average completed test runtime
  8. Estimated completion time.
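
The timing details and completion estimate in items 7 and 8 reduce to simple arithmetic once per-machine runtimes have been extracted from the run log. The sketch below assumes those runtimes are already available as timedelta values in completion order; timing_details and its recent window of three machines are illustrative choices, not values defined by this guide.

```python
from datetime import datetime, timedelta


def timing_details(start, now, runtimes, remaining, recent=3):
    """Summarise run timing and estimate completion of the whole remaining run."""
    details = {"start": start, "elapsed": now - start}
    if not runtimes:
        return details  # nothing completed yet, so no per-machine statistics
    recent_runs = runtimes[-recent:]
    recent_average = sum(recent_runs, timedelta()) / len(recent_runs)
    details.update(
        quickest=min(runtimes),
        longest=max(runtimes),
        average=sum(runtimes, timedelta()) / len(runtimes),
        # Estimate covers every remaining machine, not only the current one.
        estimated_completion=now + recent_average * remaining,
    )
    return details


example = timing_details(
    start=datetime(2026, 3, 12, 20, 0),
    now=datetime(2026, 3, 12, 21, 0),
    runtimes=[timedelta(minutes=10), timedelta(minutes=20), timedelta(minutes=30)],
    remaining=2,
)
```

With these illustrative inputs the average is 20 minutes, so two remaining machines push the estimate 40 minutes past the current time.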

Status-report expectations:

  • Use the same display layout for every ATVM automation status response regardless of test type (e2e, systemOS, reboot, migrateops, and others).
  • Use the live automation VM state when available.
  • Derive the heading/title from the run build_name when available.
  • Format every machine entry as machine-name - STATUS.
  • Put each machine on its own line; never combine multiple machines into one paragraph or comma-separated line.
  • Use a separate Notes section for anomalies and operator-relevant context that goes beyond the machine-specific failure reason; the failure reason itself belongs on the failed machine's status line.
  • For categorized runs, reconstruct the whole run across all category batches; do not treat the current live category batch as the full run scope.
  • Derive completed-machine status from completed spec results already written during the same run.
  • Parse all same-run test-result-*.xml files, not only machine-named test-result-atvm*.xml files.
  • When XML filenames are hash-named, extract the machine name from XML contents such as testsuite file=, testsuite name=, or testcase name=.
  • Ignore check-xml-files.ts XML outputs when counting machine completion because they are bookkeeping steps, not machine runs.
  • When multiple same-run XML files exist for one machine, use the most recently written XML for that machine.
  • Include the run start time in every status response when it can be derived from the run log.
  • If the run is complete, include the end time and total run time.
  • If the run is still active, include the elapsed run time so far.
  • Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
  • Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
  • For skipped machines, include the reason category:
    • BLACKLISTED: CMC INSTALL - CAN'T COMPILE
    • BLACKLISTED: SUPPORT REQUEST - WAITING
    • BLACKLISTED: RE-CREATE NEEDED
  • If a machine is currently in progress, show it under remaining machines as RUNNING.
  • If a machine has not started yet, show it under remaining machines as NOT STARTED.
  • If no failures are present in completed spec results, report those completed machines as PASS.
  • If a completed spec result shows a failure, report that machine as FAIL in the completed list and append a longer same-line failure description when the extra detail is useful to the operator.
  • Use Notes for extra context beyond the machine-specific same-line failure description.
  • Base the completion estimate on the full remaining machine count and recent per-machine runtime visible in the run log.
  • Make the estimate explicitly refer to completion of the entire remaining run, not only the current machine/spec.