cds-ai/atvm/atvm-automation-guide.md

# Run ATVM Automation Guide

This file is guide-only documentation for operating ATVM CMC automation.
Do not put specific run examples here.
For reusable command examples and common option combinations, use `atvm-automation-examples.md`.
Treat `atvm-automation-examples.md` as reference-only.
Do not assume the operator wants the extra options shown in examples unless they explicitly request them.

## Purpose
Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.

## ATVM Cypress Automation Controller Client
- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: `root / atvmcdsi2012`

## Operating Constraints
- Run only scripts/commands explicitly requested.
- Do not make manual system configuration changes on the client.
- Do not edit client files unless explicitly requested.

## Operator Preferences
- Do not include Gold Disk identifiers in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- For multiple VMs in the same distro, use distro-scoped filtering (`--containsVm`) instead of long explicit VM lists.
- Before preparing a new run, always check whether automation is already running.
- Always report whether automation is currently running.
- If running, ask whether to terminate; terminate only with explicit approval.
- After termination approval, terminate first, then present planned command(s), then wait for separate execution approval.
- Before any run, always show exact planned command(s) and wait for explicit approval.
- Execute only after explicit approval (for example `approve`).
- After execution, report immediate success/failure only.
- Do not actively monitor completion unless explicitly requested.
- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
- Report command errors immediately.
- `sshpass` may be used where password-based SSH automation is required.
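
The `--build_name` rules above can be checked mechanically before a command is presented for approval. The following is a minimal sketch, not part of the ATVM tooling: the helper names are hypothetical, and treating any occurrence of `gold` in the name as a Gold Disk identifier is an assumption about how those identifiers are written.

```python
import re


def normalize_build_name(description):
    """Join the words of a free-text description with single hyphens."""
    return "-".join(description.split())


def check_build_name(build_name):
    """Return a list of rule violations; an empty list means the name is acceptable."""
    problems = []
    if any(ch.isspace() for ch in build_name):
        problems.append("contains whitespace")
    # Assumption: a Gold Disk identifier always includes the word "gold".
    if re.search(r"gold", build_name, re.IGNORECASE):
        problems.append("mentions a Gold Disk identifier")
    return problems
```

A name that fails either check should be corrected before the planned command is shown to the operator.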

## Core Scripts
- Template prep: `/root/cdc-e2e-cyp-12.17.4/cmc-templates.py`
- Test execution: `./run-sorry-cypress.py`

Typical sequence:
1. Run `cmc-templates.py` with requested template/options.
2. Run `run-sorry-cypress.py` with matching config and build name.

## Config File / Gold Disk Mapping
- `cypress.atvm-config-gold.ts` -> Gold Disk 1
- `cypress.atvm-config-gold-2.ts` -> Gold Disk 2
- Additional numbered config variants map to corresponding Gold Disks.
- Do not default to `cypress.atvm-config.ts`.
- Unless the operator explicitly requests another config, use a config file with `gold` in the filename.
- If the operator-specified config file is missing, stop immediately and report the missing file.
- Do not search for substitute ATVM config files and do not switch to another config unless the operator explicitly instructs it.
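
The config-selection rules above amount to two checks: refuse a non-gold config unless it was explicitly requested, and stop on a missing file without looking for a substitute. A minimal sketch under those assumptions follows; `resolve_config`, `ConfigError`, and the `explicitly_requested` flag are hypothetical names used only for illustration.

```python
from pathlib import Path


class ConfigError(Exception):
    """Raised when the requested config must not be used or cannot be found."""


def resolve_config(directory, name, explicitly_requested=False):
    """Return the config path, or raise ConfigError so the caller can report it."""
    # Rule: use a config with "gold" in the filename unless told otherwise.
    if "gold" not in name and not explicitly_requested:
        raise ConfigError(f"{name} is not a gold config and was not explicitly requested")
    path = Path(directory) / name
    # Rule: stop and report a missing file; never fall back to another config.
    if not path.is_file():
        raise ConfigError(f"config file not found: {path}")
    return path
```

The function deliberately has no fallback branch, so a missing file always surfaces as an error to report.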

## Available Templates
- `cmc-e2e`
- `cmc-group-consistency`
- `cmc-h2h-diff-platf`
- `cmc-h2h-same-platf`
- `cmc-migrateops`
- `cmc-migrateops-compute-migration`
- `cmc-reboot`
- `cmc-systemOS`

## Command Pattern
```bash
python3 cmc-templates.py --template <template> --config_file_path ./<config-file> [template options...]; \
python3 ./run-sorry-cypress.py --config_file <config-file> --build_name <hyphenated-description-no-spaces> [--categorize]
```
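
As an illustration of how the pattern above could be assembled for the approval step, the sketch below builds both command strings from their parts. It only uses flags that appear in this guide; the function name and its parameters are hypothetical, and it produces text for review rather than executing anything.

```python
import shlex

# Template names copied from the "Available Templates" section of this guide.
TEMPLATES = {
    "cmc-e2e", "cmc-group-consistency", "cmc-h2h-diff-platf", "cmc-h2h-same-platf",
    "cmc-migrateops", "cmc-migrateops-compute-migration", "cmc-reboot", "cmc-systemOS",
}


def build_commands(template, config_file, build_name, template_options=(), categorize=False):
    """Return the two planned command strings, in execution order."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template}")
    prep = ["python3", "cmc-templates.py", "--template", template,
            "--config_file_path", f"./{config_file}", *template_options]
    run = ["python3", "./run-sorry-cypress.py", "--config_file", config_file,
           "--build_name", build_name]
    if categorize:
        run.append("--categorize")
    # shlex.join quotes any argument that would otherwise be split by the shell.
    return [shlex.join(prep), shlex.join(run)]
```

Only options the operator asked for should be passed in `template_options`; the sketch adds nothing on its own.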

## Examples Reference
- Commonly used command examples: `atvm-automation-examples.md`
- Keep this guide focused on run-control rules and workflow constraints.
- Use examples as reference material only, not as default intent for new operator requests.
- Keep `atvm-automation-examples.md` limited to reusable example commands; keep workflow rules, defaults, blacklist policy, and reporting rules in this guide or `atvm-automation-runs.md`.

## Example Option Patterns (Guide-Only)
- Distro-scoped VM selection:
  - `--containsVm redhat`
  - `--containsVm redhat9`
- Explicit VM selection:
  - `--specify_vms <vm1> <vm2> ...`
- Compute migrateops platform:
  - `--vm_platforms vmware|ovirt|openshift|proxmox`

## Blacklisted Machines
Always exclude these machines from ATVM automation runs by adding them to `--exclude_partial_match`.

Permanently blacklisted because CMC cannot compile:
- `atvm6-centos6.0`
- `atvm41-redhat6.0`
- `atvm73-oracle6.0`

Temporarily blacklisted because the run crashes when creating a migration session:
- `atvm144-suse15.0`

Temporarily blacklisted while support requests are waiting:
- `atvm113-debian9.0.0`
- `atvm115-debian9.1.0`
- `atvm116-debian9.2.0`
- `atvm156-debian9.3.0`

Temporarily blacklisted until re-created:
- `atvm157-debian13.0.0`

Preferred exclude list:
- `--exclude_partial_match atvm6-centos6.0 atvm41-redhat6.0 atvm73-oracle6.0 atvm144-suse15.0 atvm113-debian9.0.0 atvm115-debian9.1.0 atvm116-debian9.2.0 atvm156-debian9.3.0 atvm157-debian13.0.0`
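
One way to keep the preferred exclude list in sync with the grouped blacklist is to derive it from a single structure. The sketch below does that; the dictionary keys paraphrase the group descriptions above, and the names `BLACKLIST` and `exclude_argument` are hypothetical.

```python
# Machines grouped by blacklist reason, in the order listed in this guide.
BLACKLIST = {
    "CMC cannot compile": ["atvm6-centos6.0", "atvm41-redhat6.0", "atvm73-oracle6.0"],
    "run crashes when creating a migration session": ["atvm144-suse15.0"],
    "support requests are waiting": [
        "atvm113-debian9.0.0", "atvm115-debian9.1.0",
        "atvm116-debian9.2.0", "atvm156-debian9.3.0",
    ],
    "until re-created": ["atvm157-debian13.0.0"],
}


def exclude_argument(blacklist=BLACKLIST):
    """Flatten every group into one --exclude_partial_match argument string."""
    machines = [name for group in blacklist.values() for name in group]
    return "--exclude_partial_match " + " ".join(machines)
```

When a machine is added to or removed from a group, the generated argument changes with it.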

## Running-Automation Check (Mandatory)
Before any new automation request:
1. SSH to `root@192.168.3.190`.
2. Check for active automation processes (for example `run-sorry-cypress.py`, `cmc-templates.py`, and related Cypress runners).
3. Report:
   - `Running` with process details, or
   - `Not running`.
4. If `Running`, ask the operator whether to terminate.
5. If termination is approved, terminate matching process(es), confirm termination, then proceed to planned-command approval.
6. If termination is not approved, do not start a new run.
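
Steps 2 and 3 above can be sketched as a filter over process-list text. The example assumes the text was captured on the automation VM with something like `ps -eo pid,args`. Matching any command line that contains `cypress` is an assumption about what counts as a "related Cypress runner", and the function names are hypothetical.

```python
# Substrings that mark an automation process; "cypress" is a broad, assumed
# catch-all for related runners and may need narrowing in practice.
MARKERS = ("run-sorry-cypress.py", "cmc-templates.py", "cypress")


def find_automation_processes(ps_output):
    """Return (pid, command line) pairs for lines that look like automation."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header row and any line that does not start with a PID.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        if any(marker in args.lower() for marker in MARKERS):
            matches.append((pid, args))
    return matches


def status_line(matches):
    """Format the report required by step 3."""
    if not matches:
        return "Not running"
    details = "; ".join(f"pid {pid}: {args}" for pid, args in matches)
    return f"Running ({details})"
```

The sketch only reports; terminating anything still requires the operator's explicit approval.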

## Approval Workflow (Mandatory)
1. Build exact command(s) for the request.
2. Present them verbatim as planned commands.
3. Wait for explicit approval.
4. Run only approved command(s), no extra options.
5. If monitoring was not requested, report immediate success/failure for each command.
6. If monitoring was requested, keep monitoring until completion and report final outcome.
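
The ordering of the running-automation check and this approval workflow can be summarized as a small decision function. This is an illustrative sketch only; the function name, its parameters, and the returned wording are hypothetical.

```python
def next_action(running, terminate_approved=None, run_approved=False):
    """Return the next permitted step given the current state.

    terminate_approved is None until the operator has answered the
    termination question.
    """
    if running:
        if terminate_approved is None:
            return "ask operator whether to terminate"
        if not terminate_approved:
            return "do not start a new run"
        return "terminate, confirm termination, then present planned commands"
    if not run_approved:
        return "present planned commands and wait for approval"
    return "run approved commands only"
```

Termination approval and execution approval are separate inputs, so approving one never implies the other.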

## Requested Test Style
When asked for one VM or a VM set:
- choose the requested template/options,
- choose the correct config file for the intended Gold Disk,
- default to a config filename containing `gold` unless the operator explicitly says otherwise,
- use a descriptive `--build_name` without Gold Disk IDs.

## Update Rule
- After each run, update this guide only for workflow/rule/default changes.
- Update `atvm-automation-examples.md` for reusable command/option examples.
- Add run-specific learnings only to `atvm-automation-runs.md` when the run produced new information.

## Monitoring Policy
- Monitor only when the operator explicitly asks to monitor.
- If monitoring was not requested, run commands and report execution success/failure and any errors.
- If monitoring was requested, do not terminate processes automatically; only terminate if the operator explicitly instructs termination.

## Status Reporting Format
When the operator asks for the status of an ATVM automation run, report in this order:
1. Heading/title using the run `build_name`.
2. Completed machines with machine name first and status second for each machine.
3. Notes.
4. Skipped machines with reason.
5. Remaining machines still to run.
6. Summary counts for finished, passed, failed, and skipped machines.
7. Timing details:
   - start time
   - end time if complete
   - total run time if complete, or elapsed run time if still running
   - quickest completed test runtime
   - longest completed test runtime
   - average completed test runtime
8. Estimated completion time.
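
The timing details and completion estimate in items 7 and 8 reduce to simple arithmetic once per-machine runtimes are known. The sketch below assumes those runtimes have already been extracted from the run log as seconds, oldest first. The three-machine window used for "recent" runtime is an assumption, as are the function and key names.

```python
from datetime import datetime, timedelta
from statistics import mean

RECENT_WINDOW = 3  # assumption: how many of the latest runtimes count as "recent"


def timing_summary(durations, remaining, now):
    """Summarize completed runtimes (seconds) and estimate whole-run completion."""
    if not durations:
        return None  # nothing completed yet, so no estimate can be derived
    recent = durations[-RECENT_WINDOW:]
    # The estimate covers every remaining machine, not just the current one.
    eta = now + timedelta(seconds=mean(recent) * remaining)
    return {
        "quickest": min(durations),
        "longest": max(durations),
        "average": mean(durations),
        "estimated_completion": eta,
    }
```

The estimate ignores time already spent on a machine that is currently running, so it tends to be slightly late rather than early.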

Status-report expectations:
- Use the same display layout for every ATVM automation status response regardless of test type (`e2e`, `systemOS`, `reboot`, `migrateops`, and others).
- Use the live automation VM state when available.
- Derive the heading/title from the run `build_name` when available.
- Format every machine entry as `machine-name - STATUS`.
- Put each machine on its own line; never combine multiple machines into one paragraph or comma-separated line.
- Use a separate `Notes` section for failure reasons, anomalies, or operator-relevant context rather than cramming those details into the completed-machine list.
- For categorized runs, reconstruct the whole run across all category batches; do not treat the current live category batch as the full run scope.
- Derive completed-machine status from completed spec results already written during the same run.
- Parse all same-run `test-result-*.xml` files, not only machine-named `test-result-atvm*.xml` files.
- When XML filenames are hash-named, extract the machine name from XML contents such as `testsuite file=`, `testsuite name=`, or `testcase name=`.
- Ignore `check-xml-files.ts` XML outputs when counting machine completion because they are bookkeeping steps, not machine runs.
- When multiple same-run XML files exist for one machine, use the most recently written XML for that machine.
- Include the run start time in every status response when it can be derived from the run log.
- If the run is complete, include the end time and total run time.
- If the run is still active, include the elapsed run time so far.
- Include quickest completed test runtime, longest completed test runtime, and average completed test runtime under timing details when they can be derived from the run log.
- Show blacklisted machines under skipped machines even if they are part of the broader machine family requested by the operator.
- For skipped machines, include the reason category:
  - `BLACKLISTED: CMC INSTALL - CAN'T COMPILE`
  - `BLACKLISTED: SUPPORT REQUEST - WAITING`
  - `BLACKLISTED: RE-CREATE NEEDED`
- If a machine is currently in progress, show it under remaining machines as `RUNNING`.
- If a machine has not started yet, show it under remaining machines as `NOT STARTED`.
- If no failures are present in completed spec results, report those completed machines as `PASS`.
- If a completed spec result shows a failure, report that machine as `FAIL` in the completed list and append a longer same-line failure description when the extra detail is useful to the operator.
- Use `Notes` for extra context beyond the machine-specific same-line failure description.
- Base the completion estimate on the full remaining machine count and recent per-machine runtime visible in the run log.
- Make the estimate explicitly refer to completion of the entire remaining run, not only the current machine/spec.
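
The XML-handling expectations above can be sketched as follows. The example assumes JUnit-style result files in which a failed test contains a `<failure>` or `<error>` element, and it assumes machine names follow the `atvm<number>-<distro><version>` shape seen in the blacklist. The function names are hypothetical, and reading the files and their modification times from disk is left to the caller.

```python
import re
import xml.etree.ElementTree as ET

# Assumed machine-name shape, e.g. atvm113-debian9.0.0.
MACHINE_RE = re.compile(r"atvm\d+-[a-z]+\d+(?:\.\d+)*")


def machine_from_xml(root):
    """Extract the machine name from testsuite/testcase attributes, if any."""
    values = [
        elem.get(attr, "")
        for elem in root.iter()
        if elem.tag in ("testsuite", "testcase")
        for attr in ("file", "name")
    ]
    # Bookkeeping output from check-xml-files.ts is not a machine run.
    if any("check-xml-files.ts" in value for value in values):
        return None
    for value in values:
        match = MACHINE_RE.search(value)
        if match:
            return match.group(0)
    return None


def latest_status_per_machine(results):
    """results: iterable of (modification_time, xml_text) from the same run."""
    latest = {}
    # Process oldest first so the most recently written XML wins per machine.
    for _, xml_text in sorted(results, key=lambda item: item[0]):
        root = ET.fromstring(xml_text)
        machine = machine_from_xml(root)
        if machine is None:
            continue
        failed = root.find(".//failure") is not None or root.find(".//error") is not None
        latest[machine] = "FAIL" if failed else "PASS"
    return latest


def completed_lines(latest):
    """Format one `machine-name - STATUS` line per machine."""
    return [f"{machine} - {status}" for machine, status in sorted(latest.items())]
```

Because the machine name is read from the XML contents, hash-named result files are handled the same way as machine-named ones.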