# Run ATVM Automation Guide

This file is guide-only documentation for operating ATVM CMC automation. Do not put specific run examples here. For reusable command examples and common option combinations, use `examples.md`. Treat `examples.md` as reference-only. Do not assume the operator wants the extra options shown in examples unless they explicitly request them.

## Purpose

Run ATVM CMC automation tests on the designated automation VM without unintended system or file changes.

## ATVM Cypress Automation Controller Client

- Hostname: `atvm-cypres-vm-1`
- IP: `192.168.3.190`
- Credentials: source `/home/aw/code/cds/.env.credentials.local` and use `ATVM_CONTROLLER_USER` plus `ATVM_CONTROLLER_PASSWORD`

## ATVM Target Host Default

- Treat `192.168.3.191` as the default ATVM target host reference.
- For SSH to `192.168.3.191`, ignore host key mismatch by default with `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null`.
- For Linux SSH access to `192.168.3.191`, source `/home/aw/code/cds/.env.credentials.local` and use `ATVM_TARGET_USER` plus `ATVM_TARGET_PASSWORD` unless the operator explicitly overrides them.
- `ATVM_LINUX_TARGET_HOST`, `ATVM_LINUX_TARGET_USER`, and `ATVM_LINUX_TARGET_PASSWORD` mirror the Linux default values when an OS-specific reference is clearer.
- For Windows guest access to `192.168.3.191`, source `/home/aw/code/cds/.env.credentials.local` and use `ATVM_WINDOWS_TARGET_USER` plus `ATVM_WINDOWS_TARGET_PASSWORD` unless the operator explicitly overrides them.

## Operating Constraints

- Run only scripts/commands explicitly requested.
- Do not make manual system configuration changes on the client.
- Do not edit client files unless explicitly requested.

## Operator Preferences

- Do not include Gold Disk identifiers in `--build_name`.
- `--build_name` must not contain spaces; use `-` between words.
- For multiple VMs in same distro, use distro-scoped filtering (`--containsVm`) instead of long explicit VM lists.
- Always include `--ignore_force_shutdown` on `cmc-templates.py` commands unless the operator explicitly asks not to.
- Always include `--test_partition` on `cmc-templates.py` commands unless the operator explicitly asks not to.
- Default plugin-bearing templates to `--use_specified_plugin iscsi` unless the operator explicitly requests a different plugin.
- Do not add plugin or integration-type arguments to `cmc-systemOS`; that template should be planned without `--use_specified_plugin`, without `--integration_type`, and without watcher integration/plugin metadata.
- For `cmc-migrateops-compute-migration`, default to `--set_static_ip_dest` unless the operator explicitly says otherwise.
- For `cmc-migrateops-compute-migration` to VMware, default to `--vm_platforms vmware` unless the operator explicitly says otherwise.
- For ATVM automation runs that involve Windows guests, default `run-sorry-cypress.py` to `--hang_retries 0` unless the operator explicitly says otherwise.
- For `cmc-reboot`, treat `--use_specified_plugin both` as an exception case that requires an extra confirmation.
- When `cmc-reboot` is planned with `--use_specified_plugin both`, warn that FC+iSCSI together may hit a "chicken before the egg" timing problem where iSCSI disks are not attached before mTDI / CMC services start.
- For `cmc-reboot`, prefer `--use_specified_plugin fc` or `--use_specified_plugin iscsi` unless the operator explicitly reconfirms that `both` is really intended after seeing that warning.
- Before preparing a new run, always check whether automation is already running.
- Treat a prior status check as stale once control returns to the operator or a new ATVM request arrives; perform a fresh live controller check at request time instead of relying on the immediately previous result.
- Always report whether automation is currently running.
- If running, ask whether to terminate; terminate only with explicit approval.
- After termination approval, terminate first, then execute the new run command set.
- By default, execute `cmc-templates.py` and `run-sorry-cypress.py` without a pre-run approval gate.
- If the operator explicitly asks to review planned commands first, show them before execution.
- If the operator changes any part of the request before execution, rebuild commands and execute the revised command set.
- Default to watcher-backed execution for every run unless the operator explicitly asks to run without watcher.
- When `--categorize` is used with watcher enabled, treat the watcher as a sequential grouped-run watcher:
  - it must post one final Mattermost status per completed categorized group/sub-run
  - it must stay active between grouped sub-runs while the parent categorized request is still running
  - it must not stop after the first grouped run simply because one grouped run completed
  - if the child build id label does not match the actual host/spec being executed, report the grouped run using the inferred host-based group instead of the raw child build id label
  - it must not wait and replace those with one single parent-only post
- After execution, report immediate success/failure only.
- After execution, include the exact executed `cmc-templates.py` and `run-sorry-cypress.py` commands in the response.
- Do not include expected, harmless `systemctl reset-failed ... unit not loaded` output in routine run-start confirmations.
- Mention `reset-failed` output only when it prevents watcher startup or becomes relevant to debugging.
- Do not actively monitor completion unless explicitly requested.
- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
- Report command errors immediately.
- `sshpass` may be used where password-based SSH automation is required.
- Treat runner hang-kill events (`Sending SIGKILL ... due to no change` / `Max hang retries reached`) as explicit `FAILED` outcomes, not `RUNNING` or ambiguous termination.
- For manual `run-sorry-cypress.py` execution, treat `ATVM_HANG_FAIL ...` log markers and `/tmp/atvm-runner-state-.json` terminal state files as the source of truth for hang-failure terminal status.

## Core Scripts

- Template prep: `/root/cdc-e2e-cyp-12.17.4/cmc-templates.py`
- Test execution: `./run-sorry-cypress.py`
- Detailed host-level test artifacts: `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`

## Detailed Test Artifacts

- Use `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter` on the automation controller for detailed per-host test evidence.
- Reporter subdirectories of interest:
  - `logs/` - per-host text and JSON logs for the executed tests
  - `xml/` - machine result XML files and the final `check-xml-files.ts` bookkeeping output
  - `mochawesome/` - per-run HTML reports
- When a machine fails, use the matching `logs/` entry first to capture the detailed failure context for that host.
- Apply the failed-host detail recovery path to every ATVM template type, not just reboot.
- For any failed host, recover detail in this order when available:
  - consolidated run log
  - matching `mochawesome` HTML
  - structured reporter artifacts such as per-host JSON or XML
  - text reporter artifacts
- When reconstructing historical status, prefer `cmcReporter` artifacts over less-specific runner output because they preserve per-host results after the live run has ended.
- Do not treat the existence of a per-host reporter artifact by itself as proof that the host passed.
- For categorized grouped recovery, prefer the matching per-host reporter JSON or mochawesome result and carry through the real `failures`, `pending`, and failure message instead of assuming `PASS completed`.
- If grouped XML only contains `check-xml-files.ts`, cross-check the grouped result against the per-host reporter artifacts before posting or repeating status for that grouped sub-run.
- Do not report a categorized grouped sub-run as `PASS` from watcher `host_results`, grouped XML, or a lone `check-xml-files.ts` result by itself.
- Before reporting a categorized grouped sub-run as `PASS`, confirm that the matching child batch also passed in the live launch log or the final `Cloud Run Finished` summary for that child run.
- Treat saved watcher state under `/var/lib/atvm-run-watcher//state.json` as cached status only.
- For completed-run verification, confirm in this order:
  - launch log under `/tmp/.launch.log`
  - matching `cmcReporter` artifacts
  - `Cloud Run Finished` summary and Currents URL
  - saved watcher state only as a comparison layer
- If saved watcher state disagrees with the launch log or with a replay of the exact artifacts through the current watcher code, treat the saved state as stale and do not use it as the reported result.
- Never confirm a completed run from `state.json` alone.

Typical sequence:

1. Build the exact `cmc-templates.py` and `run-sorry-cypress.py` commands for the request.
2. Run `cmc-templates.py` with the requested options.
3. Wait for `cmc-templates.py` to fully finish and confirm success.
4. Verify the generated `.ts` files and the config `specPattern` include every requested VM before starting the runner.
5. By default, use watcher-backed execution unless the operator explicitly asks not to.
6. For watcher-backed runs, make sure the controller's deployed watcher code is the intended version before relying on its posts.
7. For watcher-backed runs, build the watcher-start command so it automatically includes the exact `cmc-templates.py` command via `--template-command` and the exact `run-sorry-cypress.py` command via `--runner-command`.
8. For watcher-backed runs, prefer the controller-local `atvm-runner@...` systemd service instead of detached SSH background launch patterns for `run-sorry-cypress.py`.
9. For watcher-backed runs, start the watcher before launching the runner service.
10. Start the runner with the matching config and build name.
11. Report immediate start success/failure and include the exact executed template and runner commands.

Completed-run verification sequence:

1. Read the launch log for the build.
2. Inspect the matching reporter artifacts for the relevant host(s).
3. Use the `Cloud Run Finished` summary and Currents URL as the final parent-run check when present.
4. Compare that result against saved watcher state.
5. If there is any disagreement, replay the exact artifacts through the current watcher code in an isolated temp state directory before confirming the result.
6. For categorized runs, do not let a `check-xml-files.ts` child result override a failing child batch shown in the launch log or `Cloud Run Finished` summary.

## Config File / Gold Disk Mapping

- `cypress.atvm-config-gold.ts` -> Gold Disk 1
- `cypress.atvm-config-gold-2.ts` -> Gold Disk 2
- Additional numbered config variants map to corresponding Gold Disks.
- Do not default to `cypress.atvm-config.ts`.
- Unless the operator explicitly requests another config, use a config file with `gold` in the filename.
- If the operator-specified config file is missing, stop immediately and report the missing file.
- Do not search for substitute ATVM config files and do not switch to another config unless the operator explicitly instructs it.
- Treat `AutomatedTest-VMBootImg-Gold` as `gold` and `AutomatedTest-VMBootImg-Gold-2` as `gold-2`.
- Use live vCenter inventory as the source of truth for current VM membership on those datastores.
- Query vCenter datastore membership at request time when selecting `gold` vs `gold-2`; do not maintain or rely on a repo-side live reference file for this decision.
- When the operator asks to inventory or show the contents of `AutomatedTest-VMBootImg-Gold` and `AutomatedTest-VMBootImg-Gold-2`, return hostname-only VM lists unless the operator explicitly asks for more detail.
- When the operator provides an explicit VM list, check vCenter placement for every requested VM before choosing the config file.
- Before presenting any ATVM run commands for an explicit VM-list request, tell the operator that the next step is a live vCenter placement check for the requested VMs and that the result will determine whether the run must use `gold` or `gold-2`.
- For vCenter inspection and placement checks, prefer `govc` and raw vCenter REST calls when they are available before reaching for alternate wrappers.
- For `govc`-based placement checks, use `govc vm.info -json ` and parse the lowercase JSON keys such as `virtualMachines` and `datastore`.
- Resolve each returned datastore managed-object reference to a datastore name with `govc object.collect -s name` before deciding between `gold` and `gold-2`.
- Ignore non-boot helper datastores such as install ISO attachments when applying the `gold` vs `gold-2` rule; base the family decision on the ATVM boot datastore membership.
- If every requested VM is on `AutomatedTest-VMBootImg-Gold`, plan the run with the `gold` config.
- If every requested VM is on `AutomatedTest-VMBootImg-Gold-2`, plan the run with the `gold-2` config.
- If the requested VM set spans both `AutomatedTest-VMBootImg-Gold` and `AutomatedTest-VMBootImg-Gold-2`, stop immediately and do not prepare or run the test.
- For a mixed-datastore request, report it as a discrepancy, list which requested VMs are on `AutomatedTest-VMBootImg-Gold` and which are on `AutomatedTest-VMBootImg-Gold-2`, tell the operator they need to correct the list, and ask whether they want the full VM inventories for both datastores so they can adjust the request.
- Do not run an ATVM test against a mixed set of VMs from both `AutomatedTest-VMBootImg-Gold` and `AutomatedTest-VMBootImg-Gold-2`.
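
The `gold` vs `gold-2` decision rule in this section can be sketched as a small pure function. This is an illustration of the rule only, not part of the ATVM tooling and not a run example. The function name `choose_config`, the shape of its `placement` input, and the `unplaced` handling are assumptions. It expects datastore names that were already resolved from live vCenter data at request time.

```python
from typing import Dict, List, Optional

# Datastore names and config filenames are taken from this guide.
FAMILY_BY_DATASTORE = {
    "AutomatedTest-VMBootImg-Gold": "gold",
    "AutomatedTest-VMBootImg-Gold-2": "gold-2",
}
CONFIG_BY_FAMILY = {
    "gold": "cypress.atvm-config-gold.ts",
    "gold-2": "cypress.atvm-config-gold-2.ts",
}


def choose_config(placement: Dict[str, List[str]]) -> Dict[str, object]:
    """Hypothetical helper: map VM -> datastore names to a config decision.

    `placement` is assumed to hold datastore names already resolved from
    live vCenter inventory for every requested VM.
    """
    by_family: Dict[str, List[str]] = {"gold": [], "gold-2": []}
    unplaced: List[str] = []
    for vm in sorted(placement):
        # Helper datastores (for example install ISO attachments) are ignored.
        families = {
            FAMILY_BY_DATASTORE[ds]
            for ds in placement[vm]
            if ds in FAMILY_BY_DATASTORE
        }
        if not families:
            # Assumption: a VM on neither boot datastore is a stop condition.
            unplaced.append(vm)
        for family in sorted(families):
            by_family[family].append(vm)

    used = [family for family, vms in by_family.items() if vms]
    config: Optional[str] = None
    if len(used) == 1 and not unplaced:
        config = CONFIG_BY_FAMILY[used[0]]
    # config stays None for mixed, unplaced, or empty requests: stop and report.
    return {"config": config, "by_family": by_family, "unplaced": unplaced}
```

When `config` is `None`, the `by_family` lists carry the per-datastore VM breakdown needed for the discrepancy report described above.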

## Available Templates

- `cmc-e2e`
- `cmc-group-consistency`
- `cmc-h2h-diff-platf`
- `cmc-h2h-same-platf`
- `cmc-migrateops`
- `cmc-migrateops-compute-migration`
- `cmc-reboot`
- `cmc-systemOS`

## Command Pattern

```bash
python3 cmc-templates.py --template