cds-ai/atvm/watcher-service/README.md
anthony.wen d60b8b9b18 Update ATVM watcher for categorized sub-run posting
- update the watcher design and automation guide to treat --categorize as sequential ATVM sub-runs rather than one parent run with internal phases
- document that categorized runs should send one Mattermost status per completed grouped sub-run instead of one parent-only final post
- add a --categorize option to the watcher start helper so categorized mode is explicit in watcher startup
- update the watcher implementation to track categorized sub-runs separately, write per-subrun state, and post each completed grouped run once
2026-03-26 11:00:39 -04:00

ATVM Watcher Service

This folder contains a per-run ATVM watcher service package. Review it locally first; install it on the ATVM Cypress controller only when explicitly requested.

Purpose

Watch an ATVM automation request until it reaches a terminal state, then:

  • for non-categorized runs:
    • post one final status to Mattermost if the run state is COMPLETED or FAILED
  • for categorized runs:
    • detect each sequential categorized sub-run
    • post one final status per completed categorized sub-run if that sub-run's state is COMPLETED or FAILED
  • verify each Mattermost post succeeded
  • write durable watcher state
  • exit cleanly so the service stops

The watcher does not run indefinitely. It is designed for one run per service instance.
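
The "verify each Mattermost post succeeded" step above could be sketched roughly as follows. This is a minimal illustration, not the code in atvm_run_watcher.py: the function names are hypothetical, and it assumes the webhook is a Mattermost incoming webhook that answers a successful post with HTTP 200 and the plain-text body ok.

```python
import json
import urllib.request
from typing import Optional


def build_payload(text: str, channel: Optional[str] = None) -> bytes:
    """Encode the JSON body for an incoming-webhook post."""
    body = {"text": text}
    if channel:
        body["channel"] = channel
    return json.dumps(body).encode("utf-8")


def response_is_ok(status: int, body: str) -> bool:
    """Assumption: success means HTTP 200 with the body 'ok'."""
    return status == 200 and body.strip().lower() == "ok"


def post_status(webhook_url: str, text: str,
                channel: Optional[str] = None, timeout: float = 30.0) -> bool:
    """Send one status and report whether the post was verified."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(text, channel),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return response_is_ok(response.status, body)
    except OSError:
        # URLError and HTTPError are OSError subclasses; treat them as unverified.
        return False
```

Returning a boolean rather than raising lets the caller decide whether to mark the run as posted or retry later.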

Files

  • atvm_run_watcher.py
    • main watcher implementation
  • atvm-run-watcher@.service
    • systemd template unit for one watcher instance per build name
  • start-atvm-run-watcher.sh
    • helper to write per-run environment data and start a watcher instance
  • cancel-atvm-run-watcher.sh
    • helper to mark a run cancelled and stop the watcher instance

Intended Controller Paths

These are the default install targets assumed by the included unit file:

  • service package root: /opt/atvm-watcher-service
  • watcher state root: /var/lib/atvm-run-watcher
  • controller ATVM automation root: /root/cdc-e2e-cyp-12.17.4
  • watcher environment file: /etc/atvm-run-watcher.env

Use /opt/atvm-watcher-service as the controller install root for future installs and reinstalls. Do not treat /root/atvm-watcher-service as the preferred long-term install location.

Per-Run Behavior

Each watcher instance is tied to one requested build name.

Typical workflow:

  1. Launch the ATVM run.
  2. Start the watcher for that run.
  3. The watcher polls the run log, process state, and cmcReporter artifacts.
  4. For non-categorized runs, when the run reaches a terminal state:
    • COMPLETED or FAILED
      • build the final ATVM status
      • send the status to Mattermost
      • verify Mattermost returned ok
      • mark the run as posted
      • exit
    • CANCELLED, TERMINATED, HUNG, or UNKNOWN
      • do not post
      • mark the final state
      • exit
  5. For categorized runs:
    • detect each grouped sub-run in sequence from the parent run log
    • wait for that grouped sub-run to finish
    • send one Mattermost post for that grouped sub-run if it reached COMPLETED or FAILED
    • continue to the next grouped sub-run
    • exit after the parent request reaches a terminal state
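
The terminal-state handling in the workflow above can be summarized in a small sketch. The state names come from this README; the function names, the action labels, and the non-terminal RUNNING value used in the usage note are hypothetical and only illustrate the branching.

```python
from typing import List

# States that lead to a Mattermost post.
POSTING_STATES = frozenset({"COMPLETED", "FAILED"})
# Terminal states that are recorded but never posted.
SILENT_STATES = frozenset({"CANCELLED", "TERMINATED", "HUNG", "UNKNOWN"})
TERMINAL_STATES = POSTING_STATES | SILENT_STATES


def is_terminal(state: str) -> bool:
    return state in TERMINAL_STATES


def should_post(state: str) -> bool:
    return state in POSTING_STATES


def plan_final_actions(state: str) -> List[str]:
    """Return the ordered steps the watcher would take for a given state."""
    if not is_terminal(state):
        return ["keep-polling"]
    if should_post(state):
        return ["build-status", "send-to-mattermost", "verify-ok",
                "mark-posted", "exit"]
    return ["mark-final-state", "exit"]
```

For example, plan_final_actions("HUNG") yields only the record-and-exit steps, while plan_final_actions("RUNNING") keeps polling. In categorized mode the same decision would apply to each grouped sub-run in turn.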

Required Environment

The service expects these values from the local credentials file to be available on the controller through the service environment:

  • MATTERMOST_ATVM_WEBHOOK
  • MATTERMOST_ATVM_CHANNEL

Optional metadata for better status formatting:

  • ATVM_WATCHER_TEMPLATE
  • ATVM_WATCHER_CONFIG_FAMILY
  • ATVM_WATCHER_MIGRATION_STYLE
  • ATVM_WATCHER_INTEGRATION_PLUGIN
  • ATVM_WATCHER_SCOPE_DESCRIPTION
  • ATVM_WATCHER_CATEGORIZED
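
One way a watcher could read these variables is sketched below. The variable names are the ones listed above; the function name and the fail-fast behavior are assumptions, and the sketch deliberately does not interpret the value of ATVM_WATCHER_CATEGORIZED because this README does not define its encoding.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("MATTERMOST_ATVM_WEBHOOK", "MATTERMOST_ATVM_CHANNEL")
OPTIONAL_VARS = (
    "ATVM_WATCHER_TEMPLATE",
    "ATVM_WATCHER_CONFIG_FAMILY",
    "ATVM_WATCHER_MIGRATION_STYLE",
    "ATVM_WATCHER_INTEGRATION_PLUGIN",
    "ATVM_WATCHER_SCOPE_DESCRIPTION",
    "ATVM_WATCHER_CATEGORIZED",
)


def load_watcher_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect watcher settings, failing early if a required value is missing."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional metadata is copied through as raw strings when present.
    for name in OPTIONAL_VARS:
        if env.get(name):
            settings[name] = env[name]
    return settings
```

Failing at startup keeps a misconfigured instance from watching a whole run and only then discovering it cannot post.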

Start Example

This helper writes a per-run environment file and starts the matching instance:

./start-atvm-run-watcher.sh \
  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
  --template cmc-e2e \
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
  --categorize \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"

That results in:

  • state dir:
    • /var/lib/atvm-run-watcher/e2e-redhat9.6-ubuntu24.04-w2k25-fc
  • service instance:
    • atvm-run-watcher@e2e-redhat9.6-ubuntu24.04-w2k25-fc.service
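
The mapping from build name to state directory and service instance shown above can be expressed as a short sketch. It assumes the build name is used verbatim as the systemd instance name, as in the example; the function names are hypothetical and the start helper may apply extra validation or escaping.

```python
from pathlib import PurePosixPath

STATE_ROOT = PurePosixPath("/var/lib/atvm-run-watcher")
UNIT_TEMPLATE = "atvm-run-watcher@{instance}.service"


def _check_build_name(build_name: str) -> str:
    # Reject names that would escape the state root or produce an empty instance.
    if not build_name or "/" in build_name:
        raise ValueError(f"unsupported build name: {build_name!r}")
    return build_name


def state_dir(build_name: str) -> PurePosixPath:
    """Per-run state directory under the watcher state root."""
    return STATE_ROOT / _check_build_name(build_name)


def unit_name(build_name: str) -> str:
    """systemd instance name derived from the template unit."""
    return UNIT_TEMPLATE.format(instance=_check_build_name(build_name))
```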

Cancel Example

./cancel-atvm-run-watcher.sh --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc

This writes a cancellation marker, sets the run state in state.json to CANCELLED, and stops the watcher instance. The watcher will not send Mattermost results for that run.
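
The state changes made on cancel might look like the sketch below. Only state.json and the CANCELLED value come from this README; the marker file name cancel.marker, the "state" key, and the function name are hypothetical, and stopping the systemd instance is left out.

```python
import json
import os
from pathlib import Path


def mark_cancelled(state_dir: Path) -> dict:
    """Write a cancellation marker and record CANCELLED in state.json."""
    state_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical marker name; the real helper may use a different file.
    (state_dir / "cancel.marker").write_text("cancelled\n", encoding="utf-8")

    state_path = state_dir / "state.json"
    state = {}
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    state["state"] = "CANCELLED"  # assumed key name

    # Write to a temporary file first so readers never see a partial state.json.
    tmp_path = state_path.with_name(state_path.name + ".tmp")
    tmp_path.write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    os.replace(tmp_path, state_path)
    return state
```

Writing the marker before updating state.json means a watcher that polls in between still sees the cancellation and skips the Mattermost post.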

Notes

  • The watcher uses the same ATVM status layout documented in atvm/docs/automation/status-template.md.
  • Kernel values are resolved from atvm/inventory/vm-inventory.md.
  • Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
  • In categorized mode, the watcher writes per-subrun state under subruns/ and posts each completed grouped sub-run separately, once.
  • Best-practice controller install path: /opt/atvm-watcher-service.
  • This package is local-only right now. Nothing here is installed on the controller yet.
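
The per-subrun state and post-once behavior noted above could be sketched as follows. The subruns/ directory comes from this README; the one-JSON-file-per-sub-run layout, the "state" and "posted" keys, and the function name are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def post_subrun_once(state_dir: Path, subrun: str, state: str,
                     send: Callable[[str, str], bool]) -> bool:
    """Post a finished sub-run at most once; return True if this call posted."""
    subrun_path = state_dir / "subruns" / f"{subrun}.json"
    record = {}
    if subrun_path.exists():
        record = json.loads(subrun_path.read_text(encoding="utf-8"))
    if record.get("posted"):
        return False  # already posted by an earlier poll or watcher restart

    record["state"] = state
    posted_now = False
    if state in ("COMPLETED", "FAILED"):
        # send() is expected to return True only for a verified post.
        posted_now = send(subrun, state)
    record["posted"] = posted_now

    subrun_path.parent.mkdir(parents=True, exist_ok=True)
    subrun_path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return posted_now
```

Because the posted flag is persisted on disk, a restarted watcher instance would skip sub-runs that were already reported instead of posting them again.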