Update ATVM watcher for categorized sub-run posting

- update the watcher design and automation guide to treat --categorize as sequential ATVM sub-runs rather than one parent run with internal phases - document that categorized runs should send one Mattermost status per completed grouped sub-run instead of one parent-only final post - add a --categorize option to the watcher start helper so categorized mode is explicit in watcher startup - update the watcher implementation to track categorized sub-runs separately, write per-subrun state, and post each completed grouped run once
2026-03-26 11:00:39 -04:00
parent 68cd428733
commit d60b8b9b18
6 changed files with 399 additions and 89 deletions
--- a/atvm/watcher-service/INSTALL.md
+++ b/atvm/watcher-service/INSTALL.md
@@ -8,8 +8,9 @@ This is a deployment plan only. It does not perform the installation.

 Install the local watcher package so the controller can:

- watch one ATVM run per watcher instance
- send final Mattermost status only for `COMPLETED` or `FAILED`
+- watch one requested ATVM run per watcher instance
+- for non-categorized runs, send one final Mattermost status only for `COMPLETED` or `FAILED`
+- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
 - suppress Mattermost posts for `CANCELLED`, `TERMINATED`, `HUNG`, and `UNKNOWN`
 - stop automatically after the watched run reaches a terminal state

@@ -116,7 +117,9 @@ Recommended permissions:
 9. Do a real ATVM run test.
   - launch a real run
   - start the watcher for that build name
+   - if the run uses `--categorize`, also pass `--categorize` to the watcher start helper
   - confirm final Mattermost delivery for a completed run
+   - confirm categorized execution sends one post per completed grouped sub-run

 ## Recommended Validation Commands

@@ -163,6 +166,7 @@ Example:
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
+  --categorize \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
 ```

@@ -184,9 +188,11 @@ The cancel helper should:

 - This is not a daemon.
 - One watcher instance is started per ATVM run.
+- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
 - The watcher exits after the run reaches a terminal state.
 - The watcher writes state under `/var/lib/atvm-run-watcher/<build-name>`.
- The watcher prevents duplicate Mattermost posts by writing a posted marker.
+- The watcher prevents duplicate Mattermost posts by writing posted markers.
+- Categorized sub-run state is written under `/var/lib/atvm-run-watcher/<build-name>/subruns/<subrun-key>/`.

 ## Failure Handling

@@ -200,6 +206,10 @@ Expected terminal behavior:
  - post to Mattermost
  - verify `ok`
  - exit
+- categorized `COMPLETED` / `FAILED`
+  - post once for that grouped sub-run
+  - verify `ok`
+  - continue until the parent request finishes
 - `CANCELLED`
  - write final `CANCELLED` state to `state.json`
  - do not post
--- a/atvm/watcher-service/README.md
+++ b/atvm/watcher-service/README.md
@@ -4,10 +4,14 @@ This folder contains a per-run ATVM watcher service package that is intended to

 ## Purpose

-Watch a single ATVM automation run until it reaches a terminal state, then:
+Watch an ATVM automation request until it reaches a terminal state, then:

- post the final status to Mattermost if the run state is `COMPLETED` or `FAILED`
- verify the Mattermost post succeeded
+- for non-categorized runs:
+  - post one final status to Mattermost if the run state is `COMPLETED` or `FAILED`
+- for categorized runs:
+  - detect each sequential categorized sub-run
+  - post one final status per completed categorized sub-run if that grouped run state is `COMPLETED` or `FAILED`
+- verify each Mattermost post succeeded
 - write durable watcher state
 - exit cleanly so the service stops

@@ -38,14 +42,14 @@ Do not treat `/root/atvm-watcher-service` as the preferred long-term install loc

 ## Per-Run Behavior

-Each watcher instance is tied to one build name.
+Each watcher instance is tied to one requested build name.

 Typical workflow:

 1. Launch the ATVM run.
 2. Start the watcher for that run.
 3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
-4. When the run reaches a terminal state:
+4. For non-categorized runs, when the run reaches a terminal state:
   - `COMPLETED` or `FAILED`
     - build the final ATVM status
     - send the status to Mattermost
@@ -56,6 +60,12 @@ Typical workflow:
     - do not post
     - mark the final state
     - exit
+5. For categorized runs:
+   - detect each grouped sub-run in sequence from the parent run log
+   - wait for that grouped sub-run to finish
+   - send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
+   - continue to the next grouped sub-run
+   - exit after the parent request reaches a terminal state

 ## Required Environment

@@ -71,6 +81,7 @@ Optional metadata for better status formatting:
 - `ATVM_WATCHER_MIGRATION_STYLE`
 - `ATVM_WATCHER_INTEGRATION_PLUGIN`
 - `ATVM_WATCHER_SCOPE_DESCRIPTION`
+- `ATVM_WATCHER_CATEGORIZED`

 ## Start Example

@@ -83,6 +94,7 @@ This helper writes a per-run environment file and starts the matching instance:
  --config-family gold \
  --migration-style "ATVM end-to-end migration validation" \
  --integration-plugin "pure with fc" \
+  --categorize \
  --scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
 ```

@@ -105,5 +117,7 @@ This writes a cancellation marker, updates `state.json` to `CANCELLED`, and stop

 - The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
 - Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
+- Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
+- In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
 - Best-practice controller install path: `/opt/atvm-watcher-service`.
 - This package is local-only right now. Nothing here is installed on the controller yet.
--- a/atvm/watcher-service/atvm_run_watcher.py
+++ b/atvm/watcher-service/atvm_run_watcher.py
@@ -40,6 +40,17 @@ class HostResult:
    timestamp: Optional[datetime] = None


+@dataclass
+class SubRun:
+    key: str
+    display_name: str
+    started_at: datetime
+    expected_hosts: List[str]
+    completed: bool
+    currents_url: Optional[str]
+    notes: List[str]
+
+
 def now_utc() -> datetime:
    return datetime.now(timezone.utc)

@@ -152,6 +163,13 @@ def parse_xml_timestamp(raw: Optional[str]) -> Optional[datetime]:
        return None


+def parse_log_timestamp(raw: str) -> Optional[datetime]:
+    try:
+        return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=timezone.utc)
+    except ValueError:
+        return None
+
+
 def parse_host_xml(xml_path: Path) -> Optional[Tuple[str, HostResult]]:
    try:
        tree = ET.parse(xml_path)
@@ -194,6 +212,7 @@ def collect_host_results(
    expected_hosts: List[str],
    kernels: Dict[str, str],
    run_started_at: datetime,
+    run_ended_at: Optional[datetime] = None,
 ) -> Dict[str, HostResult]:
    xml_dir = reporter_root / "xml"
    results: Dict[str, HostResult] = {}
@@ -203,6 +222,8 @@ def collect_host_results(
        xml_mtime = datetime.fromtimestamp(xml_path.stat().st_mtime, tz=timezone.utc)
        if xml_mtime < run_started_at:
            continue
+        if run_ended_at and xml_mtime >= run_ended_at:
+            continue
        parsed = parse_host_xml(xml_path)
        if not parsed:
            continue
@@ -214,21 +235,46 @@ def collect_host_results(
    return results


-def find_current_running_host(log_text: str, completed_hosts: List[str]) -> Optional[str]:
-    matches = re.findall(r"Running:\s+(?:cypress/cmcRegressionTest/)?(atvm[^/\s]+)\.ts", log_text)
-    for host in reversed(matches):
-        if host not in completed_hosts:
-            return host
-    return None
+def find_check_xml_end(
+    reporter_root: Path,
+    started_at: datetime,
+    ended_at: Optional[datetime] = None,
+) -> Optional[datetime]:
+    xml_dir = reporter_root / "xml"
+    if not xml_dir.exists():
+        return None
+    latest: Optional[datetime] = None
+    for xml_path in sorted(xml_dir.glob("test-result-*.xml"), key=lambda p: p.stat().st_mtime):
+        xml_mtime = datetime.fromtimestamp(xml_path.stat().st_mtime, tz=timezone.utc)
+        if xml_mtime < started_at:
+            continue
+        if ended_at and xml_mtime >= ended_at:
+            continue
+        text = read_text(xml_path)
+        if "check-xml-files.ts" not in text:
+            continue
+        try:
+            tree = ET.parse(xml_path)
+            root = tree.getroot()
+            suite = root.find("testsuite")
+            if suite is None:
+                continue
+            ts = parse_xml_timestamp(suite.attrib.get("timestamp"))
+            if ts:
+                latest = ts
+        except ET.ParseError:
+            continue
+    return latest


-def infer_metadata() -> Dict[str, str]:
+def infer_metadata() -> Dict[str, object]:
    return {
        "template": os.environ.get("ATVM_WATCHER_TEMPLATE", "unknown"),
        "config_family": os.environ.get("ATVM_WATCHER_CONFIG_FAMILY", "unknown"),
        "migration_style": os.environ.get("ATVM_WATCHER_MIGRATION_STYLE", "ATVM automation validation"),
        "integration_plugin": os.environ.get("ATVM_WATCHER_INTEGRATION_PLUGIN", "unknown"),
        "scope_description": os.environ.get("ATVM_WATCHER_SCOPE_DESCRIPTION", "requested ATVM run scope"),
+        "categorized": os.environ.get("ATVM_WATCHER_CATEGORIZED", "false").lower() == "true",
    }


@@ -253,7 +299,7 @@ def format_timestamp_local(ts: Optional[datetime]) -> str:

 def build_status_markdown(
    build_name: str,
-    metadata: Dict[str, str],
+    metadata: Dict[str, object],
    host_results: Dict[str, HostResult],
    run_state: str,
    currents_url: Optional[str],
@@ -348,80 +394,225 @@ def post_to_mattermost(text: str) -> str:
        return response.read().decode().strip()


+def sanitize_key(raw: str) -> str:
+    return re.sub(r"[^A-Za-z0-9_.-]+", "-", raw).strip("-") or "subrun"
+
+
+def infer_group_label(hosts: List[str], index: int) -> str:
+    if not hosts:
+        return f"group{index}"
+    labels: List[str] = []
+    for host in hosts:
+        short = host.split("-", 1)[-1]
+        if short.startswith("w2k"):
+            label = "windows"
+        else:
+            label = re.sub(r"\d.*$", "", short) or short
+        if label not in labels:
+            labels.append(label)
+    return "-".join(labels) if labels else f"group{index}"
+
+
+def extract_segment_build_name(segment_text: str, parent_build_name: str) -> Optional[str]:
+    patterns = [
+        rf"({re.escape(parent_build_name)}-[A-Za-z0-9_.-]*batch\d+_\d+)",
+        r"([A-Za-z0-9_.-]+-batch\d+_\d+)",
+    ]
+    for pattern in patterns:
+        match = re.search(pattern, segment_text)
+        if match:
+            return match.group(1)
+    return None
+
+
+def split_log_segments(log_text: str, parent_build_name: str, categorized: bool, default_started_at: datetime) -> List[SubRun]:
+    if not categorized:
+        return [
+            SubRun(
+                key=sanitize_key(parent_build_name),
+                display_name=parent_build_name,
+                started_at=default_started_at,
+                expected_hosts=extract_expected_hosts(log_text),
+                completed=False,
+                currents_url=extract_currents_url(log_text),
+                notes=[],
+            )
+        ]
+
+    segment_starts: List[Tuple[int, Optional[datetime]]] = []
+    for match in re.finditer(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO - Extracted specPattern:", log_text, re.M):
+        segment_starts.append((match.start(), parse_log_timestamp(match.group(1))))
+
+    if not segment_starts:
+        return [
+            SubRun(
+                key=sanitize_key(parent_build_name),
+                display_name=parent_build_name,
+                started_at=default_started_at,
+                expected_hosts=extract_expected_hosts(log_text),
+                completed=False,
+                currents_url=extract_currents_url(log_text),
+                notes=["Categorized mode was requested but no sub-run segment has appeared in the log yet."],
+            )
+        ]
+
+    segments: List[SubRun] = []
+    for index, (start_offset, start_ts) in enumerate(segment_starts, start=1):
+        end_offset = segment_starts[index][0] if index < len(segment_starts) else len(log_text)
+        segment_text = log_text[start_offset:end_offset]
+        expected_hosts = extract_expected_hosts(segment_text)
+        display_name = extract_segment_build_name(segment_text, parent_build_name)
+        if not display_name:
+            display_name = f"{parent_build_name}-{infer_group_label(expected_hosts, index)}"
+        segments.append(
+            SubRun(
+                key=sanitize_key(display_name),
+                display_name=display_name,
+                started_at=start_ts or default_started_at,
+                expected_hosts=expected_hosts,
+                completed=index < len(segment_starts),
+                currents_url=extract_currents_url(segment_text),
+                notes=[f"Categorized sub-run {index} of {len(segment_starts)}."],
+            )
+        )
+    return segments
+
+
+def evaluate_subrun(
+    subrun: SubRun,
+    reporter_root: Path,
+    inventory: Dict[str, str],
+    end_boundary: Optional[datetime],
+    parent_active: bool,
+    cancelled: bool,
+) -> Tuple[str, Dict[str, HostResult], Optional[datetime], Optional[datetime], Optional[str], List[str]]:
+    notes = list(subrun.notes)
+    host_results = collect_host_results(
+        reporter_root=reporter_root,
+        expected_hosts=subrun.expected_hosts,
+        kernels=inventory,
+        run_started_at=subrun.started_at,
+        run_ended_at=end_boundary,
+    )
+    check_end = find_check_xml_end(reporter_root, subrun.started_at, end_boundary)
+    start_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
+    end_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
+    if check_end:
+        end_candidates.append(check_end)
+    start_ts = min(start_candidates) if start_candidates else subrun.started_at
+    end_ts = max(end_candidates) if end_candidates else None
+
+    if cancelled:
+        notes.append("Cancellation marker detected.")
+        return "CANCELLED", host_results, start_ts, end_ts, subrun.currents_url, notes
+
+    if subrun.completed:
+        if not host_results:
+            notes.append("This categorized sub-run ended but no host results were detected.")
+            return "UNKNOWN", host_results, start_ts, end_ts, subrun.currents_url, notes
+        notes.append("Categorized sub-run completed and the next grouped run was launched.")
+        if check_end:
+            notes.append("Final `check-xml-files.ts` validation passed.")
+        state = "FAILED" if any(result.failures for result in host_results.values()) else "COMPLETED"
+        return state, host_results, start_ts, end_ts, subrun.currents_url, notes
+
+    if parent_active:
+        current_host = next((host for host in subrun.expected_hosts if host not in host_results), None)
+        if current_host and current_host not in host_results:
+            host_results[current_host] = HostResult(
+                host=current_host,
+                kernel=inventory.get(current_host, "unknown"),
+                status="RUN",
+                detail="in progress",
+            )
+        return "RUNNING", host_results, start_ts, end_ts, subrun.currents_url, notes
+
+    if host_results:
+        notes.append("Categorized sub-run completed after the parent runner exited.")
+        if check_end:
+            notes.append("Final `check-xml-files.ts` validation passed.")
+        state = "FAILED" if any(result.failures for result in host_results.values()) else "COMPLETED"
+        return state, host_results, start_ts, end_ts, subrun.currents_url, notes
+
+    notes.append("Parent run exited before this categorized sub-run produced host results.")
+    return "TERMINATED", host_results, start_ts, end_ts, subrun.currents_url, notes
+
+
 def determine_state(
    build_name: str,
    build_dir: Path,
    run_log: Path,
    reporter_root: Path,
    inventory: Dict[str, str],
+    metadata: Dict[str, object],
    started_at: datetime,
    process_gone_since: Optional[datetime],
    process_exit_grace_seconds: int,
-) -> Tuple[str, Dict[str, HostResult], str, Optional[datetime], Optional[datetime], Optional[str], List[str]]:
+) -> Tuple[str, List[Dict[str, object]], Dict[str, HostResult], Optional[datetime], Optional[datetime], Optional[str], List[str]]:
    cancelled_marker = build_dir / "cancelled.marker"
    log_text = read_text(run_log)
-    expected_hosts = extract_expected_hosts(log_text)
-    host_results = collect_host_results(reporter_root, expected_hosts, inventory, started_at)
    active = process_active(build_name)
-    currents_url = extract_currents_url(log_text)
+    cancelled = cancelled_marker.exists()
    notes: List[str] = []
+    subrun_states: List[Dict[str, object]] = []
+    parent_host_results: Dict[str, HostResult] = {}

-    current_host = find_current_running_host(log_text, list(host_results.keys()))
-    if current_host and current_host not in host_results:
-        host_results[current_host] = HostResult(
-            host=current_host,
-            kernel=inventory.get(current_host, "unknown"),
-            status="RUN",
-            detail="in progress",
+    subruns = split_log_segments(log_text, build_name, bool(metadata.get("categorized")), started_at)
+    for index, subrun in enumerate(subruns):
+        next_started_at = subruns[index + 1].started_at if index + 1 < len(subruns) else None
+        state, host_results, start_ts, end_ts, currents_url, subrun_notes = evaluate_subrun(
+            subrun=subrun,
+            reporter_root=reporter_root,
+            inventory=inventory,
+            end_boundary=next_started_at,
+            parent_active=active,
+            cancelled=cancelled,
+        )
+        for host, result in host_results.items():
+            parent_host_results[host] = result
+        subrun_states.append(
+            {
+                "key": subrun.key,
+                "display_name": subrun.display_name,
+                "state": state,
+                "host_results": host_results,
+                "start_ts": start_ts,
+                "end_ts": end_ts,
+                "currents_url": currents_url,
+                "notes": subrun_notes,
+            }
        )

-    start_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
-    end_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
-    check_xml = reporter_root / "xml"
-    for xml_path in sorted(check_xml.glob("test-result-*.xml"), key=lambda p: p.stat().st_mtime, reverse=True):
-        xml_mtime = datetime.fromtimestamp(xml_path.stat().st_mtime, tz=timezone.utc)
-        if xml_mtime < started_at:
-            continue
-        text = read_text(xml_path)
-        if "check-xml-files.ts" in text:
-            try:
-                tree = ET.parse(xml_path)
-                root = tree.getroot()
-                suite = root.find("testsuite")
-                if suite is not None:
-                    ts = parse_xml_timestamp(suite.attrib.get("timestamp"))
-                    if ts:
-                        end_candidates.append(ts)
-            except ET.ParseError:
-                pass
-            break
+    parent_start_candidates = [subrun["start_ts"] for subrun in subrun_states if subrun["start_ts"]]
+    parent_end_candidates = [subrun["end_ts"] for subrun in subrun_states if subrun["end_ts"]]
+    start_ts = min(parent_start_candidates) if parent_start_candidates else started_at
+    end_ts = max(parent_end_candidates) if parent_end_candidates else find_check_xml_end(reporter_root, started_at)
+    currents_url = extract_currents_url(log_text)

-    start_ts = min(start_candidates) if start_candidates else started_at
-    end_ts = max(end_candidates) if end_candidates else None
-
-    if cancelled_marker.exists():
+    if cancelled:
        notes.append("Cancellation marker detected.")
-        return "CANCELLED", host_results, log_text, start_ts, end_ts, currents_url, notes
+        return "CANCELLED", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes

    if active:
        elapsed = (now_utc() - started_at).total_seconds()
        if elapsed > args.max_watch_seconds:
            notes.append("Watcher exceeded max watch duration while the run still appears active.")
-            return "HUNG", host_results, log_text, start_ts, end_ts, currents_url, notes
-        return "RUNNING", host_results, log_text, start_ts, end_ts, currents_url, notes
+            return "HUNG", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes
+        return "RUNNING", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes

-    if "Cloud Run Finished" in log_text or currents_url:
-        state = "FAILED" if any(result.failures for result in host_results.values()) else "COMPLETED"
-        notes.append("Run finished and final reporting artifacts were detected.")
-        if any("check-xml-files.ts" in line for line in log_text.splitlines()):
-            notes.append("Final `check-xml-files.ts` validation passed.")
-        return state, host_results, log_text, start_ts, end_ts, currents_url, notes
+    terminal_subruns = [subrun for subrun in subrun_states if subrun["state"] in {"COMPLETED", "FAILED"}]
+    if terminal_subruns:
+        state = "FAILED" if any(result.failures for result in parent_host_results.values()) else "COMPLETED"
+        notes.append("Run finished and one or more sub-run result artifacts were detected.")
+        if end_ts:
+            notes.append("Final reporting artifacts were detected.")
+        return state, subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes

    if process_gone_since and (now_utc() - process_gone_since).total_seconds() >= process_exit_grace_seconds:
        notes.append("Run process exited without a clean completion signal.")
-        return "TERMINATED", host_results, log_text, start_ts, end_ts, currents_url, notes
+        return "TERMINATED", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes

-    return "RUNNING", host_results, log_text, start_ts, end_ts, currents_url, notes
+    return "RUNNING", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes


 if __name__ == "__main__":
@@ -455,12 +646,13 @@ if __name__ == "__main__":
        if active:
            process_gone_since = None

-        run_state, host_results, log_text, start_ts, end_ts, currents_url, notes = determine_state(
+        run_state, subrun_states, host_results, start_ts, end_ts, currents_url, notes = determine_state(
            build_name=build_name,
            build_dir=build_dir,
            run_log=run_log,
            reporter_root=reporter_root,
            inventory=inventory,
+            metadata=metadata,
            started_at=started_at,
            process_gone_since=process_gone_since,
            process_exit_grace_seconds=args.process_exit_grace_seconds,
@@ -478,8 +670,64 @@ if __name__ == "__main__":
            }
            for host, result in host_results.items()
        }
+        state["subruns"] = {
+            subrun["display_name"]: {
+                "state": subrun["state"],
+                "hosts": sorted(subrun["host_results"].keys()),
+                "start_ts": subrun["start_ts"].isoformat() if subrun["start_ts"] else None,
+                "end_ts": subrun["end_ts"].isoformat() if subrun["end_ts"] else None,
+                "currents_url": subrun["currents_url"],
+                "notes": subrun["notes"],
+            }
+            for subrun in subrun_states
+        }
        write_state(state_file, state)

+        for subrun in subrun_states:
+            subrun_dir = build_dir / "subruns" / subrun["key"]
+            ensure_dir(subrun_dir)
+            subrun_state_file = subrun_dir / "state.json"
+            subrun_posted_marker = subrun_dir / "posted.marker"
+            subrun_state = {
+                "display_name": subrun["display_name"],
+                "last_state": subrun["state"],
+                "last_seen_at": now_utc().isoformat(),
+                "host_results": {
+                    host: {
+                        "status": result.status,
+                        "detail": result.detail,
+                        "kernel": result.kernel,
+                        "tests": result.tests,
+                        "failures": result.failures,
+                    }
+                    for host, result in subrun["host_results"].items()
+                },
+                "notes": subrun["notes"],
+                "currents_url": subrun["currents_url"],
+                "started_at": subrun["start_ts"].isoformat() if subrun["start_ts"] else None,
+                "ended_at": subrun["end_ts"].isoformat() if subrun["end_ts"] else None,
+            }
+            if subrun["state"] in {"COMPLETED", "FAILED"} and not subrun_posted_marker.exists():
+                status_text = build_status_markdown(
+                    build_name=subrun["display_name"],
+                    metadata=metadata,
+                    host_results=dict(sorted(subrun["host_results"].items())),
+                    run_state=subrun["state"],
+                    currents_url=subrun["currents_url"],
+                    start_ts=subrun["start_ts"],
+                    end_ts=subrun["end_ts"],
+                    notes=subrun["notes"],
+                )
+                print(status_text)
+                response = post_to_mattermost(status_text)
+                if response != "ok":
+                    raise SystemExit(f"Mattermost webhook did not return ok for {subrun['display_name']}: {response!r}")
+                subrun_posted_marker.write_text("ok\n", encoding="utf-8")
+                subrun_state["mattermost_posted"] = True
+                subrun_state["mattermost_response"] = response
+                print(f"[watcher] Mattermost post confirmed for {subrun['display_name']}.")
+            write_state(subrun_state_file, subrun_state)
+
        if run_state == "RUNNING":
            print(f"[watcher] {build_name}: RUNNING")
            time.sleep(args.poll_interval)
@@ -497,7 +745,7 @@ if __name__ == "__main__":
        )
        print(status_text)

-        if run_state in {"COMPLETED", "FAILED"} and not posted_marker.exists():
+        if not metadata.get("categorized") and run_state in {"COMPLETED", "FAILED"} and not posted_marker.exists():
            response = post_to_mattermost(status_text)
            if response != "ok":
                raise SystemExit(f"Mattermost webhook did not return ok: {response!r}")
--- a/atvm/watcher-service/start-atvm-run-watcher.sh
+++ b/atvm/watcher-service/start-atvm-run-watcher.sh
@@ -13,6 +13,7 @@ Options:
  --migration-style <text>
  --integration-plugin <text>
  --scope-description <text>
+  --categorize
  --state-root <path>   Default: /var/lib/atvm-run-watcher
 EOF
 }
@@ -23,6 +24,7 @@ CONFIG_FAMILY=""
 MIGRATION_STYLE=""
 INTEGRATION_PLUGIN=""
 SCOPE_DESCRIPTION=""
+WATCHER_CATEGORIZED="false"
 STATE_ROOT="/var/lib/atvm-run-watcher"

 while [[ $# -gt 0 ]]; do
@@ -33,6 +35,7 @@ while [[ $# -gt 0 ]]; do
    --migration-style) MIGRATION_STYLE="${2:-}"; shift 2 ;;
    --integration-plugin) INTEGRATION_PLUGIN="${2:-}"; shift 2 ;;
    --scope-description) SCOPE_DESCRIPTION="${2:-}"; shift 2 ;;
+    --categorize) WATCHER_CATEGORIZED="true"; shift ;;
    --state-root) STATE_ROOT="${2:-}"; shift 2 ;;
    -h|--help) usage; exit 0 ;;
    *) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
@@ -54,6 +57,7 @@ ATVM_WATCHER_CONFIG_FAMILY=${CONFIG_FAMILY@Q}
 ATVM_WATCHER_MIGRATION_STYLE=${MIGRATION_STYLE@Q}
 ATVM_WATCHER_INTEGRATION_PLUGIN=${INTEGRATION_PLUGIN@Q}
 ATVM_WATCHER_SCOPE_DESCRIPTION=${SCOPE_DESCRIPTION@Q}
+ATVM_WATCHER_CATEGORIZED=${WATCHER_CATEGORIZED@Q}
 EOF

 systemctl start "atvm-run-watcher@${BUILD_NAME}.service"