Update ATVM watcher for categorized sub-run posting
- update the watcher design and automation guide to treat --categorize as sequential ATVM sub-runs rather than one parent run with internal phases
- document that categorized runs should send one Mattermost status per completed grouped sub-run instead of one parent-only final post
- add a --categorize option to the watcher start helper so categorized mode is explicit in watcher startup
- update the watcher implementation to track categorized sub-runs separately, write per-subrun state, and post each completed grouped run once
@@ -43,6 +43,9 @@ Run ATVM CMC automation tests on the designated automation VM without unintended
|
|||||||
- Execute ATVM run commands only after explicit approval.
|
- Execute ATVM run commands only after explicit approval.
|
||||||
- Treat `approve` as approval to run without the watcher service.
|
- Treat `approve` as approval to run without the watcher service.
|
||||||
- Treat `approve with watcher` as approval to run and also start the per-run watcher service for that build.
|
- Treat `approve with watcher` as approval to run and also start the per-run watcher service for that build.
|
||||||
|
- When `--categorize` is used with watcher enabled, treat the watcher as a sequential grouped-run watcher:
|
||||||
|
- it must post one final Mattermost status per completed categorized group/sub-run
|
||||||
|
- it must not wait and replace those with one single parent-only post
|
||||||
- After execution, report immediate success/failure only.
|
- After execution, report immediate success/failure only.
|
||||||
- Do not actively monitor completion unless explicitly requested.
|
- Do not actively monitor completion unless explicitly requested.
|
||||||
- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
|
- If monitoring is requested, allow long runtime windows (15-30+ minutes) and continue until completion unless operator instructs otherwise.
|
||||||
@@ -154,13 +157,14 @@ Before any new automation request:
|
|||||||
4. When the watcher is available, present the watcher-start command separately from the core run commands.
|
4. When the watcher is available, present the watcher-start command separately from the core run commands.
|
||||||
5. Treat `approve` as approval to execute the ATVM run without starting the watcher.
|
5. Treat `approve` as approval to execute the ATVM run without starting the watcher.
|
||||||
6. Treat `approve with watcher` as approval to execute the ATVM run and start the watcher for that build.
|
6. Treat `approve with watcher` as approval to execute the ATVM run and start the watcher for that build.
|
||||||
7. Run only approved command(s), no extra options and no silent substitutions.
|
7. If the run uses `--categorize` and the watcher is requested, include `--categorize` on the watcher start command too so the watcher tracks sequential categorized sub-runs correctly.
|
||||||
8. When both template generation and the Cypress runner are requested, run them sequentially, not in parallel.
|
8. Run only approved command(s), no extra options and no silent substitutions.
|
||||||
9. Do not launch `run-sorry-cypress.py` until `cmc-templates.py` has exited successfully and finished updating the intended config/spec files.
|
9. When both template generation and the Cypress runner are requested, run them sequentially, not in parallel.
|
||||||
10. Treat displayed commands as a review gate: do not execute either command until the operator has had a chance to review them and explicitly approve.
|
10. Do not launch `run-sorry-cypress.py` until `cmc-templates.py` has exited successfully and finished updating the intended config/spec files.
|
||||||
11. If the operator asks to change plugin, config, filters, build name, Gold Disk, or scope after commands are shown, discard the old plan, show the revised commands, and wait for new approval.
|
11. Treat displayed commands as a review gate: do not execute either command until the operator has had a chance to review them and explicitly approve.
|
||||||
12. If monitoring was not requested, report immediate success/failure for each command.
|
12. If the operator asks to change plugin, config, filters, build name, Gold Disk, or scope after commands are shown, discard the old plan, show the revised commands, and wait for new approval.
|
||||||
13. If monitoring was requested, keep monitoring until completion and report final outcome.
|
13. If monitoring was not requested, report immediate success/failure for each command.
|
||||||
|
14. If monitoring was requested, keep monitoring until completion and report final outcome.
|
||||||
|
|
||||||
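The approval rules in the numbered steps above can be sketched as a small decision function. This is only an illustration: `plan_watcher` and `WatcherPlan` are hypothetical names that do not exist in the ATVM tooling. The only behaviour taken from the guide is the meaning of `approve`, the meaning of `approve with watcher`, and the rule that `--categorize` on the run is carried over to the watcher start command.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WatcherPlan:
    # Hypothetical container for the decision; not part of the real tooling.
    execute_run: bool
    start_watcher: bool
    watcher_flags: List[str] = field(default_factory=list)


def plan_watcher(approval: str, run_args: List[str]) -> WatcherPlan:
    """Map an operator reply to run/watcher decisions (illustrative only)."""
    # Normalize case and whitespace so "Approve  with watcher" still matches.
    reply = " ".join(approval.strip().lower().split())
    if reply == "approve":
        return WatcherPlan(execute_run=True, start_watcher=False)
    if reply == "approve with watcher":
        # Carry --categorize over so the watcher tracks sequential sub-runs.
        flags = ["--categorize"] if "--categorize" in run_args else []
        return WatcherPlan(execute_run=True, start_watcher=True, watcher_flags=flags)
    # Anything else is not an explicit approval, so nothing is executed.
    return WatcherPlan(execute_run=False, start_watcher=False)
```

Treating every unrecognized reply as "no approval" keeps the sketch consistent with the rule that run commands execute only after explicit approval.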
## Requested Test Style
|
## Requested Test Style
|
||||||
When asked for one VM or a VM set:
|
When asked for one VM or a VM set:
|
||||||
@@ -193,6 +197,7 @@ When asked for one VM or a VM set:
|
|||||||
- Use the same ATVM status layout that would be shown to the operator locally when posting to Mattermost.
|
- Use the same ATVM status layout that would be shown to the operator locally when posting to Mattermost.
|
||||||
- Default status template: `/home/aw/code/cds/atvm/docs/automation/status-template.md`
|
- Default status template: `/home/aw/code/cds/atvm/docs/automation/status-template.md`
|
||||||
- Do not post to Mattermost unless the operator explicitly asks for the run status to be sent there.
|
- Do not post to Mattermost unless the operator explicitly asks for the run status to be sent there.
|
||||||
|
- For categorized execution with watcher enabled, send one Mattermost status per completed categorized sub-run/group after that grouped run fully finishes.
|
||||||
|
|
||||||
## Status Reporting Format
|
## Status Reporting Format
|
||||||
When the operator asks for the status of an ATVM automation run, report in this order:
|
When the operator asks for the status of an ATVM automation run, report in this order:
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
# ATVM Mattermost Watcher Design
|
# ATVM Mattermost Watcher Design
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
Design a controller-local watcher on the ATVM Cypress machine (`192.168.3.190`) that monitors an ATVM automation run and posts the final run status to Mattermost only after the run has fully completed.
|
Design a controller-local watcher on the ATVM Cypress machine (`192.168.3.190`) that monitors an ATVM automation run and posts final run status to Mattermost only after the watched scope has fully completed.
|
||||||
|
|
||||||
This watcher must continue working even if the local operator machine is offline.
|
This watcher must continue working even if the local operator machine is offline.
|
||||||
|
|
||||||
@@ -9,9 +9,10 @@ This watcher must continue working even if the local operator machine is offline
|
|||||||
Use a `systemd`-managed watcher on the ATVM Cypress controller.
|
Use a `systemd`-managed watcher on the ATVM Cypress controller.
|
||||||
|
|
||||||
Recommended structure:
|
Recommended structure:
|
||||||
- one watcher script that evaluates the state of a specific ATVM run
|
- one watcher script that evaluates a specific ATVM run request
|
||||||
- one `systemd` service to execute the watcher
|
- one `systemd` service to execute the watcher
|
||||||
- optionally one `systemd` timer for periodic polling if the watcher is not implemented as a long-running process
|
- no always-on daemon
|
||||||
|
- for categorized ATVM runs, one watcher instance tracks the parent request and posts each categorized sub-run separately as those grouped runs complete
|
||||||
|
|
||||||
Preferred deployment target:
|
Preferred deployment target:
|
||||||
- controller host: `192.168.3.190`
|
- controller host: `192.168.3.190`
|
||||||
@@ -26,14 +27,23 @@ Expected variables:
|
|||||||
- `MATTERMOST_ATVM_CHANNEL`
|
- `MATTERMOST_ATVM_CHANNEL`
|
||||||
|
|
||||||
## Run Completion Rule
|
## Run Completion Rule
|
||||||
The watcher must send Mattermost results only after the ATVM run has fully completed.
|
The watcher must send Mattermost results only after the watched scope has fully completed.
|
||||||
|
|
||||||
A run is considered fully completed only when:
|
A non-categorized run is considered fully completed only when:
|
||||||
- there are no active runner processes for the run
|
- there are no active runner processes for the run
|
||||||
- the expected machine scope has final result artifacts
|
- the expected machine scope has final result artifacts
|
||||||
- no machine remains in `RUNNING` or `NOT STARTED`
|
- no machine remains in `RUNNING` or `NOT STARTED`
|
||||||
- final reporter artifacts confirm the run has ended
|
- final reporter artifacts confirm the run has ended
|
||||||
|
|
||||||
|
A categorized run must be treated differently:
|
||||||
|
- `--categorize` splits the request into sequential ATVM sub-runs
|
||||||
|
- each categorized group is its own run/job
|
||||||
|
- the watcher must detect each grouped sub-run in order
|
||||||
|
- the watcher must wait for that grouped sub-run to complete
|
||||||
|
- then send that grouped sub-run's final Mattermost status
|
||||||
|
- then continue watching for the next grouped sub-run
|
||||||
|
- the watcher must not wait until the very end to send one single parent-only post
|
||||||
|
|
||||||
Evidence sources:
|
Evidence sources:
|
||||||
- live runner processes on `192.168.3.190`
|
- live runner processes on `192.168.3.190`
|
||||||
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/`
|
- `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter/logs/`
|
||||||
@@ -70,7 +80,7 @@ Definitions:
|
|||||||
- the run is still active and not yet complete
|
- the run is still active and not yet complete
|
||||||
|
|
||||||
## Mattermost Posting Rule
|
## Mattermost Posting Rule
|
||||||
Post to Mattermost only when the run has fully completed.
|
Post to Mattermost only when the watched scope has fully completed.
|
||||||
|
|
||||||
Send Mattermost status for:
|
Send Mattermost status for:
|
||||||
- `COMPLETED`
|
- `COMPLETED`
|
||||||
@@ -86,6 +96,9 @@ Do not send Mattermost status for:
|
|||||||
Important clarification:
|
Important clarification:
|
||||||
- a completed run with failed hosts should still be posted
|
- a completed run with failed hosts should still be posted
|
||||||
- a cancelled, terminated, hung, or unknown run should not be posted
|
- a cancelled, terminated, hung, or unknown run should not be posted
|
||||||
|
- for categorized execution, this rule applies per categorized sub-run
|
||||||
|
- one categorized group completion should produce one Mattermost post
|
||||||
|
- do not send one parent-level aggregate post in place of the per-group posts
|
||||||
|
|
||||||
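A minimal sketch of the posting rule, assuming the state names used in this change (`COMPLETED` and `FAILED` are posted, everything else is not). The function name `should_post` and the sample keys are illustrative, not taken from the watcher implementation. For categorized execution the same check is applied once per sub-run.

```python
POSTABLE_STATES = {"COMPLETED", "FAILED"}


def should_post(state: str, already_posted: bool) -> bool:
    """Return True only for a finished watched scope that was not posted yet."""
    if already_posted:
        return False
    # CANCELLED, TERMINATED, HUNG, UNKNOWN and any still-running state fall through to False.
    return state in POSTABLE_STATES


# Hypothetical observations of categorized sub-runs, in the order they were seen.
observed = [("group-1", "COMPLETED"), ("group-2", "CANCELLED"), ("group-1", "COMPLETED")]
posted = set()
for key, state in observed:
    if should_post(state, key in posted):
        posted.add(key)  # one Mattermost post per completed group
```

In this sample only `group-1` is posted, and only once, even though it is observed as `COMPLETED` twice.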
## Required Cancellation / Termination Handling
|
## Required Cancellation / Termination Handling
|
||||||
If a run is cancelled or terminated, the watcher must:
|
If a run is cancelled or terminated, the watcher must:
|
||||||
@@ -106,33 +119,47 @@ For each run, keep durable state such as:
|
|||||||
- last observed machine summary
|
- last observed machine summary
|
||||||
- timestamps for first seen, last seen, closed
|
- timestamps for first seen, last seen, closed
|
||||||
|
|
||||||
|
For categorized runs, keep durable state for:
|
||||||
|
- the parent request build name
|
||||||
|
- each detected categorized sub-run
|
||||||
|
- whether each categorized sub-run has already been posted
|
||||||
|
|
||||||
## Duplicate-Post Prevention
|
## Duplicate-Post Prevention
|
||||||
The watcher must prevent duplicate Mattermost posts.
|
The watcher must prevent duplicate Mattermost posts.
|
||||||
|
|
||||||
Required behavior:
|
Required behavior:
|
||||||
- only one final post per run
|
- for non-categorized execution, only one final post per run
|
||||||
- if a run is already marked as posted, do not send again
|
- for categorized execution, only one final post per categorized sub-run
|
||||||
- if a run is marked `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`, do not later convert it into a posted completion unless explicitly reset by an operator workflow
|
- if a watched scope is already marked as posted, do not send again
|
||||||
|
- if a run or categorized sub-run is marked `CANCELLED`, `TERMINATED`, `HUNG`, or `UNKNOWN`, do not later convert it into a posted completion unless explicitly reset by an operator workflow
|
||||||
|
|
||||||
## Recommended State Files
|
## Recommended State Files
|
||||||
Use a durable controller-local state directory, for example:
|
Use a durable controller-local state directory, for example:
|
||||||
- `/var/lib/atvm-run-watcher/`
|
- `/var/lib/atvm-run-watcher/`
|
||||||
|
|
||||||
Possible contents:
|
Possible contents:
|
||||||
- one state file per run id
|
- one parent state file per requested build name
|
||||||
- one posted marker per run id
|
- one posted marker per non-categorized run
|
||||||
- one cancellation marker per run id
|
- one subdirectory per categorized sub-run with its own state and posted marker
|
||||||
|
- one cancellation marker per parent run id
|
||||||
- optional lock file to prevent multiple watcher instances from racing
|
- optional lock file to prevent multiple watcher instances from racing
|
||||||
|
|
||||||
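One way to get "at most one post per watched scope" from the state files above is an atomically created marker file. The sketch below is an assumption-laden illustration: `claim_post` and the marker filename `posted` are hypothetical, and the directory layout simply follows the per-build, per-sub-run structure described in this change.

```python
import os
import tempfile
from pathlib import Path


def claim_post(state_root: Path, build_name: str, subrun_key: str) -> bool:
    """Atomically create a posted marker; True means this caller may post."""
    subrun_dir = state_root / build_name / "subruns" / subrun_key
    subrun_dir.mkdir(parents=True, exist_ok=True)
    marker = subrun_dir / "posted"  # hypothetical marker filename
    try:
        # O_EXCL makes creation fail if the marker already exists.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


# Demonstration against a throwaway directory instead of /var/lib/atvm-run-watcher/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = claim_post(root, "example-build", "group-1")
    second = claim_post(root, "example-build", "group-1")
```

Here `first` is `True` and `second` is `False`. This sketch claims the marker before posting, which guards against duplicates but would also block a retry if the post itself failed; a real watcher might instead write the marker only after the Mattermost response has been verified.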
## Recommended Operator Workflow
|
## Recommended Operator Workflow
|
||||||
Normal completion workflow:
|
Normal completion workflow:
|
||||||
1. ATVM run starts.
|
1. ATVM run starts.
|
||||||
2. Watcher tracks the run id / build name.
|
2. Watcher tracks the requested build name.
|
||||||
3. Watcher polls run state and artifacts.
|
3. Watcher polls run state and artifacts.
|
||||||
4. Run fully completes.
|
4. For non-categorized execution:
|
||||||
5. Watcher builds final status summary.
|
- wait for the run to fully complete
|
||||||
6. Watcher posts final status to Mattermost once.
|
- build one final status summary
|
||||||
7. Watcher marks the run as posted and closed.
|
- post one final Mattermost status
|
||||||
|
5. For categorized execution:
|
||||||
|
- detect each grouped sub-run in order
|
||||||
|
- wait for that grouped sub-run to fully complete
|
||||||
|
- build that grouped sub-run's final status summary
|
||||||
|
- post that grouped sub-run's final Mattermost status
|
||||||
|
- continue to the next grouped sub-run
|
||||||
|
6. Watcher marks the completed watched scope as posted and closed.
|
||||||
|
|
||||||
Cancellation / termination workflow:
|
Cancellation / termination workflow:
|
||||||
1. Operator stops the ATVM run.
|
1. Operator stops the ATVM run.
|
||||||
@@ -173,7 +200,9 @@ This watcher design must satisfy all of the following:
|
|||||||
- survive local operator machine downtime
|
- survive local operator machine downtime
|
||||||
- use `systemd`
|
- use `systemd`
|
||||||
- distinguish run states clearly
|
- distinguish run states clearly
|
||||||
- send Mattermost only after full completion
|
- send Mattermost only after full completion of the watched scope
|
||||||
- send completion results whether hosts passed or failed
|
- send completion results whether hosts passed or failed
|
||||||
- never send Mattermost for cancelled, terminated, hung, or unknown runs
|
- never send Mattermost for cancelled, terminated, hung, or unknown runs
|
||||||
- prevent duplicate or misleading posts
|
- prevent duplicate or misleading posts
|
||||||
|
- treat `--categorize` as sequential ATVM sub-runs, not as one parent run with internal phases
|
||||||
|
- send one Mattermost post per completed categorized sub-run
|
||||||
|
|||||||
@@ -8,8 +8,9 @@ This is a deployment plan only. It does not perform the installation.
|
|||||||
|
|
||||||
Install the local watcher package so the controller can:
|
Install the local watcher package so the controller can:
|
||||||
|
|
||||||
- watch one ATVM run per watcher instance
|
- watch one requested ATVM run per watcher instance
|
||||||
- send final Mattermost status only for `COMPLETED` or `FAILED`
|
- for non-categorized runs, send one final Mattermost status only for `COMPLETED` or `FAILED`
|
||||||
|
- for categorized runs, send one final Mattermost status per completed categorized sub-run/group
|
||||||
- suppress Mattermost posts for `CANCELLED`, `TERMINATED`, `HUNG`, and `UNKNOWN`
|
- suppress Mattermost posts for `CANCELLED`, `TERMINATED`, `HUNG`, and `UNKNOWN`
|
||||||
- stop automatically after the watched run reaches a terminal state
|
- stop automatically after the watched run reaches a terminal state
|
||||||
|
|
||||||
@@ -116,7 +117,9 @@ Recommended permissions:
|
|||||||
9. Do a real ATVM run test.
|
9. Do a real ATVM run test.
|
||||||
- launch a real run
|
- launch a real run
|
||||||
- start the watcher for that build name
|
- start the watcher for that build name
|
||||||
|
- if the run uses `--categorize`, also pass `--categorize` to the watcher start helper
|
||||||
- confirm final Mattermost delivery for a completed run
|
- confirm final Mattermost delivery for a completed run
|
||||||
|
- confirm categorized execution sends one post per completed grouped sub-run
|
||||||
|
|
||||||
## Recommended Validation Commands
|
## Recommended Validation Commands
|
||||||
|
|
||||||
@@ -163,6 +166,7 @@ Example:
|
|||||||
--config-family gold \
|
--config-family gold \
|
||||||
--migration-style "ATVM end-to-end migration validation" \
|
--migration-style "ATVM end-to-end migration validation" \
|
||||||
--integration-plugin "pure with fc" \
|
--integration-plugin "pure with fc" \
|
||||||
|
--categorize \
|
||||||
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
|
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -184,9 +188,11 @@ The cancel helper should:
|
|||||||
|
|
||||||
- This is not a daemon.
|
- This is not a daemon.
|
||||||
- One watcher instance is started per ATVM run.
|
- One watcher instance is started per ATVM run.
|
||||||
|
- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
|
||||||
- The watcher exits after the run reaches a terminal state.
|
- The watcher exits after the run reaches a terminal state.
|
||||||
- The watcher writes state under `/var/lib/atvm-run-watcher/<build-name>`.
|
- The watcher writes state under `/var/lib/atvm-run-watcher/<build-name>`.
|
||||||
- The watcher prevents duplicate Mattermost posts by writing a posted marker.
|
- The watcher prevents duplicate Mattermost posts by writing posted markers.
|
||||||
|
- Categorized sub-run state is written under `/var/lib/atvm-run-watcher/<build-name>/subruns/<subrun-key>/`.
|
||||||
|
|
||||||
## Failure Handling
|
## Failure Handling
|
||||||
|
|
||||||
@@ -200,6 +206,10 @@ Expected terminal behavior:
|
|||||||
- post to Mattermost
|
- post to Mattermost
|
||||||
- verify `ok`
|
- verify `ok`
|
||||||
- exit
|
- exit
|
||||||
|
- categorized `COMPLETED` / `FAILED`
|
||||||
|
- post once for that grouped sub-run
|
||||||
|
- verify `ok`
|
||||||
|
- continue until the parent request finishes
|
||||||
- `CANCELLED`
|
- `CANCELLED`
|
||||||
- write final `CANCELLED` state to `state.json`
|
- write final `CANCELLED` state to `state.json`
|
||||||
- do not post
|
- do not post
|
||||||
|
|||||||
@@ -4,10 +4,14 @@ This folder contains a per-run ATVM watcher service package that is intended to
|
|||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
|
|
||||||
Watch a single ATVM automation run until it reaches a terminal state, then:
|
Watch an ATVM automation request until it reaches a terminal state, then:
|
||||||
|
|
||||||
- post the final status to Mattermost if the run state is `COMPLETED` or `FAILED`
|
- for non-categorized runs:
|
||||||
- verify the Mattermost post succeeded
|
- post one final status to Mattermost if the run state is `COMPLETED` or `FAILED`
|
||||||
|
- for categorized runs:
|
||||||
|
- detect each sequential categorized sub-run
|
||||||
|
- post one final status per completed categorized sub-run if that grouped run state is `COMPLETED` or `FAILED`
|
||||||
|
- verify each Mattermost post succeeded
|
||||||
- write durable watcher state
|
- write durable watcher state
|
||||||
- exit cleanly so the service stops
|
- exit cleanly so the service stops
|
||||||
|
|
||||||
@@ -38,14 +42,14 @@ Do not treat `/root/atvm-watcher-service` as the preferred long-term install loc
|
|||||||
|
|
||||||
## Per-Run Behavior
|
## Per-Run Behavior
|
||||||
|
|
||||||
Each watcher instance is tied to one build name.
|
Each watcher instance is tied to one requested build name.
|
||||||
|
|
||||||
Typical workflow:
|
Typical workflow:
|
||||||
|
|
||||||
1. Launch the ATVM run.
|
1. Launch the ATVM run.
|
||||||
2. Start the watcher for that run.
|
2. Start the watcher for that run.
|
||||||
3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
|
3. The watcher polls the run log, process state, and `cmcReporter` artifacts.
|
||||||
4. When the run reaches a terminal state:
|
4. For non-categorized runs, when the run reaches a terminal state:
|
||||||
- `COMPLETED` or `FAILED`
|
- `COMPLETED` or `FAILED`
|
||||||
- build the final ATVM status
|
- build the final ATVM status
|
||||||
- send the status to Mattermost
|
- send the status to Mattermost
|
||||||
@@ -56,6 +60,12 @@ Typical workflow:
|
|||||||
- do not post
|
- do not post
|
||||||
- mark the final state
|
- mark the final state
|
||||||
- exit
|
- exit
|
||||||
|
5. For categorized runs:
|
||||||
|
- detect each grouped sub-run in sequence from the parent run log
|
||||||
|
- wait for that grouped sub-run to finish
|
||||||
|
- send one Mattermost post for that grouped sub-run if it reached `COMPLETED` or `FAILED`
|
||||||
|
- continue to the next grouped sub-run
|
||||||
|
- exit after the parent request reaches a terminal state
|
||||||
|
|
||||||
## Required Environment
|
## Required Environment
|
||||||
|
|
||||||
@@ -71,6 +81,7 @@ Optional metadata for better status formatting:
|
|||||||
- `ATVM_WATCHER_MIGRATION_STYLE`
|
- `ATVM_WATCHER_MIGRATION_STYLE`
|
||||||
- `ATVM_WATCHER_INTEGRATION_PLUGIN`
|
- `ATVM_WATCHER_INTEGRATION_PLUGIN`
|
||||||
- `ATVM_WATCHER_SCOPE_DESCRIPTION`
|
- `ATVM_WATCHER_SCOPE_DESCRIPTION`
|
||||||
|
- `ATVM_WATCHER_CATEGORIZED`
|
||||||
|
|
||||||
## Start Example
|
## Start Example
|
||||||
|
|
||||||
@@ -83,6 +94,7 @@ This helper writes a per-run environment file and starts the matching instance:
|
|||||||
--config-family gold \
|
--config-family gold \
|
||||||
--migration-style "ATVM end-to-end migration validation" \
|
--migration-style "ATVM end-to-end migration validation" \
|
||||||
--integration-plugin "pure with fc" \
|
--integration-plugin "pure with fc" \
|
||||||
|
--categorize \
|
||||||
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
|
--scope-description "mixed Linux and Windows FC E2E validation on the gold datastore set"
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -105,5 +117,7 @@ This writes a cancellation marker, updates `state.json` to `CANCELLED`, and stop
|
|||||||
|
|
||||||
- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
|
- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
|
||||||
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
|
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
|
||||||
|
- Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
|
||||||
|
- In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
|
||||||
- Best-practice controller install path: `/opt/atvm-watcher-service`.
|
- Best-practice controller install path: `/opt/atvm-watcher-service`.
|
||||||
- This package is local-only right now. Nothing here is installed on the controller yet.
|
- This package is local-only right now. Nothing here is installed on the controller yet.
|
||||||
|
|||||||
@@ -40,6 +40,17 @@ class HostResult:
|
|||||||
timestamp: Optional[datetime] = None
|
timestamp: Optional[datetime] = None
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class SubRun:
|
||||||
|
key: str
|
||||||
|
display_name: str
|
||||||
|
started_at: datetime
|
||||||
|
expected_hosts: List[str]
|
||||||
|
completed: bool
|
||||||
|
currents_url: Optional[str]
|
||||||
|
notes: List[str]
|
||||||
|
|
||||||
|
|
||||||
def now_utc() -> datetime:
|
def now_utc() -> datetime:
|
||||||
return datetime.now(timezone.utc)
|
return datetime.now(timezone.utc)
|
||||||
|
|
||||||
@@ -152,6 +163,13 @@ def parse_xml_timestamp(raw: Optional[str]) -> Optional[datetime]:
|
|||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def parse_log_timestamp(raw: str) -> Optional[datetime]:
|
||||||
|
try:
|
||||||
|
return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=timezone.utc)
|
||||||
|
except ValueError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
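Review note on `parse_log_timestamp`: the format string matches the default Python `logging` timestamp layout (`YYYY-MM-DD HH:MM:SS,mmm`). The standalone sketch below repeats the function from this hunk so it can be run in isolation; the sample timestamp is made up. One thing to keep in mind is that the result is tagged as UTC without any conversion, so this presumably relies on the controller writing its log in UTC. If the log is in local time, comparisons against the UTC file mtimes used elsewhere would be shifted by the local offset.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_log_timestamp(raw: str) -> Optional[datetime]:
    # Same logic as the function added in this hunk.
    try:
        return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=timezone.utc)
    except ValueError:
        return None


# Sample value for illustration only, not copied from a real run log.
parsed = parse_log_timestamp("2024-05-01 13:45:10,250")
rejected = parse_log_timestamp("not a timestamp")
```

The three-digit millisecond field is read by `%f` as a fraction of a second, so `,250` becomes 250000 microseconds, and unparseable input yields `None` instead of raising.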
def parse_host_xml(xml_path: Path) -> Optional[Tuple[str, HostResult]]:
|
def parse_host_xml(xml_path: Path) -> Optional[Tuple[str, HostResult]]:
|
||||||
try:
|
try:
|
||||||
tree = ET.parse(xml_path)
|
tree = ET.parse(xml_path)
|
||||||
@@ -194,6 +212,7 @@ def collect_host_results(
|
|||||||
expected_hosts: List[str],
|
expected_hosts: List[str],
|
||||||
kernels: Dict[str, str],
|
kernels: Dict[str, str],
|
||||||
run_started_at: datetime,
|
run_started_at: datetime,
|
||||||
|
run_ended_at: Optional[datetime] = None,
|
||||||
) -> Dict[str, HostResult]:
|
) -> Dict[str, HostResult]:
|
||||||
xml_dir = reporter_root / "xml"
|
xml_dir = reporter_root / "xml"
|
||||||
results: Dict[str, HostResult] = {}
|
results: Dict[str, HostResult] = {}
|
||||||
@@ -203,6 +222,8 @@ def collect_host_results(
|
|||||||
xml_mtime = datetime.fromtimestamp(xml_path.stat().st_mtime, tz=timezone.utc)
|
xml_mtime = datetime.fromtimestamp(xml_path.stat().st_mtime, tz=timezone.utc)
|
||||||
if xml_mtime < run_started_at:
|
if xml_mtime < run_started_at:
|
||||||
continue
|
continue
|
||||||
|
if run_ended_at and xml_mtime >= run_ended_at:
|
||||||
|
continue
|
||||||
parsed = parse_host_xml(xml_path)
|
parsed = parse_host_xml(xml_path)
|
||||||
if not parsed:
|
if not parsed:
|
||||||
continue
|
continue
|
||||||
@@ -214,21 +235,46 @@ def collect_host_results(
|
|||||||
return results
|
return results
|
||||||
|
|
||||||
|
|
||||||
def find_current_running_host(log_text: str, completed_hosts: List[str]) -> Optional[str]:
|
def find_check_xml_end(
|
||||||
matches = re.findall(r"Running:\s+(?:cypress/cmcRegressionTest/)?(atvm[^/\s]+)\.ts", log_text)
|
reporter_root: Path,
|
||||||
for host in reversed(matches):
|
started_at: datetime,
|
||||||
if host not in completed_hosts:
|
ended_at: Optional[datetime] = None,
|
||||||
return host
|
) -> Optional[datetime]:
|
||||||
return None
|
xml_dir = reporter_root / "xml"
|
||||||
|
if not xml_dir.exists():
|
||||||
|
return None
|
||||||
|
latest: Optional[datetime] = None
|
||||||
|
for xml_path in sorted(xml_dir.glob("test-result-*.xml"), key=lambda p: p.stat().st_mtime):
|
||||||
|
xml_mtime = datetime.fromtimestamp(xml_path.stat().st_mtime, tz=timezone.utc)
|
||||||
|
if xml_mtime < started_at:
|
||||||
|
continue
|
||||||
|
if ended_at and xml_mtime >= ended_at:
|
||||||
|
continue
|
||||||
|
text = read_text(xml_path)
|
||||||
|
if "check-xml-files.ts" not in text:
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
tree = ET.parse(xml_path)
|
||||||
|
root = tree.getroot()
|
||||||
|
suite = root.find("testsuite")
|
||||||
|
if suite is None:
|
||||||
|
continue
|
||||||
|
ts = parse_xml_timestamp(suite.attrib.get("timestamp"))
|
||||||
|
if ts:
|
||||||
|
latest = ts
|
||||||
|
except ET.ParseError:
|
||||||
|
continue
|
||||||
|
return latest
|
||||||
|
|
||||||
|
|
||||||
def infer_metadata() -> Dict[str, str]:
|
def infer_metadata() -> Dict[str, object]:
|
||||||
return {
|
return {
|
||||||
"template": os.environ.get("ATVM_WATCHER_TEMPLATE", "unknown"),
|
"template": os.environ.get("ATVM_WATCHER_TEMPLATE", "unknown"),
|
||||||
"config_family": os.environ.get("ATVM_WATCHER_CONFIG_FAMILY", "unknown"),
|
"config_family": os.environ.get("ATVM_WATCHER_CONFIG_FAMILY", "unknown"),
|
||||||
"migration_style": os.environ.get("ATVM_WATCHER_MIGRATION_STYLE", "ATVM automation validation"),
|
"migration_style": os.environ.get("ATVM_WATCHER_MIGRATION_STYLE", "ATVM automation validation"),
|
||||||
"integration_plugin": os.environ.get("ATVM_WATCHER_INTEGRATION_PLUGIN", "unknown"),
|
"integration_plugin": os.environ.get("ATVM_WATCHER_INTEGRATION_PLUGIN", "unknown"),
|
||||||
"scope_description": os.environ.get("ATVM_WATCHER_SCOPE_DESCRIPTION", "requested ATVM run scope"),
|
"scope_description": os.environ.get("ATVM_WATCHER_SCOPE_DESCRIPTION", "requested ATVM run scope"),
|
||||||
|
"categorized": os.environ.get("ATVM_WATCHER_CATEGORIZED", "false").lower() == "true",
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -253,7 +299,7 @@ def format_timestamp_local(ts: Optional[datetime]) -> str:
|
|||||||
|
|
||||||
def build_status_markdown(
|
def build_status_markdown(
|
||||||
build_name: str,
|
build_name: str,
|
||||||
metadata: Dict[str, str],
|
metadata: Dict[str, object],
|
||||||
host_results: Dict[str, HostResult],
|
host_results: Dict[str, HostResult],
|
||||||
run_state: str,
|
run_state: str,
|
||||||
currents_url: Optional[str],
|
currents_url: Optional[str],
|
||||||
@@ -348,80 +394,225 @@ def post_to_mattermost(text: str) -> str:
|
|||||||
return response.read().decode().strip()
|
return response.read().decode().strip()
|
||||||
|
|
||||||
|
|
||||||
|
def sanitize_key(raw: str) -> str:
|
||||||
|
return re.sub(r"[^A-Za-z0-9_.-]+", "-", raw).strip("-") or "subrun"
|
||||||
|
|
||||||
|
|
||||||
|
def infer_group_label(hosts: List[str], index: int) -> str:
|
||||||
|
if not hosts:
|
||||||
|
return f"group{index}"
|
||||||
|
labels: List[str] = []
|
||||||
|
for host in hosts:
|
||||||
|
short = host.split("-", 1)[-1]
|
||||||
|
if short.startswith("w2k"):
|
||||||
|
label = "windows"
|
||||||
|
else:
|
||||||
|
label = re.sub(r"\d.*$", "", short) or short
|
||||||
|
if label not in labels:
|
||||||
|
labels.append(label)
|
||||||
|
return "-".join(labels) if labels else f"group{index}"
|
||||||
|
|
||||||
|
|
||||||
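Review note on `sanitize_key` and `infer_group_label`: the standalone sketch below repeats both helpers from this hunk and runs them on invented inputs to show what keys and labels they produce. The host names are assumptions chosen to look like `atvm-...` spec names; real inventory names may differ.

```python
import re
from typing import List


def sanitize_key(raw: str) -> str:
    # Collapse every run of disallowed characters into a single "-".
    return re.sub(r"[^A-Za-z0-9_.-]+", "-", raw).strip("-") or "subrun"


def infer_group_label(hosts: List[str], index: int) -> str:
    if not hosts:
        return f"group{index}"
    labels: List[str] = []
    for host in hosts:
        short = host.split("-", 1)[-1]  # drop the leading "atvm-" style prefix
        if short.startswith("w2k"):
            label = "windows"
        else:
            # Strip everything from the first digit onward, e.g. "rhel8-01" -> "rhel".
            label = re.sub(r"\d.*$", "", short) or short
        if label not in labels:
            labels.append(label)
    return "-".join(labels) if labels else f"group{index}"


key = sanitize_key("gold run/batch 1")
fallback_key = sanitize_key("///")
label = infer_group_label(["atvm-w2k19-01", "atvm-w2k22-01", "atvm-rhel8-01"], 1)
empty_label = infer_group_label([], 3)
```

With these inputs `key` is `gold-run-batch-1`, `fallback_key` is `subrun`, `label` is `windows-rhel`, and `empty_label` is `group3`. The label is only a fallback: `split_log_segments` uses it when no batch-style build name can be extracted from the segment.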
|
def extract_segment_build_name(segment_text: str, parent_build_name: str) -> Optional[str]:
|
||||||
|
patterns = [
|
||||||
|
rf"({re.escape(parent_build_name)}-[A-Za-z0-9_.-]*batch\d+_\d+)",
|
||||||
|
r"([A-Za-z0-9_.-]+-batch\d+_\d+)",
|
||||||
|
]
|
||||||
|
for pattern in patterns:
|
||||||
|
match = re.search(pattern, segment_text)
|
||||||
|
if match:
|
||||||
|
return match.group(1)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def split_log_segments(log_text: str, parent_build_name: str, categorized: bool, default_started_at: datetime) -> List[SubRun]:
    if not categorized:
        return [
            SubRun(
                key=sanitize_key(parent_build_name),
                display_name=parent_build_name,
                started_at=default_started_at,
                expected_hosts=extract_expected_hosts(log_text),
                completed=False,
                currents_url=extract_currents_url(log_text),
                notes=[],
            )
        ]

    segment_starts: List[Tuple[int, Optional[datetime]]] = []
    for match in re.finditer(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO - Extracted specPattern:", log_text, re.M):
        segment_starts.append((match.start(), parse_log_timestamp(match.group(1))))

    if not segment_starts:
        return [
            SubRun(
                key=sanitize_key(parent_build_name),
                display_name=parent_build_name,
                started_at=default_started_at,
                expected_hosts=extract_expected_hosts(log_text),
                completed=False,
                currents_url=extract_currents_url(log_text),
                notes=["Categorized mode was requested but no sub-run segment has appeared in the log yet."],
            )
        ]

    segments: List[SubRun] = []
    for index, (start_offset, start_ts) in enumerate(segment_starts, start=1):
        end_offset = segment_starts[index][0] if index < len(segment_starts) else len(log_text)
        segment_text = log_text[start_offset:end_offset]
        expected_hosts = extract_expected_hosts(segment_text)
        display_name = extract_segment_build_name(segment_text, parent_build_name)
        if not display_name:
            display_name = f"{parent_build_name}-{infer_group_label(expected_hosts, index)}"
        segments.append(
            SubRun(
                key=sanitize_key(display_name),
                display_name=display_name,
                started_at=start_ts or default_started_at,
                expected_hosts=expected_hosts,
                completed=index < len(segment_starts),
                currents_url=extract_currents_url(segment_text),
                notes=[f"Categorized sub-run {index} of {len(segment_starts)}."],
            )
        )
    return segments


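The core of `split_log_segments` is the boundary rule: each `Extracted specPattern:` log line opens a segment, a segment ends where the next one starts, and only the last segment is treated as not yet completed. The following is a reduced sketch of that rule alone. It leaves out `SubRun` and the extraction helpers, and `segment_bounds` and the sample log lines are made up for illustration.

```python
import re
from typing import List, Tuple

# Same marker regex as in the diff.
SEGMENT_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO - Extracted specPattern:",
    re.M,
)


def segment_bounds(log_text: str) -> List[Tuple[int, int, bool]]:
    """Return (start_offset, end_offset, completed) for each segment."""
    starts = [m.start() for m in SEGMENT_START.finditer(log_text)]
    bounds: List[Tuple[int, int, bool]] = []
    for index, start in enumerate(starts, start=1):
        # With a 1-based index, starts[index] is the *next* segment's start.
        end = starts[index] if index < len(starts) else len(log_text)
        # A segment counts as completed once a later segment has begun.
        bounds.append((start, end, index < len(starts)))
    return bounds


# Hypothetical log content.
sample_log = (
    "2024-01-01 10:00:00,000 - INFO - Extracted specPattern: a\n"
    "host output for group one\n"
    "2024-01-01 10:20:00,000 - INFO - Extracted specPattern: b\n"
    "host output for group two\n"
)
bounds = segment_bounds(sample_log)
print(bounds)
```

One consequence of this rule is that the final group can only be marked finished by other evidence, which is what the `parent_active` and host-result checks in `evaluate_subrun` handle.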
def evaluate_subrun(
    subrun: SubRun,
    reporter_root: Path,
    inventory: Dict[str, str],
    end_boundary: Optional[datetime],
    parent_active: bool,
    cancelled: bool,
) -> Tuple[str, Dict[str, HostResult], Optional[datetime], Optional[datetime], Optional[str], List[str]]:
    notes = list(subrun.notes)
    host_results = collect_host_results(
        reporter_root=reporter_root,
        expected_hosts=subrun.expected_hosts,
        kernels=inventory,
        run_started_at=subrun.started_at,
        run_ended_at=end_boundary,
    )
    check_end = find_check_xml_end(reporter_root, subrun.started_at, end_boundary)
    start_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
    end_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
    if check_end:
        end_candidates.append(check_end)
    start_ts = min(start_candidates) if start_candidates else subrun.started_at
    end_ts = max(end_candidates) if end_candidates else None

    if cancelled:
        notes.append("Cancellation marker detected.")
        return "CANCELLED", host_results, start_ts, end_ts, subrun.currents_url, notes

    if subrun.completed:
        if not host_results:
            notes.append("This categorized sub-run ended but no host results were detected.")
            return "UNKNOWN", host_results, start_ts, end_ts, subrun.currents_url, notes
        notes.append("Categorized sub-run completed and the next grouped run was launched.")
        if check_end:
            notes.append("Final `check-xml-files.ts` validation passed.")
        state = "FAILED" if any(result.failures for result in host_results.values()) else "COMPLETED"
        return state, host_results, start_ts, end_ts, subrun.currents_url, notes

    if parent_active:
        current_host = next((host for host in subrun.expected_hosts if host not in host_results), None)
        if current_host and current_host not in host_results:
            host_results[current_host] = HostResult(
                host=current_host,
                kernel=inventory.get(current_host, "unknown"),
                status="RUN",
                detail="in progress",
            )
        return "RUNNING", host_results, start_ts, end_ts, subrun.currents_url, notes

    if host_results:
        notes.append("Categorized sub-run completed after the parent runner exited.")
        if check_end:
            notes.append("Final `check-xml-files.ts` validation passed.")
        state = "FAILED" if any(result.failures for result in host_results.values()) else "COMPLETED"
        return state, host_results, start_ts, end_ts, subrun.currents_url, notes

    notes.append("Parent run exited before this categorized sub-run produced host results.")
    return "TERMINATED", host_results, start_ts, end_ts, subrun.currents_url, notes


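The order of the checks in `evaluate_subrun` determines the state it reports. As a reading aid, the sketch below reduces that precedence to a pure function over booleans. `decide_subrun_state` and its parameters are hypothetical names introduced here. It covers only the state string and omits the notes, timestamps, and the placeholder `RUN` host entry that the real function also produces.

```python
def decide_subrun_state(
    *,
    cancelled: bool,
    completed: bool,
    has_results: bool,
    any_failures: bool,
    parent_active: bool,
) -> str:
    # 1. A cancellation marker wins over everything else.
    if cancelled:
        return "CANCELLED"
    # 2. The next group has started, so this segment is closed.
    if completed:
        if not has_results:
            return "UNKNOWN"
        return "FAILED" if any_failures else "COMPLETED"
    # 3. Last (or only) segment while the parent runner is still alive.
    if parent_active:
        return "RUNNING"
    # 4. Parent exited: results decide between a verdict and TERMINATED.
    if has_results:
        return "FAILED" if any_failures else "COMPLETED"
    return "TERMINATED"


last_group_running = decide_subrun_state(
    cancelled=False, completed=False, has_results=True,
    any_failures=False, parent_active=True,
)
print(last_group_running)
```

Note that the last group stays `RUNNING` for as long as the parent process is active, even if its host results already exist, because nothing in the log marks its end.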
||||||
def determine_state(
|
def determine_state(
|
||||||
build_name: str,
|
build_name: str,
|
||||||
build_dir: Path,
|
build_dir: Path,
|
||||||
run_log: Path,
|
run_log: Path,
|
||||||
reporter_root: Path,
|
reporter_root: Path,
|
||||||
inventory: Dict[str, str],
|
inventory: Dict[str, str],
|
||||||
|
metadata: Dict[str, object],
|
||||||
started_at: datetime,
|
started_at: datetime,
|
||||||
process_gone_since: Optional[datetime],
|
process_gone_since: Optional[datetime],
|
||||||
process_exit_grace_seconds: int,
|
process_exit_grace_seconds: int,
|
||||||
) -> Tuple[str, Dict[str, HostResult], str, Optional[datetime], Optional[datetime], Optional[str], List[str]]:
|
) -> Tuple[str, List[Dict[str, object]], Dict[str, HostResult], Optional[datetime], Optional[datetime], Optional[str], List[str]]:
|
||||||
cancelled_marker = build_dir / "cancelled.marker"
|
cancelled_marker = build_dir / "cancelled.marker"
|
||||||
log_text = read_text(run_log)
|
log_text = read_text(run_log)
|
||||||
expected_hosts = extract_expected_hosts(log_text)
|
|
||||||
host_results = collect_host_results(reporter_root, expected_hosts, inventory, started_at)
|
|
||||||
active = process_active(build_name)
|
active = process_active(build_name)
|
||||||
currents_url = extract_currents_url(log_text)
|
cancelled = cancelled_marker.exists()
|
||||||
notes: List[str] = []
|
notes: List[str] = []
|
||||||
|
subrun_states: List[Dict[str, object]] = []
|
||||||
|
parent_host_results: Dict[str, HostResult] = {}
|
||||||
|
|
||||||
current_host = find_current_running_host(log_text, list(host_results.keys()))
|
subruns = split_log_segments(log_text, build_name, bool(metadata.get("categorized")), started_at)
|
||||||
if current_host and current_host not in host_results:
|
for index, subrun in enumerate(subruns):
|
||||||
host_results[current_host] = HostResult(
|
next_started_at = subruns[index + 1].started_at if index + 1 < len(subruns) else None
|
||||||
host=current_host,
|
state, host_results, start_ts, end_ts, currents_url, subrun_notes = evaluate_subrun(
|
||||||
kernel=inventory.get(current_host, "unknown"),
|
subrun=subrun,
|
||||||
status="RUN",
|
reporter_root=reporter_root,
|
||||||
detail="in progress",
|
inventory=inventory,
|
||||||
|
end_boundary=next_started_at,
|
||||||
|
parent_active=active,
|
||||||
|
cancelled=cancelled,
|
||||||
|
)
|
||||||
|
for host, result in host_results.items():
|
||||||
|
parent_host_results[host] = result
|
||||||
|
subrun_states.append(
|
||||||
|
{
|
||||||
|
"key": subrun.key,
|
||||||
|
"display_name": subrun.display_name,
|
||||||
|
"state": state,
|
||||||
|
"host_results": host_results,
|
||||||
|
"start_ts": start_ts,
|
||||||
|
"end_ts": end_ts,
|
||||||
|
"currents_url": currents_url,
|
||||||
|
"notes": subrun_notes,
|
||||||
|
}
|
||||||
)
|
)
|
||||||
|
|
||||||
start_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
|
parent_start_candidates = [subrun["start_ts"] for subrun in subrun_states if subrun["start_ts"]]
|
||||||
end_candidates = [result.timestamp for result in host_results.values() if result.timestamp]
|
parent_end_candidates = [subrun["end_ts"] for subrun in subrun_states if subrun["end_ts"]]
|
||||||
check_xml = reporter_root / "xml"
|
start_ts = min(parent_start_candidates) if parent_start_candidates else started_at
|
||||||
for xml_path in sorted(check_xml.glob("test-result-*.xml"), key=lambda p: p.stat().st_mtime, reverse=True):
|
end_ts = max(parent_end_candidates) if parent_end_candidates else find_check_xml_end(reporter_root, started_at)
|
||||||
xml_mtime = datetime.fromtimestamp(xml_path.stat().st_mtime, tz=timezone.utc)
|
currents_url = extract_currents_url(log_text)
|
||||||
if xml_mtime < started_at:
|
|
||||||
continue
|
|
||||||
text = read_text(xml_path)
|
|
||||||
if "check-xml-files.ts" in text:
|
|
||||||
try:
|
|
||||||
tree = ET.parse(xml_path)
|
|
||||||
root = tree.getroot()
|
|
||||||
suite = root.find("testsuite")
|
|
||||||
if suite is not None:
|
|
||||||
ts = parse_xml_timestamp(suite.attrib.get("timestamp"))
|
|
||||||
if ts:
|
|
||||||
end_candidates.append(ts)
|
|
||||||
except ET.ParseError:
|
|
||||||
pass
|
|
||||||
break
|
|
||||||
|
|
||||||
start_ts = min(start_candidates) if start_candidates else started_at
|
if cancelled:
|
||||||
end_ts = max(end_candidates) if end_candidates else None
|
|
||||||
|
|
||||||
if cancelled_marker.exists():
|
|
||||||
notes.append("Cancellation marker detected.")
|
notes.append("Cancellation marker detected.")
|
||||||
return "CANCELLED", host_results, log_text, start_ts, end_ts, currents_url, notes
|
return "CANCELLED", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes
|
||||||
|
|
||||||
if active:
|
if active:
|
||||||
elapsed = (now_utc() - started_at).total_seconds()
|
elapsed = (now_utc() - started_at).total_seconds()
|
||||||
if elapsed > args.max_watch_seconds:
|
if elapsed > args.max_watch_seconds:
|
||||||
notes.append("Watcher exceeded max watch duration while the run still appears active.")
|
notes.append("Watcher exceeded max watch duration while the run still appears active.")
|
||||||
return "HUNG", host_results, log_text, start_ts, end_ts, currents_url, notes
|
return "HUNG", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes
|
||||||
return "RUNNING", host_results, log_text, start_ts, end_ts, currents_url, notes
|
return "RUNNING", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes
|
||||||
|
|
||||||
if "Cloud Run Finished" in log_text or currents_url:
|
terminal_subruns = [subrun for subrun in subrun_states if subrun["state"] in {"COMPLETED", "FAILED"}]
|
||||||
state = "FAILED" if any(result.failures for result in host_results.values()) else "COMPLETED"
|
if terminal_subruns:
|
||||||
notes.append("Run finished and final reporting artifacts were detected.")
|
state = "FAILED" if any(result.failures for result in parent_host_results.values()) else "COMPLETED"
|
||||||
if any("check-xml-files.ts" in line for line in log_text.splitlines()):
|
notes.append("Run finished and one or more sub-run result artifacts were detected.")
|
||||||
notes.append("Final `check-xml-files.ts` validation passed.")
|
if end_ts:
|
||||||
return state, host_results, log_text, start_ts, end_ts, currents_url, notes
|
notes.append("Final reporting artifacts were detected.")
|
||||||
|
return state, subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes
|
||||||
|
|
||||||
if process_gone_since and (now_utc() - process_gone_since).total_seconds() >= process_exit_grace_seconds:
|
if process_gone_since and (now_utc() - process_gone_since).total_seconds() >= process_exit_grace_seconds:
|
||||||
notes.append("Run process exited without a clean completion signal.")
|
notes.append("Run process exited without a clean completion signal.")
|
||||||
return "TERMINATED", host_results, log_text, start_ts, end_ts, currents_url, notes
|
return "TERMINATED", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes
|
||||||
|
|
||||||
return "RUNNING", host_results, log_text, start_ts, end_ts, currents_url, notes
|
return "RUNNING", subrun_states, parent_host_results, start_ts, end_ts, currents_url, notes
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
@@ -455,12 +646,13 @@ if __name__ == "__main__":
|
|||||||
if active:
|
if active:
|
||||||
process_gone_since = None
|
process_gone_since = None
|
||||||
|
|
||||||
run_state, host_results, log_text, start_ts, end_ts, currents_url, notes = determine_state(
|
run_state, subrun_states, host_results, start_ts, end_ts, currents_url, notes = determine_state(
|
||||||
build_name=build_name,
|
build_name=build_name,
|
||||||
build_dir=build_dir,
|
build_dir=build_dir,
|
||||||
run_log=run_log,
|
run_log=run_log,
|
||||||
reporter_root=reporter_root,
|
reporter_root=reporter_root,
|
||||||
inventory=inventory,
|
inventory=inventory,
|
||||||
|
metadata=metadata,
|
||||||
started_at=started_at,
|
started_at=started_at,
|
||||||
process_gone_since=process_gone_since,
|
process_gone_since=process_gone_since,
|
||||||
process_exit_grace_seconds=args.process_exit_grace_seconds,
|
process_exit_grace_seconds=args.process_exit_grace_seconds,
|
||||||
@@ -478,8 +670,64 @@ if __name__ == "__main__":
|
|||||||
}
|
}
|
||||||
for host, result in host_results.items()
|
for host, result in host_results.items()
|
||||||
}
|
}
|
||||||
|
state["subruns"] = {
|
||||||
|
subrun["display_name"]: {
|
||||||
|
"state": subrun["state"],
|
||||||
|
"hosts": sorted(subrun["host_results"].keys()),
|
||||||
|
"start_ts": subrun["start_ts"].isoformat() if subrun["start_ts"] else None,
|
||||||
|
"end_ts": subrun["end_ts"].isoformat() if subrun["end_ts"] else None,
|
||||||
|
"currents_url": subrun["currents_url"],
|
||||||
|
"notes": subrun["notes"],
|
||||||
|
}
|
||||||
|
for subrun in subrun_states
|
||||||
|
}
|
||||||
write_state(state_file, state)
|
write_state(state_file, state)
|
||||||
|
|
||||||
|
for subrun in subrun_states:
|
||||||
|
subrun_dir = build_dir / "subruns" / subrun["key"]
|
||||||
|
ensure_dir(subrun_dir)
|
||||||
|
subrun_state_file = subrun_dir / "state.json"
|
||||||
|
subrun_posted_marker = subrun_dir / "posted.marker"
|
||||||
|
subrun_state = {
|
||||||
|
"display_name": subrun["display_name"],
|
||||||
|
"last_state": subrun["state"],
|
||||||
|
"last_seen_at": now_utc().isoformat(),
|
||||||
|
"host_results": {
|
||||||
|
host: {
|
||||||
|
"status": result.status,
|
||||||
|
"detail": result.detail,
|
||||||
|
"kernel": result.kernel,
|
||||||
|
"tests": result.tests,
|
||||||
|
"failures": result.failures,
|
||||||
|
}
|
||||||
|
for host, result in subrun["host_results"].items()
|
||||||
|
},
|
||||||
|
"notes": subrun["notes"],
|
||||||
|
"currents_url": subrun["currents_url"],
|
||||||
|
"started_at": subrun["start_ts"].isoformat() if subrun["start_ts"] else None,
|
||||||
|
"ended_at": subrun["end_ts"].isoformat() if subrun["end_ts"] else None,
|
||||||
|
}
|
||||||
|
if subrun["state"] in {"COMPLETED", "FAILED"} and not subrun_posted_marker.exists():
|
||||||
|
status_text = build_status_markdown(
|
||||||
|
build_name=subrun["display_name"],
|
||||||
|
metadata=metadata,
|
||||||
|
host_results=dict(sorted(subrun["host_results"].items())),
|
||||||
|
run_state=subrun["state"],
|
||||||
|
currents_url=subrun["currents_url"],
|
||||||
|
start_ts=subrun["start_ts"],
|
||||||
|
end_ts=subrun["end_ts"],
|
||||||
|
notes=subrun["notes"],
|
||||||
|
)
|
||||||
|
print(status_text)
|
||||||
|
response = post_to_mattermost(status_text)
|
||||||
|
if response != "ok":
|
||||||
|
raise SystemExit(f"Mattermost webhook did not return ok for {subrun['display_name']}: {response!r}")
|
||||||
|
subrun_posted_marker.write_text("ok\n", encoding="utf-8")
|
||||||
|
subrun_state["mattermost_posted"] = True
|
||||||
|
subrun_state["mattermost_response"] = response
|
||||||
|
print(f"[watcher] Mattermost post confirmed for {subrun['display_name']}.")
|
||||||
|
write_state(subrun_state_file, subrun_state)
|
||||||
|
|
||||||
if run_state == "RUNNING":
|
if run_state == "RUNNING":
|
||||||
print(f"[watcher] {build_name}: RUNNING")
|
print(f"[watcher] {build_name}: RUNNING")
|
||||||
time.sleep(args.poll_interval)
|
time.sleep(args.poll_interval)
|
||||||
@@ -497,7 +745,7 @@ if __name__ == "__main__":
|
|||||||
)
|
)
|
||||||
print(status_text)
|
print(status_text)
|
||||||
|
|
||||||
if run_state in {"COMPLETED", "FAILED"} and not posted_marker.exists():
|
if not metadata.get("categorized") and run_state in {"COMPLETED", "FAILED"} and not posted_marker.exists():
|
||||||
response = post_to_mattermost(status_text)
|
response = post_to_mattermost(status_text)
|
||||||
if response != "ok":
|
if response != "ok":
|
||||||
raise SystemExit(f"Mattermost webhook did not return ok: {response!r}")
|
raise SystemExit(f"Mattermost webhook did not return ok: {response!r}")
|
||||||
|
|||||||
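The per-subrun loop above posts each grouped run at most once by checking a `posted.marker` file before posting and writing it only after the webhook returns `ok`. Below is a minimal standalone sketch of that pattern. `post_once` and `fake_post` are hypothetical names, the real `post_to_mattermost` is replaced by a stub, and the sketch raises `RuntimeError` where the watcher raises `SystemExit`.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def post_once(marker: Path, text: str, post: Callable[[str], str]) -> bool:
    """Post `text` unless `marker` already exists; return True if a post was sent."""
    if marker.exists():
        return False
    response = post(text)
    if response != "ok":
        # Marker is not written, so a later poll can retry the post.
        raise RuntimeError(f"webhook did not return ok: {response!r}")
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text("ok\n", encoding="utf-8")
    return True


sent: List[str] = []


def fake_post(text: str) -> str:
    # Stand-in for the real webhook call; records what would be sent.
    sent.append(text)
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "subruns" / "group1" / "posted.marker"
    first = post_once(marker, "group1 COMPLETED", fake_post)
    second = post_once(marker, "group1 COMPLETED", fake_post)

print(first, second, len(sent))
```

Writing the marker after the post rather than before means a crash between the two steps could produce a duplicate post on restart, while a failed post is never silently recorded as done.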
@@ -13,6 +13,7 @@ Options:
|
|||||||
--migration-style <text>
|
--migration-style <text>
|
||||||
--integration-plugin <text>
|
--integration-plugin <text>
|
||||||
--scope-description <text>
|
--scope-description <text>
|
||||||
|
--categorize
|
||||||
--state-root <path> Default: /var/lib/atvm-run-watcher
|
--state-root <path> Default: /var/lib/atvm-run-watcher
|
||||||
EOF
|
EOF
|
||||||
}
|
}
|
||||||
@@ -23,6 +24,7 @@ CONFIG_FAMILY=""
|
|||||||
MIGRATION_STYLE=""
|
MIGRATION_STYLE=""
|
||||||
INTEGRATION_PLUGIN=""
|
INTEGRATION_PLUGIN=""
|
||||||
SCOPE_DESCRIPTION=""
|
SCOPE_DESCRIPTION=""
|
||||||
|
WATCHER_CATEGORIZED="false"
|
||||||
STATE_ROOT="/var/lib/atvm-run-watcher"
|
STATE_ROOT="/var/lib/atvm-run-watcher"
|
||||||
|
|
||||||
while [[ $# -gt 0 ]]; do
|
while [[ $# -gt 0 ]]; do
|
||||||
@@ -33,6 +35,7 @@ while [[ $# -gt 0 ]]; do
|
|||||||
--migration-style) MIGRATION_STYLE="${2:-}"; shift 2 ;;
|
--migration-style) MIGRATION_STYLE="${2:-}"; shift 2 ;;
|
||||||
--integration-plugin) INTEGRATION_PLUGIN="${2:-}"; shift 2 ;;
|
--integration-plugin) INTEGRATION_PLUGIN="${2:-}"; shift 2 ;;
|
||||||
--scope-description) SCOPE_DESCRIPTION="${2:-}"; shift 2 ;;
|
--scope-description) SCOPE_DESCRIPTION="${2:-}"; shift 2 ;;
|
||||||
|
--categorize) WATCHER_CATEGORIZED="true"; shift ;;
|
||||||
--state-root) STATE_ROOT="${2:-}"; shift 2 ;;
|
--state-root) STATE_ROOT="${2:-}"; shift 2 ;;
|
||||||
-h|--help) usage; exit 0 ;;
|
-h|--help) usage; exit 0 ;;
|
||||||
*) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
|
*) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
|
||||||
@@ -54,6 +57,7 @@ ATVM_WATCHER_CONFIG_FAMILY=${CONFIG_FAMILY@Q}
|
|||||||
ATVM_WATCHER_MIGRATION_STYLE=${MIGRATION_STYLE@Q}
|
ATVM_WATCHER_MIGRATION_STYLE=${MIGRATION_STYLE@Q}
|
||||||
ATVM_WATCHER_INTEGRATION_PLUGIN=${INTEGRATION_PLUGIN@Q}
|
ATVM_WATCHER_INTEGRATION_PLUGIN=${INTEGRATION_PLUGIN@Q}
|
||||||
ATVM_WATCHER_SCOPE_DESCRIPTION=${SCOPE_DESCRIPTION@Q}
|
ATVM_WATCHER_SCOPE_DESCRIPTION=${SCOPE_DESCRIPTION@Q}
|
||||||
|
ATVM_WATCHER_CATEGORIZED=${WATCHER_CATEGORIZED@Q}
|
||||||
EOF
|
EOF
|
||||||
|
|
||||||
systemctl start "atvm-run-watcher@${BUILD_NAME}.service"
|
systemctl start "atvm-run-watcher@${BUILD_NAME}.service"
|
||||||
|
|||||||
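The start helper writes `ATVM_WATCHER_CATEGORIZED` into the watcher's environment file, and the Python watcher later checks `metadata.get("categorized")`. The diff does not show how the two are connected, so the following is only a guess at one way to do it: `env_flag`, the accepted truthy strings, and the quote stripping are all assumptions, not the watcher's actual code.

```python
import os
from typing import Mapping, Optional

# Assumed set of truthy spellings; the helper itself only writes "true" or "false".
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    # Tolerate surrounding quotes in case they survive the environment file.
    return raw.strip().strip("'\"").lower() in TRUE_VALUES


# Hypothetical wiring into the metadata dict used by the watcher.
metadata = {
    "categorized": env_flag("ATVM_WATCHER_CATEGORIZED", {"ATVM_WATCHER_CATEGORIZED": "true"}),
}
print(metadata)
```

Parsing the flag into a real boolean matters because `bool("false")` is `True` in Python, so passing the raw string through to `bool(metadata.get("categorized"))` would enable categorized mode for every run.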