Prevent ATVM watcher and runner log race
@@ -80,6 +80,7 @@ This file defines how to operate and maintain the ATVM workspace in `/home/aw/co
 - When the watcher is requested, build the watcher-start command so it automatically includes the exact approved `cmc-templates.py` command via `--template-command` and the exact approved `run-sorry-cypress.py` command via `--runner-command`; the operator should not need to restate them separately.
 - When watcher-backed execution is used, prefer the controller-local `atvm-runner@...` systemd service over detached SSH background launch patterns for `run-sorry-cypress.py`.
 - Do not start the runner before the watcher, because the watcher helper clears stale `/tmp/<build-name>.log` and can delete the fresh live runner log if the runner starts first.
+- Prefer the combined `start-atvm-run.sh` wrapper when starting both services so watcher and runner are never launched in parallel against the same `/tmp/<build-name>.log`.
 - For host-level test detail and failed-test investigation, use `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`, especially `logs/`, `xml/`, and `mochawesome/`.
 - Apply failed-host detail recovery consistently for every ATVM template run, not just `cmc-reboot`.
 - For any failed ATVM host, recover failure detail in this order when available: consolidated run log, `mochawesome`, structured reporter artifacts (`json`/`xml`), then text reporter artifacts.
@@ -216,6 +216,7 @@ Before any new automation request:
 18. If monitoring was requested, keep monitoring until completion and report final outcome.
 19. When the watcher is requested, launch the watcher before the runner service.
 20. Do not start the runner before the watcher, because the watcher helper clears stale `/tmp/<build-name>.log` and can delete the fresh live runner log if the runner starts first.
+21. Prefer the combined `start-atvm-run.sh` wrapper when both services are used, so watcher and runner are not launched in parallel.
 
 ## Requested Test Style
 When asked for one VM or a VM set:
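Review note: the ordering rule in steps 19 to 21 can be reproduced in a scratch directory. The sketch below does not call the real watcher helper or runner; `clear_stale_log`, `runner_writes`, and `demo_order` are hypothetical stand-ins, and the only thing modelled is the order in which the log path is cleared and written. It assumes the helper removes the log file outright, which is how the rule above describes it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the real helpers; only the ordering matters here.
clear_stale_log() { rm -f "$1"; }                      # watcher helper clearing the stale log
runner_writes()   { echo "runner started" >> "$1"; }   # runner creating its live log

demo_order() {
  local order="$1" log
  log="$(mktemp -d)/build.log"
  echo "stale output from an earlier run" > "$log"
  if [[ "$order" == "watcher-first" ]]; then
    clear_stale_log "$log"
    runner_writes "$log"
  else
    runner_writes "$log"
    clear_stale_log "$log"
  fi
  # Report whether a non-empty log survives at the shared path.
  if [[ -s "$log" ]]; then echo "log kept"; else echo "log lost"; fi
}

echo "watcher-first: $(demo_order watcher-first)"
echo "runner-first:  $(demo_order runner-first)"
```

With the watcher-first order the runner's log survives at the shared path; with the runner-first order the cleanup removes it, which is the failure mode the wrapper is meant to prevent.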
@@ -47,6 +47,7 @@ From the local workspace:
 - `/home/aw/code/cds/atvm/watcher-service/atvm-runner@.service`
 - `/home/aw/code/cds/atvm/watcher-service/start-atvm-runner.sh`
 - `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-runner.sh`
+- `/home/aw/code/cds/atvm/watcher-service/start-atvm-run.sh`
 - `/home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service`
 - `/home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh`
 - `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh`
@@ -103,6 +104,7 @@ Recommended permissions:
 - `run-atvm-runner.sh`
 - `start-atvm-runner.sh`
 - `cancel-atvm-runner.sh`
+- `start-atvm-run.sh`
 - `atvm_run_watcher.py`
 - `start-atvm-run-watcher.sh`
 - `cancel-atvm-run-watcher.sh`
@@ -149,6 +151,7 @@ mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
 chmod 755 /opt/atvm-watcher-service/run-atvm-runner.sh
 chmod 755 /opt/atvm-watcher-service/start-atvm-runner.sh
 chmod 755 /opt/atvm-watcher-service/cancel-atvm-runner.sh
+chmod 755 /opt/atvm-watcher-service/start-atvm-run.sh
 chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
 chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
 chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
@@ -164,6 +167,10 @@ systemctl cat atvm-run-watcher@.service
 python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
 ```
 
+```bash
+/opt/atvm-watcher-service/start-atvm-run.sh --help
+```
+
 ```bash
 /opt/atvm-watcher-service/start-atvm-runner.sh --help
 ```
@@ -201,6 +208,21 @@ Example:
   --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
 ```
 
+Preferred combined start:
+
+```bash
+/opt/atvm-watcher-service/start-atvm-run.sh \
+  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
+  --template cmc-e2e \
+  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
+  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
+  --config-family gold \
+  --config-file cypress.atvm-config-gold.ts \
+  --migration-style "ATVM end-to-end migration validation" \
+  --integration-plugin "pure with fc" \
+  --categorize
+```
+
 Cancel example:
 
 ```bash
@@ -226,6 +248,7 @@ The cancel helper should:
 - One runner instance is started per ATVM run.
 - One watcher instance is started per ATVM run.
 - Prefer the `atvm-runner@...` service over detached SSH background launch patterns for `run-sorry-cypress.py`.
+- Prefer `start-atvm-run.sh` over launching watcher and runner separately when both are needed, because it enforces the safe watcher-first order.
 - Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
 - In categorized execution, the watcher must remain alive until the parent request has actually gone inactive past the grace window, even if one grouped sub-run already completed.
 - The watcher exits after the run reaches a terminal state.
@@ -31,6 +31,8 @@ The watcher does not run indefinitely. It is designed for one run per service in
   - helper to write per-run runner environment data and start a runner instance
 - `cancel-atvm-runner.sh`
   - helper to stop a runner instance
+- `start-atvm-run.sh`
+  - wrapper that starts watcher first, waits for it to be active, then starts the runner
 - `start-atvm-run-watcher.sh`
   - helper to write per-run environment data and start a watcher instance
 - `cancel-atvm-run-watcher.sh`
@@ -127,6 +129,21 @@ These helpers write per-run environment files and start the matching instances:
   --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
 ```
 
+Preferred one-shot wrapper:
+
+```bash
+./start-atvm-run.sh \
+  --build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
+  --template cmc-e2e \
+  --template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
+  --runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
+  --config-family gold \
+  --config-file cypress.atvm-config-gold.ts \
+  --migration-style "ATVM end-to-end migration validation" \
+  --integration-plugin "pure with fc" \
+  --categorize
+```
+
 That results in:
 
 - state dir:
@@ -159,6 +176,7 @@ Runner cancel example:
 
 - The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
 - Prefer the controller-local `atvm-runner@...` service over ad hoc `nohup` or detached SSH launch patterns for `run-sorry-cypress.py`.
+- Prefer `start-atvm-run.sh` when launching both services together because it prevents the watcher/runner log-path race by enforcing watcher-first ordering.
 - Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
 - Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
 - In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
129
atvm/watcher-service/start-atvm-run.sh
Normal file
@@ -0,0 +1,129 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'EOF'
+Usage:
+  start-atvm-run.sh --build-name <name> --runner-command <text> [options]
+
+Options:
+  --build-name <name>
+  --template <name>
+  --template-command <text>
+  --runner-command <text>
+  --config-family <name>
+  --config-file <path>
+  --migration-style <text>
+  --integration-plugin <text>
+  --extra-option <text>       Repeatable
+  --scope-description <text>
+  --categorize
+  --workdir <path>            Default: /root/cdc-e2e-cyp-12.17.4
+  --log-path <path>           Default: /tmp/<build-name>.log
+  --state-root <path>         Default: /var/lib/atvm-run-watcher
+EOF
+}
+
+BUILD_NAME=""
+TEMPLATE=""
+TEMPLATE_COMMAND=""
+RUNNER_COMMAND=""
+CONFIG_FAMILY=""
+CONFIG_FILE=""
+MIGRATION_STYLE=""
+INTEGRATION_PLUGIN=""
+EXTRA_OPTIONS=()
+SCOPE_DESCRIPTION=""
+WATCHER_CATEGORIZED="false"
+RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
+RUNNER_LOG=""
+STATE_ROOT="/var/lib/atvm-run-watcher"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --build-name) BUILD_NAME="${2:-}"; shift 2 ;;
+    --template) TEMPLATE="${2:-}"; shift 2 ;;
+    --template-command) TEMPLATE_COMMAND="${2:-}"; shift 2 ;;
+    --runner-command) RUNNER_COMMAND="${2:-}"; shift 2 ;;
+    --config-family) CONFIG_FAMILY="${2:-}"; shift 2 ;;
+    --config-file) CONFIG_FILE="${2:-}"; shift 2 ;;
+    --migration-style) MIGRATION_STYLE="${2:-}"; shift 2 ;;
+    --integration-plugin) INTEGRATION_PLUGIN="${2:-}"; shift 2 ;;
+    --extra-option) EXTRA_OPTIONS+=("${2:-}"); shift 2 ;;
+    --scope-description) SCOPE_DESCRIPTION="${2:-}"; shift 2 ;;
+    --categorize) WATCHER_CATEGORIZED="true"; shift ;;
+    --workdir) RUNNER_WORKDIR="${2:-}"; shift 2 ;;
+    --log-path) RUNNER_LOG="${2:-}"; shift 2 ;;
+    --state-root) STATE_ROOT="${2:-}"; shift 2 ;;
+    -h|--help) usage; exit 0 ;;
+    *) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
+  esac
+done
+
+if [[ -z "$BUILD_NAME" ]]; then
+  echo "--build-name is required" >&2
+  usage >&2
+  exit 1
+fi
+
+if [[ -z "$RUNNER_COMMAND" ]]; then
+  echo "--runner-command is required" >&2
+  usage >&2
+  exit 1
+fi
+
+if [[ -z "$RUNNER_LOG" ]]; then
+  RUNNER_LOG="/tmp/${BUILD_NAME}.log"
+fi
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+watcher_cmd=(
+  "${SCRIPT_DIR}/start-atvm-run-watcher.sh"
+  --build-name "${BUILD_NAME}"
+  --template "${TEMPLATE}"
+  --template-command "${TEMPLATE_COMMAND}"
+  --runner-command "${RUNNER_COMMAND}"
+  --config-family "${CONFIG_FAMILY}"
+  --config-file "${CONFIG_FILE}"
+  --migration-style "${MIGRATION_STYLE}"
+  --integration-plugin "${INTEGRATION_PLUGIN}"
+  --state-root "${STATE_ROOT}"
+)
+
+for option in "${EXTRA_OPTIONS[@]}"; do
+  watcher_cmd+=(--extra-option "${option}")
+done
+
+if [[ -n "${SCOPE_DESCRIPTION}" ]]; then
+  watcher_cmd+=(--scope-description "${SCOPE_DESCRIPTION}")
+fi
+
+if [[ "${WATCHER_CATEGORIZED}" == "true" ]]; then
+  watcher_cmd+=(--categorize)
+fi
+
+runner_cmd=(
+  "${SCRIPT_DIR}/start-atvm-runner.sh"
+  --build-name "${BUILD_NAME}"
+  --runner-command "${RUNNER_COMMAND}"
+  --workdir "${RUNNER_WORKDIR}"
+  --log-path "${RUNNER_LOG}"
+  --state-root "${STATE_ROOT}"
+)
+
+"${watcher_cmd[@]}"
+
+for _ in {1..15}; do
+  if systemctl is-active --quiet "atvm-run-watcher@${BUILD_NAME}.service"; then
+    break
+  fi
+  sleep 1
+done
+
+if ! systemctl is-active --quiet "atvm-run-watcher@${BUILD_NAME}.service"; then
+  echo "Watcher service did not become active for ${BUILD_NAME}" >&2
+  exit 1
+fi
+
+"${runner_cmd[@]}"
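Review note: the wait-for-watcher step at the end of the script polls `systemctl is-active` up to 15 times and then checks once more before starting the runner. One possible way to factor that pattern is a small polling helper, sketched below. `wait_until` and `fake_is_active` are hypothetical names that do not exist in this commit, and the stub replaces `systemctl` only so the sketch runs on a machine without systemd.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a check command until it succeeds or the
# attempt budget is used up. Returns 0 on success, 1 on timeout.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 0; i < attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Stand-in for `systemctl is-active --quiet <unit>`: succeeds on the third call.
calls=0
fake_is_active() {
  calls=$((calls + 1))
  [[ "$calls" -ge 3 ]]
}

if wait_until 15 0 fake_is_active; then
  echo "active after ${calls} checks"
else
  echo "timed out" >&2
fi
```

Against a real unit, the equivalent call would be `wait_until 15 1 systemctl is-active --quiet "atvm-run-watcher@${BUILD_NAME}.service"`, with the runner started only when it returns 0. This keeps the timeout and the failure check in one place; whether that is worth a helper in a 129-line script is a judgment call.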