Prevent ATVM watcher and runner log race
This commit is contained in:
@@ -80,6 +80,7 @@ This file defines how to operate and maintain the ATVM workspace in `/home/aw/co
|
||||
- When the watcher is requested, build the watcher-start command so it automatically includes the exact approved `cmc-templates.py` command via `--template-command` and the exact approved `run-sorry-cypress.py` command via `--runner-command`; the operator should not need to restate them separately.
|
||||
- When watcher-backed execution is used, prefer the controller-local `atvm-runner@...` systemd service over detached SSH background launch patterns for `run-sorry-cypress.py`.
|
||||
- Do not start the runner before the watcher, because the watcher helper clears stale `/tmp/<build-name>.log` and can delete the fresh live runner log if the runner starts first.
|
||||
- Prefer the combined `start-atvm-run.sh` wrapper when starting both services so watcher and runner are never launched in parallel against the same `/tmp/<build-name>.log`.
|
||||
- For host-level test detail and failed-test investigation, use `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`, especially `logs/`, `xml/`, and `mochawesome/`.
|
||||
- Apply failed-host detail recovery consistently for every ATVM template run, not just `cmc-reboot`.
|
||||
- For any failed ATVM host, recover failure detail in this order when available: consolidated run log, `mochawesome`, structured reporter artifacts (`json`/`xml`), then text reporter artifacts.
|
||||
|
||||
@@ -216,6 +216,7 @@ Before any new automation request:
|
||||
18. If monitoring was requested, keep monitoring until completion and report final outcome.
|
||||
19. When the watcher is requested, launch the watcher before the runner service.
|
||||
20. Do not start the runner before the watcher, because the watcher helper clears stale `/tmp/<build-name>.log` and can delete the fresh live runner log if the runner starts first.
|
||||
21. Prefer the combined `start-atvm-run.sh` wrapper when both services are used, so watcher and runner are not launched in parallel.
|
||||
|
||||
## Requested Test Style
|
||||
When asked for one VM or a VM set:
|
||||
|
||||
@@ -47,6 +47,7 @@ From the local workspace:
|
||||
- `/home/aw/code/cds/atvm/watcher-service/atvm-runner@.service`
|
||||
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-runner.sh`
|
||||
- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-runner.sh`
|
||||
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-run.sh`
|
||||
- `/home/aw/code/cds/atvm/watcher-service/atvm-run-watcher@.service`
|
||||
- `/home/aw/code/cds/atvm/watcher-service/start-atvm-run-watcher.sh`
|
||||
- `/home/aw/code/cds/atvm/watcher-service/cancel-atvm-run-watcher.sh`
|
||||
@@ -103,6 +104,7 @@ Recommended permissions:
|
||||
- `run-atvm-runner.sh`
|
||||
- `start-atvm-runner.sh`
|
||||
- `cancel-atvm-runner.sh`
|
||||
- `start-atvm-run.sh`
|
||||
- `atvm_run_watcher.py`
|
||||
- `start-atvm-run-watcher.sh`
|
||||
- `cancel-atvm-run-watcher.sh`
|
||||
@@ -149,6 +151,7 @@ mkdir -p /opt/atvm-watcher-service /var/lib/atvm-run-watcher
|
||||
chmod 755 /opt/atvm-watcher-service/run-atvm-runner.sh
|
||||
chmod 755 /opt/atvm-watcher-service/start-atvm-runner.sh
|
||||
chmod 755 /opt/atvm-watcher-service/cancel-atvm-runner.sh
|
||||
chmod 755 /opt/atvm-watcher-service/start-atvm-run.sh
|
||||
chmod 755 /opt/atvm-watcher-service/atvm_run_watcher.py
|
||||
chmod 755 /opt/atvm-watcher-service/start-atvm-run-watcher.sh
|
||||
chmod 755 /opt/atvm-watcher-service/cancel-atvm-run-watcher.sh
|
||||
@@ -164,6 +167,10 @@ systemctl cat atvm-run-watcher@.service
|
||||
python3 /opt/atvm-watcher-service/atvm_run_watcher.py --help
|
||||
```
|
||||
|
||||
```bash
|
||||
/opt/atvm-watcher-service/start-atvm-run.sh --help
|
||||
```
|
||||
|
||||
```bash
|
||||
/opt/atvm-watcher-service/start-atvm-runner.sh --help
|
||||
```
|
||||
@@ -201,6 +208,21 @@ Example:
|
||||
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
|
||||
```
|
||||
|
||||
Preferred combined start:
|
||||
|
||||
```bash
|
||||
/opt/atvm-watcher-service/start-atvm-run.sh \
|
||||
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
|
||||
--template cmc-e2e \
|
||||
--template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
|
||||
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
|
||||
--config-family gold \
|
||||
--config-file cypress.atvm-config-gold.ts \
|
||||
--migration-style "ATVM end-to-end migration validation" \
|
||||
--integration-plugin "pure with fc" \
|
||||
--categorize
|
||||
```
|
||||
|
||||
Cancel example:
|
||||
|
||||
```bash
|
||||
@@ -226,6 +248,7 @@ The cancel helper should:
|
||||
- One runner instance is started per ATVM run.
|
||||
- One watcher instance is started per ATVM run.
|
||||
- Prefer the `atvm-runner@...` service over detached SSH background launch patterns for `run-sorry-cypress.py`.
|
||||
- Prefer `start-atvm-run.sh` over launching watcher and runner separately when both are needed, because it enforces the safe watcher-first order.
|
||||
- Categorized execution is treated as one watcher instance tracking sequential grouped ATVM sub-runs.
|
||||
- In categorized execution, the watcher must remain alive until the parent request has actually gone inactive past the grace window, even if one grouped sub-run already completed.
|
||||
- The watcher exits after the run reaches a terminal state.
|
||||
|
||||
@@ -31,6 +31,8 @@ The watcher does not run indefinitely. It is designed for one run per service in
|
||||
- helper to write per-run runner environment data and start a runner instance
|
||||
- `cancel-atvm-runner.sh`
|
||||
- helper to stop a runner instance
|
||||
- `start-atvm-run.sh`
|
||||
- wrapper that starts watcher first, waits for it to be active, then starts the runner
|
||||
- `start-atvm-run-watcher.sh`
|
||||
- helper to write per-run environment data and start a watcher instance
|
||||
- `cancel-atvm-run-watcher.sh`
|
||||
@@ -127,6 +129,21 @@ These helpers write per-run environment files and start the matching instances:
|
||||
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize"
|
||||
```
|
||||
|
||||
Preferred one-shot wrapper:
|
||||
|
||||
```bash
|
||||
./start-atvm-run.sh \
|
||||
--build-name e2e-redhat9.6-ubuntu24.04-w2k25-fc \
|
||||
--template cmc-e2e \
|
||||
--template-command "python3 ./cmc-templates.py --template_name cmc-e2e --config_file cypress.atvm-config-gold.ts" \
|
||||
--runner-command "python3 ./run-sorry-cypress.py --config_file cypress.atvm-config-gold.ts --build_name e2e-redhat9.6-ubuntu24.04-w2k25-fc --categorize" \
|
||||
--config-family gold \
|
||||
--config-file cypress.atvm-config-gold.ts \
|
||||
--migration-style "ATVM end-to-end migration validation" \
|
||||
--integration-plugin "pure with fc" \
|
||||
--categorize
|
||||
```
|
||||
|
||||
That results in:
|
||||
|
||||
- state dir:
|
||||
@@ -159,6 +176,7 @@ Runner cancel example:
|
||||
|
||||
- The watcher uses the same ATVM status layout documented in `atvm/docs/automation/status-template.md`.
|
||||
- Prefer the controller-local `atvm-runner@...` service over ad hoc `nohup` or detached SSH launch patterns for `run-sorry-cypress.py`.
|
||||
- Prefer `start-atvm-run.sh` when launching both services together because it prevents the watcher/runner log-path race by enforcing watcher-first ordering.
|
||||
- Kernel values are resolved from `atvm/inventory/vm-inventory.md`.
|
||||
- Categorized execution is treated as sequential grouped ATVM sub-runs, not as one parent run with internal phases.
|
||||
- In categorized mode, the watcher writes per-subrun state under `subruns/` and posts each completed grouped run separately.
|
||||
|
||||
129
atvm/watcher-service/start-atvm-run.sh
Normal file
129
atvm/watcher-service/start-atvm-run.sh
Normal file
@@ -0,0 +1,129 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
usage() {
|
||||
cat <<'EOF'
|
||||
Usage:
|
||||
start-atvm-run.sh --build-name <name> --runner-command <text> [options]
|
||||
|
||||
Options:
|
||||
--build-name <name>
|
||||
--template <name>
|
||||
--template-command <text>
|
||||
--runner-command <text>
|
||||
--config-family <name>
|
||||
--config-file <path>
|
||||
--migration-style <text>
|
||||
--integration-plugin <text>
|
||||
--extra-option <text> Repeatable
|
||||
--scope-description <text>
|
||||
--categorize
|
||||
--workdir <path> Default: /root/cdc-e2e-cyp-12.17.4
|
||||
--log-path <path> Default: /tmp/<build-name>.log
|
||||
--state-root <path> Default: /var/lib/atvm-run-watcher
|
||||
EOF
|
||||
}
|
||||
|
||||
BUILD_NAME=""
|
||||
TEMPLATE=""
|
||||
TEMPLATE_COMMAND=""
|
||||
RUNNER_COMMAND=""
|
||||
CONFIG_FAMILY=""
|
||||
CONFIG_FILE=""
|
||||
MIGRATION_STYLE=""
|
||||
INTEGRATION_PLUGIN=""
|
||||
EXTRA_OPTIONS=()
|
||||
SCOPE_DESCRIPTION=""
|
||||
WATCHER_CATEGORIZED="false"
|
||||
RUNNER_WORKDIR="/root/cdc-e2e-cyp-12.17.4"
|
||||
RUNNER_LOG=""
|
||||
STATE_ROOT="/var/lib/atvm-run-watcher"
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--build-name) BUILD_NAME="${2:-}"; shift 2 ;;
|
||||
--template) TEMPLATE="${2:-}"; shift 2 ;;
|
||||
--template-command) TEMPLATE_COMMAND="${2:-}"; shift 2 ;;
|
||||
--runner-command) RUNNER_COMMAND="${2:-}"; shift 2 ;;
|
||||
--config-family) CONFIG_FAMILY="${2:-}"; shift 2 ;;
|
||||
--config-file) CONFIG_FILE="${2:-}"; shift 2 ;;
|
||||
--migration-style) MIGRATION_STYLE="${2:-}"; shift 2 ;;
|
||||
--integration-plugin) INTEGRATION_PLUGIN="${2:-}"; shift 2 ;;
|
||||
--extra-option) EXTRA_OPTIONS+=("${2:-}"); shift 2 ;;
|
||||
--scope-description) SCOPE_DESCRIPTION="${2:-}"; shift 2 ;;
|
||||
--categorize) WATCHER_CATEGORIZED="true"; shift ;;
|
||||
--workdir) RUNNER_WORKDIR="${2:-}"; shift 2 ;;
|
||||
--log-path) RUNNER_LOG="${2:-}"; shift 2 ;;
|
||||
--state-root) STATE_ROOT="${2:-}"; shift 2 ;;
|
||||
-h|--help) usage; exit 0 ;;
|
||||
*) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [[ -z "$BUILD_NAME" ]]; then
|
||||
echo "--build-name is required" >&2
|
||||
usage >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ -z "$RUNNER_COMMAND" ]]; then
|
||||
echo "--runner-command is required" >&2
|
||||
usage >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ -z "$RUNNER_LOG" ]]; then
|
||||
RUNNER_LOG="/tmp/${BUILD_NAME}.log"
|
||||
fi
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
|
||||
watcher_cmd=(
|
||||
"${SCRIPT_DIR}/start-atvm-run-watcher.sh"
|
||||
--build-name "${BUILD_NAME}"
|
||||
--template "${TEMPLATE}"
|
||||
--template-command "${TEMPLATE_COMMAND}"
|
||||
--runner-command "${RUNNER_COMMAND}"
|
||||
--config-family "${CONFIG_FAMILY}"
|
||||
--config-file "${CONFIG_FILE}"
|
||||
--migration-style "${MIGRATION_STYLE}"
|
||||
--integration-plugin "${INTEGRATION_PLUGIN}"
|
||||
--state-root "${STATE_ROOT}"
|
||||
)
|
||||
|
||||
for option in "${EXTRA_OPTIONS[@]}"; do
|
||||
watcher_cmd+=(--extra-option "${option}")
|
||||
done
|
||||
|
||||
if [[ -n "${SCOPE_DESCRIPTION}" ]]; then
|
||||
watcher_cmd+=(--scope-description "${SCOPE_DESCRIPTION}")
|
||||
fi
|
||||
|
||||
if [[ "${WATCHER_CATEGORIZED}" == "true" ]]; then
|
||||
watcher_cmd+=(--categorize)
|
||||
fi
|
||||
|
||||
runner_cmd=(
|
||||
"${SCRIPT_DIR}/start-atvm-runner.sh"
|
||||
--build-name "${BUILD_NAME}"
|
||||
--runner-command "${RUNNER_COMMAND}"
|
||||
--workdir "${RUNNER_WORKDIR}"
|
||||
--log-path "${RUNNER_LOG}"
|
||||
--state-root "${STATE_ROOT}"
|
||||
)
|
||||
|
||||
"${watcher_cmd[@]}"
|
||||
|
||||
for _ in {1..15}; do
|
||||
if systemctl is-active --quiet "atvm-run-watcher@${BUILD_NAME}.service"; then
|
||||
break
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
|
||||
if ! systemctl is-active --quiet "atvm-run-watcher@${BUILD_NAME}.service"; then
|
||||
echo "Watcher service did not become active for ${BUILD_NAME}" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
"${runner_cmd[@]}"
|
||||
Reference in New Issue
Block a user