Split ATVM failure notes from general status notes

This commit is contained in:
2026-03-30 22:31:41 -04:00
parent 18dcbc89f9
commit 7ab5daeca8
5 changed files with 49 additions and 16 deletions

View File

@@ -80,7 +80,8 @@ This file defines how to operate and maintain the ATVM workspace in `/home/aw/co
- For host-level test detail and failed-test investigation, use `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`, especially `logs/`, `xml/`, and `mochawesome/`. - For host-level test detail and failed-test investigation, use `/root/cdc-e2e-cyp-12.17.4/cypress/cmcReporter`, especially `logs/`, `xml/`, and `mochawesome/`.
- Apply failed-host detail recovery consistently for every ATVM template run, not just `cmc-reboot`. - Apply failed-host detail recovery consistently for every ATVM template run, not just `cmc-reboot`.
- For any failed ATVM host, recover failure detail in this order when available: consolidated run log, `mochawesome`, structured reporter artifacts (`json`/`xml`), then text reporter artifacts. - For any failed ATVM host, recover failure detail in this order when available: consolidated run log, `mochawesome`, structured reporter artifacts (`json`/`xml`), then text reporter artifacts.
- Keep the `HOSTS` detail column compact with the failing step plus a short error summary, and put the longer trimmed failure excerpt in `NOTES:`. - Keep the `HOSTS` detail column compact with the failing step plus a short error summary only.
- Put richer per-host error excerpts in a dedicated `FAILURE NOTES:` section, and reserve `NOTES:` for non-failure context such as the template command, Currents URL, and operator-facing caveats.
- When reporting `TEST FLOW:` for an ATVM run, prefer the numbered steps extracted from the generated spec for that exact run. - When reporting `TEST FLOW:` for an ATVM run, prefer the numbered steps extracted from the generated spec for that exact run.
- If the generated spec exists, do not rely on a static template flow list for `TEST FLOW:`. - If the generated spec exists, do not rely on a static template flow list for `TEST FLOW:`.
- Only fall back to template-level or static flow definitions when the generated spec cannot be located or parsed. - Only fall back to template-level or static flow definitions when the generated spec cannot be located or parsed.

View File

@@ -253,7 +253,8 @@ When the operator asks for the status of an ATVM automation run, report in this
4. `TIMING:` section with start, end, total, quickest, longest, and average. 4. `TIMING:` section with start, end, total, quickest, longest, and average.
5. `COVERAGE:` section describing what the run was intended to cover, excluding the target-host list. 5. `COVERAGE:` section describing what the run was intended to cover, excluding the target-host list.
6. `TEST FLOW:` section describing the template-specific numbered run flow for the test. 6. `TEST FLOW:` section describing the template-specific numbered run flow for the test.
7. `NOTES:` section for broader context and anomalies. 7. `FAILURE NOTES:` section for detailed per-host error excerpts when failures exist.
8. `NOTES:` section for broader non-failure context and anomalies.
7. Remaining machines still to run. 7. Remaining machines still to run.
8. Summary counts for finished, passed, failed, and skipped machines. 8. Summary counts for finished, passed, failed, and skipped machines.
9. Timing details: 9. Timing details:
@@ -268,15 +269,15 @@ When the operator asks for the status of an ATVM automation run, report in this
Status-report expectations: Status-report expectations:
- Use the same display layout for every ATVM automation status response regardless of test type (`e2e`, `systemOS`, `reboot`, `migrateops`, and others). - Use the same display layout for every ATVM automation status response regardless of test type (`e2e`, `systemOS`, `reboot`, `migrateops`, and others).
- Use `/home/aw/code/cds/atvm/docs/automation/status-template.md` as the default template for both local status output and Mattermost status posts. - Use `/home/aw/code/cds/atvm/docs/automation/status-template.md` as the default template for both local status output and Mattermost status posts.
- The default ATVM status template uses flat bullet-list sections for `COVERAGE:` and `TEST FLOW:`, Markdown tables for `SUMMARY:`, `HOSTS:`, and `TIMING:`, and uses `NOTES:` for flat operator-facing notes. - The default ATVM status template uses flat bullet-list sections for `COVERAGE:`, `TEST FLOW:`, `FAILURE NOTES:`, and `NOTES:`, and Markdown tables for `SUMMARY:`, `HOSTS:`, and `TIMING:`.
- Order the status sections as `SUMMARY:`, `HOSTS:`, `TIMING:`, `COVERAGE:`, `TEST FLOW:`, then `NOTES:`. - Order the status sections as `SUMMARY:`, `HOSTS:`, `TIMING:`, `COVERAGE:`, `TEST FLOW:`, `FAILURE NOTES:`, then `NOTES:`.
- Keep `NOTES:` focused on operator-facing value such as the Currents run URL, real anomalies, failure context, or material fallback behavior. - Keep `NOTES:` focused on non-failure operator-facing value such as the Currents run URL, real anomalies unrelated to the direct failure text, or material fallback behavior.
- Include the exact `cmc-templates.py` command used to trigger the ATVM automation run in `NOTES:`, without the outer `sshpass`/`ssh` wrapper and without trimming it. - Include the exact `cmc-templates.py` command used to trigger the ATVM automation run in `NOTES:`, without the outer `sshpass`/`ssh` wrapper and without trimming it.
- Do not include generic watcher bookkeeping messages in `NOTES:` such as artifact-detection confirmations. - Do not include generic watcher bookkeeping messages in `NOTES:` such as artifact-detection confirmations.
- Do not include internal watcher fallback notes in `NOTES:` such as `check-xml-files.ts` validation confirmations or reporter-artifact recovery details. - Do not include internal watcher fallback notes in `NOTES:` such as `check-xml-files.ts` validation confirmations or reporter-artifact recovery details.
- The `HOSTS:` table includes `Host`, `Kernel`, `Status`, and `Detail` columns in that order. - The `HOSTS:` table includes `Host`, `Kernel`, `Status`, and `Detail` columns in that order.
- For any failed host, keep the `Detail` column compact by showing the failing step plus a short error summary, not the full raw stack trace. - For any failed host, keep the `Detail` column compact by showing the failing step plus a short error summary, not the full raw stack trace.
- If richer failure text is available, put the longer trimmed excerpt in `NOTES:` so the result stays readable in Mattermost and local status output. - If richer failure text is available, put the longer trimmed excerpt in `FAILURE NOTES:` so the result stays readable in Mattermost and local status output.
- In `COVERAGE:`, describe the important `cmc-templates.py` command inputs such as template, categorize mode, datastore/config family, config filename, migration style, any real plugin/integration path, and other operator-relevant run options, but do not list target hosts there or include verbose prose scope descriptions. - In `COVERAGE:`, describe the important `cmc-templates.py` command inputs such as template, categorize mode, datastore/config family, config filename, migration style, any real plugin/integration path, and other operator-relevant run options, but do not list target hosts there or include verbose prose scope descriptions.
- Only include coverage fields that the template command actually used. Do not show empty or irrelevant fields such as an integration/plugin path for templates that did not use one. - Only include coverage fields that the template command actually used. Do not show empty or irrelevant fields such as an integration/plugin path for templates that did not use one.
- If `categorize mode: enabled` is already shown in `COVERAGE:`, do not also repeat `--categorize` under `run options`. - If `categorize mode: enabled` is already shown in `COVERAGE:`, do not also repeat `--categorize` under `run options`.
@@ -289,7 +290,7 @@ Status-report expectations:
- If fallback is required, resolve it from the run template name before using any generic default flow. - If fallback is required, resolve it from the run template name before using any generic default flow.
- `cmc-e2e` currently uses the 22-step migration flow documented in `/home/aw/code/cds/atvm/docs/automation/status-template.md`. - `cmc-e2e` currently uses the 22-step migration flow documented in `/home/aw/code/cds/atvm/docs/automation/status-template.md`.
- `cmc-systemOS` currently uses the 21-step boot-disk migration flow documented in `/home/aw/code/cds/atvm/docs/automation/status-template.md`. - `cmc-systemOS` currently uses the 21-step boot-disk migration flow documented in `/home/aw/code/cds/atvm/docs/automation/status-template.md`.
- Keep `NOTES:` behavior consistent across template types; do not add template-specific internal-source notes such as parent-log-summary recovery details. - Keep `FAILURE NOTES:` and `NOTES:` behavior consistent across template types; do not add template-specific internal-source notes such as parent-log-summary recovery details.
- For the `Kernel` column, cross-reference the host name against `/home/aw/code/cds/atvm/inventory/vm-inventory.md`. - For the `Kernel` column, cross-reference the host name against `/home/aw/code/cds/atvm/inventory/vm-inventory.md`.
- If the hostname is not present in `vm-inventory.md`, report the kernel value as `unknown`. - If the hostname is not present in `vm-inventory.md`, report the kernel value as `unknown`.
- Treat references to the "ATVM automation run" or "automation run" as referring to this ATVM folder workflow and the automation VM at `192.168.3.190`, not to Cirrus project operations such as the `atvm - cypress` project. - Treat references to the "ATVM automation run" or "automation run" as referring to this ATVM folder workflow and the automation VM at `192.168.3.190`, not to Cirrus project operations such as the `atvm - cypress` project.

View File

@@ -477,4 +477,13 @@ This file stores run-specific examples only when a run produced a new learning r
- If any ATVM test template fails, the result should still recover the best available failure detail and present it consistently. - If any ATVM test template fails, the result should still recover the best available failure detail and present it consistently.
- Action for future runs: - Action for future runs:
- Use the same failure-detail recovery order for every ATVM template: consolidated run log, `mochawesome`, structured reporter artifacts, then text reporter artifacts. - Use the same failure-detail recovery order for every ATVM template: consolidated run log, `mochawesome`, structured reporter artifacts, then text reporter artifacts.
- Keep failed-host `Detail` compact and put the longer trimmed excerpt in `NOTES:` for every template type. - Keep failed-host `Detail` compact and put the longer trimmed excerpt in `FAILURE NOTES:` for every template type.
## Run Learning: 2026-03-30 (Separate failure detail from general notes in ATVM status output)
- Observed operator requirement:
- The `HOSTS` detail column should stay short and scannable.
- Detailed per-host error text should not crowd the host table or mix with general `NOTES:`.
- Action for future runs:
- Keep `HOSTS` detail to the failing step plus a short error summary only.
- Put richer per-host error excerpts in `FAILURE NOTES:`.
- Reserve `NOTES:` for non-failure context such as template command, Currents URL, and operator-facing caveats.

View File

@@ -51,13 +51,16 @@ Use this as the default ATVM automation run-status template for:
**TEST FLOW:** **TEST FLOW:**
- <template-specific numbered steps> - <template-specific numbered steps>
**FAILURE NOTES:**
- <detailed per-host failure excerpt when a host fails>
**NOTES:** **NOTES:**
- <note> - <note>
- <note> - <note>
``` ```
## Rules ## Rules
- Keep `SUMMARY:`, `HOSTS:`, `TIMING:`, `COVERAGE:`, `TEST FLOW:`, and `NOTES:` in that order. - Keep `SUMMARY:`, `HOSTS:`, `TIMING:`, `COVERAGE:`, `TEST FLOW:`, `FAILURE NOTES:`, and `NOTES:` in that order.
- Use the title format: - Use the title format:
- `## ATVM Run Status` - `## ATVM Run Status`
- `### <build_name>` - `### <build_name>`
@@ -72,10 +75,12 @@ Use this as the default ATVM automation run-status template for:
- `⏳ RUN` - `⏳ RUN`
- `⏭️ SKIP` - `⏭️ SKIP`
- Keep `Detail` concise. - Keep `Detail` concise.
- Put broader context under `NOTES:`, not in the host table. - Put richer per-host failure excerpts under `FAILURE NOTES:`, not in the host table.
- Put broader non-failure context under `NOTES:`.
- When available, put the persistent Currents run URL in `NOTES:` so operators can open the exact recorded run directly. - When available, put the persistent Currents run URL in `NOTES:` so operators can open the exact recorded run directly.
- Include the exact `cmc-templates.py` command used to trigger the run in `NOTES:`, without the outer `sshpass`/`ssh` wrapper. - Include the exact `cmc-templates.py` command used to trigger the run in `NOTES:`, without the outer `sshpass`/`ssh` wrapper.
- Keep `NOTES:` limited to meaningful operator-facing items such as the Currents link, real anomalies, failure context, or important fallback behavior. - Keep `FAILURE NOTES:` limited to detailed per-host error excerpts.
- Keep `NOTES:` limited to meaningful non-failure operator-facing items such as the Currents link, real anomalies, or important fallback behavior.
- Do not include generic watcher bookkeeping lines in `NOTES:` such as "run artifacts were detected" or "final reporting artifacts were detected." - Do not include generic watcher bookkeeping lines in `NOTES:` such as "run artifacts were detected" or "final reporting artifacts were detected."
- Do not include internal fallback notes in `NOTES:` such as "`check-xml-files.ts` validation passed" or "host details were derived from reporter artifacts." - Do not include internal fallback notes in `NOTES:` such as "`check-xml-files.ts` validation passed" or "host details were derived from reporter artifacts."
- `COVERAGE:` should describe what the run was intended to cover without listing target hosts. - `COVERAGE:` should describe what the run was intended to cover without listing target hosts.

View File

@@ -581,7 +581,22 @@ def extract_failure_from_mochawesome(
def summarize_host_detail_with_mochawesome(detail: str, testcase: str, message: str) -> str: def summarize_host_detail_with_mochawesome(detail: str, testcase: str, message: str) -> str:
prefix_match = re.match(r"^(\d+ tests, \d+ failures(?:, \d+ pending)?)", detail) prefix_match = re.match(r"^(\d+ tests, \d+ failures(?:, \d+ pending)?)", detail)
prefix = prefix_match.group(1) if prefix_match else detail prefix = prefix_match.group(1) if prefix_match else detail
message_summary = compact_failure_detail(message or testcase, limit=260) normalized_message = " ".join((message or "").split())
message_summary = ""
for pattern in (
r"(md5sum:\s.*?)(?:$)",
r"(sshpass does not contain OK(?:\.\s*Output:)?)(?:$)",
r"(AssertionError:\s.*?)(?:$)",
r"(Timed out!? .*?)(?:$)",
r"(Error:\s.*?)(?:$)",
):
match = re.search(pattern, normalized_message, re.I)
if match:
message_summary = match.group(1)
break
if not message_summary:
message_summary = normalized_message or testcase
message_summary = compact_failure_detail(message_summary, limit=120)
testcase_summary = compact_failure_detail(testcase, limit=140) testcase_summary = compact_failure_detail(testcase, limit=140)
return f"{prefix} - {testcase_summary} - {message_summary}" return f"{prefix} - {testcase_summary} - {message_summary}"
@@ -1180,7 +1195,7 @@ def build_status_markdown(
longest = max((h for h in ordered_hosts if h.duration_seconds is not None), key=lambda h: h.duration_seconds, default=None) longest = max((h for h in ordered_hosts if h.duration_seconds is not None), key=lambda h: h.duration_seconds, default=None)
average = (sum(durations) / len(durations)) if durations else None average = (sum(durations) / len(durations)) if durations else None
additional_failure_notes: List[str] = [] failure_notes: List[str] = []
for host in ordered_hosts: for host in ordered_hosts:
if host.status != "FAIL": if host.status != "FAIL":
continue continue
@@ -1190,7 +1205,7 @@ def build_status_markdown(
testcase, message, estack = mochawesome_failure testcase, message, estack = mochawesome_failure
host.detail = summarize_host_detail_with_mochawesome(host.detail, testcase, message) host.detail = summarize_host_detail_with_mochawesome(host.detail, testcase, message)
failure_excerpt_source = estack or message failure_excerpt_source = estack or message
additional_failure_notes.append( failure_notes.append(
f"{host.host} failure excerpt: `{compact_failure_detail(failure_excerpt_source, limit=420)}`" f"{host.host} failure excerpt: `{compact_failure_detail(failure_excerpt_source, limit=420)}`"
) )
@@ -1220,8 +1235,7 @@ def build_status_markdown(
notes = notes + [ notes = notes + [
"Both iscsi and fc disks were used for the reboot test. As a result, iscsi disks may not have attached before the mtdi started. So if the test failed, that is most likely the issue." "Both iscsi and fc disks were used for the reboot test. As a result, iscsi disks may not have attached before the mtdi started. So if the test failed, that is most likely the issue."
] ]
notes = notes + additional_failure_notes failure_notes_block = "\n".join(f"- {note}" for note in failure_notes) if failure_notes else "- none"
notes_block = "\n".join(f"- {note}" for note in notes) if notes else "- none" notes_block = "\n".join(f"- {note}" for note in notes) if notes else "- none"
resolved_flow = extract_test_flow_from_generated_spec(reporter_root, log_text) or get_test_flow(metadata.get("template")) resolved_flow = extract_test_flow_from_generated_spec(reporter_root, log_text) or get_test_flow(metadata.get("template"))
test_flow_lines = [f"- {step}" for step in resolved_flow] test_flow_lines = [f"- {step}" for step in resolved_flow]
@@ -1261,6 +1275,9 @@ def build_status_markdown(
"**TEST FLOW:**", "**TEST FLOW:**",
*test_flow_lines, *test_flow_lines,
"", "",
"**FAILURE NOTES:**",
failure_notes_block,
"",
"**NOTES:**", "**NOTES:**",
notes_block, notes_block,
] ]