
docs: define report generation gate and retries

jimmylee, 2 weeks ago
Commit 457af99003

+ 112 - 8
docs/superpowers/specs/2026-04-25-english-speaking-report-pipeline-design.md

@@ -105,17 +105,15 @@ Both keys are required. Values may be empty strings only if the evaluator fails
 
 ## Overall Evaluation
 
-Overall evaluation is generated only when every expected student answer has a completed pronunciation evaluation.
+Overall evaluation is generated only when the conversation has ended and every expected student answer has a completed pronunciation evaluation.
 
-The ready condition is:
+The report API status should be one of:
 
-- The session has at least one student message.
-- Every student message has a `pronunciation_evaluation` row.
-- Every evaluation has `status = "completed"`.
-
-If any evaluation is still `pending`, `/report` returns `status: "evaluating"` and should not generate the overall report.
+```text
+evaluating | ready | failed | incomplete
+```
 
-If any evaluation has `status = "failed"`, `/report` returns `status: "failed"` for the full report and should not generate the complete overall report. The frontend may still show the available detailed rows. A failed pronunciation evaluation can still include text-only `contentFeedback`.
+If the conversation has not ended or any evaluation is still `pending`, `/report` returns `status: "evaluating"` and should not generate the overall report. A failed pronunciation evaluation can still include text-only `contentFeedback`, but it blocks the full OverallReport.
 
 When all sentence evaluations are completed, the backend generates and caches the overall report. Overall output owns:
 
@@ -130,6 +128,111 @@ These fields must be based on all expected student answers, all Azure scores, al
 
 Full OverallReport still requires all expected student answers to have completed Azure pronunciation scores. Text-only sentence feedback is useful for `DetailedReport`, but it is not enough to compute complete aggregate pronunciation scores.
 
+## Overall Generation Gate
+
+`/report` should decide whether Full OverallReport can be generated with an explicit gate:
+
+```text
+1. The conversation has ended.
+2. At least one valid student answer exists.
+3. Every saved student answer has a PronunciationEvaluation row.
+4. Every PronunciationEvaluation is terminal.
+5. Every PronunciationEvaluation has status = completed.
+6. OverallReport has not already been generated and cached.
+```
+
+The first implementation should use saved student messages as the expected answer set, not `total_rounds`. `total_rounds` is the configured target, but it may not equal the final number of valid answers: time mode can expire, the user can end practice manually, ASR can fail before a student message is created, and the final AI closing message should not create another expected student answer.
+
+Recommended status decisions:
+
+```text
+evaluating:
+  - conversation has not ended; or
+  - a saved student answer has no evaluation row yet; or
+  - at least one evaluation is still pending/running/retrying
+
+ready:
+  - conversation has ended; and
+  - at least one valid student answer exists; and
+  - every saved student answer has completed Azure pronunciation scores; and
+  - overall has been generated or can be generated now
+
+failed:
+  - conversation has ended; and
+  - at least one valid student answer exists; and
+  - any evaluation is failed after retries
+
+incomplete:
+  - conversation has ended; and
+  - either no valid student answers exist, or
+    the product later requires a minimum valid answer count and the session did not reach it
+```
+
+If the product later requires exactly three valid answers for a completed practice, that requirement should be added as a product-level gate. It should not be inferred from `total_rounds` inside the report generator without an explicit requirement.
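The status decisions above can be sketched as a single pure function. This is a minimal illustration, not the backend's actual code: `StudentAnswer`, `decide_report_status`, and the `min_valid_answers` parameter are hypothetical names, and the sketch assumes that a session with both failed and still-running evaluations reports `evaluating` until every evaluation is terminal. It covers only the sentence-level gate; an exhausted OverallReport generation would still turn the final report status into `failed`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Non-terminal evaluation states named in the spec ("pending/running/retrying").
NON_TERMINAL = {"pending", "running", "retrying"}


@dataclass
class StudentAnswer:
    # None means no PronunciationEvaluation row exists yet for this answer.
    evaluation_status: Optional[str]


def decide_report_status(
    conversation_ended: bool,
    answers: Sequence[StudentAnswer],
    min_valid_answers: int = 1,  # hypothetical product-level gate
) -> str:
    """Map saved student answers to evaluating | ready | failed | incomplete."""
    if not conversation_ended:
        return "evaluating"
    if len(answers) < min_valid_answers:
        return "incomplete"

    statuses = [a.evaluation_status for a in answers]
    # Assumption: wait for every evaluation to become terminal before
    # reporting a failure, so the status does not flip while work is running.
    if any(s is None or s in NON_TERMINAL for s in statuses):
        return "evaluating"
    if any(s == "failed" for s in statuses):
        return "failed"
    if all(s == "completed" for s in statuses):
        return "ready"
    raise ValueError(f"unknown evaluation status in {statuses!r}")
```

The expected answer set is whatever list of saved answers the caller passes in, so `total_rounds` never enters the decision unless a product requirement maps it onto `min_valid_answers` explicitly.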
+
+## Bounded Retries
+
+Automatic retries must be bounded and owned by the worker performing that stage.
+
+`/report` must never create or retry sentence-evaluation work. It only reads sentence evaluation state and, when the Overall Generation Gate passes, generates or returns cached OverallReport data. This prevents frontend polling from starting duplicate sentence jobs.
+
+Each external call should have an explicit maximum attempt count:
+
+```text
+Azure pronunciation scoring: max_attempts = 3
+Sentence feedback LLM: max_attempts = 1 or 2
+OverallReport LLM: max_attempts = 1 or 2
+```
+
+Attempt exhaustion must always produce a terminal state:
+
+```text
+Azure exhausted:
+  evaluation.status = failed
+  optional text-only contentFeedback may still be attempted
+
+Sentence feedback exhausted after Azure success:
+  evaluation.status = completed
+  contentFeedback = null
+
+Sentence feedback exhausted after Azure failure:
+  evaluation.status = failed
+  contentFeedback = null
+
+OverallReport exhausted:
+  report status = failed
+  no automatic retry on the next /report poll
+```
+
+Once a stage reaches a terminal state, normal `/report` polling must not rerun it:
+
+```text
+evaluation.status = completed
+evaluation.status = failed
+overall report exists
+overall generation failed
+```
+
+Future explicit retry can be added as a separate admin or user action, but it must not be implicit in report polling.
+
+If the first implementation does not add persistent retry counters or job rows, retries should still be bounded inside the single background task:
+
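+```py
+import asyncio
+
+
+async def run_stage_with_retries(max_attempts: int) -> None:
+    # run_stage, mark_success, mark_terminal_failure, log_attempt_failure and
+    # backoff_seconds are stage-specific helpers supplied by the worker.
+    for attempt in range(1, max_attempts + 1):
+        try:
+            run_stage()
+            mark_success()
+            return
+        except Exception as exc:
+            log_attempt_failure(attempt, exc)
+            if attempt == max_attempts:
+                # Attempts exhausted: always end in a terminal state.
+                mark_terminal_failure()
+                return
+            # Back off only when another attempt will follow.
+            await asyncio.sleep(backoff_seconds(attempt))
+```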
+
+OverallReport lazy generation needs the same protection. If the backend attempts OverallReport generation inside `/report`, a failed generation must set a terminal failed state for that report response path or use an in-process guard so frontend polling does not repeatedly invoke the OverallReport LLM.
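One way to get this protection is to cache the failure as well as the success, so a repeated poll returns the stored outcome instead of calling the LLM again. The sketch below is an assumption-laden illustration: `_overall_state`, `get_overall_report`, and the `generate` callback are hypothetical names, the cache is a plain in-process dict (not safe across processes or concurrent requests), and a real backend would more likely persist this state with the session.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-process cache keyed by session id.
_overall_state: Dict[str, dict] = {}


def get_overall_report(
    session_id: str,
    generate: Callable[[], dict],  # hypothetical wrapper around the OverallReport LLM call
    max_attempts: int = 2,
) -> dict:
    cached = _overall_state.get(session_id)
    if cached is not None:
        # Terminal either way (ready or failed): polling never reruns the stage.
        return cached

    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            result = {"status": "ready", "overall": generate()}
            break
        except Exception as exc:
            last_error = exc
    else:
        # Loop ended without a successful attempt: record a terminal failure.
        result = {"status": "failed", "overall": None, "error": str(last_error)}

    _overall_state[session_id] = result
    return result
```

With this shape, an explicit retry action would have to clear the cached entry deliberately, which keeps retries out of the polling path.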
+
 ## API Contract
 
 `GET /report?sessionId=...` remains the frontend entry point.
@@ -249,6 +352,7 @@ If `status !== "ready"`, the frontend should show a report-generating state and
 
 - `status === "ready"`
 - `status === "failed"`
+- `status === "incomplete"`
 - a retry limit is reached
 - the request fails with an unrecoverable error
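
The stop conditions listed above reduce to a small predicate. The sketch is written in Python only to match the spec's other example; the frontend would express the same logic in its own language. `should_stop_polling`, `polls_done`, `max_polls`, and `unrecoverable_error` are hypothetical names, not part of the API contract.

```python
# Report statuses after which further polling cannot change the outcome.
TERMINAL_REPORT_STATUSES = {"ready", "failed", "incomplete"}


def should_stop_polling(
    status: str,
    polls_done: int,
    max_polls: int,
    unrecoverable_error: bool = False,
) -> bool:
    """Return True when the client should stop polling /report."""
    return (
        unrecoverable_error
        or status in TERMINAL_REPORT_STATUSES
        or polls_done >= max_polls
    )
```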