
docs: allow text-only sentence feedback fallback

jimmylee 2 weeks ago
Parent
Commit
5ea87d3c83

+ 21 - 7
docs/superpowers/specs/2026-04-25-english-speaking-report-pipeline-design.md

@@ -57,9 +57,10 @@ Each student answer is evaluated asynchronously after `/speak-stream` writes the
 The sentence evaluation job runs in this order:
 
 1. Call Azure Speech API for pronunciation scoring.
-2. If Azure succeeds, call the sentence feedback LLM evaluator.
-3. Store Azure scores and sentence feedback in `pronunciation_evaluation`.
-4. If Azure fails, mark the evaluation `failed` and do not generate sentence feedback.
+2. If Azure succeeds, call the sentence feedback LLM evaluator with the current turn's pronunciation scores.
+3. If Azure fails after retries are exhausted, still call the sentence feedback LLM evaluator in text-only mode.
+4. Store any available Azure scores and sentence feedback in `pronunciation_evaluation`.
+5. If Azure fails, mark the pronunciation evaluation `failed`; `contentFeedback` may still be present from text-only feedback.
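The five-step order above can be sketched as a single job function. This is a minimal illustration, not the actual implementation: every name here (`evaluate_sentence`, `call_azure`, `call_feedback_llm`, `store`) is a hypothetical stand-in.

```python
# Hypothetical sketch of the sentence evaluation job order; all names are
# stand-ins for the real pipeline components.

def evaluate_sentence(turn, call_azure, call_feedback_llm, store):
    """Run Azure scoring, then sentence feedback in scored or text-only mode."""
    try:
        scores = call_azure(turn)      # step 1 (retries assumed inside call_azure)
    except RuntimeError:
        scores = None                  # Azure failed after retries were exhausted

    # Steps 2-3: the evaluator runs either way; text-only when scores is None.
    feedback = call_feedback_llm(turn, pronunciation=scores)

    status = "completed" if scores is not None else "failed"  # step 5
    # Step 4: store whatever is available.
    store(turn["id"], scores=scores, content_feedback=feedback, status=status)
    return status, scores, feedback
```

The key property is that the feedback call is no longer gated on Azure success; only the stored `status` reflects the Azure outcome.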
 
 The sentence feedback evaluator input is:
 
@@ -87,7 +88,9 @@ The sentence feedback evaluator input is:
 
 The conversation history for round N includes only dialogue text: prior AI/student turns and the latest student turn being evaluated. It should not include any previous rounds' Azure scores, previous sentence feedback, or the AI reply after the latest student turn. Previous scores and feedback are not needed for `DetailedReport`; they would bias the per-turn comment away from the student's current utterance.
 
-The only pronunciation scores passed to the sentence feedback evaluator are the Azure scores for the latest student turn.
+The only pronunciation scores passed to the sentence feedback evaluator are the Azure scores for the latest student turn. If Azure failed after retries, `latestStudentTurn.pronunciation` should be `null` and the evaluator must generate text-only expression feedback.
+
+In text-only mode, the evaluator must not mention pronunciation, fluency, intonation, stress, prosody, or any speech-score-dependent observation. It should focus on grammar, vocabulary, sentence completeness, communication intent, grade fit, target vocabulary, and target sentence patterns.
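A text-only evaluator input might look like the following. Only `latestStudentTurn.pronunciation` is named by this spec; the surrounding field names and the example dialogue are assumptions for illustration.

```python
# Hypothetical evaluator input in text-only mode; field names other than
# latestStudentTurn.pronunciation are assumed, not taken from the spec.
evaluator_input = {
    "conversationHistory": [
        {"role": "ai", "text": "What did you do last weekend?"},
        {"role": "student", "text": "I go to the park with my friend."},
    ],
    "latestStudentTurn": {
        "text": "I go to the park with my friend.",
        "pronunciation": None,  # Azure failed after retries -> text-only mode
    },
}

# Topics the text-only prompt must exclude, per the paragraph above.
TEXT_ONLY_EXCLUDED_TOPICS = {
    "pronunciation", "fluency", "intonation", "stress", "prosody",
}
```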
 
 The evaluator output is:
 
@@ -112,7 +115,7 @@ The ready condition is:
 
 If any evaluation is still `pending`, `/report` returns `status: "evaluating"` and should not generate the overall report.
 
-If any evaluation has `status = "failed"`, `/report` returns `status: "failed"` and should not generate the overall report. The frontend should show that the report could not be completed and may still show the available detailed rows if the UI clearly marks the report as incomplete.
+If any evaluation has `status = "failed"`, `/report` returns `status: "failed"` and does not generate the overall report. The frontend may still show the available detailed rows. A failed pronunciation evaluation can still include text-only `contentFeedback`.
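The `/report` status decision described above reduces to a small rule over per-sentence statuses. A minimal sketch, assuming `evaluations` is a list of status strings:

```python
# Sketch of the /report ready condition; not the actual endpoint code.

def report_status(evaluations):
    if any(e == "pending" for e in evaluations):
        return "evaluating"  # do not generate the overall report yet
    if any(e == "failed" for e in evaluations):
        return "failed"      # frontend may still show available detailed rows
    return "completed"       # safe to generate and cache the overall report
```

Note that `pending` takes precedence over `failed`, so the report never settles while any evaluation is still running.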
 
 When all sentence evaluations are completed, the backend generates and caches the overall report. Overall output owns:
 
@@ -125,6 +128,8 @@ When all sentence evaluations are completed, the backend generates and caches th
 
 These fields must be based on all expected student answers, all Azure scores, all available sentence feedback objects, and the full dialogue transcript. If a sentence feedback LLM call failed but Azure succeeded, that answer still has completed pronunciation scores and `contentFeedback = null`; overall generation should tolerate that missing feedback instead of blocking forever.
 
+Full OverallReport still requires all expected student answers to have completed Azure pronunciation scores. Text-only sentence feedback is useful for `DetailedReport`, but it is not enough to compute complete aggregate pronunciation scores.
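The full-report eligibility rule can be sketched as a predicate over the stored evaluation rows; the row shape here is a hypothetical simplification:

```python
# Sketch: text-only feedback alone is not enough for the full OverallReport;
# every expected answer needs completed Azure pronunciation scores.

def can_generate_full_overall_report(evaluations):
    return all(
        e["status"] == "completed" and e.get("scores") is not None
        for e in evaluations
    )
```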
+
 ## API Contract
 
 `GET /report?sessionId=...` remains the frontend entry point.
@@ -228,6 +233,8 @@ feedback?: {
 
 If `contentFeedback` is missing, the expanded card should still show available pronunciation scores and omit the feedback blocks.
 
+If pronunciation scores are missing but `contentFeedback` is present, the expanded card should show the feedback blocks and omit the pronunciation score rows.
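The two visibility rules above are symmetric and can be expressed as one pure function. This is a sketch of the display decision only, not the actual Vue component logic:

```python
# Sketch of the expanded-card visibility rules: each block renders only
# when its data is present.

def card_sections(scores, content_feedback):
    return {
        "show_score_rows": scores is not None,
        "show_feedback_blocks": content_feedback is not None,
    }
```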
+
 ## Frontend OverallReport Mapping
 
 `OverallReport.vue` should not consume per-sentence `comment` or `betterExpression` directly. It should consume only the overall report fields generated after all sentence evaluations are complete.
@@ -367,7 +374,12 @@ If these logs are not enough to diagnose production stalls, a later implementati
 
 ## Error Handling
 
-Sentence feedback generation is downstream of Azure scoring. If Azure fails, skip sentence feedback.
+Sentence feedback has two modes:
+
+- Scored mode: Azure succeeded, so the evaluator receives current-turn pronunciation scores plus dialogue text.
+- Text-only mode: Azure failed after retries, so the evaluator receives dialogue text without pronunciation scores.
+
+If Azure fails after retries, keep the pronunciation evaluation `status = "failed"` but still attempt text-only sentence feedback. If text-only feedback succeeds, store it in `contentFeedback`.
 
 If Azure succeeds but the sentence feedback LLM fails or returns invalid JSON, keep the Azure scores and store `contentFeedback = null`. This keeps `DetailedReport` useful without inventing feedback.
 
@@ -381,7 +393,9 @@ Backend tests should cover:
 - invalid evaluator JSON returns `None`
 - Azure success followed by feedback success writes both scores and `content_feedback`
 - Azure success followed by feedback failure writes scores and leaves `content_feedback` null
-- Azure failure skips feedback generation
+- Azure failure after retries attempts text-only feedback
+- Azure failure after text-only feedback success stores `status = "failed"` and non-null `content_feedback`
+- Azure failure after text-only feedback failure stores `status = "failed"` and null `content_feedback`
 - `/report` does not generate overall data while any evaluation is pending
 - `/report` generates or returns cached overall data once all sentence evaluations are complete
 - report pipeline logs include `session_id`, `stage`, `event`, and `duration_ms` for provider calls
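The three new Azure-failure cases can be sketched as pytest-style tests against a stub that encodes the expected storage rules. The stub and its field names are hypothetical stand-ins, not the real pipeline:

```python
# Hypothetical stub encoding the storage rules under test: status follows
# Azure, content_feedback follows the (possibly text-only) feedback call.

def run_pipeline_stub(azure_ok, feedback_ok):
    scores = {"accuracy": 90} if azure_ok else None
    feedback = {"comment": "ok"} if feedback_ok else None
    status = "completed" if azure_ok else "failed"
    return {"status": status, "scores": scores, "content_feedback": feedback}

def test_azure_failure_attempts_text_only_feedback():
    row = run_pipeline_stub(azure_ok=False, feedback_ok=True)
    assert row["status"] == "failed"
    assert row["content_feedback"] is not None  # text-only feedback stored

def test_azure_failure_then_feedback_failure():
    row = run_pipeline_stub(azure_ok=False, feedback_ok=False)
    assert row["status"] == "failed"
    assert row["content_feedback"] is None
```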