# English Speaking Report Pipeline Design

## Scope

This design covers the runtime report pipeline for `src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue` and `OverallReport.vue`, plus the backend report contract in `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api`.

The goal is to separate per-round sentence evaluation from whole-session overall evaluation while keeping `/report` as the single frontend entry point for the final report page.

## Problem

The current backend stores Azure pronunciation scores in `pronunciation_evaluation` and has a `content_feedback` JSON field, but the existing feedback shape uses `highlights / corrections / suggestions`. That shape is wrong for `DetailedReport`: those fields describe whole-session conclusions, not a single student turn.

`DetailedReport` needs one feedback object per student round:

```json
{
  "comment": "一句话点评",
  "betterExpression": "进阶表达"
}
```

`OverallReport` needs whole-session fields, which can only be produced after every student round has completed sentence evaluation.

## Architecture

The report flow has three layers:

1. Sentence evaluation: one record per student round.
2. Overall evaluation: one cached summary per completed session.
3. Report API: one `/report` response that returns both detailed and overall report data with a readiness status.

`/speak-stream` remains optimized for conversation responsiveness. It should not wait for report generation before sending `done`.
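
The statuses and payload shapes used throughout this design can be collected into a small set of types. This is an illustrative sketch: the type names are assumptions, while the field names follow the JSON examples in this document.

```typescript
// Illustrative types for the /report contract; type names are assumptions,
// field names match the JSON examples in this design.
type ReportStatus = "evaluating" | "failed" | "ready";

interface SentenceFeedback {
  comment: string;
  betterExpression: string;
}

interface RoundEvaluation {
  status: "pending" | "completed" | "failed";
  accuracyScore: number;
  fluencyScore: number;
  completenessScore: number;
  prosodyScore: number;
  wordAnalysis: unknown;
  contentFeedback: SentenceFeedback | null;
}

interface OverallReportData {
  overallScore: number;
  scoreLevel: string;
  percentile: number;
  dimensions: { fluency: number; interaction: number; vocabulary: number; grammar: number };
  aiComment: string;
  highlights: string[];
  improvements: string[];
  statistics: Record<string, number>;
}

interface ReportResponse {
  sessionId: string;
  topic: string;
  status: ReportStatus;
  rounds: Array<{
    round: number;
    role: "student" | "ai";
    content: string;
    audioUrl: string;
    evaluation: RoundEvaluation | null;
  }>;
  overall: OverallReportData | null;
}

// The report page should only render OverallReport once both are true.
function isReady(r: ReportResponse): boolean {
  return r.status === "ready" && r.overall !== null;
}
```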

## Sentence Evaluation

Each student round is evaluated asynchronously after `/speak-stream` writes the student message and AI response.

The sentence evaluation job runs in this order:

1. Call the Azure Speech API for pronunciation scoring.
2. If Azure succeeds, call the sentence feedback LLM evaluator.
3. Store the Azure scores and sentence feedback in `pronunciation_evaluation`.
4. If Azure fails, mark the evaluation `failed` and do not generate sentence feedback.
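
The ordering above can be sketched as follows. This is illustrative only: `scorePronunciation`, `generateSentenceFeedback`, and `saveEvaluation` are hypothetical helpers passed in for clarity, not functions in the existing codebase.

```typescript
// Hypothetical sketch of the per-round evaluation job; helper names are assumptions.
type AzureScores = { accuracy: number; fluency: number; completeness: number; prosody: number };

async function evaluateStudentRound(
  scorePronunciation: () => Promise<AzureScores | null>,
  generateSentenceFeedback: (scores: AzureScores) => Promise<{ comment: string; betterExpression: string } | null>,
  saveEvaluation: (row: { status: "completed" | "failed"; scores?: AzureScores; contentFeedback?: unknown }) => Promise<void>,
): Promise<void> {
  // 1. Azure pronunciation scoring runs first.
  const scores = await scorePronunciation();
  if (scores === null) {
    // 4. Azure failed: mark the evaluation failed and skip feedback entirely.
    await saveEvaluation({ status: "failed" });
    return;
  }
  // 2. Sentence feedback is downstream of Azure scoring.
  const feedback = await generateSentenceFeedback(scores);
  // 3. Persist scores plus feedback; a feedback failure stores null, not a failed row.
  await saveEvaluation({ status: "completed", scores, contentFeedback: feedback });
}
```

Note that a feedback-LLM failure still produces a `completed` row (with `contentFeedback` null), while an Azure failure produces a `failed` row; this distinction drives the overall-evaluation ready condition below.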

The sentence feedback evaluator input is:

```json
{
  "conversationHistory": [
    { "role": "ai", "content": "..." },
    { "role": "student", "content": "..." }
  ],
  "latestStudentTurn": {
    "round": 1,
    "content": "...",
    "pronunciation": {
      "accuracy": 82,
      "fluency": 76,
      "completeness": 88,
      "prosody": 70
    }
  },
  "grade": "五年级",
  "vocabulary": ["..."],
  "sentences": ["..."]
}
```

The conversation history for round N includes the prior conversation and the latest student turn being evaluated. It should not include the AI reply that follows the latest student turn, because that reply was not context the student had when speaking.

The evaluator output is:

```json
{
  "comment": "表达清楚,because 用得很好。",
  "betterExpression": "My favorite animal is the panda because it is cute."
}
```

Both keys are required. Values may be empty strings only when the evaluator output fails validation and the backend chooses to store a normalized fallback; the preferred failure behavior is to store `null` for `contentFeedback`.
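
The validation rule can be sketched as a parser that returns `null` for anything other than a well-formed object with both keys. This is a sketch of the intended behavior, not existing code; the function name is illustrative.

```typescript
// Illustrative validator for the evaluator output. Returns null on invalid
// JSON or a wrong shape, matching the preferred "store null" failure behavior.
interface SentenceFeedback {
  comment: string;
  betterExpression: string;
}

function parseSentenceFeedback(raw: string): SentenceFeedback | null {
  try {
    const data = JSON.parse(raw);
    if (
      typeof data === "object" && data !== null &&
      typeof data.comment === "string" &&
      typeof data.betterExpression === "string"
    ) {
      // Keep only the contract fields; drop anything extra the LLM added.
      return { comment: data.comment, betterExpression: data.betterExpression };
    }
    return null; // missing or mistyped keys
  } catch {
    return null; // the LLM returned invalid JSON
  }
}
```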

## Overall Evaluation

Overall evaluation is generated only when every student round has a completed pronunciation evaluation.

The ready condition is:

- The session has at least one student message.
- Every student message has a `pronunciation_evaluation` row.
- Every evaluation has `status = "completed"`.
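
The three bullets above reduce to one predicate. A minimal sketch, assuming the caller passes one status per student message (`null` standing in for a missing `pronunciation_evaluation` row):

```typescript
// Sketch of the ready condition; the input shape is an assumption.
type EvalStatus = "pending" | "completed" | "failed";

function overallReady(studentEvaluations: Array<EvalStatus | null>): boolean {
  // Empty array: the session has no student message yet, so never ready.
  // null entry: a student message has no pronunciation_evaluation row yet.
  return (
    studentEvaluations.length > 0 &&
    studentEvaluations.every((status) => status === "completed")
  );
}
```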

If any evaluation is still `pending`, `/report` returns `status: "evaluating"` and does not generate the overall report.

If any evaluation has `status = "failed"`, `/report` returns `status: "failed"` and does not generate the overall report. The frontend should show that the report could not be completed; it may still show the available detailed rows as long as the UI clearly marks the report as incomplete.

When all sentence evaluations are completed, the backend generates and caches the overall report. The overall output owns:

- `aiComment`
- `highlights`
- `improvements`
- aggregate dimensions
- `overallScore`
- statistics such as average score, highest score, grammar error count, and excellent expression count

These fields must be based on all student rounds, all Azure scores, all available sentence feedback objects, and the full dialogue transcript. If a sentence feedback LLM call failed but Azure succeeded, that round still has completed pronunciation scores and `contentFeedback = null`; overall generation should tolerate the missing feedback instead of blocking forever.
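
The purely numeric statistics can be computed from the stored Azure scores without another LLM call. The sketch below assumes a per-round score equal to the mean of the four Azure dimensions; that derivation is an assumption of this example, not a decision made by this design.

```typescript
// Sketch of the numeric statistics. The per-round score formula (mean of the
// four Azure dimensions) is an assumption for illustration only.
interface RoundScores {
  accuracy: number;
  fluency: number;
  completeness: number;
  prosody: number;
}

function scoreStatistics(rounds: RoundScores[]): { averageScore: number; highestScore: number } {
  // The ready condition guarantees at least one student round, so rounds is non-empty here.
  const perRound = rounds.map(
    (r) => (r.accuracy + r.fluency + r.completeness + r.prosody) / 4,
  );
  return {
    averageScore: Math.round(perRound.reduce((sum, s) => sum + s, 0) / perRound.length),
    highestScore: Math.round(Math.max(...perRound)),
  };
}
```

Grammar error count and excellent expression count are not derivable from scores alone; they come from the overall LLM pass over the transcript and sentence feedback.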

## API Contract

`GET /report?sessionId=...` remains the frontend entry point.

The backend response should include:

```json
{
  "sessionId": "...",
  "topic": "...",
  "status": "evaluating",
  "rounds": [
    {
      "round": 1,
      "role": "student",
      "content": "...",
      "audioUrl": "...",
      "evaluation": {
        "status": "completed",
        "accuracyScore": 82,
        "fluencyScore": 76,
        "completenessScore": 88,
        "prosodyScore": 70,
        "wordAnalysis": null,
        "contentFeedback": {
          "comment": "表达清楚,because 用得很好。",
          "betterExpression": "My favorite animal is the panda because it is cute."
        }
      }
    }
  ],
  "overall": null
}
```

When the report is ready:

```json
{
  "status": "ready",
  "rounds": [
    {
      "round": 1,
      "role": "student",
      "content": "...",
      "audioUrl": "...",
      "evaluation": {
        "status": "completed",
        "accuracyScore": 82,
        "fluencyScore": 76,
        "completenessScore": 88,
        "prosodyScore": 70,
        "wordAnalysis": null,
        "contentFeedback": {
          "comment": "表达清楚,because 用得很好。",
          "betterExpression": "My favorite animal is the panda because it is cute."
        }
      }
    }
  ],
  "overall": {
    "overallScore": 85,
    "scoreLevel": "good",
    "percentile": 0,
    "dimensions": {
      "fluency": 82,
      "interaction": 80,
      "vocabulary": 78,
      "grammar": 84
    },
    "aiComment": "...",
    "highlights": ["..."],
    "improvements": ["..."],
    "statistics": {}
  }
}
```

The frontend adapter may keep the existing `DialogueReport` shape, but it should map this response into a clear internal model:

- `evaluation.sentenceEvaluations` for `DetailedReport`
- `evaluation.aiComment`, `highlights`, `improvements`, dimensions, score, and statistics for `OverallReport`
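
The split can be sketched as an adapter that separates per-round data from whole-session data. The internal field names follow this document; the exact signature of `adaptReport` in the codebase may differ, and the raw types here are trimmed to the fields the mapping touches.

```typescript
// Sketch of the adapter split; raw types are trimmed to the mapped fields.
interface RawRound {
  round: number;
  role: string;
  content: string;
  evaluation: {
    status: string;
    accuracyScore: number;
    fluencyScore: number;
    completenessScore: number;
    prosodyScore: number;
    contentFeedback: { comment: string; betterExpression: string } | null;
  } | null;
}

function adaptReport(raw: { status: string; rounds: RawRound[]; overall: unknown }) {
  return {
    status: raw.status,
    // DetailedReport consumes one entry per evaluated student round.
    sentenceEvaluations: raw.rounds
      .filter((r) => r.role === "student" && r.evaluation !== null)
      .map((r) => ({
        round: r.round,
        content: r.content,
        scores: {
          accuracy: r.evaluation!.accuracyScore,
          fluency: r.evaluation!.fluencyScore,
          completeness: r.evaluation!.completenessScore,
          prosody: r.evaluation!.prosodyScore,
        },
        // Absent feedback stays absent; the card omits the feedback blocks.
        feedback: r.evaluation!.contentFeedback ?? undefined,
      })),
    // OverallReport consumes only the whole-session block.
    overall: raw.overall,
  };
}
```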

## Frontend DetailedReport Mapping

`SentenceEvaluation.feedback` should change from the old whole-session-like shape to:

```ts
feedback?: {
  comment: string
  betterExpression: string
}
```

`DetailedReport.vue` should display, for student rounds:

- a score badge derived from the four Azure scores
- four pronunciation rows
- `一句话点评` (one-sentence comment): `feedback.comment`
- `进阶表达` (better expression): `feedback.betterExpression`

If `contentFeedback` is missing, the expanded card should still show the available pronunciation scores and omit the feedback blocks.

## Frontend OverallReport Mapping

`OverallReport.vue` should not consume per-sentence `comment` or `betterExpression` directly. It should consume only the overall report fields generated after all sentence evaluations are complete.

The existing `highlights` and `improvements` arrays belong here.

## Polling Behavior

When the dialogue completes and the frontend enters report mode, it should call `/report`.

If `status !== "ready"`, the frontend should show a report-generating state and poll `/report` until one of the following:

- `status === "ready"`
- `status === "failed"`
- a retry limit is reached
- the request fails with an unrecoverable error
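
The loop above can be sketched as follows. The interval and retry limit are illustrative defaults, not values chosen by this design, and `fetchReport` stands in for whatever request helper the frontend uses.

```typescript
// Sketch of the report polling loop; interval and retry limit are
// illustrative defaults.
async function pollReport(
  fetchReport: () => Promise<{ status: string }>,
  { maxAttempts = 30, intervalMs = 2000 } = {},
): Promise<"ready" | "failed" | "timeout"> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const report = await fetchReport();
    if (report.status === "ready") return "ready";
    if (report.status === "failed") return "failed";
    // Still "evaluating": wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return "timeout"; // retry limit reached; surface an error state
}
```

An unrecoverable request error would reject the returned promise, which the caller should treat the same as `"failed"` for UI purposes.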

The detailed report may be shown partially only if the UI explicitly distinguishes partial data from a completed report. The first implementation should prefer a single loading state until `ready`, to avoid showing misleading aggregate results.

## Error Handling

Sentence feedback generation is downstream of Azure scoring. If Azure fails, skip sentence feedback.

If Azure succeeds but the sentence feedback LLM fails or returns invalid JSON, keep the Azure scores and store `contentFeedback = null`. This keeps `DetailedReport` useful without inventing feedback.

If overall generation fails, keep the sentence-level data intact and return a non-ready status or a soft error field. The frontend should keep the report loading/error state separate from the completed conversation state.

## Testing

Backend tests should cover:

- the sentence feedback evaluator parses `{ comment, betterExpression }`
- invalid evaluator JSON returns `None`
- Azure success followed by feedback success writes both scores and `content_feedback`
- Azure success followed by feedback failure writes scores and leaves `content_feedback` null
- Azure failure skips feedback generation
- `/report` does not generate overall data while any evaluation is pending
- `/report` generates or returns cached overall data once all sentence evaluations are complete

Frontend tests or focused manual checks should cover:

- `adaptReport` maps `contentFeedback.comment` and `betterExpression`
- `DetailedReport.vue` renders pronunciation scores without feedback
- `DetailedReport.vue` renders both feedback blocks when present
- report mode polls until `ready` before showing `OverallReport`

## Out of Scope

This design does not define the final overall-report prompt in detail. That prompt should be designed separately after the sentence-level report contract is implemented.

This design also does not change the conversation behavior of `/speak-stream`; the endpoint should continue returning the transcript, AI tokens, and `done` without waiting for report jobs.