# English Speaking Report Pipeline Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build the English-speaking report pipeline so `DetailedReport` shows per-answer pronunciation and expression feedback, while `OverallReport` is generated only when the completed practice has all required Azure scores.

**Architecture:** The backend owns report generation and exposes one `/report` endpoint with `evaluating | ready | failed | incomplete` status. Per-answer evaluation runs in the background after each student answer; the OverallReport is generated lazily and cached when `/report` sees the generation gate pass. The frontend polls `/report`, maps the backend data into the existing preview types, and renders clear degraded states.
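
For orientation, the `/report` envelope assembled in Task 6 looks roughly like the sketch below. This is illustrative only; the authoritative construction is Task 6 Step 5.

```python
# Shape sketch of the /report response (see Task 6 Step 5); all values illustrative.
report = {
    "sessionId": "<session-uuid>",
    "topic": "Animals",              # illustrative topic
    "status": "ready",               # evaluating | ready | failed | incomplete
    "rounds": [...],                 # per-round messages with per-answer evaluations
    "overall": {                     # None until the generation gate passes
        "aiComment": "…",
        "highlights": ["…"],
        "improvements": ["…"],
    },
    "summary": None,                 # kept temporarily for backward compatibility
}
```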

**Tech Stack:** FastAPI, SQLAlchemy async, MySQL JSON columns, OpenAI-compatible OneHub LLM, Azure Speech API, Vue 3, TypeScript, SCSS.

---

## Scope Check

This plan spans two repos but one feature:

- Backend: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api`
- Frontend: `/Users/buoy/Development/gitrepo/PPT`

The backend tasks are independently testable with `uv run pytest`. The frontend tasks are independently checkable with `npm run type-check`.

## File Structure

Backend files:

- Create `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/report_logging.py`: stable report-pipeline structured logging helper.
- Create `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/sentence_feedback_evaluator.py`: per-student-answer LLM evaluator returning `{comment, betterExpression}` in scored and text-only modes.
- Create `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/overall_report_evaluator.py`: whole-session LLM evaluator returning normalized `aiComment/highlights/improvements`.
- Modify `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/models/dialogue.py`: add cached overall report/status fields.
- Modify `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/init.sql`: add new columns for fresh DBs.
- Create `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/migrations/003_add_overall_report.sql`: add new columns for existing DBs.
- Modify `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/dialogue_service.py`: wire sentence feedback, overall gate, overall generation, status responses, and structured logs.
- Modify `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/api/dialogue.py`: fix `/speak-stream` round ownership and call shared background evaluation logic.
- Add tests under `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/tests/service/speaking/`.

Frontend files:

- Modify `/Users/buoy/Development/gitrepo/PPT/src/types/englishSpeaking.ts`: update feedback and report status types.
- Modify `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts`: map new backend response, add report polling helper or status handling.
- Modify `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue`: render new feedback shape and degradation cases.
- Modify `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/TopicDiscussionPreview.vue`: handle `evaluating/failed/incomplete` report states.

---

### Task 1: Backend Schema For Cached OverallReport

**Files:**
- Modify: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/models/dialogue.py`
- Modify: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/init.sql`
- Create: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/migrations/003_add_overall_report.sql`

- [ ] **Step 1: Add model fields**

In `DialogueSession`, add fields after `summary`:

```python
    overall_report: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    overall_status: Mapped[Optional[str]] = mapped_column(String(20), nullable=True)
    overall_error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
```
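
If `JSON`, `String`, or `Text` are not already imported in `app/models/dialogue.py`, extend the existing SQLAlchemy import. A one-line sketch, assuming the module already imports from `sqlalchemy`:

```python
from sqlalchemy import JSON, String, Text  # extend the existing import as needed
```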

- [ ] **Step 2: Update fresh DB schema**

In `init.sql`, add columns after `summary TEXT NULL,`:

```sql
  overall_report JSON NULL,
  overall_status VARCHAR(20) NULL,
  overall_error_message TEXT NULL,
```

- [ ] **Step 3: Add migration**

Create `migrations/003_add_overall_report.sql`:

```sql
-- Add cached structured overall report fields to existing dialogue sessions.
-- Apply once against an existing database (new DBs use updated init.sql).
ALTER TABLE dialogue_session
  ADD COLUMN overall_report JSON NULL AFTER summary,
  ADD COLUMN overall_status VARCHAR(20) NULL AFTER overall_report,
  ADD COLUMN overall_error_message TEXT NULL AFTER overall_status;
```

- [ ] **Step 4: Run backend tests for import sanity**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_smoke.py -q
```

Expected: existing smoke tests pass.

- [ ] **Step 5: Commit**

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/models/dialogue.py init.sql migrations/003_add_overall_report.sql
git commit -m "feat: add cached overall report fields"
```

---

### Task 2: Structured Report Pipeline Logging Helper

**Files:**
- Create: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/report_logging.py`
- Test: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/tests/service/speaking/test_report_logging.py`

- [ ] **Step 1: Write failing logging tests**

Create `tests/service/speaking/test_report_logging.py`:

```python
import logging

from app.service.speaking.report_logging import log_report_stage


def test_log_report_stage_emits_stable_key_value_fields(caplog):
    logger = logging.getLogger("test-report-logger")

    with caplog.at_level(logging.INFO):
        log_report_stage(
            logger,
            stage="sentence_eval.azure_completed",
            event="success",
            session_id="session-1",
            round=2,
            evaluation_id=42,
            duration_ms=123,
            status="completed",
            attempt=1,
        )

    line = caplog.records[0].message
    assert "stage=sentence_eval.azure_completed" in line
    assert "event=success" in line
    assert "trace_id=session-1" in line
    assert "session_id=session-1" in line
    assert "round=2" in line
    assert "evaluation_id=42" in line
    assert "duration_ms=123" in line
    assert "status=completed" in line
    assert "attempt=1" in line


def test_log_report_stage_escapes_spaces_in_error_message(caplog):
    logger = logging.getLogger("test-report-logger")

    with caplog.at_level(logging.ERROR):
        log_report_stage(
            logger,
            stage="report.failed",
            event="failed",
            session_id="session-1",
            error_code="OVERALL_LLM_ERROR",
            error_type="RuntimeError",
            error_message="provider returned bad json",
            level=logging.ERROR,
        )

    line = caplog.records[0].message
    assert "error_message=\"provider returned bad json\"" in line
```

- [ ] **Step 2: Run tests to verify failure**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_report_logging.py -q
```

Expected: FAIL with `ModuleNotFoundError: No module named 'app.service.speaking.report_logging'`.

- [ ] **Step 3: Implement logging helper**

Create `app/service/speaking/report_logging.py`:

```python
"""Structured key=value logs for the speaking report pipeline."""

from __future__ import annotations

import logging
from typing import Any


def _format_value(value: Any) -> str:
    text = str(value)
    if any(ch.isspace() for ch in text) or '"' in text:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return text


def log_report_stage(
    logger: logging.Logger,
    *,
    stage: str,
    event: str,
    session_id: str | None = None,
    trace_id: str | None = None,
    round: int | None = None,
    message_id: int | None = None,
    evaluation_id: int | None = None,
    duration_ms: int | None = None,
    status: str | None = None,
    attempt: int | None = None,
    error_code: str | None = None,
    error_type: str | None = None,
    error_message: str | None = None,
    level: int = logging.INFO,
) -> None:
    fields: dict[str, Any] = {
        "trace_id": trace_id or session_id,
        "session_id": session_id,
        "round": round,
        "message_id": message_id,
        "evaluation_id": evaluation_id,
        "stage": stage,
        "event": event,
        "duration_ms": duration_ms,
        "status": status,
        "attempt": attempt,
        "error_code": error_code,
        "error_type": error_type,
        "error_message": error_message,
    }
    parts = [
        f"{key}={_format_value(value)}"
        for key, value in fields.items()
        if value is not None
    ]
    logger.log(level, "report_pipeline %s", " ".join(parts))
```
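
For reference, a usage sketch reusing the helper above (the `report.ready` stage name is illustrative): a call like this emits one greppable line in the format the Task 10 checks rely on.

```python
import logging

from app.service.speaking.report_logging import log_report_stage

logger = logging.getLogger("speaking.report")

# Emits one line:
# report_pipeline trace_id=session-1 session_id=session-1 round=1 stage=report.ready event=success status=ready
log_report_stage(
    logger,
    stage="report.ready",
    event="success",
    session_id="session-1",
    round=1,
    status="ready",
)
```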

- [ ] **Step 4: Run tests**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_report_logging.py -q
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/report_logging.py tests/service/speaking/test_report_logging.py
git commit -m "feat: add report pipeline logging helper"
```

---

### Task 3: Sentence Feedback Evaluator

**Files:**
- Create: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/sentence_feedback_evaluator.py`
- Test: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/tests/service/speaking/test_sentence_feedback_evaluator.py`

- [ ] **Step 1: Write failing evaluator tests**

Create `tests/service/speaking/test_sentence_feedback_evaluator.py`:

```python
import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from app.service.speaking.sentence_feedback_evaluator import SentenceFeedbackEvaluator


def _mock_openai_response(content: str) -> MagicMock:
    choice = MagicMock()
    choice.message.content = content
    resp = MagicMock()
    resp.choices = [choice]
    return resp


@pytest.mark.asyncio
async def test_evaluate_scored_mode_returns_normalized_feedback():
    payload = json.dumps(
        {
            "comment": "表达清楚,because 用得好。",
            "betterExpression": "I like pandas because they are cute.",
        },
        ensure_ascii=False,
    )

    with patch("app.service.speaking.sentence_feedback_evaluator.AsyncOpenAI") as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(return_value=_mock_openai_response(payload))

        evaluator = SentenceFeedbackEvaluator()
        result = await evaluator.evaluate(
            conversation_history=[{"role": "ai", "content": "What animal do you like?"}],
            latest_student_turn={
                "round": 1,
                "content": "I like panda because cute.",
                "pronunciation": {"accuracy": 82, "fluency": 76, "completeness": 88, "prosody": 70},
            },
            grade="五年级",
            vocabulary=["panda"],
            sentences=["I like ... because ..."],
        )

    assert result == {
        "comment": "表达清楚,because 用得好。",
        "betterExpression": "I like pandas because they are cute.",
    }


@pytest.mark.asyncio
async def test_evaluate_text_only_mode_sends_null_pronunciation():
    payload = json.dumps({"comment": "能表达喜好。", "betterExpression": "I like pandas."}, ensure_ascii=False)

    with patch("app.service.speaking.sentence_feedback_evaluator.AsyncOpenAI") as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(return_value=_mock_openai_response(payload))

        evaluator = SentenceFeedbackEvaluator()
        await evaluator.evaluate(
            conversation_history=[],
            latest_student_turn={"round": 1, "content": "I like panda.", "pronunciation": None},
            grade="五年级",
            vocabulary=[],
            sentences=[],
        )

    raw_user_payload = instance.chat.completions.create.await_args.kwargs["messages"][1]["content"]
    assert '"pronunciation": null' in raw_user_payload
    assert "text-only" in instance.chat.completions.create.await_args.kwargs["messages"][0]["content"]


@pytest.mark.asyncio
async def test_evaluate_returns_none_on_invalid_shape():
    with patch("app.service.speaking.sentence_feedback_evaluator.AsyncOpenAI") as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(return_value=_mock_openai_response('{"comment": "ok"}'))

        evaluator = SentenceFeedbackEvaluator()
        result = await evaluator.evaluate(
            conversation_history=[],
            latest_student_turn={"round": 1, "content": "Hi", "pronunciation": None},
            grade="五年级",
            vocabulary=[],
            sentences=[],
        )

    assert result is None
```

- [ ] **Step 2: Run tests to verify failure**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_sentence_feedback_evaluator.py -q
```

Expected: FAIL with missing module.

- [ ] **Step 3: Implement evaluator**

Create `app/service/speaking/sentence_feedback_evaluator.py`:

```python
"""Per-answer expression feedback evaluator for DetailedReport."""

from __future__ import annotations

import asyncio
import json
from typing import Any

from openai import AsyncOpenAI

from app.config import settings
from app.logging import get_logger

logger = get_logger(__name__)

SYSTEM_PROMPT = """你是专业的智能英语口语对话教练。
你会收到学生与AI的对话文本、学生年级、重点词汇、重点句型,以及最新学生回答的 pronunciation 评分。

输出要求:
- 仅输出 JSON,不要输出其它内容。
- JSON 必须包含 comment 和 betterExpression 两个英文 key。
- comment 是中文一句话点评,不超过 30 个汉字,积极、具体,提及当前回答的细节。
- betterExpression 是一个可直接替换学生原句的英文进阶表达,适合学生年级。

如果 latestStudentTurn.pronunciation 为 null,表示 text-only 模式:
- 不要提发音、流畅度、语调、重音、prosody、speech score 或任何依赖语音评分的观察。
- 只评价语法、词汇、句子完整度、交际意图、重点词汇和重点句型。
"""


class SentenceFeedbackEvaluator:
    def __init__(self, timeout_seconds: float = 10.0):
        self.client = AsyncOpenAI(
            base_url=settings.ONEHUB_BASE_URL,
            api_key=settings.ONEHUB_API_KEY,
        )
        self.model = settings.ONEHUB_MODEL
        self.timeout_seconds = timeout_seconds

    async def evaluate(
        self,
        *,
        conversation_history: list[dict[str, Any]],
        latest_student_turn: dict[str, Any],
        grade: str,
        vocabulary: list[str],
        sentences: list[str],
    ) -> dict | None:
        user_payload = json.dumps(
            {
                "conversationHistory": conversation_history,
                "latestStudentTurn": latest_student_turn,
                "grade": grade,
                "vocabulary": vocabulary,
                "sentences": sentences,
            },
            ensure_ascii=False,
        )
        try:
            resp = await asyncio.wait_for(
                self.client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": user_payload},
                    ],
                    response_format={"type": "json_object"},
                    temperature=0,
                ),
                timeout=self.timeout_seconds,
            )
        except asyncio.TimeoutError:
            logger.warning("SentenceFeedbackEvaluator LLM timeout")
            return None
        except Exception as e:
            logger.error(f"SentenceFeedbackEvaluator LLM error: {e}")
            return None

        raw = resp.choices[0].message.content or ""
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            logger.warning(f"SentenceFeedbackEvaluator got non-JSON: {raw[:200]}")
            return None

        if not isinstance(parsed, dict):
            return None
        comment = parsed.get("comment")
        better = parsed.get("betterExpression")
        if not isinstance(comment, str) or not isinstance(better, str):
            logger.warning(f"SentenceFeedbackEvaluator got invalid shape: {parsed}")
            return None
        return {"comment": comment, "betterExpression": better}
```
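
Callers (Task 5) simply await `evaluate()` and treat `None` as "no feedback available". A minimal usage sketch:

```python
# Usage sketch; the custom timeout is optional.
evaluator = SentenceFeedbackEvaluator(timeout_seconds=8.0)
feedback = await evaluator.evaluate(
    conversation_history=[{"role": "ai", "content": "What animal do you like?"}],
    latest_student_turn={"round": 1, "content": "I like panda.", "pronunciation": None},
    grade="五年级",
    vocabulary=[],
    sentences=[],
)
# feedback == {"comment": ..., "betterExpression": ...} or None on any failure
```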

- [ ] **Step 4: Run tests**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_sentence_feedback_evaluator.py -q
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/sentence_feedback_evaluator.py tests/service/speaking/test_sentence_feedback_evaluator.py
git commit -m "feat: add sentence feedback evaluator"
```

---

### Task 4: OverallReport Evaluator

**Files:**
- Create: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/overall_report_evaluator.py`
- Test: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/tests/service/speaking/test_overall_report_evaluator.py`

- [ ] **Step 1: Write failing evaluator tests**

Create `tests/service/speaking/test_overall_report_evaluator.py`:

```python
import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from app.service.speaking.overall_report_evaluator import OverallReportEvaluator


def _mock_openai_response(content: str) -> MagicMock:
    choice = MagicMock()
    choice.message.content = content
    resp = MagicMock()
    resp.choices = [choice]
    return resp


@pytest.mark.asyncio
async def test_evaluate_maps_suggestions_to_improvements():
    raw = json.dumps(
        {
            "overall_evaluation": {"evaluation": "整体表达积极。"},
            "highlights": ["能主动回应", "使用了 because", "发音清晰"],
            "suggestions": ["补充更多细节", "注意主谓一致", "多用连接词"],
        },
        ensure_ascii=False,
    )

    with patch("app.service.speaking.overall_report_evaluator.AsyncOpenAI") as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(return_value=_mock_openai_response(raw))

        evaluator = OverallReportEvaluator()
        result = await evaluator.evaluate(
            conversation_history=[],
            grade="五年级",
            vocabulary=[],
            sentences=[],
        )

    assert result == {
        "aiComment": "整体表达积极。",
        "highlights": ["能主动回应", "使用了 because", "发音清晰"],
        "improvements": ["补充更多细节", "注意主谓一致", "多用连接词"],
    }


@pytest.mark.asyncio
async def test_evaluate_accepts_chinese_alias():
    raw = json.dumps(
        {
            "overall_evaluation": {"chinese": "整体表现不错。"},
            "highlights": ["回应及时"],
            "suggestions": ["增加细节"],
        },
        ensure_ascii=False,
    )

    with patch("app.service.speaking.overall_report_evaluator.AsyncOpenAI") as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(return_value=_mock_openai_response(raw))

        evaluator = OverallReportEvaluator()
        result = await evaluator.evaluate(
            conversation_history=[],
            grade="五年级",
            vocabulary=[],
            sentences=[],
        )

    assert result["aiComment"] == "整体表现不错。"


@pytest.mark.asyncio
async def test_evaluate_returns_none_on_bad_json():
    with patch("app.service.speaking.overall_report_evaluator.AsyncOpenAI") as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(return_value=_mock_openai_response("bad"))

        evaluator = OverallReportEvaluator()
        result = await evaluator.evaluate(
            conversation_history=[],
            grade="五年级",
            vocabulary=[],
            sentences=[],
        )

    assert result is None
```

- [ ] **Step 2: Run tests to verify failure**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_overall_report_evaluator.py -q
```

Expected: FAIL with missing module.

- [ ] **Step 3: Implement evaluator**

Create `app/service/speaking/overall_report_evaluator.py`:

```python
"""Whole-session evaluator for OverallReport."""

from __future__ import annotations

import asyncio
import json
from typing import Any

from openai import AsyncOpenAI

from app.config import settings
from app.logging import get_logger

logger = get_logger(__name__)

SYSTEM_PROMPT = """## 任务
基于学生与AI的完整对话记录(多轮)、Speech-API的每条语音评分结果,生成结构化评估报告,输出为JSON格式。

### 输入数据
1、完整对话记录:包含多轮对话,每轮标注角色、时间戳。
2、Speech-API评分:每条学生语音的4个维度评分。
3、学生年级、重点词汇、重点句型。

### 任务要求
1、综合分析所有对话内容和语音评分。
2、从整体表现、语言能力、交流技巧、学习态度等角度进行评估。
3、亮点和建议需具体、有针对性,避免泛泛而谈。
4、仅输出 JSON,不要输出其它内容。

### 输出格式
{
  "overall_evaluation": {
    "evaluation": "中文整体评价"
  },
  "highlights": ["发言亮点1", "发言亮点2", "发言亮点3"],
  "suggestions": ["具体改进建议1", "具体改进建议2", "具体改进建议3"]
}
"""


class OverallReportEvaluator:
    def __init__(self, timeout_seconds: float = 15.0):
        self.client = AsyncOpenAI(
            base_url=settings.ONEHUB_BASE_URL,
            api_key=settings.ONEHUB_API_KEY,
        )
        self.model = settings.ONEHUB_MODEL
        self.timeout_seconds = timeout_seconds

    async def evaluate(
        self,
        *,
        conversation_history: list[dict[str, Any]],
        grade: str,
        vocabulary: list[str],
        sentences: list[str],
    ) -> dict | None:
        user_payload = json.dumps(
            {
                "conversationHistory": conversation_history,
                "grade": grade,
                "vocabulary": vocabulary,
                "sentences": sentences,
            },
            ensure_ascii=False,
        )
        try:
            resp = await asyncio.wait_for(
                self.client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": user_payload},
                    ],
                    response_format={"type": "json_object"},
                    temperature=0,
                ),
                timeout=self.timeout_seconds,
            )
        except asyncio.TimeoutError:
            logger.warning("OverallReportEvaluator LLM timeout")
            return None
        except Exception as e:
            logger.error(f"OverallReportEvaluator LLM error: {e}")
            return None

        raw = resp.choices[0].message.content or ""
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            logger.warning(f"OverallReportEvaluator got non-JSON: {raw[:200]}")
            return None

        return self._normalize(parsed)

    @staticmethod
    def _normalize(parsed: object) -> dict | None:
        if not isinstance(parsed, dict):
            return None
        overall = parsed.get("overall_evaluation")
        if not isinstance(overall, dict):
            return None
        ai_comment = overall.get("evaluation") or overall.get("chinese")
        highlights = parsed.get("highlights")
        suggestions = parsed.get("suggestions")
        if not isinstance(ai_comment, str):
            return None
        if not isinstance(highlights, list) or not all(isinstance(item, str) for item in highlights):
            return None
        if not isinstance(suggestions, list) or not all(isinstance(item, str) for item in suggestions):
            return None
        return {
            "aiComment": ai_comment,
            "highlights": highlights,
            "improvements": suggestions,
        }
```
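
`_normalize` is the seam between the prompt's schema and the frontend-facing keys. For example, derived directly from the implementation above:

```python
OverallReportEvaluator._normalize(
    {
        "overall_evaluation": {"evaluation": "整体表达积极。"},
        "highlights": ["能主动回应"],
        "suggestions": ["补充更多细节"],
    }
)
# -> {"aiComment": "整体表达积极。", "highlights": ["能主动回应"], "improvements": ["补充更多细节"]}
```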

- [ ] **Step 4: Run tests**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_overall_report_evaluator.py -q
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/overall_report_evaluator.py tests/service/speaking/test_overall_report_evaluator.py
git commit -m "feat: add overall report evaluator"
```

---

### Task 5: Backend Sentence Evaluation Pipeline

**Files:**
- Modify: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/dialogue_service.py`
- Modify: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/api/dialogue.py`
- Test: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py`

- [ ] **Step 1: Update existing service tests for Azure failure text-only fallback**

In `tests/service/speaking/test_dialogue_service_content.py`, change the old Azure failure test expectation from "skips evaluator" to "attempts text-only evaluator":

```python
@pytest.mark.asyncio
async def test_azure_failure_attempts_text_only_feedback(monkeypatch) -> None:
    ev = _fake_evaluation()
    stub_db = _StubDB(ev)
    monkeypatch.setattr(
        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
    )

    assessor = MagicMock()
    assessor.assess = AsyncMock(side_effect=RuntimeError("azure exploded"))
    evaluator = MagicMock()
    evaluator.evaluate = AsyncMock(
        return_value={
            "comment": "能表达自己的想法。",
            "betterExpression": "I like pandas.",
        }
    )

    service = _build_service(assessor, evaluator)
    await service._evaluate_pronunciation(
        evaluation_id=1,
        audio_bytes=b"",
        reference_text="I like panda",
        prior_ai_turn="What animal do you like?",
    )

    assert ev.status == "failed"
    assert ev.content_feedback == {
        "comment": "能表达自己的想法。",
        "betterExpression": "I like pandas.",
    }
    evaluator.evaluate.assert_awaited_once()
    latest_turn = evaluator.evaluate.await_args.kwargs["latest_student_turn"]
    assert latest_turn["pronunciation"] is None
```

- [ ] **Step 2: Run updated test to verify failure**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py::test_azure_failure_attempts_text_only_feedback -q
```

Expected: FAIL because `ContentEvaluator.evaluate()` still uses old arguments or is not called on Azure failure.

- [ ] **Step 3: Modify DialogueService constructor**

In `dialogue_service.py`, replace the `ContentEvaluator` import and constructor parameter with `SentenceFeedbackEvaluator` and `OverallReportEvaluator`:

```python
from app.service.speaking.sentence_feedback_evaluator import SentenceFeedbackEvaluator
from app.service.speaking.overall_report_evaluator import OverallReportEvaluator
from app.service.speaking.report_logging import log_report_stage
```

Constructor:

```python
    sentence_feedback_evaluator: SentenceFeedbackEvaluator | None = None,
    overall_report_evaluator: OverallReportEvaluator | None = None,
```

Assignments:

```python
    self.sentence_feedback_evaluator = sentence_feedback_evaluator or SentenceFeedbackEvaluator()
    self.overall_report_evaluator = overall_report_evaluator or OverallReportEvaluator()
```

Update tests' `_build_service()` helper to pass `sentence_feedback_evaluator=evaluator`, as sketched below.
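
A hypothetical sketch of that helper change; the real `_build_service` lives in the existing test module, so every name here other than `sentence_feedback_evaluator` is an assumption:

```python
def _build_service(assessor, evaluator):
    # Hypothetical signature; only the renamed keyword below is prescribed.
    return DialogueService(
        pronunciation_assessor=assessor,        # assumed existing parameter name
        sentence_feedback_evaluator=evaluator,  # renamed from the old content evaluator kwarg
    )
```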

- [ ] **Step 4: Build sentence feedback payload helpers**

Add helpers near `_coerce_str_list`:

```python
def _session_learning_context(session: DialogueSession) -> tuple[str, list[str], list[str]]:
    role_config = session.role_config or {}
    grade = role_config.get("grade")
    if not isinstance(grade, str) or not grade.strip():
        grade = "未指定年级"
    return (
        grade,
        _coerce_str_list(role_config.get("vocabulary")),
        _coerce_str_list(role_config.get("sentences")),
    )


def _history_until_latest_student(messages: Sequence[DialogueMessage], latest_message_id: int) -> list[dict[str, Any]]:
    history: list[dict[str, Any]] = []
    for msg in messages:
        history.append(
            {
                "round": msg.round,
                "role": msg.role,
                "content": msg.content,
                "timestamp": msg.created_at.isoformat() if msg.created_at else None,
            }
        )
        if msg.id == latest_message_id:
            break
    return history
```

- [ ] **Step 5: Update `_evaluate_pronunciation`**

Inside `_evaluate_pronunciation`, after loading `evaluation`, load `message` and `session`:

```python
message = await db.get(DialogueMessage, evaluation.message_id)
session = await db.get(DialogueSession, evaluation.session_id)
if not message or not session:
    logger.error(f"Evaluation dependencies missing: id={evaluation_id}")
    return
```

Fetch history:

```python
history_result = await db.execute(
    select(DialogueMessage)
    .where(DialogueMessage.session_id == session.id)
    .order_by(DialogueMessage.created_at)
)
messages = history_result.scalars().all()
conversation_history = _history_until_latest_student(messages, message.id)
grade, vocabulary, sentences = _session_learning_context(session)
```

On Azure success, call:

```python
content_feedback = await self.sentence_feedback_evaluator.evaluate(
    conversation_history=conversation_history,
    latest_student_turn={
        "round": evaluation.round,
        "content": reference_text,
        "pronunciation": {
            "accuracy": result["accuracy_score"],
            "fluency": result["fluency_score"],
            "completeness": result["completeness_score"],
            "prosody": result["prosody_score"],
        },
    },
    grade=grade,
    vocabulary=vocabulary,
    sentences=sentences,
)
evaluation.content_feedback = content_feedback
```

On Azure failure, mark the evaluation failed and still attempt text-only feedback:

```python
evaluation.status = "failed"
evaluation.error_message = str(e)
try:
    evaluation.content_feedback = await self.sentence_feedback_evaluator.evaluate(
        conversation_history=conversation_history,
        latest_student_turn={
            "round": evaluation.round,
            "content": reference_text,
            "pronunciation": None,
        },
        grade=grade,
        vocabulary=vocabulary,
        sentences=sentences,
    )
except Exception as feedback_error:
    logger.error(f"Text-only feedback failed: eval={evaluation_id}, error={feedback_error}")
    evaluation.content_feedback = None
```

- [ ] **Step 6: Route `/speak-stream` background evaluation through service**

In `app/api/dialogue.py`, replace the `_background_evaluate_pronunciation` internals so it obtains the shared `DialogueService` and calls `_evaluate_pronunciation`:

```python
async def _background_evaluate_pronunciation(
    evaluation_id: int, wav_bytes: bytes, reference_text: str
):
    service = get_dialogue_service()
    await service._evaluate_pronunciation(
        evaluation_id=evaluation_id,
        audio_bytes=wav_bytes,
        reference_text=reference_text,
        content_type="audio/wav",
    )
```

- [ ] **Step 7: Fix `/speak-stream` AI reply round ownership**

In `speak_stream`, before creating `ai_msg`, compute:

```python
next_round = current_round + 1
is_complete = next_round > session.total_rounds
```

Store the AI response as:

```python
ai_msg = DialogueMessage(
    session_id=session.id,
    round=current_round if is_complete else next_round,
    role="ai",
    content=full_response,
)
```

Then set:

```python
session.current_round = next_round
```

Keep the final done payload:

```python
{"type": "done", "isComplete": is_complete, "nextRound": session.current_round}
```
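
For example, with `session.total_rounds = 3` and `current_round = 3`, `next_round` is 4, so `is_complete` is true and the closing AI reply stays attributed to round 3 rather than opening a phantom round 4.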

- [ ] **Step 8: Run sentence pipeline tests**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py tests/service/speaking/test_sentence_feedback_evaluator.py -q
```

Expected: PASS.

- [ ] **Step 9: Commit**

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py app/api/dialogue.py tests/service/speaking/test_dialogue_service_content.py
git commit -m "feat: generate sentence feedback for detailed report"
```

---

### Task 6: Backend OverallReport Gate And `/report` Response

**Files:**
- Modify: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/app/service/speaking/dialogue_service.py`
- Test: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py`

- [ ] **Step 1: Add unit tests for report status decisions**

Extend `tests/service/speaking/test_dialogue_service_report.py` with pure helper tests first:

```python
from app.service.speaking.dialogue_service import _decide_report_status


def test_decide_report_status_evaluating_when_session_active():
    assert _decide_report_status(
        session_status="active",
        student_count=1,
        evaluation_statuses=["completed"],
    ) == "evaluating"


def test_decide_report_status_incomplete_without_student_answers():
    assert _decide_report_status(
        session_status="completed",
        student_count=0,
        evaluation_statuses=[],
    ) == "incomplete"


def test_decide_report_status_evaluating_when_missing_evaluation():
    assert _decide_report_status(
        session_status="completed",
        student_count=2,
        evaluation_statuses=["completed"],
    ) == "evaluating"


def test_decide_report_status_failed_when_any_evaluation_failed():
    assert _decide_report_status(
        session_status="completed",
        student_count=2,
        evaluation_statuses=["completed", "failed"],
    ) == "failed"


def test_decide_report_status_ready_when_all_completed():
    assert _decide_report_status(
        session_status="completed",
        student_count=2,
        evaluation_statuses=["completed", "completed"],
    ) == "ready"
```

- [ ] **Step 2: Run tests to verify failure**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_report.py -q
```

Expected: FAIL because `_decide_report_status` does not exist.

- [ ] **Step 3: Implement status helper**

Add near `_coerce_str_list`:

```python
def _decide_report_status(
    *,
    session_status: str,
    student_count: int,
    evaluation_statuses: list[str],
) -> str:
    if session_status != "completed":
        return "evaluating"
    if student_count == 0:
        return "incomplete"
    if len(evaluation_statuses) < student_count:
        return "evaluating"
    if any(status in {"pending", "running", "retrying"} for status in evaluation_statuses):
        return "evaluating"
    if any(status == "failed" for status in evaluation_statuses):
        return "failed"
    if all(status == "completed" for status in evaluation_statuses):
        return "ready"
    return "failed"
```

- [ ] **Step 4: Build overall conversation payload helper**

Add:

```python
def _build_overall_history(messages: Sequence[DialogueMessage]) -> list[dict[str, Any]]:
    history: list[dict[str, Any]] = []
    for msg in messages:
        item: dict[str, Any] = {
            "round": msg.round,
            "role": msg.role,
            "content": msg.content,
            "timestamp": msg.created_at.isoformat() if msg.created_at else None,
        }
        if msg.role == "student" and msg.evaluation:
            ev = msg.evaluation
            item["pronunciation"] = {
                "accuracy": ev.accuracy_score,
                "fluency": ev.fluency_score,
                "completeness": ev.completeness_score,
                "prosody": ev.prosody_score,
            }
            if ev.content_feedback:
                item["contentFeedback"] = ev.content_feedback
        history.append(item)
    return history
```

- [ ] **Step 5: Modify `get_report` gate**

In `get_report`, after loading `messages` and `evaluations`, compute:

```python
student_messages = [msg for msg in messages if msg.role == "student"]
report_status = _decide_report_status(
    session_status=session.status,
    student_count=len(student_messages),
    evaluation_statuses=[ev.status for ev in evaluations],
)
```
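
When the gate stays in `evaluating`, emit the waiting stage that Task 10 Step 4 greps for. A minimal sketch, assuming the `log_report_stage` import from Task 5 Step 3 and the module-level `logger`:

```python
if report_status == "evaluating":
    # Point at the first sentence evaluation that is still blocking the report.
    pending = [ev.id for ev in evaluations if ev.status in {"pending", "running", "retrying"}]
    log_report_stage(
        logger,
        stage="report.waiting_sentence_evals",
        event="waiting",
        session_id=session.uuid,
        evaluation_id=pending[0] if pending else None,
        status="evaluating",
    )
```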

If `report_status == "ready"` and `session.overall_report is None`, generate:

```python
grade, vocabulary, sentences = _session_learning_context(session)
overall = await self.overall_report_evaluator.evaluate(
    conversation_history=_build_overall_history(messages),
    grade=grade,
    vocabulary=vocabulary,
    sentences=sentences,
)
if overall is None:
    session.overall_status = "failed"
    session.overall_error_message = "OverallReport evaluator returned invalid output"
    await db.commit()
    report_status = "failed"
else:
    session.overall_report = overall
    session.overall_status = "completed"
    session.overall_error_message = None
    await db.commit()
```

Return:

```python
return {
    "sessionId": session.uuid,
    "topic": session.topic,
    "status": report_status,
    "rounds": rounds,
    "overall": session.overall_report,
    "summary": session.summary,
}
```

Keep `summary` temporarily for backward compatibility.

- [ ] **Step 6: Run report tests**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_report.py -q
```

Expected: PASS.

- [ ] **Step 7: Commit**

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_report.py
git commit -m "feat: gate and cache overall reports"
```

---

### Task 7: Frontend Types And API Adapter

**Files:**
- Modify: `/Users/buoy/Development/gitrepo/PPT/src/types/englishSpeaking.ts`
- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts`

- [ ] **Step 1: Update TypeScript report types**

In `src/types/englishSpeaking.ts`, replace `SentenceEvaluation.feedback` with:

```ts
  feedback?: {
    comment: string
    betterExpression: string
  }
```

Add:

```ts
export type DialogueReportStatus = 'evaluating' | 'ready' | 'failed' | 'incomplete'
```

Change `DialogueReport` to:

```ts
export interface DialogueReport {
  status: DialogueReportStatus
  evaluation: OverallEvaluation
}
```

- [ ] **Step 2: Update backend response types**

In `llmService.ts`, change `BackendEvaluation.contentFeedback`:

```ts
  contentFeedback: {
    comment: string
    betterExpression: string
  } | null
```

Change `BackendReportResponse`:

```ts
interface BackendOverall {
  aiComment: string
  highlights: string[]
  improvements: string[]
}

interface BackendReportResponse {
  sessionId: string
  topic: string
  status: 'evaluating' | 'ready' | 'failed' | 'incomplete'
  rounds: BackendRound[]
  overall: BackendOverall | null
  summary: string | null
}
```

- [ ] **Step 3: Update `adaptReport` mapping**

Inside `adaptReport`, change `feedback` assignment:

```ts
  feedback: r.evaluation?.contentFeedback ?? undefined,
```

Compute dimensions from student evaluations:

```ts
  const avgDim = (key: 'accuracy' | 'fluency' | 'intonation' | 'stress') => {
    if (studentEvals.length === 0) return 0
    return Math.round(studentEvals.reduce((sum, s) => sum + (s.pronunciation?.[key] ?? 0), 0) / studentEvals.length)
  }
```

Return:

```ts
  const overall = raw.overall
  return {
    status: raw.status,
    evaluation: {
      overallScore: avg,
      scoreLevel: avg >= 85 ? 'excellent' : avg >= 70 ? 'good' : avg >= 60 ? 'fair' : 'needsWork',
      percentile: 0,
      dimensions: {
        fluency: avgDim('fluency'),
        interaction: avgDim('intonation'),
        vocabulary: avgDim('stress'),
        grammar: avgDim('accuracy'),
      },
      aiComment: overall?.aiComment ?? raw.summary ?? '',
      highlights: overall?.highlights ?? [],
      improvements: overall?.improvements ?? [],
      nextChallenge: {},
      statistics: {
        totalRounds: Math.max(...sentenceEvaluations.map(s => s.round), 0),
        averageScore: avg,
        highestScore: studentEvals.length ? Math.max(...studentEvals.map(s => s.score ?? 0)) : 0,
        highestRound: studentEvals.reduce((best, s) => ((s.score ?? 0) > (best.score ?? 0) ? s : best), studentEvals[0])?.round ?? 0,
        grammarErrors: 0,
        excellentExpressions: 0,
        totalDuration: 0,
      },
      sentenceEvaluations,
    },
  }
```

- [ ] **Step 4: Run frontend type-check**

Run:

```bash
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check
```

Expected: FAIL until `DetailedReport.vue` and `TopicDiscussionPreview.vue` are updated.

- [ ] **Step 5: Commit after dependent frontend tasks pass**

Do not commit this task yet if type-check fails. Commit together with Tasks 8 and 9.

---

### Task 8: Frontend DetailedReport Rendering

**Files:**
- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue`

- [ ] **Step 1: Replace old feedback blocks**

In `DetailedReport.vue`, replace the old `highlights/corrections/suggestions` feedback section with:

```vue
  <div v-if="sentence.feedback" class="feedback-section">
    <div v-if="sentence.feedback.comment" class="feedback-block">
      <div class="feedback-block-label"><span class="fb-good">✓</span> 一句话点评</div>
      <p class="feedback-text">{{ sentence.feedback.comment }}</p>
    </div>
    <div v-if="sentence.feedback.betterExpression" class="feedback-block">
      <div class="feedback-block-label"><span class="fb-suggest">→</span> 进阶表达</div>
      <p class="better-expression">{{ sentence.feedback.betterExpression }}</p>
    </div>
  </div>
```

- [ ] **Step 2: Add compact styles**

Add styles near existing feedback styles:

```scss
.feedback-text {
  margin: 0;
  font-size: 11px;
  color: #4b5563;
  line-height: 1.5;
}

.better-expression {
  margin: 0;
  padding: 8px 10px;
  background: #fff;
  border: 1px solid #f3f4f6;
  border-radius: 8px;
  font-size: 12px;
  color: #111827;
  line-height: 1.5;
}
```

- [ ] **Step 3: Ensure score rows tolerate missing pronunciation**

Verify existing template already wraps the score rows with:

```vue
<div v-if="sentence.pronunciation" class="pron-scores">
```

No change needed if present.

- [ ] **Step 4: Run type-check**

Run:

```bash
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check
```

Expected: may still fail until `TopicDiscussionPreview.vue` handles the report status (Task 9).

---

### Task 9: Frontend Report Polling And States

**Files:**
- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/composables/useDialogueEngine.ts`
- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/TopicDiscussionPreview.vue`

- [ ] **Step 1: Poll `/report` after dialogue completion**

In `useDialogueEngine.ts`, replace the single direct `api.getReport(sessionId.value!)` call after completion with:

```ts
async function waitForReport(sessionId: string): Promise<DialogueReport> {
  const maxAttempts = 30
  const intervalMs = 2000
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const report = await api.getReport(sessionId)
    if (report.status === 'ready' || report.status === 'failed' || report.status === 'incomplete') {
      return report
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error('Report generation timed out')
}
```
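
With 30 attempts at 2000 ms intervals, polling is bounded at roughly one minute before the timeout error surfaces; only an `evaluating` status keeps the loop going.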

Then call:

```ts
const report = await waitForReport(sessionId.value!)
resolve(report)
```

- [ ] **Step 2: Add report status state in preview**

In `TopicDiscussionPreview.vue`, add:

```ts
const reportStatus = ref<DialogueReport['status'] | null>(null)
const reportError = ref('')
```

In `handleDialogueComplete(report)`, set:

```ts
reportStatus.value = report?.status ?? null
realEvaluation.value = report?.evaluation ?? null
if (report?.status === 'failed') reportError.value = '报告生成失败,部分语音评分未完成。'
else if (report?.status === 'incomplete') reportError.value = '本次练习没有足够的有效回答生成报告。'
else reportError.value = ''
```

- [ ] **Step 3: Render failed/incomplete state before reports**

In the report stage template, before `OverallReport`, add:

```vue
  <div v-if="reportError" class="report-error">
    {{ reportError }}
  </div>
```

Add style:

```scss
.report-error {
  max-width: 448px;
  margin: 0 auto 12px;
  padding: 10px 12px;
  border: 1px solid #fed7aa;
  border-radius: 8px;
  background: #fff7ed;
  color: #c2410c;
  font-size: 12px;
  line-height: 1.5;
}
```

- [ ] **Step 4: Run frontend type-check**

Run:

```bash
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check
```

Expected: PASS.

- [ ] **Step 5: Commit frontend changes**

```bash
cd /Users/buoy/Development/gitrepo/PPT
git add src/types/englishSpeaking.ts src/views/Editor/EnglishSpeaking/services/llmService.ts src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue src/views/Editor/EnglishSpeaking/composables/useDialogueEngine.ts src/views/Editor/EnglishSpeaking/preview/TopicDiscussionPreview.vue
git commit -m "feat: render English speaking report pipeline"
```

---

### Task 10: End-To-End Verification

**Files:**
- Verify only.

- [ ] **Step 1: Run backend focused tests**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_sentence_feedback_evaluator.py tests/service/speaking/test_overall_report_evaluator.py tests/service/speaking/test_dialogue_service_content.py tests/service/speaking/test_dialogue_service_report.py tests/service/speaking/test_report_logging.py -q
```

Expected: PASS.

- [ ] **Step 2: Run backend full tests**

Run:

```bash
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest -q
```

Expected: PASS.

- [ ] **Step 3: Run frontend type-check**

Run:

```bash
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check
```

Expected: PASS.

- [ ] **Step 4: Manual log verification**

Run the backend with `LOG_LEVEL=INFO`, complete one streamed practice, then filter the logs:

```bash
rg 'session_id=<session-uuid>' logs/app.log
rg 'session_id=<session-uuid>.*event=(failed|timeout)' logs/app.log
rg 'session_id=<session-uuid>.*stage=report.waiting_sentence_evals' logs/app.log
```

Expected:

- the same `session_id` appears across speak stream, sentence eval, and report stages
- no `failed` or `timeout` entries for the happy path
- if the report waits, `report.waiting_sentence_evals` identifies the blocking evaluation

- [ ] **Step 5: Final commits check**

Run in both repos:

```bash
git status --short
git log -5 --oneline
```

Expected: clean worktrees except intentionally uncommitted local config; recent commits match the tasks above.

---

## Self-Review

Spec coverage:

- Per-answer `comment/betterExpression`: Task 3 and Task 5.
- Azure failure text-only fallback: Task 5.
- Full OverallReport gate and statuses: Task 6.
- Overall evaluator input/output and alias parsing: Task 4.
- Bounded retries: the Task 6 gate does not retry on its own; if evaluator retries are added later, Task 5 should keep them inside task-local calls.
- Structured logs: Task 2 (helper), Tasks 5 and 6 (wiring in `dialogue_service.py`), Task 10 (verification).
- Frontend DetailedReport degradation: Task 8.
- Frontend report polling: Task 9.

Known MVP exclusions:

- No persistent job table.
- No manual retry endpoint.
- No complex partial OverallReport UI.
- No browser visual audit unless requested after implementation.