
docs: per-turn content feedback MVP implementation plan

Cross-repo TDD breakdown: backend adds a content_feedback column,
a content_evaluator module, wiring into _evaluate_pronunciation, and
/report pass-through; frontend maps contentFeedback onto SentenceCard's
existing feedback slot in the results-page adapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jimmylee, 2 weeks ago
parent 7f8ee5ba90

doc/ContentEvaluationPlan.md (+1289, -0)

@@ -0,0 +1,1289 @@
+# Per-Turn Content Feedback MVP Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** On top of the existing per-turn Azure pronunciation four-score assessment, add an LLM content-feedback pipeline that produces `{highlights, corrections, suggestions}` attached to each turn's evaluation, shown only on the results page.
+
+**Architecture:** In each `/speak` background task, chain one OpenAI JSON-mode call after Azure PA completes; on failure, degrade to `content_feedback=null` without affecting the pronunciation scores; `/report` responses carry a `contentFeedback` field.
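+
+For reference, here is an illustrative `content_feedback` value as stored on the evaluation row and surfaced as `contentFeedback` (the shape is defined by this plan; the sample values below match the happy-path test fixture in Task 3):
+
+```python
+# Illustrative only: field shape per this plan, values taken from the
+# Task 3 happy-path test fixture.
+content_feedback = {
+    "highlights": ["发音清晰", "句子完整"],
+    "corrections": [
+        {
+            "original": "I go to park yesterday",
+            "corrected": "I went to the park yesterday",
+            "explanation": "过去式应用 went,park 前加 the",
+        }
+    ],
+    "suggestions": ["可增加连接词"],
+}
+```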
+
+**Tech Stack:** Python 3.13 · FastAPI · SQLAlchemy 2.x async · MySQL · OpenAI SDK (via onehub base_url) · pytest · uv · Vue 3 · TypeScript
+
+**Repos:**
+- Backend: `/Users/buoy/Development/gitrepo/cococlass-english-speaking-api`
+- Frontend: `/Users/buoy/Development/gitrepo/PPT`
+
+**Spec:** `/Users/buoy/Development/gitrepo/PPT/doc/ContentEvaluationDesign.md`
+
+---
+
+## File Structure
+
+### Backend (cococlass-english-speaking-api)
+
+**Create:**
+- `app/service/speaking/content_evaluator.py` — single responsibility: hand (the 4 pronunciation scores + the prior AI turn + the student transcript) to the LLM and get JSON feedback back
+- `tests/conftest.py` — pytest async + mock fixtures
+- `tests/service/__init__.py`
+- `tests/service/speaking/__init__.py`
+- `tests/service/speaking/test_content_evaluator.py` — unit tests for the evaluator
+- `tests/service/speaking/test_dialogue_service_content.py` — unit tests for the wiring logic
+- `migrations/001_add_content_feedback.sql` — incremental SQL for an existing DB
+
+**Modify:**
+- `init.sql` — add the same column to the CREATE TABLE statement used for fresh DBs
+- `app/models/dialogue.py` — add a `content_feedback` column to `PronunciationEvaluation`
+- `app/service/speaking/dialogue_service.py` — append the content evaluation after the success branch of `_evaluate_pronunciation`; return `contentFeedback` from `get_report`
+
+### Frontend (PPT)
+
+**Modify:**
+- `src/views/Editor/EnglishSpeaking/services/llmService.ts` — response conversion in `getReport` (map backend `rounds[i].evaluation.contentFeedback` to `sentenceEvaluations[i].feedback`)
+
+Not changed: `DetailedReport.vue` already renders the `sentence.feedback.{highlights, corrections, suggestions}` shape, and the `SentenceEvaluation.feedback` type in `englishSpeaking.ts` is already aligned.
+
+---
+
+## Task 1: [backend] Add the `content_feedback` column
+
+**Files:**
+- Modify: `cococlass-english-speaking-api/init.sql` (CREATE TABLE statement for fresh DBs)
+- Create: `cococlass-english-speaking-api/migrations/001_add_content_feedback.sql`
+- Modify: `cococlass-english-speaking-api/app/models/dialogue.py` (SQLAlchemy model)
+
+- [ ] **Step 1: Update the `pronunciation_evaluation` CREATE TABLE statement in `init.sql`**
+
+In the `pronunciation_evaluation` table definition, insert a `content_feedback` column before `completed_at`:
+
+Open `cococlass-english-speaking-api/init.sql` and change:
+
+```sql
+    word_analysis JSON NULL,
+    error_message TEXT NULL,
+    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    completed_at DATETIME NULL,
+```
+
+to:
+
+```sql
+    word_analysis JSON NULL,
+    content_feedback JSON NULL,
+    error_message TEXT NULL,
+    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    completed_at DATETIME NULL,
+```
+
+- [ ] **Step 2: Create the `migrations/` directory and add the incremental SQL**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+mkdir -p migrations
+```
+
+Create `migrations/001_add_content_feedback.sql` with the following content:
+
+```sql
+-- Add content_feedback column to existing pronunciation_evaluation table.
+-- Apply once against an existing database (new DBs use updated init.sql).
+ALTER TABLE pronunciation_evaluation
+  ADD COLUMN content_feedback JSON NULL AFTER word_analysis;
+```
+
+- [ ] **Step 3: Update the SQLAlchemy model**
+
+Open `cococlass-english-speaking-api/app/models/dialogue.py`.
+
+Locate:
+
+```python
+    word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
+    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+```
+
+and change to (inserting `content_feedback` in the middle):
+
+```python
+    word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
+    content_feedback: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
+    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
+```
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+git add init.sql migrations/001_add_content_feedback.sql app/models/dialogue.py
+git commit -m "feat(db): 为 pronunciation_evaluation 增加 content_feedback 列"
+```
+
+---
+
+## Task 2: [backend] Set up the pytest skeleton + conftest
+
+The repo's `tests/` directory currently holds only an empty `__init__.py`. First establish a runnable unit-test baseline.
+
+**Files:**
+- Create: `cococlass-english-speaking-api/tests/conftest.py`
+- Create: `cococlass-english-speaking-api/tests/service/__init__.py`
+- Create: `cococlass-english-speaking-api/tests/service/speaking/__init__.py`
+- Create: `cococlass-english-speaking-api/tests/service/speaking/test_smoke.py`
+
+- [ ] **Step 1: Create `tests/conftest.py`**
+
+```python
+"""Pytest global fixtures & asyncio config."""
+
+import pytest
+
+
+@pytest.fixture
+def anyio_backend() -> str:
+    """Force asyncio backend for anyio tests (not trio)."""
+    return "asyncio"
+```
+
+- [ ] **Step 2: Create empty `__init__.py` files so pytest can discover the nested directories**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+mkdir -p tests/service/speaking
+touch tests/service/__init__.py tests/service/speaking/__init__.py
+```
+
+- [ ] **Step 3: Write a smoke test to confirm pytest runs at all**
+
+Create `tests/service/speaking/test_smoke.py`:
+
+```python
+def test_pytest_works() -> None:
+    assert 1 + 1 == 2
+```
+
+- [ ] **Step 4: Run the smoke test**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/test_smoke.py -v
+```
+
+Expected: `1 passed`.
+
+If `uv run pytest` reports "pytest: command not found", run `uv sync --group dev` to install the dev dependencies, then rerun.
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+git add tests/
+git commit -m "chore(test): 搭建 pytest 目录骨架和 conftest"
+```
+
+---
+
+## Task 3: [backend] Write the `content_evaluator` module (TDD)
+
+**Files:**
+- Create: `cococlass-english-speaking-api/app/service/speaking/content_evaluator.py`
+- Create: `cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py` (a new file alongside the previous task's smoke test)
+
+ContentEvaluator instantiates `AsyncOpenAI` directly (using `settings.ONEHUB_BASE_URL` + `settings.ONEHUB_API_KEY`, same as `OneHubLLM`), because it needs the `response_format` parameter, which the existing `LLMProvider.chat()` interface does not expose.
+
+- [ ] **Step 1: Write a failing test for the evaluator (happy path)**
+
+Create `cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py`:
+
+```python
+"""Unit tests for ContentEvaluator."""
+
+import json
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from app.service.speaking.content_evaluator import ContentEvaluator
+
+
+def _mock_openai_response(content: str) -> MagicMock:
+    """Construct a fake AsyncOpenAI chat completion response."""
+    choice = MagicMock()
+    choice.message.content = content
+    resp = MagicMock()
+    resp.choices = [choice]
+    return resp
+
+
+@pytest.mark.asyncio
+async def test_evaluate_happy_path() -> None:
+    fake_json = json.dumps(
+        {
+            "highlights": ["发音清晰", "句子完整"],
+            "corrections": [
+                {
+                    "original": "I go to park yesterday",
+                    "corrected": "I went to the park yesterday",
+                    "explanation": "过去式应用 went,park 前加 the",
+                }
+            ],
+            "suggestions": ["可增加连接词"],
+        }
+    )
+
+    with patch(
+        "app.service.speaking.content_evaluator.AsyncOpenAI"
+    ) as MockClient:
+        instance = MockClient.return_value
+        instance.chat.completions.create = AsyncMock(
+            return_value=_mock_openai_response(fake_json)
+        )
+
+        evaluator = ContentEvaluator()
+        result = await evaluator.evaluate(
+            transcript="I go to park yesterday",
+            prior_ai_turn="What did you do last weekend?",
+            pron_scores={"accuracy": 72, "fluency": 85, "completeness": 90, "prosody": 60},
+        )
+
+    assert result is not None
+    assert result["highlights"] == ["发音清晰", "句子完整"]
+    assert len(result["corrections"]) == 1
+    assert result["corrections"][0]["corrected"] == "I went to the park yesterday"
+    assert result["suggestions"] == ["可增加连接词"]
+```
+
+- [ ] **Step 2: Run it and confirm it fails (the module does not exist yet)**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/test_content_evaluator.py -v
+```
+
+Expected: `ModuleNotFoundError: No module named 'app.service.speaking.content_evaluator'` or a similar import error.
+
+- [ ] **Step 3: Implement a minimal evaluator that makes the happy path pass**
+
+Create `cococlass-english-speaking-api/app/service/speaking/content_evaluator.py`:
+
+```python
+"""Per-turn content evaluation via LLM (JSON mode)."""
+
+import asyncio
+import json
+
+from openai import AsyncOpenAI
+
+from app.config import settings
+from app.logging import get_logger
+
+logger = get_logger(__name__)
+
+
+SYSTEM_PROMPT = """You are an English tutor evaluating a student's single spoken turn
+in an open dialogue. You receive:
+- Azure pronunciation scores (accuracy/fluency/completeness/prosody, 0-100)
+- The immediate prior AI turn (context)
+- The student's transcript
+
+Return JSON with exactly these keys:
+- highlights: 1-2 Chinese sentences praising specific strengths. Reference a
+              pronunciation dimension if that score is >= 85. <= 30 chars each.
+- corrections: array of grammar/word-choice fixes. Each item has keys:
+               original (EN), corrected (EN), explanation (ZH, <= 30 chars).
+- suggestions: 1-2 Chinese actionable improvements. Reference a pronunciation
+               dimension if that score is < 70. <= 30 chars each.
+
+Rules:
+- Empty arrays are valid. Do not invent errors to fill quota.
+- If the student only said a filler ("yes", "ok", "hmm"), return empty
+  corrections and suggestions plus one encouragement in highlights.
+- Never include raw score numbers in output text; describe qualitatively
+  ("发音准确度很高" not "accuracy 92").
+- Output MUST be a single JSON object with keys highlights, corrections, suggestions.
+"""
+
+
+class ContentEvaluator:
+    """Generates per-turn content feedback via LLM in JSON mode."""
+
+    def __init__(self, timeout_seconds: float = 10.0):
+        self.client = AsyncOpenAI(
+            base_url=settings.ONEHUB_BASE_URL,
+            api_key=settings.ONEHUB_API_KEY,
+        )
+        self.model = settings.ONEHUB_MODEL
+        self.timeout_seconds = timeout_seconds
+
+    async def evaluate(
+        self,
+        transcript: str,
+        prior_ai_turn: str,
+        pron_scores: dict,
+    ) -> dict | None:
+        """Return {highlights, corrections, suggestions} or None on failure."""
+        user_payload = json.dumps(
+            {
+                "pronunciation": pron_scores,
+                "ai_said": prior_ai_turn,
+                "student_said": transcript,
+            },
+            ensure_ascii=False,
+        )
+
+        try:
+            resp = await asyncio.wait_for(
+                self.client.chat.completions.create(
+                    model=self.model,
+                    messages=[
+                        {"role": "system", "content": SYSTEM_PROMPT},
+                        {"role": "user", "content": user_payload},
+                    ],
+                    response_format={"type": "json_object"},
+                    temperature=0,
+                ),
+                timeout=self.timeout_seconds,
+            )
+        except asyncio.TimeoutError:
+            logger.warning("ContentEvaluator LLM timeout")
+            return None
+        except Exception as e:
+            logger.error(f"ContentEvaluator LLM error: {e}")
+            return None
+
+        raw = resp.choices[0].message.content or ""
+        try:
+            parsed = json.loads(raw)
+        except json.JSONDecodeError:
+            logger.warning(f"ContentEvaluator got non-JSON: {raw[:200]}")
+            return None
+
+        if not self._has_required_shape(parsed):
+            logger.warning(f"ContentEvaluator got invalid shape: {parsed}")
+            return None
+
+        return {
+            "highlights": parsed.get("highlights", []),
+            "corrections": parsed.get("corrections", []),
+            "suggestions": parsed.get("suggestions", []),
+        }
+
+    @staticmethod
+    def _has_required_shape(obj: object) -> bool:
+        if not isinstance(obj, dict):
+            return False
+        for key in ("highlights", "corrections", "suggestions"):
+            if key not in obj or not isinstance(obj[key], list):
+                return False
+        return True
+```
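+
+Optionally, sanity-check the evaluator by hand. A minimal ad-hoc script (it hits the live endpoint, so it assumes valid `ONEHUB_*` settings; not part of the test suite):
+
+```python
+# Ad-hoc manual check of ContentEvaluator against the live endpoint.
+# Requires valid ONEHUB_* settings; do not commit this script.
+import asyncio
+
+from app.service.speaking.content_evaluator import ContentEvaluator
+
+
+async def main() -> None:
+    evaluator = ContentEvaluator()
+    feedback = await evaluator.evaluate(
+        transcript="I go to park yesterday",
+        prior_ai_turn="What did you do last weekend?",
+        pron_scores={"accuracy": 72, "fluency": 85, "completeness": 90, "prosody": 60},
+    )
+    # Prints the {highlights, corrections, suggestions} dict, or None on failure.
+    print(feedback)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```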
+
+- [ ] **Step 4: Run the happy-path test and confirm it passes**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/test_content_evaluator.py::test_evaluate_happy_path -v
+```
+
+Expected: `1 passed`.
+
+If you get `pytest-asyncio plugin not installed`, append `"pytest-asyncio>=0.26.0"` to `[dependency-groups].dev` in `pyproject.toml`, and add at the top of `tests/conftest.py`:
+
+```python
+import pytest
+
+pytest_plugins = ["pytest_asyncio"]
+```
+
+Then run `uv sync --group dev` and retry.
+
+- [ ] **Step 5: Add a failure-branch test — JSON parse failure**
+
+Append to `test_content_evaluator.py`:
+
+```python
+@pytest.mark.asyncio
+async def test_evaluate_returns_none_on_invalid_json() -> None:
+    with patch(
+        "app.service.speaking.content_evaluator.AsyncOpenAI"
+    ) as MockClient:
+        instance = MockClient.return_value
+        instance.chat.completions.create = AsyncMock(
+            return_value=_mock_openai_response("not a json")
+        )
+
+        evaluator = ContentEvaluator()
+        result = await evaluator.evaluate(
+            transcript="Hi",
+            prior_ai_turn="Hello",
+            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
+        )
+
+    assert result is None
+```
+
+- [ ] **Step 6: Add a failure-branch test — timeout**
+
+Append:
+
+```python
+@pytest.mark.asyncio
+async def test_evaluate_returns_none_on_timeout() -> None:
+    async def never_returns(**kwargs):
+        await asyncio.sleep(5)
+
+    with patch(
+        "app.service.speaking.content_evaluator.AsyncOpenAI"
+    ) as MockClient:
+        instance = MockClient.return_value
+        instance.chat.completions.create = never_returns
+
+        evaluator = ContentEvaluator(timeout_seconds=0.05)
+        result = await evaluator.evaluate(
+            transcript="Hi",
+            prior_ai_turn="Hello",
+            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
+        )
+
+    assert result is None
+```
+
+Also add `import asyncio` to the imports at the top of the file (if it is not already there).
+
+- [ ] **Step 7: Add a failure-branch test — invalid shape**
+
+Append:
+
+```python
+@pytest.mark.asyncio
+async def test_evaluate_returns_none_on_wrong_shape() -> None:
+    # LLM returned JSON, but with missing keys
+    bad = json.dumps({"highlights": ["ok"]})
+    with patch(
+        "app.service.speaking.content_evaluator.AsyncOpenAI"
+    ) as MockClient:
+        instance = MockClient.return_value
+        instance.chat.completions.create = AsyncMock(
+            return_value=_mock_openai_response(bad)
+        )
+
+        evaluator = ContentEvaluator()
+        result = await evaluator.evaluate(
+            transcript="Hi",
+            prior_ai_turn="Hello",
+            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
+        )
+
+    assert result is None
+```
+
+- [ ] **Step 8: Run all evaluator tests**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/test_content_evaluator.py -v
+```
+
+Expected: `4 passed`.
+
+- [ ] **Step 9: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+git add app/service/speaking/content_evaluator.py tests/service/speaking/test_content_evaluator.py
+# if pyproject.toml / conftest.py were changed:
+git add pyproject.toml tests/conftest.py uv.lock 2>/dev/null || true
+git commit -m "feat(speaking): 新增 content_evaluator(LLM JSON 模式生成单轮评语)"
+```
+
+---
+
+## Task 4: [backend] Wire ContentEvaluator into `_evaluate_pronunciation` (TDD)
+
+This is the core integration point: after Azure succeeds, run one content evaluation; if Azure fails, skip the call entirely; a content failure must not affect `status`.
+
+**Files:**
+- Modify: `cococlass-english-speaking-api/app/service/speaking/dialogue_service.py`
+- Create: `cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py`
+
+Note: the existing `_evaluate_pronunciation` already receives `self.assessor` via dependency injection. To make the content evaluator swappable in tests, the steps below attach it to `DialogueService` as a dependency as well.
+
+- [ ] **Step 1: Write a test — Azure succeeds + content succeeds → both fields are written**
+
+Create `cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py`:
+
+```python
+"""Integration-ish tests for content evaluation wired into DialogueService._evaluate_pronunciation."""
+
+from unittest.mock import AsyncMock, MagicMock
+
+import pytest
+
+from app.service.speaking.dialogue_service import DialogueService
+
+
+class _StubDB:
+    """Minimal stand-in for AsyncSession that supports get() + commit()."""
+
+    def __init__(self, evaluation):
+        self._evaluation = evaluation
+        self.commit = AsyncMock()
+
+    async def __aenter__(self):
+        return self
+
+    async def __aexit__(self, *args):
+        return False
+
+    async def get(self, _cls, _id):
+        return self._evaluation
+
+
+def _fake_evaluation() -> MagicMock:
+    ev = MagicMock()
+    ev.status = "pending"
+    ev.accuracy_score = None
+    ev.fluency_score = None
+    ev.completeness_score = None
+    ev.prosody_score = None
+    ev.word_analysis = None
+    ev.content_feedback = None
+    ev.completed_at = None
+    ev.error_message = None
+    return ev
+
+
+def _build_service(assessor, evaluator) -> DialogueService:
+    return DialogueService(
+        asr=MagicMock(),
+        llm=MagicMock(),
+        assessor=assessor,
+        storage=MagicMock(),
+        content_evaluator=evaluator,
+    )
+
+
+@pytest.mark.asyncio
+async def test_azure_success_then_content_success_writes_both(monkeypatch) -> None:
+    ev = _fake_evaluation()
+    stub_db = _StubDB(ev)
+    monkeypatch.setattr(
+        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
+    )
+
+    assessor = MagicMock()
+    assessor.assess = AsyncMock(
+        return_value={
+            "accuracy_score": 80,
+            "fluency_score": 85,
+            "completeness_score": 90,
+            "prosody_score": 75,
+            "word_analysis": [],
+        }
+    )
+    evaluator = MagicMock()
+    evaluator.evaluate = AsyncMock(
+        return_value={
+            "highlights": ["nice"],
+            "corrections": [],
+            "suggestions": [],
+        }
+    )
+
+    service = _build_service(assessor, evaluator)
+    await service._evaluate_pronunciation(
+        evaluation_id=1,
+        audio_bytes=b"",
+        reference_text="hi",
+        prior_ai_turn="hello",
+    )
+
+    assert ev.status == "completed"
+    assert ev.accuracy_score == 80
+    assert ev.content_feedback == {"highlights": ["nice"], "corrections": [], "suggestions": []}
+    evaluator.evaluate.assert_awaited_once()
+```
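+
+Note on the patch target: `_evaluate_pronunciation` does `from app.models.database import async_session` inside the method body, so the name is re-resolved from `app.models.database` on every call; patching it on the `dialogue_service` module would be a no-op. A purely illustrative sketch of the pattern (not part of the plan's test suite):
+
+```python
+# Why the tests patch app.models.database.async_session: a function-local
+# `from module import name` re-binds `name` from `module` at call time,
+# so the stub must be installed on that module.
+import app.models.database
+
+
+def _uses_local_import():
+    from app.models.database import async_session  # resolved per call
+    return async_session
+
+
+def test_patch_point(monkeypatch):
+    sentinel = object()
+    monkeypatch.setattr(app.models.database, "async_session", sentinel)
+    assert _uses_local_import() is sentinel
+```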
+
+- [ ] **Step 2: Write a test — Azure succeeds + content fails → content_feedback is None, status stays completed**
+
+Append:
+
+```python
+@pytest.mark.asyncio
+async def test_azure_success_content_failure_keeps_status_completed(monkeypatch) -> None:
+    ev = _fake_evaluation()
+    stub_db = _StubDB(ev)
+    monkeypatch.setattr(
+        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
+    )
+
+    assessor = MagicMock()
+    assessor.assess = AsyncMock(
+        return_value={
+            "accuracy_score": 80,
+            "fluency_score": 85,
+            "completeness_score": 90,
+            "prosody_score": 75,
+            "word_analysis": [],
+        }
+    )
+    evaluator = MagicMock()
+    evaluator.evaluate = AsyncMock(return_value=None)  # LLM failed
+
+    service = _build_service(assessor, evaluator)
+    await service._evaluate_pronunciation(
+        evaluation_id=1,
+        audio_bytes=b"",
+        reference_text="hi",
+        prior_ai_turn="hello",
+    )
+
+    assert ev.status == "completed"
+    assert ev.accuracy_score == 80
+    assert ev.content_feedback is None
+```
+
+- [ ] **Step 3: Write a test — Azure fails → ContentEvaluator is never called**
+
+Append:
+
+```python
+@pytest.mark.asyncio
+async def test_azure_failure_skips_content_evaluator(monkeypatch) -> None:
+    ev = _fake_evaluation()
+    stub_db = _StubDB(ev)
+    monkeypatch.setattr(
+        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
+    )
+
+    assessor = MagicMock()
+    assessor.assess = AsyncMock(side_effect=RuntimeError("azure exploded"))
+    evaluator = MagicMock()
+    evaluator.evaluate = AsyncMock()
+
+    service = _build_service(assessor, evaluator)
+    await service._evaluate_pronunciation(
+        evaluation_id=1,
+        audio_bytes=b"",
+        reference_text="hi",
+        prior_ai_turn="hello",
+    )
+
+    assert ev.status == "failed"
+    assert ev.content_feedback is None
+    evaluator.evaluate.assert_not_awaited()
+```
+
+- [ ] **Step 4: Run the tests and confirm they fail**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/test_dialogue_service_content.py -v
+```
+
+Expected: 3 tests fail (`DialogueService.__init__` has no `content_evaluator` parameter yet, and `_evaluate_pronunciation` has no `prior_ai_turn` parameter).
+
+- [ ] **Step 5: Change `DialogueService.__init__` to accept a content_evaluator**
+
+Open `cococlass-english-speaking-api/app/service/speaking/dialogue_service.py`.
+
+Add to the imports at the top of the file:
+
+```python
+from app.service.speaking.content_evaluator import ContentEvaluator
+```
+
+Locate `__init__`:
+
+```python
+    def __init__(
+        self,
+        asr: ASRProvider,
+        llm: LLMProvider,
+        assessor: PronunciationAssessor,
+        storage: AudioStorage,
+    ):
+        self.asr = asr
+        self.llm = llm
+        self.assessor = assessor
+        self.storage = storage
+```
+
+and change to:
+
+```python
+    def __init__(
+        self,
+        asr: ASRProvider,
+        llm: LLMProvider,
+        assessor: PronunciationAssessor,
+        storage: AudioStorage,
+        content_evaluator: ContentEvaluator | None = None,
+    ):
+        self.asr = asr
+        self.llm = llm
+        self.assessor = assessor
+        self.storage = storage
+        self.content_evaluator = content_evaluator or ContentEvaluator()
+```
+
+- [ ] **Step 6: Change the `_evaluate_pronunciation` signature and logic**
+
+Locate the existing implementation (around `dialogue_service.py:321`):
+
+```python
+    async def _evaluate_pronunciation(
+        self,
+        evaluation_id: int,
+        audio_bytes: bytes,
+        reference_text: str,
+        content_type: str = "audio/webm;codecs=opus",
+    ):
+        """后台静默发音评估"""
+        from app.models.database import async_session
+
+        async with async_session() as db:
+            evaluation = await db.get(PronunciationEvaluation, evaluation_id)
+            if not evaluation:
+                logger.error(f"Evaluation record not found: id={evaluation_id}")
+                return
+
+            try:
+                result = await self.assessor.assess(audio_bytes, reference_text, content_type)
+                logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
+                evaluation.status = "completed"
+                evaluation.accuracy_score = result["accuracy_score"]
+                evaluation.fluency_score = result["fluency_score"]
+                evaluation.completeness_score = result["completeness_score"]
+                evaluation.prosody_score = result["prosody_score"]
+                evaluation.word_analysis = result.get("word_analysis")
+                evaluation.completed_at = datetime.now()
+            except Exception as e:
+                logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
+                evaluation.status = "failed"
+                evaluation.error_message = str(e)
+
+            await db.commit()
+```
+
+and change to:
+
+```python
+    async def _evaluate_pronunciation(
+        self,
+        evaluation_id: int,
+        audio_bytes: bytes,
+        reference_text: str,
+        prior_ai_turn: str = "",
+        content_type: str = "audio/webm;codecs=opus",
+    ):
+        """后台静默发音评估 + 内容评语"""
+        from app.models.database import async_session
+
+        async with async_session() as db:
+            evaluation = await db.get(PronunciationEvaluation, evaluation_id)
+            if not evaluation:
+                logger.error(f"Evaluation record not found: id={evaluation_id}")
+                return
+
+            try:
+                result = await self.assessor.assess(audio_bytes, reference_text, content_type)
+                logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
+                evaluation.status = "completed"
+                evaluation.accuracy_score = result["accuracy_score"]
+                evaluation.fluency_score = result["fluency_score"]
+                evaluation.completeness_score = result["completeness_score"]
+                evaluation.prosody_score = result["prosody_score"]
+                evaluation.word_analysis = result.get("word_analysis")
+                evaluation.completed_at = datetime.now()
+
+                # Content evaluation: only runs after Azure success; failure must not affect status.
+                try:
+                    content_feedback = await self.content_evaluator.evaluate(
+                        transcript=reference_text,
+                        prior_ai_turn=prior_ai_turn,
+                        pron_scores={
+                            "accuracy": result["accuracy_score"],
+                            "fluency": result["fluency_score"],
+                            "completeness": result["completeness_score"],
+                            "prosody": result["prosody_score"],
+                        },
+                    )
+                    evaluation.content_feedback = content_feedback
+                    logger.info(
+                        f"Content evaluation done: eval={evaluation_id}, "
+                        f"has_feedback={content_feedback is not None}"
+                    )
+                except Exception as e:
+                    logger.error(f"Content evaluation error (soft-fail): eval={evaluation_id}, error={e}")
+                    evaluation.content_feedback = None
+
+            except Exception as e:
+                logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
+                evaluation.status = "failed"
+                evaluation.error_message = str(e)
+
+            await db.commit()
+```
+
+- [ ] **Step 7: Update the `asyncio.create_task` call in `speak()` to pass `prior_ai_turn`**
+
+Locate the existing `create_task` call inside `speak()` (around `dialogue_service.py:189`):
+
+```python
+            asyncio.create_task(
+                self._evaluate_pronunciation(
+                    evaluation_id=evaluation.id,
+                    audio_bytes=audio_bytes,
+                    reference_text=transcript,
+                    content_type=content_type,
+                )
+            )
+```
+
+Before the `create_task`, compute `prior_ai_turn`. Add a new variable (placed just before the "⑩ 后台发音评估" (background pronunciation evaluation) block):
+
+```python
+            # Use the most recent AI message before this round's student message as context for the content evaluation
+            prior_ai_turn = ""
+            for msg in reversed(history):
+                if msg.role == "ai":
+                    prior_ai_turn = msg.content
+                    break
+```
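+
+To sanity-check that selection logic, a toy run (assuming `history` is the chronologically ordered message list already in scope in `speak()`, with `.role` / `.content` attributes; illustrative only):
+
+```python
+# Toy check of the prior-AI-turn selection; SimpleNamespace stands in
+# for the real message objects.
+from types import SimpleNamespace
+
+history = [
+    SimpleNamespace(role="ai", content="Hi! What did you do last weekend?"),
+    SimpleNamespace(role="student", content="I go to park yesterday"),
+    SimpleNamespace(role="ai", content="Nice! Who did you go with?"),
+]
+
+prior_ai_turn = ""
+for msg in reversed(history):
+    if msg.role == "ai":
+        prior_ai_turn = msg.content
+        break
+
+assert prior_ai_turn == "Nice! Who did you go with?"
+```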
+
+Then change the `create_task` call to:
+
+```python
+            asyncio.create_task(
+                self._evaluate_pronunciation(
+                    evaluation_id=evaluation.id,
+                    audio_bytes=audio_bytes,
+                    reference_text=transcript,
+                    prior_ai_turn=prior_ai_turn,
+                    content_type=content_type,
+                )
+            )
+```
+
+- [ ] **Step 8: Run the tests and confirm they all pass**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/test_dialogue_service_content.py -v
+```
+
+Expected: `3 passed`.
+
+If you get `ImportError: cannot import name 'ContentEvaluator' from partially initialized module` (a circular import), move the `ContentEvaluator` import from the top of `dialogue_service.py` to the first line inside `__init__` (deferred import) instead.
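+
+A minimal sketch of that deferred-import fallback, reusing the Step 5 `__init__` (only needed if the circular import actually occurs):
+
+```python
+# Sketch: defer the import into __init__ so the name is only resolved
+# once DialogueService is actually constructed, which breaks the cycle.
+class DialogueService:
+    def __init__(self, asr, llm, assessor, storage, content_evaluator=None):
+        from app.service.speaking.content_evaluator import ContentEvaluator
+
+        self.asr = asr
+        self.llm = llm
+        self.assessor = assessor
+        self.storage = storage
+        self.content_evaluator = content_evaluator or ContentEvaluator()
+```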
+
+- [ ] **Step 9: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_content.py
+git commit -m "feat(speaking): 在 _evaluate_pronunciation 串联 content_evaluator"
+```
+
+---
+
+## Task 5: [backend] Return `contentFeedback` from `/report`
+
+**Files:**
+- Modify: `cococlass-english-speaking-api/app/service/speaking/dialogue_service.py` (the `get_report` method)
+- Create: `cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py`
+
+- [ ] **Step 1: Write a test — when the evaluation has content_feedback, the report entry carries it too**
+
+Create `cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py`:
+
+```python
+"""Tests for get_report including content_feedback pass-through."""
+
+from unittest.mock import MagicMock
+
+import pytest
+
+
+def _stub_message(role: str, content: str, round_: int, evaluation=None):
+    msg = MagicMock()
+    msg.role = role
+    msg.content = content
+    msg.round = round_
+    msg.audio_url = None
+    msg.evaluation = evaluation
+    return msg
+
+
+def _stub_evaluation(content_feedback=None, status="completed"):
+    ev = MagicMock()
+    ev.status = status
+    ev.accuracy_score = 80
+    ev.fluency_score = 80
+    ev.completeness_score = 80
+    ev.prosody_score = 80
+    ev.word_analysis = None
+    ev.content_feedback = content_feedback
+    return ev
+
+
+def _build_report_entry(msg) -> dict:
+    """Replicates the entry construction in DialogueService.get_report.
+
+    We only exercise the dict-shaping step in isolation — the full get_report
+    path hits DB/LLM summary and is not needed for this contract check.
+    """
+    entry = {
+        "round": msg.round,
+        "role": msg.role,
+        "content": msg.content,
+        "audioUrl": msg.audio_url,
+    }
+    if msg.role == "student" and msg.evaluation:
+        ev = msg.evaluation
+        entry["evaluation"] = {
+            "status": ev.status,
+            "accuracyScore": ev.accuracy_score,
+            "fluencyScore": ev.fluency_score,
+            "completenessScore": ev.completeness_score,
+            "prosodyScore": ev.prosody_score,
+            "wordAnalysis": ev.word_analysis,
+            "contentFeedback": ev.content_feedback,
+        }
+    return entry
+
+
+def test_report_entry_includes_content_feedback_when_present() -> None:
+    feedback = {"highlights": ["good"], "corrections": [], "suggestions": []}
+    ev = _stub_evaluation(content_feedback=feedback)
+    msg = _stub_message("student", "hi", 1, evaluation=ev)
+
+    entry = _build_report_entry(msg)
+
+    assert entry["evaluation"]["contentFeedback"] == feedback
+
+
+def test_report_entry_content_feedback_is_null_when_absent() -> None:
+    ev = _stub_evaluation(content_feedback=None)
+    msg = _stub_message("student", "hi", 1, evaluation=ev)
+
+    entry = _build_report_entry(msg)
+
+    assert entry["evaluation"]["contentFeedback"] is None
+
+
+def test_ai_message_has_no_evaluation_key() -> None:
+    msg = _stub_message("ai", "hello", 1, evaluation=None)
+    entry = _build_report_entry(msg)
+    assert "evaluation" not in entry
+```
+
+What we test here is the entry-shaping contract (via a standalone helper). In the real `get_report` we then modify the same entry-construction block to keep the two consistent.
+
+- [ ] **Step 2: Run the tests; they should all pass (the standalone helper has no dependency on the not-yet-modified code)**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/test_dialogue_service_report.py -v
+```
+
+Expected: `3 passed`. This step pins the contract; the next step applies it to the real code.
+
+- [ ] **Step 3: Modify the entry-construction block in `get_report`**
+
+Open `cococlass-english-speaking-api/app/service/speaking/dialogue_service.py` and locate:
+
+```python
+            if msg.role == "student" and msg.evaluation:
+                ev = msg.evaluation
+                entry["evaluation"] = {
+                    "status": ev.status,
+                    "accuracyScore": ev.accuracy_score,
+                    "fluencyScore": ev.fluency_score,
+                    "completenessScore": ev.completeness_score,
+                    "prosodyScore": ev.prosody_score,
+                    "wordAnalysis": ev.word_analysis,
+                }
+```
+
+and change to:
+
+```python
+            if msg.role == "student" and msg.evaluation:
+                ev = msg.evaluation
+                entry["evaluation"] = {
+                    "status": ev.status,
+                    "accuracyScore": ev.accuracy_score,
+                    "fluencyScore": ev.fluency_score,
+                    "completenessScore": ev.completeness_score,
+                    "prosodyScore": ev.prosody_score,
+                    "wordAnalysis": ev.word_analysis,
+                    "contentFeedback": ev.content_feedback,
+                }
+```
+
+- [ ] **Step 4: Rerun the report-related tests**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest tests/service/speaking/ -v
+```
+
+Expected: all pass (smoke 1 + evaluator 4 + content 3 + report 3 = 11 passed).
+
+- [ ] **Step 5: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_report.py
+git commit -m "feat(speaking): /report 返回每轮 contentFeedback"
+```
+
+---
+
+## Task 6: [frontend] Pass `contentFeedback` through to `sentence.feedback`
+
+`DetailedReport.vue` already renders the `sentence.feedback.{highlights, corrections, suggestions}` shape (`PPT/src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue:94-116`), so the frontend only needs a field pass-through where the `getReport` response is converted to `OverallEvaluation`.
+
+**Files:**
+- Modify: `PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts`
+
+- [ ] **Step 1: Locate where the backend shape is converted to the frontend shape**
+
+Run:
+
+```bash
+grep -n "rounds\|sentenceEvaluations\|evaluation" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts
+```
+
+The backend `/report` returns `{ sessionId, topic, status, rounds[], summary }`, while the frontend `DialogueReport` expects `{ evaluation: OverallEvaluation }` (with a `feedback` field on each item of `sentenceEvaluations[]`). The current `RealDialogueAPI.getReport()` (`llmService.ts:86-92`) simply does `return res.json()` with no shape conversion.
+
+This implies: **either the frontend currently adapts the shape in some other layer, or `DetailedReport.vue` gets its data from somewhere else**. Run a grep first to find the adaptation point:
+
+```bash
+grep -rn "sentenceEvaluations\|rounds" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking --include="*.ts" --include="*.vue" | head -30
+```
+
+- [ ] **Step 2: Pick a branch based on the Step 1 result**
+
+**Branch A (ideal case): a function like `mapReportToEvaluation(backendRes)` already exists**
+- In that function, set `feedback: round.evaluation?.contentFeedback ?? undefined` on each sentence
+- Continue to Step 3
+
+**Branch B (no conversion layer): `getReport`'s return value is handed to components as-is**
+- In `RealDialogueAPI.getReport()`, convert `rounds[]` into `OverallEvaluation.sentenceEvaluations[]`, emitting one `SentenceEvaluation` with `feedback: r.evaluation?.contentFeedback ?? undefined` for each `student`-role round
+- Continue to Step 3
+
+**Branch C (the mock API already returns `{ evaluation: OverallEvaluation }` but the real backend is not adapted):** this is the most likely current state. In that case an explicit adapter must be written in `RealDialogueAPI.getReport()`. Implement it as in Branch B.
+
+- [ ] **Step 3: Add the adapter in `RealDialogueAPI.getReport` (assuming Branch B/C)**
+
+Open `PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts`.
+
+Change:
+
+```typescript
+  async getReport(sessionId: string): Promise<DialogueReport> {
+    const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
+      credentials: 'include',
+    })
+    if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
+    return res.json()
+  }
+```
+
+to:
+
+```typescript
+  async getReport(sessionId: string): Promise<DialogueReport> {
+    const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
+      credentials: 'include',
+    })
+    if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
+    const raw = await res.json() as BackendReportResponse
+    return adaptReport(raw)
+  }
+```
+
+Add the following **before** the `RealDialogueAPI` class definition:
+
+```typescript
+interface BackendEvaluation {
+  status: 'pending' | 'completed' | 'failed'
+  accuracyScore: number | null
+  fluencyScore: number | null
+  completenessScore: number | null
+  prosodyScore: number | null
+  wordAnalysis: unknown
+  contentFeedback: {
+    highlights: string[]
+    corrections: { original: string; corrected: string; explanation: string }[]
+    suggestions: string[]
+  } | null
+}
+
+interface BackendRound {
+  round: number
+  role: 'ai' | 'student'
+  content: string
+  audioUrl: string | null
+  evaluation?: BackendEvaluation
+}
+
+interface BackendReportResponse {
+  sessionId: string
+  topic: string
+  status: 'evaluating' | 'ready'
+  rounds: BackendRound[]
+  summary: string | null
+}
+
+function adaptReport(raw: BackendReportResponse): DialogueReport {
+  const sentenceEvaluations: SentenceEvaluation[] = raw.rounds.map((r, idx) => ({
+    id: `${raw.sessionId}-${idx}`,
+    round: r.round,
+    role: r.role,
+    content: r.content,
+    audioUrl: r.audioUrl ?? undefined,
+    pronunciation: r.evaluation && r.role === 'student'
+      ? {
+          accuracy: r.evaluation.accuracyScore ?? 0,
+          fluency: r.evaluation.fluencyScore ?? 0,
+          // The enspeak prototype uses intonation/stress as its UI labels; map Azure's
+          // prosody → intonation (tone of voice) and completeness → stress (read in full).
+          // A UI-fit decision: if the UI later adopts Azure's four dimensions, rename the keys back.
+          intonation: r.evaluation.prosodyScore ?? 0,
+          stress: r.evaluation.completenessScore ?? 0,
+        }
+      : undefined,
+    feedback: r.evaluation?.contentFeedback ?? undefined,
+  }))
+
+  // overallScore: use the average score as an MVP placeholder; leave the other fields empty / safe defaults.
+  const studentEvals = sentenceEvaluations.filter(s => s.role === 'student' && s.pronunciation)
+  const avg = studentEvals.length > 0
+    ? Math.round(
+        studentEvals.reduce(
+          (sum, s) => sum + (s.pronunciation!.accuracy + s.pronunciation!.fluency + s.pronunciation!.intonation + s.pronunciation!.stress) / 4,
+          0,
+        ) / studentEvals.length,
+      )
+    : 0
+
+  return {
+    evaluation: {
+      overallScore: avg,
+      scoreLevel: avg >= 85 ? 'excellent' : avg >= 70 ? 'good' : avg >= 60 ? 'fair' : 'needsWork',
+      percentile: 0,
+      dimensions: { fluency: 0, interaction: 0, vocabulary: 0, grammar: 0 },
+      aiComment: raw.summary ?? '',
+      highlights: [],
+      improvements: [],
+      nextChallenge: {},
+      statistics: {
+        totalRounds: Math.max(...sentenceEvaluations.map(s => s.round), 0),
+        averageScore: avg,
+        highestScore: 0,
+        highestRound: 0,
+        grammarErrors: 0,
+        excellentExpressions: 0,
+        totalDuration: 0,
+      },
+      sentenceEvaluations,
+    },
+  }
+}
+```
+
+Then add `SentenceEvaluation` to the imports at the top:
+
+```typescript
+import type {
+  DialogueAPI, DialogueReport, SessionConfig, SessionInfo, SSEEvent,
+  SentenceEvaluation,
+} from '@/types/englishSpeaking'
+```
+
+(If `SentenceEvaluation` is not exported from `englishSpeaking.ts`, first confirm that `export interface SentenceEvaluation` in that file carries the `export` keyword.)
+
+**Note**: if the Step 1 grep shows an existing adapter function, **the existing adaptation layer wins**: append only the `feedback` field there and do not create a new adapter. Skip the whole `adaptReport` block above and instead add a one-line pass-through to the existing function.
+
+- [ ] **Step 4: Type-check**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+```
+
+(Adjust accordingly if the project uses `pnpm` / `yarn`. If there is no `type-check` script, run `npx vue-tsc --noEmit`.)
+
+Expected: no type errors.
+
+- [ ] **Step 5: Manual smoke verification**
+
+1. Start the backend:
+   ```bash
+   cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+   uv run uvicorn app.main:app --reload
+   ```
+2. Start the frontend:
+   ```bash
+   cd /Users/buoy/Development/gitrepo/PPT
+   npm run dev
+   ```
+3. Open the EnglishSpeaking component in the browser and complete one round of dialogue.
+4. Open the results page (DetailedReport) and confirm each student sentence shows the three "highlights / corrections / suggestions" sections beneath it.
+5. Also query the backend DB:
+   ```sql
+   SELECT round, status, accuracy_score, content_feedback
+   FROM pronunciation_evaluation
+   WHERE session_id = (SELECT id FROM dialogue_session ORDER BY id DESC LIMIT 1);
+   ```
+   Confirm that `content_feedback` has the `{highlights, corrections, suggestions}` structure (or is `NULL` if the LLM call failed).
+
+If any item fails, go back to the corresponding Task to locate the bug.
+
+- [ ] **Step 6: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+git add src/views/Editor/EnglishSpeaking/services/llmService.ts
+git commit -m "feat(english-speaking): 结果页透传 contentFeedback 到 SentenceCard"
+```
+
+---
+
+## Task 7: Regression-check all tests and existing flows
+
+- [ ] **Step 1: Run the full backend test suite**
+
+```bash
+cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
+uv run pytest -v
+```
+
+Expected: all tests pass (including the 11 added here: smoke 1 + evaluator 4 + content-dispatch 3 + report 3).
+
+- [ ] **Step 2: Run the frontend type check**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+```
+
+Expected: no type errors.
+
+- [ ] **Step 3: Record both repos' HEADs as the completion marker for this implementation**
+
+```bash
+echo "backend:  $(git -C /Users/buoy/Development/gitrepo/cococlass-english-speaking-api rev-parse --short HEAD)"
+echo "frontend: $(git -C /Users/buoy/Development/gitrepo/PPT rev-parse --short HEAD)"
+```
+
+Paste the output into the "Completion Record" section at the bottom of this plan file.
+
+---
+
+## Completion Record (fill in during implementation)
+
+- Plan completed on: _____
+- Backend HEAD: _____
+- Frontend HEAD: _____
+- Deviations or extra notes: _____