
Single-Turn Content Feedback MVP Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Alongside the existing four Azure pronunciation scores per turn, add an LLM content-feedback pipeline that produces {highlights, corrections, suggestions} attached to each turn's evaluation, displayed only on the results page.

Architecture: In each turn's /speak background task, chain one OpenAI JSON-mode call after Azure PA completes; on failure, degrade to content_feedback=null without affecting the pronunciation scores; /report responses carry a contentFeedback field.
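For orientation, a condensed sketch of that ordering (the names below are placeholders, not the real services — the actual wiring is built in Tasks 3-5):

# Placeholder sketch of the per-turn chain: Azure is a hard dependency, the LLM call is a soft one.
import asyncio
from typing import Optional


async def azure_assess(audio: bytes, reference_text: str) -> dict:
    """Stand-in for the Azure Pronunciation Assessment call."""
    return {"accuracy": 0, "fluency": 0, "completeness": 0, "prosody": 0}


async def llm_content_feedback(transcript: str, prior_ai_turn: str, scores: dict) -> Optional[dict]:
    """Stand-in for the OpenAI JSON-mode call; returns None on timeout or bad JSON."""
    return None


async def evaluate_turn(audio: bytes, transcript: str, prior_ai_turn: str) -> dict:
    scores = await azure_assess(audio, transcript)  # if this raises, the turn is marked "failed"
    feedback = await llm_content_feedback(transcript, prior_ai_turn, scores)  # soft-fails to None
    return {"pronunciation": scores, "content_feedback": feedback}  # scores survive even when feedback is None


if __name__ == "__main__":
    print(asyncio.run(evaluate_turn(b"", "I went to the park", "What did you do last weekend?")))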

Tech Stack: Python 3.13 · FastAPI · SQLAlchemy 2.x async · MySQL · OpenAI SDK (via onehub base_url) · pytest · uv · Vue 3 · TypeScript

Repos:

  • Backend: /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
  • Frontend: /Users/buoy/Development/gitrepo/PPT

Spec: /Users/buoy/Development/gitrepo/PPT/doc/ContentEvaluationDesign.md


File Structure

Backend (cococlass-english-speaking-api)

Create:

  • app/service/speaking/content_evaluator.py — single responsibility: feed (4 pronunciation scores + the prior AI turn + the student transcript) to the LLM and produce JSON feedback
  • tests/conftest.py — pytest async + mock fixtures
  • tests/service/__init__.py
  • tests/service/speaking/__init__.py
  • tests/service/speaking/test_content_evaluator.py — unit tests for the evaluator
  • tests/service/speaking/test_dialogue_service_content.py — unit tests for the chaining logic
  • migrations/001_add_content_feedback.sql — incremental SQL for an existing DB

Modify:

  • init.sql — add the same column to the CREATE TABLE statements used for new DBs
  • app/models/dialogue.py — add content_feedback to PronunciationEvaluation
  • app/service/speaking/dialogue_service.py — append the content evaluation after the success branch of _evaluate_pronunciation; return contentFeedback from get_report

Frontend (PPT)

Modify:

  • src/views/Editor/EnglishSpeaking/services/llmService.ts — the getReport response conversion (map backend rounds[i].evaluation.contentFeedback to sentenceEvaluations[i].feedback)

No changes needed: DetailedReport.vue already renders the sentence.feedback.{highlights, corrections, suggestions} shape, and the SentenceEvaluation.feedback type in englishSpeaking.ts is already aligned.


Task 1: [backend] Add the content_feedback column

Files:

  • Modify: cococlass-english-speaking-api/init.sql (CREATE TABLE statements for new DBs)
  • Create: cococlass-english-speaking-api/migrations/001_add_content_feedback.sql
  • Modify: cococlass-english-speaking-api/app/models/dialogue.py (SQLAlchemy model)

  • [ ] Step 1: Update the pronunciation_evaluation CREATE TABLE statement in init.sql

In the pronunciation_evaluation table definition, insert a content_feedback column between word_analysis and error_message:

Open cococlass-english-speaking-api/init.sql and change:

    word_analysis JSON NULL,
    error_message TEXT NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    completed_at DATETIME NULL,

to:

    word_analysis JSON NULL,
    content_feedback JSON NULL,
    error_message TEXT NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    completed_at DATETIME NULL,
  • Step 2: Create the migrations/ directory and write the incremental SQL
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
mkdir -p migrations

Create migrations/001_add_content_feedback.sql with the following content:

-- Add content_feedback column to existing pronunciation_evaluation table.
-- Apply once against an existing database (new DBs use updated init.sql).
ALTER TABLE pronunciation_evaluation
  ADD COLUMN content_feedback JSON NULL AFTER word_analysis;
  • Step 3: Update the SQLAlchemy model

Open cococlass-english-speaking-api/app/models/dialogue.py

Locate:

    word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)

Change to (inserting content_feedback between them):

    word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    content_feedback: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
  • Step 4: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add init.sql migrations/001_add_content_feedback.sql app/models/dialogue.py
git commit -m "feat(db): 为 pronunciation_evaluation 增加 content_feedback 列"

Task 2: [backend] Set up the pytest directory skeleton + conftest

The repo's tests/ directory currently contains only an empty __init__.py. First establish a runnable unit-test foundation.

Files:

  • Create: cococlass-english-speaking-api/tests/conftest.py
  • Create: cococlass-english-speaking-api/tests/service/__init__.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/__init__.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_smoke.py

  • [ ] Step 1: Create tests/conftest.py

"""Pytest global fixtures & asyncio config."""

import pytest


@pytest.fixture
def anyio_backend() -> str:
    """Force asyncio backend for anyio tests (not trio)."""
    return "asyncio"
  • Step 2: Create empty __init__.py files so pytest can discover the nested directories
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
mkdir -p tests/service/speaking
touch tests/service/__init__.py tests/service/speaking/__init__.py
  • Step 3: Write a smoke test to confirm pytest runs

Create tests/service/speaking/test_smoke.py:

def test_pytest_works() -> None:
    assert 1 + 1 == 2
  • Step 4: Run the smoke test
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_smoke.py -v

Expected: 1 passed.

If uv run pytest reports "pytest: command not found", run uv sync --group dev first to install the dev dependencies, then retry.

  • Step 5: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add tests/
git commit -m "chore(test): 搭建 pytest 目录骨架和 conftest"

Task 3: [backend] Write the content_evaluator module (TDD)

Files:

  • Create: cococlass-english-speaking-api/app/service/speaking/content_evaluator.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py (a new file in the same directory as the previous task's smoke test)

ContentEvaluator instantiates AsyncOpenAI directly (using settings.ONEHUB_BASE_URL + settings.ONEHUB_API_KEY, just like OneHubLLM) because it needs the response_format parameter, which the existing LLMProvider.chat() interface does not expose.

  • Step 1: Write a failing test for the evaluator (happy path)

Create cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py:

"""Unit tests for ContentEvaluator."""

import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from app.service.speaking.content_evaluator import ContentEvaluator


def _mock_openai_response(content: str) -> MagicMock:
    """Construct a fake AsyncOpenAI chat completion response."""
    choice = MagicMock()
    choice.message.content = content
    resp = MagicMock()
    resp.choices = [choice]
    return resp


@pytest.mark.asyncio
async def test_evaluate_happy_path() -> None:
    fake_json = json.dumps(
        {
            "highlights": ["发音清晰", "句子完整"],
            "corrections": [
                {
                    "original": "I go to park yesterday",
                    "corrected": "I went to the park yesterday",
                    "explanation": "过去式应用 went,park 前加 the",
                }
            ],
            "suggestions": ["可增加连接词"],
        }
    )

    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(
            return_value=_mock_openai_response(fake_json)
        )

        evaluator = ContentEvaluator()
        result = await evaluator.evaluate(
            transcript="I go to park yesterday",
            prior_ai_turn="What did you do last weekend?",
            pron_scores={"accuracy": 72, "fluency": 85, "completeness": 90, "prosody": 60},
        )

    assert result is not None
    assert result["highlights"] == ["发音清晰", "句子完整"]
    assert len(result["corrections"]) == 1
    assert result["corrections"][0]["corrected"] == "I went to the park yesterday"
    assert result["suggestions"] == ["可增加连接词"]
  • Step 2: Run it and confirm it fails (the module does not exist yet)
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py -v

Expected: ModuleNotFoundError: No module named 'app.service.speaking.content_evaluator' or a similar import error.

  • Step 3: Implement the minimal evaluator so the happy path passes

Create cococlass-english-speaking-api/app/service/speaking/content_evaluator.py:

"""Per-turn content evaluation via LLM (JSON mode)."""

import asyncio
import json

from openai import AsyncOpenAI

from app.config import settings
from app.logging import get_logger

logger = get_logger(__name__)


SYSTEM_PROMPT = """You are an English tutor evaluating a student's single spoken turn
in an open dialogue. You receive:
- Azure pronunciation scores (accuracy/fluency/completeness/prosody, 0-100)
- The immediate prior AI turn (context)
- The student's transcript

Return JSON with exactly these keys:
- highlights: 1-2 Chinese sentences praising specific strengths. Reference a
              pronunciation dimension if that score is >= 85. <= 30 chars each.
- corrections: array of grammar/word-choice fixes. Each item has keys:
               original (EN), corrected (EN), explanation (ZH, <= 30 chars).
- suggestions: 1-2 Chinese actionable improvements. Reference a pronunciation
               dimension if that score is < 70. <= 30 chars each.

Rules:
- Empty arrays are valid. Do not invent errors to fill quota.
- If the student only said a filler ("yes", "ok", "hmm"), return empty
  corrections and suggestions plus one encouragement in highlights.
- Never include raw score numbers in output text; describe qualitatively
  ("发音准确度很高" not "accuracy 92").
- Output MUST be a single JSON object with keys highlights, corrections, suggestions.
"""


class ContentEvaluator:
    """Generates per-turn content feedback via LLM in JSON mode."""

    def __init__(self, timeout_seconds: float = 10.0):
        self.client = AsyncOpenAI(
            base_url=settings.ONEHUB_BASE_URL,
            api_key=settings.ONEHUB_API_KEY,
        )
        self.model = settings.ONEHUB_MODEL
        self.timeout_seconds = timeout_seconds

    async def evaluate(
        self,
        transcript: str,
        prior_ai_turn: str,
        pron_scores: dict,
    ) -> dict | None:
        """Return {highlights, corrections, suggestions} or None on failure."""
        user_payload = json.dumps(
            {
                "pronunciation": pron_scores,
                "ai_said": prior_ai_turn,
                "student_said": transcript,
            },
            ensure_ascii=False,
        )

        try:
            resp = await asyncio.wait_for(
                self.client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": user_payload},
                    ],
                    response_format={"type": "json_object"},
                    temperature=0,
                ),
                timeout=self.timeout_seconds,
            )
        except asyncio.TimeoutError:
            logger.warning("ContentEvaluator LLM timeout")
            return None
        except Exception as e:
            logger.error(f"ContentEvaluator LLM error: {e}")
            return None

        raw = resp.choices[0].message.content or ""
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            logger.warning(f"ContentEvaluator got non-JSON: {raw[:200]}")
            return None

        if not self._has_required_shape(parsed):
            logger.warning(f"ContentEvaluator got invalid shape: {parsed}")
            return None

        return {
            "highlights": parsed.get("highlights", []),
            "corrections": parsed.get("corrections", []),
            "suggestions": parsed.get("suggestions", []),
        }

    @staticmethod
    def _has_required_shape(obj: object) -> bool:
        if not isinstance(obj, dict):
            return False
        for key in ("highlights", "corrections", "suggestions"):
            if key not in obj or not isinstance(obj[key], list):
                return False
        return True
  • Step 4: Run the happy-path test and confirm it passes
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py::test_evaluate_happy_path -v

Expected: 1 passed.

If you get a "pytest-asyncio plugin not installed" error, append "pytest-asyncio>=0.26.0" to [dependency-groups].dev in pyproject.toml and add the following at the top of tests/conftest.py:

import pytest

pytest_plugins = ["pytest_asyncio"]

Then run uv sync --group dev and retry.
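If that branch is hit, the dev dependency group ends up looking roughly like this (an assumed excerpt — the other entries are whatever the repo already declares; only the pytest-asyncio line is the actual addition):

# pyproject.toml (assumed excerpt)
[dependency-groups]
dev = [
    "pytest>=8.0",             # assumed to exist already
    "pytest-asyncio>=0.26.0",  # the line this plan adds
]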

  • Step 5: Add a failure-branch test — JSON parse failure

Append to test_content_evaluator.py:

@pytest.mark.asyncio
async def test_evaluate_returns_none_on_invalid_json() -> None:
    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(
            return_value=_mock_openai_response("not a json")
        )

        evaluator = ContentEvaluator()
        result = await evaluator.evaluate(
            transcript="Hi",
            prior_ai_turn="Hello",
            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
        )

    assert result is None
  • Step 6: Add a failure-branch test — timeout

Append:

@pytest.mark.asyncio
async def test_evaluate_returns_none_on_timeout() -> None:
    async def never_returns(**kwargs):
        await asyncio.sleep(5)

    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = never_returns

        evaluator = ContentEvaluator(timeout_seconds=0.05)
        result = await evaluator.evaluate(
            transcript="Hi",
            prior_ai_turn="Hello",
            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
        )

    assert result is None

Also add import asyncio to the imports at the top of the file (if it is not there already).

  • Step 7: Add a failure-branch test — invalid shape

Append:

@pytest.mark.asyncio
async def test_evaluate_returns_none_on_wrong_shape() -> None:
    # The LLM returned JSON but with missing keys
    bad = json.dumps({"highlights": ["ok"]})
    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(
            return_value=_mock_openai_response(bad)
        )

        evaluator = ContentEvaluator()
        result = await evaluator.evaluate(
            transcript="Hi",
            prior_ai_turn="Hello",
            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
        )

    assert result is None
  • Step 8: Run all the evaluator tests
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py -v

Expected: 4 passed.

  • Step 9: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/content_evaluator.py tests/service/speaking/test_content_evaluator.py
# if pyproject.toml / conftest.py were also changed
git add pyproject.toml tests/conftest.py uv.lock 2>/dev/null || true
git commit -m "feat(speaking): 新增 content_evaluator(LLM JSON 模式生成单轮评语)"

Task 4: [backend] Wire ContentEvaluator into _evaluate_pronunciation (TDD)

This is the core integration point: after Azure succeeds, run one content evaluation; if Azure fails, do not call it; if the content evaluation fails, status is unaffected.

Files:

  • Modify: cococlass-english-speaking-api/app/service/speaking/dialogue_service.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py

Note: the original _evaluate_pronunciation already receives self.assessor via dependency injection. So that the content evaluator can be replaced in tests, it is attached to DialogueService as a dependency in the same way below.

  • Step 1: Write a test — Azure succeeds + content succeeds → both fields are written

Create cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py:

"""Integration-ish tests for content evaluation wired into DialogueService._evaluate_pronunciation."""

from unittest.mock import AsyncMock, MagicMock

import pytest

from app.service.speaking.dialogue_service import DialogueService


class _StubDB:
    """Minimal stand-in for AsyncSession that supports get() + commit()."""

    def __init__(self, evaluation):
        self._evaluation = evaluation
        self.commit = AsyncMock()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *args):
        return False

    async def get(self, _cls, _id):
        return self._evaluation


def _fake_evaluation() -> MagicMock:
    ev = MagicMock()
    ev.status = "pending"
    ev.accuracy_score = None
    ev.fluency_score = None
    ev.completeness_score = None
    ev.prosody_score = None
    ev.word_analysis = None
    ev.content_feedback = None
    ev.completed_at = None
    ev.error_message = None
    return ev


def _build_service(assessor, evaluator) -> DialogueService:
    return DialogueService(
        asr=MagicMock(),
        llm=MagicMock(),
        assessor=assessor,
        storage=MagicMock(),
        content_evaluator=evaluator,
    )


@pytest.mark.asyncio
async def test_azure_success_then_content_success_writes_both(monkeypatch) -> None:
    ev = _fake_evaluation()
    stub_db = _StubDB(ev)
    monkeypatch.setattr(
        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
    )

    assessor = MagicMock()
    assessor.assess = AsyncMock(
        return_value={
            "accuracy_score": 80,
            "fluency_score": 85,
            "completeness_score": 90,
            "prosody_score": 75,
            "word_analysis": [],
        }
    )
    evaluator = MagicMock()
    evaluator.evaluate = AsyncMock(
        return_value={
            "highlights": ["nice"],
            "corrections": [],
            "suggestions": [],
        }
    )

    service = _build_service(assessor, evaluator)
    await service._evaluate_pronunciation(
        evaluation_id=1,
        audio_bytes=b"",
        reference_text="hi",
        prior_ai_turn="hello",
    )

    assert ev.status == "completed"
    assert ev.accuracy_score == 80
    assert ev.content_feedback == {"highlights": ["nice"], "corrections": [], "suggestions": []}
    evaluator.evaluate.assert_awaited_once()
  • Step 2: Write a test — Azure succeeds + content fails → content_feedback is None, status stays completed

Append:

@pytest.mark.asyncio
async def test_azure_success_content_failure_keeps_status_completed(monkeypatch) -> None:
    ev = _fake_evaluation()
    stub_db = _StubDB(ev)
    monkeypatch.setattr(
        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
    )

    assessor = MagicMock()
    assessor.assess = AsyncMock(
        return_value={
            "accuracy_score": 80,
            "fluency_score": 85,
            "completeness_score": 90,
            "prosody_score": 75,
            "word_analysis": [],
        }
    )
    evaluator = MagicMock()
    evaluator.evaluate = AsyncMock(return_value=None)  # LLM failed

    service = _build_service(assessor, evaluator)
    await service._evaluate_pronunciation(
        evaluation_id=1,
        audio_bytes=b"",
        reference_text="hi",
        prior_ai_turn="hello",
    )

    assert ev.status == "completed"
    assert ev.accuracy_score == 80
    assert ev.content_feedback is None
  • Step 3: Write a test — Azure fails → ContentEvaluator is never called

Append:

@pytest.mark.asyncio
async def test_azure_failure_skips_content_evaluator(monkeypatch) -> None:
    ev = _fake_evaluation()
    stub_db = _StubDB(ev)
    monkeypatch.setattr(
        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
    )

    assessor = MagicMock()
    assessor.assess = AsyncMock(side_effect=RuntimeError("azure exploded"))
    evaluator = MagicMock()
    evaluator.evaluate = AsyncMock()

    service = _build_service(assessor, evaluator)
    await service._evaluate_pronunciation(
        evaluation_id=1,
        audio_bytes=b"",
        reference_text="hi",
        prior_ai_turn="hello",
    )

    assert ev.status == "failed"
    assert ev.content_feedback is None
    evaluator.evaluate.assert_not_awaited()
  • Step 4: Run the tests and confirm they fail
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py -v

Expected: 3 tests fail (DialogueService.__init__ does not have a content_evaluator parameter yet, and _evaluate_pronunciation has no prior_ai_turn parameter).

  • Step 5: Modify DialogueService.__init__ to accept content_evaluator

Open cococlass-english-speaking-api/app/service/speaking/dialogue_service.py

Add to the imports at the top of the file:

from app.service.speaking.content_evaluator import ContentEvaluator

Locate __init__:

    def __init__(
        self,
        asr: ASRProvider,
        llm: LLMProvider,
        assessor: PronunciationAssessor,
        storage: AudioStorage,
    ):
        self.asr = asr
        self.llm = llm
        self.assessor = assessor
        self.storage = storage

Change it to:

    def __init__(
        self,
        asr: ASRProvider,
        llm: LLMProvider,
        assessor: PronunciationAssessor,
        storage: AudioStorage,
        content_evaluator: ContentEvaluator | None = None,
    ):
        self.asr = asr
        self.llm = llm
        self.assessor = assessor
        self.storage = storage
        self.content_evaluator = content_evaluator or ContentEvaluator()
  • Step 6: Change the _evaluate_pronunciation signature and logic

Locate the existing implementation (around dialogue_service.py:321):

    async def _evaluate_pronunciation(
        self,
        evaluation_id: int,
        audio_bytes: bytes,
        reference_text: str,
        content_type: str = "audio/webm;codecs=opus",
    ):
        """后台静默发音评估"""
        from app.models.database import async_session

        async with async_session() as db:
            evaluation = await db.get(PronunciationEvaluation, evaluation_id)
            if not evaluation:
                logger.error(f"Evaluation record not found: id={evaluation_id}")
                return

            try:
                result = await self.assessor.assess(audio_bytes, reference_text, content_type)
                logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
                evaluation.status = "completed"
                evaluation.accuracy_score = result["accuracy_score"]
                evaluation.fluency_score = result["fluency_score"]
                evaluation.completeness_score = result["completeness_score"]
                evaluation.prosody_score = result["prosody_score"]
                evaluation.word_analysis = result.get("word_analysis")
                evaluation.completed_at = datetime.now()
            except Exception as e:
                logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
                evaluation.status = "failed"
                evaluation.error_message = str(e)

            await db.commit()

Change it to:

    async def _evaluate_pronunciation(
        self,
        evaluation_id: int,
        audio_bytes: bytes,
        reference_text: str,
        prior_ai_turn: str = "",
        content_type: str = "audio/webm;codecs=opus",
    ):
        """后台静默发音评估 + 内容评语"""
        from app.models.database import async_session

        async with async_session() as db:
            evaluation = await db.get(PronunciationEvaluation, evaluation_id)
            if not evaluation:
                logger.error(f"Evaluation record not found: id={evaluation_id}")
                return

            try:
                result = await self.assessor.assess(audio_bytes, reference_text, content_type)
                logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
                evaluation.status = "completed"
                evaluation.accuracy_score = result["accuracy_score"]
                evaluation.fluency_score = result["fluency_score"]
                evaluation.completeness_score = result["completeness_score"]
                evaluation.prosody_score = result["prosody_score"]
                evaluation.word_analysis = result.get("word_analysis")
                evaluation.completed_at = datetime.now()

                # Content evaluation: only triggered when Azure succeeds; its failure does not affect status.
                try:
                    content_feedback = await self.content_evaluator.evaluate(
                        transcript=reference_text,
                        prior_ai_turn=prior_ai_turn,
                        pron_scores={
                            "accuracy": result["accuracy_score"],
                            "fluency": result["fluency_score"],
                            "completeness": result["completeness_score"],
                            "prosody": result["prosody_score"],
                        },
                    )
                    evaluation.content_feedback = content_feedback
                    logger.info(
                        f"Content evaluation done: eval={evaluation_id}, "
                        f"has_feedback={content_feedback is not None}"
                    )
                except Exception as e:
                    logger.error(f"Content evaluation error (soft-fail): eval={evaluation_id}, error={e}")
                    evaluation.content_feedback = None

            except Exception as e:
                logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
                evaluation.status = "failed"
                evaluation.error_message = str(e)

            await db.commit()
  • Step 7: Update the asyncio.create_task call in speak() to pass prior_ai_turn

Locate the existing create_task call inside the speak() method (around dialogue_service.py:189):

            asyncio.create_task(
                self._evaluate_pronunciation(
                    evaluation_id=evaluation.id,
                    audio_bytes=audio_bytes,
                    reference_text=transcript,
                    content_type=content_type,
                )
            )

Before the create_task call, compute prior_ai_turn. Add the new variable (place it right before the "⑩ 后台发音评估" (background pronunciation evaluation) block):

            # Use the most recent AI message before this turn's student message as context for the content evaluation
            prior_ai_turn = ""
            for msg in reversed(history):
                if msg.role == "ai":
                    prior_ai_turn = msg.content
                    break

Then change the create_task call to:

            asyncio.create_task(
                self._evaluate_pronunciation(
                    evaluation_id=evaluation.id,
                    audio_bytes=audio_bytes,
                    reference_text=transcript,
                    prior_ai_turn=prior_ai_turn,
                    content_type=content_type,
                )
            )
  • Step 8: Run the tests and confirm they all pass
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py -v

Expected: 3 passed.

If you get ImportError: cannot import name 'ContentEvaluator' from a partially initialized module (circular import), move the ContentEvaluator import to the first line inside the __init__ method of dialogue_service.py (a deferred import) instead of the top of the file, as sketched below.
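A sketch of that fallback, with the signature abbreviated (the full parameter types are shown in Step 5 above); only the import placement changes:

class DialogueService:
    def __init__(self, asr, llm, assessor, storage, content_evaluator=None):
        # Deferred import: resolving ContentEvaluator only at construction time
        # breaks the module-level import cycle.
        from app.service.speaking.content_evaluator import ContentEvaluator

        self.asr = asr
        self.llm = llm
        self.assessor = assessor
        self.storage = storage
        self.content_evaluator = content_evaluator or ContentEvaluator()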

  • Step 9: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_content.py
git commit -m "feat(speaking): 在 _evaluate_pronunciation 串联 content_evaluator"

Task 5: [backend] Return contentFeedback from /report

Files:

  • Modify: cococlass-english-speaking-api/app/service/speaking/dialogue_service.py (get_report method)
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py

  • [ ] Step 1: Write tests — when the evaluation carries content_feedback, the report entry carries it too

Create cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py:

"""Tests for get_report including content_feedback pass-through."""

from unittest.mock import MagicMock

import pytest


def _stub_message(role: str, content: str, round_: int, evaluation=None):
    msg = MagicMock()
    msg.role = role
    msg.content = content
    msg.round = round_
    msg.audio_url = None
    msg.evaluation = evaluation
    return msg


def _stub_evaluation(content_feedback=None, status="completed"):
    ev = MagicMock()
    ev.status = status
    ev.accuracy_score = 80
    ev.fluency_score = 80
    ev.completeness_score = 80
    ev.prosody_score = 80
    ev.word_analysis = None
    ev.content_feedback = content_feedback
    return ev


def _build_report_entry(msg) -> dict:
    """Replicates the entry construction in DialogueService.get_report.

    We only exercise the dict-shaping step in isolation — the full get_report
    path hits DB/LLM summary and is not needed for this contract check.
    """
    entry = {
        "round": msg.round,
        "role": msg.role,
        "content": msg.content,
        "audioUrl": msg.audio_url,
    }
    if msg.role == "student" and msg.evaluation:
        ev = msg.evaluation
        entry["evaluation"] = {
            "status": ev.status,
            "accuracyScore": ev.accuracy_score,
            "fluencyScore": ev.fluency_score,
            "completenessScore": ev.completeness_score,
            "prosodyScore": ev.prosody_score,
            "wordAnalysis": ev.word_analysis,
            "contentFeedback": ev.content_feedback,
        }
    return entry


def test_report_entry_includes_content_feedback_when_present() -> None:
    feedback = {"highlights": ["good"], "corrections": [], "suggestions": []}
    ev = _stub_evaluation(content_feedback=feedback)
    msg = _stub_message("student", "hi", 1, evaluation=ev)

    entry = _build_report_entry(msg)

    assert entry["evaluation"]["contentFeedback"] == feedback


def test_report_entry_content_feedback_is_null_when_absent() -> None:
    ev = _stub_evaluation(content_feedback=None)
    msg = _stub_message("student", "hi", 1, evaluation=ev)

    entry = _build_report_entry(msg)

    assert entry["evaluation"]["contentFeedback"] is None


def test_ai_message_has_no_evaluation_key() -> None:
    msg = _stub_message("ai", "hello", 1, evaluation=None)
    entry = _build_report_entry(msg)
    assert "evaluation" not in entry

What is tested here is the entry-shaping contract (as a standalone helper). In the real get_report, the same entry-construction block is modified to stay consistent.

  • Step 2: Run the tests; they should all pass (the standalone helper does not depend on the not-yet-modified code)
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_report.py -v

Expected: 3 passed. This step verifies the contract; the next step applies it to the real code.

  • Step 3: Modify the entry-construction block in get_report

Open cococlass-english-speaking-api/app/service/speaking/dialogue_service.py and locate:

            if msg.role == "student" and msg.evaluation:
                ev = msg.evaluation
                entry["evaluation"] = {
                    "status": ev.status,
                    "accuracyScore": ev.accuracy_score,
                    "fluencyScore": ev.fluency_score,
                    "completenessScore": ev.completeness_score,
                    "prosodyScore": ev.prosody_score,
                    "wordAnalysis": ev.word_analysis,
                }

Change it to:

            if msg.role == "student" and msg.evaluation:
                ev = msg.evaluation
                entry["evaluation"] = {
                    "status": ev.status,
                    "accuracyScore": ev.accuracy_score,
                    "fluencyScore": ev.fluency_score,
                    "completenessScore": ev.completeness_score,
                    "prosodyScore": ev.prosody_score,
                    "wordAnalysis": ev.word_analysis,
                    "contentFeedback": ev.content_feedback,
                }
  • Step 4: Re-run the report-related tests
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/ -v

Expected: all pass (smoke 1 + evaluator 4 + content 3 + report 3 = 11 passed).

  • Step 5: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_report.py
git commit -m "feat(speaking): /report 返回每轮 contentFeedback"

Task 6: [frontend] Pass contentFeedback through to sentence.feedback

DetailedReport.vue already renders sentence.feedback.{highlights, corrections, suggestions} (PPT/src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue:94-116), so the frontend only needs a field pass-through where the getReport response is converted into OverallEvaluation.

Files:

  • Modify: PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts

  • [ ] Step 1: Locate where the backend→frontend shape conversion happens

Run:

grep -n "rounds\|sentenceEvaluations\|evaluation" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts

The backend /report returns { sessionId, topic, status, rounds[], summary }, while the frontend DialogueReport expects { evaluation: OverallEvaluation } (with a feedback field on each item of sentenceEvaluations[]). The current RealDialogueAPI.getReport() (llmService.ts:86-92) returns res.json() directly, with no shape conversion.

This means the current frontend either adapts the shape in some other layer, or DetailedReport.vue gets its data from somewhere else. First run a grep to find where the adaptation lives:

grep -rn "sentenceEvaluations\|rounds" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking --include="*.ts" --include="*.vue" | head -30
  • Step 2: Pick a branch based on the Step 1 results

Branch A (ideal case): a function like mapReportToEvaluation(backendRes) already exists

  • In that function, add feedback: round.evaluation?.contentFeedback ?? undefined to each sentence
  • Continue to Step 3

Branch B (no conversion layer): getReport's return value is passed to components as-is

  • In RealDialogueAPI.getReport(), convert rounds[] into OverallEvaluation.sentenceEvaluations[], where each student-role round emits a SentenceEvaluation with feedback: r.evaluation?.contentFeedback ?? undefined
  • Continue to Step 3

Branch C (the mock API already has the { evaluation: OverallEvaluation } shape but the real backend is not adapted): this is the most likely current state. In that case an explicit adapter must be written in RealDialogueAPI.getReport(). Implement it as in Branch B.

  • Step 3: Add the adapter in RealDialogueAPI.getReport (assuming Branch B/C)

Open PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts

Change:

  async getReport(sessionId: string): Promise<DialogueReport> {
    const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
      credentials: 'include',
    })
    if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
    return res.json()
  }

to:

  async getReport(sessionId: string): Promise<DialogueReport> {
    const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
      credentials: 'include',
    })
    if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
    const raw = await res.json() as BackendReportResponse
    return adaptReport(raw)
  }

RealDialogueAPI 类定义之前加:

interface BackendEvaluation {
  status: 'pending' | 'completed' | 'failed'
  accuracyScore: number | null
  fluencyScore: number | null
  completenessScore: number | null
  prosodyScore: number | null
  wordAnalysis: unknown
  contentFeedback: {
    highlights: string[]
    corrections: { original: string; corrected: string; explanation: string }[]
    suggestions: string[]
  } | null
}

interface BackendRound {
  round: number
  role: 'ai' | 'student'
  content: string
  audioUrl: string | null
  evaluation?: BackendEvaluation
}

interface BackendReportResponse {
  sessionId: string
  topic: string
  status: 'evaluating' | 'ready'
  rounds: BackendRound[]
  summary: string | null
}

function adaptReport(raw: BackendReportResponse): DialogueReport {
  const sentenceEvaluations: SentenceEvaluation[] = raw.rounds.map((r, idx) => ({
    id: `${raw.sessionId}-${idx}`,
    round: r.round,
    role: r.role,
    content: r.content,
    audioUrl: r.audioUrl ?? undefined,
    pronunciation: r.evaluation && r.role === 'student'
      ? {
          accuracy: r.evaluation.accuracyScore ?? 0,
          fluency: r.evaluation.fluencyScore ?? 0,
          // The enspeak prototype uses intonation/stress as UI labels; map Azure's prosody and
          // completeness onto those two slots (prosody → intonation for tone, completeness → stress
          // for reading the sentence in full). This is a UI-fit decision; if the UI later adopts
          // Azure's four dimensions directly, switch the keys back.
          intonation: r.evaluation.prosodyScore ?? 0,
          stress: r.evaluation.completenessScore ?? 0,
        }
      : undefined,
    feedback: r.evaluation?.contentFeedback ?? undefined,
  }))

  // For the MVP, overallScore uses the average as a placeholder; other fields stay empty / safe defaults.
  const studentEvals = sentenceEvaluations.filter(s => s.role === 'student' && s.pronunciation)
  const avg = studentEvals.length > 0
    ? Math.round(
        studentEvals.reduce(
          (sum, s) => sum + (s.pronunciation!.accuracy + s.pronunciation!.fluency + s.pronunciation!.intonation + s.pronunciation!.stress) / 4,
          0,
        ) / studentEvals.length,
      )
    : 0

  return {
    evaluation: {
      overallScore: avg,
      scoreLevel: avg >= 85 ? 'excellent' : avg >= 70 ? 'good' : avg >= 60 ? 'fair' : 'needsWork',
      percentile: 0,
      dimensions: { fluency: 0, interaction: 0, vocabulary: 0, grammar: 0 },
      aiComment: raw.summary ?? '',
      highlights: [],
      improvements: [],
      nextChallenge: {},
      statistics: {
        totalRounds: Math.max(...sentenceEvaluations.map(s => s.round), 0),
        averageScore: avg,
        highestScore: 0,
        highestRound: 0,
        grammarErrors: 0,
        excellentExpressions: 0,
        totalDuration: 0,
      },
      sentenceEvaluations,
    },
  }
}

Then add SentenceEvaluation to the imports at the top:

import type {
  DialogueAPI, DialogueReport, SessionConfig, SessionInfo, SSEEvent,
  SentenceEvaluation,
} from '@/types/englishSpeaking'

(If SentenceEvaluation is not exported from englishSpeaking.ts, first confirm in that file that export interface SentenceEvaluation carries the export keyword; a rough sketch of the expected shape follows.)
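For reference, a guess at the shape this plan assumes englishSpeaking.ts already declares, reconstructed from how adaptReport builds each item — verify against the real file rather than treating this as authoritative:

// Assumed shape only; the authoritative definitions live in englishSpeaking.ts.
export interface SentenceFeedback {
  highlights: string[]
  corrections: { original: string; corrected: string; explanation: string }[]
  suggestions: string[]
}

export interface SentenceEvaluation {
  id: string
  round: number
  role: 'ai' | 'student'
  content: string
  audioUrl?: string
  pronunciation?: { accuracy: number; fluency: number; intonation: number; stress: number }
  feedback?: SentenceFeedback
}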

Note: if the Step 1 grep shows an existing adapter function, defer to that adaptation layer — only add the feedback field there, do not create a new adapter. Skip the entire adaptReport block here and instead add a one-line pass-through in the existing function.

  • Step 4: Type check
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check

(Adjust accordingly if the project uses pnpm / yarn. If there is no type-check script, run npx vue-tsc --noEmit.)

Expected: no type errors.

  • Step 5: Manual smoke verification
  1. Start the backend:

    cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
    uv run uvicorn app.main:app --reload
    
    2. Start the frontend: cd /Users/buoy/Development/gitrepo/PPT && npm run dev
  3. In the browser, open the EnglishSpeaking component and complete one dialogue round.

  4. Open the results page (DetailedReport) and confirm each student sentence shows the three sections: highlights / corrections / suggestions.

  5. Also check the backend DB:

    SELECT round, status, accuracy_score, content_feedback
    FROM pronunciation_evaluation
    WHERE session_id = (SELECT id FROM dialogue_session ORDER BY id DESC LIMIT 1);
    

    Confirm that content_feedback has the {highlights, corrections, suggestions} structure (or is NULL if the LLM failed).

    If any of these checks fails, go back to the corresponding task and track down the bug.

    • Step 6: Commit
    cd /Users/buoy/Development/gitrepo/PPT
    git add src/views/Editor/EnglishSpeaking/services/llmService.ts
    git commit -m "feat(english-speaking): 结果页透传 contentFeedback 到 SentenceCard"
    

Task 7: Regression-check all tests and the existing flow

  • Step 1: Run the full backend test suite
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest -v

Expected: all tests pass (including the newly added smoke 1 + evaluator 4 + content-dispatch 3 + report 3 = 11 tests).

  • Step 2: Run the frontend type check
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check

Expected: no type errors.

  • Step 3: Record the HEAD of both repos as the completion marker for this implementation
echo "backend:  $(git -C /Users/buoy/Development/gitrepo/cococlass-english-speaking-api rev-parse --short HEAD)"
echo "frontend: $(git -C /Users/buoy/Development/gitrepo/PPT rev-parse --short HEAD)"

Paste the output into the "Completion Record" section at the bottom of this plan file.


Completion Record

  • Plan completed: 2026-04-23
  • Backend HEAD: 7d192be (branch feat/content-evaluator, baseline aa5e1a7, 5 commits)
    • 99e64fa Task 1 DB column
    • 1492ebe Task 2 pytest skeleton
    • dee45e6 Task 3 content_evaluator module
    • a1f1b91 Task 4 wiring of _evaluate_pronunciation
    • 7d192be Task 5 /report returns contentFeedback
  • Frontend HEAD: 7c4d1a9 (branch feat/english-speaking, baseline 4523862, 1 commit)
    • 7c4d1a9 Task 6 results-page contentFeedback pass-through
  • Final regression: backend uv run pytest 11/11 passed; frontend vue-tsc --noEmit exit=0
  • Deviations and notes:
    • Task 2 hit the plan's conditional branch in Step 2a (pytest-asyncio missing; per the plan's instructions, the dependency was appended and pytest_plugins inserted into conftest.py)
    • Task 4 promoted async_session from a deferred in-method import to the module top level (so monkeypatch.setattr can reach it; no circular-dependency risk)
    • Task 6 actually followed the plan's Branch C (a full adapter implementation), slightly more costly than Branch A's one-line pass-through. Finding: TopicDiscussionPreview.vue currently displays hard-coded mockEvaluation data, and the real getReport() is not consumed by the UI. The adapter is structurally in place, but actually seeing LLM feedback on the results page requires switching the UI over to DialogueAPI.getReport — that switch is out of scope for this MVP and left to a follow-up task
    • No end-to-end smoke test was run (real backend + real browser); verification was static only (unit tests + type check). Before real integration testing, AZURE_SPEECH_KEY and ONEHUB_API_KEY must be configured in the backend's .env
    • Several non-blocking improvements raised in review (the AsyncOpenAI lifecycle in content_evaluator, _StubDB assertion notes, a pron_scores TypedDict, adapter error tolerance) are all deferred to later iterations and not part of this MVP

2026-04-24 addendum: UI integration + cross-repo full code review findings

This round also completed the previously deferred item of switching the UI to the real getReport, and ran a full code review over both the backend and the frontend. The main dialogue flow has not yet been exercised end to end into the results page; before returning to this MVP, first get the dialogue chain through N rounds into the completed state, then verify the fixes below against real data.

New commits

  • Frontend d1186cb: DialogueChatView's complete emit carries DialogueReport | null; TopicDiscussionPreview's displayEvaluation prefers real data, with mock as the fallback
  • Latest frontend HEAD: d1186cb; backend HEAD unchanged (still 7d192be)
  • vue-tsc passes

Must fix first next time (Critical + Important)

  • [BACKEND CRITICAL] The /speak-stream WebSocket path bypasses ContentEvaluator entirely

    • _background_evaluate_pronunciation at app/api/dialogue.py:159-184 only runs Azure and never calls the content evaluator
    • The frontend's main recording path is the WebSocket (useDialogueEngine.ts:256+ beginStudentStream); HTTP /speak is only the fallback
    • Consequence: real users' content_feedback is always NULL
    • Fix direction: route /speak-stream through DialogueService._evaluate_pronunciation, or duplicate the evaluator call there
  • [FRONTEND CRITICAL] The getReport polling does not recognize status === 'evaluating'

    • The poll at useDialogueEngine.ts:190-202 only retries on reject; when the backend returns 200 with status='evaluating' and some rounds still have contentFeedback=null, it resolves immediately
    • The design doc §2.6 anticipated this case, but the implementation does not handle it
    • The BackendReportResponse.status type is declared yet never read
    • Fix direction: treat status === 'evaluating' as "not done yet" and keep polling (see the first sketch after this list)
  • [FRONTEND IMPORTANT] getReport failures silently fall back to mockEvaluation (I2)

    • fetchReportSafe → null → displayEvaluation → mockEvaluation (the panda/bamboo fake data)
    • Users would take the fabricated report as their own
    • Fix direction: in real mode, a failure must show an error-state UI, not fall back to mock
  • [FRONTEND IMPORTANT] "End and view report" blocks for up to 30s (I3)

    • handleExitConfirm awaits fetchReportSafe() after the modal closes, freezing the chat view in the meantime
    • Fix direction: switch to the completed stage with a loading state first, and fetch the report in the background
  • [FRONTEND IMPORTANT] Pending/failed rounds pollute overallScore with zeros (I5)

    • The ?? 0 at llmService.ts:90-97 fills unfinished rounds with 0, and .filter(s.pronunciation) still keeps them
    • Fix direction: in the adapter, only fill pronunciation when status === 'completed' and the scores are non-null (see the second sketch after this list)
  • [FRONTEND IMPORTANT] The axis mapping semantics are off (I4)

    • prosody → intonation, completeness → stress: but completeness is "how completely the text was read" while stress means "word stress"
    • Fix direction: for the MVP, mark it as debt with a comment; later extend SentenceEvaluation.pronunciation to Azure's four dimensions
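First sketch — a minimal version of the polling fix, assuming the raw (un-adapted) /report payload is reachable; BackendReportResponse is the interface from Task 6 Step 3, and fetchRaw is a hypothetical accessor, not an existing function:

async function pollUntilReady(
  sessionId: string,
  fetchRaw: (id: string) => Promise<BackendReportResponse>,
  maxAttempts = 15,
  intervalMs = 2000,
): Promise<BackendReportResponse | null> {
  for (let i = 0; i < maxAttempts; i++) {
    try {
      const raw = await fetchRaw(sessionId)
      if (raw.status === 'ready') return raw
      // status === 'evaluating': some rounds still lack contentFeedback — keep polling
    } catch {
      // network/5xx errors also just retry
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  return null // caller decides whether to show partial data or an error state
}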
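Second sketch — the I5 guard, reusing the BackendEvaluation interface from Task 6 Step 3 and keeping the (debt-marked) intonation/stress mapping as-is:

function toPronunciation(ev: BackendEvaluation | undefined) {
  // Only completed evaluations with all four scores present contribute to overallScore.
  if (!ev || ev.status !== 'completed') return undefined
  if (
    ev.accuracyScore == null || ev.fluencyScore == null ||
    ev.completenessScore == null || ev.prosodyScore == null
  ) return undefined
  return {
    accuracy: ev.accuracyScore,
    fluency: ev.fluencyScore,
    intonation: ev.prosodyScore,
    stress: ev.completenessScore,
  }
}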

Round after next (non-blocking)

  • Backend I1: DialogueService.__init__ creates a new AsyncOpenAI per request (get_dialogue_service sits in Depends); switch to a module-level singleton
  • Backend I2: prior_ai_turn depends on the fragile timing of "student message already flushed, AI message not yet written"; add an explicit role='ai' AND round < current_round filter
  • Backend I4: pron_scores: dict lacks a TypedDict
  • Backend I5/I6: test_content_evaluator has no assertions on the prompt payload; test_dialogue_service_report is suspected of merely mirroring the contract
  • Backend M1-6: the inner structure of corrections is not validated, prompt injection, non-idempotent migration, no dedicated model configuration, f-string logging
  • Frontend M1-5: avoidable non-null assertions, Math.max(...arr) stack risk, placeholder zeros in dimensions/statistics/aiComment not marked TODO, inconsistent id scheme, the status type does not include 'evaluating'