
Single-Turn Content Feedback MVP Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Alongside the existing four Azure pronunciation scores per turn, add an LLM content-feedback pipeline that produces {highlights, corrections, suggestions} attached to each turn's evaluation, displayed only on the results page.

Architecture: In each turn's /speak background task, chain one OpenAI JSON-mode call after Azure PA completes; on failure, degrade to content_feedback=null without affecting the pronunciation scores; /report responses carry a contentFeedback field.
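For orientation, a condensed sketch of that ordering (the names below are placeholders, not the real services — the actual wiring is built in Tasks 3-5):

# Placeholder sketch of the per-turn chain: Azure is a hard dependency, the LLM call is a soft one.
import asyncio
from typing import Optional


async def azure_assess(audio: bytes, reference_text: str) -> dict:
    """Stand-in for the Azure Pronunciation Assessment call."""
    return {"accuracy": 0, "fluency": 0, "completeness": 0, "prosody": 0}


async def llm_content_feedback(transcript: str, prior_ai_turn: str, scores: dict) -> Optional[dict]:
    """Stand-in for the OpenAI JSON-mode call; returns None on timeout or bad JSON."""
    return None


async def evaluate_turn(audio: bytes, transcript: str, prior_ai_turn: str) -> dict:
    scores = await azure_assess(audio, transcript)  # if this raises, the turn is marked "failed"
    feedback = await llm_content_feedback(transcript, prior_ai_turn, scores)  # soft-fails to None
    return {"pronunciation": scores, "content_feedback": feedback}  # scores survive even when feedback is None


if __name__ == "__main__":
    print(asyncio.run(evaluate_turn(b"", "I went to the park", "What did you do last weekend?")))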

Tech Stack: Python 3.13 · FastAPI · SQLAlchemy 2.x async · MySQL · OpenAI SDK (via onehub base_url) · pytest · uv · Vue 3 · TypeScript

Repos:

  • Backend: /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
  • Frontend: /Users/buoy/Development/gitrepo/PPT

Spec: /Users/buoy/Development/gitrepo/PPT/doc/ContentEvaluationDesign.md


File Structure

Backend (cococlass-english-speaking-api)

Create:

  • app/service/speaking/content_evaluator.py — single responsibility: feed (4 pronunciation scores + the prior AI turn + the student transcript) to the LLM and produce JSON feedback
  • tests/conftest.py — pytest async + mock fixtures
  • tests/service/__init__.py
  • tests/service/speaking/__init__.py
  • tests/service/speaking/test_content_evaluator.py — unit tests for the evaluator
  • tests/service/speaking/test_dialogue_service_content.py — unit tests for the chaining logic
  • migrations/001_add_content_feedback.sql — incremental SQL for an existing DB

Modify:

  • init.sql — add the same column to the CREATE TABLE statements used for new DBs
  • app/models/dialogue.py — add content_feedback to PronunciationEvaluation
  • app/service/speaking/dialogue_service.py — append the content evaluation after the success branch of _evaluate_pronunciation; return contentFeedback from get_report

Frontend (PPT)

Modify:

  • src/views/Editor/EnglishSpeaking/services/llmService.ts — the getReport response conversion (map backend rounds[i].evaluation.contentFeedback to sentenceEvaluations[i].feedback)

No changes needed: DetailedReport.vue already renders the sentence.feedback.{highlights, corrections, suggestions} shape, and the SentenceEvaluation.feedback type in englishSpeaking.ts is already aligned.


Task 1: [backend] Add the content_feedback column

Files:

  • Modify: cococlass-english-speaking-api/init.sql (CREATE TABLE statements for new DBs)
  • Create: cococlass-english-speaking-api/migrations/001_add_content_feedback.sql
  • Modify: cococlass-english-speaking-api/app/models/dialogue.py (SQLAlchemy model)

  • [ ] Step 1: Update the pronunciation_evaluation CREATE TABLE statement in init.sql

In the pronunciation_evaluation table definition, insert a content_feedback column between word_analysis and error_message:

Open cococlass-english-speaking-api/init.sql and change:

    word_analysis JSON NULL,
    error_message TEXT NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    completed_at DATETIME NULL,

to:

    word_analysis JSON NULL,
    content_feedback JSON NULL,
    error_message TEXT NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    completed_at DATETIME NULL,
  • Step 2: Create the migrations/ directory and write the incremental SQL
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
mkdir -p migrations

Create migrations/001_add_content_feedback.sql with the following content:

-- Add content_feedback column to existing pronunciation_evaluation table.
-- Apply once against an existing database (new DBs use updated init.sql).
ALTER TABLE pronunciation_evaluation
  ADD COLUMN content_feedback JSON NULL AFTER word_analysis;
  • Step 3: Update the SQLAlchemy model

Open cococlass-english-speaking-api/app/models/dialogue.py

Locate:

    word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)

Change to (inserting content_feedback between them):

    word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    content_feedback: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
  • Step 4: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add init.sql migrations/001_add_content_feedback.sql app/models/dialogue.py
git commit -m "feat(db): 为 pronunciation_evaluation 增加 content_feedback 列"

Task 2: [backend] Set up the pytest directory skeleton + conftest

The repo's tests/ directory currently contains only an empty __init__.py. First establish a runnable unit-test foundation.

Files:

  • Create: cococlass-english-speaking-api/tests/conftest.py
  • Create: cococlass-english-speaking-api/tests/service/__init__.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/__init__.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_smoke.py

  • [ ] Step 1: Create tests/conftest.py

"""Pytest global fixtures & asyncio config."""

import pytest


@pytest.fixture
def anyio_backend() -> str:
    """Force asyncio backend for anyio tests (not trio)."""
    return "asyncio"
  • Step 2: Create empty __init__.py files so pytest can discover the nested directories
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
mkdir -p tests/service/speaking
touch tests/service/__init__.py tests/service/speaking/__init__.py
  • Step 3: Write a smoke test to confirm pytest runs

Create tests/service/speaking/test_smoke.py:

def test_pytest_works() -> None:
    assert 1 + 1 == 2
  • Step 4: Run the smoke test
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_smoke.py -v

Expected: 1 passed.

If uv run pytest reports "pytest: command not found", run uv sync --group dev first to install the dev dependencies, then retry.

  • Step 5: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add tests/
git commit -m "chore(test): 搭建 pytest 目录骨架和 conftest"

Task 3: [backend] Write the content_evaluator module (TDD)

Files:

  • Create: cococlass-english-speaking-api/app/service/speaking/content_evaluator.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py (a new file in the same directory as the previous task's smoke test)

ContentEvaluator instantiates AsyncOpenAI directly (using settings.ONEHUB_BASE_URL + settings.ONEHUB_API_KEY, just like OneHubLLM) because it needs the response_format parameter, which the existing LLMProvider.chat() interface does not expose.

  • Step 1: Write a failing test for the evaluator (happy path)

Create cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py:

"""Unit tests for ContentEvaluator."""

import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from app.service.speaking.content_evaluator import ContentEvaluator


def _mock_openai_response(content: str) -> MagicMock:
    """Construct a fake AsyncOpenAI chat completion response."""
    choice = MagicMock()
    choice.message.content = content
    resp = MagicMock()
    resp.choices = [choice]
    return resp


@pytest.mark.asyncio
async def test_evaluate_happy_path() -> None:
    fake_json = json.dumps(
        {
            "highlights": ["发音清晰", "句子完整"],
            "corrections": [
                {
                    "original": "I go to park yesterday",
                    "corrected": "I went to the park yesterday",
                    "explanation": "过去式应用 went,park 前加 the",
                }
            ],
            "suggestions": ["可增加连接词"],
        }
    )

    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(
            return_value=_mock_openai_response(fake_json)
        )

        evaluator = ContentEvaluator()
        result = await evaluator.evaluate(
            transcript="I go to park yesterday",
            prior_ai_turn="What did you do last weekend?",
            pron_scores={"accuracy": 72, "fluency": 85, "completeness": 90, "prosody": 60},
        )

    assert result is not None
    assert result["highlights"] == ["发音清晰", "句子完整"]
    assert len(result["corrections"]) == 1
    assert result["corrections"][0]["corrected"] == "I went to the park yesterday"
    assert result["suggestions"] == ["可增加连接词"]
  • Step 2: Run it and confirm it fails (the module does not exist yet)
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py -v

Expected: ModuleNotFoundError: No module named 'app.service.speaking.content_evaluator' or a similar import error.

  • Step 3: Implement the minimal evaluator so the happy path passes

Create cococlass-english-speaking-api/app/service/speaking/content_evaluator.py:

"""Per-turn content evaluation via LLM (JSON mode)."""

import asyncio
import json

from openai import AsyncOpenAI

from app.config import settings
from app.logging import get_logger

logger = get_logger(__name__)


SYSTEM_PROMPT = """You are an English tutor evaluating a student's single spoken turn
in an open dialogue. You receive:
- Azure pronunciation scores (accuracy/fluency/completeness/prosody, 0-100)
- The immediate prior AI turn (context)
- The student's transcript

Return JSON with exactly these keys:
- highlights: 1-2 Chinese sentences praising specific strengths. Reference a
              pronunciation dimension if that score is >= 85. <= 30 chars each.
- corrections: array of grammar/word-choice fixes. Each item has keys:
               original (EN), corrected (EN), explanation (ZH, <= 30 chars).
- suggestions: 1-2 Chinese actionable improvements. Reference a pronunciation
               dimension if that score is < 70. <= 30 chars each.

Rules:
- Empty arrays are valid. Do not invent errors to fill quota.
- If the student only said a filler ("yes", "ok", "hmm"), return empty
  corrections and suggestions plus one encouragement in highlights.
- Never include raw score numbers in output text; describe qualitatively
  ("发音准确度很高" not "accuracy 92").
- Output MUST be a single JSON object with keys highlights, corrections, suggestions.
"""


class ContentEvaluator:
    """Generates per-turn content feedback via LLM in JSON mode."""

    def __init__(self, timeout_seconds: float = 10.0):
        self.client = AsyncOpenAI(
            base_url=settings.ONEHUB_BASE_URL,
            api_key=settings.ONEHUB_API_KEY,
        )
        self.model = settings.ONEHUB_MODEL
        self.timeout_seconds = timeout_seconds

    async def evaluate(
        self,
        transcript: str,
        prior_ai_turn: str,
        pron_scores: dict,
    ) -> dict | None:
        """Return {highlights, corrections, suggestions} or None on failure."""
        user_payload = json.dumps(
            {
                "pronunciation": pron_scores,
                "ai_said": prior_ai_turn,
                "student_said": transcript,
            },
            ensure_ascii=False,
        )

        try:
            resp = await asyncio.wait_for(
                self.client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": user_payload},
                    ],
                    response_format={"type": "json_object"},
                    temperature=0,
                ),
                timeout=self.timeout_seconds,
            )
        except asyncio.TimeoutError:
            logger.warning("ContentEvaluator LLM timeout")
            return None
        except Exception as e:
            logger.error(f"ContentEvaluator LLM error: {e}")
            return None

        raw = resp.choices[0].message.content or ""
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            logger.warning(f"ContentEvaluator got non-JSON: {raw[:200]}")
            return None

        if not self._has_required_shape(parsed):
            logger.warning(f"ContentEvaluator got invalid shape: {parsed}")
            return None

        return {
            "highlights": parsed.get("highlights", []),
            "corrections": parsed.get("corrections", []),
            "suggestions": parsed.get("suggestions", []),
        }

    @staticmethod
    def _has_required_shape(obj: object) -> bool:
        if not isinstance(obj, dict):
            return False
        for key in ("highlights", "corrections", "suggestions"):
            if key not in obj or not isinstance(obj[key], list):
                return False
        return True
  • Step 4: Run the happy-path test and confirm it passes
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py::test_evaluate_happy_path -v

Expected: 1 passed.

If you get a "pytest-asyncio plugin not installed" error, append "pytest-asyncio>=0.26.0" to [dependency-groups].dev in pyproject.toml and add the following at the top of tests/conftest.py:

import pytest

pytest_plugins = ["pytest_asyncio"]

Then run uv sync --group dev and retry.
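If that branch is hit, the dev dependency group ends up looking roughly like this (an assumed excerpt — the other entries are whatever the repo already declares; only the pytest-asyncio line is the actual addition):

# pyproject.toml (assumed excerpt)
[dependency-groups]
dev = [
    "pytest>=8.0",             # assumed to exist already
    "pytest-asyncio>=0.26.0",  # the line this plan adds
]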

  • Step 5: Add a failure-branch test — JSON parse failure

Append to test_content_evaluator.py:

@pytest.mark.asyncio
async def test_evaluate_returns_none_on_invalid_json() -> None:
    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(
            return_value=_mock_openai_response("not a json")
        )

        evaluator = ContentEvaluator()
        result = await evaluator.evaluate(
            transcript="Hi",
            prior_ai_turn="Hello",
            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
        )

    assert result is None
  • Step 6: Add a failure-branch test — timeout

Append:

@pytest.mark.asyncio
async def test_evaluate_returns_none_on_timeout() -> None:
    async def never_returns(**kwargs):
        await asyncio.sleep(5)

    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = never_returns

        evaluator = ContentEvaluator(timeout_seconds=0.05)
        result = await evaluator.evaluate(
            transcript="Hi",
            prior_ai_turn="Hello",
            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
        )

    assert result is None

Also add import asyncio to the imports at the top of the file (if it is not there already).

  • Step 7: Add a failure-branch test — invalid shape

Append:

@pytest.mark.asyncio
async def test_evaluate_returns_none_on_wrong_shape() -> None:
    # The LLM returned JSON but with missing keys
    bad = json.dumps({"highlights": ["ok"]})
    with patch(
        "app.service.speaking.content_evaluator.AsyncOpenAI"
    ) as MockClient:
        instance = MockClient.return_value
        instance.chat.completions.create = AsyncMock(
            return_value=_mock_openai_response(bad)
        )

        evaluator = ContentEvaluator()
        result = await evaluator.evaluate(
            transcript="Hi",
            prior_ai_turn="Hello",
            pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
        )

    assert result is None
  • Step 8: Run all the evaluator tests
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py -v

Expected: 4 passed.

  • Step 9: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/content_evaluator.py tests/service/speaking/test_content_evaluator.py
# if pyproject.toml / conftest.py were also changed
git add pyproject.toml tests/conftest.py uv.lock 2>/dev/null || true
git commit -m "feat(speaking): 新增 content_evaluator(LLM JSON 模式生成单轮评语)"

Task 4: [backend] Wire ContentEvaluator into _evaluate_pronunciation (TDD)

This is the core integration point: after Azure succeeds, run one content evaluation; if Azure fails, do not call it; if the content evaluation fails, status is unaffected.

Files:

  • Modify: cococlass-english-speaking-api/app/service/speaking/dialogue_service.py
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py

Note: the original _evaluate_pronunciation already receives self.assessor via dependency injection. So that the content evaluator can be replaced in tests, it is attached to DialogueService as a dependency in the same way below.

  • Step 1: Write a test — Azure succeeds + content succeeds → both fields are written

Create cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py:

"""Integration-ish tests for content evaluation wired into DialogueService._evaluate_pronunciation."""

from unittest.mock import AsyncMock, MagicMock

import pytest

from app.service.speaking.dialogue_service import DialogueService


class _StubDB:
    """Minimal stand-in for AsyncSession that supports get() + commit()."""

    def __init__(self, evaluation):
        self._evaluation = evaluation
        self.commit = AsyncMock()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *args):
        return False

    async def get(self, _cls, _id):
        return self._evaluation


def _fake_evaluation() -> MagicMock:
    ev = MagicMock()
    ev.status = "pending"
    ev.accuracy_score = None
    ev.fluency_score = None
    ev.completeness_score = None
    ev.prosody_score = None
    ev.word_analysis = None
    ev.content_feedback = None
    ev.completed_at = None
    ev.error_message = None
    return ev


def _build_service(assessor, evaluator) -> DialogueService:
    return DialogueService(
        asr=MagicMock(),
        llm=MagicMock(),
        assessor=assessor,
        storage=MagicMock(),
        content_evaluator=evaluator,
    )


@pytest.mark.asyncio
async def test_azure_success_then_content_success_writes_both(monkeypatch) -> None:
    ev = _fake_evaluation()
    stub_db = _StubDB(ev)
    monkeypatch.setattr(
        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
    )

    assessor = MagicMock()
    assessor.assess = AsyncMock(
        return_value={
            "accuracy_score": 80,
            "fluency_score": 85,
            "completeness_score": 90,
            "prosody_score": 75,
            "word_analysis": [],
        }
    )
    evaluator = MagicMock()
    evaluator.evaluate = AsyncMock(
        return_value={
            "highlights": ["nice"],
            "corrections": [],
            "suggestions": [],
        }
    )

    service = _build_service(assessor, evaluator)
    await service._evaluate_pronunciation(
        evaluation_id=1,
        audio_bytes=b"",
        reference_text="hi",
        prior_ai_turn="hello",
    )

    assert ev.status == "completed"
    assert ev.accuracy_score == 80
    assert ev.content_feedback == {"highlights": ["nice"], "corrections": [], "suggestions": []}
    evaluator.evaluate.assert_awaited_once()
  • Step 2: Write a test — Azure succeeds + content fails → content_feedback is None, status stays completed

Append:

@pytest.mark.asyncio
async def test_azure_success_content_failure_keeps_status_completed(monkeypatch) -> None:
    ev = _fake_evaluation()
    stub_db = _StubDB(ev)
    monkeypatch.setattr(
        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
    )

    assessor = MagicMock()
    assessor.assess = AsyncMock(
        return_value={
            "accuracy_score": 80,
            "fluency_score": 85,
            "completeness_score": 90,
            "prosody_score": 75,
            "word_analysis": [],
        }
    )
    evaluator = MagicMock()
    evaluator.evaluate = AsyncMock(return_value=None)  # LLM failed

    service = _build_service(assessor, evaluator)
    await service._evaluate_pronunciation(
        evaluation_id=1,
        audio_bytes=b"",
        reference_text="hi",
        prior_ai_turn="hello",
    )

    assert ev.status == "completed"
    assert ev.accuracy_score == 80
    assert ev.content_feedback is None
  • Step 3: Write a test — Azure fails → ContentEvaluator is never called

Append:

@pytest.mark.asyncio
async def test_azure_failure_skips_content_evaluator(monkeypatch) -> None:
    ev = _fake_evaluation()
    stub_db = _StubDB(ev)
    monkeypatch.setattr(
        "app.service.speaking.dialogue_service.async_session", lambda: stub_db
    )

    assessor = MagicMock()
    assessor.assess = AsyncMock(side_effect=RuntimeError("azure exploded"))
    evaluator = MagicMock()
    evaluator.evaluate = AsyncMock()

    service = _build_service(assessor, evaluator)
    await service._evaluate_pronunciation(
        evaluation_id=1,
        audio_bytes=b"",
        reference_text="hi",
        prior_ai_turn="hello",
    )

    assert ev.status == "failed"
    assert ev.content_feedback is None
    evaluator.evaluate.assert_not_awaited()
  • Step 4: Run the tests and confirm they fail
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py -v

Expected: 3 tests fail (DialogueService.__init__ does not have a content_evaluator parameter yet, and _evaluate_pronunciation has no prior_ai_turn parameter).

  • Step 5: Modify DialogueService.__init__ to accept content_evaluator

Open cococlass-english-speaking-api/app/service/speaking/dialogue_service.py

Add to the imports at the top of the file:

from app.service.speaking.content_evaluator import ContentEvaluator

Locate __init__:

    def __init__(
        self,
        asr: ASRProvider,
        llm: LLMProvider,
        assessor: PronunciationAssessor,
        storage: AudioStorage,
    ):
        self.asr = asr
        self.llm = llm
        self.assessor = assessor
        self.storage = storage

Change it to:

    def __init__(
        self,
        asr: ASRProvider,
        llm: LLMProvider,
        assessor: PronunciationAssessor,
        storage: AudioStorage,
        content_evaluator: ContentEvaluator | None = None,
    ):
        self.asr = asr
        self.llm = llm
        self.assessor = assessor
        self.storage = storage
        self.content_evaluator = content_evaluator or ContentEvaluator()
  • Step 6: Change the _evaluate_pronunciation signature and logic

Locate the existing implementation (around dialogue_service.py:321):

    async def _evaluate_pronunciation(
        self,
        evaluation_id: int,
        audio_bytes: bytes,
        reference_text: str,
        content_type: str = "audio/webm;codecs=opus",
    ):
        """后台静默发音评估"""
        from app.models.database import async_session

        async with async_session() as db:
            evaluation = await db.get(PronunciationEvaluation, evaluation_id)
            if not evaluation:
                logger.error(f"Evaluation record not found: id={evaluation_id}")
                return

            try:
                result = await self.assessor.assess(audio_bytes, reference_text, content_type)
                logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
                evaluation.status = "completed"
                evaluation.accuracy_score = result["accuracy_score"]
                evaluation.fluency_score = result["fluency_score"]
                evaluation.completeness_score = result["completeness_score"]
                evaluation.prosody_score = result["prosody_score"]
                evaluation.word_analysis = result.get("word_analysis")
                evaluation.completed_at = datetime.now()
            except Exception as e:
                logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
                evaluation.status = "failed"
                evaluation.error_message = str(e)

            await db.commit()

Change it to:

    async def _evaluate_pronunciation(
        self,
        evaluation_id: int,
        audio_bytes: bytes,
        reference_text: str,
        prior_ai_turn: str = "",
        content_type: str = "audio/webm;codecs=opus",
    ):
        """后台静默发音评估 + 内容评语"""
        from app.models.database import async_session

        async with async_session() as db:
            evaluation = await db.get(PronunciationEvaluation, evaluation_id)
            if not evaluation:
                logger.error(f"Evaluation record not found: id={evaluation_id}")
                return

            try:
                result = await self.assessor.assess(audio_bytes, reference_text, content_type)
                logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
                evaluation.status = "completed"
                evaluation.accuracy_score = result["accuracy_score"]
                evaluation.fluency_score = result["fluency_score"]
                evaluation.completeness_score = result["completeness_score"]
                evaluation.prosody_score = result["prosody_score"]
                evaluation.word_analysis = result.get("word_analysis")
                evaluation.completed_at = datetime.now()

                # Content evaluation: only triggered when Azure succeeds; its failure does not affect status.
                try:
                    content_feedback = await self.content_evaluator.evaluate(
                        transcript=reference_text,
                        prior_ai_turn=prior_ai_turn,
                        pron_scores={
                            "accuracy": result["accuracy_score"],
                            "fluency": result["fluency_score"],
                            "completeness": result["completeness_score"],
                            "prosody": result["prosody_score"],
                        },
                    )
                    evaluation.content_feedback = content_feedback
                    logger.info(
                        f"Content evaluation done: eval={evaluation_id}, "
                        f"has_feedback={content_feedback is not None}"
                    )
                except Exception as e:
                    logger.error(f"Content evaluation error (soft-fail): eval={evaluation_id}, error={e}")
                    evaluation.content_feedback = None

            except Exception as e:
                logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
                evaluation.status = "failed"
                evaluation.error_message = str(e)

            await db.commit()
  • Step 7: Update the asyncio.create_task call in speak() to pass prior_ai_turn

Locate the existing create_task call inside the speak() method (around dialogue_service.py:189):

            asyncio.create_task(
                self._evaluate_pronunciation(
                    evaluation_id=evaluation.id,
                    audio_bytes=audio_bytes,
                    reference_text=transcript,
                    content_type=content_type,
                )
            )

Before the create_task call, compute prior_ai_turn. Add the new variable (place it right before the "⑩ 后台发音评估" (background pronunciation evaluation) block):

            # Use the most recent AI message before this turn's student message as context for the content evaluation
            prior_ai_turn = ""
            for msg in reversed(history):
                if msg.role == "ai":
                    prior_ai_turn = msg.content
                    break

Then change the create_task call to:

            asyncio.create_task(
                self._evaluate_pronunciation(
                    evaluation_id=evaluation.id,
                    audio_bytes=audio_bytes,
                    reference_text=transcript,
                    prior_ai_turn=prior_ai_turn,
                    content_type=content_type,
                )
            )
  • Step 8: Run the tests and confirm they all pass
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py -v

Expected: 3 passed.

If you get ImportError: cannot import name 'ContentEvaluator' from a partially initialized module (circular import), move the ContentEvaluator import to the first line inside the __init__ method of dialogue_service.py (a deferred import) instead of the top of the file, as sketched below.
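A sketch of that fallback, with the signature abbreviated (the full parameter types are shown in Step 5 above); only the import placement changes:

class DialogueService:
    def __init__(self, asr, llm, assessor, storage, content_evaluator=None):
        # Deferred import: resolving ContentEvaluator only at construction time
        # breaks the module-level import cycle.
        from app.service.speaking.content_evaluator import ContentEvaluator

        self.asr = asr
        self.llm = llm
        self.assessor = assessor
        self.storage = storage
        self.content_evaluator = content_evaluator or ContentEvaluator()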

  • Step 9: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_content.py
git commit -m "feat(speaking): 在 _evaluate_pronunciation 串联 content_evaluator"

Task 5: [backend] Return contentFeedback from /report

Files:

  • Modify: cococlass-english-speaking-api/app/service/speaking/dialogue_service.py (get_report method)
  • Create: cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py

  • [ ] Step 1: Write tests — when the evaluation carries content_feedback, the report entry carries it too

Create cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py:

"""Tests for get_report including content_feedback pass-through."""

from unittest.mock import MagicMock

import pytest


def _stub_message(role: str, content: str, round_: int, evaluation=None):
    msg = MagicMock()
    msg.role = role
    msg.content = content
    msg.round = round_
    msg.audio_url = None
    msg.evaluation = evaluation
    return msg


def _stub_evaluation(content_feedback=None, status="completed"):
    ev = MagicMock()
    ev.status = status
    ev.accuracy_score = 80
    ev.fluency_score = 80
    ev.completeness_score = 80
    ev.prosody_score = 80
    ev.word_analysis = None
    ev.content_feedback = content_feedback
    return ev


def _build_report_entry(msg) -> dict:
    """Replicates the entry construction in DialogueService.get_report.

    We only exercise the dict-shaping step in isolation — the full get_report
    path hits DB/LLM summary and is not needed for this contract check.
    """
    entry = {
        "round": msg.round,
        "role": msg.role,
        "content": msg.content,
        "audioUrl": msg.audio_url,
    }
    if msg.role == "student" and msg.evaluation:
        ev = msg.evaluation
        entry["evaluation"] = {
            "status": ev.status,
            "accuracyScore": ev.accuracy_score,
            "fluencyScore": ev.fluency_score,
            "completenessScore": ev.completeness_score,
            "prosodyScore": ev.prosody_score,
            "wordAnalysis": ev.word_analysis,
            "contentFeedback": ev.content_feedback,
        }
    return entry


def test_report_entry_includes_content_feedback_when_present() -> None:
    feedback = {"highlights": ["good"], "corrections": [], "suggestions": []}
    ev = _stub_evaluation(content_feedback=feedback)
    msg = _stub_message("student", "hi", 1, evaluation=ev)

    entry = _build_report_entry(msg)

    assert entry["evaluation"]["contentFeedback"] == feedback


def test_report_entry_content_feedback_is_null_when_absent() -> None:
    ev = _stub_evaluation(content_feedback=None)
    msg = _stub_message("student", "hi", 1, evaluation=ev)

    entry = _build_report_entry(msg)

    assert entry["evaluation"]["contentFeedback"] is None


def test_ai_message_has_no_evaluation_key() -> None:
    msg = _stub_message("ai", "hello", 1, evaluation=None)
    entry = _build_report_entry(msg)
    assert "evaluation" not in entry

What is tested here is the entry-shaping contract (as a standalone helper). In the real get_report, the same entry-construction block is modified to stay consistent.

  • Step 2: Run the tests; they should all pass (the standalone helper does not depend on the not-yet-modified code)
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_report.py -v

Expected: 3 passed. This step verifies the contract; the next step applies it to the real code.

  • Step 3: Modify the entry-construction block in get_report

Open cococlass-english-speaking-api/app/service/speaking/dialogue_service.py and locate:

            if msg.role == "student" and msg.evaluation:
                ev = msg.evaluation
                entry["evaluation"] = {
                    "status": ev.status,
                    "accuracyScore": ev.accuracy_score,
                    "fluencyScore": ev.fluency_score,
                    "completenessScore": ev.completeness_score,
                    "prosodyScore": ev.prosody_score,
                    "wordAnalysis": ev.word_analysis,
                }

Change it to:

            if msg.role == "student" and msg.evaluation:
                ev = msg.evaluation
                entry["evaluation"] = {
                    "status": ev.status,
                    "accuracyScore": ev.accuracy_score,
                    "fluencyScore": ev.fluency_score,
                    "completenessScore": ev.completeness_score,
                    "prosodyScore": ev.prosody_score,
                    "wordAnalysis": ev.word_analysis,
                    "contentFeedback": ev.content_feedback,
                }
  • Step 4: Re-run the report-related tests
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/ -v

Expected: all pass (smoke 1 + evaluator 4 + content 3 + report 3 = 11 passed).

  • Step 5: Commit
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_report.py
git commit -m "feat(speaking): /report 返回每轮 contentFeedback"

Task 6: [frontend] Pass contentFeedback through to sentence.feedback

DetailedReport.vue already renders sentence.feedback.{highlights, corrections, suggestions} (PPT/src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue:94-116), so the frontend only needs a field pass-through where the getReport response is converted into OverallEvaluation.

Files:

  • Modify: PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts

  • [ ] Step 1: Locate where the backend→frontend shape conversion happens

Run:

grep -n "rounds\|sentenceEvaluations\|evaluation" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts

The backend /report returns { sessionId, topic, status, rounds[], summary }, while the frontend DialogueReport expects { evaluation: OverallEvaluation } (with a feedback field on each item of sentenceEvaluations[]). The current RealDialogueAPI.getReport() (llmService.ts:86-92) returns res.json() directly, with no shape conversion.

This means the current frontend either adapts the shape in some other layer, or DetailedReport.vue gets its data from somewhere else. First run a grep to find where the adaptation lives:

grep -rn "sentenceEvaluations\|rounds" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking --include="*.ts" --include="*.vue" | head -30
  • Step 2: Pick a branch based on the Step 1 results

Branch A (ideal case): a function like mapReportToEvaluation(backendRes) already exists

  • In that function, add feedback: round.evaluation?.contentFeedback ?? undefined to each sentence
  • Continue to Step 3

Branch B (no conversion layer): getReport's return value is passed to components as-is

  • In RealDialogueAPI.getReport(), convert rounds[] into OverallEvaluation.sentenceEvaluations[], where each student-role round emits a SentenceEvaluation with feedback: r.evaluation?.contentFeedback ?? undefined
  • Continue to Step 3

Branch C (the mock API already has the { evaluation: OverallEvaluation } shape but the real backend is not adapted): this is the most likely current state. In that case an explicit adapter must be written in RealDialogueAPI.getReport(). Implement it as in Branch B.

  • Step 3: Add the adapter in RealDialogueAPI.getReport (assuming Branch B/C)

Open PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts

Change:

  async getReport(sessionId: string): Promise<DialogueReport> {
    const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
      credentials: 'include',
    })
    if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
    return res.json()
  }

to:

  async getReport(sessionId: string): Promise<DialogueReport> {
    const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
      credentials: 'include',
    })
    if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
    const raw = await res.json() as BackendReportResponse
    return adaptReport(raw)
  }

RealDialogueAPI 类定义之前加:

interface BackendEvaluation {
  status: 'pending' | 'completed' | 'failed'
  accuracyScore: number | null
  fluencyScore: number | null
  completenessScore: number | null
  prosodyScore: number | null
  wordAnalysis: unknown
  contentFeedback: {
    highlights: string[]
    corrections: { original: string; corrected: string; explanation: string }[]
    suggestions: string[]
  } | null
}

interface BackendRound {
  round: number
  role: 'ai' | 'student'
  content: string
  audioUrl: string | null
  evaluation?: BackendEvaluation
}

interface BackendReportResponse {
  sessionId: string
  topic: string
  status: 'evaluating' | 'ready'
  rounds: BackendRound[]
  summary: string | null
}

function adaptReport(raw: BackendReportResponse): DialogueReport {
  const sentenceEvaluations: SentenceEvaluation[] = raw.rounds.map((r, idx) => ({
    id: `${raw.sessionId}-${idx}`,
    round: r.round,
    role: r.role,
    content: r.content,
    audioUrl: r.audioUrl ?? undefined,
    pronunciation: r.evaluation && r.role === 'student'
      ? {
          accuracy: r.evaluation.accuracyScore ?? 0,
          fluency: r.evaluation.fluencyScore ?? 0,
          // The enspeak prototype uses intonation/stress as UI labels; map Azure's prosody and
          // completeness onto those two slots (prosody → intonation for tone, completeness → stress
          // for reading the sentence in full). This is a UI-fit decision; if the UI later adopts
          // Azure's four dimensions directly, switch the keys back.
          intonation: r.evaluation.prosodyScore ?? 0,
          stress: r.evaluation.completenessScore ?? 0,
        }
      : undefined,
    feedback: r.evaluation?.contentFeedback ?? undefined,
  }))

  // For the MVP, overallScore uses the average as a placeholder; other fields stay empty / safe defaults.
  const studentEvals = sentenceEvaluations.filter(s => s.role === 'student' && s.pronunciation)
  const avg = studentEvals.length > 0
    ? Math.round(
        studentEvals.reduce(
          (sum, s) => sum + (s.pronunciation!.accuracy + s.pronunciation!.fluency + s.pronunciation!.intonation + s.pronunciation!.stress) / 4,
          0,
        ) / studentEvals.length,
      )
    : 0

  return {
    evaluation: {
      overallScore: avg,
      scoreLevel: avg >= 85 ? 'excellent' : avg >= 70 ? 'good' : avg >= 60 ? 'fair' : 'needsWork',
      percentile: 0,
      dimensions: { fluency: 0, interaction: 0, vocabulary: 0, grammar: 0 },
      aiComment: raw.summary ?? '',
      highlights: [],
      improvements: [],
      nextChallenge: {},
      statistics: {
        totalRounds: Math.max(...sentenceEvaluations.map(s => s.round), 0),
        averageScore: avg,
        highestScore: 0,
        highestRound: 0,
        grammarErrors: 0,
        excellentExpressions: 0,
        totalDuration: 0,
      },
      sentenceEvaluations,
    },
  }
}

Then add SentenceEvaluation to the imports at the top:

import type {
  DialogueAPI, DialogueReport, SessionConfig, SessionInfo, SSEEvent,
  SentenceEvaluation,
} from '@/types/englishSpeaking'

(If SentenceEvaluation is not exported from englishSpeaking.ts, first confirm in that file that export interface SentenceEvaluation carries the export keyword; a rough sketch of the expected shape follows.)
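For reference, a guess at the shape this plan assumes englishSpeaking.ts already declares, reconstructed from how adaptReport builds each item — verify against the real file rather than treating this as authoritative:

// Assumed shape only; the authoritative definitions live in englishSpeaking.ts.
export interface SentenceFeedback {
  highlights: string[]
  corrections: { original: string; corrected: string; explanation: string }[]
  suggestions: string[]
}

export interface SentenceEvaluation {
  id: string
  round: number
  role: 'ai' | 'student'
  content: string
  audioUrl?: string
  pronunciation?: { accuracy: number; fluency: number; intonation: number; stress: number }
  feedback?: SentenceFeedback
}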

Note: if the Step 1 grep shows an existing adapter function, defer to that adaptation layer — only add the feedback field there, do not create a new adapter. Skip the entire adaptReport block here and instead add a one-line pass-through in the existing function.

  • Step 4: Type check
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check

(Adjust accordingly if the project uses pnpm / yarn. If there is no type-check script, run npx vue-tsc --noEmit.)

Expected: no type errors.

  • Step 5: Manual smoke verification
  1. Start the backend:

    cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
    uv run uvicorn app.main:app --reload
    
    2. Start the frontend: cd /Users/buoy/Development/gitrepo/PPT && npm run dev
  3. In the browser, open the EnglishSpeaking component and complete one dialogue round.

  4. Open the results page (DetailedReport) and confirm each student sentence shows the three sections: highlights / corrections / suggestions.

  5. Also check the backend DB:

    SELECT round, status, accuracy_score, content_feedback
    FROM pronunciation_evaluation
    WHERE session_id = (SELECT id FROM dialogue_session ORDER BY id DESC LIMIT 1);
    

    Confirm that content_feedback has the {highlights, corrections, suggestions} structure (or is NULL if the LLM failed).

    If any of these checks fails, go back to the corresponding task and track down the bug.

    • Step 6: Commit
    cd /Users/buoy/Development/gitrepo/PPT
    git add src/views/Editor/EnglishSpeaking/services/llmService.ts
    git commit -m "feat(english-speaking): 结果页透传 contentFeedback 到 SentenceCard"
    

Task 7: Regression-check all tests and the existing flow

  • Step 1: Run the full backend test suite
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest -v

Expected: all tests pass (including the newly added smoke 1 + evaluator 4 + content-dispatch 3 + report 3 = 11 tests).

  • Step 2: Run the frontend type check
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check

Expected: no type errors.

  • Step 3: Record the HEAD of both repos as the completion marker for this implementation
echo "backend:  $(git -C /Users/buoy/Development/gitrepo/cococlass-english-speaking-api rev-parse --short HEAD)"
echo "frontend: $(git -C /Users/buoy/Development/gitrepo/PPT rev-parse --short HEAD)"

Paste the output into the "Completion Record" section at the bottom of this plan file.


Completion Record

  • Plan completed: 2026-04-23
  • Backend HEAD: 7d192be (branch feat/content-evaluator, baseline aa5e1a7, 5 commits)
    • 99e64fa Task 1 DB column
    • 1492ebe Task 2 pytest skeleton
    • dee45e6 Task 3 content_evaluator module
    • a1f1b91 Task 4 wiring of _evaluate_pronunciation
    • 7d192be Task 5 /report returns contentFeedback
  • Frontend HEAD: 7c4d1a9 (branch feat/english-speaking, baseline 4523862, 1 commit)
    • 7c4d1a9 Task 6 results-page contentFeedback pass-through
  • Final regression: backend uv run pytest 11/11 passed; frontend vue-tsc --noEmit exit=0
  • Deviations and notes:
    • Task 2 hit the plan's conditional branch in Step 2a (pytest-asyncio missing; per the plan's instructions, the dependency was appended and pytest_plugins inserted into conftest.py)
    • Task 4 promoted async_session from a deferred in-method import to the module top level (so monkeypatch.setattr can reach it; no circular-dependency risk)
    • Task 6 actually followed the plan's Branch C (a full adapter implementation), slightly more costly than Branch A's one-line pass-through. Finding: TopicDiscussionPreview.vue currently displays hard-coded mockEvaluation data, and the real getReport() is not consumed by the UI. The adapter is structurally in place, but actually seeing LLM feedback on the results page requires switching the UI over to DialogueAPI.getReport — that switch is out of scope for this MVP and left to a follow-up task
    • No end-to-end smoke test was run (real backend + real browser); verification was static only (unit tests + type check). Before real integration testing, AZURE_SPEECH_KEY and ONEHUB_API_KEY must be configured in the backend's .env
    • Several non-blocking improvements raised in review (the AsyncOpenAI lifecycle in content_evaluator, _StubDB assertion notes, a pron_scores TypedDict, adapter error tolerance) are all deferred to later iterations and not part of this MVP

2026-04-24 addendum: UI integration + cross-repo full code review findings

This round also completed the previously deferred item of switching the UI to the real getReport, and ran a full code review over both the backend and the frontend. The main dialogue flow has not yet been exercised end to end into the results page; before returning to this MVP, first get the dialogue chain through N rounds into the completed state, then verify the fixes below against real data.

New commits

  • Frontend d1186cb: DialogueChatView's complete emit carries DialogueReport | null; TopicDiscussionPreview's displayEvaluation prefers real data, with mock as the fallback
  • Latest frontend HEAD: d1186cb; backend HEAD unchanged (still 7d192be)
  • vue-tsc passes

Must fix first next time (Critical + Important)

  • [BACKEND CRITICAL] The /speak-stream WebSocket path bypasses ContentEvaluator entirely

    • _background_evaluate_pronunciation at app/api/dialogue.py:159-184 only runs Azure and never calls the content evaluator
    • The frontend's main recording path is the WebSocket (useDialogueEngine.ts:256+ beginStudentStream); HTTP /speak is only the fallback
    • Consequence: real users' content_feedback is always NULL
    • Fix direction: route /speak-stream through DialogueService._evaluate_pronunciation, or duplicate the evaluator call there
  • [FRONTEND CRITICAL] The getReport polling does not recognize status === 'evaluating'

    • The poll at useDialogueEngine.ts:190-202 only retries on reject; when the backend returns 200 with status='evaluating' and some rounds still have contentFeedback=null, it resolves immediately
    • The design doc §2.6 anticipated this case, but the implementation does not handle it
    • The BackendReportResponse.status type is declared yet never read
    • Fix direction: treat status === 'evaluating' as "not done yet" and keep polling (see the first sketch after this list)
  • [FRONTEND IMPORTANT] getReport failures silently fall back to mockEvaluation (I2)

    • fetchReportSafe → null → displayEvaluation → mockEvaluation (the panda/bamboo fake data)
    • Users would take the fabricated report as their own
    • Fix direction: in real mode, a failure must show an error-state UI, not fall back to mock
  • [FRONTEND IMPORTANT] "End and view report" blocks for up to 30s (I3)

    • handleExitConfirm awaits fetchReportSafe() after the modal closes, freezing the chat view in the meantime
    • Fix direction: switch to the completed stage with a loading state first, and fetch the report in the background
  • [FRONTEND IMPORTANT] Pending/failed rounds pollute overallScore with zeros (I5)

    • The ?? 0 at llmService.ts:90-97 fills unfinished rounds with 0, and .filter(s.pronunciation) still keeps them
    • Fix direction: in the adapter, only fill pronunciation when status === 'completed' and the scores are non-null (see the second sketch after this list)
  • [FRONTEND IMPORTANT] The axis mapping semantics are off (I4)

    • prosody → intonation, completeness → stress: but completeness is "how completely the text was read" while stress means "word stress"
    • Fix direction: for the MVP, mark it as debt with a comment; later extend SentenceEvaluation.pronunciation to Azure's four dimensions
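First sketch — a minimal version of the polling fix, assuming the raw (un-adapted) /report payload is reachable; BackendReportResponse is the interface from Task 6 Step 3, and fetchRaw is a hypothetical accessor, not an existing function:

async function pollUntilReady(
  sessionId: string,
  fetchRaw: (id: string) => Promise<BackendReportResponse>,
  maxAttempts = 15,
  intervalMs = 2000,
): Promise<BackendReportResponse | null> {
  for (let i = 0; i < maxAttempts; i++) {
    try {
      const raw = await fetchRaw(sessionId)
      if (raw.status === 'ready') return raw
      // status === 'evaluating': some rounds still lack contentFeedback — keep polling
    } catch {
      // network/5xx errors also just retry
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  return null // caller decides whether to show partial data or an error state
}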
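Second sketch — the I5 guard, reusing the BackendEvaluation interface from Task 6 Step 3 and keeping the (debt-marked) intonation/stress mapping as-is:

function toPronunciation(ev: BackendEvaluation | undefined) {
  // Only completed evaluations with all four scores present contribute to overallScore.
  if (!ev || ev.status !== 'completed') return undefined
  if (
    ev.accuracyScore == null || ev.fluencyScore == null ||
    ev.completenessScore == null || ev.prosodyScore == null
  ) return undefined
  return {
    accuracy: ev.accuracyScore,
    fluency: ev.fluencyScore,
    intonation: ev.prosodyScore,
    stress: ev.completenessScore,
  }
}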

Round after next (non-blocking)

  • Backend I1: DialogueService.__init__ creates a new AsyncOpenAI per request (get_dialogue_service sits in Depends); switch to a module-level singleton
  • Backend I2: prior_ai_turn depends on the fragile timing of "student message already flushed, AI message not yet written"; add an explicit role='ai' AND round < current_round filter
  • Backend I4: pron_scores: dict lacks a TypedDict
  • Backend I5/I6: test_content_evaluator has no assertions on the prompt payload; test_dialogue_service_report is suspected of merely mirroring the contract
  • Backend M1-6: the inner structure of corrections is not validated, prompt injection, non-idempotent migration, no dedicated model configuration, f-string logging
  • Frontend M1-5: avoidable non-null assertions, Math.max(...arr) stack risk, placeholder zeros in dimensions/statistics/aiComment not marked TODO, inconsistent id scheme, the status type does not include 'evaluating'