For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: On top of the existing per-turn Azure pronunciation scores (four dimensions), add an LLM content-feedback pipeline that produces {highlights, corrections, suggestions}, attaches it to each turn's evaluation, and surfaces it only on the report page.
Architecture: in each /speak background task, chain one OpenAI JSON-mode call after Azure PA completes; on failure, degrade to content_feedback=null without touching the pronunciation scores; /report responses carry a contentFeedback field.
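For orientation, here is the payload shape this pipeline stores in content_feedback and surfaces as contentFeedback. This is a sketch reusing the example values from this plan's happy-path test: the wording is free-form LLM output, and only the three keys and their value types are contractual.

```python
# Illustrative content_feedback value; shape only, not normative output.
content_feedback = {
    "highlights": ["发音清晰", "句子完整"],  # 1-2 short Chinese praise items
    "corrections": [  # grammar / word-choice fixes; an empty list is valid
        {
            "original": "I go to park yesterday",
            "corrected": "I went to the park yesterday",
            "explanation": "过去式应用 went,park 前加 the",
        }
    ],
    "suggestions": ["可增加连接词"],  # 1-2 actionable Chinese tips
}
```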
Tech Stack: Python 3.13 · FastAPI · SQLAlchemy 2.x async · MySQL · OpenAI SDK (via onehub base_url) · pytest · uv · Vue 3 · TypeScript
Repos:
- /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
- /Users/buoy/Development/gitrepo/PPT

Spec: /Users/buoy/Development/gitrepo/PPT/doc/ContentEvaluationDesign.md
Create:
- app/service/speaking/content_evaluator.py — single responsibility: hand (4 pronunciation scores + previous AI turn + student transcript) to the LLM and get JSON feedback back
- tests/conftest.py — pytest async + mock fixtures
- tests/service/__init__.py
- tests/service/speaking/__init__.py
- tests/service/speaking/test_content_evaluator.py — unit tests for the evaluator
- tests/service/speaking/test_dialogue_service_content.py — unit tests for the chaining logic
- migrations/001_add_content_feedback.sql — incremental SQL for an existing DB

Modify (backend):
- init.sql — add the column to the CREATE TABLE statement used for new DBs
- app/models/dialogue.py — add a content_feedback column to PronunciationEvaluation
- app/service/speaking/dialogue_service.py — append content evaluation after the success branch of _evaluate_pronunciation; get_report returns contentFeedback

Modify (frontend):
- src/views/Editor/EnglishSpeaking/services/llmService.ts — transform the getReport response (map backend rounds[i].evaluation.contentFeedback onto sentenceEvaluations[i].feedback)

No changes needed: DetailedReport.vue already renders the sentence.feedback.{highlights, corrections, suggestions} shape, and the SentenceEvaluation.feedback type in englishSpeaking.ts is already aligned.

Task 1: Add the content_feedback column

Files:
- Modify: cococlass-english-speaking-api/init.sql (CREATE TABLE statement for new DBs)
- Create: cococlass-english-speaking-api/migrations/001_add_content_feedback.sql
- Modify: cococlass-english-speaking-api/app/models/dialogue.py (SQLAlchemy model)
- [ ] Step 1: Update the pronunciation_evaluation CREATE TABLE statement in init.sql
In the pronunciation_evaluation table definition, insert a content_feedback column before completed_at.
Open cococlass-english-speaking-api/init.sql and change:
word_analysis JSON NULL,
error_message TEXT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
completed_at DATETIME NULL,
to:
word_analysis JSON NULL,
content_feedback JSON NULL,
error_message TEXT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
completed_at DATETIME NULL,
- [ ] Step 2: Create the migrations/ directory and write the incremental SQL
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
mkdir -p migrations
Create migrations/001_add_content_feedback.sql with:
-- Add content_feedback column to existing pronunciation_evaluation table.
-- Apply once against an existing database (new DBs use updated init.sql).
ALTER TABLE pronunciation_evaluation
ADD COLUMN content_feedback JSON NULL AFTER word_analysis;
- [ ] Step 3: Update the SQLAlchemy model
Open cococlass-english-speaking-api/app/models/dialogue.py and locate:
word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
Change to (inserting content_feedback between the two):
word_analysis: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
content_feedback: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add init.sql migrations/001_add_content_feedback.sql app/models/dialogue.py
git commit -m "feat(db): 为 pronunciation_evaluation 增加 content_feedback 列"
Task 2: Set up the pytest skeleton

tests/ in this repo currently contains only an empty __init__.py. First establish a runnable unit-test foundation.

Files:
- Create: cococlass-english-speaking-api/tests/conftest.py
- Create: cococlass-english-speaking-api/tests/service/__init__.py
- Create: cococlass-english-speaking-api/tests/service/speaking/__init__.py
- Create: cococlass-english-speaking-api/tests/service/speaking/test_smoke.py
- [ ] Step 1: Create tests/conftest.py
"""Pytest global fixtures & asyncio config."""
import pytest
@pytest.fixture
def anyio_backend() -> str:
"""Force asyncio backend for anyio tests (not trio)."""
return "asyncio"
- [ ] Step 2: Create __init__.py files so pytest can discover the nested directories
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
mkdir -p tests/service/speaking
touch tests/service/__init__.py tests/service/speaking/__init__.py
Create tests/service/speaking/test_smoke.py:
def test_pytest_works() -> None:
assert 1 + 1 == 2
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_smoke.py -v
Expected: 1 passed.
If uv run pytest reports "pytest: command not found", first run uv sync --group dev to install the dev dependencies, then rerun.
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add tests/
git commit -m "chore(test): 搭建 pytest 目录骨架和 conftest"
Task 3: The content_evaluator module (TDD)

Files:
- Create: cococlass-english-speaking-api/app/service/speaking/content_evaluator.py
- Create: cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py (a new file in the same directory as the previous task's smoke test)

Design note: ContentEvaluator instantiates AsyncOpenAI directly (using settings.ONEHUB_BASE_URL + settings.ONEHUB_API_KEY, like OneHubLLM) because it needs the response_format parameter, which the existing LLMProvider.chat() interface does not expose.
Create cococlass-english-speaking-api/tests/service/speaking/test_content_evaluator.py:
"""Unit tests for ContentEvaluator."""
import json
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from app.service.speaking.content_evaluator import ContentEvaluator
def _mock_openai_response(content: str) -> MagicMock:
"""Construct a fake AsyncOpenAI chat completion response."""
choice = MagicMock()
choice.message.content = content
resp = MagicMock()
resp.choices = [choice]
return resp
@pytest.mark.asyncio
async def test_evaluate_happy_path() -> None:
fake_json = json.dumps(
{
"highlights": ["发音清晰", "句子完整"],
"corrections": [
{
"original": "I go to park yesterday",
"corrected": "I went to the park yesterday",
"explanation": "过去式应用 went,park 前加 the",
}
],
"suggestions": ["可增加连接词"],
}
)
with patch(
"app.service.speaking.content_evaluator.AsyncOpenAI"
) as MockClient:
instance = MockClient.return_value
instance.chat.completions.create = AsyncMock(
return_value=_mock_openai_response(fake_json)
)
evaluator = ContentEvaluator()
result = await evaluator.evaluate(
transcript="I go to park yesterday",
prior_ai_turn="What did you do last weekend?",
pron_scores={"accuracy": 72, "fluency": 85, "completeness": 90, "prosody": 60},
)
assert result is not None
assert result["highlights"] == ["发音清晰", "句子完整"]
assert len(result["corrections"]) == 1
assert result["corrections"][0]["corrected"] == "I went to the park yesterday"
assert result["suggestions"] == ["可增加连接词"]
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py -v
Expected: ModuleNotFoundError: No module named 'app.service.speaking.content_evaluator', or a similar import error.
Now create cococlass-english-speaking-api/app/service/speaking/content_evaluator.py:
"""Per-turn content evaluation via LLM (JSON mode)."""
import asyncio
import json
from openai import AsyncOpenAI
from app.config import settings
from app.logging import get_logger
logger = get_logger(__name__)
SYSTEM_PROMPT = """You are an English tutor evaluating a student's single spoken turn
in an open dialogue. You receive:
- Azure pronunciation scores (accuracy/fluency/completeness/prosody, 0-100)
- The immediate prior AI turn (context)
- The student's transcript
Return JSON with exactly these keys:
- highlights: 1-2 Chinese sentences praising specific strengths. Reference a
pronunciation dimension if that score is >= 85. <= 30 chars each.
- corrections: array of grammar/word-choice fixes. Each item has keys:
original (EN), corrected (EN), explanation (ZH, <= 30 chars).
- suggestions: 1-2 Chinese actionable improvements. Reference a pronunciation
dimension if that score is < 70. <= 30 chars each.
Rules:
- Empty arrays are valid. Do not invent errors to fill quota.
- If the student only said a filler ("yes", "ok", "hmm"), return empty
corrections and suggestions plus one encouragement in highlights.
- Never include raw score numbers in output text; describe qualitatively
("发音准确度很高" not "accuracy 92").
- Output MUST be a single JSON object with keys highlights, corrections, suggestions.
"""
class ContentEvaluator:
"""Generates per-turn content feedback via LLM in JSON mode."""
def __init__(self, timeout_seconds: float = 10.0):
self.client = AsyncOpenAI(
base_url=settings.ONEHUB_BASE_URL,
api_key=settings.ONEHUB_API_KEY,
)
self.model = settings.ONEHUB_MODEL
self.timeout_seconds = timeout_seconds
async def evaluate(
self,
transcript: str,
prior_ai_turn: str,
pron_scores: dict,
) -> dict | None:
"""Return {highlights, corrections, suggestions} or None on failure."""
user_payload = json.dumps(
{
"pronunciation": pron_scores,
"ai_said": prior_ai_turn,
"student_said": transcript,
},
ensure_ascii=False,
)
try:
resp = await asyncio.wait_for(
self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_payload},
],
response_format={"type": "json_object"},
temperature=0,
),
timeout=self.timeout_seconds,
)
except asyncio.TimeoutError:
logger.warning("ContentEvaluator LLM timeout")
return None
except Exception as e:
logger.error(f"ContentEvaluator LLM error: {e}")
return None
raw = resp.choices[0].message.content or ""
try:
parsed = json.loads(raw)
except json.JSONDecodeError:
logger.warning(f"ContentEvaluator got non-JSON: {raw[:200]}")
return None
if not self._has_required_shape(parsed):
logger.warning(f"ContentEvaluator got invalid shape: {parsed}")
return None
return {
"highlights": parsed.get("highlights", []),
"corrections": parsed.get("corrections", []),
"suggestions": parsed.get("suggestions", []),
}
@staticmethod
def _has_required_shape(obj: object) -> bool:
if not isinstance(obj, dict):
return False
for key in ("highlights", "corrections", "suggestions"):
if key not in obj or not isinstance(obj[key], list):
return False
return True
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py::test_evaluate_happy_path -v
Expected: 1 passed.
If you get a "pytest-asyncio plugin not installed" error, append "pytest-asyncio>=0.26.0" to [dependency-groups].dev in pyproject.toml, and add at the top of tests/conftest.py:
import pytest
pytest_plugins = ["pytest_asyncio"]
Then run uv sync --group dev and rerun.
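Before adding the failure-path tests, a hedged sketch of how the evaluator is meant to be called from the service side in Task 4. The argument values reuse the happy-path test above; running it for real requires the ONEHUB_* settings to be configured.

```python
import asyncio

from app.service.speaking.content_evaluator import ContentEvaluator


async def demo() -> None:
    # Sketch only: mirrors the call Task 4 wires into _evaluate_pronunciation.
    evaluator = ContentEvaluator(timeout_seconds=10.0)
    feedback = await evaluator.evaluate(
        transcript="I go to park yesterday",            # student's ASR transcript
        prior_ai_turn="What did you do last weekend?",  # immediately preceding AI turn
        pron_scores={"accuracy": 72, "fluency": 85, "completeness": 90, "prosody": 60},
    )
    # feedback is {"highlights": [...], "corrections": [...], "suggestions": [...]}
    # or None (timeout / non-JSON / wrong shape); callers must tolerate None.
    print(feedback)


if __name__ == "__main__":
    asyncio.run(demo())
```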
Append to test_content_evaluator.py:
@pytest.mark.asyncio
async def test_evaluate_returns_none_on_invalid_json() -> None:
with patch(
"app.service.speaking.content_evaluator.AsyncOpenAI"
) as MockClient:
instance = MockClient.return_value
instance.chat.completions.create = AsyncMock(
return_value=_mock_openai_response("not a json")
)
evaluator = ContentEvaluator()
result = await evaluator.evaluate(
transcript="Hi",
prior_ai_turn="Hello",
pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
)
assert result is None
Append:
@pytest.mark.asyncio
async def test_evaluate_returns_none_on_timeout() -> None:
async def never_returns(**kwargs):
await asyncio.sleep(5)
with patch(
"app.service.speaking.content_evaluator.AsyncOpenAI"
) as MockClient:
instance = MockClient.return_value
instance.chat.completions.create = never_returns
evaluator = ContentEvaluator(timeout_seconds=0.05)
result = await evaluator.evaluate(
transcript="Hi",
prior_ai_turn="Hello",
pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
)
assert result is None
Also add import asyncio to the imports at the top of the file (if it is not already there).
Append:
@pytest.mark.asyncio
async def test_evaluate_returns_none_on_wrong_shape() -> None:
    # LLM returned JSON, but with a missing key
bad = json.dumps({"highlights": ["ok"]})
with patch(
"app.service.speaking.content_evaluator.AsyncOpenAI"
) as MockClient:
instance = MockClient.return_value
instance.chat.completions.create = AsyncMock(
return_value=_mock_openai_response(bad)
)
evaluator = ContentEvaluator()
result = await evaluator.evaluate(
transcript="Hi",
prior_ai_turn="Hello",
pron_scores={"accuracy": 80, "fluency": 80, "completeness": 80, "prosody": 80},
)
assert result is None
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_content_evaluator.py -v
Expected: 4 passed.
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/content_evaluator.py tests/service/speaking/test_content_evaluator.py
# if you changed pyproject.toml / conftest.py
git add pyproject.toml tests/conftest.py uv.lock 2>/dev/null || true
git commit -m "feat(speaking): 新增 content_evaluator(LLM JSON 模式生成单轮评语)"
Task 4: Wire content evaluation into _evaluate_pronunciation (TDD)

This is the core integration point: after Azure succeeds, append one content evaluation; if Azure fails, do not call it; if content evaluation fails, it must not affect status.

Files:
- Modify: cococlass-english-speaking-api/app/service/speaking/dialogue_service.py
- Create: cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py

Note: the existing _evaluate_pronunciation takes its assessor via dependency injection (self.assessor). To make the content evaluator swappable in tests, it is attached to DialogueService as a dependency as well.
- [ ] Step 1: Create cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_content.py:
"""Integration-ish tests for content evaluation wired into DialogueService._evaluate_pronunciation."""
from unittest.mock import AsyncMock, MagicMock
import pytest
from app.service.speaking.dialogue_service import DialogueService
class _StubDB:
"""Minimal stand-in for AsyncSession that supports get() + commit()."""
def __init__(self, evaluation):
self._evaluation = evaluation
self.commit = AsyncMock()
async def __aenter__(self):
return self
async def __aexit__(self, *args):
return False
async def get(self, _cls, _id):
return self._evaluation
def _fake_evaluation() -> MagicMock:
ev = MagicMock()
ev.status = "pending"
ev.accuracy_score = None
ev.fluency_score = None
ev.completeness_score = None
ev.prosody_score = None
ev.word_analysis = None
ev.content_feedback = None
ev.completed_at = None
ev.error_message = None
return ev
def _build_service(assessor, evaluator) -> DialogueService:
return DialogueService(
asr=MagicMock(),
llm=MagicMock(),
assessor=assessor,
storage=MagicMock(),
content_evaluator=evaluator,
)
@pytest.mark.asyncio
async def test_azure_success_then_content_success_writes_both(monkeypatch) -> None:
ev = _fake_evaluation()
stub_db = _StubDB(ev)
monkeypatch.setattr(
"app.service.speaking.dialogue_service.async_session", lambda: stub_db
)
assessor = MagicMock()
assessor.assess = AsyncMock(
return_value={
"accuracy_score": 80,
"fluency_score": 85,
"completeness_score": 90,
"prosody_score": 75,
"word_analysis": [],
}
)
evaluator = MagicMock()
evaluator.evaluate = AsyncMock(
return_value={
"highlights": ["nice"],
"corrections": [],
"suggestions": [],
}
)
service = _build_service(assessor, evaluator)
await service._evaluate_pronunciation(
evaluation_id=1,
audio_bytes=b"",
reference_text="hi",
prior_ai_turn="hello",
)
assert ev.status == "completed"
assert ev.accuracy_score == 80
assert ev.content_feedback == {"highlights": ["nice"], "corrections": [], "suggestions": []}
evaluator.evaluate.assert_awaited_once()
Append:
@pytest.mark.asyncio
async def test_azure_success_content_failure_keeps_status_completed(monkeypatch) -> None:
ev = _fake_evaluation()
stub_db = _StubDB(ev)
monkeypatch.setattr(
"app.service.speaking.dialogue_service.async_session", lambda: stub_db
)
assessor = MagicMock()
assessor.assess = AsyncMock(
return_value={
"accuracy_score": 80,
"fluency_score": 85,
"completeness_score": 90,
"prosody_score": 75,
"word_analysis": [],
}
)
evaluator = MagicMock()
evaluator.evaluate = AsyncMock(return_value=None) # LLM failed
service = _build_service(assessor, evaluator)
await service._evaluate_pronunciation(
evaluation_id=1,
audio_bytes=b"",
reference_text="hi",
prior_ai_turn="hello",
)
assert ev.status == "completed"
assert ev.accuracy_score == 80
assert ev.content_feedback is None
Append:
@pytest.mark.asyncio
async def test_azure_failure_skips_content_evaluator(monkeypatch) -> None:
ev = _fake_evaluation()
stub_db = _StubDB(ev)
monkeypatch.setattr(
"app.service.speaking.dialogue_service.async_session", lambda: stub_db
)
assessor = MagicMock()
assessor.assess = AsyncMock(side_effect=RuntimeError("azure exploded"))
evaluator = MagicMock()
evaluator.evaluate = AsyncMock()
service = _build_service(assessor, evaluator)
await service._evaluate_pronunciation(
evaluation_id=1,
audio_bytes=b"",
reference_text="hi",
prior_ai_turn="hello",
)
assert ev.status == "failed"
assert ev.content_feedback is None
evaluator.evaluate.assert_not_awaited()
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py -v
Expected: 3 tests fail (DialogueService.__init__ does not yet accept a content_evaluator parameter, and _evaluate_pronunciation has no prior_ai_turn parameter).
- [ ] Make DialogueService.__init__ accept content_evaluator
Open cococlass-english-speaking-api/app/service/speaking/dialogue_service.py.
Add to the imports at the top of the file (async_session is hoisted to module level here so that the tests' monkeypatch.setattr on dialogue_service.async_session can take effect):
from app.models.database import async_session
from app.service.speaking.content_evaluator import ContentEvaluator
Locate __init__:
def __init__(
self,
asr: ASRProvider,
llm: LLMProvider,
assessor: PronunciationAssessor,
storage: AudioStorage,
):
self.asr = asr
self.llm = llm
self.assessor = assessor
self.storage = storage
Change to:
def __init__(
self,
asr: ASRProvider,
llm: LLMProvider,
assessor: PronunciationAssessor,
storage: AudioStorage,
content_evaluator: ContentEvaluator | None = None,
):
self.asr = asr
self.llm = llm
self.assessor = assessor
self.storage = storage
self.content_evaluator = content_evaluator or ContentEvaluator()
- [ ] Update the _evaluate_pronunciation signature and logic
Locate the current implementation (around dialogue_service.py:321):
async def _evaluate_pronunciation(
self,
evaluation_id: int,
audio_bytes: bytes,
reference_text: str,
content_type: str = "audio/webm;codecs=opus",
):
"""后台静默发音评估"""
from app.models.database import async_session
async with async_session() as db:
evaluation = await db.get(PronunciationEvaluation, evaluation_id)
if not evaluation:
logger.error(f"Evaluation record not found: id={evaluation_id}")
return
try:
result = await self.assessor.assess(audio_bytes, reference_text, content_type)
logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
evaluation.status = "completed"
evaluation.accuracy_score = result["accuracy_score"]
evaluation.fluency_score = result["fluency_score"]
evaluation.completeness_score = result["completeness_score"]
evaluation.prosody_score = result["prosody_score"]
evaluation.word_analysis = result.get("word_analysis")
evaluation.completed_at = datetime.now()
except Exception as e:
logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
evaluation.status = "failed"
evaluation.error_message = str(e)
await db.commit()
Change to (dropping the function-local async_session import, which would shadow the module-level one that the tests monkeypatch):
async def _evaluate_pronunciation(
self,
evaluation_id: int,
audio_bytes: bytes,
reference_text: str,
prior_ai_turn: str = "",
content_type: str = "audio/webm;codecs=opus",
):
"""后台静默发音评估 + 内容评语"""
from app.models.database import async_session
async with async_session() as db:
evaluation = await db.get(PronunciationEvaluation, evaluation_id)
if not evaluation:
logger.error(f"Evaluation record not found: id={evaluation_id}")
return
try:
result = await self.assessor.assess(audio_bytes, reference_text, content_type)
logger.info(f"Pronunciation assessment done: eval={evaluation_id}, accuracy={result['accuracy_score']}")
evaluation.status = "completed"
evaluation.accuracy_score = result["accuracy_score"]
evaluation.fluency_score = result["fluency_score"]
evaluation.completeness_score = result["completeness_score"]
evaluation.prosody_score = result["prosody_score"]
evaluation.word_analysis = result.get("word_analysis")
evaluation.completed_at = datetime.now()
                # Content evaluation: runs only when Azure succeeded; its failure never affects status.
try:
content_feedback = await self.content_evaluator.evaluate(
transcript=reference_text,
prior_ai_turn=prior_ai_turn,
pron_scores={
"accuracy": result["accuracy_score"],
"fluency": result["fluency_score"],
"completeness": result["completeness_score"],
"prosody": result["prosody_score"],
},
)
evaluation.content_feedback = content_feedback
logger.info(
f"Content evaluation done: eval={evaluation_id}, "
f"has_feedback={content_feedback is not None}"
)
except Exception as e:
logger.error(f"Content evaluation error (soft-fail): eval={evaluation_id}, error={e}")
evaluation.content_feedback = None
except Exception as e:
logger.error(f"Pronunciation assessment failed: eval={evaluation_id}, error={e}")
evaluation.status = "failed"
evaluation.error_message = str(e)
await db.commit()
- [ ] Pass prior_ai_turn into the asyncio.create_task call in speak()
Locate the existing create_task call inside the speak() method (around dialogue_service.py:189):
asyncio.create_task(
self._evaluate_pronunciation(
evaluation_id=evaluation.id,
audio_bytes=audio_bytes,
reference_text=transcript,
content_type=content_type,
)
)
Before the create_task call, compute prior_ai_turn. Add this new variable (placed just before the existing "⑩ 后台发音评估" comment):
        # Take the most recent AI message before this round's student message as context for content evaluation
prior_ai_turn = ""
for msg in reversed(history):
if msg.role == "ai":
prior_ai_turn = msg.content
break
Then change the create_task call to:
asyncio.create_task(
self._evaluate_pronunciation(
evaluation_id=evaluation.id,
audio_bytes=audio_bytes,
reference_text=transcript,
prior_ai_turn=prior_ai_turn,
content_type=content_type,
)
)
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_content.py -v
Expected: 3 passed.
If you hit ImportError: cannot import name 'ContentEvaluator' from a partially initialized module (a circular import), move the ContentEvaluator import from the top of dialogue_service.py to the first line inside the __init__ method (deferred import).
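A minimal sketch of that deferred-import fallback, assuming the cycle actually occurs (the parameter's type annotation is dropped because the name no longer exists at module import time):

```python
# Fallback sketch: defer the ContentEvaluator import into __init__ so that
# dialogue_service no longer imports content_evaluator at module load time.
def __init__(
    self,
    asr: ASRProvider,
    llm: LLMProvider,
    assessor: PronunciationAssessor,
    storage: AudioStorage,
    content_evaluator=None,  # annotation dropped to avoid the module-level name
):
    from app.service.speaking.content_evaluator import ContentEvaluator  # deferred

    self.asr = asr
    self.llm = llm
    self.assessor = assessor
    self.storage = storage
    self.content_evaluator = content_evaluator or ContentEvaluator()
```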
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_content.py
git commit -m "feat(speaking): 在 _evaluate_pronunciation 串联 content_evaluator"
Task 5: /report returns contentFeedback

Files:
- Modify: cococlass-english-speaking-api/app/service/speaking/dialogue_service.py (get_report method)
- Create: cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py
- [ ] Step 1: Write tests: when the evaluation carries content_feedback, the report entry carries it too
Create cococlass-english-speaking-api/tests/service/speaking/test_dialogue_service_report.py:
"""Tests for get_report including content_feedback pass-through."""
from unittest.mock import MagicMock
import pytest
def _stub_message(role: str, content: str, round_: int, evaluation=None):
msg = MagicMock()
msg.role = role
msg.content = content
msg.round = round_
msg.audio_url = None
msg.evaluation = evaluation
return msg
def _stub_evaluation(content_feedback=None, status="completed"):
ev = MagicMock()
ev.status = status
ev.accuracy_score = 80
ev.fluency_score = 80
ev.completeness_score = 80
ev.prosody_score = 80
ev.word_analysis = None
ev.content_feedback = content_feedback
return ev
def _build_report_entry(msg) -> dict:
"""Replicates the entry construction in DialogueService.get_report.
We only exercise the dict-shaping step in isolation — the full get_report
path hits DB/LLM summary and is not needed for this contract check.
"""
entry = {
"round": msg.round,
"role": msg.role,
"content": msg.content,
"audioUrl": msg.audio_url,
}
if msg.role == "student" and msg.evaluation:
ev = msg.evaluation
entry["evaluation"] = {
"status": ev.status,
"accuracyScore": ev.accuracy_score,
"fluencyScore": ev.fluency_score,
"completenessScore": ev.completeness_score,
"prosodyScore": ev.prosody_score,
"wordAnalysis": ev.word_analysis,
"contentFeedback": ev.content_feedback,
}
return entry
def test_report_entry_includes_content_feedback_when_present() -> None:
feedback = {"highlights": ["good"], "corrections": [], "suggestions": []}
ev = _stub_evaluation(content_feedback=feedback)
msg = _stub_message("student", "hi", 1, evaluation=ev)
entry = _build_report_entry(msg)
assert entry["evaluation"]["contentFeedback"] == feedback
def test_report_entry_content_feedback_is_null_when_absent() -> None:
ev = _stub_evaluation(content_feedback=None)
msg = _stub_message("student", "hi", 1, evaluation=ev)
entry = _build_report_entry(msg)
assert entry["evaluation"]["contentFeedback"] is None
def test_ai_message_has_no_evaluation_key() -> None:
msg = _stub_message("ai", "hello", 1, evaluation=None)
entry = _build_report_entry(msg)
assert "evaluation" not in entry
What we are testing here is the entry-shaping contract (via a standalone helper). In the real get_report we modify the same entry-construction block, keeping the two in sync.
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/test_dialogue_service_report.py -v
Expected: 3 passed. This step verifies the contract; the next step applies it to the real code.
- [ ] Step 2: Update the entry-construction block in get_report
Open cococlass-english-speaking-api/app/service/speaking/dialogue_service.py and locate:
if msg.role == "student" and msg.evaluation:
ev = msg.evaluation
entry["evaluation"] = {
"status": ev.status,
"accuracyScore": ev.accuracy_score,
"fluencyScore": ev.fluency_score,
"completenessScore": ev.completeness_score,
"prosodyScore": ev.prosody_score,
"wordAnalysis": ev.word_analysis,
}
Change to:
if msg.role == "student" and msg.evaluation:
ev = msg.evaluation
entry["evaluation"] = {
"status": ev.status,
"accuracyScore": ev.accuracy_score,
"fluencyScore": ev.fluency_score,
"completenessScore": ev.completeness_score,
"prosodyScore": ev.prosody_score,
"wordAnalysis": ev.word_analysis,
"contentFeedback": ev.content_feedback,
}
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest tests/service/speaking/ -v
Expected: all pass (smoke 1 + evaluator 4 + content 3 + report 3 = 11 passed).
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
git add app/service/speaking/dialogue_service.py tests/service/speaking/test_dialogue_service_report.py
git commit -m "feat(speaking): /report 返回每轮 contentFeedback"
Task 6: Pass contentFeedback through to sentence.feedback

DetailedReport.vue already renders sentence.feedback.{highlights, corrections, suggestions} (PPT/src/views/Editor/EnglishSpeaking/preview/DetailedReport.vue:94-116), so the frontend only needs a field pass-through where the getReport response is converted to OverallEvaluation.
Files:
Modify: PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts
- [ ] Step 1: Locate the backend-to-frontend shape-conversion point
Run:
grep -n "rounds\|sentenceEvaluations\|evaluation" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts
The backend /report returns { sessionId, topic, status, rounds[], summary }; the frontend DialogueReport expects { evaluation: OverallEvaluation } (each item of sentenceEvaluations[] has a feedback field). The current RealDialogueAPI.getReport() (llmService.ts:86-92) returns res.json() directly, with no shape conversion.
This means the frontend either adapts the shape in some other layer, or DetailedReport.vue gets its data from somewhere else. First grep for the adaptation point:
grep -rn "sentenceEvaluations\|rounds" /Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking --include="*.ts" --include="*.vue" | head -30
Branch A (ideal): a function like mapReportToEvaluation(backendRes) already exists. Add one line there: feedback: round.evaluation?.contentFeedback ?? undefined
Branch B (no conversion layer): getReport's return value is passed raw to the components. Inside RealDialogueAPI.getReport(), convert rounds[] into OverallEvaluation.sentenceEvaluations[], where each student-role round emits a SentenceEvaluation with feedback: r.evaluation?.contentFeedback ?? undefined
Branch C (the mock API already produces { evaluation: OverallEvaluation } but the real backend is not adapted): the most likely current state. An explicit adapter must then be written in RealDialogueAPI.getReport(). Implement it as in Branch B.
- [ ] Step 2: Add an adapter in RealDialogueAPI.getReport (assuming Branch B/C)
Open PPT/src/views/Editor/EnglishSpeaking/services/llmService.ts and change:
async getReport(sessionId: string): Promise<DialogueReport> {
const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
credentials: 'include',
})
if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
return res.json()
}
to:
async getReport(sessionId: string): Promise<DialogueReport> {
const res = await fetch(`${API_BASE}/report?sessionId=${encodeURIComponent(sessionId)}`, {
credentials: 'include',
})
if (!res.ok) throw new Error(`getReport failed: ${res.status}`)
const raw = await res.json() as BackendReportResponse
return adaptReport(raw)
}
Before the RealDialogueAPI class definition, add:
interface BackendEvaluation {
status: 'pending' | 'completed' | 'failed'
accuracyScore: number | null
fluencyScore: number | null
completenessScore: number | null
prosodyScore: number | null
wordAnalysis: unknown
contentFeedback: {
highlights: string[]
corrections: { original: string; corrected: string; explanation: string }[]
suggestions: string[]
} | null
}
interface BackendRound {
round: number
role: 'ai' | 'student'
content: string
audioUrl: string | null
evaluation?: BackendEvaluation
}
interface BackendReportResponse {
sessionId: string
topic: string
status: 'evaluating' | 'ready'
rounds: BackendRound[]
summary: string | null
}
function adaptReport(raw: BackendReportResponse): DialogueReport {
const sentenceEvaluations: SentenceEvaluation[] = raw.rounds.map((r, idx) => ({
id: `${raw.sessionId}-${idx}`,
round: r.round,
role: r.role,
content: r.content,
audioUrl: r.audioUrl ?? undefined,
pronunciation: r.evaluation && r.role === 'student'
? {
accuracy: r.evaluation.accuracyScore ?? 0,
fluency: r.evaluation.fluencyScore ?? 0,
          // The enspeak prototype uses intonation/stress as UI labels; map Azure's
          // prosody/completeness onto those two slots (prosody -> intonation for tone,
          // completeness -> stress for reading the text out in full). This is a UI-fit
          // decision; if the UI later adopts Azure's four dimensions, rename the keys back.
intonation: r.evaluation.prosodyScore ?? 0,
stress: r.evaluation.completenessScore ?? 0,
}
: undefined,
feedback: r.evaluation?.contentFeedback ?? undefined,
}))
  // overallScore uses the plain average as an MVP placeholder; remaining fields get empty/safe defaults.
const studentEvals = sentenceEvaluations.filter(s => s.role === 'student' && s.pronunciation)
const avg = studentEvals.length > 0
? Math.round(
studentEvals.reduce(
(sum, s) => sum + (s.pronunciation!.accuracy + s.pronunciation!.fluency + s.pronunciation!.intonation + s.pronunciation!.stress) / 4,
0,
) / studentEvals.length,
)
: 0
return {
evaluation: {
overallScore: avg,
scoreLevel: avg >= 85 ? 'excellent' : avg >= 70 ? 'good' : avg >= 60 ? 'fair' : 'needsWork',
percentile: 0,
dimensions: { fluency: 0, interaction: 0, vocabulary: 0, grammar: 0 },
aiComment: raw.summary ?? '',
highlights: [],
improvements: [],
nextChallenge: {},
statistics: {
totalRounds: Math.max(...sentenceEvaluations.map(s => s.round), 0),
averageScore: avg,
highestScore: 0,
highestRound: 0,
grammarErrors: 0,
excellentExpressions: 0,
totalDuration: 0,
},
sentenceEvaluations,
},
}
}
Then add SentenceEvaluation to the type imports at the top:
import type {
DialogueAPI, DialogueReport, SessionConfig, SessionInfo, SSEEvent,
SentenceEvaluation,
} from '@/types/englishSpeaking'
(If SentenceEvaluation is not exported from englishSpeaking.ts, first confirm in that file that the export keyword is present on export interface SentenceEvaluation.)
Note: if the grep in Step 1 turned up an existing adapter function, defer to that adaptation layer: only append the feedback field there, and do not build a new adapter. Skip the whole adaptReport block above and add a one-line pass-through to the existing function instead.
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check
(Adjust accordingly if the project uses pnpm / yarn. If there is no type-check script, run npx vue-tsc --noEmit.)
Expected: no type errors.
- [ ] Manual end-to-end check
Start the backend:
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run uvicorn app.main:app --reload
Start the frontend:
cd /Users/buoy/Development/gitrepo/PPT
npm run dev
In the browser, open the EnglishSpeaking component and complete one dialogue round.
Open the report page (DetailedReport) and confirm that under each student sentence the three sections (highlights / corrections / suggestions) are visible.
Also query the backend DB:
SELECT round, status, accuracy_score, content_feedback
FROM pronunciation_evaluation
WHERE session_id = (SELECT id FROM dialogue_session ORDER BY id DESC LIMIT 1);
Confirm content_feedback has the {highlights, corrections, suggestions} structure (or is NULL if the LLM failed).
If any check fails, go back to the corresponding task and locate the bug.
cd /Users/buoy/Development/gitrepo/PPT
git add src/views/Editor/EnglishSpeaking/services/llmService.ts
git commit -m "feat(english-speaking): 结果页透传 contentFeedback 到 SentenceCard"
Final verification, backend tests:
cd /Users/buoy/Development/gitrepo/cococlass-english-speaking-api
uv run pytest -v
Expected: all tests pass (including this round's additions: smoke 1 + evaluator 4 + content-dispatch 3 + report 3 = 11).
Frontend type check:
cd /Users/buoy/Development/gitrepo/PPT
npm run type-check
Expected: no type errors.
echo "backend: $(git -C /Users/buoy/Development/gitrepo/cococlass-english-speaking-api rev-parse --short HEAD)"
echo "frontend: $(git -C /Users/buoy/Development/gitrepo/PPT rev-parse --short HEAD)"
Paste the output into the "Completion record" section at the bottom of this plan file.
Completion record

Backend HEAD: 7d192be (branch feat/content-evaluator, baseline aa5e1a7, 5 commits)
- 99e64fa Task 1 DB column
- 1492ebe Task 2 pytest skeleton
- dee45e6 Task 3 content_evaluator module
- a1f1b91 Task 4 wiring of _evaluate_pronunciation
- 7d192be Task 5 /report returns contentFeedback

Frontend HEAD: 7c4d1a9 (branch feat/english-speaking, baseline 4523862, 1 commit)
- 7c4d1a9 Task 6 report-page contentFeedback pass-through

Verification: uv run pytest 11/11 passed; frontend vue-tsc --noEmit exit=0.

Deviations / notes:
- Added pytest-asyncio as a dev dependency (plus pytest_plugins in conftest).
- The async_session import was hoisted from inside the method to module level (so monkeypatch.setattr can reach it; no circular-dependency risk).
- TopicDiscussionPreview.vue currently shows hardcoded mockEvaluation data; the real getReport() is not consumed by the UI yet. The adapter is structurally in place, but actually seeing LLM feedback on the report page requires switching the UI to DialogueAPI.getReport; that switch is out of scope for this MVP and left as a follow-up.
- Manual verification requires AZURE_SPEECH_KEY and ONEHUB_API_KEY to be configured in .env.
- Deferred items (_StubDB assertion notes, a pron_scores TypedDict, adapter error tolerance) are marked for later iterations, not part of this MVP.

Follow-up round: completed "switch the UI to the real getReport" (the earlier leftover) and ran one full code review each over backend and frontend. The main dialogue flow has not yet been exercised through to the report page; before returning to this MVP, first get the dialogue chain to finish N rounds into the completed state, then validate the fixes below against real data.

New commits:
- d1186cb — DialogueChatView's complete emit carries DialogueReport | null; TopicDiscussionPreview uses displayEvaluation to prefer real data, with mock as fallback

Frontend HEAD: d1186cb; backend HEAD unchanged (still 7d192be).

Code-review findings:

[BACKEND CRITICAL] The /speak-stream WebSocket path bypasses ContentEvaluator entirely
- _background_evaluate_pronunciation in app/api/dialogue.py:159-184 only runs Azure and never calls the content evaluator
- The frontend primarily speaks over WebSocket (useDialogueEngine.ts:256+ beginStudentStream); HTTP /speak is only a fallback
- Net effect: content_feedback stays NULL in practice
- Fix: route /speak-stream through DialogueService._evaluate_pronunciation, or duplicate the evaluator call there

[FRONTEND CRITICAL] The getReport polling does not recognize status === 'evaluating'
- The poll in useDialogueEngine.ts:190-202 only retries on reject; when the backend returns 200 with status='evaluating' and some rounds' contentFeedback=null, it resolves immediately
- The BackendReportResponse.status type is declared but never read
- Fix: treat status === 'evaluating' as "not done yet" and keep polling

[FRONTEND IMPORTANT] getReport failures silently fall back to mockEvaluation (I2)
- fetchReportSafe returns null, displayEvaluation then shows mockEvaluation (the fake panda/bamboo dataset)

[FRONTEND IMPORTANT] "End and view report" blocks for up to 30s (I3)
- handleExitConfirm awaits fetchReportSafe() serially after the modal closes; the chat view freezes in the meantime
- Fix: show a loading state in the completed stage and fetch getReport in the background

[FRONTEND IMPORTANT] Zero scores from pending/failed rounds pollute overallScore (I5)
- llmService.ts:90-97 fills unfinished rounds with ?? 0, and .filter(s.pronunciation) still keeps them
- Fix: only populate pronunciation when status === 'completed' and the scores are non-null

[FRONTEND IMPORTANT] The axis mapping is semantically wrong (I4)
- prosody -> intonation, completeness -> stress: completeness means "how completely the text was read", stress means "word stress"; the mapping misleads
- Fix: change SentenceEvaluation.pronunciation to Azure's four dimensions

Minor findings:
- DialogueService.__init__ creates a new AsyncOpenAI per request (get_dialogue_service sits behind Depends); switch to a module-level singleton
- prior_ai_turn relies on fragile timing ("student message already flushed but AI message not yet written"); add an explicit role='ai' AND round < current_round filter
- pron_scores: dict lacks a TypedDict
- test_content_evaluator has no assertions on the prompt payload; test_dialogue_service_report risks merely mirroring the contract
- Math.max(...arr) stack risk; dimensions/statistics/aiComment placeholder zeros not marked TODO; inconsistent id scheme; the frontend status type does not include 'evaluating'