
docs: add report pipeline logging design

jimmylee 2 weeks ago
commit 0ec8d662cb

+ 119 - 0
docs/superpowers/specs/2026-04-25-english-speaking-report-pipeline-design.md

@@ -247,6 +247,124 @@ If `status !== "ready"`, the frontend should show a report-generating state and
 
 The detailed report may be shown partially only if the UI explicitly distinguishes partial data from a completed report. The first implementation should prefer a single loading state until `ready` to avoid showing misleading aggregate results.
 
+## Observability and Stall Diagnosis
+
+The first implementation should use structured logs to diagnose where report generation is stuck. It should not add database job tables or persistent stage columns yet.
+
+Every report-related log entry should include these fields when available:
+
+```text
+trace_id
+session_id
+round
+message_id
+evaluation_id
+stage
+event
+duration_ms
+status
+attempt
+error_code
+error_type
+error_message
+```
+
+`trace_id` should default to the session UUID so one command can retrieve the whole conversation/report chain.
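A minimal Python sketch of assembling one report log entry. It assumes a Python backend, which the design does not name. `REPORT_LOG_FIELDS` and `build_log_fields` are hypothetical names. The helper does three things:

- keeps the field order from the list above;
- drops fields that are not available;
- defaults `trace_id` to `session_id`.

```python
from typing import Any, Dict

# Hypothetical constant: field order copied from the design's field list,
# so every entry serializes with the same key order.
REPORT_LOG_FIELDS = (
    "trace_id", "session_id", "round", "message_id", "evaluation_id",
    "stage", "event", "duration_ms", "status", "attempt",
    "error_code", "error_type", "error_message",
)


def build_log_fields(session_id: str, stage: str, event: str, **extra: Any) -> Dict[str, Any]:
    """Hypothetical helper: return report log fields in fixed order, omitting missing values."""
    unexpected = set(extra) - set(REPORT_LOG_FIELDS)
    if unexpected:
        raise ValueError(f"unexpected log fields: {sorted(unexpected)}")
    values: Dict[str, Any] = {"session_id": session_id, "stage": stage, "event": event, **extra}
    # trace_id defaults to the session UUID so one filter retrieves the whole chain.
    if values.get("trace_id") is None:
        values["trace_id"] = session_id
    # Keep only the fields that are available, in the documented order.
    return {key: values[key] for key in REPORT_LOG_FIELDS if values.get(key) is not None}
```

Consider `build_log_fields("session-uuid", "sentence_eval.feedback_started", "start", round=2, evaluation_id=456, attempt=1)`. It returns the same keys, in the same order, as the first JSON example in this section.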
+
+`event` should use this fixed set:
+
+```text
+start | success | failed | timeout | skipped
+```
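One way to keep `event` values inside this fixed set, sketched in Python on the assumption of a Python backend. `ReportLogEvent` is a hypothetical enum whose value lookup rejects any other string.

```python
from enum import Enum


class ReportLogEvent(str, Enum):
    """Hypothetical enum for the fixed `event` values."""

    START = "start"
    SUCCESS = "success"
    FAILED = "failed"
    TIMEOUT = "timeout"
    SKIPPED = "skipped"


def parse_event(raw: str) -> ReportLogEvent:
    # Lookup by value raises ValueError for strings outside the fixed set.
    return ReportLogEvent(raw)
```

When writing the log entry, emit `member.value` rather than `str(member)`. For a `str`-mixin enum, `str()` returns the qualified name (`ReportLogEvent.START`), not `start`.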
+
+`stage` should use fixed names rather than free-form strings:
+
+```text
+speak_stream.start_received
+speak_stream.session_loaded
+speak_stream.asr_started
+speak_stream.asr_completed
+speak_stream.student_message_saved
+speak_stream.ai_reply_started
+speak_stream.ai_reply_completed
+speak_stream.conversation_committed
+speak_stream.background_eval_started
+
+sentence_eval.started
+sentence_eval.azure_started
+sentence_eval.azure_completed
+sentence_eval.feedback_started
+sentence_eval.feedback_completed
+sentence_eval.committed
+sentence_eval.failed
+
+report.requested
+report.evaluations_checked
+report.waiting_sentence_evals
+report.overall_started
+report.overall_completed
+report.returned
+report.failed
+```
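Stage names can be checked the same way. This Python sketch assumes a Python backend. It copies the names above into a hypothetical `REPORT_STAGES` set and rejects anything else. A typo then fails loudly instead of producing a stage that no filter matches.

```python
# Hypothetical registry: names copied verbatim from the design's stage list.
REPORT_STAGES = frozenset({
    "speak_stream.start_received",
    "speak_stream.session_loaded",
    "speak_stream.asr_started",
    "speak_stream.asr_completed",
    "speak_stream.student_message_saved",
    "speak_stream.ai_reply_started",
    "speak_stream.ai_reply_completed",
    "speak_stream.conversation_committed",
    "speak_stream.background_eval_started",
    "sentence_eval.started",
    "sentence_eval.azure_started",
    "sentence_eval.azure_completed",
    "sentence_eval.feedback_started",
    "sentence_eval.feedback_completed",
    "sentence_eval.committed",
    "sentence_eval.failed",
    "report.requested",
    "report.evaluations_checked",
    "report.waiting_sentence_evals",
    "report.overall_started",
    "report.overall_completed",
    "report.returned",
    "report.failed",
})


def validate_stage(stage: str) -> str:
    """Return the stage unchanged, or raise ValueError for an unknown name."""
    if stage not in REPORT_STAGES:
        raise ValueError(f"unknown report stage: {stage!r}")
    return stage


def stage_component(stage: str) -> str:
    # The prefix before the first dot names the pipeline part:
    # speak_stream, sentence_eval, or report.
    return validate_stage(stage).split(".", 1)[0]
```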
+
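+Each external provider call should log a `start` event before the call and a `success`, `failed`, or `timeout` event after it, with `duration_ms`. The start event uses the provider's `*_started` stage. The outcome event uses the matching `*_completed` stage, even when the call failed or timed out, as in the Azure timeout example below.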
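A minimal Python sketch of this start/outcome pairing, assuming a Python backend. `provider_call` and its `emit` callback are hypothetical; `emit` stands in for whatever structured logger the runtime uses. Following the examples, the wrapper logs:

- the start event under the `*_started` stage;
- the outcome under the matching `*_completed` stage.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional

Emit = Callable[[Dict[str, Any]], None]


@contextmanager
def provider_call(
    emit: Emit, start_stage: str, end_stage: str, **fields: Any
) -> Iterator[Dict[str, Any]]:
    """Hypothetical wrapper: log start, then success/failed/timeout with duration_ms."""
    emit({**fields, "stage": start_stage, "event": "start"})
    began = time.monotonic()
    outcome: Dict[str, Any] = {}  # caller may add fields such as status or error_code

    def finish(event: str, exc: Optional[BaseException] = None) -> None:
        record = {
            **fields,
            **outcome,
            "stage": end_stage,
            "event": event,
            "duration_ms": int((time.monotonic() - began) * 1000),
        }
        if exc is not None:
            record["error_type"] = type(exc).__name__
            record["error_message"] = str(exc)
        emit(record)

    try:
        yield outcome
    except TimeoutError as exc:  # on Python 3.11+, asyncio.TimeoutError is this same class
        finish("timeout", exc)
        raise
    except Exception as exc:
        finish("failed", exc)
        raise
    else:
        finish("success")
```

A call site wraps the provider request in `with provider_call(emit, "sentence_eval.azure_started", "sentence_eval.azure_completed", session_id=sid, attempt=1) as outcome:`, where `sid` is a placeholder. Mapping exceptions to codes such as `AZURE_TIMEOUT` is left to the caller. For example, the caller can set `outcome["error_code"]` before re-raising inside the block.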
+
+Examples:
+
+```json
+{
+  "trace_id": "session-uuid",
+  "session_id": "session-uuid",
+  "round": 2,
+  "evaluation_id": 456,
+  "stage": "sentence_eval.feedback_started",
+  "event": "start",
+  "attempt": 1
+}
+```
+
+```json
+{
+  "trace_id": "session-uuid",
+  "session_id": "session-uuid",
+  "round": 2,
+  "evaluation_id": 456,
+  "stage": "sentence_eval.feedback_completed",
+  "event": "success",
+  "duration_ms": 842,
+  "status": "completed",
+  "attempt": 1
+}
+```
+
+```json
+{
+  "trace_id": "session-uuid",
+  "session_id": "session-uuid",
+  "round": 2,
+  "evaluation_id": 456,
+  "stage": "sentence_eval.azure_completed",
+  "event": "timeout",
+  "duration_ms": 10000,
+  "error_code": "AZURE_TIMEOUT",
+  "error_type": "TimeoutError",
+  "error_message": "Azure pronunciation assessment timed out",
+  "attempt": 1
+}
+```
+
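+With this logging shape, debugging should be possible with simple filters. These `rg` patterns assume the `key=value` form, with `session_id` emitted before `stage` and `event`. JSON lines render the field as `"session_id": "session-uuid"`, so they need a different pattern or a JSON-aware filter.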
+
+```bash
+rg 'session_id=session-uuid' logs/app.log
+rg 'session_id=session-uuid.*event=(failed|timeout)' logs/app.log
+rg 'session_id=session-uuid.*stage=report.waiting_sentence_evals' logs/app.log
+```
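Suppose the runtime writes each entry as a single-line JSON object (the examples above are shown pretty-printed). The same three queries can then run through a JSON-aware filter that does not depend on exact spacing or field order. This is a minimal Python sketch under that assumption. `filter_report_logs` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Set


def filter_report_logs(
    lines: Iterable[str],
    session_id: str,
    events: Optional[Set[str]] = None,
    stage: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield JSON log entries for one session, optionally narrowed."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as tracebacks
        if not isinstance(entry, dict) or entry.get("session_id") != session_id:
            continue
        if events is not None and entry.get("event") not in events:
            continue
        if stage is not None and entry.get("stage") != stage:
            continue
        yield entry
```

Pass an open `logs/app.log` file with `events={"failed", "timeout"}` to reproduce the second `rg` query. Pass `stage="report.waiting_sentence_evals"` to reproduce the third.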
+
+If structured logs are not JSON-formatted in the current runtime, the same fields should still be emitted as stable `key=value` pairs.
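A minimal Python sketch of the `key=value` form, assuming a Python backend. The hypothetical `format_kv` writes fields in the documented order. It quotes values that contain whitespace, quotes, or `=`, so multi-word values such as `error_message` stay on one parseable line.

```python
import json
from typing import Any, Mapping

# Hypothetical constant: the design's field order, reused so every line has the same shape.
FIELD_ORDER = (
    "trace_id", "session_id", "round", "message_id", "evaluation_id",
    "stage", "event", "duration_ms", "status", "attempt",
    "error_code", "error_type", "error_message",
)


def format_kv(fields: Mapping[str, Any]) -> str:
    """Hypothetical formatter: ordered key=value pairs; keys outside FIELD_ORDER are ignored."""
    parts = []
    for key in FIELD_ORDER:
        value = fields.get(key)
        if value is None:
            continue  # omit fields that are not available
        text = str(value)
        # JSON string quoting escapes embedded quotes and newlines.
        if text == "" or any(ch.isspace() or ch in '"=' for ch in text):
            text = json.dumps(text, ensure_ascii=False)
        parts.append(f"{key}={text}")
    return " ".join(parts)
```

In this order, `session_id` precedes `stage` and `event`, so the `rg` patterns above match lines produced this way.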
+
+If these logs are not enough to diagnose production stalls, a later implementation can add persistent `stage`, `stage_updated_at`, `attempt`, and `last_error` fields or a dedicated report job table.
+
 ## Error Handling
 
 Sentence feedback generation is downstream of Azure scoring. If Azure fails, skip sentence feedback.
@@ -266,6 +384,7 @@ Backend tests should cover:
 - Azure failure skips feedback generation
 - `/report` does not generate overall data while any evaluation is pending
 - `/report` generates or returns cached overall data once all sentence evaluations are complete
+- report pipeline logs include `session_id`, `stage`, `event`, and `duration_ms` for provider calls
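
A sketch of that backend test in Python. It assumes a Python backend using the standard `logging` module, with fields passed through `extra`. `call_azure_with_logging` and the `report` record attribute are hypothetical stand-ins for the real call site. With pytest, the built-in `caplog` fixture could replace the collecting handler.

```python
import logging
import time
from typing import Any, Callable, Dict, List

logger = logging.getLogger("report_pipeline.sketch")  # hypothetical logger name


def call_azure_with_logging(session_id: str, call: Callable[[], Any]) -> Any:
    """Hypothetical call site: log start and success around a provider call."""
    base = {"trace_id": session_id, "session_id": session_id}
    logger.info("report pipeline", extra={"report": {**base, "stage": "sentence_eval.azure_started", "event": "start"}})
    began = time.monotonic()
    result = call()
    duration_ms = int((time.monotonic() - began) * 1000)
    logger.info(
        "report pipeline",
        extra={"report": {**base, "stage": "sentence_eval.azure_completed", "event": "success", "duration_ms": duration_ms}},
    )
    return result


class CollectingHandler(logging.Handler):
    """Keep emitted records in memory so a test can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def test_provider_call_logs_required_fields() -> None:
    handler = CollectingHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        call_azure_with_logging("session-uuid", lambda: {"score": 87})
    finally:
        logger.removeHandler(handler)
    # Fields passed via `extra` become attributes on each LogRecord.
    payloads: List[Dict[str, Any]] = [record.report for record in handler.records]
    assert [p["event"] for p in payloads] == ["start", "success"]
    for payload in payloads:
        assert payload["session_id"] == "session-uuid"
        assert payload["stage"].startswith("sentence_eval.azure_")
    assert payloads[-1]["duration_ms"] >= 0
```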
 
 Frontend tests or focused manual checks should cover: