2 weeks ago · 0ec8d662cb
--- a/docs/superpowers/specs/2026-04-25-english-speaking-report-pipeline-design.md
+++ b/docs/superpowers/specs/2026-04-25-english-speaking-report-pipeline-design.md
@@ -247,6 +247,124 @@ If `status !== "ready"`, the frontend should show a report-generating state and
 
				 
			
 
				 The detailed report may be shown partially only if the UI explicitly distinguishes partial data from a completed report. The first implementation should prefer a single loading state until `ready` to avoid showing misleading aggregate results.
			
 
				 
			
 
				+## Observability and Stall Diagnosis
			
 
				+
			
 
				+The first implementation should use structured logs to diagnose where report generation is stuck. It should not add database job tables or persistent stage columns yet.
			
 
				+
			
 
				+Every report-related log entry should include these fields when available:
			
 
				+
			
 
				+```text
				+trace_id
				+session_id
				+round
				+message_id
				+evaluation_id
				+stage
				+event
				+duration_ms
				+status
				+attempt
				+error_code
				+error_type
				+error_message
				+```
			
 
				+
			
 
				+`trace_id` should default to the session UUID so one command can retrieve the whole conversation/report chain.
			
 
				+
			
 
				+`event` should use this fixed set:
			
 
				+
			
 
				+```text
				+start | success | failed | timeout | skipped
				+```
			
 
				+
			
 
				+`stage` should use fixed names rather than free-form strings:
			
 
				+
			
 
				+```text
				+speak_stream.start_received
				+speak_stream.session_loaded
				+speak_stream.asr_started
				+speak_stream.asr_completed
				+speak_stream.student_message_saved
				+speak_stream.ai_reply_started
				+speak_stream.ai_reply_completed
				+speak_stream.conversation_committed
				+speak_stream.background_eval_started
				+
				+sentence_eval.started
				+sentence_eval.azure_started
				+sentence_eval.azure_completed
				+sentence_eval.feedback_started
				+sentence_eval.feedback_completed
				+sentence_eval.committed
				+sentence_eval.failed
				+
				+report.requested
				+report.evaluations_checked
				+report.waiting_sentence_evals
				+report.overall_started
				+report.overall_completed
				+report.returned
				+report.failed
				+```
			
 
				+
			
 
				+Each external provider call should log a `start` event before the call and a `success`, `failed`, or `timeout` event after the call, with `duration_ms`.
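The start/terminal logging rule above could be sketched as a small wrapper. This is an illustrative sketch only: `call_provider`, `stage_prefix`, and `emit` are hypothetical names, and the `_started` / `_completed` suffixes are inferred from the stage list and the examples in this spec rather than from any existing code.

```python
import json
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def _print_json(record: dict) -> None:
    """Default sink (assumption): one JSON object per line on stdout."""
    print(json.dumps(record))


def call_provider(
    call: Callable[[], T],
    *,
    stage_prefix: str,          # e.g. "sentence_eval.azure" (hypothetical parameter)
    session_id: str,
    attempt: int = 1,
    emit: Callable[[dict], None] = _print_json,
    **context: Any,             # optional fields such as round, evaluation_id
) -> T:
    """Log a start event, run the call, then log success, failed, or timeout."""
    base = {"trace_id": session_id, "session_id": session_id, **context, "attempt": attempt}
    emit({**base, "stage": f"{stage_prefix}_started", "event": "start"})
    started = time.monotonic()
    try:
        result = call()
    except Exception as exc:
        # TimeoutError maps to "timeout"; every other exception maps to "failed".
        event = "timeout" if isinstance(exc, TimeoutError) else "failed"
        emit({
            **base,
            "stage": f"{stage_prefix}_completed",
            "event": event,
            "duration_ms": int((time.monotonic() - started) * 1000),
            "error_type": type(exc).__name__,
            "error_message": str(exc),
        })
        raise
    emit({
        **base,
        "stage": f"{stage_prefix}_completed",
        "event": "success",
        "duration_ms": int((time.monotonic() - started) * 1000),
    })
    return result
```

Re-raising after the terminal log keeps error handling with the caller, so the wrapper only observes the call and never changes its outcome.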
			
 
				+
			
 
				+Examples:
			
 
				+
			
 
				+```json
				+{
				+  "trace_id": "session-uuid",
				+  "session_id": "session-uuid",
				+  "round": 2,
				+  "evaluation_id": 456,
				+  "stage": "sentence_eval.feedback_started",
				+  "event": "start",
				+  "attempt": 1
				+}
				+```
			
 
				+
			
 
				+```json
				+{
				+  "trace_id": "session-uuid",
				+  "session_id": "session-uuid",
				+  "round": 2,
				+  "evaluation_id": 456,
				+  "stage": "sentence_eval.feedback_completed",
				+  "event": "success",
				+  "duration_ms": 842,
				+  "status": "completed",
				+  "attempt": 1
				+}
				+```
			
 
				+
			
 
				+```json
				+{
				+  "trace_id": "session-uuid",
				+  "session_id": "session-uuid",
				+  "round": 2,
				+  "evaluation_id": 456,
				+  "stage": "sentence_eval.azure_completed",
				+  "event": "timeout",
				+  "duration_ms": 10000,
				+  "error_code": "AZURE_TIMEOUT",
				+  "error_type": "TimeoutError",
				+  "error_message": "Azure pronunciation assessment timed out",
				+  "attempt": 1
				+}
				+```
			
 
				+
			
 
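				+With this logging shape, debugging should be possible with simple filters. The patterns below assume the `key=value` line format, with `session_id` emitted before `stage` and `event` on each line; JSON-formatted logs need patterns that match the quoted form instead:
				+
				+```bash
				+rg 'session_id=session-uuid' logs/app.log
				+rg 'session_id=session-uuid.*event=(failed|timeout)' logs/app.log
				+rg 'session_id=session-uuid.*stage=report\.waiting_sentence_evals' logs/app.log
				+```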
			
 
				+
			
 
				+If structured logs are not JSON-formatted in the current runtime, the same fields should still be emitted as stable `key=value` pairs.
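One possible reading of "stable `key=value` pairs" is a formatter that always emits fields in the order of the field list in this spec and omits fields that are not available. The sketch below is an assumption, not the project's formatter: `format_kv` and `FIELD_ORDER` are hypothetical names, and quoting values that contain whitespace is a choice made here that the spec does not prescribe.

```python
import json

# Field order copied from the field list in this spec.
FIELD_ORDER = (
    "trace_id", "session_id", "round", "message_id", "evaluation_id",
    "stage", "event", "duration_ms", "status", "attempt",
    "error_code", "error_type", "error_message",
)


def format_kv(record: dict) -> str:
    """Render a log record as space-separated key=value pairs in a fixed order."""
    parts = []
    for key in FIELD_ORDER:
        if record.get(key) is None:
            continue  # "when available": skip missing fields
        value = str(record[key])
        # Assumption: JSON-quote empty values and values containing whitespace
        # or double quotes, so each pair remains one token for line-based filters.
        if value == "" or '"' in value or any(ch.isspace() for ch in value):
            value = json.dumps(value)
        parts.append(f"{key}={value}")
    return " ".join(parts)


line = format_kv({
    "event": "timeout",
    "session_id": "session-uuid",
    "stage": "sentence_eval.azure_completed",
    "duration_ms": 10000,
    "error_message": "Azure pronunciation assessment timed out",
})
print(line)
# session_id=session-uuid stage=sentence_eval.azure_completed event=timeout duration_ms=10000 error_message="Azure pronunciation assessment timed out"
```

A fixed order matters because order-sensitive patterns, such as `session_id=...` followed later on the line by `event=...`, only match when `session_id` is always written first.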
			
 
				+
			
 
				+If these logs are not enough to diagnose production stalls, a later implementation can add persistent `stage`, `stage_updated_at`, `attempt`, and `last_error` fields or a dedicated report job table.
			
 
				+
			
 
				 ## Error Handling
			
 
				 
			
 
				 Sentence feedback generation is downstream of Azure scoring. If Azure fails, skip sentence feedback.
			
@@ -266,6 +384,7 @@ Backend tests should cover:
 
				 - Azure failure skips feedback generation
			
 
				 - `/report` does not generate overall data while any evaluation is pending
			
 
				 - `/report` generates or returns cached overall data once all sentence evaluations are complete
			
 
				+- report pipeline logs include `session_id`, `stage`, `event`, and `duration_ms` for provider calls
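The logging bullet above could be backed by a small record checker that backend tests run over captured log records. This is a sketch under assumptions: `log_record_problems` is a hypothetical helper, it is meant for provider-call records only, and it treats `success`, `failed`, and `timeout` as the events that must carry `duration_ms`.

```python
ALLOWED_EVENTS = {"start", "success", "failed", "timeout", "skipped"}
TERMINAL_EVENTS = {"success", "failed", "timeout"}  # logged after a provider call


def log_record_problems(record: dict) -> list:
    """Return human-readable problems for one provider-call log record."""
    problems = []
    for field in ("session_id", "stage", "event"):
        if field not in record:
            problems.append(f"missing {field}")
    event = record.get("event")
    if event is not None and event not in ALLOWED_EVENTS:
        problems.append(f"unknown event {event!r}")
    if event in TERMINAL_EVENTS and "duration_ms" not in record:
        problems.append("missing duration_ms")
    return problems


# The timeout example from this spec passes the check.
timeout_record = {
    "trace_id": "session-uuid",
    "session_id": "session-uuid",
    "round": 2,
    "evaluation_id": 456,
    "stage": "sentence_eval.azure_completed",
    "event": "timeout",
    "duration_ms": 10000,
    "error_code": "AZURE_TIMEOUT",
    "error_type": "TimeoutError",
    "error_message": "Azure pronunciation assessment timed out",
    "attempt": 1,
}
assert log_record_problems(timeout_record) == []
```

A test would capture the records emitted during one report run and assert that this function returns an empty list for each provider-call record.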
			
 
				 
			
 
				 Frontend tests or focused manual checks should cover: