@@ -247,6 +247,124 @@ If `status !== "ready"`, the frontend should show a report-generating state and

The detailed report may be shown partially only if the UI explicitly distinguishes partial data from a completed report. The first implementation should prefer a single loading state until `ready` to avoid showing misleading aggregate results.

+## Observability and Stall Diagnosis
+
+The first implementation should use structured logs to diagnose where report generation is stuck. It should not add database job tables or persistent stage columns yet.
+
+Every report-related log entry should include these fields when available:
+
+```text
+trace_id
+session_id
+round
+message_id
+evaluation_id
+stage
+event
+duration_ms
+status
+attempt
+error_code
+error_type
+error_message
+```
+
+`trace_id` should default to the session UUID so one command can retrieve the whole conversation/report chain.
+
+`event` should use this fixed set:
+
+```text
+start | success | failed | timeout | skipped
+```
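As an illustration of the field rules above, here is a minimal Python sketch of an entry builder. The function name `build_log_entry` and its signature are assumptions made for this example and are not part of the design; the sketch only shows the `trace_id` fallback, the fixed `event` set, and the "when available" rule for optional fields.

```python
from typing import Any, Optional

# Fixed event vocabulary from this section.
ALLOWED_EVENTS = {"start", "success", "failed", "timeout", "skipped"}


def build_log_entry(session_id: str, stage: str, event: str,
                    trace_id: Optional[str] = None, **fields: Any) -> dict:
    """Build one structured log entry (hypothetical helper, not an existing API)."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    entry = {
        # trace_id falls back to the session UUID when no trace is supplied.
        "trace_id": trace_id or session_id,
        "session_id": session_id,
        "stage": stage,
        "event": event,
    }
    # Optional fields (round, evaluation_id, duration_ms, ...) are included
    # only when the caller actually has a value for them.
    entry.update({k: v for k, v in fields.items() if v is not None})
    return entry


entry = build_log_entry(
    "session-uuid", "sentence_eval.feedback_started", "start",
    round=2, evaluation_id=456, attempt=1, duration_ms=None,
)
```

Rejecting unknown events at build time is one possible way to keep the vocabulary fixed; a real implementation might prefer an enum or a lint rule instead.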
+
+`stage` should use fixed names rather than free-form strings:
+
+```text
+speak_stream.start_received
+speak_stream.session_loaded
+speak_stream.asr_started
+speak_stream.asr_completed
+speak_stream.student_message_saved
+speak_stream.ai_reply_started
+speak_stream.ai_reply_completed
+speak_stream.conversation_committed
+speak_stream.background_eval_started
+
+sentence_eval.started
+sentence_eval.azure_started
+sentence_eval.azure_completed
+sentence_eval.feedback_started
+sentence_eval.feedback_completed
+sentence_eval.committed
+sentence_eval.failed
+
+report.requested
+report.evaluations_checked
+report.waiting_sentence_evals
+report.overall_started
+report.overall_completed
+report.returned
+report.failed
+```
+
+Each external provider call should log a `start` event before the call and a `success`, `failed`, or `timeout` event after the call. The closing event should carry `duration_ms`.
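One way to enforce the start/outcome pairing is a small context manager around each provider call. The following Python sketch is illustrative only: `log_provider_call` and the `emit` callback are hypothetical names, and it assumes provider timeouts surface as Python's built-in `TimeoutError` (an SDK-specific timeout exception would need its own `except` branch).

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


@contextmanager
def log_provider_call(emit: Callable[[Dict[str, Any]], None],
                      started_stage: str, completed_stage: str,
                      **context: Any) -> Iterator[None]:
    """Emit `start` before the wrapped call and exactly one outcome event after it."""
    emit({**context, "stage": started_stage, "event": "start"})
    began = time.monotonic()
    outcome: Dict[str, Any] = {"event": "success"}
    try:
        yield
    except TimeoutError as exc:
        # Assumption: the provider client raises the built-in TimeoutError.
        outcome = {"event": "timeout", "error_type": type(exc).__name__,
                   "error_message": str(exc)}
        raise
    except Exception as exc:
        outcome = {"event": "failed", "error_type": type(exc).__name__,
                   "error_message": str(exc)}
        raise
    finally:
        # duration_ms is attached to the closing event only.
        duration_ms = int((time.monotonic() - began) * 1000)
        emit({**context, "stage": completed_stage, **outcome,
              "duration_ms": duration_ms})


records: list = []
with log_provider_call(records.append, "sentence_eval.azure_started",
                       "sentence_eval.azure_completed",
                       session_id="session-uuid", round=2, attempt=1):
    pass  # the real provider call would go here
```

Because the outcome event is emitted in `finally`, a start event without a matching outcome in the logs suggests the process died or the call is still hanging, which is the stall signal this section is after.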
+
+Examples:
+
+```json
+{
+  "trace_id": "session-uuid",
+  "session_id": "session-uuid",
+  "round": 2,
+  "evaluation_id": 456,
+  "stage": "sentence_eval.feedback_started",
+  "event": "start",
+  "attempt": 1
+}
+```
+
+```json
+{
+  "trace_id": "session-uuid",
+  "session_id": "session-uuid",
+  "round": 2,
+  "evaluation_id": 456,
+  "stage": "sentence_eval.feedback_completed",
+  "event": "success",
+  "duration_ms": 842,
+  "status": "completed",
+  "attempt": 1
+}
+```
+
+```json
+{
+  "trace_id": "session-uuid",
+  "session_id": "session-uuid",
+  "round": 2,
+  "evaluation_id": 456,
+  "stage": "sentence_eval.azure_completed",
+  "event": "timeout",
+  "duration_ms": 10000,
+  "error_code": "AZURE_TIMEOUT",
+  "error_type": "TimeoutError",
+  "error_message": "Azure pronunciation assessment timed out",
+  "attempt": 1
+}
+```
+
+With this logging shape, debugging should be possible with simple filters. The commands below assume `key=value` output in which `session_id` is emitted before `stage` and `event` on each line:
+
+```bash
+rg 'session_id=session-uuid' logs/app.log
+rg 'session_id=session-uuid.*event=(failed|timeout)' logs/app.log
+rg 'session_id=session-uuid.*stage=report.waiting_sentence_evals' logs/app.log
+```
+
+If structured logs are not JSON-formatted in the current runtime, the same fields should still be emitted as stable `key=value` pairs.
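A possible shape for the `key=value` fallback is sketched below in Python. The helper name `format_kv`, the fixed field order, and the quoting rule for values containing whitespace are all assumptions for illustration; the only requirement taken from this section is that the pairs are stable.

```python
import json

# Emit fields in the order they are listed in this section so that
# line-oriented filters can rely on session_id preceding stage and event.
FIELD_ORDER = [
    "trace_id", "session_id", "round", "message_id", "evaluation_id",
    "stage", "event", "duration_ms", "status", "attempt",
    "error_code", "error_type", "error_message",
]


def format_kv(entry: dict) -> str:
    """Render a log entry as space-separated key=value pairs (hypothetical helper)."""
    parts = []
    for key in FIELD_ORDER:
        if key not in entry:
            continue  # fields are emitted only when available
        text = str(entry[key])
        # Quote values that would otherwise break whitespace-based splitting.
        if text == "" or '"' in text or any(ch.isspace() for ch in text):
            text = json.dumps(text)
        parts.append(f"{key}={text}")
    return " ".join(parts)


line = format_kv({
    "event": "timeout",
    "stage": "sentence_eval.azure_completed",
    "session_id": "session-uuid",
    "trace_id": "session-uuid",
    "duration_ms": 10000,
    "error_message": "Azure pronunciation assessment timed out",
})
```

A fixed order keeps patterns such as `session_id=...*event=...` valid regardless of how the caller assembled the entry.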
+
+If these logs are not enough to diagnose production stalls, a later implementation can add persistent `stage`, `stage_updated_at`, `attempt`, and `last_error` fields or a dedicated report job table.
+

## Error Handling

Sentence feedback generation is downstream of Azure scoring. If Azure fails, skip sentence feedback.

@@ -266,6 +384,7 @@ Backend tests should cover:
- Azure failure skips feedback generation
- `/report` does not generate overall data while any evaluation is pending
- `/report` generates or returns cached overall data once all sentence evaluations are complete
+- Report pipeline logs include `session_id`, `stage`, and `event` for provider calls, plus `duration_ms` on the closing event

Frontend tests or focused manual checks should cover: