
docs: design unified audio player with Azure TTS

Design doc for refactoring DialogueChatView audio: extract a
view-level useAudioPlayer composable as the single owner of
playback, replace browser TTS with Azure Speech REST, and
surface synthesis/playback errors per-message with one-click
retry. Engine becomes pure dialogue state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jimmylee, 2 weeks ago
parent
commit c13aebc278

+ 426 - 0
docs/superpowers/specs/2026-04-26-azure-tts-audio-player-design.md

@@ -0,0 +1,426 @@
+# Azure TTS + Unified Audio Player Design
+
+**Date:** 2026-04-26
+**Scope:** `src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue` and its composables/services
+**Status:** Approved for implementation planning
+
+## Context
+
+`DialogueChatView.vue` currently has three audio playback triggers:
+
+1. **Auto-play AI replies** — `useDialogueEngine.speakTTS()` runs after every AI `done` event (greeting, `sendStudentMessage`, WebSocket stream, `regenerateAiMessage`), using the browser-native `SpeechSynthesisUtterance` API.
+2. **Click-replay AI** — `togglePlay()` in the view creates a separate `SpeechSynthesisUtterance` for the same message.
+3. **Click-replay student** — `togglePlay()` plays the message's stored `audioBlob` via `HTMLAudioElement`.
+
+The three triggers do not coordinate. Two owners (engine + view) both call `speechSynthesis.cancel()`, the click-replay path does not know about ongoing auto-play, and starting a new recording does not interrupt currently-playing audio. As a result:
+
+- Clicking the play button on a message that is currently being auto-played does not toggle to "stop" — the visual play/pause state is wrong because `playingMessageId` (view) is unaware of `ttsUtterance` (engine).
+- Starting a recording while AI audio plays causes the AI voice to leak into the microphone.
+- Student-recording playback can overlap with auto-TTS for a freshly arrived AI message.
+
+We are also switching the AI voice from browser-native TTS to **Azure Speech REST** for higher-quality output. This is the right moment to refactor.
+
+## Goals
+
+1. Introduce a single, view-level audio playback owner (`useAudioPlayer` composable).
+2. Replace browser TTS with Azure Speech REST synthesis for AI messages.
+3. Auto-synthesize and auto-play each AI message after streaming completes.
+4. Support click-replay for both AI messages (cached synthesis) and student messages (existing blob).
+5. Enforce three rules **structurally** (i.e., not by discipline):
+   - Single playback channel (new playback interrupts any prior one).
+   - Recording start interrupts current playback.
+   - Click on the currently-playing message stops it.
+6. Surface synthesis / playback failures in a unified per-message error state with one-click retry.
+7. Decouple `useDialogueEngine` from audio entirely — the engine becomes pure dialogue state.
+
+## Non-goals
+
+- **Streaming synthesis** (synthesizing token-by-token while the model streams). Out of scope; full-text synthesis after `done` is acceptable.
+- **Multiple voices / voice configuration UI**. Hard-code `en-US-AriaNeural` for now.
+- **Cross-view audio coordination** (e.g., report screen also plays audio). The player is view-level; if the report screen later needs playback it can instantiate its own.
+- **Backend Azure token endpoint** (planned but deferred — see Security Debt below).
+- **SDK-based synthesis**. Use REST only; the `microsoft-cognitiveservices-speech-sdk` package is not introduced.
+
+## Non-trivial decisions
+
+### D1. Player ownership: view-level, not module singleton
+
+The player is created inside `DialogueChatView.setup()` via `useAudioPlayer()` and torn down on view unmount. Reasoning: today all three triggers live in this one view; a module singleton would add a global lifecycle hazard (ensure `stop()` on every navigation) without solving any current problem. If a future view needs playback, it will instantiate its own player; cross-view coordination is a separate, future problem.
+
+### D2. Auto-play trigger lives in the view, not the engine
+
+The engine no longer touches `speechSynthesis`. Instead, the view runs a `watch()` on `engine.messages` and triggers `player.play(...)` when an AI message transitions to `status === 'done'`. Reasoning: "auto-play after a message completes" is a presentation concern. Keeping it in the view means engine has zero audio dependency, and any future toggle ("disable auto-play") is a view-only change.
+
+The watcher keeps a `Set<string>` of already-auto-played message IDs so that a message is not replayed when the watcher fires again for unrelated message updates.
+
+### D3. Synthesis is one round-trip per message, cached by message ID
+
+Each AI message text → one Azure REST call → MP3 blob → cached in a `Map<messageId, Blob>` inside the player. Replays hit the cache (no second call). Cache lives for the player's lifetime; on unmount, all cached object URLs are revoked and the map is cleared. Student-recording blobs are **not** added to this cache — the message itself owns the blob.
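+
+A minimal sketch of the bookkeeping this implies inside `useAudioPlayer()` (the `ttsCache` and `objectUrls` names are illustrative, not part of the public API):
+
+```ts
+// Illustrative internals: one entry per message id, released when the view unmounts.
+const ttsCache = new Map<string, Blob>()      // messageId → synthesized MP3 blob
+const objectUrls = new Map<string, string>()  // messageId → object URL handed to <audio>
+
+onUnmounted(() => {
+  stop()                                       // abort synthesis, pause any current audio
+  for (const url of objectUrls.values()) URL.revokeObjectURL(url)
+  objectUrls.clear()
+  ttsCache.clear()
+})
+```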
+
+### D4. Errors are surfaced per message, not globally
+
+The player exposes `errorId: Ref<string | null>`. The play button on the affected message renders an error variant (warning icon + "点击重试" text). Clicking it retries by calling `player.play(id, source)` again. Reasoning:
+
+- The failure scope is one message's playback, not a system state.
+- Locating the error at the play button keeps retry intuitive — the same affordance that "starts" playback also "retries".
+- Avoids introducing a new global toast/banner component.
+
+All failure paths (Azure network/5xx, `audio.play()` rejection from autoplay policy, decoder errors) collapse to the same UI: warning icon + "播放失败,点击重试". We do not differentiate causes; the user action is identical.
+
+### D5. WebSocket stream path: attach `audioBlob` to student message
+
+The `beginStudentStream` path (`useDialogueEngine.ts`) does not currently attach the recorded blob to the student message it pushes. As a result, the student's "replay recording" button is silent in WS mode. We fix this in the same change: `handleFinishRecording` will attach the blob to the in-flight student message via a new `engine.attachStudentBlob(studentMsgId, blob)` helper. Without this fix, student replay is broken in production (WS is the default path).
+
+### D6. Three player ref states drive button rendering
+
+The play button has four visual states, derived from three player refs:
+
+| State    | Condition                                         | Render                       |
+|----------|---------------------------------------------------|------------------------------|
+| idle     | none of the below                                 | ▶ play icon                  |
+| loading  | `player.loadingId.value === message.id`           | spinner                      |
+| playing  | `player.playingId.value === message.id`           | ⏸ pause icon                 |
+| error    | `player.errorId.value === message.id`             | ⚠ warning icon, red border, "点击重试" |
+
+`loadingId` is non-null between `play()` invocation and either the audio's `onplaying` event (success path) or the catch block (failure path). It is needed because the synthesis round-trip is observable (~500ms-2s) and the user must see something happen.
+
+## Architecture
+
+### File layout
+
+| File                                                              | Operation | Responsibility                                                          |
+|-------------------------------------------------------------------|-----------|-------------------------------------------------------------------------|
+| `src/views/Editor/EnglishSpeaking/composables/useAudioPlayer.ts`  | **NEW**   | Sole audio owner. Single-channel rule, TTS cache, error surface.        |
+| `src/views/Editor/EnglishSpeaking/services/speechService.ts`      | **NEW**   | Azure Speech REST synthesis. Stateless `synthesize(text, signal)`.      |
+| `src/views/Editor/EnglishSpeaking/composables/useDialogueEngine.ts` | **EDIT**  | Remove `speakTTS`, `cancelTTS`, `ttsUtterance`. Add `attachStudentBlob`. |
+| `src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue`   | **EDIT**  | Use `useAudioPlayer`. Add auto-play watcher. Wire stop into recording start. Render new button states. |
+| `.env.example`                                                    | **NEW or APPEND** | Document `VITE_AZURE_SPEECH_KEY` and `VITE_AZURE_SPEECH_REGION`. |
+
+### `useAudioPlayer` API
+
+```ts
+function useAudioPlayer(): {
+  playingId: Readonly<Ref<string | null>>
+  loadingId: Readonly<Ref<string | null>>
+  errorId:   Readonly<Ref<string | null>>
+
+  play(id: string, source: PlaySource): Promise<void>
+  stop(): void
+}
+
+type PlaySource =
+  | { kind: 'tts';  text: string }
+  | { kind: 'blob'; blob: Blob }
+```
+
+**Contract:**
+
+- `play(id, source)` is the single playback entry point.
+  - Clears any prior `errorId` (a fresh attempt — error is stale).
+  - Calls internal `stop()` to interrupt the current playback.
+  - Sets `loadingId = id`.
+  - For `kind: 'tts'`: hits cache or calls `synthesize(text)`.
+  - Constructs `Audio(URL.createObjectURL(blob))`.
+  - On `audio.onplaying`: `loadingId = null; playingId = id`.
+  - On `audio.onended`: `playingId = null` (if still us). No error.
+  - On `audio.onerror` (mid-play decoder failure), `audio.play()` rejection, synthesis throw: `loadingId = null; playingId = null; errorId = id`.
+- `stop()` aborts pending synthesis (`AbortController.abort()`), pauses current `HTMLAudioElement`, clears `playingId` and `loadingId`. Does **not** clear `errorId` (errors are sticky until a new `play()` for that id, or the user navigates away).
+- `onUnmounted`: `stop()`, revoke all cached URLs, clear cache map.
+
+### Single-channel rule (structural enforcement)
+
+The composable holds at most one of each:
+- one `currentAudio: HTMLAudioElement | null`
+- one `synthAbort: AbortController | null`
+- one `playingId` value
+
+`play()` always begins by calling `stop()`, so by construction there can never be two active audio elements or two in-flight syntheses simultaneously. The view does not need to "remember to cancel" anything; the rule is impossible to violate from outside the composable.
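+
+A condensed sketch of how `stop()` and `play()` could realize the contract and this rule (it assumes the `ttsCache` / `objectUrls` maps from the D3 sketch and the three refs from the API above; other internal names are illustrative, not a final implementation):
+
+```ts
+// Sketch only. playingId / loadingId / errorId are the refs exposed above;
+// ttsCache / objectUrls are the maps from the D3 sketch.
+let currentAudio: HTMLAudioElement | null = null
+let synthAbort: AbortController | null = null
+
+function stop() {
+  synthAbort?.abort()                // cancel an in-flight synthesis, if any
+  synthAbort = null
+  currentAudio?.pause()              // pause current playback, if any
+  currentAudio = null
+  playingId.value = null
+  loadingId.value = null
+  // errorId stays sticky until the next play() for that id
+}
+
+async function play(id: string, source: PlaySource): Promise<void> {
+  errorId.value = null               // a fresh attempt invalidates any stale error
+  stop()                             // single-channel rule: at most one playback / synthesis
+  loadingId.value = id
+  try {
+    let blob: Blob
+    if (source.kind === 'blob') {
+      blob = source.blob
+    } else {
+      const cached = ttsCache.get(id)
+      if (cached) {
+        blob = cached
+      } else {
+        const ctrl = new AbortController()
+        synthAbort = ctrl
+        blob = await synthesize(source.text, ctrl.signal)
+        if (synthAbort === ctrl) synthAbort = null
+        ttsCache.set(id, blob)
+      }
+    }
+    const url = objectUrls.get(id) ?? URL.createObjectURL(blob)
+    objectUrls.set(id, url)
+    const audio = new Audio(url)
+    currentAudio = audio
+    audio.onplaying = () => { if (currentAudio === audio) { loadingId.value = null; playingId.value = id } }
+    audio.onended = () => { if (currentAudio === audio) playingId.value = null }
+    audio.onerror = () => { if (currentAudio === audio) { loadingId.value = null; playingId.value = null; errorId.value = id } }
+    await audio.play()
+  } catch (err) {
+    // An abort (recording started, or a newer play() took over) is an interruption, not a failure.
+    if ((err as DOMException)?.name === 'AbortError') return
+    if (loadingId.value === id || playingId.value === id) {
+      loadingId.value = null
+      playingId.value = null
+      errorId.value = id
+    }
+  }
+}
+```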
+
+### `speechService.ts`
+
+```ts
+const KEY = import.meta.env.VITE_AZURE_SPEECH_KEY as string
+const REGION = import.meta.env.VITE_AZURE_SPEECH_REGION as string
+const VOICE = 'en-US-AriaNeural'
+const FORMAT = 'audio-24khz-48kbitrate-mono-mp3'
+
+export async function synthesize(text: string, signal?: AbortSignal): Promise<Blob> {
+  if (!KEY || !REGION) throw new Error('Azure Speech credentials not configured')
+
+  const ssml =
+    `<speak version='1.0' xml:lang='en-US'>` +
+    `<voice name='${VOICE}'>${escapeXml(text)}</voice>` +
+    `</speak>`
+
+  const res = await fetch(
+    `https://${REGION}.tts.speech.microsoft.com/cognitiveservices/v1`,
+    {
+      method: 'POST',
+      signal,
+      headers: {
+        'Ocp-Apim-Subscription-Key': KEY,
+        'Content-Type': 'application/ssml+xml',
+        'X-Microsoft-OutputFormat': FORMAT,
+        'User-Agent': 'PPT-EnglishSpeaking',
+      },
+      body: ssml,
+    },
+  )
+  if (!res.ok) throw new Error(`Azure TTS failed: ${res.status}`)
+  return res.blob()
+}
+
+function escapeXml(s: string): string {
+  return s.replace(/[<>&'"]/g, c => ({
+    '<': '&lt;', '>': '&gt;', '&': '&amp;', "'": '&apos;', '"': '&quot;',
+  }[c]!))
+}
+```
+
+The service is stateless — no token cache, no retry logic. Each call is a single fetch. The player owns the cache.
+
+### View integration
+
+**Auto-play watcher:**
+
+```ts
+const player = useAudioPlayer()
+const autoPlayedIds = new Set<string>()
+
+watch(
+  () => engine.messages.value.map(m => `${m.id}:${m.status}`).join('|'),
+  () => {
+    for (const m of engine.messages.value) {
+      if (m.role === 'ai' && m.status === 'done' && m.content && !autoPlayedIds.has(m.id)) {
+        autoPlayedIds.add(m.id)
+        player.play(m.id, { kind: 'tts', text: m.content })
+      }
+    }
+  },
+)
+```
+
+The `autoPlayedIds` set is scoped to the component instance, so it resets if the view is remounted. Re-generating an AI message creates a new `id`, so it correctly auto-plays again.
+
+**Click toggle:**
+
+```ts
+function togglePlay(id: string) {
+  if (player.playingId.value === id || player.loadingId.value === id) {
+    player.stop()
+    return
+  }
+  const msg = engine.messages.value.find(m => m.id === id)
+  if (!msg) return
+  if (msg.role === 'student' && msg.audioBlob) {
+    player.play(id, { kind: 'blob', blob: msg.audioBlob })
+  } else if (msg.role === 'ai' && msg.content) {
+    player.play(id, { kind: 'tts', text: msg.content })
+  }
+}
+```
+
+`errorId === id` falls through to the `play()` branch, naturally retrying.
+
+**Recording start interrupt:**
+
+```ts
+async function handleStartRecording() {
+  if (!engine.canRecord.value || recorder.isRecording.value) return
+  player.stop()  // ← new
+  // ...rest unchanged
+}
+```
+
+`handleRestart` similarly calls `player.stop()` (replacing `engine.cancelTTS()`).
+
+**Template state derivation:**
+
+Each play button reads `player.playingId.value` / `player.loadingId.value` / `player.errorId.value` directly. No local `playingMessageId` ref.
+
+**Error UI (per voice-bar):**
+
+When `player.errorId.value === message.id`:
+- Play button gets `play-btn-error` modifier class (red border, warning icon).
+- A `<span class="play-error-hint">点击重试</span>` replaces the duration label.
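+
+For illustration, a small helper in the view can collapse the three refs into the four D6 states; the refs are mutually exclusive for a given message id, so check order does not matter (`playBtnState` is a hypothetical name, not prescribed by this design):
+
+```ts
+// Sketch: map the three player refs onto the four button states for one message.
+type PlayBtnState = 'idle' | 'loading' | 'playing' | 'error'
+
+function playBtnState(messageId: string): PlayBtnState {
+  if (player.errorId.value === messageId) return 'error'      // ⚠ play-btn-error + "点击重试" hint
+  if (player.loadingId.value === messageId) return 'loading'  // spinner
+  if (player.playingId.value === messageId) return 'playing'  // ⏸ pause icon
+  return 'idle'                                               // ▶ play icon
+}
+```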
+
+### Engine changes
+
+Removed:
+- `let ttsUtterance: SpeechSynthesisUtterance | null = null` (line 16)
+- `function speakTTS(text)` (lines 244-252)
+- `function cancelTTS()` (lines 254-259)
+- All four `speakTTS(...)` call sites (lines 56, 124, 201, 369)
+- `cancelTTS()` call in `onUnmounted` (line 429)
+- `cancelTTS` from the returned object (line 454)
+
+Added:
+
+```ts
+function attachStudentBlob(messageId: string, blob: Blob) {
+  const msg = messages.value.find(m => m.id === messageId)
+  if (msg && msg.role === 'student') msg.audioBlob = blob
+}
+```
+
+Returned alongside other helpers.
+
+### View changes (audioBlob fix in WS path)
+
+```ts
+async function handleFinishRecording() {
+  if (!recorder.isRecording.value) return
+  const ctl = streamCtl
+  streamCtl = null
+  try {
+    const blob = await recorder.stopRecording()
+    recorder.onChunk.value = null
+    if (ctl) {
+      engine.attachStudentBlob(ctl.studentMsgId, blob)  // ← new
+      ctl.finish()
+    } else {
+      await engine.sendStudentMessage(blob)
+    }
+  } catch (err) {
+    console.error('Recording/send failed:', err)
+  }
+}
+```
+
+### Removed view code
+
+- `let currentAudio: HTMLAudioElement | null = null`
+- `let currentAudioUrl: string | null = null`
+- `function stopCurrentPlayback()`
+- TTS branch inside `togglePlay()`
+- `playingMessageId` ref
+- `engine.cancelTTS()` call in `handleRestart`
+
+## Data flow
+
+### Auto-play happy path
+
+```
+1. WS / HTTP stream finishes → engine sets aiMsg.status = 'done'
+2. View watcher detects new (id, 'done') → autoPlayedIds.add(id) → player.play(id, { kind: 'tts', text })
+3. player.play → stop() → loadingId = id → speechService.synthesize() (≈800ms)
+4. fetch resolves → blob → ttsCache.set(id, blob) → URL.createObjectURL(blob)
+5. new Audio(url).play() → audio.onplaying fires → loadingId = null, playingId = id
+6. audio finishes → onended → playingId = null
+```
+
+### Replay (cache hit) path
+
+```
+1. User clicks play button on AI message → togglePlay(id)
+2. Not playing/loading → player.play(id, { kind: 'tts', text })
+3. player.play → stop() → loadingId = id → ttsCache.get(id) hits → no fetch
+4. URL.createObjectURL(blob) → new Audio.play() → onplaying → playingId = id
+```
+
+### Recording-start interrupt
+
+```
+1. User clicks 开始录音 → handleStartRecording
+2. player.stop() → synthAbort.abort() (if synthesizing) → audio.pause() (if playing)
+   → loadingId = null, playingId = null
+3. recorder.startRecording() proceeds
+```
+
+### Synthesis failure path
+
+```
+1. player.play(id, { kind: 'tts', text }) → loadingId = id
+2. synthesize() → fetch rejects (network) or 5xx
+3. catch: loadingId = null, errorId = id
+4. View re-renders: play button shows ⚠ + "点击重试"
+5. User clicks button → togglePlay → not playing/loading → player.play(id, ...) (errorId cleared, retry)
+```
+
+## Edge cases
+
+| Scenario                                           | Handling                                                                 |
+|----------------------------------------------------|--------------------------------------------------------------------------|
+| Synthesis network failure                          | `errorId = id`. Button shows retry. No global toast.                     |
+| Azure 401 (bad key) / 5xx                          | Same as network failure. Logged to console for ops.                      |
+| `audio.play()` rejected (autoplay policy)          | Same as failure. User clicks → counts as user gesture → succeeds.        |
+| Synthesis in progress, recording starts            | `synthAbort.abort()` + `loadingId = null`. No audio plays. No error.     |
+| Synthesis in progress, new AI message arrives      | Old synth aborted. New `play()` starts. Old message NOT marked errored. |
+| User replaying old msg, new AI message arrives     | Old audio paused. New auto-play takes over (consistent with the single-channel rule). |
+| User clicks same message twice                     | Second click hits `playingId === id` → `stop()`. Toggle.                 |
+| User clicks errored message                        | Error cleared, retry begins.                                             |
+| View unmounts mid-synthesis                        | `onUnmounted` → `stop()` → abort. Cached URLs revoked.                   |
+| `crypto.randomUUID()` collision (theoretical)      | N/A — not handled. Astronomically unlikely.                              |
+| Empty `text` for TTS                               | View guards (`m.content` truthy check in watcher and toggle).            |
+| `VITE_AZURE_SPEECH_KEY` missing                    | `synthesize()` throws immediately. Falls into normal failure path → error UI. Console error explains. |
+
+## Testing strategy
+
+There is no existing unit test infrastructure for this view. Verification is manual, captured as a checklist:
+
+**Single-channel correctness:**
+- [ ] AI message arrives → auto-plays. Click another AI message's play button → first stops, second plays.
+- [ ] AI auto-playing → click student message replay → AI stops, student plays.
+- [ ] Student replaying → click AI play button → student stops, AI plays.
+- [ ] AI auto-playing → start recording → audio stops within 100ms (no leak into mic).
+
+**Toggle correctness:**
+- [ ] Click playing message → stops. Click again → plays from start.
+- [ ] Click loading message (while synthesis pending) → cancels, no audio plays.
+
+**Cache:**
+- [ ] First click on AI message → Network panel shows POST to `*.tts.speech.microsoft.com`.
+- [ ] Second click on same message → no new network call.
+- [ ] Re-generate AI message (new id) → next click triggers new synthesis.
+
+**Error UI:**
+- [ ] Disconnect network → AI message arrives → button shows ⚠ + "点击重试".
+- [ ] Reconnect network → click retry → audio plays.
+- [ ] Set bogus `VITE_AZURE_SPEECH_KEY` → all AI plays fail with retry UI.
+
+**Lifecycle:**
+- [ ] Enter dialogue, exit, re-enter — no zombie audio. No console warnings about revoked URLs.
+- [ ] Refresh page mid-playback — no error in console on unload.
+
+**WS audio attachment fix:**
+- [ ] Speak a message via WS path → click student replay button → audio plays correctly (regression fix).
+
+## Configuration
+
+`.env.local` (developer machine, not committed):
+
+```
+VITE_AZURE_SPEECH_KEY=<your-key>
+VITE_AZURE_SPEECH_REGION=eastus
+```
+
+`.env.example` (committed, no values):
+
+```
+VITE_AZURE_SPEECH_KEY=
+VITE_AZURE_SPEECH_REGION=
+```
+
+## Security debt (must address before production)
+
+`VITE_AZURE_SPEECH_KEY` is bundled into the JavaScript shipped to browsers. Anyone with DevTools can extract it. **This is acceptable only for local development and internal demos.**
+
+Before shipping to external users, replace with the deferred design:
+
+1. Backend endpoint `GET /api/speaking/azure-token` returns `{ token, region }` (10-min expiry).
+2. `speechService.ts` fetches the token before each batch of synthesis calls; caches it for ≤9 minutes; refreshes on 401.
+3. Synthesis request `Authorization: Bearer <token>` instead of `Ocp-Apim-Subscription-Key`.
+4. `VITE_AZURE_SPEECH_KEY` is removed from the frontend.
+
+This change is isolated to `speechService.ts` (player and view code stay identical).
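+
+A rough sketch of that deferred variant of `synthesize()` (the endpoint path and `{ token, region }` shape follow the list above; `buildSsml` stands in for the SSML construction shown earlier, and the token-caching details are illustrative):
+
+```ts
+// Sketch of the deferred token flow; not part of this change.
+let cachedToken: { token: string; region: string; fetchedAt: number } | null = null
+const TOKEN_TTL_MS = 9 * 60 * 1000   // refresh comfortably before the 10-minute expiry
+
+async function getToken(): Promise<{ token: string; region: string }> {
+  if (cachedToken && Date.now() - cachedToken.fetchedAt < TOKEN_TTL_MS) return cachedToken
+  const res = await fetch('/api/speaking/azure-token')
+  if (!res.ok) throw new Error(`Azure token fetch failed: ${res.status}`)
+  const { token, region } = await res.json()
+  cachedToken = { token, region, fetchedAt: Date.now() }
+  return cachedToken
+}
+
+export async function synthesize(text: string, signal?: AbortSignal): Promise<Blob> {
+  const request = async () => {
+    const { token, region } = await getToken()
+    return fetch(`https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`, {
+      method: 'POST',
+      signal,
+      headers: {
+        Authorization: `Bearer ${token}`,
+        'Content-Type': 'application/ssml+xml',
+        'X-Microsoft-OutputFormat': FORMAT,
+      },
+      body: buildSsml(text),          // same SSML as today, factored into a helper for this sketch
+    })
+  }
+  let res = await request()
+  if (res.status === 401) {           // token expired early: drop the cache and retry once
+    cachedToken = null
+    res = await request()
+  }
+  if (!res.ok) throw new Error(`Azure TTS failed: ${res.status}`)
+  return res.blob()
+}
+```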
+
+## Out of this design's scope
+
+- Voice persona switching (multi-voice support).
+- Streaming synthesis with sentence boundaries.
+- Background music / sound effects mixing.
+- Volume / speed controls.
+- Persisting cached audio across sessions (IndexedDB).
+- Accessibility audit of the new error state (WCAG roles, aria-live).
+
+These are deferred until product asks for them.