
docs: plan unified audio player + Azure TTS implementation

Six-task plan implementing the 2026-04-26 design: speechService,
useAudioPlayer composable, engine cleanup with attachStudentBlob,
view click-playback rewire, auto-play watcher, and 4-state play
button rendering. Each task ends green and self-contained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jimmylee, 2 weeks ago
commit 64e68529fd
1 changed file with 1106 additions and 0 deletions:
docs/superpowers/plans/2026-04-26-azure-tts-audio-player.md

@@ -0,0 +1,1106 @@
+# Azure TTS + Unified Audio Player Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Replace the dual TTS owners in `DialogueChatView` with a single view-level `useAudioPlayer` composable backed by Azure Speech REST, enforce single-channel + recording-interrupt rules structurally, and surface playback errors per message with one-click retry.
+
+**Architecture:** A new `useAudioPlayer` composable owns the only `HTMLAudioElement` and the only synthesis `AbortController`, exposing `play(id, source)` / `stop()` / three reactive ids (`playingId`, `loadingId`, `errorId`). A new stateless `speechService.synthesize()` calls Azure Speech REST and returns an MP3 blob. `useDialogueEngine` is stripped of all `speechSynthesis` references and gains `attachStudentBlob()` so the WebSocket path can attach the recorded blob to the student message it created. `DialogueChatView` watches for AI messages reaching `done` and triggers auto-play; recording start calls `player.stop()`; the play button on every voice-bar renders one of four states (idle/loading/playing/error) directly from the player refs.
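
The four-state button rendering in the last sentence derives mechanically from the three reactive ids. A sketch as a pure function — `deriveButtonState` is a hypothetical name; the actual template in Task 6 uses inline `v-if` chains with the same priority order:

```typescript
type ButtonState = 'idle' | 'loading' | 'playing' | 'error'

// Priority mirrors the template's v-if chain: a message that is loading
// shows the spinner even if a stale error exists; playing beats error;
// error is sticky until the next play() attempt; otherwise idle.
function deriveButtonState(
  messageId: string,
  playingId: string | null,
  loadingId: string | null,
  errorId: string | null,
): ButtonState {
  if (loadingId === messageId) return 'loading'
  if (playingId === messageId) return 'playing'
  if (errorId === messageId) return 'error'
  return 'idle'
}
```

Because at most one id occupies each ref, every message button resolves to exactly one state per render.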
+
+**Tech Stack:** Vue 3 `<script setup>` + TypeScript, Vite (`import.meta.env`), browser `fetch` + `AbortController`, browser `HTMLAudioElement`, Azure Speech REST API.
+
+---
+
+## File Structure
+
+Repo root: `/Users/buoy/Development/gitrepo/PPT`
+
+- Create: `src/views/Editor/EnglishSpeaking/services/speechService.ts` — stateless Azure REST synthesis. One exported function `synthesize(text, signal): Promise<Blob>`.
+- Create: `src/views/Editor/EnglishSpeaking/composables/useAudioPlayer.ts` — sole audio owner. Cache, abort, single-channel rule, error surface.
+- Create: `.env.example` (or append if exists) — document the two new env vars.
+- Modify: `src/views/Editor/EnglishSpeaking/composables/useDialogueEngine.ts` — remove `speakTTS` / `cancelTTS` / `ttsUtterance` and all 4 call sites; add `attachStudentBlob`.
+- Modify: `src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue` — instantiate `useAudioPlayer`, replace local `currentAudio`/`stopCurrentPlayback`/`playingMessageId`, rewrite `togglePlay`, fix WS path's missing `audioBlob`, add auto-play watcher, call `player.stop()` from `handleStartRecording` / `handleRestart`, render the 4-state play buttons + error hint.
+
+Spec reference: `docs/superpowers/specs/2026-04-26-azure-tts-audio-player-design.md`
+
+Package manager: **`npm`** (project has `package-lock.json`).
+
+---
+
+### Task 1: Azure Speech REST Service + Env
+
+Adds the stateless synthesis function and documents required environment variables. Self-contained — nothing imports it yet.
+
+**Files:**
+- Create: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/services/speechService.ts`
+- Create or append: `/Users/buoy/Development/gitrepo/PPT/.env.example`
+
+- [ ] **Step 1: Create the service file**
+
+Create `src/views/Editor/EnglishSpeaking/services/speechService.ts`:
+
+```ts
+const KEY = import.meta.env.VITE_AZURE_SPEECH_KEY as string | undefined
+const REGION = import.meta.env.VITE_AZURE_SPEECH_REGION as string | undefined
+const VOICE = 'en-US-AriaNeural'
+const FORMAT = 'audio-24khz-48kbitrate-mono-mp3'
+
+/**
+ * Synthesize English text via Azure Speech REST.
+ * Returns an MP3 Blob. Throws on credential / network / non-2xx.
+ *
+ * Pass an AbortSignal so callers (the audio player) can cancel
+ * an in-flight synthesis when the user starts recording or
+ * triggers a different playback.
+ */
+export async function synthesize(text: string, signal?: AbortSignal): Promise<Blob> {
+  if (!KEY || !REGION) {
+    throw new Error('Azure Speech credentials not configured (VITE_AZURE_SPEECH_KEY / VITE_AZURE_SPEECH_REGION)')
+  }
+
+  const ssml =
+    `<speak version='1.0' xml:lang='en-US'>` +
+    `<voice name='${VOICE}'>${escapeXml(text)}</voice>` +
+    `</speak>`
+
+  const res = await fetch(
+    `https://${REGION}.tts.speech.microsoft.com/cognitiveservices/v1`,
+    {
+      method: 'POST',
+      signal,
+      headers: {
+        'Ocp-Apim-Subscription-Key': KEY,
+        'Content-Type': 'application/ssml+xml',
+        'X-Microsoft-OutputFormat': FORMAT,
+        'User-Agent': 'PPT-EnglishSpeaking',
+      },
+      body: ssml,
+    },
+  )
+
+  if (!res.ok) {
+    throw new Error(`Azure TTS failed: ${res.status} ${res.statusText}`)
+  }
+  return res.blob()
+}
+
+function escapeXml(s: string): string {
+  const map: Record<string, string> = {
+    '<': '&lt;',
+    '>': '&gt;',
+    '&': '&amp;',
+    "'": '&apos;',
+    '"': '&quot;',
+  }
+  return s.replace(/[<>&'"]/g, c => map[c])
+}
+```
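
As a sanity check on the escaping step, the SSML assembly can be exercised standalone — `buildSsml` is a hypothetical helper restating the service's string-building logic so it runs without network access or credentials:

```typescript
// Restatement of the service's escaping + SSML assembly, runnable offline.
function escapeXml(s: string): string {
  const map: Record<string, string> = {
    '<': '&lt;',
    '>': '&gt;',
    '&': '&amp;',
    "'": '&apos;',
    '"': '&quot;',
  }
  return s.replace(/[<>&'"]/g, c => map[c])
}

function buildSsml(text: string, voice = 'en-US-AriaNeural'): string {
  return (
    `<speak version='1.0' xml:lang='en-US'>` +
    `<voice name='${voice}'>${escapeXml(text)}</voice>` +
    `</speak>`
  )
}

// Reserved XML characters in message text must not leak into the markup.
const ssml = buildSsml(`Tom & Jerry say "hi" <loudly>`)
```

Without the escaping, a message containing `<` or `&` would produce malformed SSML and a 400 from the endpoint.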
+
+- [ ] **Step 2: Add env example**
+
+If `.env.example` does not exist, create it with the contents below. If it exists, append the two new lines (after a blank line).
+
+```
+VITE_AZURE_SPEECH_KEY=
+VITE_AZURE_SPEECH_REGION=
+```
+
+- [ ] **Step 3: Type-check passes**
+
+Run:
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+```
+
+Expected: exits 0, no errors.
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+git add src/views/Editor/EnglishSpeaking/services/speechService.ts .env.example
+git commit -m "$(cat <<'EOF'
+feat: add Azure Speech REST synthesis service
+
+Stateless synthesize(text, signal) returning an MP3 Blob via
+the Azure Speech REST API. Reads VITE_AZURE_SPEECH_KEY and
+VITE_AZURE_SPEECH_REGION; documents both in .env.example.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+EOF
+)"
+```
+
+---
+
+### Task 2: useAudioPlayer Composable
+
+The sole audio playback owner. Owns the only `HTMLAudioElement`, the only synthesis abort controller, and the cache. Self-contained — no consumers yet.
+
+**Files:**
+- Create: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/composables/useAudioPlayer.ts`
+
+- [ ] **Step 1: Create the composable**
+
+Create `src/views/Editor/EnglishSpeaking/composables/useAudioPlayer.ts`:
+
+```ts
+import { ref, onUnmounted, type Ref } from 'vue'
+import { synthesize } from '../services/speechService'
+
+export type PlaySource =
+  | { kind: 'tts'; text: string }
+  | { kind: 'blob'; blob: Blob }
+
+export interface AudioPlayer {
+  /** Id of the message currently playing (audio element fired `playing`). */
+  playingId: Readonly<Ref<string | null>>
+  /** Id of the message whose synthesis or play() is in flight. */
+  loadingId: Readonly<Ref<string | null>>
+  /** Id of the message whose last playback attempt failed. Sticky until next play(). */
+  errorId: Readonly<Ref<string | null>>
+
+  play(id: string, source: PlaySource): Promise<void>
+  stop(): void
+}
+
+/**
+ * Single audio playback owner for the dialogue view.
+ *
+ * Structural guarantees:
+ *  - At most one HTMLAudioElement / one in-flight synthesis at a time.
+ *    Every play() begins by aborting & pausing the prior session.
+ *  - Cached MP3 blobs (one per AI messageId) live in-memory; URLs are
+ *    revoked on view unmount.
+ *  - Errors collapse to a single per-id state. The view renders a retry
+ *    affordance; clicking the same play button re-enters play().
+ */
+export function useAudioPlayer(): AudioPlayer {
+  const playingId = ref<string | null>(null)
+  const loadingId = ref<string | null>(null)
+  const errorId = ref<string | null>(null)
+
+  // Closure-private state. Not reactive on purpose.
+  let currentAudio: HTMLAudioElement | null = null
+  let synthAbort: AbortController | null = null
+  const ttsCache = new Map<string, Blob>()
+  const cachedUrls: string[] = []
+
+  function clearCurrentAudio() {
+    if (currentAudio) {
+      currentAudio.onplaying = null
+      currentAudio.onended = null
+      currentAudio.onerror = null
+      try { currentAudio.pause() } catch { /* ignore */ }
+      currentAudio = null
+    }
+  }
+
+  function failPlayback(id: string, reason: unknown) {
+    if (loadingId.value === id) loadingId.value = null
+    if (playingId.value === id) playingId.value = null
+    errorId.value = id
+    console.warn('[audio-player] playback failed:', id, reason)
+  }
+
+  async function play(id: string, source: PlaySource): Promise<void> {
+    // A fresh attempt — drop stale error.
+    errorId.value = null
+
+    // Abort any in-flight synthesis and pause any current audio.
+    stop()
+
+    loadingId.value = id
+
+    try {
+      let blob: Blob
+      if (source.kind === 'blob') {
+        blob = source.blob
+      }
+      else {
+        const cached = ttsCache.get(id)
+        if (cached) {
+          blob = cached
+        }
+        else {
+          synthAbort = new AbortController()
+          blob = await synthesize(source.text, synthAbort.signal)
+          synthAbort = null
+          // We may have been interrupted while awaiting (loadingId changed).
+          if (loadingId.value !== id) return
+          ttsCache.set(id, blob)
+        }
+      }
+
+      const url = URL.createObjectURL(blob)
+      cachedUrls.push(url)
+
+      const audio = new Audio(url)
+      currentAudio = audio
+
+      audio.onplaying = () => {
+        if (loadingId.value === id) loadingId.value = null
+        playingId.value = id
+      }
+      audio.onended = () => {
+        if (currentAudio === audio) currentAudio = null
+        if (playingId.value === id) playingId.value = null
+      }
+      audio.onerror = () => {
+        if (currentAudio === audio) currentAudio = null
+        failPlayback(id, audio.error)
+      }
+
+      try {
+        await audio.play()
+      }
+      catch (err) {
+        // Most often: autoplay policy blocked the call.
+        if (currentAudio === audio) currentAudio = null
+        failPlayback(id, err)
+      }
+    }
+    catch (err) {
+      // Synthesis path errors. AbortError fires when stop() was called
+      // mid-synthesis — that is a normal interrupt, not a failure.
+      synthAbort = null
+      if (err instanceof Error && err.name === 'AbortError') {
+        if (loadingId.value === id) loadingId.value = null
+        return
+      }
+      failPlayback(id, err)
+    }
+  }
+
+  function stop(): void {
+    if (synthAbort) {
+      synthAbort.abort()
+      synthAbort = null
+    }
+    clearCurrentAudio()
+    loadingId.value = null
+    playingId.value = null
+    // Note: errorId is intentionally left alone here. It is only
+    // cleared at the start of a new play() attempt.
+  }
+
+  onUnmounted(() => {
+    stop()
+    for (const url of cachedUrls) {
+      try { URL.revokeObjectURL(url) } catch { /* ignore */ }
+    }
+    cachedUrls.length = 0
+    ttsCache.clear()
+  })
+
+  return {
+    playingId,
+    loadingId,
+    errorId,
+    play,
+    stop,
+  }
+}
+```
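
The `AbortError` branch in the catch block can be seen in isolation with a fake synthesize — a sketch, not the real service, showing why an abort mid-synthesis surfaces as `err.name === 'AbortError'` and is treated as a normal interrupt rather than a failure:

```typescript
// Fake synthesize honoring AbortSignal, mimicking fetch's abort behavior:
// the pending promise rejects with an error named 'AbortError'.
function fakeSynthesize(text: string, signal?: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve(`mp3:${text}`), 50)
    signal?.addEventListener('abort', () => {
      clearTimeout(timer)
      const err = new Error('Aborted')
      err.name = 'AbortError'
      reject(err)
    })
  })
}

async function demo(): Promise<string> {
  const ctl = new AbortController()
  const inFlight = fakeSynthesize('Hello', ctl.signal)
  ctl.abort() // e.g. the user started recording, or triggered another playback
  try {
    await inFlight
    return 'completed'
  }
  catch (err) {
    return err instanceof Error && err.name === 'AbortError'
      ? 'interrupted' // normal interrupt — no errorId is set
      : 'failed'      // real failure — errorId would be set for retry
  }
}
```

This is the contract `play()` relies on: `stop()` aborts the controller, the awaited synthesis rejects, and the catch distinguishes interrupt from failure by error name.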
+
+- [ ] **Step 2: Type-check passes**
+
+Run:
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+```
+
+Expected: exits 0.
+
+- [ ] **Step 3: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+git add src/views/Editor/EnglishSpeaking/composables/useAudioPlayer.ts
+git commit -m "$(cat <<'EOF'
+feat: add useAudioPlayer composable
+
+Sole audio playback owner with structural single-channel
+guarantee, per-id error surface, and synthesis result cache.
+Calls speechService.synthesize for kind:'tts' sources and
+plays kind:'blob' sources directly.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+EOF
+)"
+```
+
+---
+
+### Task 3: Engine Cleanup + attachStudentBlob
+
+Strip the engine of all audio concerns and add the new helper. Keep the build green by removing `engine.cancelTTS()` from the one place the view calls it (`handleRestart`).
+
+**Files:**
+- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/composables/useDialogueEngine.ts`
+- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue`
+
+- [ ] **Step 1: Remove `ttsUtterance` declaration**
+
+In `useDialogueEngine.ts`, delete this line (currently around line 16):
+
+```ts
+  let ttsUtterance: SpeechSynthesisUtterance | null = null
+```
+
+- [ ] **Step 2: Remove `speakTTS` call inside `generateGreeting`**
+
+In the same file, inside `generateGreeting()`, delete the line:
+
+```ts
+      speakTTS(aiMessage)
+```
+
+(Currently around line 56, between `aiMsg.status = 'done'` and the `catch`.)
+
+- [ ] **Step 3: Remove `speakTTS` call inside `sendStudentMessage`**
+
+Inside `sendStudentMessage()`'s `'done'` event branch, delete:
+
+```ts
+          speakTTS(aiMsg.content)
+```
+
+(Currently around line 124.)
+
+- [ ] **Step 4: Remove `speakTTS` call inside `regenerateAiMessage`**
+
+Inside `regenerateAiMessage()`'s `'done'` event branch, delete:
+
+```ts
+          speakTTS(aiMsg.content)
+```
+
+(Currently around line 201.)
+
+- [ ] **Step 5: Remove `speakTTS` call inside `beginStudentStream`**
+
+Inside `beginStudentStream()`'s `ws.onmessage` `done` branch, delete:
+
+```ts
+          speakTTS(aiMsg.content)
+```
+
+(Currently around line 369.)
+
+- [ ] **Step 6: Remove the `speakTTS` and `cancelTTS` function definitions**
+
+Delete the entire `==================== TTS ====================` section, currently around lines 242–259:
+
+```ts
+  // ==================== TTS ====================
+
+  function speakTTS(text: string) {
+    if (!text || typeof speechSynthesis === 'undefined') return
+
+    cancelTTS()
+    ttsUtterance = new SpeechSynthesisUtterance(text)
+    ttsUtterance.lang = 'en-US'
+    ttsUtterance.rate = 0.9
+    speechSynthesis.speak(ttsUtterance)
+  }
+
+  function cancelTTS() {
+    if (typeof speechSynthesis !== 'undefined') {
+      speechSynthesis.cancel()
+    }
+    ttsUtterance = null
+  }
+```
+
+- [ ] **Step 7: Remove `cancelTTS()` from `onUnmounted`**
+
+Inside the `onUnmounted` block (currently around line 426–431), delete the `cancelTTS()` line:
+
+```ts
+  onUnmounted(() => {
+    abort()
+    greetingAbortController?.abort()
+    cancelTTS()                          // ← remove this line
+    stopCountdown()
+  })
+```
+
+The block becomes:
+
+```ts
+  onUnmounted(() => {
+    abort()
+    greetingAbortController?.abort()
+    stopCountdown()
+  })
+```
+
+- [ ] **Step 8: Add `attachStudentBlob` helper**
+
+Insert this function after `streamFallback` and before the `// ==================== Cleanup ====================` marker:
+
+```ts
+  /**
+   * Attach the recorded audio blob to a student message that was
+   * pushed by `beginStudentStream` (which doesn't know the final
+   * blob until the user clicks 完成). Lets click-replay work in
+   * the WebSocket path the same way it does for HTTP.
+   */
+  function attachStudentBlob(messageId: string, blob: Blob) {
+    const msg = messages.value.find(m => m.id === messageId)
+    if (msg && msg.role === 'student') msg.audioBlob = blob
+  }
+```
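
The lookup rule is small enough to check against plain data — a standalone sketch (the real helper mutates the engine's reactive `messages` ref, and `audioBlob` is a browser `Blob`; a stand-in object is used here):

```typescript
interface Msg { id: string; role: 'ai' | 'student'; audioBlob?: unknown }

// Same guard as the engine helper: only a student message with a
// matching id receives the blob; AI messages are left untouched
// because their audio comes from TTS synthesis, not a recording.
function attachStudentBlob(messages: Msg[], messageId: string, blob: unknown) {
  const msg = messages.find(m => m.id === messageId)
  if (msg && msg.role === 'student') msg.audioBlob = blob
}

const messages: Msg[] = [
  { id: 'a1', role: 'ai' },
  { id: 's1', role: 'student' },
]
const fakeBlob = { type: 'audio/webm' } // stand-in for the recorded Blob
attachStudentBlob(messages, 's1', fakeBlob) // attaches: id and role match
attachStudentBlob(messages, 'a1', fakeBlob) // no-op: AI message
```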
+
+- [ ] **Step 9: Update the returned object**
+
+In the `return { ... }` at the bottom of `useDialogueEngine`, replace `cancelTTS` with `attachStudentBlob`. The block currently ends:
+
+```ts
+    streamFallback,
+    retryMessage,
+    regenerateAiMessage,
+    getReport,
+    completeSession,
+    abort,
+    cancelTTS,
+  }
+```
+
+Change to:
+
+```ts
+    streamFallback,
+    retryMessage,
+    regenerateAiMessage,
+    getReport,
+    completeSession,
+    abort,
+    attachStudentBlob,
+  }
+```
+
+- [ ] **Step 10: Remove `engine.cancelTTS()` from the view's `handleRestart`**
+
+In `DialogueChatView.vue`, inside `handleRestart` (currently around line 801–806), delete the `engine.cancelTTS()` line:
+
+```ts
+function handleRestart() {
+  showExitConfirm.value = false
+  engine.abort()
+  engine.cancelTTS()                     // ← remove this line
+  emit('restart')
+}
+```
+
+The function becomes:
+
+```ts
+function handleRestart() {
+  showExitConfirm.value = false
+  engine.abort()
+  emit('restart')
+}
+```
+
+(`player.stop()` is wired in here in Task 4.)
+
+- [ ] **Step 11: Type-check passes**
+
+Run:
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+```
+
+Expected: exits 0. There is one transient regression in this commit — auto-play of AI replies stops working. That is restored in Task 5 (auto-play watcher). The build itself is green.
+
+- [ ] **Step 12: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+git add src/views/Editor/EnglishSpeaking/composables/useDialogueEngine.ts src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue
+git commit -m "$(cat <<'EOF'
+refactor: remove TTS ownership from dialogue engine
+
+Drop speakTTS / cancelTTS / ttsUtterance and all four call
+sites — the engine no longer touches speechSynthesis. Add
+attachStudentBlob so the WebSocket path can attach the final
+recorded blob to its student message. Drop the now-unused
+engine.cancelTTS() call from DialogueChatView.handleRestart.
+
+Auto-play is restored in a follow-up commit that wires the
+new view-level audio player.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+EOF
+)"
+```
+
+---
+
+### Task 4: Wire useAudioPlayer Into the View (Replace Click-Playback Path)
+
+Replace all view-local audio state (`currentAudio`, `currentAudioUrl`, `stopCurrentPlayback`, `playingMessageId`) with the new player. Rewrite `togglePlay` against `player.play / player.stop`. Fix the WS path's missing `audioBlob`. Also wire `player.stop()` into `handleStartRecording` and `handleRestart`. Auto-play is still NOT wired — that comes in Task 5.
+
+**Files:**
+- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue`
+
+- [ ] **Step 1: Import the player composable**
+
+In the `<script setup>` imports near the top (currently around line 425–432), add:
+
+```ts
+import { useAudioPlayer } from '../composables/useAudioPlayer'
+```
+
+(Place it next to the other composable imports — after `useAudioRecorder`.)
+
+- [ ] **Step 2: Instantiate the player alongside engine + recorder**
+
+In the Composables block (currently around line 484–488), add `player`:
+
+```ts
+const engine = useDialogueEngine()
+const recorder = useAudioRecorder()
+const player = useAudioPlayer()
+```
+
+- [ ] **Step 3: Remove view-local audio state**
+
+Delete these declarations from the Local UI State block (currently around line 495 and lines 512–513):
+
+```ts
+const playingMessageId = ref<string | null>(null)
+```
+
+```ts
+let currentAudio: HTMLAudioElement | null = null
+let currentAudioUrl: string | null = null
+```
+
+- [ ] **Step 4: Delete `stopCurrentPlayback`**
+
+Delete the entire function (currently around lines 716–727):
+
+```ts
+function stopCurrentPlayback() {
+  if (currentAudio) {
+    currentAudio.pause()
+    currentAudio = null
+  }
+  if (currentAudioUrl) {
+    URL.revokeObjectURL(currentAudioUrl)
+    currentAudioUrl = null
+  }
+  if (typeof speechSynthesis !== 'undefined') speechSynthesis.cancel()
+  playingMessageId.value = null
+}
+```
+
+- [ ] **Step 5: Rewrite `togglePlay`**
+
+Replace the entire `togglePlay(id)` function (currently around lines 729–770) with:
+
+```ts
+function togglePlay(id: string) {
+  // Same id is currently playing or loading → stop.
+  if (player.playingId.value === id || player.loadingId.value === id) {
+    player.stop()
+    return
+  }
+
+  const msg = engine.messages.value.find(m => m.id === id)
+  if (!msg) return
+
+  if (msg.role === 'student' && msg.audioBlob) {
+    player.play(id, { kind: 'blob', blob: msg.audioBlob })
+  }
+  else if (msg.role === 'ai' && msg.content) {
+    player.play(id, { kind: 'tts', text: msg.content })
+  }
+}
+```
+
+- [ ] **Step 6: Stop player at start of recording**
+
+Update `handleStartRecording` (currently around lines 608–624). Add `player.stop()` immediately after the early-return guard:
+
+```ts
+async function handleStartRecording() {
+  if (!engine.canRecord.value || recorder.isRecording.value) return
+  player.stop()                                    // ← new
+  try {
+    await recorder.startRecording()
+    // ...rest unchanged
+```
+
+- [ ] **Step 7: Stop player on restart**
+
+Update `handleRestart` (currently — after Task 3 — three lines):
+
+```ts
+function handleRestart() {
+  showExitConfirm.value = false
+  engine.abort()
+  player.stop()                                    // ← new
+  emit('restart')
+}
+```
+
+- [ ] **Step 8: Attach recorded blob to WS-path student message**
+
+Update `handleFinishRecording` (currently around lines 639–656). The `if (ctl)` branch needs to attach the blob before calling `ctl.finish()`:
+
+```ts
+async function handleFinishRecording() {
+  if (!recorder.isRecording.value) return
+  const ctl = streamCtl
+  streamCtl = null
+  try {
+    const blob = await recorder.stopRecording()
+    recorder.onChunk.value = null
+    if (ctl) {
+      engine.attachStudentBlob(ctl.studentMsgId, blob)   // ← new
+      ctl.finish()
+    }
+    else {
+      await engine.sendStudentMessage(blob)
+    }
+  }
+  catch (err) {
+    console.error('Recording/send failed:', err)
+  }
+}
+```
+
+- [ ] **Step 9: Drop `stopCurrentPlayback()` call from `onUnmounted`**
+
+The composable's own `onUnmounted` already calls `stop()` and revokes URLs. In `DialogueChatView.vue`'s `onUnmounted` (currently around lines 927–931), remove the `stopCurrentPlayback()` line:
+
+```ts
+onUnmounted(() => {
+  if (idleHintTimer) clearTimeout(idleHintTimer)
+  if (badgeTimer) clearTimeout(badgeTimer)
+  stopCurrentPlayback()                            // ← remove
+})
+```
+
+The block becomes:
+
+```ts
+onUnmounted(() => {
+  if (idleHintTimer) clearTimeout(idleHintTimer)
+  if (badgeTimer) clearTimeout(badgeTimer)
+})
+```
+
+- [ ] **Step 10: Update template references to `playingMessageId`**
+
+The template currently uses `playingMessageId` in two places (around lines 52 and 108) inside the play-button SVGs:
+
+```vue
+<svg v-if="playingMessageId !== message.id" ...>
+```
+
+These conditions will be replaced fully in Task 6. For now, just keep the build green — replace each `playingMessageId !== message.id` with `player.playingId.value !== message.id`:
+
+AI message play button (line ~52):
+```vue
+<button class="play-btn play-ai" @click="togglePlay(message.id)">
+  <svg v-if="player.playingId.value !== message.id" width="12" height="12" viewBox="0 0 24 24" fill="currentColor">
+    <polygon points="5 3 19 12 5 21 5 3" />
+  </svg>
+  <svg v-else width="12" height="12" viewBox="0 0 24 24" fill="currentColor">
+    <rect x="6" y="4" width="4" height="16" /><rect x="14" y="4" width="4" height="16" />
+  </svg>
+</button>
+```
+
+Student message play button (line ~108):
+```vue
+<button class="play-btn play-student" @click="togglePlay(message.id)">
+  <svg v-if="player.playingId.value !== message.id" width="12" height="12" viewBox="0 0 24 24" fill="currentColor">
+    <polygon points="5 3 19 12 5 21 5 3" />
+  </svg>
+  <svg v-else width="12" height="12" viewBox="0 0 24 24" fill="currentColor">
+    <rect x="6" y="4" width="4" height="16" /><rect x="14" y="4" width="4" height="16" />
+  </svg>
+</button>
+```
+
+(Loading and error states are added in Task 6; this step is a holding edit so the file compiles.)
+
+- [ ] **Step 11: Type-check + build pass**
+
+Run:
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+```
+
+Expected: exits 0.
+
+- [ ] **Step 12: Manual smoke check (click replay only — auto-play still missing)**
+
+Start dev server in another terminal: `npm run dev`. Open the dialogue view. Verify:
+- Click play on the existing AI greeting (manually generated): synthesizes via Azure (Network panel shows POST to `*.tts.speech.microsoft.com`) and plays.
+- Click play on a student message: existing recording plays.
+- Click play on a playing message: stops.
+- Start recording while AI is replaying: AI stops immediately.
+
+Note: AI auto-play after model `done` does NOT yet fire — that is Task 5.
+
+- [ ] **Step 13: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+git add src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue
+git commit -m "$(cat <<'EOF'
+refactor: route DialogueChatView click-playback through useAudioPlayer
+
+Replace view-local currentAudio / stopCurrentPlayback /
+playingMessageId with the new player composable. togglePlay
+now dispatches student blobs and AI text to player.play, and
+handleStartRecording / handleRestart call player.stop so
+recording always interrupts audio. WebSocket path now attaches
+the recorded blob to its student message (engine.attachStudentBlob)
+so click-replay works in WS mode.
+
+Auto-play of new AI replies is restored in the next commit.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+EOF
+)"
+```
+
+---
+
+### Task 5: Auto-Play Watcher
+
+Restore auto-play of AI replies after `done`. The view watches `engine.messages` for messages transitioning to `(role: 'ai', status: 'done')` and fires `player.play(id, { kind: 'tts', text })`, deduplicated via a per-component `Set<string>`.
+
+**Files:**
+- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue`
+
+- [ ] **Step 1: Add the dedup set + watcher**
+
+In the Watchers block (after the existing watcher that scrolls to bottom — currently the first `watch(...)` around line 844), insert:
+
+```ts
+// Auto-play: once an AI message finishes streaming (status 'done'),
+// synthesize and play it once. The Set deduplicates so unrelated
+// reactive re-renders don't re-trigger playback for the same message.
+const autoPlayedIds = new Set<string>()
+watch(
+  () => engine.messages.value.map(m => `${m.id}:${m.status}`).join('|'),
+  () => {
+    for (const m of engine.messages.value) {
+      if (
+        m.role === 'ai' &&
+        m.status === 'done' &&
+        m.content &&
+        !autoPlayedIds.has(m.id)
+      ) {
+        autoPlayedIds.add(m.id)
+        player.play(m.id, { kind: 'tts', text: m.content })
+      }
+    }
+  },
+)
+```
+
+(Place it next to the other `watch(...)` blocks. Order does not matter functionally.)
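
The dedup behavior can be exercised without Vue by lifting the loop body into a pure function — `collectAutoPlay` is a hypothetical name for this sketch:

```typescript
interface Msg { id: string; role: 'ai' | 'student'; status: string; content: string }

// Given the current message list and the set of already-played ids,
// return the ids that should auto-play now, recording them in the set
// so later scans over the same list yield nothing.
function collectAutoPlay(messages: Msg[], played: Set<string>): string[] {
  const toPlay: string[] = []
  for (const m of messages) {
    if (m.role === 'ai' && m.status === 'done' && m.content && !played.has(m.id)) {
      played.add(m.id)
      toPlay.push(m.id)
    }
  }
  return toPlay
}

const played = new Set<string>()
const msgs: Msg[] = [
  { id: 'ai-1', role: 'ai', status: 'done', content: 'Hello!' },
  { id: 'st-1', role: 'student', status: 'done', content: 'Hi' },
  { id: 'ai-2', role: 'ai', status: 'streaming', content: 'How' },
]
const first = collectAutoPlay(msgs, played)  // only ai-1 qualifies
msgs[2].status = 'done'
const second = collectAutoPlay(msgs, played) // ai-2 now; ai-1 is deduped
```

A regenerated AI message gets a fresh id, so it qualifies again — matching the manual check in Step 3.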
+
+- [ ] **Step 2: Type-check passes**
+
+Run:
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+```
+
+Expected: exits 0.
+
+- [ ] **Step 3: Manual verification**
+
+`npm run dev` → open dialogue view. Verify:
+- Open dialogue → greeting AI message arrives → auto-plays once (Network panel shows one POST to Azure).
+- Speak a turn → AI replies → auto-plays once.
+- During AI auto-play, click replay on the same message: the play button correctly toggles to ⏸ then back to ▶ when stopped (because `playingId === id`).
+- Re-generating an AI error message: the new id auto-plays again (different id, dedup set doesn't block).
+
+- [ ] **Step 4: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+git add src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue
+git commit -m "$(cat <<'EOF'
+feat: auto-play AI replies via useAudioPlayer
+
+Watch engine.messages for AI messages reaching status='done'
+and dispatch them to player.play once each. A per-component
+Set<string> deduplicates so unrelated reactive churn does not
+re-trigger playback.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+EOF
+)"
+```
+
+---
+
+### Task 6: Play Button 4-State Rendering + Error Hint
+
+Render `idle / loading / playing / error` on every voice-bar play button, swap the duration label for "点击重试" ("click to retry") when in the error state, and add CSS for the loading spinner + error variant.
+
+**Files:**
+- Modify: `/Users/buoy/Development/gitrepo/PPT/src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue`
+
+- [ ] **Step 1: Replace the AI message play button**
+
+Locate the AI voice-bar play button (currently around lines 51–58). Replace the entire `<button>` element with:
+
+```vue
+<button
+  class="play-btn play-ai"
+  :class="{ 'play-btn-error': player.errorId.value === message.id }"
+  @click="togglePlay(message.id)"
+>
+  <svg
+    v-if="player.loadingId.value === message.id"
+    class="play-spinner"
+    width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor"
+    stroke-width="2" stroke-linecap="round"
+  >
+    <path d="M21 12a9 9 0 1 1-6.219-8.56" />
+  </svg>
+  <svg
+    v-else-if="player.playingId.value === message.id"
+    width="12" height="12" viewBox="0 0 24 24" fill="currentColor"
+  >
+    <rect x="6" y="4" width="4" height="16" /><rect x="14" y="4" width="4" height="16" />
+  </svg>
+  <svg
+    v-else-if="player.errorId.value === message.id"
+    width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor"
+    stroke-width="2" stroke-linecap="round" stroke-linejoin="round"
+  >
+    <path d="M12 9v4" />
+    <path d="M12 17h.01" />
+    <path d="M10.29 3.86 1.82 18a2 2 0 0 0 1.71 3h16.94a2 2 0 0 0 1.71-3L13.71 3.86a2 2 0 0 0-3.42 0z" />
+  </svg>
+  <svg
+    v-else
+    width="12" height="12" viewBox="0 0 24 24" fill="currentColor"
+  >
+    <polygon points="5 3 19 12 5 21 5 3" />
+  </svg>
+</button>
+```
+
+- [ ] **Step 2: Replace the AI voice-bar duration label**
+
+Immediately after the `<div class="wave-bar-group">…</div>` block in the AI voice-bar (currently around line 67), replace:
+
+```vue
+<span class="voice-duration voice-duration-ai">0:04</span>
+```
+
+with:
+
+```vue
+<span
+  v-if="player.errorId.value === message.id"
+  class="play-error-hint"
+>点击重试</span>
+<span
+  v-else
+  class="voice-duration voice-duration-ai"
+>0:04</span>
+```
+
+- [ ] **Step 3: Replace the student message play button**
+
+Locate the student voice-bar play button (currently around lines 107–114). Replace the entire `<button>` element with:
+
+```vue
+<button
+  class="play-btn play-student"
+  :class="{ 'play-btn-error': player.errorId.value === message.id }"
+  @click="togglePlay(message.id)"
+>
+  <svg
+    v-if="player.loadingId.value === message.id"
+    class="play-spinner"
+    width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor"
+    stroke-width="2" stroke-linecap="round"
+  >
+    <path d="M21 12a9 9 0 1 1-6.219-8.56" />
+  </svg>
+  <svg
+    v-else-if="player.playingId.value === message.id"
+    width="12" height="12" viewBox="0 0 24 24" fill="currentColor"
+  >
+    <rect x="6" y="4" width="4" height="16" /><rect x="14" y="4" width="4" height="16" />
+  </svg>
+  <svg
+    v-else-if="player.errorId.value === message.id"
+    width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor"
+    stroke-width="2" stroke-linecap="round" stroke-linejoin="round"
+  >
+    <path d="M12 9v4" />
+    <path d="M12 17h.01" />
+    <path d="M10.29 3.86 1.82 18a2 2 0 0 0 1.71 3h16.94a2 2 0 0 0 1.71-3L13.71 3.86a2 2 0 0 0-3.42 0z" />
+  </svg>
+  <svg
+    v-else
+    width="12" height="12" viewBox="0 0 24 24" fill="currentColor"
+  >
+    <polygon points="5 3 19 12 5 21 5 3" />
+  </svg>
+</button>
+```
+
+- [ ] **Step 4: Replace the student voice-bar duration label**
+
+Find the student voice-bar duration `<span>` (currently around line 98):
+
+```vue
+<span class="voice-duration voice-duration-student">0:04</span>
+```
+
+Replace with:
+
+```vue
+<span
+  v-if="player.errorId.value === message.id"
+  class="play-error-hint play-error-hint-student"
+>点击重试</span>
+<span
+  v-else
+  class="voice-duration voice-duration-student"
+>0:04</span>
+```
+
+- [ ] **Step 5: Add the new styles**
+
+In the `<style lang="scss" scoped>` block, locate the `.play-student` rule (currently around line 1093). Append the new styles immediately after `.play-student { ... }` (and before `.wave-bar-group`):
+
+```scss
+.play-btn-error {
+  background: #fef2f2 !important;
+  color: #dc2626 !important;
+  border: 1px solid #fecaca;
+  &:hover { background: #fee2e2 !important; }
+}
+.play-spinner {
+  animation: spin 1s linear infinite;
+}
+.play-error-hint {
+  font-size: 10px;
+  color: #dc2626;
+  font-weight: 500;
+  flex-shrink: 0;
+  white-space: nowrap;
+}
+.play-error-hint-student {
+  color: #fff;
+}
+```
+
+(`@keyframes spin` already exists later in this file — no need to redeclare.)
+
+- [ ] **Step 6: Type-check + build pass**
+
+Run:
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+npm run type-check
+npm run build
+```
+
+Both commands must exit with status 0.
+
+- [ ] **Step 7: Manual verification — happy paths**
+
+`npm run dev` → open dialogue view.
+- AI message arrives → loading spinner shows briefly on the play button → switches to the ⏸ icon while audio plays → returns to ▶ when finished.
+- Click a student replay button → spinner is barely visible (no synthesis needed; the recorded blob is already attached) → ⏸ → ▶.
+- Click the button of a currently playing message → it toggles back to ▶ instantly and playback stops.
+
+- [ ] **Step 8: Manual verification — error paths**
+
+Set a bogus `VITE_AZURE_SPEECH_KEY` in `.env.local` and restart dev server.
+- AI message arrives → spinner briefly → switches to a red ⚠ icon, and "0:04" is replaced by "点击重试" ("click to retry").
+- Click the warning button → the player re-attempts synthesis → fails again in the same way (the key is still bogus). No global toast or error card appears.
+- Restore the real key → click again → succeeds, button returns to play state.
+
+Disconnect network mid-conversation:
+- New AI reply arrives → ⚠ + 点击重试 on its button.
+- Reconnect → click button → plays.
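+
+The error path above corresponds to these transitions in the three player refs. A minimal model (an assumption about the composable's internals, not the actual Task 2 code):
+
+```typescript
+type Ref<T> = { value: T }
+const ref = <T>(value: T): Ref<T> => ({ value })
+
+// Models only the ref transitions: loading → playing on success,
+// loading → error on a failed synthesis, and a retry that clears errorId.
+function makePlayerModel(synthesize: (id: string) => Promise<void>) {
+  const playingId = ref<string | null>(null)
+  const loadingId = ref<string | null>(null)
+  const errorId = ref<string | null>(null)
+
+  async function play(id: string) {
+    errorId.value = null        // a retry click re-enters through play()
+    loadingId.value = id
+    try {
+      await synthesize(id)      // Azure REST call in the real service
+      loadingId.value = null
+      playingId.value = id      // real code starts the HTMLAudioElement here
+    } catch {
+      loadingId.value = null
+      errorId.value = id        // drives the ⚠ icon and the 点击重试 label
+    }
+  }
+
+  return { playingId, loadingId, errorId, play }
+}
+```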
+
+- [ ] **Step 9: Manual verification — single-channel + interrupt**
+
+- AI auto-playing → click another AI message's button → first stops, second plays.
+- AI auto-playing → click a student message's button → first stops, student plays.
+- Student replaying → AI auto-play arrives → student stops, AI plays.
+- AI auto-playing → click 开始录音 (Start Recording) → audio stops within 100ms.
+- Synthesis in progress (slow network — throttle to "Slow 3G") → click 开始录音 → spinner clears immediately, no audio plays.
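+
+The single-channel and interrupt rules reduce to one invariant: at most one live `AbortController`, aborted both by a newer `play()` and by recording start. A sketch under that assumption (names hypothetical, not the actual composable code):
+
+```typescript
+// At most one playback/synthesis "channel" is live at a time.
+function makeSingleChannel() {
+  let controller: AbortController | null = null
+
+  // Called at the start of every play(); the real code passes the returned
+  // signal to fetch() against the Azure Speech endpoint.
+  function begin(): AbortSignal {
+    controller?.abort()          // starting a new playback kills the old one
+    controller = new AbortController()
+    return controller.signal
+  }
+
+  // Called by the recording-start handler and by unmount cleanup.
+  function stop() {
+    controller?.abort()
+    controller = null
+  }
+
+  return { begin, stop }
+}
+```
+
+Because the recording handler and a second `play()` both funnel through the same abort, the "spinner clears immediately" case needs no extra bookkeeping.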
+
+- [ ] **Step 10: Manual verification — cache + lifecycle**
+
+Open DevTools Network panel filtered to `tts.speech.microsoft.com`.
+- Click the same AI message twice: only one POST appears.
+- Re-generate an AI message (trigger an error, then click 重新生成, i.e. Regenerate): a new POST appears for the new message id.
+- Navigate out of the dialogue view → back in → no zombie audio. No "URL revoked" console warnings.
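+
+The one-POST-per-id behavior checked above follows from a promise cache keyed by message id. A sketch of the assumed shape of the D3 cache (not the actual Task 2 code):
+
+```typescript
+// Memoizes synthesis per message id. Caching the promise (not the blob)
+// also dedupes concurrent clicks while the first request is in flight.
+function makeSynthesisCache<T>(synthesize: (id: string) => Promise<T>) {
+  const cache = new Map<string, Promise<T>>()
+  return (id: string): Promise<T> => {
+    let hit = cache.get(id)
+    if (!hit) {
+      hit = synthesize(id).catch(err => {
+        cache.delete(id)         // failed attempts stay retryable (点击重试)
+        throw err
+      })
+      cache.set(id, hit)
+    }
+    return hit
+  }
+}
+```
+
+A re-generated message gets a fresh id, so it naturally misses the cache, which is exactly the second bullet above.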
+
+- [ ] **Step 11: Commit**
+
+```bash
+cd /Users/buoy/Development/gitrepo/PPT
+git add src/views/Editor/EnglishSpeaking/preview/DialogueChatView.vue
+git commit -m "$(cat <<'EOF'
+feat: render player state on voice-bar play buttons
+
+Each play button derives idle / loading / playing / error from
+the player refs. Errors swap "0:04" for "点击重试"; clicking
+the warning button retries via the same togglePlay path. AI and
+student variants share the same logic with the existing color
+schemes preserved.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+EOF
+)"
+```
+
+---
+
+## Self-Review
+
+**Spec coverage:**
+
+- D1 (view-level player) — Task 2 + Task 4 step 2.
+- D2 (auto-play in view, not engine) — Task 3 (engine cleanup) + Task 5 (watcher).
+- D3 (synthesis cached by message id) — Task 2 (cache map in composable).
+- D4 (per-message error UI) — Task 6 (4-state button + 点击重试).
+- D5 (WS attachStudentBlob fix) — Task 3 step 8 (helper) + Task 4 step 8 (call site).
+- D6 (three-ref state model) — Task 2 (refs) + Task 6 (rendering).
+- File layout — Tasks 1, 2, 3, 4, 5, 6 cover all 5 files in the spec.
+- Goals 1–7 — all mapped: 1→T2, 2→T1, 3→T5, 4→T4 step 5+T5, 5→T2 (structural) + T4 step 6, 6→T6, 7→T3.
+- Edge cases — covered by player implementation in T2 (synth abort, AbortError handling, autoplay rejection, audio.onerror) + verification steps in T6 step 7–10.
+- Security debt note — already in the spec; not an action item for this plan.
+
+**Placeholder scan:** No "TBD", no "implement later", no "similar to Task N", no unspecified error handlers — every step has full code.
+
+**Type consistency:** `play(id, source)`, `stop()`, `playingId`/`loadingId`/`errorId`, `attachStudentBlob(messageId, blob)`, `PlaySource` discriminated union — used identically across the composable definition (T2), engine signature (T3), and view consumers (T4, T5, T6). Method calls in the view (`player.play / player.stop / player.playingId.value`) match the composable's exported shape. The engine's new helper signature matches its single call site. No mismatches found.
+