mirror of https://github.com/siteboon/claudecodeui.git synced 2026-06-25 04:13:51 +08:00

Files

newsbubbles 711936d279 refactor(voice): provider-agnostic backend and in-app config

Switches the voice proxy to the OpenAI audio API (/v1/audio/transcriptions and
/v1/audio/speech) so it works with OpenAI, Groq, or a local server. Adds a
Settings -> Voice tab (base URL, API key, models, voice) plus a Quick Settings
toggle, and removes the bundled Python sidecar.

Review fixes: stop mic tracks on unmount, clear the global TTS stop handler and
revoke leaked blob URLs, add fetch timeouts in the proxy, surface mic errors in
the button, trim before appending transcripts, and drop the repo-wide wav ignore.

2026-06-09 10:05:06 +01:00

1.9 KiB

Raw Blame History

Voice (optional)

Two opt-in voice features in the chat:

Push-to-talk dictation — a mic button in the composer records, transcribes, and fills the input.
Read-aloud — a speaker button on each assistant message plays it back.

Voice is off by default. Turn it on with the Voice toggle in Quick Settings or in Settings → Voice. When off, the mic and speaker controls are hidden.

Backend

Voice uses any OpenAI-compatible audio backend, configured in Settings → Voice:

Field	Example	Notes
Base URL	`https://api.openai.com/v1`	OpenAI, Groq, or a local server
API key	`sk-…`	sent only to this app's backend, which proxies the request
Speech-to-text model	`whisper-1`, `gpt-4o-transcribe`, `whisper-large-v3-turbo`
Text-to-speech model	`tts-1`, `gpt-4o-mini-tts`, `kokoro`
Voice	`alloy`, `af_heart`, …	depends on the backend

The backend must expose the standard endpoints:

POST {baseUrl}/audio/transcriptions   (multipart 'file' + 'model')   -> { "text": "..." }
POST {baseUrl}/audio/speech           ({ model, voice, input })       -> audio bytes

That covers OpenAI and Groq, plus local servers like LocalAI, Speaches, Kokoro-FastAPI, and openedai-speech. Requests are proxied through the app's authenticated /api/voice/* routes, so a local backend only needs to listen on localhost.

Server-side defaults (optional)

Instead of (or as defaults behind) the Settings fields, you can set env vars on the server:

VOICE_API_BASE_URL=http://127.0.0.1:8765/v1
VOICE_API_KEY=...
VOICE_STT_MODEL=whisper-1
VOICE_TTS_MODEL=tts-1
VOICE_TTS_VOICE=alloy

Per-user Settings values override these. If neither is set, the voice routes return 503.

Notes

Recording needs a secure context (HTTPS or localhost) for microphone access.
On iOS, read-aloud is tap-initiated to satisfy Safari's autoplay policy.

1.9 KiB Raw Blame History

Voice (optional)

Backend

Server-side defaults (optional)

Notes

1.9 KiB

Raw Blame History