diff --git a/.gitignore b/.gitignore
index 8b7815d7..e6b7985b 100755
--- a/.gitignore
+++ b/.gitignore
@@ -142,10 +142,3 @@ tasks/
 
 # Git worktrees
 .worktrees/
-
-# Voice sidecar (Python) — generated, machine-specific, not committed
-voice-sidecar/.venv/
-voice-sidecar/voice_messages/
-voice-sidecar/**/__pycache__/
-*.pyc
-*.wav
diff --git a/docs/voice.md b/docs/voice.md
index b8e5baec..71f81b84 100644
--- a/docs/voice.md
+++ b/docs/voice.md
@@ -1,57 +1,51 @@
 # Voice (optional)
 
-Adds two opt-in voice features to the chat:
+Two opt-in voice features in the chat:
 
-- **Push-to-talk dictation** — a mic button in the composer records your voice, transcribes it
-  (speech-to-text), and drops the text into the input.
-- **Read-aloud** — a speaker button on each assistant message plays it back (text-to-speech).
+- **Push-to-talk dictation** — a mic button in the composer records, transcribes, and fills the input.
+- **Read-aloud** — a speaker button on each assistant message plays it back.
 
-Voice is **disabled by default**. The UI only appears when a voice backend is configured, so it has
-zero impact on installs that don't use it.
+Voice is **off by default**. Turn it on with the **Voice** toggle in Quick Settings or in
+**Settings → Voice**. When off, the mic and speaker controls are hidden.
 
-## Enable it
+## Backend
 
-Set `VOICE_SIDECAR_URL` for the server to point at a voice backend, then restart:
+Voice uses any **OpenAI-compatible audio backend**, configured in **Settings → Voice**:
 
-```bash
-VOICE_SIDECAR_URL=http://127.0.0.1:8765 npm run server
-```
-
-When set, `GET /api/voice/health` reports `{ "enabled": true }` and the mic + speaker controls appear.
-All voice requests are proxied through the app's authenticated `/api/voice/*` routes, so the backend
-itself only needs to listen on localhost and is never exposed directly.
-
-## Backend contract
-
-`VOICE_SIDECAR_URL` can point at **any** service that implements two endpoints:
-
-| Method & path | Request | Response |
+| Field | Example | Notes |
 |---|---|---|
-| `POST /transcribe` | multipart, field `audio` (webm/mp4/wav/…) | `{ "text": "..." }` |
-| `POST /tts` | form field `text` | audio bytes (`audio/*`, e.g. wav/mp3) |
+| Base URL | `https://api.openai.com/v1` | OpenAI, Groq, or a local server |
+| API key | `sk-…` | stored in the browser's localStorage; sent to this app's backend, which forwards it to the voice backend as a Bearer token |
+| Speech-to-text model | `whisper-1`, `gpt-4o-transcribe`, `whisper-large-v3-turbo` | |
+| Text-to-speech model | `tts-1`, `gpt-4o-mini-tts`, `kokoro` | |
+| Voice | `alloy`, `af_heart`, … | depends on the backend |
 
-This keeps the feature provider-agnostic — you can back it with the bundled local sidecar, or a cloud
-transcription + TTS gateway, as long as it speaks that contract.
+The backend must expose the standard endpoints:
 
-## Reference backend: `voice-sidecar/`
-
-A local, no-API-key reference implementation using **faster-whisper** (STT) and **Kokoro-82M** (TTS),
-both CPU-capable.
-
-```bash
-cd voice-sidecar
-python -m venv .venv && . .venv/bin/activate    # (Windows: .venv\Scripts\activate)
-pip install -r requirements.txt
-python -m uvicorn app:app --host 127.0.0.1 --port 8765
+```
+POST {baseUrl}/audio/transcriptions   (multipart 'file' + 'model')   -> { "text": "..." }
+POST {baseUrl}/audio/speech           ({ model, voice, input })       -> audio bytes
 ```
 
-Then run the app with `VOICE_SIDECAR_URL=http://127.0.0.1:8765`.
+That covers OpenAI and Groq, plus local servers like **LocalAI**, **Speaches**, **Kokoro-FastAPI**,
+and **openedai-speech**. Requests are proxied through the app's authenticated `/api/voice/*` routes,
+so a local backend only needs to listen on localhost.
 
-Config (env, all optional) — see `voice-sidecar/.env.example`: `WHISPER_MODEL_SIZE`, `WHISPER_DEVICE`
-(`cpu`/`cuda`), `KOKORO_VOICE`, `VOICE_PORT`.
+### Server-side defaults (optional)
+
+Instead of (or as defaults behind) the Settings fields, you can set env vars on the server:
+
+```
+VOICE_API_BASE_URL=http://127.0.0.1:8765/v1
+VOICE_API_KEY=...
+VOICE_STT_MODEL=whisper-1
+VOICE_TTS_MODEL=tts-1
+VOICE_TTS_VOICE=alloy
+```
+
+Per-user Settings values override these field by field. If no base URL is set in either place, `/api/voice/transcribe` and `/api/voice/tts` return 503.
 
 ## Notes
 
-- The first read-aloud is slow (~10–20s) while the model lazy-loads; it's near-instant and cached after.
 - Recording needs a secure context (HTTPS or localhost) for microphone access.
-- On iOS, playback is tap-initiated (manual read-aloud) to satisfy Safari's autoplay policy.
+- On iOS, read-aloud is tap-initiated to satisfy Safari's autoplay policy.
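Review note on docs/voice.md: the text-to-speech half of the documented contract can be illustrated with a small request builder. This is a minimal sketch, not the proxy's code. `BackendConfig`, `SpeechRequest`, and `buildSpeechRequest` are hypothetical names; only the `/audio/speech` path, the `{ model, voice, input }` body, and the optional Bearer key come from the docs above.

```typescript
// Hypothetical types for illustration; field names follow the Settings table.
type BackendConfig = {
  baseUrl: string;
  apiKey: string;
  ttsModel: string;
  ttsVoice: string;
};

type SpeechRequest = {
  url: string;
  headers: Record<string, string>;
  body: string;
};

// Build the JSON request for POST {baseUrl}/audio/speech.
function buildSpeechRequest(cfg: BackendConfig, input: string): SpeechRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  // Assumption: a backend without a key (e.g. a local server) gets no Authorization header.
  if (cfg.apiKey) headers['Authorization'] = `Bearer ${cfg.apiKey}`;
  return {
    // Strip one trailing slash so the path is not doubled.
    url: `${cfg.baseUrl.replace(/\/$/, '')}/audio/speech`,
    headers,
    body: JSON.stringify({ model: cfg.ttsModel, voice: cfg.ttsVoice, input }),
  };
}
```

The transcription request would follow the same pattern, with a multipart body (`file` plus `model`) instead of JSON.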
diff --git a/server/voice-proxy.js b/server/voice-proxy.js
index 3bdb748a..770a91de 100644
--- a/server/voice-proxy.js
+++ b/server/voice-proxy.js
@@ -1,48 +1,71 @@
-// Optional voice proxy — forwards speech-to-text / text-to-speech to a configurable backend.
+// Optional voice proxy — forwards STT/TTS to an OpenAI-compatible audio backend.
 //
-// Opt-in: voice is DISABLED unless VOICE_SIDECAR_URL is set. When set, it must point at a
-// backend (any implementation) exposing:
-//     POST /transcribe   (multipart field 'audio')  -> { text }
-//     POST /tts          (form field 'text')        -> audio bytes (audio/*)
-// A reference backend (local faster-whisper + Kokoro) ships in /voice-sidecar, but any
-// service implementing the two endpoints works (e.g. a cloud transcription + TTS gateway).
+// The backend is whatever the user points at: OpenAI, Groq, or a local server
+// (LocalAI / Speaches / Kokoro-FastAPI / openedai-speech / etc.). It must expose the
+// standard OpenAI audio endpoints:
+//     POST {base}/audio/transcriptions   (multipart 'file' + 'model')      -> { text }
+//     POST {base}/audio/speech           ({ model, voice, input })         -> audio bytes
 //
-// Mounted at /api/voice behind authenticateToken, so it inherits the app's auth. The backend
-// should bind to localhost and is never exposed directly.
+// Config is resolved per-request from headers (set by the client's voice settings),
+// falling back to server env defaults. Mounted at /api/voice behind authenticateToken.
 import express from 'express';
 
-const VOICE_SIDECAR_URL = (process.env.VOICE_SIDECAR_URL || '').replace(/\/$/, '');
-const VOICE_ENABLED = Boolean(VOICE_SIDECAR_URL);
+const ENV = {
+  baseUrl: (process.env.VOICE_API_BASE_URL || '').replace(/\/$/, ''),
+  apiKey: process.env.VOICE_API_KEY || '',
+  sttModel: process.env.VOICE_STT_MODEL || 'whisper-1',
+  ttsModel: process.env.VOICE_TTS_MODEL || 'tts-1',
+  ttsVoice: process.env.VOICE_TTS_VOICE || 'alloy',
+  ttsFormat: process.env.VOICE_TTS_FORMAT || 'mp3',
+};
+
+// Per-request config: client headers (from the user's voice settings) override env defaults.
+function resolveConfig(req) {
+  const h = req.headers;
+  return {
+    baseUrl: (String(h['x-voice-base-url'] || '') || ENV.baseUrl).replace(/\/$/, ''),
+    apiKey: String(h['x-voice-api-key'] || '') || ENV.apiKey,
+    sttModel: String(h['x-voice-stt-model'] || '') || ENV.sttModel,
+    ttsModel: String(h['x-voice-tts-model'] || '') || ENV.ttsModel,
+    ttsVoice: String(h['x-voice-tts-voice'] || '') || ENV.ttsVoice,
+  };
+}
 
 const router = express.Router();
 
-// Lazy multer (memory storage) for the audio upload — matches index.js's pattern.
+const VOICE_TIMEOUT_MS = Number(process.env.VOICE_TIMEOUT_MS) || 60000; // falls back when unset or non-numeric
+async function fetchWithTimeout(url, options = {}) {
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), VOICE_TIMEOUT_MS);
+  try {
+    return await fetch(url, { ...options, signal: controller.signal });
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
 let _upload = null;
 async function getUpload() {
   if (!_upload) {
     const multer = (await import('multer')).default;
-    _upload = multer({
-      storage: multer.memoryStorage(),
-      limits: { fileSize: 25 * 1024 * 1024 }, // 25MB — short dictation clips
-    });
+    _upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 25 * 1024 * 1024 } });
   }
   return _upload;
 }
 
-function ensureEnabled(res) {
-  if (!VOICE_ENABLED) {
-    res.status(503).json({ error: 'Voice is not configured. Set VOICE_SIDECAR_URL to enable it.' });
-    return false;
-  }
-  return true;
+function authHeader(apiKey) {
+  return apiKey ? { Authorization: `Bearer ${apiKey}` } : {};
 }
 
-// GET /api/voice/health -> { enabled }  (frontend hides the voice UI when disabled)
-router.get('/health', (_req, res) => res.json({ enabled: VOICE_ENABLED }));
+// GET /api/voice/health -> { configured } (true if a base URL is available)
+router.get('/health', (req, res) => {
+  res.json({ configured: Boolean(resolveConfig(req).baseUrl) });
+});
 
 // POST /api/voice/transcribe  (multipart 'audio') -> { text }
 router.post('/transcribe', async (req, res) => {
-  if (!ensureEnabled(res)) return;
+  const cfg = resolveConfig(req);
+  if (!cfg.baseUrl) return res.status(503).json({ error: 'No voice backend configured' });
   const upload = await getUpload();
   upload.single('audio')(req, res, async (err) => {
     if (err) return res.status(400).json({ error: err.message });
@@ -50,13 +73,21 @@ router.post('/transcribe', async (req, res) => {
     try {
       const fd = new FormData();
       fd.append(
-        'audio',
+        'file',
         new Blob([req.file.buffer], { type: req.file.mimetype || 'audio/webm' }),
         req.file.originalname || 'recording.webm',
       );
-      const r = await fetch(`${VOICE_SIDECAR_URL}/transcribe`, { method: 'POST', body: fd });
-      const data = await r.json().catch(() => ({ error: 'bad voice backend response' }));
-      res.status(r.status).json(data);
+      fd.append('model', cfg.sttModel);
+      const r = await fetchWithTimeout(`${cfg.baseUrl}/audio/transcriptions`, {
+        method: 'POST',
+        headers: authHeader(cfg.apiKey),
+        body: fd,
+      });
+      const text = await r.text();
+      if (!r.ok) return res.status(r.status).json({ error: text || 'transcription failed' });
+      let data;
+      try { data = JSON.parse(text); } catch { data = { text }; }
+      res.json({ text: data?.text ?? '' });
     } catch (e) {
       res.status(502).json({ error: `voice backend unreachable: ${e.message}` });
     }
@@ -65,18 +96,26 @@ router.post('/transcribe', async (req, res) => {
 
 // POST /api/voice/tts  { text } -> audio bytes
 router.post('/tts', async (req, res) => {
-  if (!ensureEnabled(res)) return;
+  const cfg = resolveConfig(req);
+  if (!cfg.baseUrl) return res.status(503).json({ error: 'No voice backend configured' });
   const text = req.body?.text;
   if (!text || !text.trim()) return res.status(400).json({ error: 'text required' });
   try {
-    const fd = new FormData();
-    fd.append('text', text);
-    const r = await fetch(`${VOICE_SIDECAR_URL}/tts`, { method: 'POST', body: fd });
+    const r = await fetchWithTimeout(`${cfg.baseUrl}/audio/speech`, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json', ...authHeader(cfg.apiKey) },
+      body: JSON.stringify({
+        model: cfg.ttsModel,
+        voice: cfg.ttsVoice,
+        input: text,
+        response_format: ENV.ttsFormat,
+      }),
+    });
     if (!r.ok) {
       const errText = await r.text().catch(() => 'tts failed');
       return res.status(r.status).json({ error: errText });
     }
-    res.setHeader('Content-Type', r.headers.get('content-type') || 'audio/wav');
+    res.setHeader('Content-Type', r.headers.get('content-type') || 'audio/mpeg');
     res.setHeader('Cache-Control', 'no-store');
     res.send(Buffer.from(await r.arrayBuffer()));
   } catch (e) {
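Review note on server/voice-proxy.js: the header-over-env precedence in `resolveConfig` can be checked in isolation with a pure function. The sketch below is an assumption-laden illustration, not the code in this diff. `VoiceSettings`, `pick`, and `resolveVoiceSettings` are hypothetical names, and only three of the fields are modelled.

```typescript
// Hypothetical, reduced settings shape; the real proxy also resolves the TTS model and voice.
type VoiceSettings = { baseUrl: string; apiKey: string; sttModel: string };

// An empty or missing header value falls through to the fallback.
function pick(headerValue: string | undefined, fallback: string): string {
  return headerValue ? headerValue : fallback;
}

// Each field is resolved independently: header first, then the env default.
function resolveVoiceSettings(
  headers: Record<string, string | undefined>,
  env: VoiceSettings,
): VoiceSettings {
  return {
    baseUrl: pick(headers['x-voice-base-url'], env.baseUrl).replace(/\/$/, ''),
    apiKey: pick(headers['x-voice-api-key'], env.apiKey),
    sttModel: pick(headers['x-voice-stt-model'], env.sttModel),
  };
}
```

Because the fields fall back independently, a request that overrides only the base URL still picks up the env API key. That may be worth a second look, since the env key would then be sent to a client-chosen URL.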
diff --git a/src/components/chat/hooks/useTts.ts b/src/components/chat/hooks/useTts.ts
index 46ab0f27..4ceb3887 100644
--- a/src/components/chat/hooks/useTts.ts
+++ b/src/components/chat/hooks/useTts.ts
@@ -1,5 +1,6 @@
 import { useCallback, useEffect, useRef, useState } from 'react';
 import { authenticatedFetch } from '../../../utils/api';
+import { voiceConfigHeaders } from '../../../hooks/useVoiceConfig';
 
 // Only one message speaks at a time across the whole app.
 let stopActive: (() => void) | null = null;
@@ -36,8 +37,14 @@ export function useTts(getText: () => string) {
     if (stopActive) stopActive = null;
   }, [reset]);
 
-  // Cleanup on unmount.
-  useEffect(() => () => reset(), [reset]);
+  // Cleanup on unmount: drop the global stop handler if it points at us, then reset.
+  useEffect(
+    () => () => {
+      if (stopActive === stop) stopActive = null;
+      reset();
+    },
+    [reset, stop],
+  );
 
   const play = useCallback(async () => {
     if (stopActive) stopActive();
@@ -63,12 +70,16 @@ export function useTts(getText: () => string) {
       const res = await authenticatedFetch('/api/voice/tts', {
         method: 'POST',
         body: JSON.stringify({ text }),
+        headers: voiceConfigHeaders(),
       });
       if (!res.ok) throw new Error(`tts ${res.status}`);
       const blob = await res.blob();
       const url = URL.createObjectURL(blob);
+      if (audioRef.current !== audio) {
+        URL.revokeObjectURL(url); // stopped while loading; don't leak the blob URL
+        return;
+      }
       urlRef.current = url;
-      if (audioRef.current !== audio) return; // stopped while loading
       audio.src = url;
       audio.load();
       await audio.play();
diff --git a/src/components/chat/hooks/useVoiceAvailable.ts b/src/components/chat/hooks/useVoiceAvailable.ts
index 463e4ff3..388411f0 100644
--- a/src/components/chat/hooks/useVoiceAvailable.ts
+++ b/src/components/chat/hooks/useVoiceAvailable.ts
@@ -1,38 +1,37 @@
 import { useEffect, useState } from 'react';
-import { authenticatedFetch } from '../../../utils/api';
 
-// Whether the optional voice feature is configured on the server (VOICE_SIDECAR_URL set).
-// Probed once and cached app-wide so the mic/speak controls can hide themselves when off.
-let cached: boolean | null = null;
-let inflight: Promise<boolean> | null = null;
+// Voice UI is gated on the `voiceEnabled` UI preference (toggled in Quick Settings /
+// the Settings modal). This is a lightweight read-only view of that preference so the
+// mic/speak controls can hide themselves, kept in sync via the same events
+// useUiPreferences emits. No server probe.
+const STORAGE_KEY = 'uiPreferences';
+const SYNC_EVENT = 'ui-preferences:sync';
 
-function probe(): Promise<boolean> {
-  if (cached !== null) return Promise.resolve(cached);
-  if (!inflight) {
-    inflight = authenticatedFetch('/api/voice/health')
-      .then((r) => (r.ok ? r.json() : { enabled: false }))
-      .then((d) => {
-        cached = Boolean(d?.enabled);
-        return cached;
-      })
-      .catch(() => {
-        cached = false;
-        return false;
-      });
+function readVoiceEnabled(): boolean {
+  try {
+    const raw = localStorage.getItem(STORAGE_KEY);
+    if (!raw) return false;
+    const parsed = JSON.parse(raw);
+    return parsed?.voiceEnabled === true || parsed?.voiceEnabled === 'true';
+  } catch {
+    return false;
   }
-  return inflight;
 }
 
 export function useVoiceAvailable(): boolean {
-  const [available, setAvailable] = useState<boolean>(cached ?? false);
+  const [enabled, setEnabled] = useState<boolean>(() =>
+    typeof window === 'undefined' ? false : readVoiceEnabled(),
+  );
+
   useEffect(() => {
-    let mounted = true;
-    probe().then((v) => {
-      if (mounted) setAvailable(v);
-    });
+    const update = () => setEnabled(readVoiceEnabled());
+    window.addEventListener('storage', update);
+    window.addEventListener(SYNC_EVENT, update as EventListener);
     return () => {
-      mounted = false;
+      window.removeEventListener('storage', update);
+      window.removeEventListener(SYNC_EVENT, update as EventListener);
     };
   }, []);
-  return available;
+
+  return enabled;
 }
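Review note on useVoiceAvailable.ts: the preference check in `readVoiceEnabled` is easier to unit-test when the parsing is separated from `localStorage`. A minimal sketch under that assumption follows; `parseVoiceEnabled` is a hypothetical helper, not part of this diff.

```typescript
// Hypothetical pure helper: takes the raw stored string instead of reading localStorage.
function parseVoiceEnabled(raw: string | null): boolean {
  if (!raw) return false;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== 'object' || parsed === null) return false;
    const value = (parsed as Record<string, unknown>)['voiceEnabled'];
    // Accept both the boolean and the string 'true', as the hook does.
    return value === true || value === 'true';
  } catch {
    return false; // malformed JSON counts as "off"
  }
}
```

The hook could then call `parseVoiceEnabled(localStorage.getItem('uiPreferences'))`, keeping the storage access in one place.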
diff --git a/src/components/chat/hooks/useVoiceInput.ts b/src/components/chat/hooks/useVoiceInput.ts
index bc83a803..ccf0ed53 100644
--- a/src/components/chat/hooks/useVoiceInput.ts
+++ b/src/components/chat/hooks/useVoiceInput.ts
@@ -1,5 +1,6 @@
-import { useCallback, useRef, useState } from 'react';
+import { useCallback, useEffect, useRef, useState } from 'react';
 import { authenticatedFetch } from '../../../utils/api';
+import { voiceConfigHeaders } from '../../../hooks/useVoiceConfig';
 
 // Mobile-safe recording: iOS Safari 18.4+ supports webm/opus; older iOS needs mp4.
 const MIME_CANDIDATES = [
@@ -39,6 +40,15 @@ export function useVoiceInput(onTranscript: (text: string) => void, onError?: (m
     streamRef.current = null;
   };
 
+  // Stop the mic if the component unmounts mid-recording.
+  useEffect(() => {
+    return () => {
+      streamRef.current?.getTracks().forEach((t) => t.stop());
+      streamRef.current = null;
+      recorderRef.current = null;
+    };
+  }, []);
+
   const start = useCallback(async () => {
     try {
       const stream = await navigator.mediaDevices.getUserMedia({
@@ -68,7 +78,11 @@ export function useVoiceInput(onTranscript: (text: string) => void, onError?: (m
           const ext = type.includes('mp4') ? 'm4a' : type.includes('ogg') ? 'ogg' : 'webm';
           const fd = new FormData();
           fd.append('audio', blob, `recording.${ext}`);
-          const res = await authenticatedFetch('/api/voice/transcribe', { method: 'POST', body: fd });
+          const res = await authenticatedFetch('/api/voice/transcribe', {
+            method: 'POST',
+            body: fd,
+            headers: voiceConfigHeaders(),
+          });
           if (!res.ok) throw new Error(`transcribe ${res.status}`);
           const data = await res.json();
           const text = String(data?.text || '').trim();
diff --git a/src/components/chat/view/ChatInterface.tsx b/src/components/chat/view/ChatInterface.tsx
index df2bcd88..18996b71 100644
--- a/src/components/chat/view/ChatInterface.tsx
+++ b/src/components/chat/view/ChatInterface.tsx
@@ -404,7 +404,7 @@ function ChatInterface({
           renderInputWithMentions={renderInputWithMentions}
           textareaRef={textareaRef}
           input={input}
-          onVoiceTranscript={(text) => setInput(input ? `${input} ${text}` : text)}
+          onVoiceTranscript={(text) => setInput(input.trim() ? `${input.trim()} ${text}` : text)}
           onInputChange={handleInputChange}
           onTextareaClick={handleTextareaClick}
           onTextareaKeyDown={handleKeyDown}
diff --git a/src/components/chat/view/subcomponents/VoiceInputButton.tsx b/src/components/chat/view/subcomponents/VoiceInputButton.tsx
index aeb3585f..6a6304e1 100644
--- a/src/components/chat/view/subcomponents/VoiceInputButton.tsx
+++ b/src/components/chat/view/subcomponents/VoiceInputButton.tsx
@@ -1,3 +1,4 @@
+import { useEffect, useRef, useState } from 'react';
 import { Mic, Square, Loader2 } from 'lucide-react';
 import { useTranslation } from 'react-i18next';
 import { useVoiceInput } from '../../hooks/useVoiceInput';
@@ -10,10 +11,25 @@ type Props = {
 };
 
 // Push-to-talk mic button. Renders nothing unless the optional voice feature is enabled.
+// Surfaces transcription errors itself (transiently) so they aren't silently swallowed.
 export default function VoiceInputButton({ onTranscript, onError }: Props) {
   const { t } = useTranslation('chat');
   const available = useVoiceAvailable();
-  const { state, toggle } = useVoiceInput(onTranscript, onError);
+  const [errorMsg, setErrorMsg] = useState<string | null>(null);
+  const errorTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
+
+  const handleError = (msg: string) => {
+    onError?.(msg);
+    setErrorMsg(msg);
+    if (errorTimer.current) clearTimeout(errorTimer.current);
+    errorTimer.current = setTimeout(() => setErrorMsg(null), 4000);
+  };
+
+  const { state, toggle } = useVoiceInput(onTranscript, handleError);
+
+  useEffect(() => () => {
+    if (errorTimer.current) clearTimeout(errorTimer.current);
+  }, []);
 
   if (!available) return null;
 
@@ -27,14 +43,21 @@ export default function VoiceInputButton({ onTranscript, onError }: Props) {
     );
 
   return (
-    <PromptInputButton
-      tooltip={{ content: state === 'recording' ? t('voice.stopRecording') : t('voice.input') }}
-      onClick={(e: { preventDefault: () => void }) => {
-        e.preventDefault();
-        toggle();
-      }}
-    >
-      {icon}
-    </PromptInputButton>
+    <span className="relative inline-flex">
+      {errorMsg && (
+        <span className="absolute bottom-full left-1/2 mb-1 -translate-x-1/2 whitespace-nowrap rounded bg-red-600 px-2 py-1 text-xs text-white shadow-lg">
+          {errorMsg}
+        </span>
+      )}
+      <PromptInputButton
+        tooltip={{ content: state === 'recording' ? t('voice.stopRecording') : t('voice.input') }}
+        onClick={(e: { preventDefault: () => void }) => {
+          e.preventDefault();
+          toggle();
+        }}
+      >
+        {icon}
+      </PromptInputButton>
+    </span>
   );
 }
diff --git a/src/components/quick-settings-panel/constants.ts b/src/components/quick-settings-panel/constants.ts
index 15c15458..408a64c7 100644
--- a/src/components/quick-settings-panel/constants.ts
+++ b/src/components/quick-settings-panel/constants.ts
@@ -4,6 +4,7 @@ import {
   Eye,
   Languages,
   Maximize2,
+  Mic,
 } from 'lucide-react';
 import type { PreferenceToggleItem } from './types';
 
@@ -54,4 +55,9 @@ export const INPUT_SETTING_TOGGLES: PreferenceToggleItem[] = [
     labelKey: 'quickSettings.sendByCtrlEnter',
     icon: Languages,
   },
+  {
+    key: 'voiceEnabled',
+    labelKey: 'quickSettings.voiceEnabled',
+    icon: Mic,
+  },
 ];
diff --git a/src/components/quick-settings-panel/types.ts b/src/components/quick-settings-panel/types.ts
index 16002694..8d4f0826 100644
--- a/src/components/quick-settings-panel/types.ts
+++ b/src/components/quick-settings-panel/types.ts
@@ -6,7 +6,8 @@ export type PreferenceToggleKey =
   | 'showRawParameters'
   | 'showThinking'
   | 'autoScrollToBottom'
-  | 'sendByCtrlEnter';
+  | 'sendByCtrlEnter'
+  | 'voiceEnabled';
 
 export type QuickSettingsPreferences = Record<PreferenceToggleKey, boolean>;
 
diff --git a/src/components/settings/types/types.ts b/src/components/settings/types/types.ts
index 8fe3b7ff..bab8a430 100644
--- a/src/components/settings/types/types.ts
+++ b/src/components/settings/types/types.ts
@@ -2,7 +2,7 @@ import type { Dispatch, SetStateAction } from 'react';
 import type { LLMProvider } from '../../../types/app';
 import type { ProviderAuthStatus } from '../../provider-auth/types';
 
-export type SettingsMainTab = 'agents' | 'appearance' | 'git' | 'api' | 'tasks' | 'notifications' | 'plugins' | 'about';
+export type SettingsMainTab = 'agents' | 'appearance' | 'git' | 'api' | 'voice' | 'tasks' | 'notifications' | 'plugins' | 'about';
 export type AgentProvider = LLMProvider;
 export type AgentCategory = 'account' | 'permissions' | 'mcp';
 export type ProjectSortOrder = 'name' | 'date';
diff --git a/src/components/settings/view/Settings.tsx b/src/components/settings/view/Settings.tsx
index 8340a547..0b591402 100644
--- a/src/components/settings/view/Settings.tsx
+++ b/src/components/settings/view/Settings.tsx
@@ -6,6 +6,7 @@ import SettingsSidebar from '../view/SettingsSidebar';
 import AgentsSettingsTab from '../view/tabs/agents-settings/AgentsSettingsTab';
 import AppearanceSettingsTab from '../view/tabs/AppearanceSettingsTab';
 import CredentialsSettingsTab from '../view/tabs/api-settings/CredentialsSettingsTab';
+import VoiceSettingsTab from '../view/tabs/VoiceSettingsTab';
 import GitSettingsTab from '../view/tabs/git-settings/GitSettingsTab';
 import NotificationsSettingsTab from '../view/tabs/NotificationsSettingsTab';
 import TasksSettingsTab from '../view/tabs/tasks-settings/TasksSettingsTab';
@@ -153,6 +154,8 @@ function Settings({ isOpen, onClose, projects = [], initialTab = 'agents' }: Set
 
               {activeTab === 'api' && <CredentialsSettingsTab />}
 
+              {activeTab === 'voice' && <VoiceSettingsTab />}
+
               {activeTab === 'plugins' && <PluginSettingsTab />}
 
               {activeTab === 'about' && <AboutTab />}
diff --git a/src/components/settings/view/SettingsSidebar.tsx b/src/components/settings/view/SettingsSidebar.tsx
index 149c1492..194ccc98 100644
--- a/src/components/settings/view/SettingsSidebar.tsx
+++ b/src/components/settings/view/SettingsSidebar.tsx
@@ -1,4 +1,4 @@
-import { Bell, Bot, GitBranch, Info, Key, ListChecks, Palette, Puzzle } from 'lucide-react';
+import { Bell, Bot, GitBranch, Info, Key, ListChecks, Mic, Palette, Puzzle } from 'lucide-react';
 import { useTranslation } from 'react-i18next';
 import { cn } from '../../../lib/utils';
 import { PillBar, Pill } from '../../../shared/view/ui';
@@ -20,6 +20,7 @@ const NAV_ITEMS: NavItem[] = [
   { id: 'appearance', labelKey: 'mainTabs.appearance', icon: Palette },
   { id: 'git', labelKey: 'mainTabs.git', icon: GitBranch },
   { id: 'api', labelKey: 'mainTabs.apiTokens', icon: Key },
+  { id: 'voice', labelKey: 'mainTabs.voice', icon: Mic },
   { id: 'tasks', labelKey: 'mainTabs.tasks', icon: ListChecks },
   { id: 'plugins', labelKey: 'mainTabs.plugins', icon: Puzzle },
   { id: 'notifications', labelKey: 'mainTabs.notifications', icon: Bell },
diff --git a/src/components/settings/view/tabs/VoiceSettingsTab.tsx b/src/components/settings/view/tabs/VoiceSettingsTab.tsx
new file mode 100644
index 00000000..3de61fba
--- /dev/null
+++ b/src/components/settings/view/tabs/VoiceSettingsTab.tsx
@@ -0,0 +1,82 @@
+import type { InputHTMLAttributes } from 'react';
+import { useTranslation } from 'react-i18next';
+import SettingsSection from '../SettingsSection';
+import SettingsToggle from '../SettingsToggle';
+import { useUiPreferences } from '../../../../hooks/useUiPreferences';
+import { useVoiceConfig } from '../../../../hooks/useVoiceConfig';
+
+const inputClass =
+  'w-full rounded-md border border-border bg-background px-3 py-2 text-sm text-foreground placeholder:text-muted-foreground focus:outline-none focus:ring-2 focus:ring-ring';
+
+function Field({ label, ...props }: { label: string } & InputHTMLAttributes<HTMLInputElement>) {
+  return (
+    <label className="block space-y-1">
+      <span className="text-sm font-medium text-foreground">{label}</span>
+      <input className={inputClass} {...props} />
+    </label>
+  );
+}
+
+export default function VoiceSettingsTab() {
+  const { t } = useTranslation('settings');
+  const { preferences, setPreference } = useUiPreferences();
+  const { config, update } = useVoiceConfig();
+
+  return (
+    <div className="space-y-8">
+      <SettingsSection title={t('voiceSettings.title')} description={t('voiceSettings.description')}>
+        <div className="flex items-center justify-between rounded-lg border border-border p-3">
+          <div className="pr-3">
+            <div className="text-sm font-medium text-foreground">{t('voiceSettings.enable')}</div>
+            <div className="text-xs text-muted-foreground">{t('voiceSettings.enableDescription')}</div>
+          </div>
+          <SettingsToggle
+            checked={preferences.voiceEnabled}
+            onChange={(v) => setPreference('voiceEnabled', v)}
+            ariaLabel={t('voiceSettings.enable')}
+          />
+        </div>
+      </SettingsSection>
+
+      <SettingsSection title={t('voiceSettings.backendTitle')} description={t('voiceSettings.backendDescription')}>
+        <div className="space-y-4">
+          <Field
+            label={t('voiceSettings.baseUrl')}
+            placeholder="https://api.openai.com/v1"
+            value={config.baseUrl}
+            onChange={(e) => update({ baseUrl: e.target.value })}
+          />
+          <Field
+            label={t('voiceSettings.apiKey')}
+            type="password"
+            autoComplete="off"
+            placeholder="sk-…"
+            value={config.apiKey}
+            onChange={(e) => update({ apiKey: e.target.value })}
+          />
+          <div className="grid grid-cols-1 gap-4 sm:grid-cols-3">
+            <Field
+              label={t('voiceSettings.sttModel')}
+              placeholder="whisper-1"
+              value={config.sttModel}
+              onChange={(e) => update({ sttModel: e.target.value })}
+            />
+            <Field
+              label={t('voiceSettings.ttsModel')}
+              placeholder="tts-1"
+              value={config.ttsModel}
+              onChange={(e) => update({ ttsModel: e.target.value })}
+            />
+            <Field
+              label={t('voiceSettings.voice')}
+              placeholder="alloy"
+              value={config.ttsVoice}
+              onChange={(e) => update({ ttsVoice: e.target.value })}
+            />
+          </div>
+          <p className="text-xs text-muted-foreground">{t('voiceSettings.note')}</p>
+        </div>
+      </SettingsSection>
+    </div>
+  );
+}
diff --git a/src/hooks/useUiPreferences.ts b/src/hooks/useUiPreferences.ts
index eb0b8339..342f1698 100644
--- a/src/hooks/useUiPreferences.ts
+++ b/src/hooks/useUiPreferences.ts
@@ -7,6 +7,7 @@ type UiPreferences = {
   autoScrollToBottom: boolean;
   sendByCtrlEnter: boolean;
   sidebarVisible: boolean;
+  voiceEnabled: boolean;
 };
 
 type UiPreferenceKey = keyof UiPreferences;
@@ -39,6 +40,7 @@ const DEFAULTS: UiPreferences = {
   autoScrollToBottom: true,
   sendByCtrlEnter: false,
   sidebarVisible: true,
+  voiceEnabled: false,
 };
 
 const PREFERENCE_KEYS = Object.keys(DEFAULTS) as UiPreferenceKey[];
diff --git a/src/hooks/useVoiceConfig.ts b/src/hooks/useVoiceConfig.ts
new file mode 100644
index 00000000..fa170bca
--- /dev/null
+++ b/src/hooks/useVoiceConfig.ts
@@ -0,0 +1,57 @@
+import { useState } from 'react';
+
+export type VoiceConfig = {
+  baseUrl: string;
+  apiKey: string;
+  sttModel: string;
+  ttsModel: string;
+  ttsVoice: string;
+};
+
+const STORAGE_KEY = 'voiceConfig';
+const DEFAULTS: VoiceConfig = { baseUrl: '', apiKey: '', sttModel: '', ttsModel: '', ttsVoice: '' };
+
+function read(): VoiceConfig {
+  try {
+    const raw = localStorage.getItem(STORAGE_KEY);
+    if (!raw) return { ...DEFAULTS };
+    const parsed = JSON.parse(raw);
+    return { ...DEFAULTS, ...(parsed && typeof parsed === 'object' ? parsed : {}) };
+  } catch {
+    return { ...DEFAULTS };
+  }
+}
+
+// Headers the voice proxy reads to target a per-user OpenAI-compatible backend.
+// Empty fields are omitted so the server's env defaults apply.
+export function voiceConfigHeaders(): Record<string, string> {
+  if (typeof window === 'undefined') return {};
+  const c = read();
+  const h: Record<string, string> = {};
+  if (c.baseUrl) h['x-voice-base-url'] = c.baseUrl;
+  if (c.apiKey) h['x-voice-api-key'] = c.apiKey;
+  if (c.sttModel) h['x-voice-stt-model'] = c.sttModel;
+  if (c.ttsModel) h['x-voice-tts-model'] = c.ttsModel;
+  if (c.ttsVoice) h['x-voice-tts-voice'] = c.ttsVoice;
+  return h;
+}
+
+export function useVoiceConfig() {
+  const [config, setConfig] = useState<VoiceConfig>(() =>
+    typeof window === 'undefined' ? { ...DEFAULTS } : read(),
+  );
+
+  const update = (patch: Partial<VoiceConfig>) => {
+    setConfig((prev) => {
+      const next = { ...prev, ...patch };
+      try {
+        localStorage.setItem(STORAGE_KEY, JSON.stringify(next));
+      } catch {
+        /* ignore persistence errors */
+      }
+      return next;
+    });
+  };
+
+  return { config, update };
+}
diff --git a/src/i18n/locales/en/settings.json b/src/i18n/locales/en/settings.json
index b80d17d2..d95151df 100644
--- a/src/i18n/locales/en/settings.json
+++ b/src/i18n/locales/en/settings.json
@@ -49,6 +49,20 @@
     "resetToDefaults": "Reset to Defaults",
     "cancelChanges": "Cancel Changes"
   },
+  "voiceSettings": {
+    "title": "Voice",
+    "description": "Speech-to-text input and read-aloud, via an OpenAI-compatible audio backend.",
+    "enable": "Enable voice",
+    "enableDescription": "Show the mic button and the read-aloud button on messages.",
+    "backendTitle": "Backend",
+    "backendDescription": "Point at OpenAI, Groq, or a local server (LocalAI, Speaches, Kokoro-FastAPI). Leave blank to use the server default.",
+    "baseUrl": "Base URL",
+    "apiKey": "API key",
+    "sttModel": "Speech-to-text model",
+    "ttsModel": "Text-to-speech model",
+    "voice": "Voice",
+    "note": "The defaults shown work with OpenAI once you add a key. For other providers, set the base URL and model names to match."
+  },
   "quickSettings": {
     "title": "Quick Settings",
     "sections": {
@@ -63,6 +77,7 @@
     "showThinking": "Show thinking",
     "autoScrollToBottom": "Auto-scroll to bottom",
     "sendByCtrlEnter": "Send by Ctrl+Enter",
+    "voiceEnabled": "Voice (mic + read aloud)",
     "sendByCtrlEnterDescription": "When enabled, pressing Ctrl+Enter will send the message instead of just Enter. This is useful for IME users to avoid accidental sends.",
     "dragHandle": {
       "dragging": "Dragging handle",
@@ -93,6 +108,7 @@
     "appearance": "Appearance",
     "git": "Git",
     "apiTokens": "API & Tokens",
+    "voice": "Voice",
     "tasks": "Tasks",
     "notifications": "Notifications",
     "plugins": "Plugins",
diff --git a/voice-sidecar/.env.example b/voice-sidecar/.env.example
deleted file mode 100644
index 92842059..00000000
--- a/voice-sidecar/.env.example
+++ /dev/null
@@ -1,14 +0,0 @@
-# Voice sidecar config (all optional — these are the defaults).
-# The sidecar binds 127.0.0.1 only; CloudCLI's Express proxy reaches it.
-
-# Port the sidecar listens on (CloudCLI reaches it via VOICE_SIDECAR_URL).
-VOICE_PORT=8765
-
-# faster-whisper model size: tiny | base | small | medium | large-v3
-WHISPER_MODEL_SIZE=base
-# cpu (int8, default) or cuda (float16, needs a CUDA torch in the venv)
-WHISPER_DEVICE=cpu
-
-# Kokoro voice (see https://github.com/hexgrad/kokoro for the full list) and language code.
-KOKORO_VOICE=af_heart
-KOKORO_LANG=a
diff --git a/voice-sidecar/app.py b/voice-sidecar/app.py
deleted file mode 100644
index 518f83bf..00000000
--- a/voice-sidecar/app.py
+++ /dev/null
@@ -1,187 +0,0 @@
-"""
-CloudCLI voice sidecar — local STT (faster-whisper) + local TTS (Kokoro-82M).
-
-Ported from the tooler voice endpoints (D:\\tooler\\backend\\server.py), swapping
-edge-tts -> Kokoro. Bound to 127.0.0.1 only; CloudCLI's Express server proxies to
-it behind JWT auth. Never exposed to the tailnet directly.
-
-Endpoints:
-  GET  /health           -> {status, whisper_loaded, kokoro_loaded}
-  POST /transcribe       (multipart 'audio')        -> {text, duration_ms}
-  POST /tts              (form 'text')              -> audio/wav bytes (cached)
-"""
-import asyncio
-import hashlib
-import logging
-import os
-import re
-import tempfile
-import time
-from pathlib import Path
-
-import numpy as np
-import soundfile as sf
-from fastapi import FastAPI, File, Form, HTTPException, UploadFile
-from fastapi.responses import Response
-
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger("voice-sidecar")
-
-# ---- Config (env-overridable) -------------------------------------------------
-PORT = int(os.getenv("VOICE_PORT", "8765"))
-WHISPER_MODEL_SIZE = os.getenv("WHISPER_MODEL_SIZE", "base")
-WHISPER_DEVICE = os.getenv("WHISPER_DEVICE", "cpu").lower()      # "cpu" | "cuda"
-KOKORO_VOICE = os.getenv("KOKORO_VOICE", "af_heart")
-KOKORO_LANG = os.getenv("KOKORO_LANG", "a")                      # 'a' = American English
-KOKORO_SR = 24000
-
-VOICE_DIR = Path(__file__).parent / "voice_messages"
-VOICE_DIR.mkdir(exist_ok=True)
-
-# ---- Lazy model singletons ----------------------------------------------------
-_whisper = None
-_whisper_lock = asyncio.Lock()
-_kpipe = None
-_kpipe_lock = asyncio.Lock()
-
-
-async def get_whisper():
-    global _whisper
-    if _whisper is not None:
-        return _whisper
-    async with _whisper_lock:
-        if _whisper is not None:
-            return _whisper
-
-        def _load():
-            from faster_whisper import WhisperModel
-            if WHISPER_DEVICE == "cuda":
-                try:
-                    logger.info("[WHISPER] loading on CUDA (float16)...")
-                    return WhisperModel(WHISPER_MODEL_SIZE, device="cuda", compute_type="float16")
-                except Exception as e:  # noqa: BLE001
-                    logger.warning("[WHISPER] CUDA failed (%s), falling back to CPU", e)
-            logger.info("[WHISPER] loading '%s' on CPU (int8)", WHISPER_MODEL_SIZE)
-            return WhisperModel(WHISPER_MODEL_SIZE, device="cpu", compute_type="int8")
-
-        _whisper = await asyncio.get_event_loop().run_in_executor(None, _load)
-        logger.info("[WHISPER] ready")
-        return _whisper
-
-
-async def get_kokoro():
-    global _kpipe
-    if _kpipe is not None:
-        return _kpipe
-    async with _kpipe_lock:
-        if _kpipe is not None:
-            return _kpipe
-
-        def _load():
-            from kokoro import KPipeline
-            logger.info("[KOKORO] loading pipeline (lang=%s)...", KOKORO_LANG)
-            return KPipeline(lang_code=KOKORO_LANG)
-
-        _kpipe = await asyncio.get_event_loop().run_in_executor(None, _load)
-        logger.info("[KOKORO] ready")
-        return _kpipe
-
-
-# ---- Text cleaning (ported verbatim from tooler prepare_text_for_tts) ---------
-def prepare_text_for_tts(text: str) -> str:
-    """Strip/transform markdown for natural speech."""
-    text = re.sub(r"```[\s\S]*?```", " code block ", text)   # code fences -> spoken stub
-    text = re.sub(r"`([^`]+)`", r"\1", text)                  # unwrap inline code
-    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)            # bold
-    text = re.sub(r"\*([^*]+)\*", r"\1", text)                # italic
-    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)      # links -> link text
-    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)  # headers
-    text = re.sub(r"\s+", " ", text).strip()
-    return text
-
-
-# ---- App ----------------------------------------------------------------------
-app = FastAPI(title="CloudCLI voice sidecar")
-
-
-@app.get("/health")
-async def health():
-    return {
-        "status": "ok",
-        "whisper_loaded": _whisper is not None,
-        "kokoro_loaded": _kpipe is not None,
-    }
-
-
-@app.post("/transcribe")
-async def transcribe(audio: UploadFile = File(...)):
-    start = time.time()
-    suffix = Path(audio.filename or "rec.webm").suffix or ".webm"
-    content = await audio.read()
-    logger.info("[STT] %d bytes (%s)", len(content), audio.content_type)
-
-    tmp_path = None
-    try:
-        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
-            tmp.write(content)
-            tmp_path = tmp.name
-
-        model = await get_whisper()
-
-        def _run():
-            segments, _info = model.transcribe(tmp_path, beam_size=5)
-            return "".join(seg.text for seg in segments).strip()
-
-        text = await asyncio.get_event_loop().run_in_executor(None, _run)
-        duration_ms = int((time.time() - start) * 1000)
-        logger.info("[STT] %dms: %s", duration_ms, text[:100])
-        return {"text": text, "duration_ms": duration_ms}
-    except Exception as e:  # noqa: BLE001
-        logger.error("[STT] failed: %s", e, exc_info=True)
-        raise HTTPException(status_code=500, detail=f"Transcription failed: {e}")
-    finally:
-        if tmp_path and os.path.exists(tmp_path):
-            try:
-                os.unlink(tmp_path)
-            except OSError:
-                pass
-
-
-@app.post("/tts")
-async def tts(text: str = Form(...)):
-    if not text.strip():
-        raise HTTPException(status_code=400, detail="Text cannot be empty")
-    if len(text) > 8000:
-        raise HTTPException(status_code=400, detail="Text too long (max 8000 chars)")
-
-    start = time.time()
-    clean = prepare_text_for_tts(text)
-    # Cache on the RAW text hash (matches tooler) so identical messages reuse audio.
-    key = hashlib.sha256(text.encode()).hexdigest()[:16]
-    out_path = VOICE_DIR / f"{key}.wav"
-
-    if not out_path.exists():
-        try:
-            pipeline = await get_kokoro()
-
-            def _synth():
-                chunks = [audio for _gs, _ps, audio in pipeline(clean, voice=KOKORO_VOICE)]
-                if not chunks:
-                    raise RuntimeError("Kokoro produced no audio")
-                full = np.concatenate([np.asarray(c, dtype=np.float32) for c in chunks])
-                sf.write(str(out_path), full, KOKORO_SR)
-
-            await asyncio.get_event_loop().run_in_executor(None, _synth)
-            logger.info("[TTS] generated %s in %dms", out_path.name, int((time.time() - start) * 1000))
-        except Exception as e:  # noqa: BLE001
-            logger.error("[TTS] failed: %s", e, exc_info=True)
-            raise HTTPException(status_code=500, detail=f"TTS failed: {e}")
-    else:
-        logger.info("[TTS] cache hit %s", out_path.name)
-
-    return Response(content=out_path.read_bytes(), media_type="audio/wav")
-
-
-if __name__ == "__main__":
-    import uvicorn
-    uvicorn.run(app, host="127.0.0.1", port=PORT, log_level="info")
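Review note: deleting `app.py` also removes `prepare_text_for_tts`, which stripped markdown before synthesis. Whether the new OpenAI-compatible path does any equivalent cleaning is not visible in this section. If it is still wanted, the same regex steps could be ported to the TypeScript side roughly as follows. This is a sketch that assumes the cleaning would happen before the text is sent to the backend. `prepareTextForTts` is a hypothetical name.

```typescript
// Hypothetical TypeScript port of the removed Python prepare_text_for_tts().
// The steps and their order mirror the deleted function.
function prepareTextForTts(text: string): string {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, ' code block ') // code fences -> spoken stub
    .replace(/`([^`]+)`/g, '$1')                  // unwrap inline code
    .replace(/\*\*([^*]+)\*\*/g, '$1')            // bold
    .replace(/\*([^*]+)\*/g, '$1')                // italic
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')      // links -> link text
    .replace(/^#{1,6}\s+/gm, '')                  // headers
    .replace(/\s+/g, ' ')                         // collapse whitespace
    .trim();
}

// Usage
const spoken = prepareTextForTts('# Title\n\nSome **bold** and `code` with [a link](http://x).');
// spoken === 'Title Some bold and code with a link.'
```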
diff --git a/voice-sidecar/requirements.txt b/voice-sidecar/requirements.txt
deleted file mode 100644
index c37d56e9..00000000
--- a/voice-sidecar/requirements.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-# CloudCLI voice sidecar — STT (faster-whisper) + TTS (Kokoro-82M)
-fastapi>=0.110.0
-uvicorn[standard]>=0.27.0
-python-multipart>=0.0.9
-faster-whisper>=1.0.0
-kokoro>=0.9.4
-misaki[en]>=0.9.4
-soundfile>=0.12.1
-numpy>=1.26.0
diff --git a/voice-sidecar/test_smoke.py b/voice-sidecar/test_smoke.py
deleted file mode 100644
index 224729fe..00000000
--- a/voice-sidecar/test_smoke.py
+++ /dev/null
@@ -1,29 +0,0 @@
-"""Smoke test: Kokoro TTS -> faster-whisper STT round-trip."""
-import time
-import numpy as np
-import soundfile as sf
-
-PHRASE = "Hello, this is a test of the CloudCLI voice sidecar."
-
-print("[1/3] Loading Kokoro pipeline...")
-t = time.time()
-from kokoro import KPipeline
-pipe = KPipeline(lang_code="a")
-print(f"      loaded in {time.time()-t:.1f}s")
-
-print("[2/3] Synthesizing...")
-t = time.time()
-chunks = [audio for _gs, _ps, audio in pipe(PHRASE, voice="af_heart")]
-full = np.concatenate([np.asarray(c, dtype=np.float32) for c in chunks])
-sf.write("test.wav", full, 24000)
-dur = len(full) / 24000
-print(f"      synth {time.time()-t:.1f}s -> test.wav ({dur:.1f}s audio, {len(full)} samples)")
-
-print("[3/3] Transcribing back with faster-whisper (base, cpu int8)...")
-t = time.time()
-from faster_whisper import WhisperModel
-model = WhisperModel("base", device="cpu", compute_type="int8")
-segments, _info = model.transcribe("test.wav", beam_size=5)
-text = "".join(s.text for s in segments).strip()
-print(f"      transcribe {time.time()-t:.1f}s -> {text!r}")
-print("\nROUND-TRIP OK" if text else "\nROUND-TRIP PRODUCED NO TEXT")