Compare commits

12 Commits

Author SHA1 Message Date
Haileyesus
3bcb541560 fix(chat): stop orphaned tool results rendering as raw text during pagination
Sessions load in pages from the bottom up, so a loaded page often contains
a tool_result whose tool_use sits in an older, not-yet-loaded page. That
result wasn't recognized as attached, so it was pushed as a standalone
assistant message and its raw output rendered as unstyled Markdown. It only
"fixed itself" once the user scrolled up far enough to load the page with
the matching tool_use.

Skip results that have a toolId but no matching tool_use in the loaded set —
they attach and render correctly (inside their command row/group) once the
older page loads. Results with no toolId still render as before.
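The skip rule described above can be sketched as follows. This is a minimal illustration, not the project's code: the `Msg` shape and the `standaloneResults` helper are hypothetical names, and the real change lives in `normalizedToChatMessages`.

```typescript
// Hypothetical message shape, for illustration only.
type Msg =
  | { kind: 'tool_use'; toolId: string }
  | { kind: 'tool_result'; toolId?: string; content: string };

// Return the contents of tool results that should render as standalone
// messages, given only the pages loaded so far.
function standaloneResults(loaded: Msg[]): string[] {
  const loadedToolIds = new Set<string>();
  for (const m of loaded) {
    if (m.kind === 'tool_use') loadedToolIds.add(m.toolId);
  }

  const out: string[] = [];
  for (const m of loaded) {
    if (m.kind !== 'tool_result') continue;
    // Attached: rendered inside its tool_use row, never standalone.
    if (m.toolId && loadedToolIds.has(m.toolId)) continue;
    // Orphaned: has a toolId, but the tool_use sits in an older page. Skip it.
    if (m.toolId) continue;
    // No toolId at all: render as before.
    out.push(m.content);
  }
  return out;
}
```

Once the older page loads, its `tool_use` joins the loaded set and the previously skipped result falls into the "attached" branch instead.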
2026-06-28 18:45:22 +03:00
Haileyesus
fcc469b55c feat(chat): group shell commands interleaved within thinking mode 2026-06-28 18:34:15 +03:00
Haileyesus
2afe0955ed fix(chat): open file references in editor instead of new browser window
Clicking a file reference in a chat message (e.g. `useShellTerminal.ts`)
opened a new browser window because it was rendered as a plain anchor with
target="_blank" and an empty/relative href.

The markdown link renderer now intercepts file-path links — using the href,
or the link text when the href is empty — strips any `:line:col` suffix, and
opens the file in the in-app editor side panel while keeping the Chat tab
active (matching the inline edit view).

- useFileOpenResolver: resolves bare/partial references to real project
  files via the cached project file tree
- PaletteOpsContext: add `openFileInEditor` op that opens the editor without
  switching tabs
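The target selection and suffix stripping described above might look roughly like this. Both function names are hypothetical, and the sketch assumes the caller has already decided that the link is a file-path link (a naive suffix strip would also eat a port number from a URL).

```typescript
// Remove a trailing ":line" or ":line:col" suffix from a file reference.
function stripLineCol(ref: string): string {
  return ref.replace(/(?::\d+){1,2}$/, '');
}

// Prefer the href; fall back to the link text when the href is empty.
function fileLinkTarget(href: string | undefined, linkText: string): string {
  const raw = href && href.trim() ? href.trim() : linkText.trim();
  return stripLineCol(raw);
}
```

The resulting path would then be handed to something like `useFileOpenResolver` to match it against real project files; that step is not shown here.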
2026-06-28 18:30:29 +03:00
Haileyesus
c88baaf8dc feat(chat): render shell commands as collapsible Codex-style rows
Show Bash tool calls as a compact, single-line command with a chevron
that expands to reveal the output inline, instead of hiding successful
output and popping a separate red box on error.

- Add BashCommandDisplay: command row with $ prompt, status/spinner,
  line-count hint, copy button, and an inline output panel (errors
  auto-expand and tint red).
- Add CommandRunGroup: collapse 2+ consecutive commands under one
  "Ran N commands" header; expanding reveals each command, which stays
  independently expandable. Collapsed by default; opens on error.
- Group consecutive Bash runs in ChatMessagesPane and route single Bash
  calls through BashCommandDisplay in ToolRenderer.
- Suppress the duplicate generic result section for Bash in
  MessageComponent since output now lives in the command row.
- Theme-integrated surfaces (no hard black boxes), emerald accent,
  subtle motion, and clean focus states for a modern, uncluttered look.
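The "2+ consecutive commands" grouping can be sketched as a single pass over the tool calls. The `ToolCall` and `Row` types and `groupBashRuns` are hypothetical; the actual grouping is done in `ChatMessagesPane` and may differ in detail.

```typescript
// Hypothetical shapes, for illustration only.
type ToolCall = { tool: string; command?: string };
type Row =
  | { type: 'single'; call: ToolCall }
  | { type: 'group'; calls: ToolCall[] };

// Collapse runs of two or more consecutive Bash calls into one group row;
// everything else (including a lone Bash call) stays a single row.
function groupBashRuns(calls: ToolCall[]): Row[] {
  const rows: Row[] = [];
  let run: ToolCall[] = [];

  const flush = () => {
    if (run.length >= 2) rows.push({ type: 'group', calls: run });
    else if (run.length === 1) rows.push({ type: 'single', call: run[0] });
    run = [];
  };

  for (const call of calls) {
    if (call.tool === 'Bash') {
      run.push(call);
    } else {
      flush();
      rows.push({ type: 'single', call });
    }
  }
  flush();
  return rows;
}
```

A group row would map to `CommandRunGroup` ("Ran N commands") and a single Bash row to `BashCommandDisplay`.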
2026-06-28 18:19:57 +03:00
Haileyesus
f8430dc886 fix(sidebar): remove horizontal scroll in conversation search view 2026-06-28 17:48:56 +03:00
Haileyesus
98a3a3a1f4 fix(sidebar): make sessions list hyperlinks 2026-06-28 17:33:17 +03:00
Haileyesus
2c08060f65 fix: enlarge language selector in quick settings panel 2026-06-28 17:22:02 +03:00
Haileyesus
75bbafb438 fix: minimize clutter in /models 2026-06-28 17:20:15 +03:00
Haileyesus
7c8928c66d fix: show provider icon 2026-06-28 17:19:53 +03:00
turato
ed4ae3114a fix(chat): prevent chat interface crash on malformed AskUserQuestion payload (#920)
* fix(chat): prevent chat interface crash when AskUserQuestion payload is malformed

Loading a session that contains an AskUserQuestion tool call could crash the
entire chat interface with "TypeError: e.map is not a function".

The AskUserQuestion tool is configured with `defaultOpen: true`, so
QuestionAnswerContent renders as soon as the session loads. Its array guard
(`!questions || questions.length === 0`) only checked for truthiness, and
`q.options` was mapped/iterated with no guard at all. When `questions` or
`options` arrive from the session transcript as a non-array value, the
`.map()` / `.some()` calls throw and take down the whole chat view via the
error boundary.

Guard both with `Array.isArray()` so a single malformed message can no longer
crash the interface.
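To see why the truthiness check was not enough: a string such as `"oops"` is truthy and has a non-zero `length`, so it passes `!questions || questions.length === 0` and then fails on `.map()`. A minimal sketch of the `Array.isArray()` approach, using hypothetical helper names rather than the component's real code:

```typescript
// Coerce an untrusted value to an array; anything that is not an array
// becomes an empty list.
function asArray(value: unknown): unknown[] {
  return Array.isArray(value) ? value : [];
}

// Collect option labels per question without ever calling .map() on a
// non-array value.
function optionLabels(input: { questions?: unknown }): string[][] {
  return asArray(input.questions).map((q) => {
    const options = (q as { options?: unknown } | null)?.options;
    return asArray(options).map((o) =>
      String((o as { label?: unknown } | null)?.label ?? ''),
    );
  });
}
```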

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(chat): cover QuestionAnswerContent against malformed AskUserQuestion payloads

Adds the first frontend regression test, guarding the crash fixed in the
previous commit: a non-array `questions` value or a question missing its
`options` array must render gracefully instead of throwing
"e.map is not a function" and taking down the whole chat interface.

Follows the repo's existing test convention (node:test + tsx); uses
react-dom/server renderToStaticMarkup so no DOM/jsdom is required.
Run with: npx tsx --test src/**/QuestionAnswerContent.test.tsx

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(chat): harden QuestionAnswerContent against malformed question entries

Addresses review feedback: even with the array guards, a malformed transcript
could still crash before the options fallback ran —

- a `questions` entry that is null/non-object threw on `q.question` access
- a non-string `answers[q.question]` threw on `answer.split(', ')`

Skip entries that aren't a proper question object with a string prompt, and
only call string methods on the answer when it is actually a string. Extends
the regression test to cover both vectors.
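The two extra vectors can be illustrated with a small sketch. `isQuestion` and `selectedAnswers` are hypothetical names; only the `answer.split(', ')` call and the `q.question` access come from the commit message.

```typescript
type Question = { question: string };

// A usable entry is a non-null object with a string prompt.
function isQuestion(q: unknown): q is Question {
  return (
    typeof q === 'object' &&
    q !== null &&
    typeof (q as { question?: unknown }).question === 'string'
  );
}

// Map each valid question to its selected answers, skipping malformed
// entries and treating non-string answers as "nothing selected".
function selectedAnswers(
  questions: unknown,
  answers: Record<string, unknown>,
): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  if (!Array.isArray(questions)) return out;
  for (const q of questions) {
    if (!isQuestion(q)) continue; // null, numbers, objects without a prompt
    const answer = answers[q.question];
    out[q.question] = typeof answer === 'string' ? answer.split(', ') : [];
  }
  return out;
}
```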

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(chat): guard malformed question options

---------

Co-authored-by: hustuhao <hustuhao@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
2026-06-26 16:47:24 +02:00
Haile
591e8e7642 fix: voice tts format settings (#919)
* feat(voice): add optional speech-to-text input and read-aloud TTS

Adds a push-to-talk mic button in the composer and a read-aloud button on
assistant messages. Both are opt-in and hidden unless a voice backend is
configured via VOICE_SIDECAR_URL.

The auth-gated /api/voice proxy forwards to a configurable backend exposing
/transcribe and /tts (provider-agnostic); the frontend probes /api/voice/health
and hides the controls when disabled. Adds i18n keys and docs/voice.md.

Includes a local, no-API-key reference backend in voice-sidecar/ (faster-whisper
for STT, Kokoro-82M for TTS, both CPU-capable).

* refactor(voice): provider-agnostic backend and in-app config

Switches the voice proxy to the OpenAI audio API (/v1/audio/transcriptions and
/v1/audio/speech) so it works with OpenAI, Groq, or a local server. Adds a
Settings -> Voice tab (base URL, API key, models, voice) plus a Quick Settings
toggle, and removes the bundled Python sidecar.

Review fixes: stop mic tracks on unmount, clear the global TTS stop handler and
revoke leaked blob URLs, add fetch timeouts in the proxy, surface mic errors in
the button, trim before appending transcripts, and drop the repo-wide wav ignore.

* fix(voice): relax backend timeout and surface timeout errors

Bumps the proxy timeout to 5 minutes (VOICE_TIMEOUT_MS) since local TTS can
synthesize long messages at roughly real-time, and returns a clear timed-out
message (504) instead of failing silently. The read-aloud button now shows
backend errors.

* fix(voice): play read-aloud through an app-level player to stop cutoffs

Read-aloud now runs in a single module-level player outside the React tree instead
of per-message component state. Switching chats or re-rendering a message no longer
revokes the blob URL mid-play (the 'Invalid URI' cutoff). Adds content-keyed caching so
re-listening doesn't regenerate, and reuses one audio element (also unlocks iOS once).

* fix(voice): address review (SSRF guard, auth mapping, client timeout)

Validates the user-supplied backend URL (http/https only, blocks the link-local
metadata range) to prevent SSRF; remaps upstream 401/403 so a bad voice API key
isn't read as the app's own auth failing; adds a client-side AbortController timeout
on the read-aloud request so the button can't sit in loading if a request stalls.

* docs(voice): provider-agnostic wording and jsdoc on proxy functions

Drop leftover sidecar/faster-whisper references now that the backend is any
OpenAI-compatible voice API, and add JSDoc to the voice-proxy functions so the
docstring coverage check passes.

* fix(voice): harden timeout parsing, tts input check, and player abort

- fall back to the default when VOICE_TIMEOUT_MS is non-numeric or <= 0, so a
  bad override can't make the abort fire immediately
- type-check the tts `text` before calling .trim() so a non-string body returns
  400 instead of throwing
- abort the in-flight TTS fetch on stop() and on a superseding play, so tapping
  read-aloud repeatedly doesn't leave orphaned requests generating audio
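The third point, aborting on `stop()` and on a superseding play, follows a common single-flight pattern with `AbortController`. The class below is a hypothetical sketch of that pattern, not the player's actual code:

```typescript
// Tracks at most one in-flight request. Starting a new one aborts the old one.
class SingleFlight {
  private current: AbortController | null = null;

  // Call before each fetch; pass the returned signal as `fetch(url, { signal })`.
  begin(): AbortSignal {
    this.current?.abort(); // a superseding play cancels the previous request
    this.current = new AbortController();
    return this.current.signal;
  }

  // Cancel whatever is in flight. Safe to call when nothing is running.
  stop(): void {
    this.current?.abort();
    this.current = null;
  }
}
```

A fetch started with an aborted signal rejects with an `AbortError`, so the caller can tell cancellation apart from a real backend failure.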

* feat(voice): send transcript with the main send button while recording

While dictating, the main send button stops recording, transcribes, and sends
in one tap, matching the Codex-style flow. The mic button still stops and drops
the transcript into the input box to edit before sending. Voice recording state
is lifted into the composer so both buttons share it, and the send button is
enabled (not grayed) while recording. Also fixes a pre-existing type error: the
quick-settings preferences map was missing voiceEnabled.

* fix(voice): make stop() idempotent so a double tap can't throw

Guard on the recorder's own state instead of React state, so a double tap or
the mic and send buttons both firing won't call stop() on an already-inactive
MediaRecorder.

* fix(voice): expose TTS format in user settings

* fix(voice): harden recording and backend behavior

Redirects could bypass the backend URL guard, and TTS playback waited for full buffering.

Recording could overlap or finish after teardown. Controls also ignored backend readiness.

Explicit formats and config-aware cache keys prevent stale audio after settings change.
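A config-aware cache key can be as simple as folding the relevant settings into the key alongside the text. The sketch below is an assumption about the general idea; `TtsConfig`, `ttsCacheKey`, and the `Map` cache are hypothetical and do not reflect how `voicePlayer` actually derives its keys.

```typescript
// Hypothetical subset of the voice settings that affect the generated audio.
type TtsConfig = { model: string; voice: string; format: string };

// Same text with different settings must not share a cache entry.
function ttsCacheKey(text: string, cfg: TtsConfig): string {
  return JSON.stringify([cfg.model, cfg.voice, cfg.format, text]);
}

// Illustrative cache: key -> object URL (or any handle to the audio).
const audioCache = new Map<string, string>();

function cachedAudio(text: string, cfg: TtsConfig): string | undefined {
  return audioCache.get(ttsCacheKey(text, cfg));
}
```

Serializing the parts as a JSON array avoids collisions that plain string concatenation could produce when one field's value contains the separator.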

* fix(voice): validate config and request boundaries

Malformed stored settings could break voice requests instead of using safe defaults.

Health results could outlive auth changes. URL checks also did not guard the fetch sink.

Remove constant recorder branches so lifecycle cancellation stays clear.

* fix(voice): separate client and server backends

User-selected backend URLs must remain usable without letting clients control server requests.

Call custom providers from the browser while keeping the server proxy bound to its configured host.

This restores voice controls for frontend settings without reopening the SSRF path.

* fix: hide voice options until enabled

---------

Co-authored-by: newsbubbles <nathaniel.gibson@gmail.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
2026-06-26 16:06:40 +02:00
Haile
c947eaaee5 feat: play sound for pending tool requests (#918) 2026-06-25 14:57:10 +02:00
41 changed files with 1921 additions and 237 deletions

@@ -61,6 +61,7 @@ import userRoutes from './routes/user.js';
import geminiRoutes from './routes/gemini.js';
import pluginsRoutes from './routes/plugins.js';
import providerRoutes from './modules/providers/provider.routes.js';
import voiceRoutes from './voice-proxy.js';
import browserUseRoutes from './modules/browser-use/browser-use.routes.js';
import browserUseMcpRoutes from './modules/browser-use/browser-use-mcp.routes.js';
import { browserUseService } from './modules/browser-use/browser-use.service.js';
@@ -222,6 +223,8 @@ app.use('/api/providers', authenticateToken, providerRoutes);
// Agent API Routes (uses API key authentication)
app.use('/api/agent', agentRoutes);
app.use('/api/voice', authenticateToken, voiceRoutes);
// Serve public files (like api-docs.html)
app.use(express.static(path.join(APP_ROOT, 'public')));

server/voice-proxy.js

@@ -0,0 +1,224 @@
// Optional voice proxy — forwards STT/TTS to an OpenAI-compatible audio backend.
//
// The backend is whatever the user points at: OpenAI, Groq, or a local server
// (LocalAI / Speaches / Kokoro-FastAPI / openedai-speech / etc.). It must expose the
// standard OpenAI audio endpoints:
// POST {base}/audio/transcriptions (multipart 'file' + 'model') -> { text }
// POST {base}/audio/speech ({ model, voice, input }) -> audio bytes
//
// Config is resolved per-request from headers (set by the client's voice settings),
// falling back to server env defaults. Mounted at /api/voice behind authenticateToken.
import { Readable } from 'node:stream';
import express from 'express';
const ENV = {
baseUrl: (process.env.VOICE_API_BASE_URL || '').replace(/\/$/, ''),
apiKey: process.env.VOICE_API_KEY || '',
sttModel: process.env.VOICE_STT_MODEL || 'whisper-1',
ttsModel: process.env.VOICE_TTS_MODEL || 'tts-1',
ttsVoice: process.env.VOICE_TTS_VOICE || 'alloy',
};
/**
* Resolve the voice backend config for a request. Client headers (set from the
* user's in-app voice settings) take precedence over the server env defaults.
* @param {import('express').Request} req
* @returns {{baseUrl: string, apiKey: string, sttModel: string, ttsModel: string, ttsVoice: string, ttsFormat: string}}
*/
function resolveConfig(req) {
const h = req.headers;
return {
// Security: do not allow clients to control the outbound backend host.
// Always use the server-side configured base URL.
baseUrl: ENV.baseUrl,
apiKey: String(h['x-voice-api-key'] || '') || ENV.apiKey,
sttModel: String(h['x-voice-stt-model'] || '') || ENV.sttModel,
ttsModel: String(h['x-voice-tts-model'] || '') || ENV.ttsModel,
ttsVoice: String(h['x-voice-tts-voice'] || '') || ENV.ttsVoice,
ttsFormat: String(h['x-voice-tts-format'] || '').trim(),
};
}
const router = express.Router();
// Generous by default — local TTS can synthesize long messages at ~real-time on CPU.
// Guard against a non-numeric/zero override that would make setTimeout fire immediately.
const DEFAULT_VOICE_TIMEOUT_MS = 300000;
const _parsedTimeout = Number(process.env.VOICE_TIMEOUT_MS);
const VOICE_TIMEOUT_MS = Number.isFinite(_parsedTimeout) && _parsedTimeout > 0
? _parsedTimeout
: DEFAULT_VOICE_TIMEOUT_MS;
/**
* fetch() with an AbortController timeout so a stalled backend can't hold the
* request open indefinitely. Aborts after VOICE_TIMEOUT_MS.
* @param {string} url
* @param {RequestInit} [options]
* @returns {Promise<Response>}
*/
async function fetchWithTimeout(url, options = {}) {
const parsed = new URL(url);
if (!['http:', 'https:'].includes(parsed.protocol) || !isAllowedBackendUrl(parsed.origin)) {
throw new Error('Blocked outbound voice backend URL');
}
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), VOICE_TIMEOUT_MS);
try {
return await fetch(parsed.toString(), { redirect: 'manual', ...options, signal: controller.signal });
} finally {
clearTimeout(timer);
}
}
/**
* Turn a backend fetch failure into a clear, actionable client response:
* 504 on timeout (AbortError), 502 otherwise.
* @param {import('express').Response} res
* @param {Error} e
*/
function backendError(res, e) {
if (e && e.name === 'AbortError') {
return res.status(504).json({
error: `Voice backend timed out after ${Math.round(VOICE_TIMEOUT_MS / 1000)}s. Check your voice backend.`,
});
}
return res.status(502).json({ error: `Voice backend unreachable: ${e.message}` });
}
/**
* SSRF guard for the user-configurable backend URL: allow http/https only and
* block the link-local / cloud-metadata range (169.254.x). localhost and private
* ranges are allowed on purpose so users can point at a local voice server
* (LocalAI, Speaches, Kokoro-FastAPI, etc.).
* @param {string} raw
* @returns {boolean}
*/
function isAllowedBackendUrl(raw) {
let u;
try {
u = new URL(raw);
} catch {
return false;
}
if (u.protocol !== 'http:' && u.protocol !== 'https:') return false;
if (u.hostname === '169.254.169.254' || u.hostname.startsWith('169.254.')) return false;
return true;
}
/**
* Relay an upstream (backend) error to the client without making an upstream
* 401/403 look like the user's own app login failed.
* @param {import('express').Response} res
* @param {number} status
* @param {string} [text]
*/
function upstreamError(res, status, text) {
if (status === 401 || status === 403) {
return res.status(502).json({ error: 'Voice backend rejected the request (check the API key).' });
}
return res.status(status).json({ error: text || 'voice backend error' });
}
let _upload = null;
/**
* Lazily build a memory-storage multer instance (25 MB cap) for audio uploads,
* so multer is only imported when the voice feature is actually used.
* @returns {Promise<import('multer').Multer>}
*/
async function getUpload() {
if (!_upload) {
const multer = (await import('multer')).default;
_upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 25 * 1024 * 1024 } });
}
return _upload;
}
/**
* Build the Authorization header for the backend, or an empty object when no
* key is configured (e.g. a local server that needs none).
* @param {string} apiKey
* @returns {Record<string, string>}
*/
function authHeader(apiKey) {
return apiKey ? { Authorization: `Bearer ${apiKey}` } : {};
}
/**
* GET /api/voice/health -> { configured } (true when a backend base URL is set).
*/
router.get('/health', (req, res) => {
res.json({ configured: Boolean(resolveConfig(req).baseUrl) });
});
/**
* POST /api/voice/transcribe (multipart 'audio') -> { text }.
* Forwards the uploaded audio to the backend's /audio/transcriptions endpoint.
*/
router.post('/transcribe', async (req, res) => {
const cfg = resolveConfig(req);
if (!cfg.baseUrl) return res.status(503).json({ error: 'No voice backend configured' });
if (!isAllowedBackendUrl(cfg.baseUrl)) return res.status(400).json({ error: 'Invalid voice backend URL.' });
const upload = await getUpload();
upload.single('audio')(req, res, async (err) => {
if (err) return res.status(400).json({ error: err.message });
if (!req.file) return res.status(400).json({ error: 'No audio uploaded' });
try {
const fd = new FormData();
fd.append(
'file',
new Blob([req.file.buffer], { type: req.file.mimetype || 'audio/webm' }),
req.file.originalname || 'recording.webm',
);
fd.append('model', cfg.sttModel);
const r = await fetchWithTimeout(`${cfg.baseUrl}/audio/transcriptions`, {
method: 'POST',
headers: authHeader(cfg.apiKey),
body: fd,
});
const text = await r.text();
if (!r.ok) return upstreamError(res, r.status, text);
let data;
try { data = JSON.parse(text); } catch { data = { text }; }
res.json({ text: data.text ?? '' });
} catch (e) {
backendError(res, e);
}
});
});
/**
* POST /api/voice/tts { text } -> audio bytes.
* Forwards the text to the backend's /audio/speech endpoint and streams the audio back.
*/
router.post('/tts', async (req, res) => {
const cfg = resolveConfig(req);
if (!cfg.baseUrl) return res.status(503).json({ error: 'No voice backend configured' });
if (!isAllowedBackendUrl(cfg.baseUrl)) return res.status(400).json({ error: 'Invalid voice backend URL.' });
const text = req.body?.text;
if (typeof text !== 'string' || !text.trim()) return res.status(400).json({ error: 'text required' });
try {
const r = await fetchWithTimeout(`${cfg.baseUrl}/audio/speech`, {
method: 'POST',
headers: { 'Content-Type': 'application/json', ...authHeader(cfg.apiKey) },
body: JSON.stringify({
model: cfg.ttsModel,
voice: cfg.ttsVoice,
input: text,
...(cfg.ttsFormat ? { response_format: cfg.ttsFormat } : {}),
}),
});
if (!r.ok) {
const errText = await r.text().catch(() => 'tts failed');
return upstreamError(res, r.status, errText);
}
res.setHeader('Content-Type', r.headers.get('content-type') || 'audio/mpeg');
res.setHeader('Cache-Control', 'no-store');
if (!r.body) return res.end();
Readable.fromWeb(r.body).on('error', (error) => res.destroy(error)).pipe(res);
} catch (e) {
backendError(res, e);
}
});
export default router;

@@ -775,6 +775,17 @@ export function useChatComposerState({
handleSubmitRef.current = handleSubmit;
}, [handleSubmit]);
// A voice transcript either fills the input (to edit before sending) or, when the
// user tapped "stop and send", is submitted straight away. Mirror the value into
// inputValueRef synchronously so handleSubmit reads the new text, not the stale state.
const handleVoiceTranscript = useCallback((text: string, send?: boolean) => {
const base = inputValueRef.current.trim();
const next = base ? `${base} ${text}` : text;
setInput(next);
inputValueRef.current = next;
if (send) handleSubmitRef.current?.(createFakeSubmitEvent());
}, [setInput]);
useEffect(() => {
inputValueRef.current = input;
}, [input]);
@@ -1013,6 +1024,7 @@ export function useChatComposerState({
isDragActive,
openImagePicker: open,
handleSubmit,
handleVoiceTranscript,
handleInputChange,
handleKeyDown,
handlePaste,

@@ -207,6 +207,15 @@ export function normalizedToChatMessages(messages: NormalizedMessage[]): ChatMes
break;
}
// A result with a toolId but no matching tool_use in the loaded set is
// almost always a tool_use/tool_result pair split across a pagination
// boundary (older page not loaded yet). Rendering its raw content here
// produces an unstyled dump that "fixes itself" once the older page
// loads; skip it and let it attach to its tool_use when that arrives.
if (msg.toolId) {
break;
}
const content = formatToolResultContent(msg.content || '');
if (!content.trim()) {
break;

@@ -0,0 +1,33 @@
import { useCallback, useEffect, useState } from 'react';
import { voicePlayer, voiceId, type VoiceSnapshot } from '../../../lib/voicePlayer';
export type TtsState = VoiceSnapshot['state'];
/**
* Thin adapter over the app-level voicePlayer. Playback lives outside React (see
* lib/voicePlayer), so switching chats or re-rendering a message no longer cuts the
* audio off. This hook just reflects the player's state for one message and forwards taps.
*/
export function useTts(getText: () => string) {
const content = getText();
const id = voiceId(content);
const [snap, setSnap] = useState<VoiceSnapshot>(() => voicePlayer.getSnapshot(id));
useEffect(() => {
const update = () =>
setSnap((prev) => {
const next = voicePlayer.getSnapshot(id);
return prev.state === next.state && prev.error === next.error ? prev : next;
});
update();
return voicePlayer.subscribe(update);
}, [id]);
const toggle = useCallback(() => {
voicePlayer.unlock(); // synchronous, within the click gesture (iOS)
voicePlayer.toggle(content);
}, [content]);
return { state: snap.state, toggle, error: snap.error };
}

@@ -0,0 +1,85 @@
import { useEffect, useState } from 'react';
import { authenticatedFetch } from '../../../utils/api';
import { readVoiceConfig, VOICE_CONFIG_SYNC_EVENT } from '../../../hooks/useVoiceConfig';
// Voice UI is gated on the `voiceEnabled` UI preference (toggled in Quick Settings /
// the Settings modal) and a configured voice backend.
const STORAGE_KEY = 'uiPreferences';
const SYNC_EVENT = 'ui-preferences:sync';
let healthRequest: Promise<boolean> | null = null;
function checkVoiceHealth(): Promise<boolean> {
if (healthRequest) return healthRequest;
const request = authenticatedFetch('/api/voice/health')
.then(async (response) => {
if (!response.ok) throw new Error(`Voice health check failed (${response.status})`);
const data = await response.json();
return data?.configured === true;
})
.finally(() => {
healthRequest = null;
});
healthRequest = request;
return request;
}
function readVoiceEnabled(): boolean {
try {
const raw = localStorage.getItem(STORAGE_KEY);
if (!raw) return false;
const parsed = JSON.parse(raw);
return parsed?.voiceEnabled === true || parsed?.voiceEnabled === 'true';
} catch {
return false;
}
}
export function useVoiceAvailable(): boolean {
const [enabled, setEnabled] = useState<boolean>(() =>
typeof window === 'undefined' ? false : readVoiceEnabled(),
);
const [available, setAvailable] = useState(false);
useEffect(() => {
const update = () => setEnabled(readVoiceEnabled());
window.addEventListener('storage', update);
window.addEventListener(SYNC_EVENT, update as EventListener);
return () => {
window.removeEventListener('storage', update);
window.removeEventListener(SYNC_EVENT, update as EventListener);
};
}, []);
useEffect(() => {
let active = true;
let requestId = 0;
const check = async () => {
if (!enabled) {
setAvailable(false);
return;
}
if (readVoiceConfig().baseUrl.trim()) {
setAvailable(true);
return;
}
const id = ++requestId;
try {
const result = await checkVoiceHealth();
if (active && id === requestId) setAvailable(result);
} catch {
if (active && id === requestId) setAvailable(false);
}
};
void check();
window.addEventListener(VOICE_CONFIG_SYNC_EVENT, check);
return () => {
active = false;
window.removeEventListener(VOICE_CONFIG_SYNC_EVENT, check);
};
}, [enabled]);
return enabled && available;
}

@@ -0,0 +1,149 @@
import { useCallback, useEffect, useRef, useState } from 'react';
import { transcribeVoice } from '../../../lib/voiceApi';
// Mobile-safe recording: iOS Safari 18.4+ supports webm/opus; older iOS needs mp4.
const MIME_CANDIDATES = [
'audio/webm;codecs=opus',
'audio/webm',
'audio/mp4',
'audio/ogg;codecs=opus',
'audio/ogg',
];
function pickMime(): string {
for (const t of MIME_CANDIDATES) {
try {
if (typeof MediaRecorder !== 'undefined' && MediaRecorder.isTypeSupported(t)) return t;
} catch {
/* isTypeSupported can throw on some iOS versions */
}
}
return '';
}
export type VoiceInputState = 'idle' | 'recording' | 'transcribing';
/**
* Push-to-talk dictation. Records the mic, uploads to /api/voice/transcribe
* (an OpenAI-compatible speech-to-text backend via the Express proxy), and
* returns the transcript through onTranscript.
*/
export function useVoiceInput(
onTranscript: (text: string, send?: boolean) => void,
onError?: (msg: string) => void,
) {
const [state, setState] = useState<VoiceInputState>('idle');
const recorderRef = useRef<MediaRecorder | null>(null);
const chunksRef = useRef<Blob[]>([]);
const streamRef = useRef<MediaStream | null>(null);
const cancelledRef = useRef(false);
const startingRef = useRef(false);
// Whether the in-progress stop should auto-send the transcript (vs just fill the box).
const sendRef = useRef(false);
const stopTracks = () => {
streamRef.current?.getTracks().forEach((t) => t.stop());
streamRef.current = null;
};
// Stop the mic if the component unmounts mid-recording.
useEffect(() => {
cancelledRef.current = false;
return () => {
cancelledRef.current = true;
startingRef.current = false;
streamRef.current?.getTracks().forEach((t) => t.stop());
streamRef.current = null;
recorderRef.current = null;
};
}, []);
const start = useCallback(async () => {
if (startingRef.current || (recorderRef.current && recorderRef.current.state !== 'inactive')) return;
startingRef.current = true;
try {
const stream = await navigator.mediaDevices.getUserMedia({
audio: { echoCancellation: true, noiseSuppression: true },
});
if (cancelledRef.current) {
stream.getTracks().forEach((t) => t.stop());
return;
}
streamRef.current = stream;
const mimeType = pickMime();
const rec = mimeType ? new MediaRecorder(stream, { mimeType }) : new MediaRecorder(stream);
recorderRef.current = rec;
chunksRef.current = [];
rec.ondataavailable = (e) => {
if (e.data.size > 0) chunksRef.current.push(e.data);
};
rec.onstop = async () => {
stopTracks();
if (cancelledRef.current) return;
// Capture and clear the send intent for this stop before any async work.
const shouldSend = sendRef.current;
sendRef.current = false;
const type = rec.mimeType || 'audio/webm';
const blob = new Blob(chunksRef.current, { type });
if (blob.size < 800) {
setState('idle');
onError?.('Recording too short');
return;
}
setState('transcribing');
try {
const ext = type.includes('mp4') ? 'm4a' : type.includes('ogg') ? 'ogg' : 'webm';
const res = await transcribeVoice(blob, `recording.${ext}`);
if (!res.ok) throw new Error(`transcribe ${res.status}`);
const data = await res.json();
if (cancelledRef.current) return;
const text = String(data?.text || '').trim();
if (text) onTranscript(text, shouldSend);
else onError?.('No speech detected');
} catch (e) {
if (!cancelledRef.current) {
onError?.(`Transcription failed: ${e instanceof Error ? e.message : String(e)}`);
}
} finally {
if (!cancelledRef.current) setState('idle');
}
};
rec.start();
setState('recording');
} catch (e) {
recorderRef.current = null;
stopTracks();
if (cancelledRef.current) return;
const err = e as { name?: string; message?: string };
let msg = `Mic error: ${err?.message || e}`;
if (err?.name === 'NotAllowedError') msg = 'Microphone access denied.';
else if (err?.name === 'NotFoundError') msg = 'No microphone found.';
onError?.(msg);
setState('idle');
} finally {
startingRef.current = false;
}
}, [onTranscript, onError]);
// Stop recording. Pass { send: true } to auto-send the transcript once it's ready.
// Guard on the recorder's own state (not React state) so a double tap, or the mic
// and Send buttons both firing, can't call stop() on an already-inactive recorder.
const stop = useCallback((opts?: { send?: boolean }) => {
const rec = recorderRef.current;
if (rec && rec.state !== 'inactive') {
sendRef.current = opts?.send ?? false;
rec.stop();
}
}, []);
const toggle = useCallback(() => {
if (state === 'recording') stop();
else if (state === 'idle') start();
}, [state, start, stop]);
return { state, toggle, stop };
}

@@ -4,7 +4,7 @@ import type { Project } from '../../../types/app';
import type { SubagentChildTool } from '../types/types';
import { getToolConfig } from './configs/toolConfigs';
-import { OneLineDisplay, CollapsibleDisplay, ToolDiffViewer, MarkdownContent, FileListContent, TodoListContent, TaskListContent, TextContent, QuestionAnswerContent, SubagentContainer } from './components';
+import { OneLineDisplay, BashCommandDisplay, CollapsibleDisplay, ToolDiffViewer, MarkdownContent, FileListContent, TodoListContent, TaskListContent, TextContent, QuestionAnswerContent, SubagentContainer } from './components';
import { PlanDisplay } from './components/PlanDisplay';
import { ToolStatusBadge } from './components/ToolStatusBadge';
import type { ToolStatus } from './components/ToolStatusBadge';
@@ -125,6 +125,31 @@ export const ToolRenderer: React.FC<ToolRendererProps> = memo(({
if (!displayConfig) return null;
// Bash renders as a Codex-style command row: the command on a single line with
// a chevron that expands to show the output inline. The combined view lives on
// the input render; the separate result section is suppressed in MessageComponent.
if (toolName === 'Bash' && mode === 'input') {
const command = parsedData?.command || '';
const description = parsedData?.description;
const output = typeof toolResult?.content === 'string'
? toolResult.content
: toolResult?.content != null
? String(toolResult.content)
: '';
return (
<BashCommandDisplay
command={command}
description={description}
output={output}
isError={Boolean(toolResult?.isError)}
status={toolStatus !== 'completed' ? toolStatus : undefined}
// Commands stay collapsed by default (even consecutive ones); only
// failures auto-expand so they remain visible.
defaultOpen={false}
/>
);
}
if (displayConfig.type === 'one-line') {
const value = displayConfig.getValue?.(parsedData) || '';
const secondary = displayConfig.getSecondary?.(parsedData);

@@ -0,0 +1,155 @@
import React, { useEffect, useRef, useState } from 'react';
import { ChevronRight, Copy, Check } from 'lucide-react';
import { cn } from '../../../../lib/utils';
import { copyTextToClipboard } from '../../../../utils/clipboard';
import { ToolStatusBadge } from './ToolStatusBadge';
import type { ToolStatus } from './ToolStatusBadge';
interface BashCommandDisplayProps {
command: string;
description?: string;
/** Combined stdout/stderr from the tool result (empty while running). */
output?: string;
isError?: boolean;
status?: ToolStatus;
defaultOpen?: boolean;
}
/**
* Codex-in-VSCode style command row: a compact, single-line command with a
* chevron on the left. When the command produced output, the row becomes a
* dropdown that expands to reveal the output inline. Theme-integrated surfaces
* keep it clean in both light and dark mode; consecutive commands stack tightly
* into a clean list.
*/
export const BashCommandDisplay: React.FC<BashCommandDisplayProps> = ({
command,
description,
output,
isError = false,
status,
defaultOpen = false,
}) => {
const trimmedOutput = (output || '').replace(/\s+$/, '');
const hasOutput = trimmedOutput.length > 0;
const outputLineCount = hasOutput ? trimmedOutput.split('\n').length : 0;
const isRunning = status === 'running';
const [open, setOpen] = useState(false);
const [copied, setCopied] = useState(false);
// Output (and errors) often arrive after this component first mounts, so apply
// the auto-open intent once when there is finally something to show. After that
// the user is in control of the toggle.
const autoAppliedRef = useRef(false);
useEffect(() => {
if (!autoAppliedRef.current && hasOutput && (defaultOpen || isError)) {
autoAppliedRef.current = true;
setOpen(true);
}
}, [hasOutput, defaultOpen, isError]);
const toggle = () => {
if (hasOutput) {
setOpen((prev) => !prev);
}
};
const handleCopy = async (event: React.MouseEvent) => {
event.stopPropagation();
const didCopy = await copyTextToClipboard(command);
if (!didCopy) return;
setCopied(true);
setTimeout(() => setCopied(false), 2000);
};
return (
<div
className={cn(
'group/cmd overflow-hidden rounded-lg border bg-muted/40 backdrop-blur-sm transition-all duration-200',
isError ? 'border-red-500/30' : 'border-border/60',
hasOutput && !open && 'hover:border-border hover:bg-muted/60',
open && 'bg-muted/50 shadow-sm',
)}
>
{/* Command header — clickable when there is output to expand */}
<div
role={hasOutput ? 'button' : undefined}
tabIndex={hasOutput ? 0 : undefined}
aria-expanded={hasOutput ? open : undefined}
onClick={toggle}
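onKeyDown={(event) => {
// Ignore key presses bubbling up from the nested copy button so that
// Enter/Space on it still copies instead of toggling the row.
if (event.target !== event.currentTarget) return;
if (hasOutput && (event.key === 'Enter' || event.key === ' ')) {
event.preventDefault();
toggle();
}
}}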
className={cn(
'flex items-center gap-2 px-2.5 py-1.5 outline-none',
hasOutput && 'cursor-pointer focus-visible:ring-1 focus-visible:ring-ring',
)}
>
<ChevronRight
className={cn(
'h-3.5 w-3.5 flex-shrink-0 text-muted-foreground/70 transition-transform duration-200',
open && 'rotate-90',
!hasOutput && 'opacity-0',
)}
/>
<span className="flex-shrink-0 select-none font-mono text-xs font-semibold text-emerald-500 dark:text-emerald-400">
$
</span>
<code
className={cn(
'min-w-0 flex-1 font-mono text-xs text-foreground',
open ? 'whitespace-pre-wrap break-all' : 'truncate',
)}
>
{command}
</code>
{isRunning && (
<span className="h-2.5 w-2.5 flex-shrink-0 animate-spin rounded-full border-[1.5px] border-muted-foreground/30 border-t-emerald-400" />
)}
{status && status !== 'running' && <ToolStatusBadge status={status} className="flex-shrink-0" />}
{!open && hasOutput && !isRunning && (
<span className="flex-shrink-0 text-[10px] tabular-nums text-muted-foreground/70 transition-opacity group-hover/cmd:opacity-0">
{outputLineCount} {outputLineCount === 1 ? 'line' : 'lines'}
</span>
)}
<button
type="button"
onClick={handleCopy}
className="flex-shrink-0 rounded p-0.5 text-muted-foreground/60 opacity-0 transition-all hover:bg-foreground/10 hover:text-foreground focus:opacity-100 group-hover/cmd:opacity-100"
title="Copy command"
aria-label="Copy command"
>
{copied ? <Check className="h-3.5 w-3.5 text-emerald-500" /> : <Copy className="h-3.5 w-3.5" />}
</button>
</div>
{description && !open && (
<div className="truncate px-2.5 pb-1.5 pl-[2.4rem] text-[11px] italic text-muted-foreground/70">
{description}
</div>
)}
{/* Expanded output */}
{open && hasOutput && (
<div className="settings-content-enter border-t border-border/50 bg-background/50">
{description && (
<div className="px-3 pt-2 text-[11px] italic text-muted-foreground/70">{description}</div>
)}
<pre
className={cn(
'max-h-80 overflow-auto whitespace-pre-wrap break-all px-3 py-2 font-mono text-xs leading-relaxed',
isError ? 'text-red-600 dark:text-red-400' : 'text-muted-foreground',
)}
>
{trimmedOutput}
</pre>
</div>
)}
</div>
);
};
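Review note: the output handling in the component above can be read as two small pure steps, sketched here outside React. The helper names `summarizeOutput` and `shouldAutoOpen` are hypothetical and do not exist in the diff; only the trimming regex, the line count, and the open condition are taken from the component.

```typescript
// Hypothetical helpers mirroring the logic in BashCommandDisplay (names are assumptions).
interface OutputSummary {
  trimmed: string;
  hasOutput: boolean;
  lineCount: number;
}

// Strip trailing whitespace, then count lines only when something is left.
function summarizeOutput(output: string | undefined): OutputSummary {
  const trimmed = (output || '').replace(/\s+$/, '');
  const hasOutput = trimmed.length > 0;
  return { trimmed, hasOutput, lineCount: hasOutput ? trimmed.split('\n').length : 0 };
}

// The row opens itself at most once: when output exists and either the caller
// asked for it (defaultOpen) or the command failed (isError).
function shouldAutoOpen(
  alreadyApplied: boolean,
  hasOutput: boolean,
  defaultOpen: boolean,
  isError: boolean,
): boolean {
  return !alreadyApplied && hasOutput && (defaultOpen || isError);
}
```

Keeping the "already applied" flag separate from the open state is what lets the user close an auto-opened error row without it reopening on the next render.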


@@ -0,0 +1,124 @@
import React, { useEffect, useRef, useState } from 'react';
import { ChevronRight, Terminal } from 'lucide-react';
import { cn } from '../../../../lib/utils';
import type { ChatMessage } from '../../types/types';
import { BashCommandDisplay } from './BashCommandDisplay';
import { ToolStatusBadge } from './ToolStatusBadge';
import type { ToolStatus } from './ToolStatusBadge';
interface CommandRunGroupProps {
messages: ChatMessage[];
}
type ExtractedCommand = {
key: string;
command: string;
description?: string;
output: string;
isError: boolean;
status: ToolStatus;
};
function extractCommand(message: ChatMessage, index: number): ExtractedCommand {
let command = '';
let description: string | undefined;
try {
const parsed =
typeof message.toolInput === 'string' ? JSON.parse(message.toolInput) : message.toolInput;
command = parsed?.command || '';
description = parsed?.description;
} catch {
command = typeof message.toolInput === 'string' ? message.toolInput : '';
}
const result = message.toolResult;
const rawContent = result?.content;
const output =
typeof rawContent === 'string' ? rawContent : rawContent != null ? String(rawContent) : '';
const isError = Boolean(result?.isError);
const status: ToolStatus = !result ? 'running' : isError ? 'error' : 'completed';
return {
key: message.toolId || `${command}-${index}`,
command,
description,
output,
isError,
status,
};
}
/**
* Groups a run of consecutive shell commands under a single collapsible header
* (Codex-in-VSCode style). Collapsed by default so long command runs stay tidy;
* expanding reveals every command in the run, each independently expandable for
* its own output.
*/
export const CommandRunGroup: React.FC<CommandRunGroupProps> = ({ messages }) => {
const commands = messages.map(extractCommand);
const count = commands.length;
const anyRunning = commands.some((c) => c.status === 'running');
const anyError = commands.some((c) => c.isError);
const [open, setOpen] = useState(false);
// Surface failed runs without a click: open once when an error first appears.
const autoAppliedRef = useRef(false);
useEffect(() => {
if (!autoAppliedRef.current && anyError) {
autoAppliedRef.current = true;
setOpen(true);
}
}, [anyError]);
return (
<div
className={cn(
'overflow-hidden rounded-xl border bg-muted/30 transition-all duration-200',
anyError ? 'border-red-500/30' : 'border-border/60',
open && 'shadow-sm',
)}
>
<button
type="button"
aria-expanded={open}
onClick={() => setOpen((prev) => !prev)}
className="flex w-full items-center gap-2.5 px-3 py-2 text-left outline-none transition-colors hover:bg-muted/50 focus-visible:ring-1 focus-visible:ring-ring"
>
<ChevronRight
className={cn(
'h-4 w-4 flex-shrink-0 text-muted-foreground/70 transition-transform duration-200',
open && 'rotate-90',
)}
/>
<span className="grid h-6 w-6 flex-shrink-0 place-items-center rounded-md bg-emerald-500/10 text-emerald-600 dark:text-emerald-400">
<Terminal className="h-3.5 w-3.5" />
</span>
<span className="flex-1 text-xs font-medium text-foreground">
{anyRunning ? 'Running' : 'Ran'} <span className="text-muted-foreground">{count} commands</span>
</span>
{anyRunning && (
<span className="h-2.5 w-2.5 flex-shrink-0 animate-spin rounded-full border-[1.5px] border-muted-foreground/30 border-t-emerald-400" />
)}
{anyError && <ToolStatusBadge status="error" className="flex-shrink-0" />}
</button>
{open && (
<div className="settings-content-enter space-y-1 border-t border-border/50 p-2">
{commands.map((cmd) => (
<BashCommandDisplay
key={cmd.key}
command={cmd.command}
description={cmd.description}
output={cmd.output}
isError={cmd.isError}
status={cmd.status !== 'completed' ? cmd.status : undefined}
defaultOpen={false}
/>
))}
</div>
)}
</div>
);
};
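Review note: a minimal sketch of how `extractCommand` above turns a tool message into a command string and a status, written as standalone functions. The `MinimalToolMessage` shape and the three-value `ToolStatus` union are assumptions for illustration; the real `ChatMessage` and `ToolStatus` types are not shown in this diff and may carry more fields or values.

```typescript
// Assumed, simplified types (the real ones live elsewhere in the codebase).
type ToolStatus = 'running' | 'error' | 'completed';

interface MinimalToolMessage {
  toolInput?: unknown;
  toolResult?: { content?: unknown; isError?: boolean } | null;
}

// toolInput may be a JSON string, an already-parsed object, or a raw command
// string that is not valid JSON. Fall back to the raw string in the last case.
function parseCommand(toolInput: unknown): string {
  try {
    const parsed = typeof toolInput === 'string' ? JSON.parse(toolInput) : toolInput;
    return (parsed as { command?: string } | null)?.command || '';
  } catch {
    return typeof toolInput === 'string' ? toolInput : '';
  }
}

// No result yet means the command is still running.
function deriveStatus(message: MinimalToolMessage): ToolStatus {
  const result = message.toolResult;
  if (!result) return 'running';
  return result.isError ? 'error' : 'completed';
}
```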


@@ -0,0 +1,77 @@
import test from 'node:test';
import assert from 'node:assert/strict';
import React from 'react';
import { renderToStaticMarkup } from 'react-dom/server';
import { QuestionAnswerContent } from './QuestionAnswerContent';
// Regression coverage for the chat-interface crash where an AskUserQuestion
// payload loaded from a session transcript arrives with a non-array `questions`
// or a question missing its `options` array. Rendering must degrade gracefully
// instead of throwing "TypeError: e.map is not a function".
test('renders without throwing when questions is a non-array value', () => {
assert.doesNotThrow(() => {
renderToStaticMarkup(
React.createElement(QuestionAnswerContent, {
// Malformed: object instead of an array
questions: { 0: { question: 'q?', options: [{ label: 'a' }] } } as never,
answers: {},
}),
);
});
});
test('renders without throwing when a question is missing options[]', () => {
assert.doesNotThrow(() => {
renderToStaticMarkup(
React.createElement(QuestionAnswerContent, {
questions: [{ question: 'Pick one?', header: 'H' } as never],
answers: { 'Pick one?': 'X' },
}),
);
});
});
test('renders without throwing when options[] contains malformed entries', () => {
assert.doesNotThrow(() => {
renderToStaticMarkup(
React.createElement(QuestionAnswerContent, {
questions: [{ question: 'Pick one?', options: [null, 'oops', { label: 'A' }] } as never],
answers: { 'Pick one?': 'A, Custom' },
}),
);
});
});
test('renders without throwing when a questions entry is null/non-object', () => {
assert.doesNotThrow(() => {
renderToStaticMarkup(
React.createElement(QuestionAnswerContent, {
questions: [null, 'oops', { question: 'Ok?', options: [{ label: 'A' }] }] as never,
answers: {},
}),
);
});
});
test('renders without throwing when an answer is a non-string value', () => {
assert.doesNotThrow(() => {
renderToStaticMarkup(
React.createElement(QuestionAnswerContent, {
questions: [{ question: 'Pick one?', options: [{ label: 'A' }] }],
// Malformed: answer is an object instead of the expected string
answers: { 'Pick one?': { unexpected: true } } as never,
}),
);
});
});
test('still renders a well-formed question + answer', () => {
const html = renderToStaticMarkup(
React.createElement(QuestionAnswerContent, {
questions: [{ question: 'Pick one?', header: 'H', options: [{ label: 'A' }, { label: 'B' }] }],
answers: { 'Pick one?': 'A' },
}),
);
assert.ok(html.includes('Pick one?'));
});


@@ -15,7 +15,11 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
}) => {
const [expandedIdx, setExpandedIdx] = useState<number | null>(null);
if (!questions || questions.length === 0) {
// Tool inputs are runtime data loaded from session transcripts and may be
// malformed (e.g. `questions` arriving as a non-array). Guard with
// Array.isArray so a single bad payload can't crash the whole chat view
// with "e.map is not a function".
if (!Array.isArray(questions) || questions.length === 0) {
return null;
}
@@ -24,11 +28,23 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
return (
<div className={`space-y-2 ${className}`}>
{questions.map((q, idx) => {
{questions.map((rawQuestion, idx) => {
// Entries come from session transcripts and may be malformed; skip
// anything that isn't a proper question object with a string prompt.
if (!rawQuestion || typeof rawQuestion !== 'object' || typeof rawQuestion.question !== 'string') {
return null;
}
const q = rawQuestion;
const answer = answers?.[q.question];
const answerLabels = answer ? answer.split(', ') : [];
// `answer` may be a non-string (or absent) in malformed payloads. An empty
// string still counts as "no answer", as it did before this change.
const answerLabels = typeof answer === 'string' && answer ? answer.split(', ') : [];
const skipped = !answer;
const isExpanded = expandedIdx === idx;
// `options` is typed as an array but comes from untrusted runtime data;
// keep only valid entries so `.some`/`.map` below never throw.
const options = Array.isArray(q.options)
? q.options.filter((opt) => opt && typeof opt === 'object' && typeof opt.label === 'string')
: [];
return (
<div
@@ -74,7 +90,7 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
{!isExpanded && answerLabels.length > 0 && (
<div className="mt-1.5 flex flex-wrap gap-1">
{answerLabels.map((lbl) => {
const isCustom = !q.options.some(o => o.label === lbl);
const isCustom = !options.some(o => o.label === lbl);
return (
<span
key={lbl}
@@ -110,7 +126,7 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
{isExpanded && (
<div className="border-t border-gray-100 px-3 pb-2.5 pt-0.5 dark:border-gray-700/40">
<div className="ml-6.5 space-y-1">
{q.options.map((opt) => {
{options.map((opt) => {
const wasSelected = answerLabels.includes(opt.label);
return (
<div
@@ -148,7 +164,7 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
);
})}
{answerLabels.filter(lbl => !q.options.some(o => o.label === lbl)).map(lbl => (
{answerLabels.filter(lbl => !options.some(o => o.label === lbl)).map(lbl => (
<div
key={lbl}
className="flex items-start gap-2 rounded-lg border border-blue-200/60 bg-blue-50/80 px-2.5 py-1.5 text-[12px] dark:border-blue-800/40 dark:bg-blue-900/20"
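Review note: the guards added in this file amount to a normalization pass over untrusted transcript data. The sketch below expresses that pass as one pure function; `normalizeQuestions` and its types are hypothetical names, not part of the diff, and the component itself applies these checks inline during rendering.

```typescript
// Hypothetical types for illustration only.
interface QuestionOption {
  label: string;
}

interface NormalizedQuestion {
  question: string;
  options: QuestionOption[];
}

// Accept anything, return only well-formed questions with well-formed options.
function normalizeQuestions(questions: unknown): NormalizedQuestion[] {
  if (!Array.isArray(questions)) return [];
  const result: NormalizedQuestion[] = [];
  for (const raw of questions) {
    // Skip null, primitives, and objects without a string prompt.
    if (!raw || typeof raw !== 'object' || typeof raw.question !== 'string') continue;
    const options: QuestionOption[] = Array.isArray(raw.options)
      ? raw.options.filter(
          (opt: unknown): opt is QuestionOption =>
            Boolean(opt) &&
            typeof opt === 'object' &&
            typeof (opt as { label?: unknown }).label === 'string',
        )
      : [];
    result.push({ question: raw.question, options });
  }
  return result;
}
```

The malformed payloads used in the regression tests (an object instead of an array, a question without `options`, `null` entries) all reduce to an empty or shortened list here rather than throwing.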


@@ -1,6 +1,8 @@
export { CollapsibleSection } from './CollapsibleSection';
export { ToolDiffViewer } from './ToolDiffViewer';
export { OneLineDisplay } from './OneLineDisplay';
export { BashCommandDisplay } from './BashCommandDisplay';
export { CommandRunGroup } from './CommandRunGroup';
export { CollapsibleDisplay } from './CollapsibleDisplay';
export { SubagentContainer } from './SubagentContainer';
export * from './ContentRenderers';


@@ -173,6 +173,7 @@ function ChatInterface({
isDragActive,
openImagePicker,
handleSubmit,
handleVoiceTranscript,
handleInputChange,
handleKeyDown,
handlePaste,
@@ -406,6 +407,7 @@ function ChatInterface({
renderInputWithMentions={renderInputWithMentions}
textareaRef={textareaRef}
input={input}
onVoiceTranscript={handleVoiceTranscript}
onInputChange={handleInputChange}
onTextareaClick={handleTextareaClick}
onTextareaKeyDown={handleKeyDown}


@@ -1,4 +1,5 @@
import { useTranslation } from 'react-i18next';
import { useCallback, useEffect, useRef, useState } from 'react';
import type {
ChangeEvent,
ClipboardEvent,
@@ -9,8 +10,10 @@ import type {
RefObject,
TouchEvent,
} from 'react';
import { ImageIcon, MessageSquareIcon, XIcon, ArrowDownIcon } from 'lucide-react';
import { ImageIcon, MessageSquareIcon, XIcon, ArrowDownIcon, Loader2 } from 'lucide-react';
import { useVoiceInput } from '../../hooks/useVoiceInput';
import { useVoiceAvailable } from '../../hooks/useVoiceAvailable';
import type { SessionActivity } from '../../../../hooks/useSessionProtection';
import type { PendingPermissionRequest, PermissionMode } from '../../types/types';
import {
@@ -27,6 +30,7 @@ import {
import CommandMenu from './CommandMenu';
import ActivityIndicator from './ActivityIndicator';
import ImageAttachment from './ImageAttachment';
import VoiceInputButton from './VoiceInputButton';
import PermissionRequestsBanner from './PermissionRequestsBanner';
import TokenUsageSummary from './TokenUsageSummary';
@@ -89,6 +93,7 @@ interface ChatComposerProps {
renderInputWithMentions: (text: string) => ReactNode;
textareaRef: RefObject<HTMLTextAreaElement>;
input: string;
onVoiceTranscript?: (text: string, send?: boolean) => void;
onInputChange: (event: ChangeEvent<HTMLTextAreaElement>) => void;
onTextareaClick: (event: MouseEvent<HTMLTextAreaElement>) => void;
onTextareaKeyDown: (event: KeyboardEvent<HTMLTextAreaElement>) => void;
@@ -142,6 +147,7 @@ export default function ChatComposer({
renderInputWithMentions,
textareaRef,
input,
onVoiceTranscript,
onInputChange,
onTextareaClick,
onTextareaKeyDown,
@@ -154,6 +160,28 @@ export default function ChatComposer({
sendByCtrlEnter,
}: ChatComposerProps) {
const { t } = useTranslation('chat');
// Voice state is hosted here (not in the mic button) so the main Send button can stop
// recording and send the transcript in one tap, while the mic button still stops
// recording and drops the transcript into the input box.
const voiceAvailable = useVoiceAvailable();
const [voiceError, setVoiceError] = useState<string | null>(null);
const voiceErrorTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
const handleVoiceError = useCallback((msg: string) => {
setVoiceError(msg);
if (voiceErrorTimer.current) clearTimeout(voiceErrorTimer.current);
voiceErrorTimer.current = setTimeout(() => setVoiceError(null), 4000);
}, []);
useEffect(() => () => {
if (voiceErrorTimer.current) clearTimeout(voiceErrorTimer.current);
}, []);
const noopTranscript = useCallback(() => {}, []);
const { state: voiceState, toggle: voiceToggle, stop: voiceStop } = useVoiceInput(
onVoiceTranscript ?? noopTranscript,
handleVoiceError,
);
const isRecording = voiceState === 'recording';
const isTranscribing = voiceState === 'transcribing';
const textareaRect = textareaRef.current?.getBoundingClientRect();
const commandMenuPosition = {
top: textareaRect ? Math.max(16, textareaRect.top - 316) : 0,
@@ -309,6 +337,10 @@ export default function ChatComposer({
<ImageIcon />
</PromptInputButton>
{onVoiceTranscript && voiceAvailable && (
<VoiceInputButton state={voiceState} onToggle={voiceToggle} errorMsg={voiceError} />
)}
<button
type="button"
onClick={onModeSwitch}
@@ -387,10 +419,21 @@ export default function ChatComposer({
{sendByCtrlEnter ? t('input.hintText.ctrlEnter') : t('input.hintText.enter')}
</div>
<PromptInputSubmit
onClick={isLoading ? onAbortSession : undefined}
disabled={!isLoading && !input.trim()}
onClick={
isLoading
? onAbortSession
: isRecording
? (e: MouseEvent<HTMLButtonElement>) => {
e.preventDefault();
voiceStop({ send: true });
}
: undefined
}
disabled={!isLoading && !isRecording && (isTranscribing || !input.trim())}
className="h-10 w-10 sm:h-10 sm:w-10"
/>
>
{isTranscribing ? <Loader2 className="h-4 w-4 animate-spin" /> : undefined}
</PromptInputSubmit>
</div>
</PromptInputFooter>
</PromptInput>
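Review note: the submit button now has three behaviours depending on loading and voice state. A minimal sketch of that decision as a pure function follows; `resolveSubmitButton`, `ComposerState`, and the action labels are hypothetical names chosen for illustration, and `'submit'` stands for the default form submission that happens when no `onClick` is supplied.

```typescript
// Hypothetical names; the priorities mirror the onClick/disabled props in the hunk above.
type SubmitAction = 'abort' | 'stop-and-send' | 'submit';

interface ComposerState {
  isLoading: boolean;
  isRecording: boolean;
  isTranscribing: boolean;
  input: string;
}

function resolveSubmitButton(state: ComposerState): { action: SubmitAction; disabled: boolean } {
  // A running session always wins: the button aborts it.
  if (state.isLoading) return { action: 'abort', disabled: false };
  // While recording, the button stops the recording and sends the transcript.
  if (state.isRecording) return { action: 'stop-and-send', disabled: false };
  // Otherwise it is a plain submit, blocked during transcription or on empty input.
  return { action: 'submit', disabled: state.isTranscribing || !state.input.trim() };
}
```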


@@ -1,6 +1,6 @@
import { useTranslation } from 'react-i18next';
import { useCallback, useRef } from 'react';
import type { Dispatch, RefObject, SetStateAction } from 'react';
import type { Dispatch, ReactNode, RefObject, SetStateAction } from 'react';
import type { ChatMessage } from '../../types/types';
import type {
@@ -13,6 +13,7 @@ import { getIntrinsicMessageKey } from '../../utils/messageKeys';
import MessageComponent from './MessageComponent';
import ProviderSelectionEmptyState from './ProviderSelectionEmptyState';
import { CommandRunGroup } from '../../tools';
interface ChatMessagesPaneProps {
scrollContainerRef: RefObject<HTMLDivElement>;
@@ -252,25 +253,81 @@ export default function ChatMessagesPane({
</div>
)}
{visibleMessages.map((message, index) => {
const prevMessage = index > 0 ? visibleMessages[index - 1] : null;
return (
<MessageComponent
key={getMessageKey(message)}
message={message}
prevMessage={prevMessage}
createDiff={createDiff}
onFileOpen={onFileOpen}
onShowSettings={onShowSettings}
onGrantToolPermission={onGrantToolPermission}
autoExpandTools={autoExpandTools}
showRawParameters={showRawParameters}
showThinking={showThinking}
selectedProject={selectedProject}
provider={provider}
/>
);
})}
{(() => {
const isBashCommand = (m: ChatMessage | null | undefined) =>
Boolean(m && m.isToolUse && m.toolName === 'Bash' && !m.isSubagentContainer);
// Messages that render nothing (e.g. thinking hidden when showThinking
// is off) shouldn't break a visual run of commands.
const isRendered = (m: ChatMessage) => !(m.isThinking && !showThinking);
const items: ReactNode[] = [];
for (let index = 0; index < visibleMessages.length; index++) {
const message = visibleMessages[index];
// Collapse a run of 2+ consecutive shell commands under a single
// header so long command runs stay tidy (Codex-in-VSCode style).
// Skip over non-rendered messages (e.g. hidden reasoning that Codex
// interleaves between commands) so they don't split the run.
if (isBashCommand(message)) {
const runIndices = [index];
let cursor = index + 1;
while (cursor < visibleMessages.length) {
const candidate = visibleMessages[cursor];
if (!isRendered(candidate)) {
cursor++;
continue;
}
if (isBashCommand(candidate)) {
runIndices.push(cursor);
cursor++;
continue;
}
break;
}
if (runIndices.length >= 2) {
const groupMessages = runIndices.map((i) => visibleMessages[i]);
items.push(
<CommandRunGroup key={getMessageKey(groupMessages[0])} messages={groupMessages} />,
);
// Consume everything up to the last command in the run (any
// trailing skipped messages render nothing anyway).
index = runIndices[runIndices.length - 1];
continue;
}
}
// Walk back past messages that are not actually rendered (e.g. thinking
// messages hidden when showThinking is off). Otherwise a hidden thinking
// message would make the following message look "grouped" and suppress its
// provider header/icon — which is why Claude turns lost their icon.
let prevMessage: ChatMessage | null = null;
for (let i = index - 1; i >= 0; i--) {
const candidate = visibleMessages[i];
if (candidate.isThinking && !showThinking) continue;
prevMessage = candidate;
break;
}
items.push(
<MessageComponent
key={getMessageKey(message)}
message={message}
prevMessage={prevMessage}
createDiff={createDiff}
onFileOpen={onFileOpen}
onShowSettings={onShowSettings}
onGrantToolPermission={onGrantToolPermission}
autoExpandTools={autoExpandTools}
showRawParameters={showRawParameters}
showThinking={showThinking}
selectedProject={selectedProject}
provider={provider}
/>,
);
}
return items;
})()}
</>
)}
</div>
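Review note: the run-grouping loop above is easier to check in isolation. The sketch below reproduces it as a pure function that returns a list of render items instead of JSX. `Msg`, `RenderItem`, and `groupCommandRuns` are hypothetical names, and `Msg` carries only the fields the loop reads.

```typescript
// Assumed, minimal message shape (the real ChatMessage has more fields).
interface Msg {
  id: string;
  isToolUse?: boolean;
  toolName?: string;
  isThinking?: boolean;
  isSubagentContainer?: boolean;
}

type RenderItem =
  | { kind: 'single'; message: Msg }
  | { kind: 'group'; messages: Msg[] };

function groupCommandRuns(messages: Msg[], showThinking: boolean): RenderItem[] {
  const isBash = (m: Msg) => Boolean(m.isToolUse && m.toolName === 'Bash' && !m.isSubagentContainer);
  // Hidden thinking messages render nothing, so they must not split a run.
  const isRendered = (m: Msg) => !(m.isThinking && !showThinking);

  const items: RenderItem[] = [];
  for (let index = 0; index < messages.length; index++) {
    const message = messages[index];
    if (isBash(message)) {
      const run = [index];
      let cursor = index + 1;
      while (cursor < messages.length) {
        const candidate = messages[cursor];
        if (!isRendered(candidate)) { cursor++; continue; }
        if (isBash(candidate)) { run.push(cursor); cursor++; continue; }
        break;
      }
      if (run.length >= 2) {
        items.push({ kind: 'group', messages: run.map((i) => messages[i]) });
        // Jump to the last command of the run; the loop increment moves past it.
        index = run[run.length - 1];
        continue;
      }
    }
    items.push({ kind: 'single', message });
  }
  return items;
}
```

With thinking hidden, two Bash calls separated by a thinking message collapse into one group; with thinking shown, the same sequence stays as individual items because the thinking message is rendered and ends the run.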


@@ -2,9 +2,7 @@ import { useMemo, useState } from 'react';
import {
Activity,
BadgeCheck,
Check,
CircleHelp,
Clipboard,
Coins,
Cpu,
Gauge,
@@ -59,19 +57,6 @@ type ModelOption = {
description?: string;
};
const formatUpdatedAt = (value?: string) => {
if (!value) {
return 'Not cached yet';
}
const parsed = new Date(value);
if (Number.isNaN(parsed.getTime())) {
return 'Not cached yet';
}
return parsed.toLocaleString();
};
const PROVIDER_LABELS: Record<string, string> = {
claude: 'Claude',
cursor: 'Cursor',
@@ -246,7 +231,6 @@ function HelpContent({ data }: { data: HelpCommandData }) {
function ModelsContent({
data,
providerModelCatalog,
providerModelCacheCatalog,
providerModelsRefreshing,
onHardRefreshProviderModels,
currentSessionId,
@@ -254,14 +238,12 @@ function ModelsContent({
}: {
data: ModelCommandData;
providerModelCatalog: Partial<Record<LLMProvider, ProviderModelsDefinition>>;
providerModelCacheCatalog: Partial<Record<LLMProvider, ProviderModelsCacheInfo>>;
providerModelsRefreshing: boolean;
onHardRefreshProviderModels: () => void;
currentSessionId: string | null;
onSelectProviderModel: CommandResultModalProps['onSelectProviderModel'];
}) {
const [query, setQuery] = useState('');
const [copiedModel, setCopiedModel] = useState<string | null>(null);
const [changingModel, setChangingModel] = useState<string | null>(null);
const [pendingSessionModel, setPendingSessionModel] = useState<string | null>(null);
const [selectionNotice, setSelectionNotice] = useState<string | null>(null);
@@ -269,7 +251,6 @@ function ModelsContent({
const currentModel = data?.current?.model || 'Unknown';
const providerLabel = data?.current?.providerLabel || getProviderLabel(currentProvider);
const liveDefinition = providerModelCatalog[currentProvider];
const currentCache = providerModelCacheCatalog[currentProvider] ?? data?.cache;
const availableOptions = useMemo<ModelOption[]>(() => {
if (liveDefinition?.OPTIONS && liveDefinition.OPTIONS.length > 0) {
return liveDefinition.OPTIONS;
@@ -282,7 +263,6 @@ function ModelsContent({
const availableModels = Array.isArray(data?.availableModels) ? data.availableModels : [];
return availableModels.map((model) => ({ value: model, label: model }));
}, [data, liveDefinition]);
const defaultModel = liveDefinition?.DEFAULT || data?.defaultModel || currentModel;
const filteredOptions = useMemo(() => {
const normalized = query.trim().toLowerCase();
@@ -296,18 +276,8 @@ function ModelsContent({
});
}, [availableOptions, query]);
const activeOption = availableOptions.find((option) => option.value === currentModel);
const hasConcreteSessionId = typeof currentSessionId === 'string' && currentSessionId.trim().length > 0;
const copyModel = (model: string) => {
if (typeof navigator !== 'undefined' && navigator.clipboard) {
void navigator.clipboard.writeText(model).catch(() => undefined);
}
setCopiedModel(model);
window.setTimeout(() => {
setCopiedModel((current) => (current === model ? null : current));
}, 1300);
};
const showSearch = availableOptions.length > 6;
const handleSelectModel = async (model: string) => {
setChangingModel(model);
@@ -330,162 +300,106 @@ function ModelsContent({
};
return (
<div className="flex h-full min-h-0 flex-col gap-2.5">
<div className="rounded-2xl border border-border/70 bg-muted/20 p-2.5">
<div className="grid gap-2.5 lg:grid-cols-[minmax(0,1.55fr)_minmax(12rem,0.7fr)_minmax(15rem,0.9fr)] lg:items-start">
<div className="min-w-0">
<div className="flex flex-wrap items-center gap-2">
<Badge variant="secondary" className="rounded-lg border border-primary/20 bg-primary/10 px-2.5 py-1 text-[10px] font-semibold uppercase tracking-[0.18em] text-primary">
{providerLabel}
</Badge>
<Badge variant="secondary" className="rounded-lg px-2.5 py-1 text-[10px] font-semibold uppercase tracking-[0.18em] text-foreground">
{availableOptions.length} models
</Badge>
</div>
<div className="mt-2 rounded-xl border border-primary/15 bg-primary/[0.06] px-3 py-2">
<p className="text-[11px] font-bold uppercase tracking-[0.2em] text-primary">Active Model</p>
<p className="mt-1 break-all font-mono text-[0.98rem] font-semibold leading-5 text-foreground sm:text-[1.05rem]">
{currentModel}
</p>
{activeOption?.label && activeOption.label !== currentModel && (
<p className="mt-1 text-[11px] font-medium text-foreground/85">{activeOption.label}</p>
)}
{activeOption?.description && (
<p className="mt-0.5 line-clamp-1 text-[11px] text-muted-foreground">{activeOption.description}</p>
)}
{pendingSessionModel && pendingSessionModel !== currentModel && (
<p className="mt-1 text-[10px] font-semibold uppercase tracking-[0.16em] text-primary">
Next response: {pendingSessionModel}
</p>
)}
</div>
</div>
<div className="grid gap-2 sm:grid-cols-2 lg:grid-cols-1">
<div className="rounded-xl border border-border/60 bg-background/55 px-2.5 py-1.5">
<p className="text-[10px] font-bold uppercase tracking-[0.18em] text-foreground/80">Default</p>
<p className="mt-1 break-all font-mono text-[11px] font-medium text-foreground">{defaultModel}</p>
</div>
<div className="rounded-xl border border-border/60 bg-background/55 px-2.5 py-1.5">
<p className="text-[10px] font-bold uppercase tracking-[0.18em] text-foreground/80">Updated</p>
<p className="mt-1 text-[11px] font-medium text-foreground">{formatUpdatedAt(currentCache?.updatedAt)}</p>
</div>
</div>
<div className="rounded-xl border border-border/60 bg-background/55 p-2.5">
<div className="flex flex-wrap items-center gap-1.5">
<p className="text-[10px] font-bold uppercase tracking-[0.18em] text-foreground/80">
Catalog Refresh
</p>
<Badge variant="secondary" className="rounded-md px-1.5 py-0 text-[9px] uppercase tracking-[0.14em]">
All providers
</Badge>
</div>
<p className="mt-1.5 text-[11px] leading-4 text-muted-foreground">
Model lists are cached for 3 days. Refresh after CLI, auth, or config changes,
or when a new model is missing.
</p>
<Button
type="button"
variant="outline"
size="sm"
onClick={onHardRefreshProviderModels}
disabled={providerModelsRefreshing}
className="mt-2 h-8 w-full rounded-xl px-3"
>
<RefreshCw className={providerModelsRefreshing ? 'animate-spin' : ''} />
{providerModelsRefreshing ? 'Refreshing catalogs...' : 'Refresh from providers'}
</Button>
</div>
</div>
<div className="mt-2 border-t border-border/50 pt-1.5 text-[11px] text-muted-foreground">
{hasConcreteSessionId
? 'Selecting a model stores a session override and applies it on the next response for this session.'
: 'Selecting a model updates the default model used for new turns in this provider.'}
{selectionNotice && <span className="ml-2 text-foreground">{selectionNotice}</span>}
<div className="flex h-full min-h-0 flex-col gap-3">
{/* Compact context bar: active model + refresh, no clutter */}
<div className="flex items-center justify-between gap-3 rounded-2xl border border-border/70 bg-muted/20 px-3.5 py-2.5">
<div className="min-w-0">
<p className="text-[10px] font-semibold uppercase tracking-[0.18em] text-muted-foreground">
Active model · {providerLabel}
</p>
<p className="mt-0.5 flex flex-wrap items-center gap-x-2 gap-y-0.5">
<span className="break-all font-mono text-sm font-semibold text-foreground">{currentModel}</span>
{pendingSessionModel && pendingSessionModel !== currentModel && (
<span className="text-[11px] font-semibold uppercase tracking-[0.14em] text-emerald-500 dark:text-emerald-400">
{pendingSessionModel} next
</span>
)}
</p>
</div>
<Button
type="button"
variant="ghost"
size="icon"
onClick={onHardRefreshProviderModels}
disabled={providerModelsRefreshing}
title="Refresh model list from providers"
aria-label="Refresh model list from providers"
className="h-9 w-9 shrink-0 rounded-xl text-muted-foreground hover:text-foreground"
>
<RefreshCw className={`h-4 w-4 ${providerModelsRefreshing ? 'animate-spin' : ''}`} />
</Button>
</div>
<div className="flex min-h-0 flex-1 flex-col rounded-3xl border border-border/70 bg-muted/15 p-3 sm:p-4">
<div className="mb-2.5 grid gap-2 sm:grid-cols-[1fr_auto] sm:items-center">
<div className="min-w-0">
<SearchField value={query} onChange={setQuery} placeholder={`Search ${providerLabel} models...`} />
</div>
<Badge variant="secondary" className="h-9 justify-center rounded-xl px-3 font-mono text-xs">
{filteredOptions.length} shown
</Badge>
</div>
{showSearch && (
<SearchField value={query} onChange={setQuery} placeholder={`Search ${providerLabel} models...`} />
)}
{filteredOptions.length > 0 ? (
<div className="scrollbar-thin min-h-0 flex-1 overflow-y-auto pr-1">
<div className="grid gap-2 md:grid-cols-2">
{filteredOptions.map((option, index) => {
const isCurrent = option.value === currentModel;
const wasCopied = copiedModel === option.value;
const isPendingSelection = option.value === pendingSessionModel;
const isChanging = option.value === changingModel;
return (
<div
key={option.value}
className={`settings-content-enter group flex min-h-[4.5rem] items-start gap-3 rounded-2xl border p-3 shadow-sm transition-all duration-200 hover:-translate-y-0.5 hover:shadow-md ${
isCurrent
? 'border-primary/45 bg-primary/10'
: isPendingSelection
? 'border-emerald-500/35 bg-emerald-500/10'
: 'border-border/70 bg-background/80 hover:border-primary/30 hover:bg-background'
}`}
style={{ animationDelay: `${Math.min(index * 14, 180)}ms` }}
>
<button
type="button"
onClick={() => handleSelectModel(option.value)}
disabled={Boolean(changingModel)}
className="min-w-0 flex-1 text-left focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring"
aria-label={`Use model ${option.value}`}
>
<span className="flex items-center gap-2">
<span className="break-all font-mono text-sm font-semibold text-foreground">{option.value}</span>
{isCurrent && <BadgeCheck className="h-4 w-4 shrink-0 text-primary" />}
</span>
{option.label && option.label !== option.value && (
<span className="mt-1 block text-xs text-muted-foreground">{option.label}</span>
)}
{option.description && (
<span className="mt-1 block text-xs leading-5 text-muted-foreground">{option.description}</span>
)}
{isCurrent && <span className="mt-2 block text-[11px] font-semibold uppercase tracking-[0.16em] text-primary">Current selection</span>}
{isPendingSelection && !isCurrent && (
<span className="mt-2 block text-[11px] font-semibold uppercase tracking-[0.16em] text-emerald-400">
Next response selection
</span>
)}
{isChanging && (
<span className="mt-2 block text-[11px] font-semibold uppercase tracking-[0.16em] text-primary">
Applying...
</span>
)}
</button>
<button
type="button"
onClick={() => copyModel(option.value)}
className="rounded-lg border border-border/70 bg-muted/30 p-2 text-muted-foreground transition-colors group-hover:text-primary"
aria-label={`Copy model id ${option.value}`}
>
{wasCopied ? <Check className="h-4 w-4" /> : <Clipboard className="h-4 w-4" />}
</button>
</div>
);
})}
</div>
{filteredOptions.length > 0 ? (
<div className="scrollbar-thin -mr-1 min-h-0 flex-1 overflow-y-auto pr-1">
<div className="grid gap-2 md:grid-cols-2">
{filteredOptions.map((option, index) => {
const isCurrent = option.value === currentModel;
const isPendingSelection = option.value === pendingSessionModel;
const isChanging = option.value === changingModel;
return (
<button
key={option.value}
type="button"
onClick={() => handleSelectModel(option.value)}
disabled={Boolean(changingModel)}
aria-label={`Select model ${option.value}`}
className={`settings-content-enter group flex min-h-[4rem] flex-col rounded-2xl border p-3 text-left shadow-sm transition-all duration-200 hover:-translate-y-0.5 hover:shadow-md focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring disabled:cursor-default disabled:opacity-60 ${
isCurrent
? 'border-primary/45 bg-primary/10'
: isPendingSelection
? 'border-emerald-500/35 bg-emerald-500/10'
: 'border-border/70 bg-background/80 hover:border-primary/30 hover:bg-background'
}`}
style={{ animationDelay: `${Math.min(index * 14, 180)}ms` }}
>
<span className="flex items-center justify-between gap-2">
<span className="break-all font-mono text-sm font-semibold text-foreground">{option.value}</span>
{isCurrent ? (
<BadgeCheck className="h-4 w-4 shrink-0 text-primary" />
) : isChanging ? (
<RefreshCw className="h-4 w-4 shrink-0 animate-spin text-primary" />
) : null}
</span>
{option.label && option.label !== option.value && (
<span className="mt-1 text-xs font-medium text-foreground/85">{option.label}</span>
)}
{option.description && (
<span className="mt-1 text-xs leading-5 text-muted-foreground">{option.description}</span>
)}
{isCurrent && (
<span className="mt-2 text-[11px] font-semibold uppercase tracking-[0.16em] text-primary">Current selection</span>
)}
{isPendingSelection && !isCurrent && (
<span className="mt-2 text-[11px] font-semibold uppercase tracking-[0.16em] text-emerald-500 dark:text-emerald-400">
Applies next response
</span>
)}
</button>
);
})}
</div>
</div>
) : (
<div className="rounded-2xl border border-dashed border-border bg-background/60 px-4 py-10 text-center text-sm text-muted-foreground">
No models match that search.
</div>
)}
{/* Single quiet line of guidance / feedback */}
<p className="shrink-0 text-[11px] leading-4 text-muted-foreground">
{selectionNotice ? (
<span className="text-foreground">{selectionNotice}</span>
) : hasConcreteSessionId ? (
'Your choice applies to this session on the next response.'
) : (
<div className="rounded-2xl border border-dashed border-border bg-background/60 px-4 py-10 text-center text-sm text-muted-foreground">
No models match that search.
</div>
'Your choice becomes the default model for new turns.'
)}
</div>
</p>
</div>
);
}
@@ -606,7 +520,6 @@ export default function CommandResultModal({
payload,
onClose,
providerModelCatalog,
providerModelCacheCatalog,
providerModelsRefreshing,
onHardRefreshProviderModels,
currentSessionId,
@@ -624,9 +537,9 @@ export default function CommandResultModal({
icon: CircleHelp,
},
models: {
eyebrow: 'Model inventory',
title: 'Available Models',
subtitle: 'Browse, search, and copy model IDs for the active provider.',
eyebrow: 'Model selection',
title: 'Choose a Model',
subtitle: 'Pick the model this provider should use.',
icon: Cpu,
},
cost: {
@@ -700,7 +613,6 @@ export default function CommandResultModal({
<ModelsContent
data={payload.data as ModelCommandData}
providerModelCatalog={providerModelCatalog}
providerModelCacheCatalog={providerModelCacheCatalog}
providerModelsRefreshing={providerModelsRefreshing}
onHardRefreshProviderModels={onHardRefreshProviderModels}
currentSessionId={currentSessionId}

@@ -8,12 +8,48 @@ import { oneDark } from 'react-syntax-highlighter/dist/esm/styles/prism';
import { useTranslation } from 'react-i18next';
import { normalizeInlineCodeFences } from '../../utils/chatFormatting';
import { copyTextToClipboard } from '../../../../utils/clipboard';
import { usePaletteOps } from '../../../../contexts/PaletteOpsContext';
type MarkdownProps = {
children: React.ReactNode;
className?: string;
};
// Links to the wider web (or in-page anchors) keep normal browser navigation;
// everything else is treated as a workspace file reference.
const isExternalHref = (href?: string): boolean =>
!!href && (/^(https?:|mailto:|tel:|data:)/i.test(href) || href.startsWith('#'));
// Strip a trailing `:line` / `:line:col` suffix (e.g. `src/foo.ts:130`).
const stripLineSuffix = (value: string): string => value.replace(/:\d+(?::\d+)?$/, '');
// A usable file path contains a separator or a filename with an extension.
const looksLikeFilePath = (value?: string): value is string => {
if (!value) {
return false;
}
const cleaned = stripLineSuffix(value.trim());
if (!cleaned || cleaned === '#') {
return false;
}
return /[\\/]/.test(cleaned) || /\.[a-z0-9]+$/i.test(cleaned);
};
// Extract plain text from link children so a reference rendered only as link
// text (e.g. `[src/foo.ts]()` with an empty href) can still be opened.
const childrenToText = (children: React.ReactNode): string => {
if (typeof children === 'string' || typeof children === 'number') {
return String(children);
}
if (Array.isArray(children)) {
return children.map(childrenToText).join('');
}
if (React.isValidElement(children)) {
return childrenToText((children.props as { children?: React.ReactNode }).children);
}
return '';
};
type CodeBlockProps = {
node?: any;
inline?: boolean;
@@ -123,11 +159,6 @@ const markdownComponents = {
{children}
</blockquote>
),
a: ({ href, children }: { href?: string; children?: React.ReactNode }) => (
<a href={href} className="text-blue-600 hover:underline dark:text-blue-400" target="_blank" rel="noopener noreferrer">
{children}
</a>
),
p: ({ children }: { children?: React.ReactNode }) => <div className="mb-2 last:mb-0">{children}</div>,
table: ({ children }: { children?: React.ReactNode }) => (
<div className="my-2 overflow-x-auto">
@@ -147,10 +178,50 @@ export function Markdown({ children, className }: MarkdownProps) {
const content = normalizeInlineCodeFences(String(children ?? ''));
const remarkPlugins = useMemo(() => [remarkGfm, remarkMath], []);
const rehypePlugins = useMemo(() => [rehypeKatex], []);
const { openFileInEditor } = usePaletteOps();
const components = useMemo(
() => ({
...markdownComponents,
a: ({ href, children: linkChildren }: { href?: string; children?: React.ReactNode }) => {
// Prefer the href when it is a real path; otherwise fall back to the
// link text, since models often emit `[src/foo.ts]()` with an empty href.
const linkText = childrenToText(linkChildren);
const fileRef = looksLikeFilePath(href) ? href : looksLikeFilePath(linkText) ? linkText : undefined;
if (fileRef && !isExternalHref(href)) {
return (
<a
href={href || fileRef}
className="cursor-pointer text-blue-600 hover:underline dark:text-blue-400"
onClick={(event) => {
event.preventDefault();
openFileInEditor(stripLineSuffix(fileRef));
}}
>
{linkChildren}
</a>
);
}
return (
<a
href={href}
className="text-blue-600 hover:underline dark:text-blue-400"
target="_blank"
rel="noopener noreferrer"
>
{linkChildren}
</a>
);
},
}),
[openFileInEditor],
);
return (
<div className={className}>
<ReactMarkdown remarkPlugins={remarkPlugins} rehypePlugins={rehypePlugins} components={markdownComponents as any}>
<ReactMarkdown remarkPlugins={remarkPlugins} rehypePlugins={rehypePlugins} components={components as any}>
{content}
</ReactMarkdown>
</div>
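The link classification in this hunk can be exercised without React. The sketch below copies the three helpers from the diff and adds `resolveFileRef`, a hypothetical wrapper (not part of the commit) that mirrors the decision made inside the `a` renderer: it returns the path that would be passed to `openFileInEditor`, or `null` when the link falls through to normal browser navigation.

```typescript
// Copies of the helpers from the hunk above, so the sketch runs standalone.
const isExternalHref = (href?: string): boolean =>
  !!href && (/^(https?:|mailto:|tel:|data:)/i.test(href) || href.startsWith('#'));

const stripLineSuffix = (value: string): string => value.replace(/:\d+(?::\d+)?$/, '');

const looksLikeFilePath = (value?: string): value is string => {
  if (!value) return false;
  const cleaned = stripLineSuffix(value.trim());
  if (!cleaned || cleaned === '#') return false;
  return /[\\/]/.test(cleaned) || /\.[a-z0-9]+$/i.test(cleaned);
};

// Hypothetical helper (assumption, not in the diff): same branching as the
// `a` renderer, minus the JSX. Prefers the href, falls back to the link text.
function resolveFileRef(href: string | undefined, linkText: string): string | null {
  const fileRef = looksLikeFilePath(href)
    ? href
    : looksLikeFilePath(linkText)
      ? linkText
      : undefined;
  if (!fileRef || isExternalHref(href)) return null;
  return stripLineSuffix(fileRef);
}

console.log(resolveFileRef('src/foo.ts:130', 'foo'));        // file link, suffix stripped
console.log(resolveFileRef('', 'useShellTerminal.ts'));      // empty href, text used
console.log(resolveFileRef('https://example.com/docs', 'docs')); // external link
```

One consequence of the extension check worth noting: a link with an empty href whose text is `example.com` also passes `looksLikeFilePath`, so under these rules it would be treated as a file reference rather than a web link.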

@@ -15,6 +15,7 @@ import { Reasoning, ReasoningTrigger, ReasoningContent } from '../../../../share
import { Markdown } from './Markdown';
import MessageCopyControl from './MessageCopyControl';
import MessageSpeakControl from './MessageSpeakControl';
type DiffLine = {
type: string;
@@ -217,8 +218,8 @@ const MessageComponent = memo(({ message, prevMessage, createDiff, onFileOpen, a
/>
)}
{/* Tool Result Section */}
{message.toolResult && !shouldHideToolResult(message.toolName || 'UnknownTool', message.toolResult) && (
{/* Tool Result Section — Bash renders its output inside the command row above. */}
{message.toolResult && message.toolName !== 'Bash' && !shouldHideToolResult(message.toolName || 'UnknownTool', message.toolResult) && (
message.toolResult.isError ? (
// Error results - red error box with content
<div
@@ -415,6 +416,9 @@ const MessageComponent = memo(({ message, prevMessage, createDiff, onFileOpen, a
{shouldShowAssistantCopyControl && (
<MessageCopyControl content={assistantCopyContent} messageType="assistant" />
)}
{shouldShowAssistantCopyControl && (
<MessageSpeakControl content={assistantCopyContent} />
)}
{!isGrouped && <span>{formattedTime}</span>}
</div>
)}

@@ -0,0 +1,44 @@
import { Volume2, Loader2, Square } from 'lucide-react';
import { useTranslation } from 'react-i18next';
import { useTts } from '../../hooks/useTts';
import { useVoiceAvailable } from '../../hooks/useVoiceAvailable';
// Tap-to-speak button beside the copy control on assistant messages.
// Renders nothing unless the optional voice feature is enabled.
const MessageSpeakControl = ({ content }: { content: string }) => {
const { t } = useTranslation('chat');
const available = useVoiceAvailable();
const { state, toggle, error } = useTts(() => content);
if (!available) return null;
const title =
state === 'playing' ? t('voice.stopSpeaking') : state === 'loading' ? t('voice.loading') : t('voice.speak');
return (
<span className="relative inline-flex">
{error && (
<span className="absolute bottom-full left-1/2 z-10 mb-1 max-w-[240px] -translate-x-1/2 whitespace-normal rounded bg-red-600 px-2 py-1 text-center text-xs text-white shadow-lg">
{error}
</span>
)}
<button
type="button"
onClick={toggle}
title={title}
aria-label={title}
className="inline-flex items-center gap-1 rounded px-1 py-0.5 text-gray-400 transition-colors hover:text-gray-600 dark:text-gray-500 dark:hover:text-gray-300"
>
{state === 'playing' ? (
<Square className="h-3.5 w-3.5" />
) : state === 'loading' ? (
<Loader2 className="h-3.5 w-3.5 animate-spin" />
) : (
<Volume2 className="h-3.5 w-3.5" />
)}
</button>
</span>
);
};
export default MessageSpeakControl;

@@ -0,0 +1,46 @@
import { useTranslation } from 'react-i18next';
import { Mic, Square, Loader2 } from 'lucide-react';
import { PromptInputButton } from '../../../../shared/view/ui';
import type { VoiceInputState } from '../../hooks/useVoiceInput';
type Props = {
state: VoiceInputState;
onToggle: () => void;
errorMsg?: string | null;
};
// Push-to-talk mic button (presentational). Recording state and the stop-and-send action
// are owned by the composer so the main Send button can drive them too. This button just
// starts recording and, while recording, stops and drops the transcript into the input box.
export default function VoiceInputButton({ state, onToggle, errorMsg }: Props) {
const { t } = useTranslation('chat');
const icon =
state === 'recording' ? (
<Square className="text-red-500" />
) : state === 'transcribing' ? (
<Loader2 className="animate-spin" />
) : (
<Mic />
);
return (
<span className="relative inline-flex">
{errorMsg && (
<span className="absolute bottom-full left-1/2 mb-1 -translate-x-1/2 whitespace-nowrap rounded bg-red-600 px-2 py-1 text-xs text-white shadow-lg">
{errorMsg}
</span>
)}
<PromptInputButton
tooltip={{ content: state === 'recording' ? t('voice.stopRecording') : t('voice.input') }}
onClick={(e: { preventDefault: () => void }) => {
e.preventDefault();
onToggle();
}}
>
{icon}
</PromptInputButton>
</span>
);
}

@@ -11,6 +11,7 @@ import { useTaskMaster } from '../../../contexts/TaskMasterContext';
import { usePaletteOpsRegister } from '../../../contexts/PaletteOpsContext';
import { useTasksSettings } from '../../../contexts/TasksSettingsContext';
import { useUiPreferences } from '../../../hooks/useUiPreferences';
import { useFileOpenResolver } from '../../../hooks/useFileOpenResolver';
import { authenticatedFetch } from '../../../utils/api';
import { useEditorSidebar } from '../../code-editor/hooks/useEditorSidebar';
import EditorSidebar from '../../code-editor/view/EditorSidebar';
@@ -77,6 +78,10 @@ function MainContent({
isMobile,
});
// Resolves bare/partial file references (e.g. links inside chat messages) to
// real project files before opening them in the in-app editor.
const resolvedFileOpen = useFileOpenResolver(selectedProject, handleFileOpen);
useEffect(() => {
// Identify projects by DB `projectId`; the TaskMaster context uses the
// same identifier to key its internal maps.
@@ -121,6 +126,10 @@ function MainContent({
setActiveTab('files');
handleFileOpen(filePath);
},
// Opens the editor side panel in place, keeping the current tab (e.g. chat).
openFileInEditor: (filePath: string) => {
resolvedFileOpen(filePath);
},
});
if (isLoading) {

@@ -4,6 +4,7 @@ import {
Eye,
Languages,
Maximize2,
Mic,
} from 'lucide-react';
import type { PreferenceToggleItem } from './types';
@@ -54,4 +55,9 @@ export const INPUT_SETTING_TOGGLES: PreferenceToggleItem[] = [
labelKey: 'quickSettings.sendByCtrlEnter',
icon: Languages,
},
{
key: 'voiceEnabled',
labelKey: 'quickSettings.voiceEnabled',
icon: Mic,
},
];

@@ -6,7 +6,8 @@ export type PreferenceToggleKey =
| 'showRawParameters'
| 'showThinking'
| 'autoScrollToBottom'
| 'sendByCtrlEnter';
| 'sendByCtrlEnter'
| 'voiceEnabled';
export type QuickSettingsPreferences = Record<PreferenceToggleKey, boolean>;

@@ -28,6 +28,9 @@ export default function QuickSettingsContent({
onPreferenceChange,
}: QuickSettingsContentProps) {
const { t } = useTranslation('settings');
const inputSettingToggles = preferences.voiceEnabled
? INPUT_SETTING_TOGGLES
: INPUT_SETTING_TOGGLES.filter(({ key }) => key !== 'voiceEnabled');
const renderToggleRows = (items: PreferenceToggleItem[]) => (
items.map(({ key, labelKey, icon }) => (
@@ -67,7 +70,7 @@ export default function QuickSettingsContent({
</QuickSettingsSection>
<QuickSettingsSection title={t('quickSettings.sections.inputSettings')}>
{renderToggleRows(INPUT_SETTING_TOGGLES)}
{renderToggleRows(inputSettingToggles)}
<p className="ml-3 text-xs text-gray-500 dark:text-gray-400">
{t('quickSettings.sendByCtrlEnterDescription')}
</p>

@@ -27,12 +27,14 @@ export default function QuickSettingsPanelView() {
showThinking: preferences.showThinking,
autoScrollToBottom: preferences.autoScrollToBottom,
sendByCtrlEnter: preferences.sendByCtrlEnter,
voiceEnabled: preferences.voiceEnabled,
}), [
preferences.autoExpandTools,
preferences.autoScrollToBottom,
preferences.sendByCtrlEnter,
preferences.showRawParameters,
preferences.showThinking,
preferences.voiceEnabled,
]);
const handlePreferenceChange = useCallback(

@@ -3,7 +3,7 @@ import type { Dispatch, SetStateAction } from 'react';
import type { LLMProvider } from '../../../types/app';
import type { ProviderAuthStatus } from '../../provider-auth/types';
export type SettingsMainTab = 'agents' | 'appearance' | 'git' | 'api' | 'tasks' | 'browser' | 'notifications' | 'plugins' | 'about';
export type SettingsMainTab = 'agents' | 'appearance' | 'git' | 'api' | 'voice' | 'tasks' | 'browser' | 'notifications' | 'plugins' | 'about';
export type AgentProvider = LLMProvider;
export type AgentCategory = 'account' | 'permissions' | 'mcp' | 'skills';
export type ProjectSortOrder = 'name' | 'date';

@@ -7,6 +7,7 @@ import SettingsSidebar from '../view/SettingsSidebar';
import AgentsSettingsTab from '../view/tabs/agents-settings/AgentsSettingsTab';
import AppearanceSettingsTab from '../view/tabs/AppearanceSettingsTab';
import CredentialsSettingsTab from '../view/tabs/api-settings/CredentialsSettingsTab';
import VoiceSettingsTab from '../view/tabs/VoiceSettingsTab';
import GitSettingsTab from '../view/tabs/git-settings/GitSettingsTab';
import BrowserUseSettingsTab from '../view/tabs/browser-use-settings/BrowserUseSettingsTab';
import NotificationsSettingsTab from '../view/tabs/NotificationsSettingsTab';
@@ -157,6 +158,8 @@ function Settings({ isOpen, onClose, projects = [], initialTab = 'agents' }: Set
{activeTab === 'api' && <CredentialsSettingsTab />}
{activeTab === 'voice' && <VoiceSettingsTab />}
{activeTab === 'plugins' && <PluginSettingsTab />}
{activeTab === 'about' && <AboutTab />}

@@ -1,5 +1,6 @@
import { Bell, Bot, GitBranch, Info, Key, ListChecks, MonitorPlay, Palette, Puzzle } from 'lucide-react';
import { Bell, Bot, GitBranch, Info, Key, ListChecks, Mic, MonitorPlay, Palette, Puzzle } from 'lucide-react';
import { useTranslation } from 'react-i18next';
import { cn } from '../../../lib/utils';
import { PillBar, Pill } from '../../../shared/view/ui';
import type { SettingsMainTab } from '../types/types';
@@ -20,6 +21,7 @@ const NAV_ITEMS: NavItem[] = [
{ id: 'appearance', labelKey: 'mainTabs.appearance', icon: Palette },
{ id: 'git', labelKey: 'mainTabs.git', icon: GitBranch },
{ id: 'api', labelKey: 'mainTabs.apiTokens', icon: Key },
{ id: 'voice', labelKey: 'mainTabs.voice', icon: Mic },
{ id: 'tasks', labelKey: 'mainTabs.tasks', icon: ListChecks },
{ id: 'browser', labelKey: 'mainTabs.browser', icon: MonitorPlay },
{ id: 'plugins', labelKey: 'mainTabs.plugins', icon: Puzzle },

@@ -0,0 +1,91 @@
import type { InputHTMLAttributes } from 'react';
import { useTranslation } from 'react-i18next';
import SettingsSection from '../SettingsSection';
import SettingsToggle from '../SettingsToggle';
import { useUiPreferences } from '../../../../hooks/useUiPreferences';
import { useVoiceConfig } from '../../../../hooks/useVoiceConfig';
const inputClass =
'w-full rounded-md border border-border bg-background px-3 py-2 text-sm text-foreground placeholder:text-muted-foreground focus:outline-none focus:ring-2 focus:ring-ring';
function Field({ label, ...props }: { label: string } & InputHTMLAttributes<HTMLInputElement>) {
return (
<label className="block space-y-1">
<span className="text-sm font-medium text-foreground">{label}</span>
<input className={inputClass} {...props} />
</label>
);
}
export default function VoiceSettingsTab() {
const { t } = useTranslation('settings');
const { preferences, setPreference } = useUiPreferences();
const { config, update } = useVoiceConfig();
const voiceEnabled = preferences.voiceEnabled;
return (
<div className="space-y-8">
<SettingsSection title={t('voiceSettings.title')} description={t('voiceSettings.description')}>
<div className="flex items-center justify-between rounded-lg border border-border p-3">
<div className="pr-3">
<div className="text-sm font-medium text-foreground">{t('voiceSettings.enable')}</div>
<div className="text-xs text-muted-foreground">{t('voiceSettings.enableDescription')}</div>
</div>
<SettingsToggle
checked={voiceEnabled}
onChange={(v) => setPreference('voiceEnabled', v)}
ariaLabel={t('voiceSettings.enable')}
/>
</div>
</SettingsSection>
{voiceEnabled && (
<SettingsSection title={t('voiceSettings.backendTitle')} description={t('voiceSettings.backendDescription')}>
<div className="space-y-4">
<Field
label={t('voiceSettings.baseUrl')}
placeholder="https://api.openai.com/v1"
value={config.baseUrl}
onChange={(e) => update({ baseUrl: e.target.value })}
/>
<Field
label={t('voiceSettings.apiKey')}
type="password"
autoComplete="off"
placeholder="sk-…"
value={config.apiKey}
onChange={(e) => update({ apiKey: e.target.value })}
/>
<div className="grid grid-cols-1 gap-4 sm:grid-cols-4">
<Field
label={t('voiceSettings.sttModel')}
placeholder="whisper-1"
value={config.sttModel}
onChange={(e) => update({ sttModel: e.target.value })}
/>
<Field
label={t('voiceSettings.ttsModel')}
placeholder="tts-1"
value={config.ttsModel}
onChange={(e) => update({ ttsModel: e.target.value })}
/>
<Field
label={t('voiceSettings.voice')}
placeholder="alloy"
value={config.ttsVoice}
onChange={(e) => update({ ttsVoice: e.target.value })}
/>
<Field
label={t('voiceSettings.format')}
placeholder="mp3"
value={config.ttsFormat}
onChange={(e) => update({ ttsFormat: e.target.value })}
/>
</div>
<p className="text-xs text-muted-foreground">{t('voiceSettings.note')}</p>
</div>
</SettingsSection>
)}
</div>
);
}

@@ -32,7 +32,7 @@ function HighlightedSnippet({ snippet, highlights }: { snippet: string; highligh
parts.push(snippet.slice(cursor));
}
return (
<span className="text-xs leading-relaxed text-muted-foreground">
<span className="min-w-0 flex-1 break-words text-xs leading-relaxed text-muted-foreground">
{parts}
</span>
);

@@ -2,7 +2,7 @@ import { useEffect, useRef } from 'react';
import { Check, Edit2, Loader2, Trash2, X } from 'lucide-react';
import type { TFunction } from 'i18next';
import { Badge, Button, Tooltip } from '../../../../shared/view/ui';
import { Badge, Tooltip, buttonVariants } from '../../../../shared/view/ui';
import { cn } from '../../../../lib/utils';
import type { Project, ProjectSession, LLMProvider } from '../../../../types/app';
import type { SessionWithProvider } from '../../types/types';
@@ -195,9 +195,10 @@ export default function SidebarSessionItem({
</div>
<div className="hidden md:block">
<Button
variant="ghost"
<a
href={`/session/${session.id}`}
className={cn(
buttonVariants({ variant: 'ghost' }),
'h-auto w-full justify-start rounded-md border bg-card p-2 text-left font-normal transition-all duration-150',
isSelected ? 'border-primary/20 bg-primary/5' : 'border-border/30',
!isSelected && isProcessing
@@ -206,7 +207,13 @@ export default function SidebarSessionItem({
? 'border-green-500/30 bg-green-50/5 hover:bg-green-50/10 dark:bg-green-900/5 dark:hover:bg-green-900/10'
: 'hover:bg-accent/50',
)}
onClick={() => onSessionSelect(session, project.projectId)}
// Left-click keeps in-app navigation; Ctrl/Cmd/middle-click and the
// native right-click menu use the href to open a new tab/window.
onClick={(event) => {
if (event.metaKey || event.ctrlKey || event.shiftKey || event.altKey) return;
event.preventDefault();
onSessionSelect(session, project.projectId);
}}
>
<div className="flex w-full min-w-0 items-center gap-2">
<div
@@ -249,7 +256,7 @@ export default function SidebarSessionItem({
</div>
</div>
</div>
</Button>
</a>
<div
ref={editingContainerRef}

@@ -3,6 +3,9 @@ import type { MutableRefObject, ReactNode } from 'react';
export type PaletteOps = {
openFile: (path: string) => void;
// Opens a file in the editor side panel without changing the active tab
// (used by in-chat file links so they behave like the inline edit view).
openFileInEditor: (path: string) => void;
openSettings: (tab?: string) => void;
refreshProjects: () => Promise<void> | void;
};
@@ -13,6 +16,7 @@ const PaletteOpsContext = createContext<Registry | null>(null);
const defaultOps: PaletteOps = {
openFile: () => undefined,
openFileInEditor: () => undefined,
openSettings: () => undefined,
refreshProjects: () => undefined,
};
@@ -27,6 +31,8 @@ export function usePaletteOps(): PaletteOps {
return useMemo<PaletteOps>(
() => ({
openFile: (path) => (ref?.current.openFile ?? defaultOps.openFile)(path),
openFileInEditor: (path) =>
(ref?.current.openFileInEditor ?? defaultOps.openFileInEditor)(path),
openSettings: (tab) => (ref?.current.openSettings ?? defaultOps.openSettings)(tab),
refreshProjects: () => (ref?.current.refreshProjects ?? defaultOps.refreshProjects)(),
}),
@@ -36,18 +42,20 @@ export function usePaletteOps(): PaletteOps {
export function usePaletteOpsRegister(partial: Partial<PaletteOps>) {
const ref = useContext(PaletteOpsContext);
const { openFile, openSettings, refreshProjects } = partial;
const { openFile, openFileInEditor, openSettings, refreshProjects } = partial;
useEffect(() => {
if (!ref) return undefined;
const prev = { ...ref.current };
if (openFile) ref.current.openFile = openFile;
if (openFileInEditor) ref.current.openFileInEditor = openFileInEditor;
if (openSettings) ref.current.openSettings = openSettings;
if (refreshProjects) ref.current.refreshProjects = refreshProjects;
return () => {
if (openFile && ref.current.openFile === openFile) ref.current.openFile = prev.openFile;
if (openFileInEditor && ref.current.openFileInEditor === openFileInEditor) ref.current.openFileInEditor = prev.openFileInEditor;
if (openSettings && ref.current.openSettings === openSettings) ref.current.openSettings = prev.openSettings;
if (refreshProjects && ref.current.refreshProjects === refreshProjects) ref.current.refreshProjects = prev.refreshProjects;
};
}, [ref, openFile, openSettings, refreshProjects]);
}, [ref, openFile, openFileInEditor, openSettings, refreshProjects]);
}
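The register-and-restore behaviour of `usePaletteOpsRegister` is easier to see outside React. The following is a minimal sketch, assuming a plain module-level `registry` object in place of the context ref and a hypothetical `register` function in place of the effect body; it covers only two of the four ops.

```typescript
type Ops = {
  openFile: (path: string) => void;
  openFileInEditor: (path: string) => void;
};

const noop = (): void => undefined;

// Stand-in for the ref held by PaletteOpsContext (assumption for illustration).
const registry: { current: Ops } = {
  current: { openFile: noop, openFileInEditor: noop },
};

// Hypothetical non-React equivalent of the effect: installs the given ops and
// returns a cleanup that restores the previous handler only if ours is still
// the one installed, so a later registration is never clobbered.
function register(partial: Partial<Ops>): () => void {
  const prev = { ...registry.current };
  const { openFile, openFileInEditor } = partial;
  if (openFile) registry.current.openFile = openFile;
  if (openFileInEditor) registry.current.openFileInEditor = openFileInEditor;
  return () => {
    if (openFile && registry.current.openFile === openFile) {
      registry.current.openFile = prev.openFile;
    }
    if (openFileInEditor && registry.current.openFileInEditor === openFileInEditor) {
      registry.current.openFileInEditor = prev.openFileInEditor;
    }
  };
}

const opened: string[] = [];
const unregister = register({ openFileInEditor: (path) => { opened.push(path); } });
registry.current.openFileInEditor('src/a.ts'); // handled by the registered op
unregister();
registry.current.openFileInEditor('src/b.ts'); // back to the no-op default
```

The identity check in the cleanup is the design point: if a second component registers its own `openFileInEditor` before the first one unmounts, the first cleanup leaves the newer handler in place.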

@@ -0,0 +1,108 @@
import { useCallback, useRef } from 'react';
import { api } from '../utils/api';
import type { Project } from '../types/app';
type FileNode = {
type: 'file' | 'directory';
name: string;
path: string;
children?: FileNode[];
};
type FlatFile = {
name: string;
path: string;
};
// `diffInfo` is intentionally `any` so this resolver can wrap editor handlers
// that expect a concrete diff payload type as well as generic callers.
type OnFileOpen = (filePath: string, diffInfo?: any) => void;
const normalize = (value: string): string => value.replace(/\\/g, '/');
const flatten = (nodes: FileNode[], out: FlatFile[]): void => {
for (const node of nodes) {
if (node.type === 'file') {
out.push({ name: node.name, path: node.path });
} else if (node.children && node.children.length > 0) {
flatten(node.children, out);
}
}
};
// References inside chat messages are often bare basenames (`foo.ts`) or partial
// paths (`utils/foo.ts`) rather than full paths, so match by path suffix and
// fall back to filename equality.
const findBestMatch = (files: FlatFile[], ref: string): string | null => {
const target = normalize(ref).replace(/^\.\//, '').replace(/^\/+/, '');
if (!target) {
return null;
}
const suffixMatch = files.find((file) => {
const filePath = normalize(file.path);
return filePath === target || filePath.endsWith(`/${target}`);
});
if (suffixMatch) {
return suffixMatch.path;
}
const base = target.split('/').pop() || target;
return files.find((file) => file.name === base)?.path ?? null;
};
/**
* Wraps an `onFileOpen` handler so a possibly bare/partial file reference is
* resolved against the project's file tree (cached per project) before the file
* is opened in the in-app editor.
*/
export function useFileOpenResolver(
selectedProject: Project | null | undefined,
onFileOpen: OnFileOpen,
): OnFileOpen {
const projectId = selectedProject?.projectId;
const cacheRef = useRef<{ projectId?: string; files: Promise<FlatFile[]> | null }>({
projectId: undefined,
files: null,
});
const loadFiles = useCallback((): Promise<FlatFile[]> => {
if (!projectId) {
return Promise.resolve([]);
}
if (cacheRef.current.projectId === projectId && cacheRef.current.files) {
return cacheRef.current.files;
}
const filesPromise = (async () => {
try {
const response = await api.getFiles(projectId);
if (!response.ok) {
return [];
}
const data = await response.json();
const tree: FileNode[] = Array.isArray(data) ? data : [];
const flat: FlatFile[] = [];
flatten(tree, flat);
return flat;
} catch {
return [];
}
})();
cacheRef.current = { projectId, files: filesPromise };
return filesPromise;
}, [projectId]);
return useCallback(
(filePath: string, diffInfo?: any) => {
const ref = normalize(filePath).trim();
void loadFiles().then((files) => {
const match = findBestMatch(files, ref);
onFileOpen(match ?? filePath, diffInfo);
});
},
[loadFiles, onFileOpen],
);
}
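The matching order in `findBestMatch` (exact or suffix match first, then basename equality) can be checked in isolation. The sketch below copies `normalize` and `findBestMatch` from the file above and runs them against a small file list; the paths are made up for illustration.

```typescript
type FlatFile = { name: string; path: string };

const normalize = (value: string): string => value.replace(/\\/g, '/');

// Copied from the hook above: suffix match first, then filename equality.
const findBestMatch = (files: FlatFile[], ref: string): string | null => {
  const target = normalize(ref).replace(/^\.\//, '').replace(/^\/+/, '');
  if (!target) return null;
  const suffixMatch = files.find((file) => {
    const filePath = normalize(file.path);
    return filePath === target || filePath.endsWith(`/${target}`);
  });
  if (suffixMatch) return suffixMatch.path;
  const base = target.split('/').pop() || target;
  return files.find((file) => file.name === base)?.path ?? null;
};

// Hypothetical project listing (not from the source).
const files: FlatFile[] = [
  { name: 'foo.ts', path: '/repo/src/utils/foo.ts' },
  { name: 'foo.ts', path: '/repo/src/other/foo.ts' },
  { name: 'bar.ts', path: '/repo/src/bar.ts' },
];

console.log(findBestMatch(files, 'other/foo.ts'));     // partial path, suffix match
console.log(findBestMatch(files, 'wrong/dir/bar.ts')); // no suffix match, basename fallback
console.log(findBestMatch(files, 'missing.ts'));       // nothing matches
```

When a bare basename such as `foo.ts` matches several files, the first one in tree order wins, so ambiguous references open whichever file the listing returns first.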

@@ -7,6 +7,7 @@ type UiPreferences = {
autoScrollToBottom: boolean;
sendByCtrlEnter: boolean;
sidebarVisible: boolean;
voiceEnabled: boolean;
};
type UiPreferenceKey = keyof UiPreferences;
@@ -39,6 +40,7 @@ const DEFAULTS: UiPreferences = {
autoScrollToBottom: true,
sendByCtrlEnter: false,
sidebarVisible: true,
voiceEnabled: false,
};
const PREFERENCE_KEYS = Object.keys(DEFAULTS) as UiPreferenceKey[];

@@ -0,0 +1,68 @@
import { useState } from 'react';
export type VoiceConfig = {
baseUrl: string;
apiKey: string;
sttModel: string;
ttsModel: string;
ttsVoice: string;
ttsFormat: string;
};
const STORAGE_KEY = 'voiceConfig';
export const VOICE_CONFIG_SYNC_EVENT = 'voice-config:sync';
const DEFAULTS: VoiceConfig = { baseUrl: '', apiKey: '', sttModel: '', ttsModel: '', ttsVoice: '', ttsFormat: '' };
export function readVoiceConfig(): VoiceConfig {
try {
const raw = localStorage.getItem(STORAGE_KEY);
if (!raw) return { ...DEFAULTS };
const parsed = JSON.parse(raw);
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) return { ...DEFAULTS };
const config = { ...DEFAULTS };
for (const key of Object.keys(DEFAULTS) as (keyof VoiceConfig)[]) {
if (typeof parsed[key] === 'string') config[key] = parsed[key];
}
return config;
} catch {
return { ...DEFAULTS };
}
}
// Headers the voice proxy reads to target a per-user OpenAI-compatible backend.
// Empty fields are omitted so the server's env defaults apply.
export function voiceConfigHeaders(): Record<string, string> {
if (typeof window === 'undefined') return {};
const c = readVoiceConfig();
const h: Record<string, string> = {};
if (c.apiKey) h['x-voice-api-key'] = c.apiKey;
if (c.sttModel) h['x-voice-stt-model'] = c.sttModel;
if (c.ttsModel) h['x-voice-tts-model'] = c.ttsModel;
if (c.ttsVoice) h['x-voice-tts-voice'] = c.ttsVoice;
if (c.ttsFormat.trim()) h['x-voice-tts-format'] = c.ttsFormat.trim();
return h;
}
export function useVoiceConfig() {
const [config, setConfig] = useState<VoiceConfig>(() =>
typeof window === 'undefined' ? { ...DEFAULTS } : readVoiceConfig(),
);
const update = (patch: Partial<VoiceConfig>) => {
setConfig((prev) => {
const next = { ...prev, ...patch };
try {
const stored: Partial<VoiceConfig> = { ...next };
if (next.ttsFormat.trim()) stored.ttsFormat = next.ttsFormat.trim();
else delete stored.ttsFormat;
localStorage.setItem(STORAGE_KEY, JSON.stringify(stored));
window.dispatchEvent(new Event(VOICE_CONFIG_SYNC_EVENT));
} catch {
/* ignore persistence errors */
}
return next;
});
};
return { config, update };
}
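The defensive parsing in `readVoiceConfig` and the omit-when-empty rule in `voiceConfigHeaders` can be sketched as pure functions. `parseVoiceConfig` and `headersFor` below are hypothetical names introduced for illustration; they take their input as arguments instead of reading `localStorage`, but otherwise follow the logic in the file above.

```typescript
type VoiceConfig = {
  baseUrl: string;
  apiKey: string;
  sttModel: string;
  ttsModel: string;
  ttsVoice: string;
  ttsFormat: string;
};

const DEFAULTS: VoiceConfig = {
  baseUrl: '', apiKey: '', sttModel: '', ttsModel: '', ttsVoice: '', ttsFormat: '',
};

// Hypothetical pure variant of readVoiceConfig: anything that is not a plain
// object of string fields degrades to the defaults, field by field.
function parseVoiceConfig(raw: string | null): VoiceConfig {
  if (!raw) return { ...DEFAULTS };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) return { ...DEFAULTS };
    const record = parsed as Record<string, unknown>;
    const config = { ...DEFAULTS };
    for (const key of Object.keys(DEFAULTS) as (keyof VoiceConfig)[]) {
      const value = record[key];
      if (typeof value === 'string') config[key] = value;
    }
    return config;
  } catch {
    return { ...DEFAULTS };
  }
}

// Hypothetical pure variant of voiceConfigHeaders: empty fields are omitted so
// the server's defaults apply; baseUrl is never sent as a header.
function headersFor(c: VoiceConfig): Record<string, string> {
  const h: Record<string, string> = {};
  if (c.apiKey) h['x-voice-api-key'] = c.apiKey;
  if (c.sttModel) h['x-voice-stt-model'] = c.sttModel;
  if (c.ttsModel) h['x-voice-tts-model'] = c.ttsModel;
  if (c.ttsVoice) h['x-voice-tts-voice'] = c.ttsVoice;
  if (c.ttsFormat.trim()) h['x-voice-tts-format'] = c.ttsFormat.trim();
  return h;
}

const stored = parseVoiceConfig('{"apiKey":"k","sttModel":42,"extra":"x"}');
console.log(stored);             // sttModel falls back to '', "extra" is dropped
console.log(headersFor(stored)); // only the API key header is present
```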

@@ -122,6 +122,14 @@
}
}
},
"voice": {
"input": "Voice input",
"stopRecording": "Stop recording",
"transcribing": "Transcribing…",
"speak": "Read aloud",
"stopSpeaking": "Stop",
"loading": "Loading…"
},
"input": {
"placeholder": "Type / for commands, @ for files, or ask {{provider}} anything...",
"placeholderDefault": "Type your message...",

@@ -50,6 +50,21 @@
"resetToDefaults": "Reset to Defaults",
"cancelChanges": "Cancel Changes"
},
"voiceSettings": {
"title": "Voice",
"description": "Speech-to-text input and read-aloud, via an OpenAI-compatible audio backend.",
"enable": "Enable voice",
"enableDescription": "Show the mic button and the read-aloud button on messages.",
"backendTitle": "Backend",
"backendDescription": "Point at OpenAI, Groq, or a local server (LocalAI, Speaches, Kokoro-FastAPI). Leave blank to use the server default.",
"baseUrl": "Base URL",
"apiKey": "API key",
"sttModel": "Speech-to-text model",
"ttsModel": "Text-to-speech model",
"voice": "Voice",
"format": "Audio format",
"note": "A custom base URL is called directly by your browser and must allow browser CORS requests. Leave it blank to use the server-configured backend."
},
"quickSettings": {
"title": "Quick Settings",
"sections": {
@@ -64,6 +79,7 @@
"showThinking": "Show thinking",
"autoScrollToBottom": "Auto-scroll to bottom",
"sendByCtrlEnter": "Send by Ctrl+Enter",
"voiceEnabled": "Voice (mic + read aloud)",
"sendByCtrlEnterDescription": "When enabled, pressing Ctrl+Enter will send the message instead of just Enter. This is useful for IME users to avoid accidental sends.",
"dragHandle": {
"dragging": "Dragging handle",
@@ -94,6 +110,7 @@
"appearance": "Appearance",
"git": "Git",
"apiTokens": "API & Tokens",
"voice": "Voice",
"tasks": "Tasks",
"browser": "Browser",
"notifications": "Notifications",

src/lib/voiceApi.ts (new file, 60 lines)

@@ -0,0 +1,60 @@
import { authenticatedFetch } from '../utils/api';
import { readVoiceConfig, voiceConfigHeaders } from '../hooks/useVoiceConfig';
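// Join a base URL and an API path, ignoring any trailing slashes on the base.
function directUrl(baseUrl: string, path: string): string {
return `${baseUrl.replace(/\/+$/, '')}${path}`;
}
// Serialized voice settings; part of the read-aloud cache key, so a settings change yields a new key.
export function voiceConfigSignature(): string {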
return JSON.stringify(readVoiceConfig());
}
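// Speech-to-text. With a custom base URL the browser posts straight to the OpenAI-compatible
// endpoint (form field `file`); otherwise the request goes through the app's own
// /api/voice/transcribe route (form field `audio`).
export function transcribeVoice(blob: Blob, filename: string): Promise<Response> {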
const config = readVoiceConfig();
const body = new FormData();
if (config.baseUrl.trim()) {
body.append('file', blob, filename);
body.append('model', config.sttModel || 'whisper-1');
return fetch(directUrl(config.baseUrl.trim(), '/audio/transcriptions'), {
method: 'POST',
headers: config.apiKey ? { Authorization: `Bearer ${config.apiKey}` } : {},
body,
});
}
body.append('audio', blob, filename);
return authenticatedFetch('/api/voice/transcribe', {
method: 'POST',
headers: voiceConfigHeaders(),
body,
});
}
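// Text-to-speech. Same split as transcribeVoice: direct call when a base URL is set,
// otherwise the app's /api/voice/tts route. `signal` lets the caller cancel the request.
export function synthesizeVoice(text: string, signal: AbortSignal): Promise<Response> {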
const config = readVoiceConfig();
if (config.baseUrl.trim()) {
return fetch(directUrl(config.baseUrl.trim(), '/audio/speech'), {
method: 'POST',
headers: {
'Content-Type': 'application/json',
...(config.apiKey ? { Authorization: `Bearer ${config.apiKey}` } : {}),
},
body: JSON.stringify({
model: config.ttsModel || 'tts-1',
voice: config.ttsVoice || 'alloy',
input: text,
...(config.ttsFormat.trim() ? { response_format: config.ttsFormat.trim() } : {}),
}),
signal,
});
}
return authenticatedFetch('/api/voice/tts', {
method: 'POST',
body: JSON.stringify({ text }),
headers: voiceConfigHeaders(),
signal,
});
}
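The direct-call branch of `synthesizeVoice` mixes request construction with the `fetch` itself. As a minimal sketch only, the construction step could be pulled into a pure helper so the URL, headers, and body can be checked without a network call. `SpeechConfig`, `SpeechRequest`, and `buildSpeechRequest` are hypothetical names that are not part of this diff, and the localhost URL is a made-up example; the defaults (`tts-1`, `alloy`) and field names are copied from the code above.

```typescript
// Hypothetical config shape; field names mirror those read in voiceApi.ts.
type SpeechConfig = {
  baseUrl: string;
  apiKey: string;
  ttsModel: string;
  ttsVoice: string;
  ttsFormat: string;
};

type SpeechRequest = { url: string; headers: Record<string, string>; body: string };

// Assumed helper (not in the diff): builds the direct /audio/speech request
// without sending it.
function buildSpeechRequest(config: SpeechConfig, text: string): SpeechRequest {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = config.baseUrl.trim().replace(/\/+$/, '');
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  // Only send Authorization when a key is configured.
  if (config.apiKey) headers['Authorization'] = `Bearer ${config.apiKey}`;
  const payload: Record<string, string> = {
    model: config.ttsModel || 'tts-1',
    voice: config.ttsVoice || 'alloy',
    input: text,
  };
  // response_format is optional; omit it when the setting is blank.
  const format = config.ttsFormat.trim();
  if (format) payload['response_format'] = format;
  return { url: `${base}/audio/speech`, headers, body: JSON.stringify(payload) };
}

// Example with an illustrative local backend URL.
const req = buildSpeechRequest(
  { baseUrl: 'http://localhost:8000/v1/', apiKey: '', ttsModel: '', ttsVoice: '', ttsFormat: 'mp3' },
  'Hello',
);
console.log(req.url); // http://localhost:8000/v1/audio/speech
```

Keeping the builder pure would make the blank-key and blank-format cases easy to cover in unit tests, while `synthesizeVoice` would only add `signal` and call `fetch`.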

src/lib/voicePlayer.ts (new file, 196 lines)

@@ -0,0 +1,196 @@
import { synthesizeVoice, voiceConfigSignature } from './voiceApi';
// A single app-level audio player for read-aloud. It owns one <audio> element, lives
// outside the React tree, and caches generated audio by content. Because playback is not
// tied to a component, switching chats or re-rendering a message can't revoke the blob URL
// out from under it (the cause of mid-play cutoffs). v1 plays one message at a time
// (a new play replaces the current one); the design leaves room for a queue later.
export type VoicePlayState = 'idle' | 'loading' | 'playing';
export type VoiceSnapshot = { state: VoicePlayState; error: string | null };
const IDLE: VoiceSnapshot = { state: 'idle', error: null };
const CACHE_MAX = 24;
const CLIENT_TIMEOUT_MS = 330000; // backstop; the server proxy already times out at 5 min
// Stable id / cache key from the text and voice settings that affect its audio (djb2).
export function voiceId(content: string, signature = voiceConfigSignature()): string {
const input = JSON.stringify([content, signature]);
let h = 5381;
for (let i = 0; i < input.length; i++) h = (((h << 5) + h) + input.charCodeAt(i)) | 0;
return (h >>> 0).toString(36);
}
class VoicePlayer {
private audio: HTMLAudioElement | null = null;
private unlocked = false;
private cache = new Map<string, string>(); // id -> blob URL (Map insertion order; reads do not refresh it, so the oldest inserted entry is evicted first)
private currentId: string | null = null;
private state: VoicePlayState = 'idle';
private errorId: string | null = null;
private errorMsg: string | null = null;
private token = 0; // bumps to ignore stale in-flight results
private activeController: AbortController | null = null; // aborts the in-flight TTS fetch
private errorTimer: ReturnType<typeof setTimeout> | null = null;
private listeners = new Set<() => void>();
subscribe(listener: () => void): () => void {
this.listeners.add(listener);
return () => {
this.listeners.delete(listener);
};
}
private emit() {
this.listeners.forEach((l) => l());
}
getSnapshot(id: string): VoiceSnapshot {
const state = this.currentId === id ? this.state : 'idle';
const error = this.errorId === id ? this.errorMsg : null;
if (state === 'idle' && error === null) return IDLE;
return { state, error };
}
private ensureAudio(): HTMLAudioElement {
if (!this.audio) {
const audio = new Audio();
audio.addEventListener('ended', () => this.onEnded());
audio.addEventListener('error', () => {
// Only meaningful while we believe we're playing.
if (this.state === 'playing') this.onEnded();
});
this.audio = audio;
}
return this.audio;
}
// Call synchronously from the click handler so iOS grants the (reused) element playback.
unlock() {
if (this.unlocked) return;
const audio = this.ensureAudio();
try {
const p = audio.play();
if (p && typeof p.catch === 'function') p.catch(() => {});
audio.pause();
} catch {
/* priming attempt; ignore */
}
this.unlocked = true;
}
toggle(content: string) {
const id = voiceId(content);
if (this.currentId === id && (this.state === 'playing' || this.state === 'loading')) {
this.stop();
return;
}
void this.play(id, content);
}
stop() {
this.token++; // ignore any stale in-flight result
this.abortActive(); // and actually cancel the network request
if (this.audio) this.audio.pause();
this.state = 'idle';
this.currentId = null;
this.emit();
}
private abortActive() {
if (this.activeController) {
this.activeController.abort();
this.activeController = null;
}
}
private onEnded() {
this.state = 'idle';
this.currentId = null;
this.emit();
// (queue auto-advance would hook in here)
}
private setError(id: string, msg: string) {
this.state = 'idle';
this.currentId = id;
this.errorId = id;
this.errorMsg = msg;
this.emit();
if (this.errorTimer) clearTimeout(this.errorTimer);
this.errorTimer = setTimeout(() => {
if (this.errorId === id) {
this.errorId = null;
this.errorMsg = null;
if (this.currentId === id) this.currentId = null;
this.emit();
}
}, 6000);
}
private async play(id: string, content: string) {
const audio = this.ensureAudio();
audio.pause();
this.currentId = id;
this.errorId = null;
this.errorMsg = null;
this.state = 'loading';
this.emit();
const myToken = ++this.token;
this.abortActive(); // cancel any request this play supersedes
try {
let url = this.cache.get(id);
if (!url) {
const controller = new AbortController();
this.activeController = controller;
const timer = setTimeout(() => controller.abort(), CLIENT_TIMEOUT_MS);
const res = await synthesizeVoice(content, controller.signal).finally(() => {
clearTimeout(timer);
if (this.activeController === controller) this.activeController = null;
});
if (myToken !== this.token) return; // superseded by another play/stop
if (!res.ok) {
let msg = `Read-aloud failed (${res.status})`;
try {
const j = await res.json();
if (j?.error) msg = String(j.error);
} catch {
/* non-JSON error body */
}
throw new Error(msg);
}
const blob = await res.blob();
if (myToken !== this.token) return;
url = URL.createObjectURL(blob);
this.cacheSet(id, url);
}
if (myToken !== this.token) return;
audio.src = url;
audio.load();
await audio.play();
if (myToken !== this.token) return;
this.state = 'playing';
this.emit();
} catch (e) {
if (myToken !== this.token) return;
const aborted = e instanceof Error && e.name === 'AbortError';
this.setError(id, aborted ? 'Read-aloud timed out.' : e instanceof Error ? e.message : 'Read-aloud failed');
}
}
private cacheSet(id: string, url: string) {
this.cache.set(id, url);
while (this.cache.size > CACHE_MAX) {
const oldest = this.cache.keys().next().value as string | undefined;
if (oldest === undefined) break;
const oldUrl = this.cache.get(oldest);
this.cache.delete(oldest);
if (oldUrl && oldUrl !== this.audio?.src) URL.revokeObjectURL(oldUrl);
}
}
}
export const voicePlayer = new VoicePlayer();
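`cacheSet` above evicts in insertion order, and a cache hit in `play` does not move the entry, so a frequently replayed message can still be the first one evicted. If true least-recently-used behavior were wanted, one possible approach is to re-insert a key on every read. The sketch below is an assumption-laden illustration, not the author's implementation: `LruCache` and its `onEvict` callback are hypothetical, and in the player the callback would be where `URL.revokeObjectURL` is called.

```typescript
// Small bounded cache keyed by string. A Map iterates in insertion order, so
// deleting and re-inserting a key on read moves it to the most-recent end.
class LruCache<V> {
  private map = new Map<string, V>();
  private max: number;
  private onEvict: (key: string, value: V) => void;

  constructor(max: number, onEvict: (key: string, value: V) => void = () => {}) {
    this.max = max;
    this.onEvict = onEvict;
  }

  get(key: string): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    // Refresh recency: move the key to the end of the iteration order.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    // Evict from the front (least recently used) until within the limit.
    while (this.map.size > this.max) {
      const oldest = this.map.keys().next().value as string | undefined;
      if (oldest === undefined) break;
      const evicted = this.map.get(oldest) as V;
      this.map.delete(oldest);
      this.onEvict(oldest, evicted);
    }
  }

  keys(): string[] {
    return Array.from(this.map.keys());
  }
}

// Usage: capacity 2, record which keys get evicted.
const evictedKeys: string[] = [];
const cache = new LruCache<string>(2, (key) => {
  evictedKeys.push(key);
});
cache.set('a', 'url-a');
cache.set('b', 'url-b');
cache.get('a'); // 'a' becomes the most recently used entry
cache.set('c', 'url-c'); // evicts 'b', the least recently used
console.log(cache.keys(), evictedKeys);
```

The player's check against `this.audio?.src` before revoking would still be needed in the callback, so that the URL currently loaded in the audio element is not revoked.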


@@ -37,7 +37,7 @@ export default function LanguageSelector({ compact = false }: LanguageSelectorPr
<select
value={i18n.language}
onChange={handleLanguageChange}
className="w-[100px] rounded-lg border border-input bg-card p-2 text-sm text-foreground focus:border-primary focus:outline-none focus:ring-2 focus:ring-primary"
className="w-auto min-w-[120px] max-w-[160px] rounded-lg border border-input bg-card p-2 text-sm text-foreground focus:border-primary focus:outline-none focus:ring-2 focus:ring-primary"
>
{languages.map((lang) => (
<option key={lang.value} value={lang.value}>