Compare commits

..

8 Commits

Author SHA1 Message Date
Simos Mikelatos
fff89e6132 Merge branch 'main' into camoufox-novnc-browser-use 2026-06-26 16:07:22 +02:00
Haile
591e8e7642 fix: voice tts format settings (#919)
* feat(voice): add optional speech-to-text input and read-aloud TTS

Adds a push-to-talk mic button in the composer and a read-aloud button on
assistant messages. Both are opt-in and hidden unless a voice backend is
configured via VOICE_SIDECAR_URL.

The auth-gated /api/voice proxy forwards to a configurable backend exposing
/transcribe and /tts (provider-agnostic); the frontend probes /api/voice/health
and hides the controls when disabled. Adds i18n keys and docs/voice.md.

Includes a local, no-API-key reference backend in voice-sidecar/ (faster-whisper
for STT, Kokoro-82M for TTS, both CPU-capable).

* refactor(voice): provider-agnostic backend and in-app config

Switches the voice proxy to the OpenAI audio API (/v1/audio/transcriptions and
/v1/audio/speech) so it works with OpenAI, Groq, or a local server. Adds a
Settings -> Voice tab (base URL, API key, models, voice) plus a Quick Settings
toggle, and removes the bundled Python sidecar.

Review fixes: stop mic tracks on unmount, clear the global TTS stop handler and
revoke leaked blob URLs, add fetch timeouts in the proxy, surface mic errors in
the button, trim before appending transcripts, and drop the repo-wide wav ignore.

* fix(voice): relax backend timeout and surface timeout errors

Bumps the proxy timeout to 5 minutes (VOICE_TIMEOUT_MS) since local TTS can
synthesize long messages at roughly real-time, and returns a clear timed-out
message (504) instead of failing silently. The read-aloud button now shows
backend errors.

* fix(voice): play read-aloud through an app-level player to stop cutoffs

Read-aloud now runs in a single module-level player outside the React tree instead
of per-message component state. Switching chats or re-rendering a message no longer
revokes the blob URL mid-play (the 'Invalid URI' cutoff). Adds content-keyed caching so
re-listening doesn't regenerate, and reuses one audio element (also unlocks iOS once).

* fix(voice): address review (SSRF guard, auth mapping, client timeout)

Validates the user-supplied backend URL (http/https only, blocks the link-local
metadata range) to prevent SSRF; remaps upstream 401/403 so a bad voice API key
isn't read as the app's own auth failing; adds a client-side AbortController timeout
on the read-aloud request so the button can't sit in loading if a request stalls.

* docs(voice): provider-agnostic wording and jsdoc on proxy functions

drop leftover sidecar/faster-whisper references now that the backend is any
openai-compatible voice api, and add jsdoc to the voice-proxy functions so the
docstring coverage check passes.

* fix(voice): harden timeout parsing, tts input check, and player abort

- fall back to the default when VOICE_TIMEOUT_MS is non-numeric or <= 0, so a
  bad override can't make the abort fire immediately
- type-check the tts `text` before calling .trim() so a non-string body returns
  400 instead of throwing
- abort the in-flight TTS fetch on stop() and on a superseding play, so tapping
  read-aloud repeatedly doesn't leave orphaned requests generating audio

* feat(voice): send transcript with the main send button while recording

while dictating, the main send button stops recording, transcribes, and sends
in one tap, matching the codex-style flow. the mic button still stops and drops
the transcript into the input box to edit before sending. voice recording state
is lifted into the composer so both buttons share it, and the send button is
enabled (not grayed) while recording. also fix a pre-existing type error: the
quick-settings preferences map was missing voiceEnabled.

* fix(voice): make stop() idempotent so a double tap can't throw

guard on the recorder's own state instead of react state, so a double tap or
the mic and send buttons both firing won't call stop() on an already-inactive
MediaRecorder.

* fix(voice): expose TTS format in user settings

* fix(voice): harden recording and backend behavior

Redirects could bypass the backend URL guard, and TTS playback waited for full buffering.

Recording could overlap or finish after teardown. Controls also ignored backend readiness.

Explicit formats and config-aware cache keys prevent stale audio after settings change.

* fix(voice): validate config and request boundaries

Malformed stored settings could break voice requests instead of using safe defaults.

Health results could outlive auth changes. URL checks also did not guard the fetch sink.

Remove constant recorder branches so lifecycle cancellation stays clear.

* fix(voice): separate client and server backends

User-selected backend URLs must remain usable without letting clients control server requests.

Call custom providers from the browser while keeping the server proxy bound to its configured host.

This restores voice controls for frontend settings without reopening the SSRF path.

* fix: hide voice options until enabled

---------

Co-authored-by: newsbubbles <nathaniel.gibson@gmail.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
2026-06-26 16:06:40 +02:00
Haile
c947eaaee5 feat: play sound for pending tool requests (#918) 2026-06-25 14:57:10 +02:00
Simos Mikelatos
8adcdaa0e5 Merge branch 'main' into camoufox-novnc-browser-use 2026-06-24 23:16:13 +02:00
Simos Mikelatos
0610cc8333 fix: browser use set profile root folder 2026-06-24 19:21:52 +00:00
Haile
4a503b1dc8 fix(shell): prioritize user npm binaries (#913)
Interactive shells could resolve bundled or system CLIs before user-installed npm binaries.

Move existing user npm global directories to the front of PATH while preserving all other entries.

Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
2026-06-24 20:15:52 +02:00
Simos Mikelatos
9457651fdd fix(browser-use): harden browser settings state 2026-06-24 15:36:25 +00:00
Simos Mikelatos
8c31ebcc63 feat(browser-use): add Camoufox noVNC session viewer 2026-06-24 14:39:41 +00:00
41 changed files with 2498 additions and 211 deletions

View File

@@ -69,7 +69,7 @@ const sessionIdSchema = {
const tools: ToolDefinition[] = [ const tools: ToolDefinition[] = [
{ {
name: 'browser_create_session', name: 'browser_create_session',
description: 'Create a temporary Browser session that the agent can control. Optionally provide a background profileName to reuse cookies and storage.', description: 'Create a Browser session that the agent can control. Provide profileName to use a specific persistent profile; when omitted, the configured persistent profile is used only if session persistence is enabled, otherwise a temporary session is created.',
inputSchema: { inputSchema: {
type: 'object', type: 'object',
properties: { properties: {

View File

@@ -61,9 +61,10 @@ import userRoutes from './routes/user.js';
import geminiRoutes from './routes/gemini.js'; import geminiRoutes from './routes/gemini.js';
import pluginsRoutes from './routes/plugins.js'; import pluginsRoutes from './routes/plugins.js';
import providerRoutes from './modules/providers/provider.routes.js'; import providerRoutes from './modules/providers/provider.routes.js';
import voiceRoutes from './voice-proxy.js';
import browserUseRoutes from './modules/browser-use/browser-use.routes.js'; import browserUseRoutes from './modules/browser-use/browser-use.routes.js';
import browserUseMcpRoutes from './modules/browser-use/browser-use-mcp.routes.js'; import browserUseMcpRoutes from './modules/browser-use/browser-use-mcp.routes.js';
import { browserUseService } from './modules/browser-use/browser-use.service.js'; import { browserUseService, VIEWER_COOKIE_NAME } from './modules/browser-use/index.js';
import { startEnabledPluginServers, stopAllPlugins, getPluginPort } from './utils/plugin-process-manager.js'; import { startEnabledPluginServers, stopAllPlugins, getPluginPort } from './utils/plugin-process-manager.js';
import { initializeDatabase, projectsDb, sessionsDb } from './modules/database/index.js'; import { initializeDatabase, projectsDb, sessionsDb } from './modules/database/index.js';
import { configureWebPush } from './services/vapid-keys.js'; import { configureWebPush } from './services/vapid-keys.js';
@@ -145,6 +146,8 @@ const wss = createWebSocketServer(server, {
shouldAutoOpenUrlFromOutput, shouldAutoOpenUrlFromOutput,
}, },
getPluginPort, getPluginPort,
browserUseViewer: (ws, pathname) => browserUseService.handleViewerWebSocket(ws, pathname),
authenticateBrowserUseViewer: authenticateBrowserUseViewerPath,
}); });
// Make WebSocket server available to routes // Make WebSocket server available to routes
@@ -210,11 +213,42 @@ app.use('/api/gemini', authenticateToken, geminiRoutes);
// Plugins API Routes (protected) // Plugins API Routes (protected)
app.use('/api/plugins', authenticateToken, pluginsRoutes); app.use('/api/plugins', authenticateToken, pluginsRoutes);
function readCookieValue(header, name) {
if (!header) return null;
const prefix = `${name}=`;
const cookie = String(header).split(';').map((part) => part.trim()).find((part) => part.startsWith(prefix));
return cookie ? decodeURIComponent(cookie.slice(prefix.length)) : null;
}
function authenticateBrowserUseViewerPath(pathname, token) {
const parts = String(pathname || '').split('/');
const sessionId = parts[4];
if (parts[1] !== 'api' || parts[2] !== 'browser-use' || parts[3] !== 'sessions' || parts[5] !== 'viewer' || parts[6] !== 'websockify') {
return false;
}
return browserUseService.validateViewerToken(decodeURIComponent(sessionId), token);
}
function authenticateBrowserUse(req, res, next) {
const match = /^\/sessions\/([^/]+)\/viewer(?:\/|$)/.exec(req.path || '');
if (match) {
const sessionId = decodeURIComponent(match[1]);
const token = typeof req.query.viewerToken === 'string'
? req.query.viewerToken
: readCookieValue(req.headers.cookie, VIEWER_COOKIE_NAME);
if (browserUseService.validateViewerToken(sessionId, token)) {
return next();
}
return res.status(401).json({ error: 'Browser viewer access requires a valid session token.' });
}
return authenticateToken(req, res, next);
}
// Browser MCP bridge API (local token protected) // Browser MCP bridge API (local token protected)
app.use('/api/browser-use-mcp', browserUseMcpRoutes); app.use('/api/browser-use-mcp', browserUseMcpRoutes);
// Browser API Routes (protected) // Browser API Routes (protected)
app.use('/api/browser-use', authenticateToken, browserUseRoutes); app.use('/api/browser-use', authenticateBrowserUse, browserUseRoutes);
// Unified provider MCP routes (protected) // Unified provider MCP routes (protected)
app.use('/api/providers', authenticateToken, providerRoutes); app.use('/api/providers', authenticateToken, providerRoutes);
@@ -222,6 +256,8 @@ app.use('/api/providers', authenticateToken, providerRoutes);
// Agent API Routes (uses API key authentication) // Agent API Routes (uses API key authentication)
app.use('/api/agent', agentRoutes); app.use('/api/agent', agentRoutes);
app.use('/api/voice', authenticateToken, voiceRoutes);
// Serve public files (like api-docs.html) // Serve public files (like api-docs.html)
app.use(express.static(path.join(APP_ROOT, 'public'))); app.use(express.static(path.join(APP_ROOT, 'public')));

View File

@@ -1,6 +1,7 @@
import express from 'express'; import express from 'express';
import { browserUseService } from '@/modules/browser-use/browser-use.service.js'; import { browserUseService } from '@/modules/browser-use/browser-use.service.js';
import { VIEWER_COOKIE_NAME, VIEWER_TOKEN_TTL_MS } from '@/modules/browser-use/browser-use.viewer.js';
const router = express.Router(); const router = express.Router();
@@ -8,6 +9,45 @@ function readParam(value: string | string[] | undefined): string {
return Array.isArray(value) ? value[0] || '' : value || ''; return Array.isArray(value) ? value[0] || '' : value || '';
} }
const SAFE_VIEWER_ROOT_FILES = new Set(['vnc.html', 'favicon.ico', 'manifest.json']);
const SAFE_VIEWER_ROOT_DIRS = new Set(['app', 'core', 'vendor', 'assets', 'images', 'utils']);
function isSafeViewerPath(viewerPath: string): boolean {
if (!viewerPath || viewerPath.startsWith('/') || viewerPath.includes('..') || viewerPath.includes('\\')) {
return false;
}
if (!/^[A-Za-z0-9][A-Za-z0-9._~/-]*$/.test(viewerPath)) {
return false;
}
if (SAFE_VIEWER_ROOT_FILES.has(viewerPath)) {
return true;
}
const [rootDir] = viewerPath.split('/');
return Boolean(rootDir && SAFE_VIEWER_ROOT_DIRS.has(rootDir));
}
function isSecureRequest(req: express.Request): boolean {
const forwardedProto = String(req.headers['x-forwarded-proto'] || '')
.split(',')[0]
.trim()
.toLowerCase();
return req.secure || forwardedProto === 'https';
}
function readQueryString(originalUrl: string): string {
const queryIndex = originalUrl.indexOf('?');
if (queryIndex < 0) {
return '';
}
const params = new URLSearchParams(originalUrl.slice(queryIndex + 1));
params.delete('viewerToken');
const nextQuery = params.toString();
return nextQuery ? `?${nextQuery}` : '';
}
router.get('/status', async (_req, res) => { router.get('/status', async (_req, res) => {
try { try {
res.json({ success: true, data: await browserUseService.getStatus() }); res.json({ success: true, data: await browserUseService.getStatus() });
@@ -62,13 +102,60 @@ router.get('/sessions', async (_req, res) => {
try { try {
res.json({ success: true, data: { sessions: await browserUseService.listSessions() } }); res.json({ success: true, data: { sessions: await browserUseService.listSessions() } });
} catch (error) { } catch (error) {
res.status(401).json({ res.status(500).json({
success: false, success: false,
error: error instanceof Error ? error.message : 'Failed to list browser sessions.', error: error instanceof Error ? error.message : 'Failed to list browser sessions.',
}); });
} }
}); });
router.get('/sessions/:sessionId/viewer/*', async (req, res) => {
try {
const sessionId = readParam(req.params.sessionId);
const originalPath = req.originalUrl.split('?')[0] || '';
const viewerMarker = `/sessions/${sessionId}/viewer/`;
const markerIndex = originalPath.indexOf(viewerMarker);
const rawViewerPath = markerIndex >= 0 ? originalPath.slice(markerIndex + viewerMarker.length) : 'vnc.html';
const viewerPath = decodeURIComponent(rawViewerPath).replace(/^\/+/, '') || 'vnc.html';
if (!isSafeViewerPath(viewerPath)) {
res.status(400).json({ success: false, error: 'Invalid Browser viewer path.' });
return;
}
const viewerToken = readParam(req.query.viewerToken as string | string[] | undefined);
if (viewerPath === 'vnc.html' && browserUseService.validateViewerToken(sessionId, viewerToken)) {
res.cookie(VIEWER_COOKIE_NAME, viewerToken, {
httpOnly: true,
sameSite: 'lax',
secure: isSecureRequest(req),
maxAge: VIEWER_TOKEN_TTL_MS,
path: '/api/browser-use/sessions/' + encodeURIComponent(sessionId) + '/viewer',
});
}
const target = browserUseService.getViewerProxyTarget(sessionId);
const query = readQueryString(req.originalUrl);
const upstream = await fetch(`http://127.0.0.1:${target.websockifyPort}/${viewerPath}${query}`, {
headers: {
accept: String(req.headers.accept || '*/*'),
},
});
const contentType = upstream.headers.get('content-type');
if (contentType) {
res.setHeader('content-type', contentType);
}
const cacheControl = viewerPath === 'vnc.html' ? 'no-store' : 'public, max-age=3600';
res.setHeader('cache-control', cacheControl);
res.status(upstream.status);
const body = Buffer.from(await upstream.arrayBuffer());
res.send(body);
} catch (error) {
res.status(404).json({
success: false,
error: error instanceof Error ? error.message : 'Browser viewer is not available.',
});
}
});
router.post('/sessions/:sessionId/stop', async (req, res) => { router.post('/sessions/:sessionId/stop', async (req, res) => {
try { try {
const result = await browserUseService.stopSession(readParam(req.params.sessionId)); const result = await browserUseService.stopSession(readParam(req.params.sessionId));

View File

@@ -1,126 +1,84 @@
import { createRequire } from 'node:module'; import { createRequire } from 'node:module';
import { randomBytes, randomUUID } from 'node:crypto'; import { randomUUID } from 'node:crypto';
import { spawn } from 'node:child_process'; import { execFileSync, spawn } from 'node:child_process';
import fs from 'node:fs'; import fs from 'node:fs';
import net from 'node:net';
import os from 'node:os'; import os from 'node:os';
import path from 'node:path'; import path from 'node:path';
import { appConfigDb } from '@/modules/database/index.js'; import { WebSocket } from 'ws';
import { providerMcpService } from '@/modules/providers/index.js'; import { providerMcpService } from '@/modules/providers/index.js';
import { getModuleDir } from '@/utils/runtime-paths.js'; import { getModuleDir } from '@/utils/runtime-paths.js';
import {
getOrCreateMcpToken,
getProfilePath,
normalizeBrowserBackend,
PROFILE_ROOT,
readSettings,
resolveSessionProfileName,
useVisibleCamoufoxBackend,
writeSettings,
} from './browser-use.settings.js';
import type {
BrowserUseSession,
BrowserUseSettings,
PublicBrowserUseSession,
RuntimeHandle,
RuntimeProbe,
RuntimeReadiness,
} from './browser-use.types.js';
import { getViewerUrl, handleViewerWebSocket, VIEWER_TOKEN_TTL_MS } from './browser-use.viewer.js';
const require = createRequire(import.meta.url); const require = createRequire(import.meta.url);
const __dirname = getModuleDir(import.meta.url); const __dirname = getModuleDir(import.meta.url);
const IS_PLATFORM = process.env.VITE_IS_PLATFORM === 'true'; const IS_PLATFORM = process.env.VITE_IS_PLATFORM === 'true';
const MAX_SESSIONS_PER_OWNER = Number.parseInt(process.env.CLOUDCLI_BROWSER_USE_MAX_SESSIONS_PER_OWNER || '3', 10); const MAX_SESSIONS_PER_OWNER = Number.parseInt(process.env.CLOUDCLI_BROWSER_USE_MAX_SESSIONS_PER_OWNER || '3', 10);
const SESSION_TTL_MS = Number.parseInt(process.env.CLOUDCLI_BROWSER_USE_SESSION_TTL_MS || String(30 * 60 * 1000), 10); const SESSION_TTL_MS = Number.parseInt(process.env.CLOUDCLI_BROWSER_USE_SESSION_TTL_MS || String(30 * 60 * 1000), 10);
const BROWSER_USE_SETTINGS_KEY = 'browser_use_settings';
const BROWSER_USE_MCP_TOKEN_KEY = 'browser_use_mcp_token';
type BrowserUseRuntime = 'cloud' | 'local';
type BrowserUseSessionStatus = 'ready' | 'stopped' | 'unavailable';
type BrowserUseSession = {
id: string;
ownerId: string;
createdBy: 'agent';
runtime: BrowserUseRuntime;
status: BrowserUseSessionStatus;
url: string | null;
title: string | null;
screenshotDataUrl: string | null;
createdAt: string;
updatedAt: string;
lastAction: string | null;
message: string | null;
profileName: string | null;
viewport: {
width: number;
height: number;
} | null;
cursor: {
x: number;
y: number;
actor: 'agent';
} | null;
};
type PublicBrowserUseSession = Omit<BrowserUseSession, 'ownerId'>;
type RuntimeHandle = {
browser?: any;
context?: any;
page?: any;
};
type BrowserUseSettings = {
enabled: boolean;
};
type RuntimeReadiness = {
playwright: any | null;
playwrightInstalled: boolean;
chromiumInstalled: boolean;
chromiumExecutablePath: string | null;
installInProgress: boolean;
installMessage: string | null;
};
type RuntimeProbe = Omit<RuntimeReadiness, 'installInProgress' | 'installMessage'>;
const sessions = new Map<string, BrowserUseSession>(); const sessions = new Map<string, BrowserUseSession>();
const handles = new Map<string, RuntimeHandle>(); const handles = new Map<string, RuntimeHandle>();
const reservedDisplays = new Set<string>();
const viewerTokens = new Map<string, { token: string; expiresAt: number }>();
let installPromise: Promise<{ success: boolean; message: string }> | null = null; let installPromise: Promise<{ success: boolean; message: string }> | null = null;
let lastInstallMessage: string | null = null; let lastInstallMessage: string | null = null;
let runtimeProbeCache: { value: RuntimeProbe; updatedAt: number } | null = null; let runtimeProbeCache: { value: RuntimeProbe; updatedAt: number } | null = null;
const DEFAULT_SETTINGS: BrowserUseSettings = {
enabled: false,
};
const AGENT_OWNER_ID = 'agent'; const AGENT_OWNER_ID = 'agent';
const PROFILE_ROOT = path.join(os.homedir(), '.cloudcli', 'browser-use', 'profiles');
const MCP_SERVER_NAME = 'cloudcli-browser'; const MCP_SERVER_NAME = 'cloudcli-browser';
const LEGACY_MCP_SERVER_NAMES = ['cloudcli-browser-use']; const LEGACY_MCP_SERVER_NAMES = ['cloudcli-browser-use'];
const RUNTIME_READINESS_CACHE_TTL_MS = 30_000; const RUNTIME_READINESS_CACHE_TTL_MS = 30_000;
const VISIBLE_BROWSER_ENABLED = process.env.CLOUDCLI_BROWSER_USE_VISIBLE !== 'false';
const RUNTIME_ROOT = process.env.CLOUDCLI_BROWSER_USE_RUNTIME_ROOT || '/opt/claudecodeui/.runtime-browser';
const NOVNC_ROOT = process.env.CLOUDCLI_BROWSER_USE_NOVNC_ROOT || path.join(RUNTIME_ROOT, 'novnc');
const X11VNC_BIN = process.env.CLOUDCLI_BROWSER_USE_X11VNC_BIN || path.join(RUNTIME_ROOT, 'rootfs/usr/bin/x11vnc');
const X11VNC_LIB_DIR = process.env.CLOUDCLI_BROWSER_USE_X11VNC_LIB_DIR || path.join(RUNTIME_ROOT, 'rootfs/usr/lib/x86_64-linux-gnu');
const X11VNC_EXTRA_LIB_DIR = process.env.CLOUDCLI_BROWSER_USE_X11VNC_EXTRA_LIB_DIR || path.join(RUNTIME_ROOT, 'rootfs/lib/x86_64-linux-gnu');
const LOG_RUNTIME_PROCESS_OUTPUT = process.env.CLOUDCLI_BROWSER_USE_RUNTIME_LOGS === 'true';
function getRuntime(): BrowserUseRuntime { function getRuntime(): 'cloud' | 'local' {
return IS_PLATFORM ? 'cloud' : 'local'; return IS_PLATFORM ? 'cloud' : 'local';
} }
function readSettings(): BrowserUseSettings { function getCamoufoxExecutablePath(): string | null {
const configured = process.env.CLOUDCLI_BROWSER_USE_CAMOUFOX_EXECUTABLE;
if (configured && fs.existsSync(configured)) {
return configured;
}
try { try {
const raw = appConfigDb.get(BROWSER_USE_SETTINGS_KEY); const output = execFileSync(path.join(os.homedir(), '.local/bin/camoufox'), ['path'], {
if (!raw) { encoding: 'utf8',
return DEFAULT_SETTINGS; stdio: ['ignore', 'pipe', 'ignore'],
}).trim();
const executablePath = fs.statSync(output).isDirectory()
? path.join(output, 'camoufox')
: output;
return fs.existsSync(executablePath) ? executablePath : null;
} catch {
return null;
} }
const parsed = JSON.parse(raw) as Partial<BrowserUseSettings>;
return {
enabled: parsed.enabled === true,
};
} catch (error: any) {
console.warn('[Browser] Failed to read settings:', error?.message || error);
return DEFAULT_SETTINGS;
}
}
function writeSettings(settings: BrowserUseSettings): BrowserUseSettings {
const normalized = {
enabled: settings.enabled === true,
};
appConfigDb.set(BROWSER_USE_SETTINGS_KEY, JSON.stringify(normalized));
return normalized;
}
function getOrCreateMcpToken(): string {
const existing = appConfigDb.get(BROWSER_USE_MCP_TOKEN_KEY);
if (existing) {
return existing;
}
const token = randomBytes(32).toString('hex');
appConfigDb.set(BROWSER_USE_MCP_TOKEN_KEY, token);
return token;
} }
function getSetupMessage(settings: BrowserUseSettings, readiness: RuntimeReadiness): string { function getSetupMessage(settings: BrowserUseSettings, readiness: RuntimeReadiness): string {
@@ -132,6 +90,26 @@ function getSetupMessage(settings: BrowserUseSettings, readiness: RuntimeReadine
return 'Install Playwright and Chromium to use browser sessions.'; return 'Install Playwright and Chromium to use browser sessions.';
} }
if (settings.browserBackend === 'camoufox-vnc' && !getCamoufoxExecutablePath()) {
return 'Camoufox is selected, but Camoufox is not installed.';
}
if (useVisibleCamoufoxBackend(settings)) {
if (!VISIBLE_BROWSER_ENABLED) {
return 'Camoufox is selected, but visible browser sessions are disabled.';
}
if (!getCamoufoxExecutablePath()) {
return 'Camoufox is selected, but Camoufox is not installed.';
}
if (!fs.existsSync(X11VNC_BIN)) {
return 'Camoufox is selected, but x11vnc is missing.';
}
if (!fs.existsSync(path.join(NOVNC_ROOT, 'vnc.html'))) {
return 'Camoufox is selected, but noVNC is missing.';
}
return readiness.installMessage || 'Camoufox runtime is not ready.';
}
if (!readiness.chromiumInstalled) { if (!readiness.chromiumInstalled) {
return 'Playwright is installed, but Chromium is missing. Install the Chromium runtime to continue.'; return 'Playwright is installed, but Chromium is missing. Install the Chromium runtime to continue.';
} }
@@ -176,24 +154,6 @@ async function removeMcpServerFromAllProviders(name: string) {
return results.map((result) => ({ ...result, name })); return results.map((result) => ({ ...result, name }));
} }
function normalizeProfileName(profileName?: string | null): string | null {
const normalized = String(profileName || '').trim();
if (!normalized) {
return null;
}
return normalized.slice(0, 80);
}
function getProfilePath(profileName: string): string {
const safeName = profileName
.toLowerCase()
.replace(/[^a-z0-9._-]+/g, '-')
.replace(/^-+|-+$/g, '')
.slice(0, 80) || 'default';
return path.join(PROFILE_ROOT, safeName);
}
function probeRuntime(): RuntimeProbe { function probeRuntime(): RuntimeProbe {
const playwright = getPlaywright(); const playwright = getPlaywright();
const readiness: RuntimeProbe = { const readiness: RuntimeProbe = {
@@ -238,6 +198,175 @@ function getRuntimeReadiness(options: { force?: boolean } = {}): RuntimeReadines
}; };
} }
function findAvailablePort(): Promise<number> {
return new Promise((resolve, reject) => {
const server = net.createServer();
server.on('error', reject);
server.listen(0, '127.0.0.1', () => {
const address = server.address();
server.close(() => {
if (typeof address === 'object' && address?.port) {
resolve(address.port);
} else {
reject(new Error('Failed to reserve a browser runtime port.'));
}
});
});
});
}
function delay(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
function isRuntimeProcessAlive(child: ReturnType<typeof spawn>): boolean {
return child.exitCode === null && child.signalCode === null && !child.killed;
}
function assertRuntimeProcessesAlive(processes: Array<ReturnType<typeof spawn>>, label: string) {
const exited = processes.find((child) => !isRuntimeProcessAlive(child));
if (exited) {
throw new Error(`${label} exited before the Browser viewer runtime was ready.`);
}
}
async function isPortListening(port: number): Promise<boolean> {
return new Promise((resolve) => {
const socket = net.createConnection({ host: '127.0.0.1', port });
let settled = false;
const finish = (listening: boolean) => {
if (settled) {
return;
}
settled = true;
socket.destroy();
resolve(listening);
};
socket.setTimeout(250);
socket.once('connect', () => finish(true));
socket.once('timeout', () => finish(false));
socket.once('error', () => finish(false));
});
}
async function waitForRuntimePort(
port: number,
label: string,
processes: Array<ReturnType<typeof spawn>>,
timeoutMs = 5_000,
) {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
assertRuntimeProcessesAlive(processes, label);
if (await isPortListening(port)) {
return;
}
await delay(100);
}
assertRuntimeProcessesAlive(processes, label);
throw new Error(`${label} did not start listening on 127.0.0.1:${port}.`);
}
function killRuntimeProcesses(processes?: Array<ReturnType<typeof spawn>>) {
processes?.forEach((child) => child.kill('SIGTERM'));
}
function reserveDisplay(): string {
for (let index = 90; index < 140; index += 1) {
const display = `:${index}`;
if (!reservedDisplays.has(display)) {
reservedDisplays.add(display);
return display;
}
}
throw new Error('No browser display slots are available.');
}
function spawnRuntimeProcess(command: string, args: string[], options: { env?: NodeJS.ProcessEnv } = {}) {
const child = spawn(command, args, {
env: { ...process.env, ...options.env },
stdio: ['ignore', 'ignore', 'pipe'],
});
child.stderr?.on('data', (chunk) => {
if (!LOG_RUNTIME_PROCESS_OUTPUT) {
return;
}
const text = String(chunk).trim();
if (text) {
console.warn(`[Browser runtime] ${path.basename(command)}: ${text}`);
}
});
child.on('error', (error) => {
console.warn(`[Browser runtime] ${path.basename(command)} failed:`, error.message);
});
return child;
}
async function startVisibleRuntime(): Promise<NonNullable<RuntimeHandle['viewer']> & { processes: Array<ReturnType<typeof spawn>> }> {
const display = reserveDisplay();
const vncPort = await findAvailablePort();
const websockifyPort = await findAvailablePort();
const processes: Array<ReturnType<typeof spawn>> = [];
try {
processes.push(spawnRuntimeProcess('Xvfb', [
display,
'-screen',
'0',
'1440x900x24',
'-ac',
'-nolisten',
'tcp',
]));
await delay(700);
assertRuntimeProcessesAlive(processes, 'Xvfb');
if (!fs.existsSync(X11VNC_BIN)) {
throw new Error(`x11vnc is missing at ${X11VNC_BIN}.`);
}
processes.push(spawnRuntimeProcess(X11VNC_BIN, [
'-display',
display,
'-localhost',
'-forever',
'-shared',
'-rfbport',
String(vncPort),
'-nopw',
'-quiet',
], {
env: {
LD_LIBRARY_PATH: `${X11VNC_LIB_DIR}:${X11VNC_EXTRA_LIB_DIR}:${process.env.LD_LIBRARY_PATH || ''}`,
},
}));
await waitForRuntimePort(vncPort, 'x11vnc', processes);
if (!fs.existsSync(path.join(NOVNC_ROOT, 'vnc.html'))) {
throw new Error(`noVNC is missing at ${NOVNC_ROOT}.`);
}
processes.push(spawnRuntimeProcess(path.join(os.homedir(), '.local/bin/websockify'), [
'--web',
NOVNC_ROOT,
`127.0.0.1:${websockifyPort}`,
`127.0.0.1:${vncPort}`,
]));
await waitForRuntimePort(websockifyPort, 'websockify', processes);
return {
display,
vncPort,
websockifyPort,
noVncRoot: NOVNC_ROOT,
processes,
};
} catch (error) {
killRuntimeProcesses(processes);
reservedDisplays.delete(display);
throw error;
}
}
const INSTALL_COMMAND_TIMEOUT_MS = Number.parseInt( const INSTALL_COMMAND_TIMEOUT_MS = Number.parseInt(
process.env.CLOUDCLI_BROWSER_USE_INSTALL_TIMEOUT_MS || String(10 * 60 * 1000), process.env.CLOUDCLI_BROWSER_USE_INSTALL_TIMEOUT_MS || String(10 * 60 * 1000),
10, 10,
@@ -350,6 +479,45 @@ function publicSession(session: BrowserUseSession): PublicBrowserUseSession {
return publicFields; return publicFields;
} }
function getSessionViewer(sessionId: string): RuntimeHandle['viewer'] | null {
const session = sessions.get(sessionId);
if (!session || session.ownerId !== AGENT_OWNER_ID || session.status !== 'ready') {
return null;
}
return handles.get(sessionId)?.viewer || null;
}
function createViewerToken(sessionId: string): string {
const token = randomUUID();
viewerTokens.set(sessionId, {
token,
expiresAt: Date.now() + VIEWER_TOKEN_TTL_MS,
});
return token;
}
function deleteViewerToken(sessionId: string) {
viewerTokens.delete(sessionId);
}
function validateViewerTokenForSession(sessionId: string, token: string | null | undefined): boolean {
if (!token) {
return false;
}
const session = sessions.get(sessionId);
const viewer = session?.ownerId === AGENT_OWNER_ID && session.status === 'ready'
? handles.get(sessionId)?.viewer || null
: null;
const stored = viewerTokens.get(sessionId);
if (!viewer || !stored || stored.token !== token || stored.expiresAt < Date.now()) {
if (stored?.expiresAt && stored.expiresAt < Date.now()) {
viewerTokens.delete(sessionId);
}
return false;
}
return true;
}
function ownerSessions(ownerId: string): BrowserUseSession[] { function ownerSessions(ownerId: string): BrowserUseSession[] {
return [...sessions.values()].filter((session) => session.ownerId === ownerId); return [...sessions.values()].filter((session) => session.ownerId === ownerId);
} }
@@ -357,8 +525,13 @@ function ownerSessions(ownerId: string): BrowserUseSession[] {
async function closeHandle(sessionId: string): Promise<void> { async function closeHandle(sessionId: string): Promise<void> {
const handle = handles.get(sessionId); const handle = handles.get(sessionId);
handles.delete(sessionId); handles.delete(sessionId);
deleteViewerToken(sessionId);
await handle?.context?.close?.().catch(() => undefined); await handle?.context?.close?.().catch(() => undefined);
await handle?.browser?.close().catch(() => undefined); await handle?.browser?.close().catch(() => undefined);
killRuntimeProcesses(handle?.processes);
if (handle?.viewer?.display) {
reservedDisplays.delete(handle.viewer.display);
}
} }
async function expireStaleSessions(now = Date.now()): Promise<void> { async function expireStaleSessions(now = Date.now()): Promise<void> {
@@ -424,6 +597,11 @@ export const browserUseService = {
const current = readSettings(); const current = readSettings();
const nextSettings = { const nextSettings = {
enabled: typeof settings.enabled === 'boolean' ? settings.enabled : current.enabled, enabled: typeof settings.enabled === 'boolean' ? settings.enabled : current.enabled,
persistSessions: typeof settings.persistSessions === 'boolean' ? settings.persistSessions : current.persistSessions,
defaultProfileName: typeof settings.defaultProfileName === 'string'
? settings.defaultProfileName
: current.defaultProfileName,
browserBackend: settings.browserBackend ? normalizeBrowserBackend(settings.browserBackend) : current.browserBackend,
}; };
const next = writeSettings(nextSettings); const next = writeSettings(nextSettings);
@@ -439,14 +617,28 @@ export const browserUseService = {
async getStatus() { async getStatus() {
const settings = readSettings(); const settings = readSettings();
const readiness = getRuntimeReadiness(); const readiness = getRuntimeReadiness();
const available = settings.enabled && readiness.playwrightInstalled && readiness.chromiumInstalled; const useVisibleBackend = useVisibleCamoufoxBackend(settings);
const visibleCamoufoxReady = useVisibleBackend
&& VISIBLE_BROWSER_ENABLED
&& readiness.playwrightInstalled
&& Boolean(getCamoufoxExecutablePath())
&& fs.existsSync(X11VNC_BIN)
&& fs.existsSync(path.join(NOVNC_ROOT, 'vnc.html'));
const available = settings.enabled
&& readiness.playwrightInstalled
&& (useVisibleBackend ? visibleCamoufoxReady : readiness.chromiumInstalled);
return { return {
enabled: settings.enabled, enabled: settings.enabled,
runtime: getRuntime(), runtime: getRuntime(),
backend: useVisibleBackend ? 'camoufox-vnc' : 'playwright',
browserBackend: settings.browserBackend,
available, available,
playwrightInstalled: readiness.playwrightInstalled, playwrightInstalled: readiness.playwrightInstalled,
chromiumInstalled: readiness.chromiumInstalled, chromiumInstalled: readiness.chromiumInstalled,
camoufoxInstalled: Boolean(getCamoufoxExecutablePath()),
noVncInstalled: fs.existsSync(path.join(NOVNC_ROOT, 'vnc.html')),
x11vncInstalled: fs.existsSync(X11VNC_BIN),
installInProgress: readiness.installInProgress, installInProgress: readiness.installInProgress,
sessionCount: sessions.size, sessionCount: sessions.size,
message: available message: available
@@ -505,7 +697,7 @@ export const browserUseService = {
} }
await expireStaleSessions(); await expireStaleSessions();
const profileName = normalizeProfileName(options?.profileName); const profileName = resolveSessionProfileName(settings, options?.profileName);
const now = new Date().toISOString(); const now = new Date().toISOString();
const session: BrowserUseSession = { const session: BrowserUseSession = {
@@ -521,6 +713,9 @@ export const browserUseService = {
updatedAt: now, updatedAt: now,
lastAction: 'create', lastAction: 'create',
message: null, message: null,
backend: useVisibleCamoufoxBackend(settings) ? 'camoufox-vnc' : 'playwright',
viewerUrl: null,
viewerEmbedUrl: null,
profileName, profileName,
viewport: { width: 1440, height: 900 }, viewport: { width: 1440, height: 900 },
cursor: null, cursor: null,
@@ -532,7 +727,13 @@ export const browserUseService = {
} }
const readiness = getRuntimeReadiness(); const readiness = getRuntimeReadiness();
if (!settings.enabled || !readiness.playwrightInstalled || !readiness.chromiumInstalled || !readiness.playwright) { const useVisibleBackend = useVisibleCamoufoxBackend(settings);
const visibleCamoufoxReady = useVisibleBackend
&& VISIBLE_BROWSER_ENABLED
&& Boolean(getCamoufoxExecutablePath())
&& fs.existsSync(X11VNC_BIN)
&& fs.existsSync(path.join(NOVNC_ROOT, 'vnc.html'));
if (!settings.enabled || !readiness.playwrightInstalled || !readiness.playwright || (useVisibleBackend ? !visibleCamoufoxReady : !readiness.chromiumInstalled)) {
session.message = getSetupMessage(settings, readiness); session.message = getSetupMessage(settings, readiness);
sessions.set(session.id, session); sessions.set(session.id, session);
return publicSession(session); return publicSession(session);
@@ -541,31 +742,73 @@ export const browserUseService = {
let browser: any | undefined; let browser: any | undefined;
let context: any | undefined; let context: any | undefined;
let page: any; let page: any;
const launchOptions = { let viewer: RuntimeHandle['viewer'];
headless: true, let processes: RuntimeHandle['processes'];
const launchOptions: Record<string, unknown> = {
headless: !useVisibleBackend,
args: ['--disable-dev-shm-usage'], args: ['--disable-dev-shm-usage'],
}; };
const contextOptions = { const contextOptions = useVisibleBackend
? { viewport: null }
: {
viewport: { width: 1440, height: 900 }, viewport: { width: 1440, height: 900 },
serviceWorkers: 'block', serviceWorkers: 'block',
}; };
try {
if (useVisibleBackend) {
const camoufoxExecutable = getCamoufoxExecutablePath();
if (!camoufoxExecutable) {
throw new Error('Camoufox is not installed.');
}
const runtime = await startVisibleRuntime();
viewer = {
display: runtime.display,
vncPort: runtime.vncPort,
websockifyPort: runtime.websockifyPort,
noVncRoot: runtime.noVncRoot,
};
processes = runtime.processes;
launchOptions.executablePath = camoufoxExecutable;
launchOptions.env = {
...process.env,
DISPLAY: runtime.display,
LD_LIBRARY_PATH: `${X11VNC_LIB_DIR}:${X11VNC_EXTRA_LIB_DIR}:${process.env.LD_LIBRARY_PATH || ''}`,
};
launchOptions.args = [];
session.backend = 'camoufox-vnc';
const viewerToken = createViewerToken(session.id);
session.viewerUrl = getViewerUrl(session.id, viewerToken);
session.viewerEmbedUrl = session.viewerUrl;
}
if (profileName) { if (profileName) {
fs.mkdirSync(PROFILE_ROOT, { recursive: true }); fs.mkdirSync(PROFILE_ROOT, { recursive: true });
context = await readiness.playwright.chromium.launchPersistentContext(getProfilePath(profileName), { const browserType = useVisibleBackend ? readiness.playwright.firefox : readiness.playwright.chromium;
context = await browserType.launchPersistentContext(getProfilePath(profileName), {
...launchOptions, ...launchOptions,
...contextOptions, ...contextOptions,
}); });
page = context.pages()[0] || await context.newPage(); page = context.pages()[0] || await context.newPage();
} else { } else {
browser = await readiness.playwright.chromium.launch(launchOptions); const browserType = useVisibleBackend ? readiness.playwright.firefox : readiness.playwright.chromium;
browser = await browserType.launch(launchOptions);
context = await browser.newContext(contextOptions); context = await browser.newContext(contextOptions);
page = await context.newPage(); page = await context.newPage();
} }
} catch (error) {
await context?.close?.().catch(() => undefined);
await browser?.close?.().catch(() => undefined);
killRuntimeProcesses(processes);
if (viewer?.display) {
reservedDisplays.delete(viewer.display);
}
throw error;
}
session.status = 'ready'; session.status = 'ready';
session.message = 'Browser session is ready.'; session.message = 'Browser session is ready.';
sessions.set(session.id, session); sessions.set(session.id, session);
handles.set(session.id, { browser, context, page }); handles.set(session.id, { browser, context, page, processes, viewer });
await captureSession(session, page); await captureSession(session, page);
return publicSession(session); return publicSession(session);
}, },
@@ -812,6 +1055,25 @@ export const browserUseService = {
return { deleted: true, sessionId }; return { deleted: true, sessionId };
}, },
getViewerProxyTarget(sessionId: string) {
const viewer = getSessionViewer(sessionId);
if (!viewer) {
throw new Error('Browser viewer is not available for this session.');
}
return {
websockifyPort: viewer.websockifyPort,
noVncRoot: viewer.noVncRoot,
};
},
validateViewerToken(sessionId: string, token: string | null | undefined) {
return validateViewerTokenForSession(sessionId, token);
},
handleViewerWebSocket(clientWs: WebSocket, pathname: string) {
handleViewerWebSocket(clientWs, pathname, getSessionViewer);
},
async agentStopSession(sessionId: string) { async agentStopSession(sessionId: string) {
await this.getAgentSession(sessionId); await this.getAgentSession(sessionId);
return this.stopSession(sessionId); return this.stopSession(sessionId);

View File

@@ -0,0 +1,147 @@
import { randomBytes } from 'node:crypto';
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import { appConfigDb } from '@/modules/database/index.js';
import type { BrowserUseBackend, BrowserUseSettings } from './browser-use.types.js';
const IS_PLATFORM = process.env.VITE_IS_PLATFORM === 'true';
const BROWSER_USE_SETTINGS_KEY = 'browser_use_settings';
const BROWSER_USE_MCP_TOKEN_KEY = 'browser_use_mcp_token';
const MAX_PROFILE_NAME_LENGTH = 80;
export const DEFAULT_BROWSER_USE_SETTINGS: BrowserUseSettings = {
enabled: false,
persistSessions: false,
defaultProfileName: 'default',
browserBackend: IS_PLATFORM ? 'camoufox-vnc' : 'playwright',
};
export const PROFILE_ROOT = process.env.CLOUDCLI_BROWSER_USE_PROFILE_ROOT
|| path.join(os.homedir(), '.cloudcli', 'browser-use', 'profiles');
export function normalizeBrowserBackend(value: unknown): BrowserUseBackend {
return value === 'playwright' || value === 'camoufox-vnc'
? value
: DEFAULT_BROWSER_USE_SETTINGS.browserBackend;
}
function trimEdgeDashes(value: string): string {
let start = 0;
let end = value.length;
while (start < end && value[start] === '-') {
start += 1;
}
while (end > start && value[end - 1] === '-') {
end -= 1;
}
return value.slice(start, end);
}
export function normalizeProfileName(profileName?: string | null): string | null {
const sanitized = trimEdgeDashes(String(profileName || '')
.trim()
.toLowerCase()
.replace(/[^a-z0-9._-]+/g, '-'));
const normalized = sanitized
.slice(0, MAX_PROFILE_NAME_LENGTH)
.replace(/^-+|-+$/g, '');
if (!normalized) {
return null;
}
return /[a-z0-9]/.test(normalized) ? normalized : null;
}
export function normalizeDefaultProfileName(profileName?: string | null): string {
return normalizeProfileName(profileName) || DEFAULT_BROWSER_USE_SETTINGS.defaultProfileName;
}
export function resolveSessionProfileName(settings: BrowserUseSettings, profileName?: string | null): string | null {
const requestedProfileName = normalizeProfileName(profileName);
if (String(profileName || '').trim() && !requestedProfileName) {
throw new Error('Browser profile name must include at least one letter or number.');
}
if (requestedProfileName) {
validateRequestedProfileName(profileName, requestedProfileName);
return requestedProfileName;
}
return settings.persistSessions ? normalizeDefaultProfileName(settings.defaultProfileName) : null;
}
export function getProfilePath(profileName: string): string {
return path.join(PROFILE_ROOT, normalizeDefaultProfileName(profileName));
}
function validateRequestedProfileName(profileName: string | null | undefined, normalizedProfileName: string): void {
const requestedProfileName = String(profileName || '').trim();
const existingProfileName = findExistingProfileName(normalizedProfileName);
if (
existingProfileName
&& (requestedProfileName !== normalizedProfileName || existingProfileName !== normalizedProfileName)
) {
throw new Error(`Browser profile "${requestedProfileName}" resolves to existing profile "${existingProfileName}". Use "${normalizedProfileName}" instead.`);
}
}
function findExistingProfileName(normalizedProfileName: string): string | null {
try {
if (!fs.existsSync(PROFILE_ROOT)) {
return null;
}
const entries = fs.readdirSync(PROFILE_ROOT, { withFileTypes: true });
const match = entries.find((entry) => entry.isDirectory() && normalizeProfileName(entry.name) === normalizedProfileName);
return match?.name || null;
} catch {
return null;
}
}
export function useVisibleCamoufoxBackend(settings: BrowserUseSettings): boolean {
return settings.browserBackend === 'camoufox-vnc';
}
export function readSettings(): BrowserUseSettings {
try {
const raw = appConfigDb.get(BROWSER_USE_SETTINGS_KEY);
if (!raw) {
return DEFAULT_BROWSER_USE_SETTINGS;
}
const parsed = JSON.parse(raw) as Partial<BrowserUseSettings>;
return {
enabled: parsed.enabled === true,
persistSessions: parsed.persistSessions === true,
defaultProfileName: normalizeDefaultProfileName(parsed.defaultProfileName),
browserBackend: normalizeBrowserBackend(parsed.browserBackend),
};
} catch (error: any) {
console.warn('[Browser] Failed to read settings:', error?.message || error);
return DEFAULT_BROWSER_USE_SETTINGS;
}
}
export function writeSettings(settings: BrowserUseSettings): BrowserUseSettings {
const normalized = {
enabled: settings.enabled === true,
persistSessions: settings.persistSessions === true,
defaultProfileName: normalizeDefaultProfileName(settings.defaultProfileName),
browserBackend: normalizeBrowserBackend(settings.browserBackend),
};
appConfigDb.set(BROWSER_USE_SETTINGS_KEY, JSON.stringify(normalized));
return normalized;
}
export function getOrCreateMcpToken(): string {
const existing = appConfigDb.get(BROWSER_USE_MCP_TOKEN_KEY);
if (existing) {
return existing;
}
const token = randomBytes(32).toString('hex');
appConfigDb.set(BROWSER_USE_MCP_TOKEN_KEY, token);
return token;
}

View File

@@ -0,0 +1,66 @@
import type { spawn } from 'node:child_process';
export type BrowserUseRuntime = 'cloud' | 'local';
export type BrowserUseBackend = 'playwright' | 'camoufox-vnc';
export type BrowserUseSessionStatus = 'ready' | 'stopped' | 'unavailable';
export type BrowserUseSession = {
id: string;
ownerId: string;
createdBy: 'agent';
runtime: BrowserUseRuntime;
status: BrowserUseSessionStatus;
url: string | null;
title: string | null;
screenshotDataUrl: string | null;
createdAt: string;
updatedAt: string;
lastAction: string | null;
message: string | null;
backend: BrowserUseBackend;
viewerUrl: string | null;
viewerEmbedUrl: string | null;
profileName: string | null;
viewport: {
width: number;
height: number;
} | null;
cursor: {
x: number;
y: number;
actor: 'agent';
} | null;
};
export type PublicBrowserUseSession = Omit<BrowserUseSession, 'ownerId'>;
export type RuntimeHandle = {
browser?: any;
context?: any;
page?: any;
processes?: Array<ReturnType<typeof spawn>>;
viewer?: {
display: string;
vncPort: number;
websockifyPort: number;
noVncRoot: string;
};
};
export type BrowserUseSettings = {
enabled: boolean;
persistSessions: boolean;
defaultProfileName: string;
browserBackend: BrowserUseBackend;
};
export type RuntimeReadiness = {
playwright: any | null;
playwrightInstalled: boolean;
chromiumInstalled: boolean;
chromiumExecutablePath: string | null;
installInProgress: boolean;
installMessage: string | null;
};
export type RuntimeProbe = Omit<RuntimeReadiness, 'installInProgress' | 'installMessage'>;

View File

@@ -0,0 +1,76 @@
import { WebSocket } from 'ws';
import type { RuntimeHandle } from './browser-use.types.js';
type BrowserUseViewer = NonNullable<RuntimeHandle['viewer']>;
export const VIEWER_COOKIE_NAME = 'browser_use_viewer_token';
const DEFAULT_VIEWER_TOKEN_TTL_MS = 30 * 60 * 1000;
const parsedViewerTokenTtlMs = Number.parseInt(
process.env.CLOUDCLI_BROWSER_USE_VIEWER_TOKEN_TTL_MS || String(DEFAULT_VIEWER_TOKEN_TTL_MS),
10,
);
export const VIEWER_TOKEN_TTL_MS =
Number.isFinite(parsedViewerTokenTtlMs) && parsedViewerTokenTtlMs > 0
? parsedViewerTokenTtlMs
: DEFAULT_VIEWER_TOKEN_TTL_MS;
export function getViewerUrl(sessionId: string, viewerToken?: string): string {
const basePath = `/api/browser-use/sessions/${encodeURIComponent(sessionId)}/viewer`;
const websockifyPath = viewerToken
? `${basePath}/websockify?viewerToken=${encodeURIComponent(viewerToken)}`
: `${basePath}/websockify`;
const params = new URLSearchParams({
autoconnect: '1',
resize: 'scale',
reconnect: '1',
path: websockifyPath,
});
if (viewerToken) {
params.set('viewerToken', viewerToken);
}
return `${basePath}/vnc.html?${params.toString()}`;
}
export function handleViewerWebSocket(
clientWs: WebSocket,
pathname: string,
getSessionViewer: (sessionId: string) => BrowserUseViewer | null | undefined,
) {
const match = /^\/api\/browser-use\/sessions\/([^/]+)\/viewer\/websockify\/?$/.exec(pathname);
const sessionId = match ? decodeURIComponent(match[1]) : '';
const viewer = sessionId ? getSessionViewer(sessionId) : null;
if (!viewer) {
clientWs.close(4404, 'Browser viewer not found');
return;
}
const upstream = new WebSocket(`ws://127.0.0.1:${viewer.websockifyPort}`);
upstream.on('open', () => {
clientWs.on('message', (data) => {
if (upstream.readyState === WebSocket.OPEN) {
upstream.send(data);
}
});
upstream.on('message', (data) => {
if (clientWs.readyState === WebSocket.OPEN) {
clientWs.send(data);
}
});
});
upstream.on('close', (code, reason) => {
if (clientWs.readyState === WebSocket.OPEN) {
clientWs.close(code, reason);
}
});
upstream.on('error', () => {
if (clientWs.readyState === WebSocket.OPEN) {
clientWs.close(4502, 'Browser viewer upstream error');
}
});
clientWs.on('close', () => {
if (upstream.readyState === WebSocket.OPEN) {
upstream.close();
}
});
}

View File

@@ -0,0 +1,2 @@
export { browserUseService } from './browser-use.service.js';
export { VIEWER_COOKIE_NAME } from './browser-use.viewer.js';

View File

@@ -0,0 +1,73 @@
import assert from 'node:assert/strict';
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import test from 'node:test';
const originalProfileRoot = process.env.CLOUDCLI_BROWSER_USE_PROFILE_ROOT;
const testProfileRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'browser-use-profiles-'));
process.env.CLOUDCLI_BROWSER_USE_PROFILE_ROOT = testProfileRoot;
const {
getProfilePath,
normalizeDefaultProfileName,
normalizeProfileName,
PROFILE_ROOT,
resolveSessionProfileName,
} = await import('@/modules/browser-use/browser-use.settings.js');
test.after(() => {
if (originalProfileRoot === undefined) {
delete process.env.CLOUDCLI_BROWSER_USE_PROFILE_ROOT;
} else {
process.env.CLOUDCLI_BROWSER_USE_PROFILE_ROOT = originalProfileRoot;
}
fs.rmSync(testProfileRoot, { recursive: true, force: true });
});
test('browser profile names are canonicalized before storage and path resolution', () => {
assert.equal(normalizeProfileName(' Work Profile!! '), 'work-profile');
assert.equal(normalizeProfileName(`${'-'.repeat(100)}Work Profile`), 'work-profile');
assert.equal(normalizeDefaultProfileName(' Work Profile!! '), 'work-profile');
assert.equal(
getProfilePath(' Work Profile!! '),
`${PROFILE_ROOT}/work-profile`,
);
assert.equal(
resolveSessionProfileName({
enabled: true,
persistSessions: true,
defaultProfileName: ' Work Profile!! ',
browserBackend: 'playwright',
}),
'work-profile',
);
});
test('browser profile aliases are rejected when the normalized profile already exists', () => {
const profileName = `alias-test-${Date.now()}`;
fs.mkdirSync(getProfilePath(profileName), { recursive: true });
try {
assert.throws(
() => resolveSessionProfileName({
enabled: true,
persistSessions: false,
defaultProfileName: 'default',
browserBackend: 'playwright',
}, profileName.toUpperCase()),
/resolves to existing profile/,
);
assert.equal(
resolveSessionProfileName({
enabled: true,
persistSessions: false,
defaultProfileName: 'default',
browserBackend: 'playwright',
}, profileName),
profileName,
);
} finally {
fs.rmSync(getProfilePath(profileName), { recursive: true, force: true });
}
});

View File

@@ -1,8 +1,9 @@
import type { Server as HttpServer } from 'node:http'; import type { Server as HttpServer } from 'node:http';
import { WebSocketServer, type VerifyClientCallbackSync } from 'ws'; import { WebSocket, WebSocketServer, type VerifyClientCallbackSync } from 'ws';
import { handleChatConnection } from '@/modules/websocket/services/chat-websocket.service.js'; import { handleChatConnection } from '@/modules/websocket/services/chat-websocket.service.js';
import { VIEWER_COOKIE_NAME } from '@/modules/browser-use/index.js';
import { verifyWebSocketClient } from '@/modules/websocket/services/websocket-auth.service.js'; import { verifyWebSocketClient } from '@/modules/websocket/services/websocket-auth.service.js';
import { handlePluginWsProxy } from '@/modules/websocket/services/plugin-websocket-proxy.service.js'; import { handlePluginWsProxy } from '@/modules/websocket/services/plugin-websocket-proxy.service.js';
import { handleShellConnection } from '@/modules/websocket/services/shell-websocket.service.js'; import { handleShellConnection } from '@/modules/websocket/services/shell-websocket.service.js';
@@ -13,8 +14,21 @@ type WebSocketServerDependencies = {
chat: Parameters<typeof handleChatConnection>[2]; chat: Parameters<typeof handleChatConnection>[2];
shell: Parameters<typeof handleShellConnection>[1]; shell: Parameters<typeof handleShellConnection>[1];
getPluginPort: Parameters<typeof handlePluginWsProxy>[2]; getPluginPort: Parameters<typeof handlePluginWsProxy>[2];
browserUseViewer?: (ws: WebSocket, pathname: string) => void;
authenticateBrowserUseViewer?: (pathname: string, token: string | null) => boolean;
}; };
function readCookieValue(header: unknown, name: string): string | null {
if (!header) return null;
const prefix = `${name}=`;
const cookie = String(header).split(';').map((part) => part.trim()).find((part) => part.startsWith(prefix));
return cookie ? decodeURIComponent(cookie.slice(prefix.length)) : null;
}
function getBrowserUseViewerToken(url: URL, headers: Record<string, unknown>): string | null {
return url.searchParams.get('viewerToken') || readCookieValue(headers.cookie, VIEWER_COOKIE_NAME);
}
/** /**
* Creates and wires the server-wide websocket gateway used for chat, shell, and * Creates and wires the server-wide websocket gateway used for chat, shell, and
* plugin proxy routes. * plugin proxy routes.
@@ -27,7 +41,17 @@ export function createWebSocketServer(
server, server,
verifyClient: (( verifyClient: ((
info: Parameters<VerifyClientCallbackSync<AuthenticatedWebSocketRequest>>[0] info: Parameters<VerifyClientCallbackSync<AuthenticatedWebSocketRequest>>[0]
) => verifyWebSocketClient(info, dependencies.verifyClient)), ) => {
const requestUrl = new URL(info.req.url ?? '/', 'http://localhost');
if (
requestUrl.pathname.startsWith('/api/browser-use/sessions/')
&& requestUrl.pathname.endsWith('/viewer/websockify')
) {
const token = getBrowserUseViewerToken(requestUrl, info.req.headers as Record<string, unknown>);
return Boolean(dependencies.authenticateBrowserUseViewer?.(requestUrl.pathname, token));
}
return verifyWebSocketClient(info, dependencies.verifyClient);
}),
}); });
wss.on('connection', (ws, request) => { wss.on('connection', (ws, request) => {
@@ -68,6 +92,11 @@ export function createWebSocketServer(
return; return;
} }
if (pathname.startsWith('/api/browser-use/sessions/') && pathname.endsWith('/viewer/websockify')) {
dependencies.browserUseViewer?.(ws, pathname);
return;
}
console.log('[WARN] Unknown WebSocket path:', pathname); console.log('[WARN] Unknown WebSocket path:', pathname);
ws.close(); ws.close();
}); });

224
server/voice-proxy.js Normal file
View File

@@ -0,0 +1,224 @@
// Optional voice proxy — forwards STT/TTS to an OpenAI-compatible audio backend.
//
// The backend is whatever the user points at: OpenAI, Groq, or a local server
// (LocalAI / Speaches / Kokoro-FastAPI / openedai-speech / etc.). It must expose the
// standard OpenAI audio endpoints:
// POST {base}/audio/transcriptions (multipart 'file' + 'model') -> { text }
// POST {base}/audio/speech ({ model, voice, input }) -> audio bytes
//
// Config is resolved per-request from headers (set by the client's voice settings),
// falling back to server env defaults. Mounted at /api/voice behind authenticateToken.
import { Readable } from 'node:stream';
import express from 'express';
const ENV = {
baseUrl: (process.env.VOICE_API_BASE_URL || '').replace(/\/$/, ''),
apiKey: process.env.VOICE_API_KEY || '',
sttModel: process.env.VOICE_STT_MODEL || 'whisper-1',
ttsModel: process.env.VOICE_TTS_MODEL || 'tts-1',
ttsVoice: process.env.VOICE_TTS_VOICE || 'alloy',
};
/**
* Resolve the voice backend config for a request. Client headers (set from the
* user's in-app voice settings) take precedence over the server env defaults.
* @param {import('express').Request} req
* @returns {{baseUrl: string, apiKey: string, sttModel: string, ttsModel: string, ttsVoice: string, ttsFormat: string}}
*/
function resolveConfig(req) {
const h = req.headers;
return {
// Security: do not allow clients to control the outbound backend host.
// Always use the server-side configured base URL.
baseUrl: ENV.baseUrl,
apiKey: String(h['x-voice-api-key'] || '') || ENV.apiKey,
sttModel: String(h['x-voice-stt-model'] || '') || ENV.sttModel,
ttsModel: String(h['x-voice-tts-model'] || '') || ENV.ttsModel,
ttsVoice: String(h['x-voice-tts-voice'] || '') || ENV.ttsVoice,
ttsFormat: String(h['x-voice-tts-format'] || '').trim(),
};
}
const router = express.Router();
// Generous by default — local TTS can synthesize long messages at ~real-time on CPU.
// Guard against a non-numeric/zero override that would make setTimeout fire immediately.
const DEFAULT_VOICE_TIMEOUT_MS = 300000;
const _parsedTimeout = Number(process.env.VOICE_TIMEOUT_MS);
const VOICE_TIMEOUT_MS = Number.isFinite(_parsedTimeout) && _parsedTimeout > 0
? _parsedTimeout
: DEFAULT_VOICE_TIMEOUT_MS;
/**
* fetch() with an AbortController timeout so a stalled backend can't hold the
* request open indefinitely. Aborts after VOICE_TIMEOUT_MS.
* @param {string} url
* @param {RequestInit} [options]
* @returns {Promise<Response>}
*/
async function fetchWithTimeout(url, options = {}) {
const parsed = new URL(url);
if (!['http:', 'https:'].includes(parsed.protocol) || !isAllowedBackendUrl(parsed.origin)) {
throw new Error('Blocked outbound voice backend URL');
}
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), VOICE_TIMEOUT_MS);
try {
return await fetch(parsed.toString(), { redirect: 'manual', ...options, signal: controller.signal });
} finally {
clearTimeout(timer);
}
}
/**
* Turn a backend fetch failure into a clear, actionable client response:
* 504 on timeout (AbortError), 502 otherwise.
* @param {import('express').Response} res
* @param {Error} e
*/
function backendError(res, e) {
if (e && e.name === 'AbortError') {
return res.status(504).json({
error: `Voice backend timed out after ${Math.round(VOICE_TIMEOUT_MS / 1000)}s. Check your voice backend.`,
});
}
return res.status(502).json({ error: `Voice backend unreachable: ${e.message}` });
}
/**
* SSRF guard for the user-configurable backend URL: allow http/https only and
* block the link-local / cloud-metadata range (169.254.x). localhost and private
* ranges are allowed on purpose so users can point at a local voice server
* (LocalAI, Speaches, Kokoro-FastAPI, etc.).
* @param {string} raw
* @returns {boolean}
*/
function isAllowedBackendUrl(raw) {
let u;
try {
u = new URL(raw);
} catch {
return false;
}
if (u.protocol !== 'http:' && u.protocol !== 'https:') return false;
if (u.hostname === '169.254.169.254' || u.hostname.startsWith('169.254.')) return false;
return true;
}
/**
* Relay an upstream (backend) error to the client without making an upstream
* 401/403 look like the user's own app login failed.
* @param {import('express').Response} res
* @param {number} status
* @param {string} [text]
*/
function upstreamError(res, status, text) {
if (status === 401 || status === 403) {
return res.status(502).json({ error: 'Voice backend rejected the request (check the API key).' });
}
return res.status(status).json({ error: text || 'voice backend error' });
}
let _upload = null;
/**
* Lazily build a memory-storage multer instance (25 MB cap) for audio uploads,
* so multer is only imported when the voice feature is actually used.
* @returns {Promise<import('multer').Multer>}
*/
async function getUpload() {
if (!_upload) {
const multer = (await import('multer')).default;
_upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 25 * 1024 * 1024 } });
}
return _upload;
}
/**
* Build the Authorization header for the backend, or an empty object when no
* key is configured (e.g. a local server that needs none).
* @param {string} apiKey
* @returns {Record<string, string>}
*/
function authHeader(apiKey) {
return apiKey ? { Authorization: `Bearer ${apiKey}` } : {};
}
/**
* GET /api/voice/health -> { configured } (true when a backend base URL is set).
*/
router.get('/health', (req, res) => {
res.json({ configured: Boolean(resolveConfig(req).baseUrl) });
});
/**
* POST /api/voice/transcribe (multipart 'audio') -> { text }.
* Forwards the uploaded audio to the backend's /audio/transcriptions endpoint.
*/
router.post('/transcribe', async (req, res) => {
const cfg = resolveConfig(req);
if (!cfg.baseUrl) return res.status(503).json({ error: 'No voice backend configured' });
if (!isAllowedBackendUrl(cfg.baseUrl)) return res.status(400).json({ error: 'Invalid voice backend URL.' });
const upload = await getUpload();
upload.single('audio')(req, res, async (err) => {
if (err) return res.status(400).json({ error: err.message });
if (!req.file) return res.status(400).json({ error: 'No audio uploaded' });
try {
const fd = new FormData();
fd.append(
'file',
new Blob([req.file.buffer], { type: req.file.mimetype || 'audio/webm' }),
req.file.originalname || 'recording.webm',
);
fd.append('model', cfg.sttModel);
const r = await fetchWithTimeout(`${cfg.baseUrl}/audio/transcriptions`, {
method: 'POST',
headers: authHeader(cfg.apiKey),
body: fd,
});
const text = await r.text();
if (!r.ok) return upstreamError(res, r.status, text);
let data;
try { data = JSON.parse(text); } catch { data = { text }; }
res.json({ text: data.text ?? '' });
} catch (e) {
backendError(res, e);
}
});
});
/**
* POST /api/voice/tts { text } -> audio bytes.
* Forwards the text to the backend's /audio/speech endpoint and streams the audio back.
*/
router.post('/tts', async (req, res) => {
const cfg = resolveConfig(req);
if (!cfg.baseUrl) return res.status(503).json({ error: 'No voice backend configured' });
if (!isAllowedBackendUrl(cfg.baseUrl)) return res.status(400).json({ error: 'Invalid voice backend URL.' });
const text = req.body?.text;
if (typeof text !== 'string' || !text.trim()) return res.status(400).json({ error: 'text required' });
try {
const r = await fetchWithTimeout(`${cfg.baseUrl}/audio/speech`, {
method: 'POST',
headers: { 'Content-Type': 'application/json', ...authHeader(cfg.apiKey) },
body: JSON.stringify({
model: cfg.ttsModel,
voice: cfg.ttsVoice,
input: text,
...(cfg.ttsFormat ? { response_format: cfg.ttsFormat } : {}),
}),
});
if (!r.ok) {
const errText = await r.text().catch(() => 'tts failed');
return upstreamError(res, r.status, errText);
}
res.setHeader('Content-Type', r.headers.get('content-type') || 'audio/mpeg');
res.setHeader('Cache-Control', 'no-store');
if (!r.body) return res.end();
Readable.fromWeb(r.body).on('error', (error) => res.destroy(error)).pipe(res);
} catch (e) {
backendError(res, e);
}
});
export default router;

View File

@@ -1,4 +1,4 @@
import { useCallback, useEffect, useMemo, useState } from 'react'; import { useCallback, useEffect, useMemo, useRef, useState } from 'react';
import { import {
Bot, Bot,
Clock3, Clock3,
@@ -7,6 +7,7 @@ import {
ExternalLink, ExternalLink,
Loader2, Loader2,
MonitorPlay, MonitorPlay,
MousePointer2,
RefreshCw, RefreshCw,
Settings, Settings,
Square, Square,
@@ -19,9 +20,14 @@ import { Badge, Button } from '../../../shared/view/ui';
import { authenticatedFetch } from '../../../utils/api'; import { authenticatedFetch } from '../../../utils/api';
import type { SettingsMainTab } from '../../settings/types/types'; import type { SettingsMainTab } from '../../settings/types/types';
const BROWSER_USE_GUIDE_URL = 'https://cloudcli.ai/docs/browser-use';
const BROWSER_USE_CACHE_TTL_MS = 30_000;
type BrowserUseStatus = { type BrowserUseStatus = {
enabled: boolean; enabled: boolean;
available: boolean; available: boolean;
backend: 'playwright' | 'camoufox-vnc';
browserBackend: 'playwright' | 'camoufox-vnc';
playwrightInstalled: boolean; playwrightInstalled: boolean;
chromiumInstalled: boolean; chromiumInstalled: boolean;
installInProgress: boolean; installInProgress: boolean;
@@ -39,6 +45,9 @@ type BrowserUseSession = {
updatedAt: string; updatedAt: string;
lastAction: string | null; lastAction: string | null;
message: string | null; message: string | null;
backend?: 'playwright' | 'camoufox-vnc';
viewerUrl?: string | null;
viewerEmbedUrl?: string | null;
createdBy: 'agent'; createdBy: 'agent';
profileName: string | null; profileName: string | null;
viewport: { viewport: {
@@ -54,17 +63,48 @@ type BrowserUseSession = {
type BrowserUsePanelProps = { type BrowserUsePanelProps = {
isVisible: boolean; isVisible: boolean;
projectId?: string | null;
onShowSettings?: (tab?: SettingsMainTab) => void; onShowSettings?: (tab?: SettingsMainTab) => void;
}; };
type BrowserUsePanelCacheEntry = {
status: BrowserUseStatus | null;
sessions: BrowserUseSession[];
selectedSessionId: string | null;
updatedAt: number;
};
const browserUsePanelCache = new Map<string, BrowserUsePanelCacheEntry>();
async function readJson<T>(response: Response): Promise<T> { async function readJson<T>(response: Response): Promise<T> {
const data = await response.json(); const text = await response.text();
let data: any = {};
if (text) {
try {
data = JSON.parse(text);
} catch {
throw new Error(response.ok ? 'Received an invalid Browser response.' : `Browser request failed (${response.status}).`);
}
}
if (!response.ok || data.success === false) { if (!response.ok || data.success === false) {
throw new Error(data.error || data.details || `Request failed (${response.status})`); throw new Error(data.error || data.details || `Request failed (${response.status})`);
} }
return data as T; return data as T;
} }
async function fetchBrowserPanelData() {
const [statusResponse, sessionsResponse] = await Promise.all([
authenticatedFetch('/api/browser-use/status'),
authenticatedFetch('/api/browser-use/sessions'),
]);
const statusData = await readJson<{ data: BrowserUseStatus }>(statusResponse);
const sessionsData = await readJson<{ data: { sessions: BrowserUseSession[] } }>(sessionsResponse);
return {
status: statusData.data,
sessions: [...sessionsData.data.sessions].sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt)),
};
}
function formatRelativeTime(value: string | null): string { function formatRelativeTime(value: string | null): string {
if (!value) return 'Never'; if (!value) return 'Never';
@@ -119,20 +159,42 @@ function getStatusDot(status: BrowserUseSession['status']): string {
return 'bg-border'; return 'bg-border';
} }
function getEngineLabel(backend?: BrowserUseStatus['backend'] | BrowserUseSession['backend']): string {
return backend === 'camoufox-vnc' ? 'Visible browser' : 'Playwright';
}
const PROMPTS = [ const PROMPTS = [
'Use Browser to inspect the checkout flow and report any broken UI states.', 'Use Browser to inspect the checkout flow and report any broken UI states.',
'Open <url> with Browser, interact with the page, and summarize what changed after each step.', 'Open <url> with Browser, interact with the page, and summarize what changed after each step.',
]; ];
export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUsePanelProps) { function getBrowserUseCacheKey(projectId?: string | null): string {
const [status, setStatus] = useState<BrowserUseStatus | null>(null); return projectId ? `browser-use:project:${projectId}` : 'browser-use:global';
const [sessions, setSessions] = useState<BrowserUseSession[]>([]); }
const [selectedSessionId, setSelectedSessionId] = useState<string | null>(null);
function getFreshCacheEntry(cacheKey: string): BrowserUsePanelCacheEntry | null {
const entry = browserUsePanelCache.get(cacheKey);
if (!entry || Date.now() - entry.updatedAt > BROWSER_USE_CACHE_TTL_MS) {
return null;
}
return entry;
}
export default function BrowserUsePanel({ isVisible, projectId, onShowSettings }: BrowserUsePanelProps) {
const cacheKey = getBrowserUseCacheKey(projectId);
const initialCacheEntry = getFreshCacheEntry(cacheKey);
const [status, setStatus] = useState<BrowserUseStatus | null>(() => initialCacheEntry?.status ?? null);
const [sessions, setSessions] = useState<BrowserUseSession[]>(() => initialCacheEntry?.sessions ?? []);
const [selectedSessionId, setSelectedSessionId] = useState<string | null>(() => (
initialCacheEntry?.selectedSessionId || initialCacheEntry?.sessions[0]?.id || null
));
const [hasLoadedOnce, setHasLoadedOnce] = useState(Boolean(initialCacheEntry));
const [isRefreshing, setIsRefreshing] = useState(false); const [isRefreshing, setIsRefreshing] = useState(false);
const [isBusy, setIsBusy] = useState(false); const [isBusy, setIsBusy] = useState(false);
const [isInstalling, setIsInstalling] = useState(false); const [isInstalling, setIsInstalling] = useState(false);
const [isFullscreen, setIsFullscreen] = useState(false); const [isFullscreen, setIsFullscreen] = useState(false);
const [error, setError] = useState<string | null>(null); const [error, setError] = useState<string | null>(null);
const activeLoadIdRef = useRef(0);
const selectedSession = useMemo( const selectedSession = useMemo(
() => sessions.find((session) => session.id === selectedSessionId) || sessions[0] || null, () => sessions.find((session) => session.id === selectedSessionId) || sessions[0] || null,
@@ -140,8 +202,12 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
); );
const activeSessions = sessions.filter((session) => session.status === 'ready'); const activeSessions = sessions.filter((session) => session.status === 'ready');
const needsBrowserBinaries = Boolean(status?.enabled && (!status.playwrightInstalled || !status.chromiumInstalled)); const isInitialLoading = isRefreshing && !hasLoadedOnce && sessions.length === 0;
const runtimeLabel = !status?.enabled const isBackgroundRefreshing = isRefreshing && !isInitialLoading;
const needsBrowserBinaries = Boolean(status?.enabled && !status.available);
const runtimeLabel = isInitialLoading
? 'Loading'
: !status?.enabled
? 'Disabled' ? 'Disabled'
: status.available : status.available
? 'Ready' ? 'Ready'
@@ -157,29 +223,72 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
: null; : null;
const refresh = useCallback(async () => { const refresh = useCallback(async () => {
const loadId = activeLoadIdRef.current + 1;
activeLoadIdRef.current = loadId;
setIsRefreshing(true); setIsRefreshing(true);
try { try {
const [statusResponse, sessionsResponse] = await Promise.all([ let nextData: Awaited<ReturnType<typeof fetchBrowserPanelData>>;
authenticatedFetch('/api/browser-use/status'), try {
authenticatedFetch('/api/browser-use/sessions'), nextData = await fetchBrowserPanelData();
]); } catch (error) {
const statusData = await readJson<{ data: BrowserUseStatus }>(statusResponse); if (loadId !== activeLoadIdRef.current) {
const sessionsData = await readJson<{ data: { sessions: BrowserUseSession[] } }>(sessionsResponse); return;
const nextSessions = sessionsData.data.sessions; }
setStatus(statusData.data); await new Promise((resolve) => setTimeout(resolve, 350));
nextData = await fetchBrowserPanelData();
}
if (activeLoadIdRef.current !== loadId) {
return;
}
const nextSessions = nextData.sessions;
setStatus(nextData.status);
setSessions(nextSessions); setSessions(nextSessions);
setSelectedSessionId((current) => ( setHasLoadedOnce(true);
current && nextSessions.some((session) => session.id === current) let nextSelectedSessionId: string | null = null;
setSelectedSessionId((current) => {
nextSelectedSessionId = current && nextSessions.some((session) => session.id === current)
? current ? current
: nextSessions[0]?.id || null : nextSessions[0]?.id || null;
)); return nextSelectedSessionId;
});
browserUsePanelCache.set(cacheKey, {
status: nextData.status,
sessions: nextSessions,
selectedSessionId: nextSelectedSessionId,
updatedAt: Date.now(),
});
setError(null); setError(null);
} catch (err) { } catch (err) {
if (activeLoadIdRef.current !== loadId) {
return;
}
setHasLoadedOnce(true);
setError(err instanceof Error ? err.message : 'Failed to load Browser'); setError(err instanceof Error ? err.message : 'Failed to load Browser');
} finally { } finally {
if (activeLoadIdRef.current === loadId) {
setIsRefreshing(false); setIsRefreshing(false);
} }
}, []); }
}, [cacheKey]);
useEffect(() => {
const cachedEntry = browserUsePanelCache.get(cacheKey);
if (!cachedEntry) return;
browserUsePanelCache.set(cacheKey, {
...cachedEntry,
selectedSessionId,
});
}, [cacheKey, selectedSessionId]);
useEffect(() => {
const cachedEntry = getFreshCacheEntry(cacheKey);
setStatus(cachedEntry?.status ?? null);
setSessions(cachedEntry?.sessions ?? []);
setSelectedSessionId(cachedEntry?.selectedSessionId || cachedEntry?.sessions[0]?.id || null);
setHasLoadedOnce(Boolean(cachedEntry));
setError(null);
activeLoadIdRef.current += 1;
}, [cacheKey]);
useEffect(() => { useEffect(() => {
if (!isVisible) return; if (!isVisible) return;
@@ -253,6 +362,10 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
<span>{formatRelativeTime(session.updatedAt)}</span> <span>{formatRelativeTime(session.updatedAt)}</span>
<span className="truncate">- {formatAction(session.lastAction)}</span> <span className="truncate">- {formatAction(session.lastAction)}</span>
</div> </div>
<div className="mt-2 flex flex-wrap gap-1.5 pl-3.5 text-[10px] text-muted-foreground">
<span className="rounded border border-border/70 bg-background/70 px-1.5 py-0.5">{getEngineLabel(session.backend)}</span>
<span className="rounded border border-border/70 bg-background/70 px-1.5 py-0.5">{session.profileName || 'Temporary'}</span>
</div>
</button> </button>
); );
}; };
@@ -270,9 +383,18 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
</div> </div>
<p className="mt-1 max-w-xl text-sm leading-6 text-muted-foreground"> <p className="mt-1 max-w-xl text-sm leading-6 text-muted-foreground">
{status?.enabled {status?.enabled
? 'Agent browser sessions appear here while an AI task is using Browser.' ? 'When an agent opens a browser, you can watch the latest screenshot, take control in a new tab, or end the running session.'
: 'Enable Browser in settings to let agents open monitored browser sessions.'} : 'Enable Browser to let agents open websites, test flows, capture screenshots, and debug UI from a real page.'}
</p> </p>
<a
href={BROWSER_USE_GUIDE_URL}
target="_blank"
rel="noopener noreferrer"
className="mt-2 inline-flex items-center gap-1.5 text-sm font-medium text-primary hover:underline"
>
Read the Browser guide
<ExternalLink className="h-3.5 w-3.5" />
</a>
</div> </div>
</div> </div>
@@ -312,10 +434,19 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
</div> </div>
); );
const renderLoadingState = () => (
<div className="flex min-h-0 flex-1 items-center justify-center p-6">
<div className="flex items-center gap-3 rounded-md border border-border bg-card/40 px-4 py-3 text-sm text-muted-foreground shadow-sm">
<Loader2 className="h-4 w-4 animate-spin text-primary" />
Loading browser sessions...
</div>
</div>
);
const renderBrowserSurface = (fullscreen = false) => ( const renderBrowserSurface = (fullscreen = false) => (
<div className={cn('flex flex-1 items-center justify-center bg-neutral-950', fullscreen ? 'min-h-[80vh]' : 'min-h-[420px]')}> <div className={cn('flex flex-1 items-center justify-center bg-neutral-950', fullscreen ? 'min-h-[80vh]' : 'min-h-[420px]')}>
{selectedSession?.screenshotDataUrl ? ( {selectedSession?.screenshotDataUrl ? (
<div className="relative inline-block max-h-full"> <div className="group relative inline-block max-h-full">
<img <img
src={selectedSession.screenshotDataUrl} src={selectedSession.screenshotDataUrl}
alt="Browser session screenshot" alt="Browser session screenshot"
@@ -329,6 +460,18 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
<div className="absolute left-1/2 top-1/2 h-2 w-2 -translate-x-1/2 -translate-y-1/2 rounded-full bg-white" /> <div className="absolute left-1/2 top-1/2 h-2 w-2 -translate-x-1/2 -translate-y-1/2 rounded-full bg-white" />
</div> </div>
)} )}
{selectedSession?.viewerEmbedUrl && selectedSession.status === 'ready' && (
<button
type="button"
onClick={() => window.open(selectedSession.viewerUrl || selectedSession.viewerEmbedUrl || '', '_blank', 'noopener,noreferrer')}
className="absolute inset-0 flex items-center justify-center bg-black/0 opacity-0 transition focus-visible:bg-black/30 focus-visible:opacity-100 focus-visible:outline-none group-hover:bg-black/30 group-hover:opacity-100"
>
<span className="inline-flex items-center gap-2 rounded-md border border-white/20 bg-black/80 px-3 py-2 text-sm font-medium text-white shadow-lg">
<MousePointer2 className="h-4 w-4" />
Take control
</span>
</button>
)}
</div> </div>
) : ( ) : (
<div className="px-6 text-center"> <div className="px-6 text-center">
@@ -350,10 +493,29 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
<Badge variant="outline" className={cn('text-[10px]', getRuntimeTone(status, isInstalling))}> <Badge variant="outline" className={cn('text-[10px]', getRuntimeTone(status, isInstalling))}>
{runtimeLabel} {runtimeLabel}
</Badge> </Badge>
<Badge variant="outline" className="border-border bg-background text-[10px] text-muted-foreground">
{getEngineLabel(status?.backend)}
</Badge>
</div> </div>
<p className="mt-0.5 text-xs text-muted-foreground">Monitor browser sessions opened by AI agents.</p> <p className="mt-0.5 text-xs text-muted-foreground">Watch and manage browser sessions agents use to test real websites.</p>
{isBackgroundRefreshing && (
<div className="mt-1 flex items-center gap-1.5 text-xs text-muted-foreground">
<RefreshCw className="h-3 w-3 animate-spin" />
Refreshing sessions...
</div>
)}
</div> </div>
<div className="flex items-center gap-1.5"> <div className="flex items-center gap-1.5">
<Button
variant="ghost"
size="sm"
className="h-7 w-7 p-0"
onClick={() => window.open(BROWSER_USE_GUIDE_URL, '_blank', 'noopener,noreferrer')}
title="Open Browser guide"
aria-label="Open Browser guide"
>
<ExternalLink className="h-3.5 w-3.5" />
</Button>
{onShowSettings && ( {onShowSettings && (
<Button <Button
variant="ghost" variant="ghost"
@@ -425,7 +587,7 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
</div> </div>
{sessions.length === 0 ? ( {sessions.length === 0 ? (
renderEmptyState() isInitialLoading ? renderLoadingState() : renderEmptyState()
) : ( ) : (
<div className="min-h-0 flex-1 overflow-auto bg-muted/20 p-4"> <div className="min-h-0 flex-1 overflow-auto bg-muted/20 p-4">
<div className="mx-auto flex min-h-[500px] max-w-7xl flex-col overflow-hidden rounded-md border border-border bg-background shadow-sm"> <div className="mx-auto flex min-h-[500px] max-w-7xl flex-col overflow-hidden rounded-md border border-border bg-background shadow-sm">
@@ -441,14 +603,32 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
<ExternalLink className="h-3.5 w-3.5 shrink-0" /> <ExternalLink className="h-3.5 w-3.5 shrink-0" />
<span className="truncate">{selectedSession?.url || 'No page loaded'}</span> <span className="truncate">{selectedSession?.url || 'No page loaded'}</span>
</div> </div>
<div className="mt-1 flex flex-wrap gap-1.5 text-[10px] text-muted-foreground">
<span className="rounded border border-border/70 bg-muted/30 px-1.5 py-0.5">{getEngineLabel(selectedSession?.backend || status?.backend)}</span>
<span className="rounded border border-border/70 bg-muted/30 px-1.5 py-0.5">Profile: {selectedSession?.profileName || 'Temporary'}</span>
<span className="rounded border border-border/70 bg-muted/30 px-1.5 py-0.5">Updated {formatRelativeTime(selectedSession?.updatedAt || null)}</span>
</div>
</div> </div>
<div className="hidden text-xs text-muted-foreground md:block"> <div className="hidden text-xs text-muted-foreground md:block">
{formatAction(selectedSession?.lastAction || null)} {formatAction(selectedSession?.lastAction || null)}
</div> </div>
{selectedSession?.viewerUrl && selectedSession.status === 'ready' && (
<Button
variant="outline"
size="sm"
className="h-8"
onClick={() => window.open(selectedSession.viewerUrl || '', '_blank', 'noopener,noreferrer')}
title="Open live browser control in a new tab"
aria-label="Open live browser control in a new tab"
>
<MousePointer2 className="h-4 w-4" />
Take control
</Button>
)}
<Button variant="ghost" size="sm" className="h-8 w-8 p-0" onClick={() => setIsFullscreen(true)} disabled={!selectedSession?.screenshotDataUrl} title="Full screen" aria-label="Full screen"> <Button variant="ghost" size="sm" className="h-8 w-8 p-0" onClick={() => setIsFullscreen(true)} disabled={!selectedSession?.screenshotDataUrl} title="Full screen" aria-label="Full screen">
<Expand className="h-4 w-4" /> <Expand className="h-4 w-4" />
</Button> </Button>
<Button variant="ghost" size="sm" className="h-8 w-8 p-0 lg:hidden" onClick={stopSession} disabled={isBusy || !selectedSession || selectedSession.status !== 'ready'} title="Stop session" aria-label="Stop session"> <Button variant="ghost" size="sm" className="h-8 w-8 p-0 lg:hidden" onClick={stopSession} disabled={isBusy || !selectedSession || selectedSession.status !== 'ready'} title="End session" aria-label="End session">
<Square className="h-4 w-4" /> <Square className="h-4 w-4" />
</Button> </Button>
<Button variant="ghost" size="sm" className="h-8 w-8 p-0 lg:hidden" onClick={deleteSession} disabled={isBusy || !selectedSession} title="Delete session" aria-label="Delete session"> <Button variant="ghost" size="sm" className="h-8 w-8 p-0 lg:hidden" onClick={deleteSession} disabled={isBusy || !selectedSession} title="Delete session" aria-label="Delete session">
@@ -475,6 +655,11 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
<div className="min-h-0 flex-1 overflow-y-auto p-3"> <div className="min-h-0 flex-1 overflow-y-auto p-3">
{sessions.length > 0 ? ( {sessions.length > 0 ? (
<div className="space-y-2">{sessions.map(renderSessionItem)}</div> <div className="space-y-2">{sessions.map(renderSessionItem)}</div>
) : isInitialLoading ? (
<div className="flex items-center justify-center gap-2 rounded-md border border-dashed border-border/70 px-3 py-8 text-center text-xs text-muted-foreground">
<Loader2 className="h-3.5 w-3.5 animate-spin" />
Loading sessions...
</div>
) : ( ) : (
<div className="rounded-md border border-dashed border-border/70 px-3 py-8 text-center text-xs text-muted-foreground"> <div className="rounded-md border border-dashed border-border/70 px-3 py-8 text-center text-xs text-muted-foreground">
No agent browser sessions. No agent browser sessions.
@@ -505,7 +690,7 @@ export default function BrowserUsePanel({ isVisible, onShowSettings }: BrowserUs
<div className="mt-3 grid grid-cols-2 gap-2"> <div className="mt-3 grid grid-cols-2 gap-2">
<Button variant="outline" size="sm" onClick={stopSession} disabled={isBusy || !selectedSession || selectedSession.status !== 'ready'}> <Button variant="outline" size="sm" onClick={stopSession} disabled={isBusy || !selectedSession || selectedSession.status !== 'ready'}>
<Square className="h-4 w-4" /> <Square className="h-4 w-4" />
Stop End
</Button> </Button>
<Button variant="outline" size="sm" onClick={deleteSession} disabled={isBusy || !selectedSession}> <Button variant="outline" size="sm" onClick={deleteSession} disabled={isBusy || !selectedSession}>
<Trash2 className="h-4 w-4" /> <Trash2 className="h-4 w-4" />

View File

@@ -775,6 +775,17 @@ export function useChatComposerState({
handleSubmitRef.current = handleSubmit; handleSubmitRef.current = handleSubmit;
}, [handleSubmit]); }, [handleSubmit]);
// A voice transcript either fills the input (to edit before sending) or, when the
// user tapped "stop and send", is submitted straight away. Mirror the value into
// inputValueRef synchronously so handleSubmit reads the new text, not the stale state.
const handleVoiceTranscript = useCallback((text: string, send?: boolean) => {
const base = inputValueRef.current.trim();
const next = base ? `${base} ${text}` : text;
setInput(next);
inputValueRef.current = next;
if (send) handleSubmitRef.current?.(createFakeSubmitEvent());
}, [setInput]);
useEffect(() => { useEffect(() => {
inputValueRef.current = input; inputValueRef.current = input;
}, [input]); }, [input]);
@@ -1013,6 +1024,7 @@ export function useChatComposerState({
isDragActive, isDragActive,
openImagePicker: open, openImagePicker: open,
handleSubmit, handleSubmit,
handleVoiceTranscript,
handleInputChange, handleInputChange,
handleKeyDown, handleKeyDown,
handlePaste, handlePaste,

View File

@@ -114,7 +114,6 @@ export function useChatProviderState({ selectedSession, selectedProject }: UseCh
const [providerModelsLoading, setProviderModelsLoading] = useState(true); const [providerModelsLoading, setProviderModelsLoading] = useState(true);
const [providerModelsRefreshing, setProviderModelsRefreshing] = useState(false); const [providerModelsRefreshing, setProviderModelsRefreshing] = useState(false);
const lastProviderRef = useRef(provider);
const providerModelsRequestIdRef = useRef(0); const providerModelsRequestIdRef = useRef(0);
const setStoredProviderModel = useCallback((targetProvider: LLMProvider, model: string) => { const setStoredProviderModel = useCallback((targetProvider: LLMProvider, model: string) => {
@@ -344,14 +343,8 @@ export function useChatProviderState({ selectedSession, selectedProject }: UseCh
localStorage.setItem('selected-provider', selectedSession.__provider); localStorage.setItem('selected-provider', selectedSession.__provider);
}, [provider, selectedSession]); }, [provider, selectedSession]);
useEffect(() => { // Permission prompts belong to a session, not to the transient provider
if (lastProviderRef.current === provider) { // selection that is synchronized after navigation.
return;
}
setPendingPermissionRequests([]);
lastProviderRef.current = provider;
}, [provider]);
useEffect(() => { useEffect(() => {
setPendingPermissionRequests((previous) => setPendingPermissionRequests((previous) =>
previous.filter((request) => !request.sessionId || request.sessionId === selectedSession?.id), previous.filter((request) => !request.sessionId || request.sessionId === selectedSession?.id),

View File

@@ -1,20 +1,29 @@
import { useEffect } from 'react'; import { useEffect, useRef } from 'react';
import type { Dispatch, MutableRefObject, SetStateAction } from 'react'; import type { Dispatch, MutableRefObject, SetStateAction } from 'react';
import type { ServerEvent } from '../../../contexts/WebSocketContext'; import type { ServerEvent } from '../../../contexts/WebSocketContext';
import { showCompletionTitleIndicator } from '../../../utils/pageTitleNotification'; import { showCompletionTitleIndicator } from '../../../utils/pageTitleNotification';
import { playChatCompletionSound } from '../../../utils/notificationSound'; import { playChatCompletionSound, playNotificationSound } from '../../../utils/notificationSound';
import type { MarkSessionIdle, MarkSessionProcessing } from '../../../hooks/useSessionProtection'; import type { MarkSessionIdle, MarkSessionProcessing } from '../../../hooks/useSessionProtection';
import type { PendingPermissionRequest } from '../types/types'; import type { PendingPermissionRequest } from '../types/types';
import type { ProjectSession, LLMProvider } from '../../../types/app'; import type { ProjectSession, LLMProvider } from '../../../types/app';
import type { SessionStore, NormalizedMessage } from '../../../stores/useSessionStore'; import type { SessionStore, NormalizedMessage } from '../../../stores/useSessionStore';
const isActionablePermissionRequest = (request: { toolName?: unknown } | null | undefined): boolean => {
return request?.toolName !== 'ExitPlanMode' && request?.toolName !== 'exit_plan_mode';
};
const hasActionablePermissionRequests = (requests: Array<{ toolName?: unknown }> | null | undefined): boolean => {
return Array.isArray(requests) && requests.some((request) => isActionablePermissionRequest(request));
};
interface UseChatRealtimeHandlersArgs { interface UseChatRealtimeHandlersArgs {
subscribe: (listener: (event: ServerEvent) => void) => () => void; subscribe: (listener: (event: ServerEvent) => void) => () => void;
provider: LLMProvider; provider: LLMProvider;
selectedSession: ProjectSession | null; selectedSession: ProjectSession | null;
currentSessionId: string | null; currentSessionId: string | null;
setTokenBudget: (budget: Record<string, unknown> | null) => void; setTokenBudget: (budget: Record<string, unknown> | null) => void;
pendingPermissionRequests: PendingPermissionRequest[];
setPendingPermissionRequests: Dispatch<SetStateAction<PendingPermissionRequest[]>>; setPendingPermissionRequests: Dispatch<SetStateAction<PendingPermissionRequest[]>>;
streamTimerRef: MutableRefObject<number | null>; streamTimerRef: MutableRefObject<number | null>;
accumulatedStreamRef: MutableRefObject<string>; accumulatedStreamRef: MutableRefObject<string>;
@@ -52,6 +61,7 @@ export function useChatRealtimeHandlers({
selectedSession, selectedSession,
currentSessionId, currentSessionId,
setTokenBudget, setTokenBudget,
pendingPermissionRequests,
setPendingPermissionRequests, setPendingPermissionRequests,
streamTimerRef, streamTimerRef,
accumulatedStreamRef, accumulatedStreamRef,
@@ -62,13 +72,29 @@ export function useChatRealtimeHandlers({
onWebSocketReconnect, onWebSocketReconnect,
sessionStore, sessionStore,
}: UseChatRealtimeHandlersArgs) { }: UseChatRealtimeHandlersArgs) {
// Session switches can send `chat.subscribe` before this effect has a chance
// to rebind the websocket listener. Read the visible session id from a ref
// so a fast `chat_subscribed` ack is matched against the current view, not
// the previous render's closed-over selection.
const activeViewSessionIdRef = useRef<string | null>(selectedSession?.id || currentSessionId || null);
activeViewSessionIdRef.current = selectedSession?.id || currentSessionId || null;
// Keep the latest pending-permission snapshot available to the websocket
// listener so back-to-back permission events can dedupe and re-arm the
// notification sound before React finishes a rerender.
const pendingPermissionRequestsRef = useRef(pendingPermissionRequests);
useEffect(() => {
pendingPermissionRequestsRef.current = pendingPermissionRequests;
}, [pendingPermissionRequests]);
useEffect(() => { useEffect(() => {
const handleEvent = (msg: ServerEvent) => { const handleEvent = (msg: ServerEvent) => {
if (!msg.kind) { if (!msg.kind) {
return; return;
} }
const activeViewSessionId = selectedSession?.id || currentSessionId || null; const activeViewSessionId = activeViewSessionIdRef.current;
const sid = (typeof msg.sessionId === 'string' && msg.sessionId) || activeViewSessionId; const sid = (typeof msg.sessionId === 'string' && msg.sessionId) || activeViewSessionId;
// Record replay progress for every sequenced live event. // Record replay progress for every sequenced live event.
@@ -101,7 +127,16 @@ export function useChatRealtimeHandlers({
const isViewedSession = sid === activeViewSessionId; const isViewedSession = sid === activeViewSessionId;
if (isViewedSession && Array.isArray(msg.pendingPermissions)) { if (isViewedSession && Array.isArray(msg.pendingPermissions)) {
setPendingPermissionRequests(msg.pendingPermissions as PendingPermissionRequest[]); const nextPendingPermissionRequests = msg.pendingPermissions as PendingPermissionRequest[];
const hadActionablePermissionRequests = hasActionablePermissionRequests(pendingPermissionRequestsRef.current);
const hasPendingActionablePermissionRequests = hasActionablePermissionRequests(nextPendingPermissionRequests);
pendingPermissionRequestsRef.current = nextPendingPermissionRequests;
setPendingPermissionRequests(nextPendingPermissionRequests);
if (hasPendingActionablePermissionRequests && !hadActionablePermissionRequests) {
void playNotificationSound();
}
} }
return; return;
} }
@@ -203,6 +238,7 @@ export function useChatRealtimeHandlers({
// hides it immediately and atomically. // hides it immediately and atomically.
onSessionIdle?.(sid); onSessionIdle?.(sid);
if (sid === activeViewSessionId) { if (sid === activeViewSessionId) {
pendingPermissionRequestsRef.current = [];
setPendingPermissionRequests([]); setPendingPermissionRequests([]);
} }
@@ -234,10 +270,14 @@ export function useChatRealtimeHandlers({
case 'permission_request': { case 'permission_request': {
if (!msg.requestId) break; if (!msg.requestId) break;
if (isActionablePermissionRequest({ toolName: msg.toolName })) {
void playNotificationSound();
}
if (sid === activeViewSessionId) { if (sid === activeViewSessionId) {
setPendingPermissionRequests((prev) => { const previousPendingPermissionRequests = pendingPermissionRequestsRef.current;
if (prev.some((r: PendingPermissionRequest) => r.requestId === msg.requestId)) return prev; if (!previousPendingPermissionRequests.some((request) => request.requestId === msg.requestId)) {
return [...prev, { const nextPendingPermissionRequests = [...previousPendingPermissionRequests, {
requestId: msg.requestId as string, requestId: msg.requestId as string,
toolName: (msg.toolName as string) || 'UnknownTool', toolName: (msg.toolName as string) || 'UnknownTool',
input: msg.input, input: msg.input,
@@ -245,7 +285,10 @@ export function useChatRealtimeHandlers({
sessionId: sid || null, sessionId: sid || null,
receivedAt: new Date(), receivedAt: new Date(),
}]; }];
});
pendingPermissionRequestsRef.current = nextPendingPermissionRequests;
setPendingPermissionRequests(nextPendingPermissionRequests);
}
} }
if (sid) { if (sid) {
onSessionProcessing?.(sid); onSessionProcessing?.(sid);
@@ -255,7 +298,12 @@ export function useChatRealtimeHandlers({
case 'permission_cancelled': { case 'permission_cancelled': {
if (msg.requestId && sid === activeViewSessionId) { if (msg.requestId && sid === activeViewSessionId) {
setPendingPermissionRequests((prev) => prev.filter((r: PendingPermissionRequest) => r.requestId !== msg.requestId)); const nextPendingPermissionRequests = pendingPermissionRequestsRef.current.filter(
(request: PendingPermissionRequest) => request.requestId !== msg.requestId,
);
pendingPermissionRequestsRef.current = nextPendingPermissionRequests;
setPendingPermissionRequests(nextPendingPermissionRequests);
} }
break; break;
} }
@@ -286,6 +334,7 @@ export function useChatRealtimeHandlers({
selectedSession, selectedSession,
currentSessionId, currentSessionId,
setTokenBudget, setTokenBudget,
pendingPermissionRequests,
setPendingPermissionRequests, setPendingPermissionRequests,
streamTimerRef, streamTimerRef,
accumulatedStreamRef, accumulatedStreamRef,

View File

@@ -0,0 +1,33 @@
import { useCallback, useEffect, useState } from 'react';
import { voicePlayer, voiceId, type VoiceSnapshot } from '../../../lib/voicePlayer';
export type TtsState = VoiceSnapshot['state'];
/**
* Thin adapter over the app-level voicePlayer. Playback lives outside React (see
* lib/voicePlayer), so switching chats or re-rendering a message no longer cuts the
* audio off. This hook just reflects the player's state for one message and forwards taps.
*/
export function useTts(getText: () => string) {
const content = getText();
const id = voiceId(content);
const [snap, setSnap] = useState<VoiceSnapshot>(() => voicePlayer.getSnapshot(id));
useEffect(() => {
const update = () =>
setSnap((prev) => {
const next = voicePlayer.getSnapshot(id);
return prev.state === next.state && prev.error === next.error ? prev : next;
});
update();
return voicePlayer.subscribe(update);
}, [id]);
const toggle = useCallback(() => {
voicePlayer.unlock(); // synchronous, within the click gesture (iOS)
voicePlayer.toggle(content);
}, [content]);
return { state: snap.state, toggle, error: snap.error };
}

View File

@@ -0,0 +1,85 @@
import { useEffect, useState } from 'react';
import { authenticatedFetch } from '../../../utils/api';
import { readVoiceConfig, VOICE_CONFIG_SYNC_EVENT } from '../../../hooks/useVoiceConfig';
// Voice UI is gated on the `voiceEnabled` UI preference (toggled in Quick Settings /
// the Settings modal) and a configured voice backend.
const STORAGE_KEY = 'uiPreferences';
const SYNC_EVENT = 'ui-preferences:sync';
let healthRequest: Promise<boolean> | null = null;
function checkVoiceHealth(): Promise<boolean> {
if (healthRequest) return healthRequest;
const request = authenticatedFetch('/api/voice/health')
.then(async (response) => {
if (!response.ok) throw new Error(`Voice health check failed (${response.status})`);
const data = await response.json();
return data?.configured === true;
})
.finally(() => {
healthRequest = null;
});
healthRequest = request;
return request;
}
function readVoiceEnabled(): boolean {
try {
const raw = localStorage.getItem(STORAGE_KEY);
if (!raw) return false;
const parsed = JSON.parse(raw);
return parsed?.voiceEnabled === true || parsed?.voiceEnabled === 'true';
} catch {
return false;
}
}
export function useVoiceAvailable(): boolean {
const [enabled, setEnabled] = useState<boolean>(() =>
typeof window === 'undefined' ? false : readVoiceEnabled(),
);
const [available, setAvailable] = useState(false);
useEffect(() => {
const update = () => setEnabled(readVoiceEnabled());
window.addEventListener('storage', update);
window.addEventListener(SYNC_EVENT, update as EventListener);
return () => {
window.removeEventListener('storage', update);
window.removeEventListener(SYNC_EVENT, update as EventListener);
};
}, []);
useEffect(() => {
let active = true;
let requestId = 0;
const check = async () => {
if (!enabled) {
setAvailable(false);
return;
}
if (readVoiceConfig().baseUrl.trim()) {
setAvailable(true);
return;
}
const id = ++requestId;
try {
const result = await checkVoiceHealth();
if (active && id === requestId) setAvailable(result);
} catch {
if (active && id === requestId) setAvailable(false);
}
};
void check();
window.addEventListener(VOICE_CONFIG_SYNC_EVENT, check);
return () => {
active = false;
window.removeEventListener(VOICE_CONFIG_SYNC_EVENT, check);
};
}, [enabled]);
return enabled && available;
}

View File

@@ -0,0 +1,149 @@
import { useCallback, useEffect, useRef, useState } from 'react';
import { transcribeVoice } from '../../../lib/voiceApi';
// Mobile-safe recording: iOS Safari 18.4+ supports webm/opus; older iOS needs mp4.
const MIME_CANDIDATES = [
'audio/webm;codecs=opus',
'audio/webm',
'audio/mp4',
'audio/ogg;codecs=opus',
'audio/ogg',
];
function pickMime(): string {
for (const t of MIME_CANDIDATES) {
try {
if (typeof MediaRecorder !== 'undefined' && MediaRecorder.isTypeSupported(t)) return t;
} catch {
/* isTypeSupported can throw on some iOS versions */
}
}
return '';
}
export type VoiceInputState = 'idle' | 'recording' | 'transcribing';
/**
* Push-to-talk dictation. Records the mic, uploads to /api/voice/transcribe
* (an OpenAI-compatible speech-to-text backend via the Express proxy), and
* returns the transcript through onTranscript.
*/
export function useVoiceInput(
onTranscript: (text: string, send?: boolean) => void,
onError?: (msg: string) => void,
) {
const [state, setState] = useState<VoiceInputState>('idle');
const recorderRef = useRef<MediaRecorder | null>(null);
const chunksRef = useRef<Blob[]>([]);
const streamRef = useRef<MediaStream | null>(null);
const cancelledRef = useRef(false);
const startingRef = useRef(false);
// Whether the in-progress stop should auto-send the transcript (vs just fill the box).
const sendRef = useRef(false);
const stopTracks = () => {
streamRef.current?.getTracks().forEach((t) => t.stop());
streamRef.current = null;
};
// Stop the mic if the component unmounts mid-recording.
useEffect(() => {
cancelledRef.current = false;
return () => {
cancelledRef.current = true;
startingRef.current = false;
streamRef.current?.getTracks().forEach((t) => t.stop());
streamRef.current = null;
recorderRef.current = null;
};
}, []);
const start = useCallback(async () => {
if (startingRef.current || (recorderRef.current && recorderRef.current.state !== 'inactive')) return;
startingRef.current = true;
try {
const stream = await navigator.mediaDevices.getUserMedia({
audio: { echoCancellation: true, noiseSuppression: true },
});
if (cancelledRef.current) {
stream.getTracks().forEach((t) => t.stop());
return;
}
streamRef.current = stream;
const mimeType = pickMime();
const rec = mimeType ? new MediaRecorder(stream, { mimeType }) : new MediaRecorder(stream);
recorderRef.current = rec;
chunksRef.current = [];
rec.ondataavailable = (e) => {
if (e.data.size > 0) chunksRef.current.push(e.data);
};
rec.onstop = async () => {
stopTracks();
if (cancelledRef.current) return;
// Capture and clear the send intent for this stop before any async work.
const shouldSend = sendRef.current;
sendRef.current = false;
const type = rec.mimeType || 'audio/webm';
const blob = new Blob(chunksRef.current, { type });
if (blob.size < 800) {
setState('idle');
onError?.('Recording too short');
return;
}
setState('transcribing');
try {
const ext = type.includes('mp4') ? 'm4a' : type.includes('ogg') ? 'ogg' : 'webm';
const res = await transcribeVoice(blob, `recording.${ext}`);
if (!res.ok) throw new Error(`transcribe ${res.status}`);
const data = await res.json();
if (cancelledRef.current) return;
const text = String(data?.text || '').trim();
if (text) onTranscript(text, shouldSend);
else onError?.('No speech detected');
} catch (e) {
if (!cancelledRef.current) {
onError?.(`Transcription failed: ${e instanceof Error ? e.message : String(e)}`);
}
} finally {
if (!cancelledRef.current) setState('idle');
}
};
rec.start();
setState('recording');
} catch (e) {
recorderRef.current = null;
stopTracks();
if (cancelledRef.current) return;
const err = e as { name?: string; message?: string };
let msg = `Mic error: ${err?.message || e}`;
if (err?.name === 'NotAllowedError') msg = 'Microphone access denied.';
else if (err?.name === 'NotFoundError') msg = 'No microphone found.';
onError?.(msg);
setState('idle');
} finally {
startingRef.current = false;
}
}, [onTranscript, onError]);
// Stop recording. Pass { send: true } to auto-send the transcript once it's ready.
// Guard on the recorder's own state (not React state) so a double tap, or the mic
// and Send buttons both firing, can't call stop() on an already-inactive recorder.
const stop = useCallback((opts?: { send?: boolean }) => {
const rec = recorderRef.current;
if (rec && rec.state !== 'inactive') {
sendRef.current = opts?.send ?? false;
rec.stop();
}
}, []);
const toggle = useCallback(() => {
if (state === 'recording') stop();
else if (state === 'idle') start();
}, [state, start, stop]);
return { state, toggle, stop };
}

View File

@@ -173,6 +173,7 @@ function ChatInterface({
isDragActive, isDragActive,
openImagePicker, openImagePicker,
handleSubmit, handleSubmit,
handleVoiceTranscript,
handleInputChange, handleInputChange,
handleKeyDown, handleKeyDown,
handlePaste, handlePaste,
@@ -239,6 +240,7 @@ function ChatInterface({
selectedSession, selectedSession,
currentSessionId, currentSessionId,
setTokenBudget, setTokenBudget,
pendingPermissionRequests,
setPendingPermissionRequests, setPendingPermissionRequests,
streamTimerRef, streamTimerRef,
accumulatedStreamRef, accumulatedStreamRef,
@@ -405,6 +407,7 @@ function ChatInterface({
renderInputWithMentions={renderInputWithMentions} renderInputWithMentions={renderInputWithMentions}
textareaRef={textareaRef} textareaRef={textareaRef}
input={input} input={input}
onVoiceTranscript={handleVoiceTranscript}
onInputChange={handleInputChange} onInputChange={handleInputChange}
onTextareaClick={handleTextareaClick} onTextareaClick={handleTextareaClick}
onTextareaKeyDown={handleKeyDown} onTextareaKeyDown={handleKeyDown}

View File

@@ -1,4 +1,5 @@
import { useTranslation } from 'react-i18next'; import { useTranslation } from 'react-i18next';
import { useCallback, useEffect, useRef, useState } from 'react';
import type { import type {
ChangeEvent, ChangeEvent,
ClipboardEvent, ClipboardEvent,
@@ -9,8 +10,10 @@ import type {
RefObject, RefObject,
TouchEvent, TouchEvent,
} from 'react'; } from 'react';
import { ImageIcon, MessageSquareIcon, XIcon, ArrowDownIcon } from 'lucide-react'; import { ImageIcon, MessageSquareIcon, XIcon, ArrowDownIcon, Loader2 } from 'lucide-react';
import { useVoiceInput } from '../../hooks/useVoiceInput';
import { useVoiceAvailable } from '../../hooks/useVoiceAvailable';
import type { SessionActivity } from '../../../../hooks/useSessionProtection'; import type { SessionActivity } from '../../../../hooks/useSessionProtection';
import type { PendingPermissionRequest, PermissionMode } from '../../types/types'; import type { PendingPermissionRequest, PermissionMode } from '../../types/types';
import { import {
@@ -27,6 +30,7 @@ import {
import CommandMenu from './CommandMenu'; import CommandMenu from './CommandMenu';
import ActivityIndicator from './ActivityIndicator'; import ActivityIndicator from './ActivityIndicator';
import ImageAttachment from './ImageAttachment'; import ImageAttachment from './ImageAttachment';
import VoiceInputButton from './VoiceInputButton';
import PermissionRequestsBanner from './PermissionRequestsBanner'; import PermissionRequestsBanner from './PermissionRequestsBanner';
import TokenUsageSummary from './TokenUsageSummary'; import TokenUsageSummary from './TokenUsageSummary';
@@ -89,6 +93,7 @@ interface ChatComposerProps {
renderInputWithMentions: (text: string) => ReactNode; renderInputWithMentions: (text: string) => ReactNode;
textareaRef: RefObject<HTMLTextAreaElement>; textareaRef: RefObject<HTMLTextAreaElement>;
input: string; input: string;
onVoiceTranscript?: (text: string, send?: boolean) => void;
onInputChange: (event: ChangeEvent<HTMLTextAreaElement>) => void; onInputChange: (event: ChangeEvent<HTMLTextAreaElement>) => void;
onTextareaClick: (event: MouseEvent<HTMLTextAreaElement>) => void; onTextareaClick: (event: MouseEvent<HTMLTextAreaElement>) => void;
onTextareaKeyDown: (event: KeyboardEvent<HTMLTextAreaElement>) => void; onTextareaKeyDown: (event: KeyboardEvent<HTMLTextAreaElement>) => void;
@@ -142,6 +147,7 @@ export default function ChatComposer({
renderInputWithMentions, renderInputWithMentions,
textareaRef, textareaRef,
input, input,
onVoiceTranscript,
onInputChange, onInputChange,
onTextareaClick, onTextareaClick,
onTextareaKeyDown, onTextareaKeyDown,
@@ -154,6 +160,28 @@ export default function ChatComposer({
sendByCtrlEnter, sendByCtrlEnter,
}: ChatComposerProps) { }: ChatComposerProps) {
const { t } = useTranslation('chat'); const { t } = useTranslation('chat');
// Voice state is hosted here (not in the mic button) so the main Send button can stop
// recording and send the transcript in one tap, the way the mic button drops it in the box.
const voiceAvailable = useVoiceAvailable();
const [voiceError, setVoiceError] = useState<string | null>(null);
const voiceErrorTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
const handleVoiceError = useCallback((msg: string) => {
setVoiceError(msg);
if (voiceErrorTimer.current) clearTimeout(voiceErrorTimer.current);
voiceErrorTimer.current = setTimeout(() => setVoiceError(null), 4000);
}, []);
useEffect(() => () => {
if (voiceErrorTimer.current) clearTimeout(voiceErrorTimer.current);
}, []);
const noopTranscript = useCallback(() => {}, []);
const { state: voiceState, toggle: voiceToggle, stop: voiceStop } = useVoiceInput(
onVoiceTranscript ?? noopTranscript,
handleVoiceError,
);
const isRecording = voiceState === 'recording';
const isTranscribing = voiceState === 'transcribing';
const textareaRect = textareaRef.current?.getBoundingClientRect(); const textareaRect = textareaRef.current?.getBoundingClientRect();
const commandMenuPosition = { const commandMenuPosition = {
top: textareaRect ? Math.max(16, textareaRect.top - 316) : 0, top: textareaRect ? Math.max(16, textareaRect.top - 316) : 0,
@@ -309,6 +337,10 @@ export default function ChatComposer({
<ImageIcon /> <ImageIcon />
</PromptInputButton> </PromptInputButton>
{onVoiceTranscript && voiceAvailable && (
<VoiceInputButton state={voiceState} onToggle={voiceToggle} errorMsg={voiceError} />
)}
<button <button
type="button" type="button"
onClick={onModeSwitch} onClick={onModeSwitch}
@@ -387,10 +419,21 @@ export default function ChatComposer({
{sendByCtrlEnter ? t('input.hintText.ctrlEnter') : t('input.hintText.enter')} {sendByCtrlEnter ? t('input.hintText.ctrlEnter') : t('input.hintText.enter')}
</div> </div>
<PromptInputSubmit <PromptInputSubmit
onClick={isLoading ? onAbortSession : undefined} onClick={
disabled={!isLoading && !input.trim()} isLoading
? onAbortSession
: isRecording
? (e: MouseEvent<HTMLButtonElement>) => {
e.preventDefault();
voiceStop({ send: true });
}
: undefined
}
disabled={isLoading ? false : isRecording ? false : isTranscribing ? true : !input.trim()}
className="h-10 w-10 sm:h-10 sm:w-10" className="h-10 w-10 sm:h-10 sm:w-10"
/> >
{isTranscribing ? <Loader2 className="h-4 w-4 animate-spin" /> : undefined}
</PromptInputSubmit>
</div> </div>
</PromptInputFooter> </PromptInputFooter>
</PromptInput> </PromptInput>

View File

@@ -15,6 +15,7 @@ import { Reasoning, ReasoningTrigger, ReasoningContent } from '../../../../share
import { Markdown } from './Markdown'; import { Markdown } from './Markdown';
import MessageCopyControl from './MessageCopyControl'; import MessageCopyControl from './MessageCopyControl';
import MessageSpeakControl from './MessageSpeakControl';
type DiffLine = { type DiffLine = {
type: string; type: string;
@@ -415,6 +416,9 @@ const MessageComponent = memo(({ message, prevMessage, createDiff, onFileOpen, a
{shouldShowAssistantCopyControl && ( {shouldShowAssistantCopyControl && (
<MessageCopyControl content={assistantCopyContent} messageType="assistant" /> <MessageCopyControl content={assistantCopyContent} messageType="assistant" />
)} )}
{shouldShowAssistantCopyControl && (
<MessageSpeakControl content={assistantCopyContent} />
)}
{!isGrouped && <span>{formattedTime}</span>} {!isGrouped && <span>{formattedTime}</span>}
</div> </div>
)} )}

View File

@@ -0,0 +1,44 @@
import { Volume2, Loader2, Square } from 'lucide-react';
import { useTranslation } from 'react-i18next';
import { useTts } from '../../hooks/useTts';
import { useVoiceAvailable } from '../../hooks/useVoiceAvailable';
// Tap-to-speak button beside the copy control on assistant messages.
// Renders nothing unless the optional voice feature is enabled.
const MessageSpeakControl = ({ content }: { content: string }) => {
const { t } = useTranslation('chat');
const available = useVoiceAvailable();
const { state, toggle, error } = useTts(() => content);
if (!available) return null;
const title =
state === 'playing' ? t('voice.stopSpeaking') : state === 'loading' ? t('voice.loading') : t('voice.speak');
return (
<span className="relative inline-flex">
{error && (
<span className="absolute bottom-full left-1/2 z-10 mb-1 max-w-[240px] -translate-x-1/2 whitespace-normal rounded bg-red-600 px-2 py-1 text-center text-xs text-white shadow-lg">
{error}
</span>
)}
<button
type="button"
onClick={toggle}
title={title}
aria-label={title}
className="inline-flex items-center gap-1 rounded px-1 py-0.5 text-gray-400 transition-colors hover:text-gray-600 dark:text-gray-500 dark:hover:text-gray-300"
>
{state === 'playing' ? (
<Square className="h-3.5 w-3.5" />
) : state === 'loading' ? (
<Loader2 className="h-3.5 w-3.5 animate-spin" />
) : (
<Volume2 className="h-3.5 w-3.5" />
)}
</button>
</span>
);
};
export default MessageSpeakControl;

View File

@@ -0,0 +1,46 @@
import { useTranslation } from 'react-i18next';
import { Mic, Square, Loader2 } from 'lucide-react';
import { PromptInputButton } from '../../../../shared/view/ui';
import type { VoiceInputState } from '../../hooks/useVoiceInput';
type Props = {
state: VoiceInputState;
onToggle: () => void;
errorMsg?: string | null;
};
// Push-to-talk mic button (presentational). Recording state and the stop-and-send action
// are owned by the composer so the main Send button can drive them too. This button just
// starts recording and, while recording, stops and drops the transcript into the input box.
export default function VoiceInputButton({ state, onToggle, errorMsg }: Props) {
const { t } = useTranslation('chat');
const icon =
state === 'recording' ? (
<Square className="text-red-500" />
) : state === 'transcribing' ? (
<Loader2 className="animate-spin" />
) : (
<Mic />
);
return (
<span className="relative inline-flex">
{errorMsg && (
<span className="absolute bottom-full left-1/2 mb-1 -translate-x-1/2 whitespace-nowrap rounded bg-red-600 px-2 py-1 text-xs text-white shadow-lg">
{errorMsg}
</span>
)}
<PromptInputButton
tooltip={{ content: state === 'recording' ? t('voice.stopRecording') : t('voice.input') }}
onClick={(e: { preventDefault: () => void }) => {
e.preventDefault();
onToggle();
}}
>
{icon}
</PromptInputButton>
</span>
);
}

View File

@@ -200,7 +200,11 @@ function MainContent({
{shouldShowBrowserTab && activeTab === 'browser' && ( {shouldShowBrowserTab && activeTab === 'browser' && (
<div className="h-full overflow-hidden"> <div className="h-full overflow-hidden">
<BrowserUsePanel isVisible={activeTab === 'browser'} onShowSettings={onShowSettings} /> <BrowserUsePanel
isVisible={activeTab === 'browser'}
projectId={selectedProject.projectId}
onShowSettings={onShowSettings}
/>
</div> </div>
)} )}

View File

@@ -73,7 +73,15 @@ export default function MainContentTitle({
<h2 className="scrollbar-hide overflow-x-auto whitespace-nowrap text-sm font-semibold leading-tight text-foreground"> <h2 className="scrollbar-hide overflow-x-auto whitespace-nowrap text-sm font-semibold leading-tight text-foreground">
{getSessionTitle(selectedSession)} {getSessionTitle(selectedSession)}
</h2> </h2>
<div className="truncate text-[11px] leading-tight text-muted-foreground">{selectedProject.displayName}</div> <div className="flex min-w-0 items-center gap-2 text-[11px] leading-tight text-muted-foreground">
<span className="min-w-0 truncate">{selectedProject.displayName}</span>
<span
className="hidden min-w-0 max-w-[45%] flex-shrink truncate border-l border-border/60 pl-2 font-mono text-[10px] sm:block"
title={selectedSession.id}
>
{selectedSession.id}
</span>
</div>
</div> </div>
) : showChatNewSession ? ( ) : showChatNewSession ? (
<div className="min-w-0"> <div className="min-w-0">

View File

@@ -4,6 +4,7 @@ import {
Eye, Eye,
Languages, Languages,
Maximize2, Maximize2,
Mic,
} from 'lucide-react'; } from 'lucide-react';
import type { PreferenceToggleItem } from './types'; import type { PreferenceToggleItem } from './types';
@@ -54,4 +55,9 @@ export const INPUT_SETTING_TOGGLES: PreferenceToggleItem[] = [
labelKey: 'quickSettings.sendByCtrlEnter', labelKey: 'quickSettings.sendByCtrlEnter',
icon: Languages, icon: Languages,
}, },
{
key: 'voiceEnabled',
labelKey: 'quickSettings.voiceEnabled',
icon: Mic,
},
]; ];

View File

@@ -6,7 +6,8 @@ export type PreferenceToggleKey =
| 'showRawParameters' | 'showRawParameters'
| 'showThinking' | 'showThinking'
| 'autoScrollToBottom' | 'autoScrollToBottom'
| 'sendByCtrlEnter'; | 'sendByCtrlEnter'
| 'voiceEnabled';
export type QuickSettingsPreferences = Record<PreferenceToggleKey, boolean>; export type QuickSettingsPreferences = Record<PreferenceToggleKey, boolean>;

View File

@@ -28,6 +28,9 @@ export default function QuickSettingsContent({
onPreferenceChange, onPreferenceChange,
}: QuickSettingsContentProps) { }: QuickSettingsContentProps) {
const { t } = useTranslation('settings'); const { t } = useTranslation('settings');
const inputSettingToggles = preferences.voiceEnabled
? INPUT_SETTING_TOGGLES
: INPUT_SETTING_TOGGLES.filter(({ key }) => key !== 'voiceEnabled');
const renderToggleRows = (items: PreferenceToggleItem[]) => ( const renderToggleRows = (items: PreferenceToggleItem[]) => (
items.map(({ key, labelKey, icon }) => ( items.map(({ key, labelKey, icon }) => (
@@ -67,7 +70,7 @@ export default function QuickSettingsContent({
</QuickSettingsSection> </QuickSettingsSection>
<QuickSettingsSection title={t('quickSettings.sections.inputSettings')}> <QuickSettingsSection title={t('quickSettings.sections.inputSettings')}>
{renderToggleRows(INPUT_SETTING_TOGGLES)} {renderToggleRows(inputSettingToggles)}
<p className="ml-3 text-xs text-gray-500 dark:text-gray-400"> <p className="ml-3 text-xs text-gray-500 dark:text-gray-400">
{t('quickSettings.sendByCtrlEnterDescription')} {t('quickSettings.sendByCtrlEnterDescription')}
</p> </p>

View File

@@ -27,12 +27,14 @@ export default function QuickSettingsPanelView() {
showThinking: preferences.showThinking, showThinking: preferences.showThinking,
autoScrollToBottom: preferences.autoScrollToBottom, autoScrollToBottom: preferences.autoScrollToBottom,
sendByCtrlEnter: preferences.sendByCtrlEnter, sendByCtrlEnter: preferences.sendByCtrlEnter,
voiceEnabled: preferences.voiceEnabled,
}), [ }), [
preferences.autoExpandTools, preferences.autoExpandTools,
preferences.autoScrollToBottom, preferences.autoScrollToBottom,
preferences.sendByCtrlEnter, preferences.sendByCtrlEnter,
preferences.showRawParameters, preferences.showRawParameters,
preferences.showThinking, preferences.showThinking,
preferences.voiceEnabled,
]); ]);
const handlePreferenceChange = useCallback( const handlePreferenceChange = useCallback(

View File

@@ -3,7 +3,7 @@ import type { Dispatch, SetStateAction } from 'react';
import type { LLMProvider } from '../../../types/app'; import type { LLMProvider } from '../../../types/app';
import type { ProviderAuthStatus } from '../../provider-auth/types'; import type { ProviderAuthStatus } from '../../provider-auth/types';
export type SettingsMainTab = 'agents' | 'appearance' | 'git' | 'api' | 'tasks' | 'browser' | 'notifications' | 'plugins' | 'about'; export type SettingsMainTab = 'agents' | 'appearance' | 'git' | 'api' | 'voice' | 'tasks' | 'browser' | 'notifications' | 'plugins' | 'about';
export type AgentProvider = LLMProvider; export type AgentProvider = LLMProvider;
export type AgentCategory = 'account' | 'permissions' | 'mcp' | 'skills'; export type AgentCategory = 'account' | 'permissions' | 'mcp' | 'skills';
export type ProjectSortOrder = 'name' | 'date'; export type ProjectSortOrder = 'name' | 'date';

View File

@@ -7,6 +7,7 @@ import SettingsSidebar from '../view/SettingsSidebar';
import AgentsSettingsTab from '../view/tabs/agents-settings/AgentsSettingsTab'; import AgentsSettingsTab from '../view/tabs/agents-settings/AgentsSettingsTab';
import AppearanceSettingsTab from '../view/tabs/AppearanceSettingsTab'; import AppearanceSettingsTab from '../view/tabs/AppearanceSettingsTab';
import CredentialsSettingsTab from '../view/tabs/api-settings/CredentialsSettingsTab'; import CredentialsSettingsTab from '../view/tabs/api-settings/CredentialsSettingsTab';
import VoiceSettingsTab from '../view/tabs/VoiceSettingsTab';
import GitSettingsTab from '../view/tabs/git-settings/GitSettingsTab'; import GitSettingsTab from '../view/tabs/git-settings/GitSettingsTab';
import BrowserUseSettingsTab from '../view/tabs/browser-use-settings/BrowserUseSettingsTab'; import BrowserUseSettingsTab from '../view/tabs/browser-use-settings/BrowserUseSettingsTab';
import NotificationsSettingsTab from '../view/tabs/NotificationsSettingsTab'; import NotificationsSettingsTab from '../view/tabs/NotificationsSettingsTab';
@@ -157,6 +158,8 @@ function Settings({ isOpen, onClose, projects = [], initialTab = 'agents' }: Set
{activeTab === 'api' && <CredentialsSettingsTab />} {activeTab === 'api' && <CredentialsSettingsTab />}
{activeTab === 'voice' && <VoiceSettingsTab />}
{activeTab === 'plugins' && <PluginSettingsTab />} {activeTab === 'plugins' && <PluginSettingsTab />}
{activeTab === 'about' && <AboutTab />} {activeTab === 'about' && <AboutTab />}

View File

@@ -1,5 +1,6 @@
import { Bell, Bot, GitBranch, Info, Key, ListChecks, MonitorPlay, Palette, Puzzle } from 'lucide-react'; import { Bell, Bot, GitBranch, Info, Key, ListChecks, Mic, MonitorPlay, Palette, Puzzle } from 'lucide-react';
import { useTranslation } from 'react-i18next'; import { useTranslation } from 'react-i18next';
import { cn } from '../../../lib/utils'; import { cn } from '../../../lib/utils';
import { PillBar, Pill } from '../../../shared/view/ui'; import { PillBar, Pill } from '../../../shared/view/ui';
import type { SettingsMainTab } from '../types/types'; import type { SettingsMainTab } from '../types/types';
@@ -20,6 +21,7 @@ const NAV_ITEMS: NavItem[] = [
{ id: 'appearance', labelKey: 'mainTabs.appearance', icon: Palette }, { id: 'appearance', labelKey: 'mainTabs.appearance', icon: Palette },
{ id: 'git', labelKey: 'mainTabs.git', icon: GitBranch }, { id: 'git', labelKey: 'mainTabs.git', icon: GitBranch },
{ id: 'api', labelKey: 'mainTabs.apiTokens', icon: Key }, { id: 'api', labelKey: 'mainTabs.apiTokens', icon: Key },
{ id: 'voice', labelKey: 'mainTabs.voice', icon: Mic },
{ id: 'tasks', labelKey: 'mainTabs.tasks', icon: ListChecks }, { id: 'tasks', labelKey: 'mainTabs.tasks', icon: ListChecks },
{ id: 'browser', labelKey: 'mainTabs.browser', icon: MonitorPlay }, { id: 'browser', labelKey: 'mainTabs.browser', icon: MonitorPlay },
{ id: 'plugins', labelKey: 'mainTabs.plugins', icon: Puzzle }, { id: 'plugins', labelKey: 'mainTabs.plugins', icon: Puzzle },

View File

@@ -0,0 +1,91 @@
import type { InputHTMLAttributes } from 'react';
import { useTranslation } from 'react-i18next';
import SettingsSection from '../SettingsSection';
import SettingsToggle from '../SettingsToggle';
import { useUiPreferences } from '../../../../hooks/useUiPreferences';
import { useVoiceConfig } from '../../../../hooks/useVoiceConfig';
const inputClass =
'w-full rounded-md border border-border bg-background px-3 py-2 text-sm text-foreground placeholder:text-muted-foreground focus:outline-none focus:ring-2 focus:ring-ring';
function Field({ label, ...props }: { label: string } & InputHTMLAttributes<HTMLInputElement>) {
return (
<label className="block space-y-1">
<span className="text-sm font-medium text-foreground">{label}</span>
<input className={inputClass} {...props} />
</label>
);
}
export default function VoiceSettingsTab() {
const { t } = useTranslation('settings');
const { preferences, setPreference } = useUiPreferences();
const { config, update } = useVoiceConfig();
const voiceEnabled = preferences.voiceEnabled;
return (
<div className="space-y-8">
<SettingsSection title={t('voiceSettings.title')} description={t('voiceSettings.description')}>
<div className="flex items-center justify-between rounded-lg border border-border p-3">
<div className="pr-3">
<div className="text-sm font-medium text-foreground">{t('voiceSettings.enable')}</div>
<div className="text-xs text-muted-foreground">{t('voiceSettings.enableDescription')}</div>
</div>
<SettingsToggle
checked={voiceEnabled}
onChange={(v) => setPreference('voiceEnabled', v)}
ariaLabel={t('voiceSettings.enable')}
/>
</div>
</SettingsSection>
{voiceEnabled && (
<SettingsSection title={t('voiceSettings.backendTitle')} description={t('voiceSettings.backendDescription')}>
<div className="space-y-4">
<Field
label={t('voiceSettings.baseUrl')}
placeholder="https://api.openai.com/v1"
value={config.baseUrl}
onChange={(e) => update({ baseUrl: e.target.value })}
/>
<Field
label={t('voiceSettings.apiKey')}
type="password"
autoComplete="off"
placeholder="sk-…"
value={config.apiKey}
onChange={(e) => update({ apiKey: e.target.value })}
/>
<div className="grid grid-cols-1 gap-4 sm:grid-cols-4">
<Field
label={t('voiceSettings.sttModel')}
placeholder="whisper-1"
value={config.sttModel}
onChange={(e) => update({ sttModel: e.target.value })}
/>
<Field
label={t('voiceSettings.ttsModel')}
placeholder="tts-1"
value={config.ttsModel}
onChange={(e) => update({ ttsModel: e.target.value })}
/>
<Field
label={t('voiceSettings.voice')}
placeholder="alloy"
value={config.ttsVoice}
onChange={(e) => update({ ttsVoice: e.target.value })}
/>
<Field
label={t('voiceSettings.format')}
placeholder="mp3"
value={config.ttsFormat}
onChange={(e) => update({ ttsFormat: e.target.value })}
/>
</div>
<p className="text-xs text-muted-foreground">{t('voiceSettings.note')}</p>
</div>
</SettingsSection>
)}
</div>
);
}

View File

@@ -1,22 +1,32 @@
import { useCallback, useEffect, useState } from 'react'; import { useCallback, useEffect, useState } from 'react';
import { Download, Loader2 } from 'lucide-react'; import { Download, ExternalLink, Eye, Loader2, Zap } from 'lucide-react';
import { Button } from '../../../../../shared/view/ui'; import { Button, Input } from '../../../../../shared/view/ui';
import { authenticatedFetch } from '../../../../../utils/api'; import { authenticatedFetch } from '../../../../../utils/api';
import SettingsCard from '../../SettingsCard'; import SettingsCard from '../../SettingsCard';
import SettingsRow from '../../SettingsRow'; import SettingsRow from '../../SettingsRow';
import SettingsSection from '../../SettingsSection'; import SettingsSection from '../../SettingsSection';
import SettingsToggle from '../../SettingsToggle'; import SettingsToggle from '../../SettingsToggle';
const BROWSER_USE_GUIDE_URL = 'https://cloudcli.ai/docs/browser-use';
type BrowserUseSettings = { type BrowserUseSettings = {
enabled: boolean; enabled: boolean;
persistSessions: boolean;
defaultProfileName: string;
browserBackend: 'playwright' | 'camoufox-vnc';
}; };
type BrowserUseStatus = { type BrowserUseStatus = {
enabled: boolean; enabled: boolean;
available: boolean; available: boolean;
backend: 'playwright' | 'camoufox-vnc';
browserBackend: 'playwright' | 'camoufox-vnc';
playwrightInstalled: boolean; playwrightInstalled: boolean;
chromiumInstalled: boolean; chromiumInstalled: boolean;
camoufoxInstalled: boolean;
noVncInstalled: boolean;
x11vncInstalled: boolean;
installInProgress: boolean; installInProgress: boolean;
message: string; message: string;
}; };
@@ -32,16 +42,20 @@ async function readJson<T>(response: Response): Promise<T> {
export default function BrowserUseSettingsTab() { export default function BrowserUseSettingsTab() {
const [settings, setSettings] = useState<BrowserUseSettings | null>(null); const [settings, setSettings] = useState<BrowserUseSettings | null>(null);
const [status, setStatus] = useState<BrowserUseStatus | null>(null); const [status, setStatus] = useState<BrowserUseStatus | null>(null);
const [hasLoadedSettings, setHasLoadedSettings] = useState(false);
const [isSettingsLoading, setIsSettingsLoading] = useState(true); const [isSettingsLoading, setIsSettingsLoading] = useState(true);
const [isStatusLoading, setIsStatusLoading] = useState(true); const [isStatusLoading, setIsStatusLoading] = useState(true);
const [isSaving, setIsSaving] = useState(false); const [isSaving, setIsSaving] = useState(false);
const [isInstalling, setIsInstalling] = useState(false); const [isInstalling, setIsInstalling] = useState(false);
const [error, setError] = useState<string | null>(null); const [error, setError] = useState<string | null>(null);
const [profileNameDraft, setProfileNameDraft] = useState('default');
const loadSettings = useCallback(async () => { const loadSettings = useCallback(async () => {
const settingsResponse = await authenticatedFetch('/api/browser-use/settings'); const settingsResponse = await authenticatedFetch('/api/browser-use/settings');
const settingsData = await readJson<{ data: { settings: BrowserUseSettings } }>(settingsResponse); const settingsData = await readJson<{ data: { settings: BrowserUseSettings } }>(settingsResponse);
setSettings(settingsData.data.settings); setSettings(settingsData.data.settings);
setHasLoadedSettings(true);
setProfileNameDraft(settingsData.data.settings.defaultProfileName || 'default');
}, []); }, []);
const loadStatus = useCallback(async () => { const loadStatus = useCallback(async () => {
@@ -52,6 +66,7 @@ export default function BrowserUseSettingsTab() {
useEffect(() => { useEffect(() => {
setError(null); setError(null);
setHasLoadedSettings(false);
setIsSettingsLoading(true); setIsSettingsLoading(true);
setIsStatusLoading(true); setIsStatusLoading(true);
@@ -74,6 +89,7 @@ export default function BrowserUseSettingsTab() {
}); });
const data = await readJson<{ data: { settings: BrowserUseSettings } }>(response); const data = await readJson<{ data: { settings: BrowserUseSettings } }>(response);
setSettings(data.data.settings); setSettings(data.data.settings);
setHasLoadedSettings(true);
window.dispatchEvent(new Event('browserUseSettingsChanged')); window.dispatchEvent(new Event('browserUseSettingsChanged'));
setIsStatusLoading(true); setIsStatusLoading(true);
await loadStatus(); await loadStatus();
@@ -101,8 +117,21 @@ export default function BrowserUseSettingsTab() {
} }
}; };
const saveProfileName = async () => {
const nextName = profileNameDraft.trim() || 'default';
setProfileNameDraft(nextName);
if (nextName === settings?.defaultProfileName) {
return;
}
await updateSettings({ defaultProfileName: nextName });
};
const browserEnabled = settings?.enabled === true; const browserEnabled = settings?.enabled === true;
const needsBrowserBinaries = Boolean(browserEnabled && status && (!status.playwrightInstalled || !status.chromiumInstalled)); const browserDisabled = hasLoadedSettings && settings?.enabled === false;
const persistSessions = settings?.persistSessions === true;
const selectedBackend = settings?.browserBackend || 'playwright';
const effectiveBackend = status?.backend || 'playwright';
const needsBrowserBinaries = Boolean(browserEnabled && status && !status.available);
const runtimeLabel = (installed?: boolean) => { const runtimeLabel = (installed?: boolean) => {
if (isStatusLoading && !status) { if (isStatusLoading && !status) {
return 'checking...'; return 'checking...';
@@ -114,33 +143,165 @@ export default function BrowserUseSettingsTab() {
<div className="space-y-8"> <div className="space-y-8">
<SettingsSection <SettingsSection
title="Browser" title="Browser"
description="Allow agents to create guarded Playwright browser sessions that you can monitor from the Browser tab." description="Give coding agents a working browser so they can open websites, test flows, capture screenshots, and help debug what users actually see."
> >
<SettingsCard divided> <SettingsCard divided>
<SettingsRow <SettingsRow
label="Enable Browser" label="Give Agents Browser Access"
description="Registers Browser for supported agents. Agents can create browser sessions; you can watch, stop, and delete them." description="Let agents use a browser during coding tasks while you can watch live sessions, open them in a tab, and stop them at any time."
>
{isSettingsLoading && !hasLoadedSettings ? (
<Loader2 className="h-4 w-4 animate-spin text-muted-foreground" />
) : hasLoadedSettings ? (
<SettingsToggle
checked={browserEnabled}
onChange={(value) => void updateSettings({ enabled: value })}
ariaLabel="Give Agents Browser Access"
disabled={isSaving}
/>
) : (
<span className="text-sm text-muted-foreground">Unavailable</span>
)}
</SettingsRow>
{browserDisabled && (
<div className="px-4 py-4">
<a
href={BROWSER_USE_GUIDE_URL}
target="_blank"
rel="noopener noreferrer"
className="inline-flex items-center gap-1.5 text-sm font-medium text-primary hover:underline"
>
Read the Browser guide
<ExternalLink className="h-3.5 w-3.5" />
</a>
</div>
)}
{error && (
<div className="px-4 py-4">
<div className="rounded-md border border-red-200 bg-red-50 px-3 py-2 text-sm text-red-700 dark:border-red-900/50 dark:bg-red-950/30 dark:text-red-200">
{error}
</div>
</div>
)}
{browserEnabled && (
<>
<div className="space-y-3 px-4 py-4">
<div className="min-w-0">
<div className="text-sm font-medium text-foreground">Browser Engine</div>
<div className="mt-0.5 text-sm text-muted-foreground">
Pick the kind of browser experience agents should use for new sessions.
</div>
</div>
<div className="grid gap-2 sm:grid-cols-2">
{([
{
value: 'playwright' as const,
label: 'Playwright',
description: 'Best for quick checks, screenshots, and automated page interaction when no manual login is needed.',
icon: Zap,
},
{
value: 'camoufox-vnc' as const,
label: 'Camoufox + noVNC',
description: 'Best when a person may need to log in, approve a step, or watch the browser session live.',
icon: Eye,
},
]).map((option) => {
const Icon = option.icon;
const selected = selectedBackend === option.value;
return (
<button
key={option.value}
type="button"
onClick={() => void updateSettings({ browserBackend: option.value })}
disabled={isSaving || isSettingsLoading}
className={[
'group flex min-h-[88px] items-start gap-3 rounded-lg border px-3 py-3 text-left transition-colors',
'focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 focus-visible:ring-offset-background',
selected
? 'border-primary bg-primary/5 text-foreground shadow-sm'
: 'border-border bg-background hover:border-foreground/20 hover:bg-muted/40',
(isSaving || isSettingsLoading) ? 'cursor-not-allowed opacity-60' : '',
].join(' ')}
aria-pressed={selected}
>
<span className={[
'mt-0.5 flex h-8 w-8 flex-shrink-0 items-center justify-center rounded-md border',
selected ? 'border-primary/30 bg-primary/10 text-primary' : 'border-border bg-muted/40 text-muted-foreground',
].join(' ')}
>
<Icon className="h-4 w-4" />
</span>
<span className="min-w-0">
<span className="block text-sm font-medium">{option.label}</span>
<span className="mt-1 block text-xs leading-relaxed text-muted-foreground">{option.description}</span>
</span>
</button>
);
})}
</div>
</div>
<SettingsRow
label="Remember Browser Logins"
description="Keep cookies and site storage in a named profile so agents can reuse signed-in sessions instead of starting from scratch."
> >
{isSettingsLoading && !settings ? ( {isSettingsLoading && !settings ? (
<Loader2 className="h-4 w-4 animate-spin text-muted-foreground" /> <Loader2 className="h-4 w-4 animate-spin text-muted-foreground" />
) : ( ) : (
<SettingsToggle <SettingsToggle
checked={browserEnabled} checked={persistSessions}
onChange={(value) => void updateSettings({ enabled: value })} onChange={(value) => void updateSettings({ persistSessions: value })}
ariaLabel="Enable Browser" ariaLabel="Remember Browser Logins"
disabled={isSaving} disabled={isSaving}
/> />
)} )}
</SettingsRow> </SettingsRow>
{persistSessions && (
<SettingsRow
label="Default Browser Profile"
description="New browser sessions use this profile by default, so saved logins stay tied to a predictable workspace."
>
<Input
value={profileNameDraft}
onChange={(event) => setProfileNameDraft(event.target.value)}
onBlur={() => void saveProfileName()}
onKeyDown={(event) => {
if (event.key === 'Enter') {
event.currentTarget.blur();
}
}}
disabled={isSaving || isSettingsLoading}
className="w-40"
aria-label="Default Browser Profile"
/>
</SettingsRow>
)}
</>
)}
{browserEnabled && (
<div className="space-y-4 px-4 py-4"> <div className="space-y-4 px-4 py-4">
<div className="flex flex-wrap gap-2 text-xs text-muted-foreground"> <div className="flex flex-wrap gap-2 text-xs text-muted-foreground">
<span className="rounded-md border border-border px-2 py-1">
Backend: {effectiveBackend === 'camoufox-vnc' ? 'Camoufox + noVNC' : 'Playwright'}
</span>
<span className="rounded-md border border-border px-2 py-1"> <span className="rounded-md border border-border px-2 py-1">
Playwright: {runtimeLabel(status?.playwrightInstalled)} Playwright: {runtimeLabel(status?.playwrightInstalled)}
</span> </span>
<span className="rounded-md border border-border px-2 py-1"> <span className="rounded-md border border-border px-2 py-1">
Chromium: {runtimeLabel(status?.chromiumInstalled)} Chromium: {runtimeLabel(status?.chromiumInstalled)}
</span> </span>
<span className="rounded-md border border-border px-2 py-1">
Camoufox: {runtimeLabel(status?.camoufoxInstalled)}
</span>
<span className="rounded-md border border-border px-2 py-1">
noVNC: {runtimeLabel(status?.noVncInstalled)}
</span>
<span className="rounded-md border border-border px-2 py-1"> <span className="rounded-md border border-border px-2 py-1">
Status: {isStatusLoading && !status ? 'checking...' : status?.available ? 'ready' : browserEnabled ? 'setup required' : 'disabled'} Status: {isStatusLoading && !status ? 'checking...' : status?.available ? 'ready' : browserEnabled ? 'setup required' : 'disabled'}
</span> </span>
@@ -172,12 +333,17 @@ export default function BrowserUseSettingsTab() {
</div> </div>
)} )}
{error && ( <a
<div className="rounded-md border border-red-200 bg-red-50 px-3 py-2 text-sm text-red-700 dark:border-red-900/50 dark:bg-red-950/30 dark:text-red-200"> href={BROWSER_USE_GUIDE_URL}
{error} target="_blank"
rel="noopener noreferrer"
className="inline-flex items-center gap-1.5 text-sm font-medium text-primary hover:underline"
>
Read the Browser guide
<ExternalLink className="h-3.5 w-3.5" />
</a>
</div> </div>
)} )}
</div>
</SettingsCard> </SettingsCard>
</SettingsSection> </SettingsSection>
</div> </div>

View File

@@ -7,6 +7,7 @@ type UiPreferences = {
autoScrollToBottom: boolean; autoScrollToBottom: boolean;
sendByCtrlEnter: boolean; sendByCtrlEnter: boolean;
sidebarVisible: boolean; sidebarVisible: boolean;
voiceEnabled: boolean;
}; };
type UiPreferenceKey = keyof UiPreferences; type UiPreferenceKey = keyof UiPreferences;
@@ -39,6 +40,7 @@ const DEFAULTS: UiPreferences = {
autoScrollToBottom: true, autoScrollToBottom: true,
sendByCtrlEnter: false, sendByCtrlEnter: false,
sidebarVisible: true, sidebarVisible: true,
voiceEnabled: false,
}; };
const PREFERENCE_KEYS = Object.keys(DEFAULTS) as UiPreferenceKey[]; const PREFERENCE_KEYS = Object.keys(DEFAULTS) as UiPreferenceKey[];

View File

@@ -0,0 +1,68 @@
import { useState } from 'react';
export type VoiceConfig = {
baseUrl: string;
apiKey: string;
sttModel: string;
ttsModel: string;
ttsVoice: string;
ttsFormat: string;
};
const STORAGE_KEY = 'voiceConfig';
export const VOICE_CONFIG_SYNC_EVENT = 'voice-config:sync';
const DEFAULTS: VoiceConfig = { baseUrl: '', apiKey: '', sttModel: '', ttsModel: '', ttsVoice: '', ttsFormat: '' };
export function readVoiceConfig(): VoiceConfig {
try {
const raw = localStorage.getItem(STORAGE_KEY);
if (!raw) return { ...DEFAULTS };
const parsed = JSON.parse(raw);
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) return { ...DEFAULTS };
const config = { ...DEFAULTS };
for (const key of Object.keys(DEFAULTS) as (keyof VoiceConfig)[]) {
if (typeof parsed[key] === 'string') config[key] = parsed[key];
}
return config;
} catch {
return { ...DEFAULTS };
}
}
// Headers the voice proxy reads to target a per-user OpenAI-compatible backend.
// Empty fields are omitted so the server's env defaults apply.
export function voiceConfigHeaders(): Record<string, string> {
if (typeof window === 'undefined') return {};
const c = readVoiceConfig();
const h: Record<string, string> = {};
if (c.apiKey) h['x-voice-api-key'] = c.apiKey;
if (c.sttModel) h['x-voice-stt-model'] = c.sttModel;
if (c.ttsModel) h['x-voice-tts-model'] = c.ttsModel;
if (c.ttsVoice) h['x-voice-tts-voice'] = c.ttsVoice;
if (c.ttsFormat.trim()) h['x-voice-tts-format'] = c.ttsFormat.trim();
return h;
}
export function useVoiceConfig() {
const [config, setConfig] = useState<VoiceConfig>(() =>
typeof window === 'undefined' ? { ...DEFAULTS } : readVoiceConfig(),
);
const update = (patch: Partial<VoiceConfig>) => {
setConfig((prev) => {
const next = { ...prev, ...patch };
try {
const stored: Partial<VoiceConfig> = { ...next };
if (next.ttsFormat.trim()) stored.ttsFormat = next.ttsFormat.trim();
else delete stored.ttsFormat;
localStorage.setItem(STORAGE_KEY, JSON.stringify(stored));
window.dispatchEvent(new Event(VOICE_CONFIG_SYNC_EVENT));
} catch {
/* ignore persistence errors */
}
return next;
});
};
return { config, update };
}

View File

@@ -122,6 +122,14 @@
} }
} }
}, },
"voice": {
"input": "Voice input",
"stopRecording": "Stop recording",
"transcribing": "Transcribing…",
"speak": "Read aloud",
"stopSpeaking": "Stop",
"loading": "Loading…"
},
"input": { "input": {
"placeholder": "Type / for commands, @ for files, or ask {{provider}} anything...", "placeholder": "Type / for commands, @ for files, or ask {{provider}} anything...",
"placeholderDefault": "Type your message...", "placeholderDefault": "Type your message...",

View File

@@ -50,6 +50,21 @@
"resetToDefaults": "Reset to Defaults", "resetToDefaults": "Reset to Defaults",
"cancelChanges": "Cancel Changes" "cancelChanges": "Cancel Changes"
}, },
"voiceSettings": {
"title": "Voice",
"description": "Speech-to-text input and read-aloud, via an OpenAI-compatible audio backend.",
"enable": "Enable voice",
"enableDescription": "Show the mic button and the read-aloud button on messages.",
"backendTitle": "Backend",
"backendDescription": "Point at OpenAI, Groq, or a local server (LocalAI, Speaches, Kokoro-FastAPI). Leave blank to use the server default.",
"baseUrl": "Base URL",
"apiKey": "API key",
"sttModel": "Speech-to-text model",
"ttsModel": "Text-to-speech model",
"voice": "Voice",
"format": "Audio format",
"note": "A custom base URL is called directly by your browser and must allow browser CORS requests. Leave it blank to use the server-configured backend."
},
"quickSettings": { "quickSettings": {
"title": "Quick Settings", "title": "Quick Settings",
"sections": { "sections": {
@@ -64,6 +79,7 @@
"showThinking": "Show thinking", "showThinking": "Show thinking",
"autoScrollToBottom": "Auto-scroll to bottom", "autoScrollToBottom": "Auto-scroll to bottom",
"sendByCtrlEnter": "Send by Ctrl+Enter", "sendByCtrlEnter": "Send by Ctrl+Enter",
"voiceEnabled": "Voice (mic + read aloud)",
"sendByCtrlEnterDescription": "When enabled, pressing Ctrl+Enter will send the message instead of just Enter. This is useful for IME users to avoid accidental sends.", "sendByCtrlEnterDescription": "When enabled, pressing Ctrl+Enter will send the message instead of just Enter. This is useful for IME users to avoid accidental sends.",
"dragHandle": { "dragHandle": {
"dragging": "Dragging handle", "dragging": "Dragging handle",
@@ -94,6 +110,7 @@
"appearance": "Appearance", "appearance": "Appearance",
"git": "Git", "git": "Git",
"apiTokens": "API & Tokens", "apiTokens": "API & Tokens",
"voice": "Voice",
"tasks": "Tasks", "tasks": "Tasks",
"browser": "Browser", "browser": "Browser",
"notifications": "Notifications", "notifications": "Notifications",
@@ -114,7 +131,7 @@
}, },
"sound": { "sound": {
"title": "Sound", "title": "Sound",
"description": "Play a short tone when a chat run finishes.", "description": "Play a short tone when a chat run finishes or needs tool approval.",
"enabled": "Enabled", "enabled": "Enabled",
"test": "Test sound" "test": "Test sound"
}, },

60
src/lib/voiceApi.ts Normal file
View File

@@ -0,0 +1,60 @@
import { authenticatedFetch } from '../utils/api';
import { readVoiceConfig, voiceConfigHeaders } from '../hooks/useVoiceConfig';
function directUrl(baseUrl: string, path: string): string {
return `${baseUrl.replace(/\/$/, '')}${path}`;
}
export function voiceConfigSignature(): string {
return JSON.stringify(readVoiceConfig());
}
export function transcribeVoice(blob: Blob, filename: string): Promise<Response> {
const config = readVoiceConfig();
const body = new FormData();
if (config.baseUrl.trim()) {
body.append('file', blob, filename);
body.append('model', config.sttModel || 'whisper-1');
return fetch(directUrl(config.baseUrl.trim(), '/audio/transcriptions'), {
method: 'POST',
headers: config.apiKey ? { Authorization: `Bearer ${config.apiKey}` } : {},
body,
});
}
body.append('audio', blob, filename);
return authenticatedFetch('/api/voice/transcribe', {
method: 'POST',
headers: voiceConfigHeaders(),
body,
});
}
export function synthesizeVoice(text: string, signal: AbortSignal): Promise<Response> {
const config = readVoiceConfig();
if (config.baseUrl.trim()) {
return fetch(directUrl(config.baseUrl.trim(), '/audio/speech'), {
method: 'POST',
headers: {
'Content-Type': 'application/json',
...(config.apiKey ? { Authorization: `Bearer ${config.apiKey}` } : {}),
},
body: JSON.stringify({
model: config.ttsModel || 'tts-1',
voice: config.ttsVoice || 'alloy',
input: text,
...(config.ttsFormat.trim() ? { response_format: config.ttsFormat.trim() } : {}),
}),
signal,
});
}
return authenticatedFetch('/api/voice/tts', {
method: 'POST',
body: JSON.stringify({ text }),
headers: voiceConfigHeaders(),
signal,
});
}

196
src/lib/voicePlayer.ts Normal file
View File

@@ -0,0 +1,196 @@
import { synthesizeVoice, voiceConfigSignature } from './voiceApi';
// A single app-level audio player for read-aloud. It owns one <audio> element, lives
// outside the React tree, and caches generated audio by content. Because playback is not
// tied to a component, switching chats or re-rendering a message can't revoke the blob URL
// out from under it (the cause of mid-play cutoffs). v1 plays one message at a time
// (a new play replaces the current one); the design leaves room for a queue later.
export type VoicePlayState = 'idle' | 'loading' | 'playing';
export type VoiceSnapshot = { state: VoicePlayState; error: string | null };
const IDLE: VoiceSnapshot = { state: 'idle', error: null };
const CACHE_MAX = 24;
const CLIENT_TIMEOUT_MS = 330000; // backstop; the server proxy already times out at 5 min
// Stable id / cache key from the text and voice settings that affect its audio (djb2).
export function voiceId(content: string, signature = voiceConfigSignature()): string {
const input = JSON.stringify([content, signature]);
let h = 5381;
for (let i = 0; i < input.length; i++) h = (((h << 5) + h) + input.charCodeAt(i)) | 0;
return (h >>> 0).toString(36);
}
class VoicePlayer {
private audio: HTMLAudioElement | null = null;
private unlocked = false;
private cache = new Map<string, string>(); // id -> blob URL (insertion order = LRU)
private currentId: string | null = null;
private state: VoicePlayState = 'idle';
private errorId: string | null = null;
private errorMsg: string | null = null;
private token = 0; // bumps to ignore stale in-flight results
private activeController: AbortController | null = null; // aborts the in-flight TTS fetch
private errorTimer: ReturnType<typeof setTimeout> | null = null;
private listeners = new Set<() => void>();
subscribe(listener: () => void): () => void {
this.listeners.add(listener);
return () => {
this.listeners.delete(listener);
};
}
private emit() {
this.listeners.forEach((l) => l());
}
getSnapshot(id: string): VoiceSnapshot {
const state = this.currentId === id ? this.state : 'idle';
const error = this.errorId === id ? this.errorMsg : null;
if (state === 'idle' && error === null) return IDLE;
return { state, error };
}
private ensureAudio(): HTMLAudioElement {
if (!this.audio) {
const audio = new Audio();
audio.addEventListener('ended', () => this.onEnded());
audio.addEventListener('error', () => {
// Only meaningful while we believe we're playing.
if (this.state === 'playing') this.onEnded();
});
this.audio = audio;
}
return this.audio;
}
// Call synchronously from the click handler so iOS grants the (reused) element playback.
unlock() {
if (this.unlocked) return;
const audio = this.ensureAudio();
try {
const p = audio.play();
if (p && typeof p.catch === 'function') p.catch(() => {});
audio.pause();
} catch {
/* priming attempt; ignore */
}
this.unlocked = true;
}
toggle(content: string) {
const id = voiceId(content);
if (this.currentId === id && (this.state === 'playing' || this.state === 'loading')) {
this.stop();
return;
}
void this.play(id, content);
}
stop() {
this.token++; // ignore any stale in-flight result
this.abortActive(); // and actually cancel the network request
if (this.audio) this.audio.pause();
this.state = 'idle';
this.currentId = null;
this.emit();
}
private abortActive() {
if (this.activeController) {
this.activeController.abort();
this.activeController = null;
}
}
private onEnded() {
this.state = 'idle';
this.currentId = null;
this.emit();
// (queue auto-advance would hook in here)
}
private setError(id: string, msg: string) {
this.state = 'idle';
this.currentId = id;
this.errorId = id;
this.errorMsg = msg;
this.emit();
if (this.errorTimer) clearTimeout(this.errorTimer);
this.errorTimer = setTimeout(() => {
if (this.errorId === id) {
this.errorId = null;
this.errorMsg = null;
if (this.currentId === id) this.currentId = null;
this.emit();
}
}, 6000);
}
private async play(id: string, content: string) {
const audio = this.ensureAudio();
audio.pause();
this.currentId = id;
this.errorId = null;
this.errorMsg = null;
this.state = 'loading';
this.emit();
const myToken = ++this.token;
this.abortActive(); // cancel any request this play supersedes
try {
let url = this.cache.get(id);
if (!url) {
const controller = new AbortController();
this.activeController = controller;
const timer = setTimeout(() => controller.abort(), CLIENT_TIMEOUT_MS);
const res = await synthesizeVoice(content, controller.signal).finally(() => {
clearTimeout(timer);
if (this.activeController === controller) this.activeController = null;
});
if (myToken !== this.token) return; // superseded by another play/stop
if (!res.ok) {
let msg = `Read-aloud failed (${res.status})`;
try {
const j = await res.json();
if (j?.error) msg = String(j.error);
} catch {
/* non-JSON error body */
}
throw new Error(msg);
}
const blob = await res.blob();
if (myToken !== this.token) return;
url = URL.createObjectURL(blob);
this.cacheSet(id, url);
}
if (myToken !== this.token) return;
audio.src = url;
audio.load();
await audio.play();
if (myToken !== this.token) return;
this.state = 'playing';
this.emit();
} catch (e) {
if (myToken !== this.token) return;
const aborted = e instanceof Error && e.name === 'AbortError';
this.setError(id, aborted ? 'Read-aloud timed out.' : e instanceof Error ? e.message : 'Read-aloud failed');
}
}
private cacheSet(id: string, url: string) {
this.cache.set(id, url);
while (this.cache.size > CACHE_MAX) {
const oldest = this.cache.keys().next().value as string | undefined;
if (oldest === undefined) break;
const oldUrl = this.cache.get(oldest);
this.cache.delete(oldest);
if (oldUrl && oldUrl !== this.audio?.src) URL.revokeObjectURL(oldUrl);
}
}
}
export const voicePlayer = new VoicePlayer();

View File

@@ -58,7 +58,7 @@ const playTone = (
oscillator.stop(startsAt + duration + 0.02); oscillator.stop(startsAt + duration + 0.02);
}; };
export const playChatCompletionSound = async ({ force = false } = {}): Promise<void> => { export const playNotificationSound = async ({ force = false } = {}): Promise<void> => {
if (!force && !isNotificationSoundEnabled()) { if (!force && !isNotificationSoundEnabled()) {
return; return;
} }
@@ -81,3 +81,5 @@ export const playChatCompletionSound = async ({ force = false } = {}): Promise<v
console.warn('Unable to play notification sound:', error); console.warn('Unable to play notification sound:', error);
} }
}; };
export const playChatCompletionSound = (options = {}): Promise<void> => playNotificationSound(options);