mirror of
https://github.com/siteboon/claudecodeui.git
synced 2026-06-27 06:05:54 +08:00
* feat(voice): add optional speech-to-text input and read-aloud TTS

  Adds a push-to-talk mic button in the composer and a read-aloud button on assistant messages. Both are opt-in and hidden unless a voice backend is configured via VOICE_SIDECAR_URL. The auth-gated /api/voice proxy forwards to a configurable backend exposing /transcribe and /tts (provider-agnostic); the frontend probes /api/voice/health and hides the controls when disabled. Adds i18n keys and docs/voice.md. Includes a local, no-API-key reference backend in voice-sidecar/ (faster-whisper for STT, Kokoro-82M for TTS, both CPU-capable).

* refactor(voice): provider-agnostic backend and in-app config

  Switches the voice proxy to the OpenAI audio API (/v1/audio/transcriptions and /v1/audio/speech) so it works with OpenAI, Groq, or a local server. Adds a Settings -> Voice tab (base URL, API key, models, voice) plus a Quick Settings toggle, and removes the bundled Python sidecar.

  Review fixes: stop mic tracks on unmount, clear the global TTS stop handler and revoke leaked blob URLs, add fetch timeouts in the proxy, surface mic errors in the button, trim before appending transcripts, and drop the repo-wide wav ignore.

* fix(voice): relax backend timeout and surface timeout errors

  Bumps the proxy timeout to 5 minutes (VOICE_TIMEOUT_MS) since local TTS can synthesize long messages at roughly real-time, and returns a clear timed-out message (504) instead of failing silently. The read-aloud button now shows backend errors.

* fix(voice): play read-aloud through an app-level player to stop cutoffs

  Read-aloud now runs in a single module-level player outside the React tree instead of per-message component state. Switching chats or re-rendering a message no longer revokes the blob URL mid-play (the 'Invalid URI' cutoff). Adds content-keyed caching so re-listening doesn't regenerate, and reuses one audio element (also unlocks iOS once).

* fix(voice): address review (SSRF guard, auth mapping, client timeout)

  Validates the user-supplied backend URL (http/https only, blocks the link-local metadata range) to prevent SSRF; remaps upstream 401/403 so a bad voice API key isn't read as the app's own auth failing; adds a client-side AbortController timeout on the read-aloud request so the button can't sit in loading if a request stalls.

* docs(voice): provider-agnostic wording and jsdoc on proxy functions

  drop leftover sidecar/faster-whisper references now that the backend is any openai-compatible voice api, and add jsdoc to the voice-proxy functions so the docstring coverage check passes.

* fix(voice): harden timeout parsing, tts input check, and player abort

  - fall back to the default when VOICE_TIMEOUT_MS is non-numeric or <= 0, so a bad override can't make the abort fire immediately
  - type-check the tts `text` before calling .trim() so a non-string body returns 400 instead of throwing
  - abort the in-flight TTS fetch on stop() and on a superseding play, so tapping read-aloud repeatedly doesn't leave orphaned requests generating audio

* feat(voice): send transcript with the main send button while recording

  while dictating, the main send button stops recording, transcribes, and sends in one tap, matching the codex-style flow. the mic button still stops and drops the transcript into the input box to edit before sending. voice recording state is lifted into the composer so both buttons share it, and the send button is enabled (not grayed) while recording.

  also fix a pre-existing type error: the quick-settings preferences map was missing voiceEnabled.

* fix(voice): make stop() idempotent so a double tap can't throw

  guard on the recorder's own state instead of react state, so a double tap or the mic and send buttons both firing won't call stop() on an already-inactive MediaRecorder.

* fix(voice): expose TTS format in user settings

* fix(voice): harden recording and backend behavior

  Redirects could bypass the backend URL guard, and TTS playback waited for full buffering. Recording could overlap or finish after teardown. Controls also ignored backend readiness. Explicit formats and config-aware cache keys prevent stale audio after settings change.

* fix(voice): validate config and request boundaries

  Malformed stored settings could break voice requests instead of using safe defaults. Health results could outlive auth changes. URL checks also did not guard the fetch sink. Remove constant recorder branches so lifecycle cancellation stays clear.

* fix(voice): separate client and server backends

  User-selected backend URLs must remain usable without letting clients control server requests. Call custom providers from the browser while keeping the server proxy bound to its configured host. This restores voice controls for frontend settings without reopening the SSRF path.

* fix: hide voice options until enabled

---------

Co-authored-by: newsbubbles <nathaniel.gibson@gmail.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
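The timeout hardening described in the commit message (fall back to the default when VOICE_TIMEOUT_MS is non-numeric or <= 0) could look roughly like the sketch below. This is an illustration only, not the repository's proxy code: `parseVoiceTimeoutMs` and `DEFAULT_VOICE_TIMEOUT_MS` are hypothetical names, and the 5-minute default is taken from the commit message.

```typescript
// Hypothetical default: the commit message states a 5-minute proxy timeout.
const DEFAULT_VOICE_TIMEOUT_MS = 5 * 60 * 1000;

// Hypothetical helper: parse a raw env-style value into a usable timeout.
// Anything that is not a finite number greater than zero falls back to the
// default, so a bad override cannot make an abort timer fire immediately.
function parseVoiceTimeoutMs(
  raw: string | undefined,
  fallback: number = DEFAULT_VOICE_TIMEOUT_MS,
): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}
```

A caller would pass the raw environment value (for example the string read from VOICE_TIMEOUT_MS) and use the result as the delay for its abort timer.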
444 lines
17 KiB
TypeScript
import { useTranslation } from 'react-i18next';
import { useCallback, useEffect, useRef, useState } from 'react';
import type {
  ChangeEvent,
  ClipboardEvent,
  FormEvent,
  KeyboardEvent,
  MouseEvent,
  ReactNode,
  RefObject,
  TouchEvent,
} from 'react';
import { ImageIcon, MessageSquareIcon, XIcon, ArrowDownIcon, Loader2 } from 'lucide-react';

import { useVoiceInput } from '../../hooks/useVoiceInput';
import { useVoiceAvailable } from '../../hooks/useVoiceAvailable';
import type { SessionActivity } from '../../../../hooks/useSessionProtection';
import type { PendingPermissionRequest, PermissionMode } from '../../types/types';
import {
  PromptInput,
  PromptInputHeader,
  PromptInputBody,
  PromptInputTextarea,
  PromptInputFooter,
  PromptInputTools,
  PromptInputButton,
  PromptInputSubmit,
} from '../../../../shared/view/ui';

import CommandMenu from './CommandMenu';
import ActivityIndicator from './ActivityIndicator';
import ImageAttachment from './ImageAttachment';
import VoiceInputButton from './VoiceInputButton';
import PermissionRequestsBanner from './PermissionRequestsBanner';
import TokenUsageSummary from './TokenUsageSummary';

interface MentionableFile {
  name: string;
  path: string;
}

interface SlashCommand {
  name: string;
  description?: string;
  namespace?: string;
  path?: string;
  type?: string;
  metadata?: Record<string, unknown>;
  [key: string]: unknown;
}

interface ChatComposerProps {
  pendingPermissionRequests: PendingPermissionRequest[];
  handlePermissionDecision: (
    requestIds: string | string[],
    decision: { allow?: boolean; message?: string; rememberEntry?: string | null; updatedInput?: unknown },
  ) => void;
  handleGrantToolPermission: (suggestion: { entry: string; toolName: string }) => { success: boolean };
  activity: SessionActivity | null;
  isLoading: boolean;
  onAbortSession: () => void;
  permissionMode: PermissionMode | string;
  onModeSwitch: () => void;
  tokenBudget: Record<string, unknown> | null;
  onShowTokenUsage: () => void;
  slashCommandsCount: number;
  onToggleCommandMenu: () => void;
  hasInput: boolean;
  onClearInput: () => void;
  isUserScrolledUp: boolean;
  hasMessages: boolean;
  onScrollToBottom: () => void;
  onSubmit: (event: FormEvent<HTMLFormElement> | MouseEvent<HTMLButtonElement> | TouchEvent<HTMLButtonElement>) => void;
  isDragActive: boolean;
  attachedImages: File[];
  onRemoveImage: (index: number) => void;
  uploadingImages: Map<string, number>;
  imageErrors: Map<string, string>;
  showFileDropdown: boolean;
  filteredFiles: MentionableFile[];
  selectedFileIndex: number;
  onSelectFile: (file: MentionableFile) => void;
  filteredCommands: SlashCommand[];
  selectedCommandIndex: number;
  onCommandSelect: (command: SlashCommand, index: number, isHover: boolean) => void;
  onCloseCommandMenu: () => void;
  isCommandMenuOpen: boolean;
  frequentCommands: SlashCommand[];
  getRootProps: (...args: unknown[]) => Record<string, unknown>;
  getInputProps: (...args: unknown[]) => Record<string, unknown>;
  openImagePicker: () => void;
  inputHighlightRef: RefObject<HTMLDivElement>;
  renderInputWithMentions: (text: string) => ReactNode;
  textareaRef: RefObject<HTMLTextAreaElement>;
  input: string;
  onVoiceTranscript?: (text: string, send?: boolean) => void;
  onInputChange: (event: ChangeEvent<HTMLTextAreaElement>) => void;
  onTextareaClick: (event: MouseEvent<HTMLTextAreaElement>) => void;
  onTextareaKeyDown: (event: KeyboardEvent<HTMLTextAreaElement>) => void;
  onTextareaPaste: (event: ClipboardEvent<HTMLTextAreaElement>) => void;
  onTextareaScrollSync: (target: HTMLTextAreaElement) => void;
  onTextareaInput: (event: FormEvent<HTMLTextAreaElement>) => void;
  onInputFocusChange?: (focused: boolean) => void;
  placeholder: string;
  isTextareaExpanded: boolean;
  sendByCtrlEnter?: boolean;
}

export default function ChatComposer({
  pendingPermissionRequests,
  handlePermissionDecision,
  handleGrantToolPermission,
  activity,
  isLoading,
  onAbortSession,
  permissionMode,
  onModeSwitch,
  tokenBudget,
  onShowTokenUsage,
  slashCommandsCount,
  onToggleCommandMenu,
  hasInput,
  onClearInput,
  isUserScrolledUp,
  hasMessages,
  onScrollToBottom,
  onSubmit,
  isDragActive,
  attachedImages,
  onRemoveImage,
  uploadingImages,
  imageErrors,
  showFileDropdown,
  filteredFiles,
  selectedFileIndex,
  onSelectFile,
  filteredCommands,
  selectedCommandIndex,
  onCommandSelect,
  onCloseCommandMenu,
  isCommandMenuOpen,
  frequentCommands,
  getRootProps,
  getInputProps,
  openImagePicker,
  inputHighlightRef,
  renderInputWithMentions,
  textareaRef,
  input,
  onVoiceTranscript,
  onInputChange,
  onTextareaClick,
  onTextareaKeyDown,
  onTextareaPaste,
  onTextareaScrollSync,
  onTextareaInput,
  onInputFocusChange,
  placeholder,
  isTextareaExpanded,
  sendByCtrlEnter,
}: ChatComposerProps) {
  const { t } = useTranslation('chat');

  // Voice state is hosted here (not in the mic button) so the main Send button can stop
  // recording and send the transcript in one tap, the way the mic button drops it in the box.
  const voiceAvailable = useVoiceAvailable();
  const [voiceError, setVoiceError] = useState<string | null>(null);
  const voiceErrorTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
  const handleVoiceError = useCallback((msg: string) => {
    setVoiceError(msg);
    if (voiceErrorTimer.current) clearTimeout(voiceErrorTimer.current);
    voiceErrorTimer.current = setTimeout(() => setVoiceError(null), 4000);
  }, []);
  useEffect(() => () => {
    if (voiceErrorTimer.current) clearTimeout(voiceErrorTimer.current);
  }, []);
  const noopTranscript = useCallback(() => {}, []);
  const { state: voiceState, toggle: voiceToggle, stop: voiceStop } = useVoiceInput(
    onVoiceTranscript ?? noopTranscript,
    handleVoiceError,
  );
  const isRecording = voiceState === 'recording';
  const isTranscribing = voiceState === 'transcribing';

  const textareaRect = textareaRef.current?.getBoundingClientRect();
  const commandMenuPosition = {
    top: textareaRect ? Math.max(16, textareaRect.top - 316) : 0,
    left: textareaRect ? textareaRect.left : 16,
    bottom: textareaRect ? window.innerHeight - textareaRect.top + 8 : 90,
  };

  // Detect if the AskUserQuestion interactive panel is active
  const hasQuestionPanel = pendingPermissionRequests.some(
    (r) => r.toolName === 'AskUserQuestion'
  );

  // Hide the thinking/status bar while any permission request is pending
  const hasPendingPermissions = pendingPermissionRequests.length > 0;

  return (
    <div className="flex-shrink-0 p-2 pb-2 sm:p-4 sm:pb-4 md:p-4 md:pb-6">
      {!hasPendingPermissions && (
        <ActivityIndicator activity={activity} onAbort={onAbortSession} />
      )}

      {pendingPermissionRequests.length > 0 && (
        <div className="mx-auto mb-3 max-w-4xl">
          <PermissionRequestsBanner
            pendingPermissionRequests={pendingPermissionRequests}
            handlePermissionDecision={handlePermissionDecision}
            handleGrantToolPermission={handleGrantToolPermission}
          />
        </div>
      )}

      {!hasQuestionPanel && <div className="relative mx-auto max-w-4xl">
        {isUserScrolledUp && hasMessages && (
          <div className="absolute -top-10 left-0 right-0 z-10 flex justify-center">
            <button
              type="button"
              onClick={onScrollToBottom}
              className="flex h-8 w-8 items-center justify-center rounded-full border border-border/50 bg-card text-muted-foreground shadow-sm transition-all duration-200 hover:bg-accent hover:text-foreground"
              title={t('input.scrollToBottom', { defaultValue: 'Scroll to bottom' })}
            >
              <ArrowDownIcon className="h-4 w-4" />
            </button>
          </div>
        )}
        {showFileDropdown && filteredFiles.length > 0 && (
          <div className="absolute bottom-full left-0 right-0 z-50 mb-2 max-h-48 overflow-y-auto rounded-xl border border-border/50 bg-card/95 shadow-lg backdrop-blur-md">
            {filteredFiles.map((file, index) => (
              <div
                key={file.path}
                className={`cursor-pointer touch-manipulation border-b border-border/30 px-4 py-3 last:border-b-0 ${
                  index === selectedFileIndex
                    ? 'bg-primary/8 text-primary'
                    : 'text-foreground hover:bg-accent/50'
                }`}
                onMouseDown={(event) => {
                  event.preventDefault();
                  event.stopPropagation();
                }}
                onClick={(event) => {
                  event.preventDefault();
                  event.stopPropagation();
                  onSelectFile(file);
                }}
              >
                <div className="text-sm font-medium">{file.name}</div>
                <div className="font-mono text-xs text-muted-foreground">{file.path}</div>
              </div>
            ))}
          </div>
        )}

        <CommandMenu
          commands={filteredCommands}
          selectedIndex={selectedCommandIndex}
          onSelect={onCommandSelect}
          onClose={onCloseCommandMenu}
          position={commandMenuPosition}
          isOpen={isCommandMenuOpen}
          frequentCommands={frequentCommands}
        />

        <PromptInput
          onSubmit={onSubmit as (event: FormEvent<HTMLFormElement>) => void}
          status={isLoading ? 'streaming' : 'ready'}
          className={isTextareaExpanded ? 'chat-input-expanded' : ''}
          {...getRootProps()}
        >
          {isDragActive && (
            <div className="absolute inset-0 z-50 flex items-center justify-center rounded-2xl border-2 border-dashed border-primary/50 bg-primary/15">
              <div className="rounded-xl border border-border/30 bg-card p-4 shadow-lg">
                <svg className="mx-auto mb-2 h-8 w-8 text-primary" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path
                    strokeLinecap="round"
                    strokeLinejoin="round"
                    strokeWidth={2}
                    d="M7 16a4 4 0 01-.88-7.903A5 5 0 1115.9 6L16 6a5 5 0 011 9.9M15 13l-3-3m0 0l-3 3m3-3v12"
                  />
                </svg>
                <p className="text-sm font-medium">Drop images here</p>
              </div>
            </div>
          )}

          {attachedImages.length > 0 && (
            <PromptInputHeader>
              <div className="rounded-xl bg-muted/40 p-2">
                <div className="flex flex-wrap gap-2">
                  {attachedImages.map((file, index) => (
                    <ImageAttachment
                      key={index}
                      file={file}
                      onRemove={() => onRemoveImage(index)}
                      uploadProgress={uploadingImages.get(file.name)}
                      error={imageErrors.get(file.name)}
                    />
                  ))}
                </div>
              </div>
            </PromptInputHeader>
          )}

          <input {...getInputProps()} />

          <PromptInputBody>
            <div ref={inputHighlightRef} aria-hidden="true" className="pointer-events-none absolute inset-0 overflow-hidden rounded-xl">
              <div className="chat-input-placeholder block w-full whitespace-pre-wrap break-words px-4 py-2 text-sm leading-6 text-transparent">
                {renderInputWithMentions(input)}
              </div>
            </div>

            <PromptInputTextarea
              ref={textareaRef}
              dir="auto"
              value={input}
              onChange={onInputChange}
              onClick={onTextareaClick}
              onKeyDown={onTextareaKeyDown}
              onPaste={onTextareaPaste}
              onScroll={(event) => onTextareaScrollSync(event.target as HTMLTextAreaElement)}
              onFocus={() => onInputFocusChange?.(true)}
              onBlur={() => onInputFocusChange?.(false)}
              onInput={onTextareaInput}
              placeholder={placeholder}
            />
          </PromptInputBody>

          <PromptInputFooter>
            <PromptInputTools>
              <PromptInputButton
                tooltip={{ content: t('input.attachImages') }}
                onClick={openImagePicker}
              >
                <ImageIcon />
              </PromptInputButton>

              {onVoiceTranscript && voiceAvailable && (
                <VoiceInputButton state={voiceState} onToggle={voiceToggle} errorMsg={voiceError} />
              )}

              <button
                type="button"
                onClick={onModeSwitch}
                className={`rounded-lg border p-2 text-xs font-medium transition-all duration-200 sm:px-2.5 sm:py-1 ${
                  permissionMode === 'default'
                    ? 'border-border/60 bg-muted/50 text-muted-foreground hover:bg-muted'
                    : permissionMode === 'acceptEdits'
                      ? 'border-green-300/60 bg-green-50 text-green-700 hover:bg-green-100 dark:border-green-600/40 dark:bg-green-900/15 dark:text-green-300 dark:hover:bg-green-900/25'
                      : permissionMode === 'auto'
                        ? 'border-blue-300/60 bg-blue-50 text-blue-700 hover:bg-blue-100 dark:border-blue-600/40 dark:bg-blue-900/15 dark:text-blue-300 dark:hover:bg-blue-900/25'
                        : permissionMode === 'bypassPermissions'
                          ? 'border-orange-300/60 bg-orange-50 text-orange-700 hover:bg-orange-100 dark:border-orange-600/40 dark:bg-orange-900/15 dark:text-orange-300 dark:hover:bg-orange-900/25'
                          : 'border-primary/20 bg-primary/5 text-primary hover:bg-primary/10'
                }`}
                title={t('input.clickToChangeMode')}
              >
                <div className="flex items-center gap-1.5">
                  <div
                    className={`h-2.5 w-2.5 rounded-full sm:h-1.5 sm:w-1.5 ${
                      permissionMode === 'default'
                        ? 'bg-muted-foreground'
                        : permissionMode === 'acceptEdits'
                          ? 'bg-green-500'
                          : permissionMode === 'auto'
                            ? 'bg-blue-500'
                            : permissionMode === 'bypassPermissions'
                              ? 'bg-orange-500'
                              : 'bg-primary'
                    }`}
                  />
                  <span className="hidden whitespace-nowrap sm:inline">
                    {permissionMode === 'default' && t('codex.modes.default')}
                    {permissionMode === 'acceptEdits' && t('codex.modes.acceptEdits')}
                    {permissionMode === 'auto' && t('codex.modes.auto')}
                    {permissionMode === 'bypassPermissions' && t('codex.modes.bypassPermissions')}
                    {permissionMode === 'plan' && t('codex.modes.plan')}
                  </span>
                </div>
              </button>

              <TokenUsageSummary usage={tokenBudget} onClick={onShowTokenUsage} />

              <PromptInputButton
                tooltip={{ content: t('input.showAllCommands') }}
                onClick={onToggleCommandMenu}
                className="relative"
              >
                <MessageSquareIcon />
                {slashCommandsCount > 0 && (
                  <span
                    className="absolute -right-1 -top-1 flex h-4 w-4 items-center justify-center rounded-full bg-primary text-[10px] font-bold text-primary-foreground"
                  >
                    {slashCommandsCount}
                  </span>
                )}
              </PromptInputButton>

              {hasInput && (
                <PromptInputButton
                  tooltip={{ content: t('input.clearInput', { defaultValue: 'Clear input' }) }}
                  onClick={onClearInput}
                  className="hidden sm:flex"
                >
                  <XIcon />
                </PromptInputButton>
              )}

            </PromptInputTools>

            <div className="flex items-center gap-2">
              <div
                className={`hidden text-xs text-muted-foreground/50 transition-opacity duration-200 lg:block ${
                  input.trim() ? 'opacity-0' : 'opacity-100'
                }`}
              >
                {sendByCtrlEnter ? t('input.hintText.ctrlEnter') : t('input.hintText.enter')}
              </div>
              <PromptInputSubmit
                onClick={
                  isLoading
                    ? onAbortSession
                    : isRecording
                      ? (e: MouseEvent<HTMLButtonElement>) => {
                          e.preventDefault();
                          voiceStop({ send: true });
                        }
                      : undefined
                }
                disabled={isLoading ? false : isRecording ? false : isTranscribing ? true : !input.trim()}
                className="h-10 w-10 sm:h-10 sm:w-10"
              >
                {isTranscribing ? <Loader2 className="h-4 w-4 animate-spin" /> : undefined}
              </PromptInputSubmit>
            </div>
          </PromptInputFooter>
        </PromptInput>
      </div>}
    </div>
  );
}
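
The Send button's `disabled` expression in the composer combines three flags with the input text through nested ternaries. As a rough sketch, the same decision can be pulled into a pure function, which may be easier to read and unit test. `SubmitState` and `isSubmitDisabled` are hypothetical names introduced here for illustration; they are not part of the source file.

```typescript
// Hypothetical shape: the four values the composer's `disabled` expression reads.
interface SubmitState {
  isLoading: boolean;
  isRecording: boolean;
  isTranscribing: boolean;
  input: string;
}

// Mirrors the nested ternary in the component, checked in the same order:
// - while loading, the button stays enabled (it acts as the abort button)
// - while recording, it stays enabled (it stops recording and sends)
// - while transcribing, it is disabled until the transcript arrives
// - otherwise it is disabled only when the input is empty or whitespace
function isSubmitDisabled({ isLoading, isRecording, isTranscribing, input }: SubmitState): boolean {
  if (isLoading) return false;
  if (isRecording) return false;
  if (isTranscribing) return true;
  return input.trim().length === 0;
}
```

With a helper like this, the JSX prop would reduce to a single call, and each of the four cases could be covered by a small test without rendering the component.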