Compare commits

2 Commits

Author SHA1 Message Date
turato
ed4ae3114a fix(chat): prevent chat interface crash on malformed AskUserQuestion payload (#920)
* fix(chat): prevent chat interface crash when AskUserQuestion payload is malformed

Loading a session that contains an AskUserQuestion tool call could crash the
entire chat interface with "TypeError: e.map is not a function".

The AskUserQuestion tool is configured with `defaultOpen: true`, so
QuestionAnswerContent renders as soon as the session loads. Its array guard
(`!questions || questions.length === 0`) only checked for truthiness, and
`q.options` was mapped/iterated with no guard at all. When `questions` or
`options` arrive from the session transcript as a non-array value, the
`.map()` / `.some()` calls throw and take down the whole chat view via the
error boundary.

Guard both with `Array.isArray()` so a single malformed message can no longer
crash the interface.
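
The difference between the two guards can be sketched as below. This is a minimal illustration, not the component's code: `passesTruthyGuard`, `safeQuestions`, and the `Question` type are hypothetical names introduced here.

```typescript
// Hypothetical shape of one question entry (assumption for this sketch).
type Question = { question: string; options: { label: string }[] };

// Old-style guard: rejects only falsy values and zero length.
// A plain object passes, because `undefined === 0` is false.
function passesTruthyGuard(questions: unknown): boolean {
  const q = questions as { length?: number } | null | undefined;
  return !(!q || q.length === 0);
}

// New-style guard: anything that is not a real array is treated as empty.
function safeQuestions(questions: unknown): Question[] {
  return Array.isArray(questions) ? (questions as Question[]) : [];
}

// Same malformed value the regression test uses: an object, not an array.
const malformed = { 0: { question: 'q?', options: [{ label: 'a' }] } };
console.log(passesTruthyGuard(malformed)); // true, so a later .map() would throw
console.log(safeQuestions(malformed).length); // 0
```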

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(chat): cover QuestionAnswerContent against malformed AskUserQuestion payloads

Adds the first frontend regression test, guarding the crash fixed in the
previous commit: a non-array `questions` value or a question missing its
`options` array must render gracefully instead of throwing
"e.map is not a function" and taking down the whole chat interface.

Follows the repo's existing test convention (node:test + tsx); uses
react-dom/server renderToStaticMarkup so no DOM/jsdom is required.
Run with: npx tsx --test src/**/QuestionAnswerContent.test.tsx

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(chat): harden QuestionAnswerContent against malformed question entries

Addresses review feedback. Even with the array guards, a malformed transcript
could still crash before the options fallback ran:

- a `questions` entry that is null/non-object threw on `q.question` access
- a non-string `answers[q.question]` threw on `answer.split(', ')`

Skip entries that aren't a proper question object with a string prompt, and
only call string methods on the answer when it is actually a string. Extends
the regression test to cover both vectors.
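
A rough sketch of the two checks described above, assuming hypothetical helper names (`isQuestion`, `toAnswerLabels`) that do not appear in the component itself:

```typescript
// Hypothetical minimal shape; `options` is left unknown on purpose.
type Question = { question: string; options?: unknown };

// Type guard: true only for a non-null object with a string `question`.
function isQuestion(value: unknown): value is Question {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { question?: unknown }).question === 'string'
  );
}

// Only call string methods when the answer is actually a string.
function toAnswerLabels(answer: unknown): string[] {
  return typeof answer === 'string' ? answer.split(', ') : [];
}

const entries: unknown[] = [null, 'oops', { question: 'Ok?', options: [{ label: 'A' }] }];
console.log(entries.filter(isQuestion).length); // 1
console.log(toAnswerLabels('A, Custom')); // two labels: 'A' and 'Custom'
console.log(toAnswerLabels({ unexpected: true })); // empty array
```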

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(chat): guard malformed question options

---------

Co-authored-by: hustuhao <hustuhao@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
2026-06-26 16:47:24 +02:00
Haile
591e8e7642 fix: voice tts format settings (#919)
* feat(voice): add optional speech-to-text input and read-aloud TTS

Adds a push-to-talk mic button in the composer and a read-aloud button on
assistant messages. Both are opt-in and hidden unless a voice backend is
configured via VOICE_SIDECAR_URL.

The auth-gated /api/voice proxy forwards to a configurable backend exposing
/transcribe and /tts (provider-agnostic); the frontend probes /api/voice/health
and hides the controls when disabled. Adds i18n keys and docs/voice.md.

Includes a local, no-API-key reference backend in voice-sidecar/ (faster-whisper
for STT, Kokoro-82M for TTS, both CPU-capable).

* refactor(voice): provider-agnostic backend and in-app config

Switches the voice proxy to the OpenAI audio API (/v1/audio/transcriptions and
/v1/audio/speech) so it works with OpenAI, Groq, or a local server. Adds a
Settings -> Voice tab (base URL, API key, models, voice) plus a Quick Settings
toggle, and removes the bundled Python sidecar.

Review fixes: stop mic tracks on unmount, clear the global TTS stop handler and
revoke leaked blob URLs, add fetch timeouts in the proxy, surface mic errors in
the button, trim before appending transcripts, and drop the repo-wide wav ignore.

* fix(voice): relax backend timeout and surface timeout errors

Bumps the proxy timeout to 5 minutes (VOICE_TIMEOUT_MS) since local TTS can
synthesize long messages at roughly real-time, and returns a clear timed-out
message (504) instead of failing silently. The read-aloud button now shows
backend errors.

* fix(voice): play read-aloud through an app-level player to stop cutoffs

Read-aloud now runs in a single module-level player outside the React tree instead
of per-message component state. Switching chats or re-rendering a message no longer
revokes the blob URL mid-play (the 'Invalid URI' cutoff). Adds content-keyed caching so
re-listening doesn't regenerate, and reuses one audio element (also unlocks iOS once).

* fix(voice): address review (SSRF guard, auth mapping, client timeout)

Validates the user-supplied backend URL (http/https only, blocks the link-local
metadata range) to prevent SSRF; remaps upstream 401/403 so a bad voice API key
isn't read as the app's own auth failing; adds a client-side AbortController timeout
on the read-aloud request so the button can't sit in loading if a request stalls.
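
The URL check described above might look roughly like this. It is a sketch under assumptions: `isAllowedBackendUrl` is a hypothetical name, and the link-local metadata range is taken to be IPv4 169.254.0.0/16.

```typescript
// Returns true when the URL is http(s) and its host is not a literal
// address in the IPv4 link-local range 169.254.0.0/16.
function isAllowedBackendUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  if (/^169\.254\.\d{1,3}\.\d{1,3}$/.test(url.hostname)) return false;
  return true;
}

console.log(isAllowedBackendUrl('https://api.example.com/v1')); // true
console.log(isAllowedBackendUrl('http://169.254.169.254/latest/meta-data')); // false
console.log(isAllowedBackendUrl('file:///etc/passwd')); // false
```

A string check like this does not cover hostnames that resolve into the blocked range, nor redirects followed after the check, so it is only one layer of an SSRF defense.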

* docs(voice): provider-agnostic wording and jsdoc on proxy functions

Drops leftover sidecar/faster-whisper references now that the backend is any
OpenAI-compatible voice API, and adds JSDoc to the voice-proxy functions so the
docstring coverage check passes.

* fix(voice): harden timeout parsing, tts input check, and player abort

- fall back to the default when VOICE_TIMEOUT_MS is non-numeric or <= 0, so a
  bad override can't make the abort fire immediately
- type-check the tts `text` before calling .trim() so a non-string body returns
  400 instead of throwing
- abort the in-flight TTS fetch on stop() and on a superseding play, so tapping
  read-aloud repeatedly doesn't leave orphaned requests generating audio
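
The first two items above could be sketched as follows. `parseTimeoutMs` and `validateTtsText` are hypothetical names, the 5-minute default comes from the earlier timeout commit, and rejecting empty text is an assumption added here (the commit only states that a non-string body returns 400).

```typescript
const DEFAULT_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes

// Fall back to the default for missing, non-numeric, or non-positive values,
// so a bad override cannot make the abort timer fire immediately.
function parseTimeoutMs(raw: string | undefined): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
}

type TtsTextResult = { ok: true; text: string } | { ok: false; status: 400 };

// Type-check before calling .trim(), so a non-string body maps to a 400
// instead of throwing. Treating blank text as a 400 is an assumption.
function validateTtsText(text: unknown): TtsTextResult {
  if (typeof text !== 'string' || text.trim() === '') {
    return { ok: false, status: 400 };
  }
  return { ok: true, text: text.trim() };
}

console.log(parseTimeoutMs('abc') === DEFAULT_TIMEOUT_MS); // true
console.log(parseTimeoutMs('1500')); // 1500
console.log(validateTtsText({ not: 'a string' }).ok); // false
```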

* feat(voice): send transcript with the main send button while recording

While dictating, the main send button stops recording, transcribes, and sends
in one tap, matching the Codex-style flow. The mic button still stops and drops
the transcript into the input box to edit before sending. Voice recording state
is lifted into the composer so both buttons share it, and the send button is
enabled (not grayed) while recording. Also fixes a pre-existing type error: the
quick-settings preferences map was missing voiceEnabled.

* fix(voice): make stop() idempotent so a double tap can't throw

Guards on the recorder's own state instead of React state, so a double tap or
the mic and send buttons both firing won't call stop() on an already-inactive
MediaRecorder.
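
An idempotent stop of this kind might be sketched as below. `RecorderLike` and `stopRecording` are hypothetical names; the interface mirrors only the `state` and `stop()` members of the browser's MediaRecorder so the example runs outside a browser.

```typescript
// Minimal stand-in for the parts of MediaRecorder used here.
interface RecorderLike {
  state: 'inactive' | 'recording' | 'paused';
  stop(): void;
}

// Returns true if stop() was actually called, false if it was a no-op.
// Checking the recorder's own state avoids relying on React state that
// may not have updated yet when two handlers fire back to back.
function stopRecording(recorder: RecorderLike | null): boolean {
  if (!recorder || recorder.state === 'inactive') return false;
  recorder.stop();
  return true;
}

// Fake recorder that, like the real one, becomes inactive once stopped.
let stopCalls = 0;
const fake: RecorderLike = {
  state: 'recording',
  stop() {
    stopCalls += 1;
    this.state = 'inactive';
  },
};

console.log(stopRecording(fake)); // true: first tap stops the recorder
console.log(stopRecording(fake)); // false: second tap is a no-op
```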

* fix(voice): expose TTS format in user settings

* fix(voice): harden recording and backend behavior

Redirects could bypass the backend URL guard, and TTS playback waited for full buffering.

Recording could overlap or finish after teardown. Controls also ignored backend readiness.

Explicit formats and config-aware cache keys prevent stale audio after settings change.
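
One way to read "config-aware cache keys" is to fold the relevant settings into the key along with the text. The sketch below assumes a hypothetical `VoiceConfig` shape and `ttsCacheKey` helper; the field names are guesses based on the settings listed earlier (base URL, model, voice, format).

```typescript
// Hypothetical subset of the voice settings that affect generated audio.
type VoiceConfig = { baseUrl: string; model: string; voice: string; format: string };

// Serializing an array keeps field boundaries unambiguous, so two different
// (config, text) pairs cannot collapse into the same key by concatenation.
function ttsCacheKey(text: string, cfg: VoiceConfig): string {
  return JSON.stringify([cfg.baseUrl, cfg.model, cfg.voice, cfg.format, text]);
}

const base: VoiceConfig = { baseUrl: 'https://api.example.com', model: 'm', voice: 'v', format: 'mp3' };
const sameTextNewFormat = ttsCacheKey('hello', { ...base, format: 'wav' });
console.log(ttsCacheKey('hello', base) === sameTextNewFormat); // false: a format change misses the cache
```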

* fix(voice): validate config and request boundaries

Malformed stored settings could break voice requests instead of using safe defaults.

Health results could outlive auth changes. URL checks also did not guard the fetch sink.

Remove constant recorder branches so lifecycle cancellation stays clear.

* fix(voice): separate client and server backends

User-selected backend URLs must remain usable without letting clients control server requests.

Call custom providers from the browser while keeping the server proxy bound to its configured host.

This restores voice controls for frontend settings without reopening the SSRF path.

* fix: hide voice options until enabled

---------

Co-authored-by: newsbubbles <nathaniel.gibson@gmail.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
2026-06-26 16:06:40 +02:00
2 changed files with 99 additions and 6 deletions

@@ -0,0 +1,77 @@
import test from 'node:test';
import assert from 'node:assert/strict';
import React from 'react';
import { renderToStaticMarkup } from 'react-dom/server';
import { QuestionAnswerContent } from './QuestionAnswerContent';

// Regression coverage for the chat-interface crash where an AskUserQuestion
// payload loaded from a session transcript arrives with a non-array `questions`
// or a question missing its `options` array. Rendering must degrade gracefully
// instead of throwing "TypeError: e.map is not a function".
test('renders without throwing when questions is a non-array value', () => {
  assert.doesNotThrow(() => {
    renderToStaticMarkup(
      React.createElement(QuestionAnswerContent, {
        // Malformed: object instead of an array
        questions: { 0: { question: 'q?', options: [{ label: 'a' }] } } as never,
        answers: {},
      }),
    );
  });
});

test('renders without throwing when a question is missing options[]', () => {
  assert.doesNotThrow(() => {
    renderToStaticMarkup(
      React.createElement(QuestionAnswerContent, {
        questions: [{ question: 'Pick one?', header: 'H' } as never],
        answers: { 'Pick one?': 'X' },
      }),
    );
  });
});

test('renders without throwing when options[] contains malformed entries', () => {
  assert.doesNotThrow(() => {
    renderToStaticMarkup(
      React.createElement(QuestionAnswerContent, {
        questions: [{ question: 'Pick one?', options: [null, 'oops', { label: 'A' }] } as never],
        answers: { 'Pick one?': 'A, Custom' },
      }),
    );
  });
});

test('renders without throwing when a questions entry is null/non-object', () => {
  assert.doesNotThrow(() => {
    renderToStaticMarkup(
      React.createElement(QuestionAnswerContent, {
        questions: [null, 'oops', { question: 'Ok?', options: [{ label: 'A' }] }] as never,
        answers: {},
      }),
    );
  });
});

test('renders without throwing when an answer is a non-string value', () => {
  assert.doesNotThrow(() => {
    renderToStaticMarkup(
      React.createElement(QuestionAnswerContent, {
        questions: [{ question: 'Pick one?', options: [{ label: 'A' }] }],
        // Malformed: answer is an object instead of the expected string
        answers: { 'Pick one?': { unexpected: true } } as never,
      }),
    );
  });
});

test('still renders a well-formed question + answer', () => {
  const html = renderToStaticMarkup(
    React.createElement(QuestionAnswerContent, {
      questions: [{ question: 'Pick one?', header: 'H', options: [{ label: 'A' }, { label: 'B' }] }],
      answers: { 'Pick one?': 'A' },
    }),
  );
  assert.ok(html.includes('Pick one?'));
});

@@ -15,7 +15,11 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
 }) => {
   const [expandedIdx, setExpandedIdx] = useState<number | null>(null);
-  if (!questions || questions.length === 0) {
+  // Tool inputs are runtime data loaded from session transcripts and may be
+  // malformed (e.g. `questions` arriving as a non-array). Guard with
+  // Array.isArray so a single bad payload can't crash the whole chat view
+  // with "e.map is not a function".
+  if (!Array.isArray(questions) || questions.length === 0) {
     return null;
   }
@@ -24,11 +28,23 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
   return (
     <div className={`space-y-2 ${className}`}>
-      {questions.map((q, idx) => {
+      {questions.map((rawQuestion, idx) => {
+        // Entries come from session transcripts and may be malformed; skip
+        // anything that isn't a proper question object with a string prompt.
+        if (!rawQuestion || typeof rawQuestion !== 'object' || typeof rawQuestion.question !== 'string') {
+          return null;
+        }
+        const q = rawQuestion;
         const answer = answers?.[q.question];
-        const answerLabels = answer ? answer.split(', ') : [];
+        // `answer` may be a non-string (or absent) in malformed payloads.
+        const answerLabels = typeof answer === 'string' ? answer.split(', ') : [];
         const skipped = !answer;
         const isExpanded = expandedIdx === idx;
+        // `options` is typed as an array but comes from untrusted runtime data;
+        // keep only valid entries so `.some`/`.map` below never throw.
+        const options = Array.isArray(q.options)
+          ? q.options.filter((opt) => opt && typeof opt === 'object' && typeof opt.label === 'string')
+          : [];
         return (
           <div
@@ -74,7 +90,7 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
 {!isExpanded && answerLabels.length > 0 && (
   <div className="mt-1.5 flex flex-wrap gap-1">
     {answerLabels.map((lbl) => {
-      const isCustom = !q.options.some(o => o.label === lbl);
+      const isCustom = !options.some(o => o.label === lbl);
       return (
         <span
           key={lbl}
@@ -110,7 +126,7 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
 {isExpanded && (
   <div className="border-t border-gray-100 px-3 pb-2.5 pt-0.5 dark:border-gray-700/40">
     <div className="ml-6.5 space-y-1">
-      {q.options.map((opt) => {
+      {options.map((opt) => {
         const wasSelected = answerLabels.includes(opt.label);
         return (
           <div
@@ -148,7 +164,7 @@ export const QuestionAnswerContent: React.FC<QuestionAnswerContentProps> = ({
   );
 })}
-{answerLabels.filter(lbl => !q.options.some(o => o.label === lbl)).map(lbl => (
+{answerLabels.filter(lbl => !options.some(o => o.label === lbl)).map(lbl => (
   <div
     key={lbl}
     className="flex items-start gap-2 rounded-lg border border-blue-200/60 bg-blue-50/80 px-2.5 py-1.5 text-[12px] dark:border-blue-800/40 dark:bg-blue-900/20"