Haile 591e8e7642 fix: voice tts format settings (#919)
* feat(voice): add optional speech-to-text input and read-aloud TTS

Adds a push-to-talk mic button in the composer and a read-aloud button on
assistant messages. Both are opt-in and hidden unless a voice backend is
configured via VOICE_SIDECAR_URL.

The auth-gated /api/voice proxy forwards to a configurable backend exposing
/transcribe and /tts (provider-agnostic); the frontend probes /api/voice/health
and hides the controls when disabled. Adds i18n keys and docs/voice.md.

Includes a local, no-API-key reference backend in voice-sidecar/ (faster-whisper
for STT, Kokoro-82M for TTS, both CPU-capable).

* refactor(voice): provider-agnostic backend and in-app config

Switches the voice proxy to the OpenAI audio API (/v1/audio/transcriptions and
/v1/audio/speech) so it works with OpenAI, Groq, or a local server. Adds a
Settings -> Voice tab (base URL, API key, models, voice) plus a Quick Settings
toggle, and removes the bundled Python sidecar.

Review fixes: stop mic tracks on unmount, clear the global TTS stop handler and
revoke leaked blob URLs, add fetch timeouts in the proxy, surface mic errors in
the button, trim before appending transcripts, and drop the repo-wide wav ignore.

* fix(voice): relax backend timeout and surface timeout errors

Bumps the proxy timeout to 5 minutes (VOICE_TIMEOUT_MS) since local TTS can
synthesize long messages at roughly real-time, and returns a clear timed-out
message (504) instead of failing silently. The read-aloud button now shows
backend errors.

* fix(voice): play read-aloud through an app-level player to stop cutoffs

Read-aloud now runs in a single module-level player outside the React tree instead
of per-message component state. Switching chats or re-rendering a message no longer
revokes the blob URL mid-play (the 'Invalid URI' cutoff). Adds content-keyed caching so
re-listening doesn't regenerate, and reuses one audio element (also unlocks iOS once).

* fix(voice): address review (SSRF guard, auth mapping, client timeout)

Validates the user-supplied backend URL (http/https only, blocks the link-local
metadata range) to prevent SSRF; remaps upstream 401/403 so a bad voice API key
isn't read as the app's own auth failing; adds a client-side AbortController timeout
on the read-aloud request so the button can't sit in loading if a request stalls.
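
A rough sketch of the kind of URL check described above, assuming a plain hostname test is acceptable: it accepts only http/https and rejects the IPv4 link-local range 169.254.0.0/16, where cloud metadata endpoints usually live. The name `isAllowedBackendUrl` is hypothetical and not taken from the codebase. A check like this does not cover redirects or DNS names that resolve to link-local addresses.

```typescript
// Hypothetical helper, not the project's actual implementation.
function isAllowedBackendUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only plain web schemes are allowed (no file:, ftp:, data:, ...).
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return false;
  }
  // Reject any host inside 169.254.0.0/16 (IPv4 link-local).
  const isLinkLocalV4 = /^169\.254\.\d{1,3}\.\d{1,3}$/.test(url.hostname);
  return !isLinkLocalV4;
}
```

For example, `isAllowedBackendUrl("https://api.openai.com/v1")` passes, while a metadata address such as `http://169.254.169.254/` is rejected.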

* docs(voice): provider-agnostic wording and jsdoc on proxy functions

drop leftover sidecar/faster-whisper references now that the backend is any
openai-compatible voice api, and add jsdoc to the voice-proxy functions so the
docstring coverage check passes.

* fix(voice): harden timeout parsing, tts input check, and player abort

- fall back to the default when VOICE_TIMEOUT_MS is non-numeric or <= 0, so a
  bad override can't make the abort fire immediately
- type-check the tts `text` before calling .trim() so a non-string body returns
  400 instead of throwing
- abort the in-flight TTS fetch on stop() and on a superseding play, so tapping
  read-aloud repeatedly doesn't leave orphaned requests generating audio
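
The first two points above could look roughly like this. Both helper names are hypothetical, the 5-minute default comes from the earlier timeout change, and treating empty-after-trim text as invalid is an assumption rather than something the commit states.

```typescript
const DEFAULT_VOICE_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes

// Hypothetical helper: resolve the timeout from a raw env-style string.
function resolveVoiceTimeoutMs(raw: string | undefined): number {
  const parsed = Number(raw);
  // NaN (non-numeric or undefined), Infinity, zero, and negatives all fall back,
  // so a bad override cannot make the abort fire immediately.
  if (!Number.isFinite(parsed) || parsed <= 0) {
    return DEFAULT_VOICE_TIMEOUT_MS;
  }
  return parsed;
}

// Hypothetical helper: type-check the TTS body before calling .trim().
// Returns null when the caller should answer with a 400.
function readTtsText(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;
  const text = (body as { text?: unknown }).text;
  if (typeof text !== "string") return null;
  const trimmed = text.trim();
  // Assumption: empty text is rejected as well.
  return trimmed.length > 0 ? trimmed : null;
}
```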

* feat(voice): send transcript with the main send button while recording

while dictating, the main send button stops recording, transcribes, and sends
in one tap, matching the codex-style flow. the mic button still stops and drops
the transcript into the input box to edit before sending. voice recording state
is lifted into the composer so both buttons share it, and the send button is
enabled (not grayed) while recording. also fix a pre-existing type error: the
quick-settings preferences map was missing voiceEnabled.

* fix(voice): make stop() idempotent so a double tap can't throw

guard on the recorder's own state instead of react state, so a double tap or
the mic and send buttons both firing won't call stop() on an already-inactive
MediaRecorder.
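
A minimal sketch of that guard, assuming the check is made against the recorder's own `state` field. `RecorderLike` and `stopIfActive` are hypothetical names; the interface only mirrors the parts of the browser's MediaRecorder that the sketch needs.

```typescript
// Minimal shape of the MediaRecorder members this sketch relies on.
interface RecorderLike {
  readonly state: "inactive" | "recording" | "paused";
  stop(): void;
}

// Hypothetical helper: returns true only when it actually stopped the recorder.
function stopIfActive(recorder: RecorderLike | null): boolean {
  if (recorder === null || recorder.state === "inactive") {
    return false; // already stopped or never started, so do nothing
  }
  recorder.stop();
  return true;
}
```

Because the decision reads the recorder itself rather than React state, a second call made before the next render is a no-op.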

* fix(voice): expose TTS format in user settings

* fix(voice): harden recording and backend behavior

Redirects could bypass the backend URL guard, and TTS playback waited for full buffering.

Recording could overlap or finish after teardown. Controls also ignored backend readiness.

Explicit formats and config-aware cache keys prevent stale audio after settings change.
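
One way to picture a config-aware cache key: fold every setting that changes the generated audio into the key, so a settings change misses the cache instead of replaying stale audio. The `TtsSettings` fields, `ttsCacheKey`, and `TtsCache` are hypothetical and only illustrate the idea.

```typescript
// Hypothetical shape of the settings that influence the generated audio.
interface TtsSettings {
  model: string;
  voice: string;
  format: string;
}

// Build a key from the text plus every audio-affecting setting.
function ttsCacheKey(text: string, settings: TtsSettings): string {
  return JSON.stringify([settings.model, settings.voice, settings.format, text]);
}

// Small cache that creates a value only on a key miss.
class TtsCache<T> {
  private readonly entries = new Map<string, T>();

  getOrCreate(text: string, settings: TtsSettings, create: () => T): T {
    const key = ttsCacheKey(text, settings);
    if (this.entries.has(key)) {
      return this.entries.get(key) as T;
    }
    const value = create();
    this.entries.set(key, value);
    return value;
  }
}
```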

* fix(voice): validate config and request boundaries

Malformed stored settings could break voice requests instead of using safe defaults.

Health results could outlive auth changes. URL checks also did not guard the fetch sink.

Remove constant recorder branches so lifecycle cancellation stays clear.

* fix(voice): separate client and server backends

User-selected backend URLs must remain usable without letting clients control server requests.

Call custom providers from the browser while keeping the server proxy bound to its configured host.

This restores voice controls for frontend settings without reopening the SSRF path.

* fix: hide voice options until enabled

---------

Co-authored-by: newsbubbles <nathaniel.gibson@gmail.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>

CloudCLI UI

Cloud CLI (aka Claude Code UI)

A desktop and mobile UI for Claude Code, Cursor CLI, Codex, and Gemini CLI.
Use it locally or remotely to view your active projects and sessions from anywhere.

CloudCLI Cloud · Documentation · Discord · Bug Reports · Contributing


Screenshots

Desktop View

Desktop Interface
Main interface showing project overview and chat

Mobile Experience

Mobile Interface
Responsive mobile design with touch navigation

CLI Selection

CLI Selection
Select between Claude Code, Gemini, Cursor CLI and Codex

Features

  • Responsive Design - Works across desktop, tablet, and mobile, so you can run agents from your phone as well
  • Interactive Chat Interface - Built-in chat for talking to the agents
  • Integrated Shell Terminal - Direct access to each agent's CLI through the built-in shell
  • File Explorer - Interactive file tree with syntax highlighting and live editing
  • Git Explorer - View, stage, and commit changes, and switch branches
  • Session Management - Resume conversations, manage multiple sessions, and track history
  • Plugin System - Extend CloudCLI with custom plugins — add new tabs, backend services, and integrations. Build your own →
  • TaskMaster AI Integration (Optional) - Advanced project management with AI-powered task planning, PRD parsing, and workflow automation
  • Model Compatibility - Works with Claude, GPT, and Gemini model families (the full list of supported models is available at runtime via GET /api/providers/:provider/models)
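
As a hedged sketch of calling the model-list endpoint mentioned above: the base URL `http://localhost:3001` is the default from the self-hosted setup, the provider id (for example `claude`) is a guess, and the helper names are hypothetical. The response shape is not documented here, so it is returned as `unknown`, and any auth the server may require is left out.

```typescript
// Build the models URL for a provider; the provider id is URL-encoded.
function providerModelsUrl(baseUrl: string, provider: string): string {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/api/providers/${encodeURIComponent(provider)}/models`;
}

// Fetch the model list. The payload type is unknown on purpose.
async function fetchProviderModels(baseUrl: string, provider: string): Promise<unknown> {
  const response = await fetch(providerModelsUrl(baseUrl, provider));
  if (!response.ok) {
    throw new Error(`Model list request failed with status ${response.status}`);
  }
  return response.json();
}
```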

Quick Start

The fastest way to get started — no local setup required. Get a fully managed, containerized development environment accessible from the web, mobile app, API, or your favorite IDE.

Get started with CloudCLI Cloud

Self-Hosted (Open source)

npm

Try CloudCLI UI instantly with npx (requires Node.js v22+):

npx @cloudcli-ai/cloudcli

Or install globally for regular use:

npm install -g @cloudcli-ai/cloudcli
cloudcli

Open http://localhost:3001 — all your existing sessions are discovered automatically.

Visit the documentation → for full configuration options, PM2, remote server setup and more.

Docker Sandboxes (Experimental)

Run agents in isolated sandboxes with hypervisor-level isolation. Starts Claude Code by default. Requires the sbx CLI.

npx @cloudcli-ai/cloudcli@latest sandbox ~/my-project

Supports Claude Code, Codex, and Gemini CLI. See the sandbox docs for setup and advanced options.


Which option is right for you?

CloudCLI UI is the open source UI layer that powers CloudCLI Cloud. You can self-host it on your own machine, run it in a Docker sandbox for isolation, or use CloudCLI Cloud for a fully managed environment.

| | Self-Hosted (npm) | Self-Hosted (Docker Sandbox) (Experimental) | CloudCLI Cloud |
|---|---|---|---|
| Best for | Local agent sessions on your own machine | Isolated agents with web/mobile IDE | Teams who want agents in the cloud |
| How you access it | Browser via [yourip]:port | Browser via localhost:port | Browser, any IDE, REST API, n8n |
| Setup | npx @cloudcli-ai/cloudcli | npx @cloudcli-ai/cloudcli@latest sandbox ~/project | No setup required |
| Isolation | Runs on your host | Hypervisor-level sandbox (microVM) | Full cloud isolation |
| Machine needs to stay on | Yes | Yes | No |
| Mobile access | Any browser on your network | Any browser on your network | Any device, native app coming |
| Agents supported | Claude Code, Cursor CLI, Codex, Gemini CLI | Claude Code, Codex, Gemini CLI | Claude Code, Cursor CLI, Codex, Gemini CLI |
| File explorer and Git | Yes | Yes | Yes |
| MCP configuration | Synced with ~/.claude | Managed via UI | Managed via UI |
| REST API | Yes | Yes | Yes |
| Team sharing | No | No | Yes |
| Platform cost | Free, open source | Free, open source | Starts at $7/month |

All options use your own AI subscriptions (Claude, Cursor, etc.) — CloudCLI provides the environment, not the AI.


Security & Tools Configuration

🔒 Important Notice: All Claude Code tools are disabled by default. This prevents potentially harmful operations from running automatically.

Enabling Tools

To use Claude Code's full functionality, you'll need to manually enable tools:

  1. Open Tools Settings - Click the gear icon in the sidebar
  2. Enable Selectively - Turn on only the tools you need
  3. Apply Settings - Your preferences are saved locally

Tools Settings interface - enable only what you need

Recommended approach: Start with basic tools enabled and add more as needed. You can always adjust these settings later.


Plugins

CloudCLI has a plugin system that lets you add custom tabs with their own frontend UI and optional Node.js backend. Install plugins from git repos directly in Settings > Plugins, or build your own.

Available Plugins

| Plugin | Description |
|---|---|
| Project Stats | Shows file counts, lines of code, file-type breakdown, largest files, and recently modified files for your current project |
| Web Terminal | Full xterm.js terminal with multi-tab support |
| Claude Watch | Watches long-running Claude Code sessions for hangs and exposes process controls |
| CloudCLI Scheduler | Create workspace-scoped scheduled prompts and execute them through a local CLI such as Codex, Claude Code, or Gemini CLI |
| PRISM CloudCLI | Session intelligence for Claude Code inside CloudCLI, including token burn visibility |
| Sessions | View, manage, and kill active Claude Code sessions |
| Token Cost Calculator | Calculate API costs from model prices and token usage, with preset model pricing support |
| Task Queue | Task queue dashboard to view, filter, and launch agent tasks |
| GitHub Issues Board | Kanban board for GitHub Issues with bidirectional TaskMaster sync and /github-task CLI skill auto-install |

Build Your Own

Plugin Starter Template → — fork this repo to create your own plugin. It includes a working example with frontend rendering, live context updates, and RPC communication to a backend server.

Plugin Documentation → — full guide to the plugin API, manifest format, security model, and more.


FAQ

How is this different from Claude Code Remote Control?

Claude Code Remote Control lets you send messages to a session already running in your local terminal. Your machine has to stay on, your terminal has to stay open, and sessions time out after roughly 10 minutes without a network connection.

CloudCLI UI and CloudCLI Cloud extend Claude Code rather than sit alongside it — your MCP servers, permissions, settings, and sessions are the exact same ones Claude Code uses natively. Nothing is duplicated or managed separately.

Here's what that means in practice:

  • All your sessions, not just one - CloudCLI UI auto-discovers every session from your ~/.claude folder. Remote Control only exposes the single active session to make it available in the Claude mobile app.
  • Your settings are your settings - MCP servers, tool permissions, and project config you change in CloudCLI UI are written directly to your Claude Code config and take effect immediately, and vice versa.
  • Works with more agents - Claude Code, Cursor CLI, Codex, and Gemini CLI, not just Claude Code.
  • Full UI, not just a chat window - file explorer, Git integration, MCP management, and a shell terminal are all built in.
  • CloudCLI Cloud runs in the cloud - close your laptop and the agent keeps running. No terminal to babysit, no machine to keep awake.

Do I need to pay for an AI subscription separately?

Yes. CloudCLI provides the environment, not the AI. You bring your own Claude, Cursor, Codex, or Gemini subscription. CloudCLI Cloud starts at $7/month for the hosted environment on top of that.

Can I use CloudCLI UI on my phone?

Yes. For self-hosted, run the server on your machine and open [yourip]:port in any browser on your network. For CloudCLI Cloud, open it from any device — no VPN, no port forwarding, no setup. A native app is also in the works.

Will changes I make in the UI affect my local Claude Code setup?

Yes, for self-hosted. CloudCLI UI reads from and writes to the same ~/.claude config that Claude Code uses natively. MCP servers you add via the UI show up in Claude Code immediately and vice versa.


Community & Support

License

GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later) — see LICENSE for the full text, including additional terms under Section 7.

This project is open source and free to use, modify, and distribute under the AGPL-3.0-or-later license. If you modify this software and run it as a network service, you must make your modified source code available to users of that service.

CloudCLI UI (https://cloudcli.ai)

Acknowledgments

Built With

Sponsors


Made with care for the Claude Code, Cursor and Codex community.
Readme AGPL-3.0 41 MiB
Languages
TypeScript 80.4%
JavaScript 17.4%
HTML 1.2%
CSS 0.9%