* feat(voice): add optional speech-to-text input and read-aloud TTS
Adds a push-to-talk mic button in the composer and a read-aloud button on
assistant messages. Both are opt-in and hidden unless a voice backend is
configured via VOICE_SIDECAR_URL.
The auth-gated /api/voice proxy forwards to a configurable backend exposing
/transcribe and /tts (provider-agnostic); the frontend probes /api/voice/health
and hides the controls when disabled. Adds i18n keys and docs/voice.md.
Includes a local, no-API-key reference backend in voice-sidecar/ (faster-whisper
for STT, Kokoro-82M for TTS, both CPU-capable).
* refactor(voice): provider-agnostic backend and in-app config
Switches the voice proxy to the OpenAI audio API (/v1/audio/transcriptions and
/v1/audio/speech) so it works with OpenAI, Groq, or a local server. Adds a
Settings -> Voice tab (base URL, API key, models, voice) plus a Quick Settings
toggle, and removes the bundled Python sidecar.
Review fixes: stop mic tracks on unmount, clear the global TTS stop handler and
revoke leaked blob URLs, add fetch timeouts in the proxy, surface mic errors in
the button, trim before appending transcripts, and drop the repo-wide wav ignore.
* fix(voice): relax backend timeout and surface timeout errors
Bumps the proxy timeout to 5 minutes (VOICE_TIMEOUT_MS) since local TTS can
synthesize long messages at roughly real-time, and returns a clear timed-out
message (504) instead of failing silently. The read-aloud button now shows
backend errors.
* fix(voice): play read-aloud through an app-level player to stop cutoffs
Read-aloud now runs in a single module-level player outside the React tree instead
of per-message component state. Switching chats or re-rendering a message no longer
revokes the blob URL mid-play (the 'Invalid URI' cutoff). Adds content-keyed caching so
re-listening doesn't regenerate, and reuses one audio element (also unlocks iOS once).
* fix(voice): address review (SSRF guard, auth mapping, client timeout)
Validates the user-supplied backend URL (http/https only, blocks the link-local
metadata range) to prevent SSRF; remaps upstream 401/403 so a bad voice API key
isn't read as the app's own auth failing; adds a client-side AbortController timeout
on the read-aloud request so the button can't sit in loading if a request stalls.
* docs(voice): provider-agnostic wording and jsdoc on proxy functions
drop leftover sidecar/faster-whisper references now that the backend is any
openai-compatible voice api, and add jsdoc to the voice-proxy functions so the
docstring coverage check passes.
* fix(voice): harden timeout parsing, tts input check, and player abort
- fall back to the default when VOICE_TIMEOUT_MS is non-numeric or <= 0, so a
bad override can't make the abort fire immediately
- type-check the tts `text` before calling .trim() so a non-string body returns
400 instead of throwing
- abort the in-flight TTS fetch on stop() and on a superseding play, so tapping
read-aloud repeatedly doesn't leave orphaned requests generating audio
* feat(voice): send transcript with the main send button while recording
while dictating, the main send button stops recording, transcribes, and sends
in one tap, matching the codex-style flow. the mic button still stops and drops
the transcript into the input box to edit before sending. voice recording state
is lifted into the composer so both buttons share it, and the send button is
enabled (not grayed) while recording. also fix a pre-existing type error: the
quick-settings preferences map was missing voiceEnabled.
* fix(voice): make stop() idempotent so a double tap can't throw
guard on the recorder's own state instead of react state, so a double tap or
the mic and send buttons both firing won't call stop() on an already-inactive
MediaRecorder.
* fix(voice): expose TTS format in user settings
* fix(voice): harden recording and backend behavior
Redirects could bypass the backend URL guard, and TTS playback waited for full buffering.
Recording could overlap or finish after teardown. Controls also ignored backend readiness.
Explicit formats and config-aware cache keys prevent stale audio after settings change.
* fix(voice): validate config and request boundaries
Malformed stored settings could break voice requests instead of using safe defaults.
Health results could outlive auth changes. URL checks also did not guard the fetch sink.
Remove constant recorder branches so lifecycle cancellation stays clear.
* fix(voice): separate client and server backends
User-selected backend URLs must remain usable without letting clients control server requests.
Call custom providers from the browser while keeping the server proxy bound to its configured host.
This restores voice controls for frontend settings without reopening the SSRF path.
* fix: hide voice options until enabled
---------
Co-authored-by: newsbubbles <nathaniel.gibson@gmail.com>
Co-authored-by: Simos Mikelatos <simosmik@gmail.com>
* feat(skills): add provider skill management
Users need one settings surface to discover and install skills without manually navigating provider-specific directories.
Add provider-backed global skill installation for Claude, Codex, Gemini, and Cursor, while keeping OpenCode read-only because it reuses other providers' skill locations.
Add a responsive Skills settings tab with scoped discovery, search, refresh controls, markdown and folder uploads, upload feedback, and overflow-safe layouts.
Validate bundled skill files and paths before writing them, preserve scripts and assets, and cover provider discovery and installation behavior with tests.
* fix(skills): preserve uploaded skill folders
Folder drops discarded supporting scripts and assets.
Keep relative paths and upload every file from the selected skill folder.
Use the selected folder name for installation and cover it in provider tests.
* fix(skills): restrict standalone skill uploads
Only show Markdown files when selecting standalone skills.
Normalize browser file paths so SKILL.md is not mistaken for a folder named dot.
* fix(skills): validate installs before writing
Preserve bundled files and normalize fallback names across skill installation paths.
Validate complete batches before writing and reject existing targets to avoid partial installs.
Keep project metadata and make folder selection tolerant of casing and cancelled dialogs.
* fix(skills): overwrite existing installations
Replace an existing skill directory instead of rejecting a duplicate installation.
Remove stale supporting files so the installed directory exactly matches the new upload.
Users can miss chat completions while the app is in the background.
They can also miss completions when their attention is elsewhere.
Add opt-out sound notifications and a temporary title marker.
This makes completion noticeable without external audio assets or persistent browser notifications.
* feat: add opencode support
* fix: stabilize opencode session startup
* fix: /models
* fix: improveUI for commands
* fix: format commands.js
* feat: load models through provider adapters
Provider model selection had outgrown a single hardcoded service.
The old service mixed shared caching with provider catalogs and CLI lookup details.
That made stale model lists more likely as providers changed on separate schedules.
Move model discovery behind each provider so lookup lives next to the integration.
The shared service now focuses on provider resolution, caching, persistence, and dedupe.
Return cache metadata and add bypassCache because model availability changes outside the app.
The UI and /models command can show freshness and let users force a provider refresh.
Surface model descriptions while keeping fallback catalogs for unavailable CLIs or SDKs.
* feat(models): resolve active session models through provider adapters
The model inventory command was showing a mix of catalog defaults and
composer-local state instead of the model that is actually active for a
real provider session. That made /models, /cost, and /status
misleading once a session had already started, especially for providers
whose effective runtime model can differ from the optimistic model value
held in the UI.
Introduce an explicit getCurrentActiveModel() contract on
IProviderModels so model resolution lives next to each provider's
catalog logic and uses the provider-native source of truth:
- Claude reads the init event from a resumed stream-json run
- Codex reads model from ~/.codex/config.toml
- Cursor reads lastUsedModel from the chat store.db
- OpenCode reads the persisted session model from opencode.db
- Gemini intentionally returns its default because the CLI does not
provide a reliable active-session lookup
Keep the returned shape intentionally minimal ({ model }). The goal is
to expose only what downstream command consumers need and avoid leaking
provider-specific metadata into a shared transport shape that would
create extra UI coupling and future cleanup cost.
Also make command behavior session-aware: when there is no concrete
session id, do not spawn provider processes or inspect provider session
storage just to answer /models, /cost, or /status. In a new-session
view the correct answer is simply the provider default, and doing more
work there adds latency and unnecessary side effects for no user value.
As part of this, centralize two supporting concerns:
- add a shared helper for building the default current-model result from
a provider catalog so fallbacks stay aligned with DEFAULT
- move leaf-directory validation into shared utils so Cursor session
readers and model lookup code enforce the same path-safety rule
Tests were expanded to cover both the new service delegation path and
the sessionless command behavior, while keeping cache-sensitive tests
isolated from persisted host cache state.
Why this change:
- command output should reflect the model actually driving a session
- new-session views should stay fast and side-effect free
- provider-specific active-model lookup should not be scattered across
routes or UI code
- fallback behavior should be explicit, consistent, and limited to the
provider default when no true active model can be resolved
* feat: support session-scoped model overrides
Model selection was acting like a provider-level preference.
That made resumed sessions drift back to a default or request-time model.
Users expect /models changes made inside a conversation to affect that session.
Store explicit session choices in app-owned ~/.cloudcli state.
This avoids editing provider transcripts or native provider config.
Resolve the effective model before launching each provider runtime.
Claude, Cursor, Codex, Gemini, and OpenCode now honor stored resume choices.
Expose a backend active-model change endpoint for existing sessions.
The models modal can now distinguish default changes from session overrides.
It also shows when a selected model will apply on the next response.
For Claude, stop probing active model state by resuming with a dummy prompt.
Read the indexed JSONL transcript from the end instead.
This preserves provider history while honoring /model stdout or model fields.
Add service tests for adapter delegation and resume-model precedence.
The tests keep cache state, override state, and requested fallback separate.
* feat: make command modal more compact
* fix: preserve opencode session creation events
OpenCode emits the real session id asynchronously on its first JSON output. The runner
registered that id from a helper that could not see the spawned process because
the process reference was scoped inside the model-resolution callback. That
ReferenceError was swallowed by the generic JSON parse fallback, so the client
never received session_created. Without that event, a new OpenCode chat stayed
on / and the assistant stream was not attached to the new session view.
Keep the process reference in the outer spawn scope so registration can update
the active-process map and websocket writer as soon as OpenCode announces the
session id. Split JSON parsing from event processing so malformed non-JSON
output can still stream as raw text, while registration or adapter failures are
surfaced as real errors instead of being hidden as assistant content.
Add a fake opencode executable regression test to lock in the expected lifecycle
ordering: session_created must be sent before live assistant messages, and the
same session id must carry through stream_end and complete.
* fix: clarify model refresh and onboarding providers
OpenCode is now a supported chat provider, but first-run onboarding still only offered
Claude, Cursor, Codex, and Gemini. That made OpenCode harder to discover and
forced users to finish setup before finding the provider in settings or chat.
Adding it to onboarding keeps first-run setup aligned with the providers the
application already supports elsewhere.
The model refresh control was also doing too much visual work. In the new chat
model picker, the previous Hard Refresh label looked like the dialog heading,
which made the primary task unclear. Users open that dialog to choose a model;
refreshing catalogs is only a secondary maintenance action for stale cached
provider model lists.
Rename and reposition the refresh affordance so the model picker reads as a
model picker first. The copy now explains why catalogs are cached, when a refresh
is useful, and that the refresh checks every provider. The /models modal gets the
same clarification so both model-selection surfaces describe the cache behavior
consistently.
* fix: format opencode model catalog labels
OpenCode returns provider-prefixed ids directly from the CLI. Passing those ids through as
labels made the model picker hard to scan: users saw values like
anthropic/claude-3-5-sonnet-20241022 or lowercased, hyphen-split text instead
of readable model names.
Keep the exact OpenCode id as the option value because that is what the CLI
expects, but derive a presentation label for the frontend. The formatter is
intentionally generic rather than a catalog of known providers. It handles common
identifier structure such as provider/model, hyphen-delimited words, v-prefixed
versions, adjacent numeric version tokens, and 8-digit date suffixes.
This keeps OpenCode usable as its model list expands across many upstream
providers without requiring code changes for every new provider or model family.
The description keeps the raw provider-prefixed id visible so users can still
confirm the precise model being selected.
* feat: add more fallback models for cursor
* docs: move model catalog out of shared
The model catalog is no longer a frontend/backend runtime contract.
Keeping it under shared made ownership misleading. It implied the catalog was
application code shared by runtime consumers, even though it now only supports
README links and public API documentation.
Move the catalog into public so it lives beside the docs surfaces that need it.
This gives the API docs a stable, served module and gives README readers a
linkable source without suggesting frontend or backend runtime dependency.
Render the API docs model list from the exported provider registry instead of a
hardcoded Claude/Cursor/Codex subset. That keeps Gemini and OpenCode visible and
makes future provider documentation changes flow through one docs-specific file.
Update README links, provider maintenance notes, and package files so published
artifacts include the standalone docs page and model catalog without relying on
the old shared path.
* fix: simplify empty-state model selector
Keep the provider empty state focused on the setup action users need there:
choosing a model.
The refresh control, cache timestamp, and refresh explanation made the dialog feel
like a cache-management surface.
That extra action is out of place in the empty state, where the goal is to start
a chat with the selected provider and model.
Remove the refresh-specific UI from ProviderSelectionEmptyState and drop the
now-unused refresh/cache props from the ChatMessagesPane pass-through.
Refresh behavior remains available in the dedicated command result flow.
* feat: introduce notification system and claude notifications
* fix(sw): prevent caching of API requests and WebSocket upgrades
* default to false for webpush notifications and translations for the button
* fix: notifications orchestrator and add a notification when first enabled
* fix: remove unused state update and dependency in settings controller hook
* fix: show notifications settings tab
* fix: add notifications for response completion for all providers
* feat: show session name in notification and don't reload tab on clicking
--- the notification
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Haileyesus <something@gmail.com>
* feat: new plugin system
* Potential fix for code scanning alert no. 312: Uncontrolled data used in path expression
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Update manifest.json
* feat(plugins): add SVG icon support with authenticated inline rendering
* fix: coderabbit changes and new plugin name & repo
* fix: design changes to plugins settings tab
* fix(plugins): prevent git arg injection, add repo URL detection
* fix: lint errors and deleting plugin error on windows
* fix: coderabbit nitpick comments
* fix(plugins): harden path traversal and respect enabled state
Use realpathSync to canonicalize paths before the plugin asset
boundary check, preventing symlink-based traversal bypasses that
could escape the plugin directory.
PluginTabContent now guards on plugin.enabled before mounting the
plugin module, and re-mounts when the enabled state changes so
toggling a plugin takes effect without a page reload.
PluginIcon safely handles a missing iconFile prop and skips
processing non-OK fetch responses instead of attempting to parse
error bodies as SVG.
Register 'plugins' as a known main tab so the settings router
preserves the tab on navigation.
* fix(plugins): support concurrent plugin updates
Replace single updatingPlugin string state with a Set to allow
multiple plugins to update simultaneously. Also disable the update
button and show a descriptive tooltip when a plugin has no git
remote configured.
* fix(plugins): async shutdown and asset/RPC fixes
Await stopPluginServer/stopAllPlugins in signal handlers and route
handlers so process exit and state transitions wait for clean plugin
shutdown instead of racing ahead.
Validate asset paths are regular files before streaming to prevent
directory traversal returning unexpected content; add a stream error
handler to avoid unhandled crashes on read failures.
Fix RPC proxy body detection to use the content-length header instead
of Object.keys, so falsy but valid JSON payloads (null, false, 0, {})
are forwarded correctly to plugin servers.
Track in-flight start operations via a startingPlugins map to prevent
duplicate concurrent plugin starts.
* refactor(git-panel): simplify setCommitMessage with plain function
* fix(plugins): harden input validation and scan reliability
- Validate plugin names against [a-zA-Z0-9_-] allowlist in
manifest and asset routes to prevent path traversal via URL
- Strip embedded credentials (user:pass@) from git remote URLs
before exposing them to the client
- Skip .tmp-* directories during scan to avoid partial installs
from in-progress updates appearing as broken plugins
- Deduplicate plugins sharing the same manifest name to prevent
ambiguous state
- Guard RPC proxy error handler against writing to an already-sent
response, preventing uncaught exceptions on aborted requests
* fix(git-panel): reset changes view on project switch
* refactor: move plugin content to /view folder
* fix: resolve type error in MobileNav and PluginTabContent components
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Haileyesus <118998054+blackmammoth@users.noreply.github.com>
Co-authored-by: Haileyesus <something@gmail.com>
* feat: add terminal shortcuts panel for mobile users
Slide-out panel providing touch-friendly shortcut buttons (Esc, Tab,
Shift+Tab, Arrow Up/Down) and scroll-to-bottom for the terminal.
Integrates into the new modular shell architecture by exposing
terminalRef and wsRef from useShellRuntime hook and reusing the
existing sendSocketMessage utility.
Includes localization keys for en, ja, ko, and zh-CN.
* fix: replace dual touch/click handlers with unified pointer events
Prevents double-fire on touch devices by removing onTouchEnd handlers
and using a single onClick for all interactions (mouse, touch, keyboard).
onPointerDown with preventDefault handles focus steal prevention.
Also clears pending close timer before scheduling a new one to avoid
stale timeout overlap.
Addresses CodeRabbit review feedback on PR #411.
---------
Co-authored-by: Haileyesus <118998054+blackmammoth@users.noreply.github.com>