feat(chat): unify session gateway with stable IDs and a single WS protocol

The frontend previously juggled placeholder IDs, provider-native IDs, and session_created handoffs, which caused race conditions and provider-specific branching. This introduces app-allocated session IDs, a chat run registry with event replay, delta sidebar updates, and one kind-based websocket contract so the UI can treat every provider the same while JSONL remains the source of truth.
This commit is contained in:
Haileyesus
2026-06-11 18:47:19 +03:00
parent 3d948217ef
commit f5eac2ec12
40 changed files with 2451 additions and 1226 deletions

View File

@@ -33,10 +33,12 @@ Benefits:
|---|---|
| `services/websocket-server.service.ts` | Creates `WebSocketServer`, binds `verifyClient`, routes connection by pathname |
| `services/websocket-auth.service.ts` | Authenticates upgrade requests and attaches `request.user` |
| `services/chat-websocket.service.ts` | Handles `/ws` chat protocol and provider command/session control messages |
| `services/chat-websocket.service.ts` | Handles the `/ws` chat protocol (`chat.send` / `chat.abort` / `chat.subscribe` / `chat.permission-response`) |
| `services/chat-run-registry.service.ts` | Tracks live provider runs per app session id: seq numbering, event replay buffer, provider-id mapping, completion state |
| `services/chat-session-writer.service.ts` | Gateway writer handed to provider runtimes: remaps provider session ids to app ids, swallows `session_created`, assigns `seq` |
| `services/shell-websocket.service.ts` | Handles `/shell` PTY lifecycle, reconnect buffering, auth URL detection |
| `services/plugin-websocket-proxy.service.ts` | Bridges client socket to plugin socket |
| `services/websocket-writer.service.ts` | Adapts raw WebSocket to writer interface (`send`, `setSessionId`, `getSessionId`) |
| `services/websocket-writer.service.ts` | Adapts raw WebSocket to writer interface (`send`, `setSessionId`, `getSessionId`) for non-chat writer consumers |
| `services/websocket-state.service.ts` | Holds shared chat client set and open-state constant |
## High-Level Architecture
@@ -52,12 +54,12 @@ flowchart LR
D -->|other| H[close()]
E --> I[connectedClients Set]
E --> J[WebSocketWriter]
E --> J[chatRunRegistry + ChatSessionWriter]
F --> K[ptySessionsMap]
G --> L[Upstream Plugin ws://127.0.0.1:port/ws]
I --> M[projects.service broadcastProgress]
I --> N[sessions-watcher.service projects_updated]
I --> M[projects.service loading_progress]
I --> N[sessions-watcher.service session_upserted]
```
## Connection Handshake + Routing
@@ -105,38 +107,41 @@ sequenceDiagram
When a chat socket connects:
1. Add socket to `connectedClients`.
2. Build `WebSocketWriter` (captures `userId` from authenticated request).
3. Parse each incoming message with `parseIncomingJsonObject`.
4. Dispatch by `data.type`.
5. On close, remove socket from `connectedClients`.
2. Parse each incoming message with `parseIncomingJsonObject`.
3. Dispatch by `data.type` (four message types, none provider-specific).
4. On close, remove socket from `connectedClients`.
### Session identity model
The frontend only ever knows the **app session id** (allocated by
`POST /api/providers/sessions` or discovered via the session index). The
provider-native id (JSONL file name, CLI resume id) stays inside the backend:
1. `chat.send` resolves the app id to `{ provider, provider_session_id, project_path }` from the sessions DB.
2. The provider runtime receives the provider-native id for resume.
3. The `ChatSessionWriter` remaps every outbound event back to the app id, and turns `session_created` announcements into a DB mapping update instead of forwarding them.
### Chat Message Dispatch
```mermaid
flowchart TD
A[Incoming WS message] --> B[parseIncomingJsonObject]
B -->|invalid| C[send {type:error}]
B -->|invalid| C[send kind:protocol_error]
B -->|ok| D{data.type}
D -->|claude-command| E[queryClaudeSDK]
D -->|cursor-command| F[spawnCursor]
D -->|codex-command| G[queryCodex]
D -->|gemini-command| H[spawnGemini]
D -->|cursor-resume| I[spawnCursor resume]
D -->|abort-session| J[abort by provider]
D -->|claude-permission-response| K[resolveToolApproval]
D -->|cursor-abort| L[abortCursorSession]
D -->|check-session-status| M[is*SessionActive + optional reconnectSessionWriter]
D -->|get-pending-permissions| N[getPendingApprovalsForSession]
D -->|get-active-sessions| O[getActive*Sessions]
D -->|chat.send| E[resolve session row -> startRun -> spawnFns provider]
D -->|chat.abort| F[abortFns provider + synthetic complete]
D -->|chat.subscribe| G[chat_subscribed ack + attach socket + replay events seq > lastSeq]
D -->|chat.permission-response| H[resolveToolApproval]
D -->|other| I[send kind:protocol_error]
```
### Chat Notes
1. **Unified terminal lifecycle**: every provider run ends with exactly one `complete` message built by `createCompleteMessage()` (`server/shared/utils.ts`), regardless of provider: `{ kind: "complete", sessionId, actualSessionId, exitCode, success, aborted }`. Failed runs emit an informational `error` message first, then the terminal `complete` with `success: false`. Mid-run `error` messages (e.g. stderr output) are non-terminal; the frontend only treats `complete` as end-of-run.
2. `abort-session` sends the terminal `complete` (`aborted: true`) on behalf of the cancelled run; providers detect the abort and skip their own `complete` so the client sees exactly one.
3. `check-session-status` returns `{ type: "session-status", isProcessing }`.
4. Claude status checks can reconnect output stream to the new socket via `reconnectSessionWriter`.
1. **Unified envelope**: every server-to-client frame carries a `kind` — either a provider `NormalizedMessage` kind or a gateway kind (`chat_subscribed`, `session_upserted`, `loading_progress`, `protocol_error`). There is no second `type`-based protocol.
2. **Unified terminal lifecycle**: every provider run ends with exactly one `complete` message built by `createCompleteMessage()` (`server/shared/utils.ts`): `{ kind: "complete", sessionId, actualSessionId, exitCode, success, aborted }`. The chat handler emits a synthetic `complete` for runs that crash or get aborted, and the run registry drops duplicate completes.
3. **Per-run event log**: every live event gets a monotonically increasing `seq`. `chat.subscribe { sessions: [{ sessionId, lastSeq }] }` re-attaches the live stream to the requesting socket (any provider, not just Claude) and replays events with `seq > lastSeq`. If the buffer no longer covers `lastSeq`, the client refreshes over REST.
4. `chat_subscribed` includes `isProcessing` (replaces `check-session-status`) and `pendingPermissions` (replaces `get-pending-permissions`).
## `/shell` Terminal Flow
@@ -224,9 +229,9 @@ Only chat sockets (`/ws`) are tracked in `connectedClients`.
That shared set is consumed by:
1. `modules/projects/services/projects-with-sessions-fetch.service.ts`
Broadcasts `loading_progress` while project snapshots are being built.
Broadcasts `kind: loading_progress` while project snapshots are being built.
2. `modules/providers/services/sessions-watcher.service.ts`
Broadcasts `projects_updated` when provider session artifacts change.
Broadcasts per-session `kind: session_upserted` deltas when provider session artifacts change (no full project snapshots).
This design centralizes cross-module realtime fanout without requiring route-local references to WebSocket internals.
@@ -253,7 +258,7 @@ Current explicit close codes in this module:
Other errors:
1. Chat handler catches and emits `{ type: "error", error }`.
1. Chat handler catches and emits `{ kind: "protocol_error", code, error }`.
2. Shell handler catches and writes terminal-visible error output.
3. Unknown websocket paths are closed immediately.

View File

@@ -0,0 +1,257 @@
import { sessionsDb } from '@/modules/database/index.js';
import { ChatSessionWriter } from '@/modules/websocket/services/chat-session-writer.service.js';
import type {
LLMProvider,
NormalizedMessage,
RealtimeClientConnection,
} from '@/shared/types.js';
type ChatRunStatus = 'running' | 'completed';
/**
* One live (or recently finished) provider run for a single app session.
*
* State notes — why each mutable field is essential:
* - `providerSessionId`: the provider-native id captured mid-run. The abort
* handler needs it to address the provider runtime, and the DB mapping is
* written from it so history/resume work after the run.
* - `status`: drives `chat_subscribed.isProcessing`, prevents double sends
* into the same session, and guards the synthetic-complete fallback in the
* chat handler (only emitted when a runtime died without completing).
* - `lastSeq` / `events`: the per-run event log. Every live event gets a
* monotonically increasing `seq` and is buffered so a reconnecting client
* can replay exactly the events it missed via `chat.subscribe`.
*/
type ChatRun = {
appSessionId: string;
provider: LLMProvider;
providerSessionId: string | null;
status: ChatRunStatus;
lastSeq: number;
events: NormalizedMessage[];
writer: ChatSessionWriter;
startedAt: number;
completedAt: number | null;
};
/**
* How long a completed run stays available for replay. Covers the window
* between a run finishing and the client refreshing history over REST (for
* example when the browser tab was asleep while the run completed).
*/
const COMPLETED_RUN_RETENTION_MS = 5 * 60 * 1000;
/**
* Upper bound on buffered events per run so a very long tool-heavy run cannot
* grow memory unbounded. When exceeded, the oldest events are dropped —
* a reconnecting client whose `lastSeq` predates the buffer falls back to a
* REST history refresh, which is always the authoritative source.
*/
const MAX_BUFFERED_EVENTS_PER_RUN = 5000;
/**
* Active and recently-completed runs keyed by app session id.
*
* This map is the single in-memory source of truth for "is something running
* for this session" — the chat websocket handler, abort path, and subscribe
* path all consult it instead of asking each provider runtime individually.
*/
const runs = new Map<string, ChatRun>();
function evictRunLater(appSessionId: string): void {
const timer = setTimeout(() => {
const run = runs.get(appSessionId);
if (run && run.status === 'completed') {
runs.delete(appSessionId);
}
}, COMPLETED_RUN_RETENTION_MS);
// Never keep the process alive just to evict a buffered run.
timer.unref?.();
}
/**
* Decorates one outbound live event for a run and records it in the event log.
*
* Responsibilities:
* 1. Remap `sessionId` (and `actualSessionId` on `complete`) to the stable
* app session id — provider-native ids never leave the backend.
* 2. Assign the next `seq` so clients can detect/replay gaps.
* 3. Buffer the event for `chat.subscribe` replay.
* 4. Flip the run to `completed` when the terminal `complete` event passes by.
*/
function decorateAndRecordEvent(run: ChatRun, message: NormalizedMessage): NormalizedMessage | null {
// Exactly-one-complete contract: when a run is aborted the chat handler
// emits the terminal `complete` immediately, but the killed runtime may
// still emit its own `complete` from its exit handler moments later.
// Whichever arrives first wins; the duplicate is dropped here.
if (message.kind === 'complete' && run.status === 'completed') {
return null;
}
run.lastSeq += 1;
const outbound: NormalizedMessage = {
...message,
sessionId: run.appSessionId,
seq: run.lastSeq,
};
if (message.kind === 'complete') {
// The provider may report its own id here; the frontend only ever knows
// the app id, so the "actual" id is by definition the app id as well.
outbound.actualSessionId = run.appSessionId;
run.status = 'completed';
run.completedAt = Date.now();
evictRunLater(run.appSessionId);
}
run.events.push(outbound);
if (run.events.length > MAX_BUFFERED_EVENTS_PER_RUN) {
run.events.splice(0, run.events.length - MAX_BUFFERED_EVENTS_PER_RUN);
}
return outbound;
}
/**
* Records the provider-native session id for a run and persists the
* app-id-to-provider-id mapping so history fetches and future resumes can
* address the provider transcript.
*
* Called from the gateway writer when the runtime either calls
* `setSessionId(...)` or emits its `session_created` event — whichever
* happens first wins; later calls with the same id are no-ops.
*/
function recordProviderSessionId(run: ChatRun, providerSessionId: string): void {
if (!providerSessionId || run.providerSessionId === providerSessionId) {
return;
}
run.providerSessionId = providerSessionId;
try {
sessionsDb.assignProviderSessionId(run.appSessionId, providerSessionId);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
console.error('[ChatRunRegistry] Failed to persist provider session id mapping', {
appSessionId: run.appSessionId,
providerSessionId,
error: message,
});
}
}
/**
* Registry of live provider runs keyed by the stable app session id.
*
* The registry is what makes the websocket protocol provider-independent:
* every run gets a `ChatSessionWriter` that remaps provider-native session
* ids to the app id, assigns `seq` numbers, and buffers events for replay —
* regardless of which provider runtime produced them.
*/
export const chatRunRegistry = {
/**
* Starts tracking a run and returns it, or `null` when a run is already in
* progress for the session (callers must reject the duplicate send).
*/
startRun(input: {
appSessionId: string;
provider: LLMProvider;
providerSessionId: string | null;
connection: RealtimeClientConnection;
userId: string | number | null;
}): ChatRun | null {
const existing = runs.get(input.appSessionId);
if (existing && existing.status === 'running') {
return null;
}
const run: ChatRun = {
appSessionId: input.appSessionId,
provider: input.provider,
providerSessionId: input.providerSessionId,
status: 'running',
lastSeq: 0,
events: [],
writer: null as unknown as ChatSessionWriter,
startedAt: Date.now(),
completedAt: null,
};
run.writer = new ChatSessionWriter({
connection: input.connection,
userId: input.userId,
provider: input.provider,
providerSessionId: input.providerSessionId,
onProviderSessionId: (providerSessionId) => {
recordProviderSessionId(run, providerSessionId);
},
decorateOutboundEvent: (message) => decorateAndRecordEvent(run, message),
});
runs.set(input.appSessionId, run);
return run;
},
getRun(appSessionId: string): ChatRun | undefined {
return runs.get(appSessionId);
},
isProcessing(appSessionId: string): boolean {
return runs.get(appSessionId)?.status === 'running';
},
/**
* Re-attaches a run's outbound stream to a (new) websocket connection.
*
* This is the generic replacement for the Claude-only writer reconnect:
* after a page refresh the new socket subscribes and immediately starts
* receiving the still-running stream, for every provider.
*/
attachConnection(appSessionId: string, connection: RealtimeClientConnection): boolean {
const run = runs.get(appSessionId);
if (!run) {
return false;
}
run.writer.updateWebSocket(connection);
return true;
},
/**
* Returns buffered events with `seq` greater than `afterSeq` for replay.
*
* An empty array with `run.lastSeq > afterSeq` not covered by the buffer
* means the buffer was truncated; the client should refresh over REST.
*/
replayEvents(appSessionId: string, afterSeq: number): NormalizedMessage[] {
const run = runs.get(appSessionId);
if (!run) {
return [];
}
return run.events.filter((event) => typeof event.seq === 'number' && event.seq > afterSeq);
},
/**
* Emits a synthetic terminal `complete` if (and only if) the run is still
* marked running. Used when a provider runtime throws or resolves without
* having produced its own terminal event, and by the abort path.
*/
completeRun(appSessionId: string, opts: { exitCode: number; aborted?: boolean }): void {
const run = runs.get(appSessionId);
if (!run || run.status !== 'running') {
return;
}
run.writer.sendComplete(opts);
},
/**
* Test-only escape hatch: clears every tracked run.
*/
clearAll(): void {
runs.clear();
},
};

View File

@@ -0,0 +1,145 @@
import { WS_OPEN_STATE } from '@/modules/websocket/services/websocket-state.service.js';
import type {
LLMProvider,
NormalizedMessage,
RealtimeClientConnection,
} from '@/shared/types.js';
import { createCompleteMessage, readObjectRecord } from '@/shared/utils.js';
type ChatSessionWriterOptions = {
connection: RealtimeClientConnection;
userId: string | number | null;
provider: LLMProvider;
/** Provider-native id when resuming an existing session, otherwise null. */
providerSessionId: string | null;
/**
* Invoked the moment the provider runtime reveals its native session id
* (either via `setSessionId` or a `session_created` event). The registry
* persists the app-id-to-provider-id mapping from this callback.
*/
onProviderSessionId: (providerSessionId: string) => void;
/**
* Remaps/sequences/buffers one outbound live event. Implemented by the chat
* run registry; the writer never forwards a provider event untouched.
* Returns `null` when the event must be dropped (duplicate terminal
* `complete` after an abort already completed the run).
*/
decorateOutboundEvent: (message: NormalizedMessage) => NormalizedMessage | null;
};
/**
* Gateway writer handed to provider runtimes instead of a raw websocket writer.
*
* It exposes the exact same surface as `WebSocketWriter` (`send`,
* `setSessionId`, `getSessionId`, `updateWebSocket`, `userId`,
* `isWebSocketWriter`) so the provider runtimes (`claude-sdk.js`,
* `cursor-cli.js`, ...) need zero changes — but everything that flows through
* it is translated from the provider's world into the app's protocol:
*
* - `session_created` events are swallowed and turned into a provider-id
* mapping; the frontend never learns provider-native ids.
* - every other event gets `sessionId` remapped to the app session id and a
* per-run `seq` assigned before being forwarded.
* - `setSessionId(...)` calls (used by runtimes to label captured ids) are
* intercepted and recorded as the provider-id mapping as well.
*/
export class ChatSessionWriter {
ws: RealtimeClientConnection;
userId: string | number | null;
/**
* Some runtimes feature-detect their writer with this flag; keep it so the
* gateway writer is a drop-in replacement for `WebSocketWriter`.
*/
isWebSocketWriter = true;
private readonly options: ChatSessionWriterOptions;
/**
* The provider-native session id as the runtime knows it. Kept locally
* (besides the registry) because runtimes read it back via `getSessionId()`
* to label their own outgoing events — those labels are remapped on send
* anyway, but the runtime-visible value must stay provider-native.
*/
private providerSessionId: string | null;
constructor(options: ChatSessionWriterOptions) {
this.options = options;
this.ws = options.connection;
this.userId = options.userId;
this.providerSessionId = options.providerSessionId;
}
send(data: unknown): void {
const record = readObjectRecord(data);
if (!record || typeof record.kind !== 'string') {
// Provider runtimes only emit kind-based normalized messages. Anything
// else indicates a programming error; drop it rather than leaking an
// un-remapped payload to the client.
console.error('[ChatSessionWriter] Dropping non-normalized outbound payload', data);
return;
}
const message = record as NormalizedMessage;
if (message.kind === 'session_created') {
const announcedId =
typeof message.newSessionId === 'string' && message.newSessionId
? message.newSessionId
: message.sessionId;
if (announcedId) {
this.captureProviderSessionId(announcedId);
}
// Swallowed on purpose: the frontend already has the stable app session
// id, so there is no client-side handoff to perform anymore.
return;
}
const outbound = this.options.decorateOutboundEvent(message);
if (outbound) {
this.forward(outbound);
}
}
/**
* Emits the synthetic terminal `complete` for runs that ended without one
* (runtime crash before completing, or user abort).
*/
sendComplete(opts: { exitCode: number; aborted?: boolean }): void {
const message = createCompleteMessage({
provider: this.options.provider,
sessionId: this.providerSessionId,
exitCode: opts.exitCode,
aborted: opts.aborted,
});
const outbound = this.options.decorateOutboundEvent(message);
if (outbound) {
this.forward(outbound);
}
}
updateWebSocket(newConnection: RealtimeClientConnection): void {
this.ws = newConnection;
}
setSessionId(sessionId: string): void {
this.captureProviderSessionId(sessionId);
}
getSessionId(): string | null {
return this.providerSessionId;
}
private captureProviderSessionId(providerSessionId: string): void {
if (!providerSessionId || this.providerSessionId === providerSessionId) {
return;
}
this.providerSessionId = providerSessionId;
this.options.onProviderSessionId(providerSessionId);
}
private forward(message: NormalizedMessage): void {
if (this.ws.readyState === WS_OPEN_STATE) {
this.ws.send(JSON.stringify(message));
}
}
}

View File

@@ -1,40 +1,35 @@
import type { WebSocket } from 'ws';
import { connectedClients } from '@/modules/websocket/services/websocket-state.service.js';
import { WebSocketWriter } from '@/modules/websocket/services/websocket-writer.service.js';
import { sessionsDb } from '@/modules/database/index.js';
import { chatRunRegistry } from '@/modules/websocket/services/chat-run-registry.service.js';
import { connectedClients, WS_OPEN_STATE } from '@/modules/websocket/services/websocket-state.service.js';
import type {
AnyRecord,
AuthenticatedWebSocketRequest,
LLMProvider,
} from '@/shared/types.js';
import { createCompleteMessage, parseIncomingJsonObject } from '@/shared/utils.js';
import { parseIncomingJsonObject } from '@/shared/utils.js';
type ChatIncomingMessage = AnyRecord & {
type?: string;
command?: string;
options?: AnyRecord;
provider?: string;
sessionId?: string;
requestId?: string;
allow?: unknown;
updatedInput?: unknown;
message?: unknown;
rememberEntry?: unknown;
};
const DEFAULT_PROVIDER: LLMProvider = 'claude';
/**
* One provider runtime entry point. All five runtimes share this signature,
* which lets the chat handler dispatch through a provider-keyed map instead
* of provider-specific branches.
*/
type ProviderSpawnFn = (
command: string,
options: AnyRecord,
writer: unknown
) => Promise<unknown>;
type ChatWebSocketDependencies = {
queryClaudeSDK: (command: string, options: unknown, writer: WebSocketWriter) => Promise<unknown>;
spawnCursor: (command: string, options: unknown, writer: WebSocketWriter) => Promise<unknown>;
queryCodex: (command: string, options: unknown, writer: WebSocketWriter) => Promise<unknown>;
spawnGemini: (command: string, options: unknown, writer: WebSocketWriter) => Promise<unknown>;
spawnOpenCode: (command: string, options: unknown, writer: WebSocketWriter) => Promise<unknown>;
abortClaudeSDKSession: (sessionId: string) => Promise<boolean>;
abortCursorSession: (sessionId: string) => boolean;
abortCodexSession: (sessionId: string) => boolean;
abortGeminiSession: (sessionId: string) => boolean;
abortOpenCodeSession: (sessionId: string) => boolean;
/** Provider runtimes keyed by provider id. */
spawnFns: Record<LLMProvider, ProviderSpawnFn>;
/**
* Abort functions keyed by provider id. They are addressed with the
* provider-native session id (that is how runtimes key their process maps).
* The Claude abort is async; the rest are sync — both shapes are accepted.
*/
abortFns: Record<LLMProvider, (providerSessionId: string) => boolean | Promise<boolean>>;
resolveToolApproval: (
requestId: string,
payload: {
@@ -44,31 +39,10 @@ type ChatWebSocketDependencies = {
rememberEntry?: unknown;
}
) => void;
isClaudeSDKSessionActive: (sessionId: string) => boolean;
isCursorSessionActive: (sessionId: string) => boolean;
isCodexSessionActive: (sessionId: string) => boolean;
isGeminiSessionActive: (sessionId: string) => boolean;
isOpenCodeSessionActive: (sessionId: string) => boolean;
reconnectSessionWriter: (sessionId: string, ws: WebSocket) => boolean;
getPendingApprovalsForSession: (sessionId: string) => unknown[];
getActiveClaudeSDKSessions: () => unknown;
getActiveCursorSessions: () => unknown;
getActiveCodexSessions: () => unknown;
getActiveGeminiSessions: () => unknown;
getActiveOpenCodeSessions: () => unknown;
/** Claude-only today: pending tool approvals included in `chat_subscribed`. */
getPendingApprovalsForSession: (providerSessionId: string) => unknown[];
};
/**
* Normalizes potentially invalid provider names coming from websocket payloads.
*/
function readProvider(value: unknown): LLMProvider {
if (value === 'claude' || value === 'cursor' || value === 'codex' || value === 'gemini' || value === 'opencode') {
return value;
}
return DEFAULT_PROVIDER;
}
/**
* Extracts the authenticated request user id in the formats currently produced
* by platform and OSS auth code paths.
@@ -92,8 +66,258 @@ function readRequestUserId(
return null;
}
function sendJson(ws: WebSocket, payload: unknown): void {
if (ws.readyState === WS_OPEN_STATE) {
ws.send(JSON.stringify(payload));
}
}
/**
* Reports a protocol-level failure to the requesting client.
*
* Protocol errors deliberately use their own `kind` (instead of the provider
* `error` message kind) so the frontend can distinguish "your request was
* invalid" from "the model run produced an error" without inspecting text.
*/
function sendProtocolError(
ws: WebSocket,
code: string,
error: string,
sessionId?: string
): void {
sendJson(ws, {
kind: 'protocol_error',
code,
error,
sessionId: sessionId ?? null,
timestamp: new Date().toISOString(),
});
}
function readRequiredSessionId(data: AnyRecord): string | null {
const sessionId = typeof data.sessionId === 'string' ? data.sessionId.trim() : '';
return sessionId.length > 0 ? sessionId : null;
}
/**
* Handles `chat.send`: resolves the session row (provider, project path, and
* provider-native id all come from the database — never from the client),
* registers the run, and dispatches to the provider runtime.
*/
async function handleChatSend(
ws: WebSocket,
userId: string | number | null,
data: AnyRecord,
dependencies: ChatWebSocketDependencies
): Promise<void> {
const sessionId = readRequiredSessionId(data);
if (!sessionId) {
sendProtocolError(ws, 'SESSION_ID_REQUIRED', 'chat.send requires a sessionId.');
return;
}
const session = sessionsDb.getSessionById(sessionId);
if (!session) {
sendProtocolError(
ws,
'SESSION_NOT_FOUND',
`Session "${sessionId}" was not found. Create it via POST /api/providers/sessions first.`,
sessionId
);
return;
}
const provider = session.provider as LLMProvider;
const spawnFn = dependencies.spawnFns[provider];
if (!spawnFn) {
sendProtocolError(ws, 'UNSUPPORTED_PROVIDER', `Provider "${provider}" is not available.`, sessionId);
return;
}
const run = chatRunRegistry.startRun({
appSessionId: sessionId,
provider,
providerSessionId: session.provider_session_id,
connection: ws,
userId,
});
if (!run) {
sendProtocolError(
ws,
'RUN_IN_PROGRESS',
`Session "${sessionId}" already has a run in progress.`,
sessionId
);
return;
}
const clientOptions = (data.options ?? {}) as AnyRecord;
const command = typeof data.content === 'string' ? data.content : '';
// The provider runtimes receive the provider-native session id (that is the
// id their CLI/SDK understands for resume). Brand-new sessions have no
// provider id yet, so the runtime starts fresh and announces one, which the
// gateway writer captures and maps back to the app session id.
const runtimeOptions: AnyRecord = {
...clientOptions,
sessionId: session.provider_session_id ?? undefined,
resume: Boolean(session.provider_session_id),
cwd: clientOptions.cwd ?? session.project_path ?? undefined,
projectPath: session.project_path ?? clientOptions.projectPath,
};
try {
await spawnFn(command, runtimeOptions, run.writer);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
console.error(`[Chat] Provider runtime "${provider}" failed`, { sessionId, error: message });
} finally {
// Safety net: a runtime that crashed (or resolved) without emitting its
// terminal `complete` would otherwise leave the session stuck in
// "processing" forever on every connected client.
chatRunRegistry.completeRun(sessionId, { exitCode: 1 });
}
}
/**
* Handles `chat.abort`: cancels the run for one app session and emits the
* terminal `complete` on its behalf (runtimes skip their own complete for
* aborted runs, and the registry drops any duplicate).
*/
async function handleChatAbort(
ws: WebSocket,
data: AnyRecord,
dependencies: ChatWebSocketDependencies
): Promise<void> {
const sessionId = readRequiredSessionId(data);
if (!sessionId) {
sendProtocolError(ws, 'SESSION_ID_REQUIRED', 'chat.abort requires a sessionId.');
return;
}
const run = chatRunRegistry.getRun(sessionId);
if (!run || run.status !== 'running') {
sendProtocolError(ws, 'NO_ACTIVE_RUN', `Session "${sessionId}" has no active run.`, sessionId);
return;
}
const abortFn = dependencies.abortFns[run.provider];
let success = false;
if (abortFn && run.providerSessionId) {
success = Boolean(await abortFn(run.providerSessionId));
}
chatRunRegistry.completeRun(sessionId, {
exitCode: success ? 0 : 1,
aborted: true,
});
}
/**
* Handles `chat.subscribe`: for each requested session, reports whether a run
* is processing, re-attaches the live stream to this socket, replays missed
* events (seq > lastSeq), and includes pending permission requests.
*
* This single message replaces the old `check-session-status`,
* `get-pending-permissions`, and Claude-only writer reconnect flows.
*/
function handleChatSubscribe(
ws: WebSocket,
data: AnyRecord,
dependencies: ChatWebSocketDependencies
): void {
const targets = Array.isArray(data.sessions) ? data.sessions : [];
for (const target of targets) {
if (!target || typeof target !== 'object') {
continue;
}
const sessionId = typeof (target as AnyRecord).sessionId === 'string'
? ((target as AnyRecord).sessionId as string).trim()
: '';
if (!sessionId) {
continue;
}
const lastSeqRaw = (target as AnyRecord).lastSeq;
const lastSeq = typeof lastSeqRaw === 'number' && Number.isFinite(lastSeqRaw)
? Math.max(0, Math.floor(lastSeqRaw))
: 0;
const run = chatRunRegistry.getRun(sessionId);
const isProcessing = chatRunRegistry.isProcessing(sessionId);
// Future live events for this run should land on the socket that asked —
// this is what makes mid-stream page refreshes work for all providers.
if (isProcessing) {
chatRunRegistry.attachConnection(sessionId, ws);
}
// Pending approvals are tracked under the provider-native id inside the
// Claude runtime; remap their sessionId so the client only sees app ids.
const pendingPermissions = (run?.providerSessionId
? dependencies.getPendingApprovalsForSession(run.providerSessionId)
: []
).map((approval) =>
approval && typeof approval === 'object'
? { ...(approval as AnyRecord), sessionId }
: approval,
);
sendJson(ws, {
kind: 'chat_subscribed',
sessionId,
isProcessing,
lastSeq: run?.lastSeq ?? 0,
pendingPermissions,
timestamp: new Date().toISOString(),
});
// Replay only for RUNNING runs, strictly after the ack. Completed runs
// are fully persisted to the provider transcript and served over REST —
// replaying them (e.g. after a page reload where the client's lastSeq is
// 0) would duplicate messages the history fetch already returned.
if (isProcessing) {
for (const event of chatRunRegistry.replayEvents(sessionId, lastSeq)) {
sendJson(ws, event);
}
}
}
}
/**
* Handles `chat.permission-response`: forwards a tool-approval decision to the
* pending approval resolver (Claude is the only provider with interactive
* approvals today, but the message is intentionally provider-neutral).
*/
function handlePermissionResponse(data: AnyRecord, dependencies: ChatWebSocketDependencies): void {
if (typeof data.requestId !== 'string' || data.requestId.length === 0) {
return;
}
dependencies.resolveToolApproval(data.requestId, {
allow: Boolean(data.allow),
updatedInput: data.updatedInput,
message: typeof data.message === 'string' ? data.message : undefined,
rememberEntry: data.rememberEntry,
});
}
/**
* Handles authenticated chat websocket messages used by the main chat panel.
*
* Inbound protocol (client to server):
* - `chat.send` { sessionId, content, options? }
* - `chat.abort` { sessionId }
* - `chat.subscribe` { sessions: [{ sessionId, lastSeq? }] }
* - `chat.permission-response` { requestId, allow, updatedInput?, message?, rememberEntry? }
*
* Outbound protocol (server to client): every frame is `kind`-based — either
* a provider `NormalizedMessage` (with `seq`) or a gateway event
* (`chat_subscribed`, `session_upserted`, `loading_progress`,
* `protocol_error`).
*/
export function handleChatConnection(
ws: WebSocket,
@@ -103,7 +327,7 @@ export function handleChatConnection(
console.log('[INFO] Chat WebSocket connected');
connectedClients.add(ws);
const writer = new WebSocketWriter(ws, readRequestUserId(request));
const userId = readRequestUserId(request);
ws.on('message', async (rawMessage) => {
try {
@@ -112,167 +336,30 @@ export function handleChatConnection(
throw new Error('Invalid websocket payload');
}
const data = parsed as ChatIncomingMessage;
const messageType = data.type;
if (!messageType) {
throw new Error('Message type is required');
}
const data = parsed as AnyRecord;
const messageType = typeof data.type === 'string' ? data.type : '';
if (messageType === 'claude-command') {
await dependencies.queryClaudeSDK(data.command ?? '', data.options, writer);
return;
}
if (messageType === 'cursor-command') {
await dependencies.spawnCursor(data.command ?? '', data.options, writer);
return;
}
if (messageType === 'codex-command') {
await dependencies.queryCodex(data.command ?? '', data.options, writer);
return;
}
if (messageType === 'gemini-command') {
await dependencies.spawnGemini(data.command ?? '', data.options, writer);
return;
}
if (messageType === 'opencode-command') {
await dependencies.spawnOpenCode(data.command ?? '', data.options, writer);
return;
}
if (messageType === 'cursor-resume') {
await dependencies.spawnCursor(
'',
{
sessionId: data.sessionId,
resume: true,
cwd: data.options?.cwd,
},
writer
);
return;
}
if (messageType === 'abort-session') {
const provider = readProvider(data.provider);
const sessionId = typeof data.sessionId === 'string' ? data.sessionId : '';
let success = false;
if (provider === 'cursor') {
success = dependencies.abortCursorSession(sessionId);
} else if (provider === 'codex') {
success = dependencies.abortCodexSession(sessionId);
} else if (provider === 'gemini') {
success = dependencies.abortGeminiSession(sessionId);
} else if (provider === 'opencode') {
success = dependencies.abortOpenCodeSession(sessionId);
} else {
success = await dependencies.abortClaudeSDKSession(sessionId);
}
// Terminal complete on behalf of the cancelled run — providers skip
// their own complete for aborted runs so the client sees exactly one.
writer.send(
createCompleteMessage({
provider,
sessionId,
exitCode: success ? 0 : 1,
aborted: true,
})
);
return;
}
if (messageType === 'claude-permission-response') {
if (typeof data.requestId === 'string' && data.requestId.length > 0) {
dependencies.resolveToolApproval(data.requestId, {
allow: Boolean(data.allow),
updatedInput: data.updatedInput,
message: typeof data.message === 'string' ? data.message : undefined,
rememberEntry: data.rememberEntry,
});
}
return;
}
if (messageType === 'cursor-abort') {
const sessionId = typeof data.sessionId === 'string' ? data.sessionId : '';
const success = dependencies.abortCursorSession(sessionId);
writer.send(
createCompleteMessage({
provider: 'cursor',
sessionId,
exitCode: success ? 0 : 1,
aborted: true,
})
);
return;
}
if (messageType === 'check-session-status') {
const provider = readProvider(data.provider);
const sessionId = typeof data.sessionId === 'string' ? data.sessionId : '';
let isActive = false;
if (provider === 'cursor') {
isActive = dependencies.isCursorSessionActive(sessionId);
} else if (provider === 'codex') {
isActive = dependencies.isCodexSessionActive(sessionId);
} else if (provider === 'gemini') {
isActive = dependencies.isGeminiSessionActive(sessionId);
} else if (provider === 'opencode') {
isActive = dependencies.isOpenCodeSessionActive(sessionId);
} else {
isActive = dependencies.isClaudeSDKSessionActive(sessionId);
if (isActive) {
dependencies.reconnectSessionWriter(sessionId, ws);
}
}
writer.send({
type: 'session-status',
sessionId,
provider,
isProcessing: isActive,
});
return;
}
if (messageType === 'get-pending-permissions') {
const sessionId = typeof data.sessionId === 'string' ? data.sessionId : '';
if (sessionId && dependencies.isClaudeSDKSessionActive(sessionId)) {
const pending = dependencies.getPendingApprovalsForSession(sessionId);
writer.send({
type: 'pending-permissions-response',
sessionId,
data: pending,
});
}
return;
}
if (messageType === 'get-active-sessions') {
writer.send({
type: 'active-sessions',
sessions: {
claude: dependencies.getActiveClaudeSDKSessions(),
cursor: dependencies.getActiveCursorSessions(),
codex: dependencies.getActiveCodexSessions(),
gemini: dependencies.getActiveGeminiSessions(),
opencode: dependencies.getActiveOpenCodeSessions(),
},
});
switch (messageType) {
case 'chat.send':
await handleChatSend(ws, userId, data, dependencies);
return;
case 'chat.abort':
await handleChatAbort(ws, data, dependencies);
return;
case 'chat.subscribe':
handleChatSubscribe(ws, data, dependencies);
return;
case 'chat.permission-response':
handlePermissionResponse(data, dependencies);
return;
default:
sendProtocolError(ws, 'UNKNOWN_MESSAGE_TYPE', `Unknown message type "${messageType}".`);
return;
}
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
console.error('[ERROR] Chat WebSocket error:', message);
writer.send({
type: 'error',
error: message,
});
sendProtocolError(ws, 'INTERNAL_ERROR', message);
}
});

View File

@@ -0,0 +1,207 @@
import assert from 'node:assert/strict';
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import path from 'node:path';
import test from 'node:test';
import { closeConnection, initializeDatabase, sessionsDb } from '@/modules/database/index.js';
import { chatRunRegistry } from '@/modules/websocket/services/chat-run-registry.service.js';
import type { NormalizedMessage } from '@/shared/types.js';
/**
* Minimal stand-in for a websocket connection: collects every JSON frame the
* gateway writer forwards so assertions can inspect the outbound protocol.
*/
class FakeConnection {
readyState = 1; // WS_OPEN_STATE
frames: NormalizedMessage[] = [];
send(data: string): void {
this.frames.push(JSON.parse(data) as NormalizedMessage);
}
}
async function withIsolatedDatabase(runTest: () => void | Promise<void>): Promise<void> {
const previousDatabasePath = process.env.DATABASE_PATH;
const tempDirectory = await mkdtemp(path.join(tmpdir(), 'chat-run-registry-'));
const databasePath = path.join(tempDirectory, 'auth.db');
closeConnection();
process.env.DATABASE_PATH = databasePath;
await initializeDatabase();
try {
await runTest();
} finally {
chatRunRegistry.clearAll();
closeConnection();
if (previousDatabasePath === undefined) {
delete process.env.DATABASE_PATH;
} else {
process.env.DATABASE_PATH = previousDatabasePath;
}
await rm(tempDirectory, { recursive: true, force: true });
}
}
test('live events are remapped to the app session id and sequenced', async () => {
await withIsolatedDatabase(() => {
sessionsDb.createAppSession('app-run-1', 'claude', '/workspace/demo');
const connection = new FakeConnection();
const run = chatRunRegistry.startRun({
appSessionId: 'app-run-1',
provider: 'claude',
providerSessionId: null,
connection,
userId: 'user-1',
});
assert.ok(run);
run.writer.send({ kind: 'stream_delta', provider: 'claude', sessionId: 'provider-id-9', content: 'hello' });
run.writer.send({ kind: 'text', provider: 'claude', sessionId: 'provider-id-9', content: 'hello world' });
assert.equal(connection.frames.length, 2);
assert.equal(connection.frames[0]?.sessionId, 'app-run-1');
assert.equal(connection.frames[0]?.seq, 1);
assert.equal(connection.frames[1]?.sessionId, 'app-run-1');
assert.equal(connection.frames[1]?.seq, 2);
});
});
test('session_created is swallowed and persisted as the provider-id mapping', async () => {
await withIsolatedDatabase(() => {
sessionsDb.createAppSession('app-run-2', 'cursor', '/workspace/demo');
const connection = new FakeConnection();
const run = chatRunRegistry.startRun({
appSessionId: 'app-run-2',
provider: 'cursor',
providerSessionId: null,
connection,
userId: null,
});
assert.ok(run);
run.writer.send({
kind: 'session_created',
provider: 'cursor',
sessionId: 'cursor-native-7',
newSessionId: 'cursor-native-7',
});
// Never forwarded to the client...
assert.equal(connection.frames.length, 0);
// ...but recorded in the registry and persisted in the database.
assert.equal(run.providerSessionId, 'cursor-native-7');
assert.equal(sessionsDb.getSessionById('app-run-2')?.provider_session_id, 'cursor-native-7');
});
});
test('complete marks the run finished and duplicate completes are dropped', async () => {
await withIsolatedDatabase(() => {
sessionsDb.createAppSession('app-run-3', 'codex', '/workspace/demo');
const connection = new FakeConnection();
const run = chatRunRegistry.startRun({
appSessionId: 'app-run-3',
provider: 'codex',
providerSessionId: null,
connection,
userId: null,
});
assert.ok(run);
run.writer.send({ kind: 'complete', provider: 'codex', sessionId: 'native-3', exitCode: 0 });
// Late duplicate from a killed runtime's exit handler.
run.writer.send({ kind: 'complete', provider: 'codex', sessionId: 'native-3', exitCode: 1 });
const completes = connection.frames.filter((frame) => frame.kind === 'complete');
assert.equal(completes.length, 1);
assert.equal(completes[0]?.actualSessionId, 'app-run-3');
assert.equal(chatRunRegistry.isProcessing('app-run-3'), false);
// completeRun is also a no-op once the run already completed.
chatRunRegistry.completeRun('app-run-3', { exitCode: 1 });
assert.equal(connection.frames.filter((frame) => frame.kind === 'complete').length, 1);
});
});
test('replayEvents returns only events after the requested seq', async () => {
await withIsolatedDatabase(() => {
sessionsDb.createAppSession('app-run-4', 'claude', '/workspace/demo');
const connection = new FakeConnection();
const run = chatRunRegistry.startRun({
appSessionId: 'app-run-4',
provider: 'claude',
providerSessionId: null,
connection,
userId: null,
});
assert.ok(run);
run.writer.send({ kind: 'stream_delta', provider: 'claude', sessionId: 'x', content: 'a' });
run.writer.send({ kind: 'stream_delta', provider: 'claude', sessionId: 'x', content: 'b' });
run.writer.send({ kind: 'stream_delta', provider: 'claude', sessionId: 'x', content: 'c' });
const replayed = chatRunRegistry.replayEvents('app-run-4', 1);
assert.deepEqual(replayed.map((event) => event.content), ['b', 'c']);
assert.deepEqual(replayed.map((event) => event.seq), [2, 3]);
});
});
test('attachConnection reroutes the live stream to a new socket', async () => {
await withIsolatedDatabase(() => {
sessionsDb.createAppSession('app-run-5', 'gemini', '/workspace/demo');
const firstConnection = new FakeConnection();
const run = chatRunRegistry.startRun({
appSessionId: 'app-run-5',
provider: 'gemini',
providerSessionId: null,
connection: firstConnection,
userId: null,
});
assert.ok(run);
run.writer.send({ kind: 'stream_delta', provider: 'gemini', sessionId: 'g', content: 'before' });
const secondConnection = new FakeConnection();
assert.equal(chatRunRegistry.attachConnection('app-run-5', secondConnection), true);
run.writer.send({ kind: 'stream_delta', provider: 'gemini', sessionId: 'g', content: 'after' });
assert.deepEqual(firstConnection.frames.map((frame) => frame.content), ['before']);
assert.deepEqual(secondConnection.frames.map((frame) => frame.content), ['after']);
});
});
test('startRun rejects a second concurrent run for the same session', async () => {
await withIsolatedDatabase(() => {
sessionsDb.createAppSession('app-run-6', 'opencode', '/workspace/demo');
const connection = new FakeConnection();
const first = chatRunRegistry.startRun({
appSessionId: 'app-run-6',
provider: 'opencode',
providerSessionId: null,
connection,
userId: null,
});
assert.ok(first);
const second = chatRunRegistry.startRun({
appSessionId: 'app-run-6',
provider: 'opencode',
providerSessionId: null,
connection,
userId: null,
});
assert.equal(second, null);
// After the run finishes a new one is allowed again.
chatRunRegistry.completeRun('app-run-6', { exitCode: 0 });
const third = chatRunRegistry.startRun({
appSessionId: 'app-run-6',
provider: 'opencode',
providerSessionId: null,
connection,
userId: null,
});
assert.ok(third);
});
});