feat(chat): unify session gateway with stable IDs and a single WS protocol

The frontend previously juggled placeholder IDs, provider-native IDs, and session_created handoffs, which caused race conditions and provider-specific branching. This introduces app-allocated session IDs, a chat run registry with event replay, delta sidebar updates, and one kind-based websocket contract so the UI can treat every provider the same while JSONL remains the source of truth.
This commit is contained in:
Haileyesus
2026-06-11 18:47:19 +03:00
parent 3d948217ef
commit f5eac2ec12
40 changed files with 2451 additions and 1226 deletions

View File

@@ -33,10 +33,12 @@ Benefits:
|---|---|
| `services/websocket-server.service.ts` | Creates `WebSocketServer`, binds `verifyClient`, routes connection by pathname |
| `services/websocket-auth.service.ts` | Authenticates upgrade requests and attaches `request.user` |
| `services/chat-websocket.service.ts` | Handles `/ws` chat protocol and provider command/session control messages |
| `services/chat-websocket.service.ts` | Handles the `/ws` chat protocol (`chat.send` / `chat.abort` / `chat.subscribe` / `chat.permission-response`) |
| `services/chat-run-registry.service.ts` | Tracks live provider runs per app session id: seq numbering, event replay buffer, provider-id mapping, completion state |
| `services/chat-session-writer.service.ts` | Gateway writer handed to provider runtimes: remaps provider session ids to app ids, swallows `session_created`, assigns `seq` |
| `services/shell-websocket.service.ts` | Handles `/shell` PTY lifecycle, reconnect buffering, auth URL detection |
| `services/plugin-websocket-proxy.service.ts` | Bridges client socket to plugin socket |
| `services/websocket-writer.service.ts` | Adapts raw WebSocket to writer interface (`send`, `setSessionId`, `getSessionId`) |
| `services/websocket-writer.service.ts` | Adapts raw WebSocket to writer interface (`send`, `setSessionId`, `getSessionId`) for non-chat writer consumers |
| `services/websocket-state.service.ts` | Holds shared chat client set and open-state constant |
## High-Level Architecture
@@ -52,12 +54,12 @@ flowchart LR
D -->|other| H[close()]
E --> I[connectedClients Set]
E --> J[WebSocketWriter]
E --> J[chatRunRegistry + ChatSessionWriter]
F --> K[ptySessionsMap]
G --> L[Upstream Plugin ws://127.0.0.1:port/ws]
I --> M[projects.service broadcastProgress]
I --> N[sessions-watcher.service projects_updated]
I --> M[projects.service loading_progress]
I --> N[sessions-watcher.service session_upserted]
```
## Connection Handshake + Routing
@@ -105,38 +107,41 @@ sequenceDiagram
When a chat socket connects:
1. Add socket to `connectedClients`.
2. Build `WebSocketWriter` (captures `userId` from authenticated request).
3. Parse each incoming message with `parseIncomingJsonObject`.
4. Dispatch by `data.type`.
5. On close, remove socket from `connectedClients`.
2. Parse each incoming message with `parseIncomingJsonObject`.
3. Dispatch by `data.type` (four message types, none provider-specific).
4. On close, remove socket from `connectedClients`.
### Session identity model
The frontend only ever knows the **app session id** (allocated by
`POST /api/providers/sessions` or discovered via the session index). The
provider-native id (JSONL file name, CLI resume id) stays inside the backend:
1. `chat.send` resolves the app id to `{ provider, provider_session_id, project_path }` from the sessions DB.
2. The provider runtime receives the provider-native id for resume.
3. The `ChatSessionWriter` remaps every outbound event back to the app id, and turns `session_created` announcements into a DB mapping update instead of forwarding them.
### Chat Message Dispatch
```mermaid
flowchart TD
A[Incoming WS message] --> B[parseIncomingJsonObject]
B -->|invalid| C[send {type:error}]
B -->|invalid| C[send kind:protocol_error]
B -->|ok| D{data.type}
D -->|claude-command| E[queryClaudeSDK]
D -->|cursor-command| F[spawnCursor]
D -->|codex-command| G[queryCodex]
D -->|gemini-command| H[spawnGemini]
D -->|cursor-resume| I[spawnCursor resume]
D -->|abort-session| J[abort by provider]
D -->|claude-permission-response| K[resolveToolApproval]
D -->|cursor-abort| L[abortCursorSession]
D -->|check-session-status| M[is*SessionActive + optional reconnectSessionWriter]
D -->|get-pending-permissions| N[getPendingApprovalsForSession]
D -->|get-active-sessions| O[getActive*Sessions]
D -->|chat.send| E[resolve session row -> startRun -> spawnFns provider]
D -->|chat.abort| F[abortFns provider + synthetic complete]
D -->|chat.subscribe| G[chat_subscribed ack + attach socket + replay events seq > lastSeq]
D -->|chat.permission-response| H[resolveToolApproval]
D -->|other| I[send kind:protocol_error]
```
### Chat Notes
1. **Unified terminal lifecycle**: every provider run ends with exactly one `complete` message built by `createCompleteMessage()` (`server/shared/utils.ts`), regardless of provider: `{ kind: "complete", sessionId, actualSessionId, exitCode, success, aborted }`. Failed runs emit an informational `error` message first, then the terminal `complete` with `success: false`. Mid-run `error` messages (e.g. stderr output) are non-terminal; the frontend only treats `complete` as end-of-run.
2. `abort-session` sends the terminal `complete` (`aborted: true`) on behalf of the cancelled run; providers detect the abort and skip their own `complete` so the client sees exactly one.
3. `check-session-status` returns `{ type: "session-status", isProcessing }`.
4. Claude status checks can reconnect output stream to the new socket via `reconnectSessionWriter`.
1. **Unified envelope**: every server-to-client frame carries a `kind` — either a provider `NormalizedMessage` kind or a gateway kind (`chat_subscribed`, `session_upserted`, `loading_progress`, `protocol_error`). There is no second `type`-based protocol.
2. **Unified terminal lifecycle**: every provider run ends with exactly one `complete` message built by `createCompleteMessage()` (`server/shared/utils.ts`): `{ kind: "complete", sessionId, actualSessionId, exitCode, success, aborted }`. The chat handler emits a synthetic `complete` for runs that crash or get aborted, and the run registry drops duplicate completes.
3. **Per-run event log**: every live event gets a monotonically increasing `seq`. `chat.subscribe { sessions: [{ sessionId, lastSeq }] }` re-attaches the live stream to the requesting socket (any provider, not just Claude) and replays events with `seq > lastSeq`. If the buffer no longer covers `lastSeq`, the client refreshes over REST.
4. `chat_subscribed` includes `isProcessing` (replaces `check-session-status`) and `pendingPermissions` (replaces `get-pending-permissions`).
## `/shell` Terminal Flow
@@ -224,9 +229,9 @@ Only chat sockets (`/ws`) are tracked in `connectedClients`.
That shared set is consumed by:
1. `modules/projects/services/projects-with-sessions-fetch.service.ts`
Broadcasts `loading_progress` while project snapshots are being built.
Broadcasts `kind: loading_progress` while project snapshots are being built.
2. `modules/providers/services/sessions-watcher.service.ts`
Broadcasts `projects_updated` when provider session artifacts change.
Broadcasts per-session `kind: session_upserted` deltas when provider session artifacts change (no full project snapshots).
This design centralizes cross-module realtime fanout without requiring route-local references to WebSocket internals.
@@ -253,7 +258,7 @@ Current explicit close codes in this module:
Other errors:
1. Chat handler catches and emits `{ type: "error", error }`.
1. Chat handler catches and emits `{ kind: "protocol_error", code, error }`.
2. Shell handler catches and writes terminal-visible error output.
3. Unknown websocket paths are closed immediately.