chat

package
v0.0.12 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 14, 2026 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Overview

Package chat provides the LLM-backed chat endpoint that powers the "Ask about your jobs" surface in the SPA. It is deliberately small: one HTTP endpoint, one LLM provider (Anthropic), a tool registry scoped to the authenticated user, and a streaming protocol that matches what Vercel's AI SDK `useChat()` hook expects on the wire.

The package is shaped around a single Engine value the HTTP handler constructs once and reuses for every request. Engine.Stream takes a fully-formed Request (incoming messages + per-tool-call results the frontend has already settled) and returns an event stream of AI-SDK protocol parts that the handler relays to the browser verbatim. The Engine never reads the *http.Request directly — the handler does that and converts to a chat.Request, so this package stays independent of the HTTP server's session machinery.

Index

Constants

View Source
const AnthropicAPIVersion = "2023-06-01"

AnthropicAPIVersion pins the Anthropic Messages API version we negotiate. Anthropic dates major changes; bumping this is an explicit operator decision.

View Source
const DefaultAnthropicModel = "claude-sonnet-4-5"

DefaultAnthropicModel is the model used when the chat request doesn't override it. Sonnet is the standard "good enough at reasoning, cheap enough at scale" middle.

View Source
const DefaultAnthropicURL = "https://api.anthropic.com/v1/messages"

DefaultAnthropicURL is the upstream endpoint when the operator hasn't pointed HTTP_API_LLM_API_URL at a proxy.

View Source
const MaxTokens = 4096

MaxTokens caps how much the LLM can emit per turn. Tool-heavy flows can take a few thousand tokens of reasoning + tool args; 4096 is comfortable headroom while still bounding worst-case spend.

Variables

This section is empty.

Functions

This section is empty.

Types

type AnthropicClient

type AnthropicClient struct {
	// contains filtered or unexported fields
}

AnthropicClient is the minimal HTTP wrapper around Anthropic's Messages API. We hand-roll instead of importing a Go SDK because the surface we use is small (one endpoint, streaming) and a fresh dep adds non-trivial code to vet.

func NewAnthropicClient

func NewAnthropicClient(cfg AnthropicConfig) (*AnthropicClient, error)

NewAnthropicClient loads the API key from disk and returns a ready client. Returns nil, nil when no key is configured — that's the "chat disabled" path; callers must check.

Refuses to read the key file with mode bits set for group or other (`stat --printf=%a key` should be `600` or `400`). Same security posture as seal.LoadMasterKEKFromFile.

func (*AnthropicClient) Model

func (c *AnthropicClient) Model() string

Model returns the default model id this client was configured with. Useful in handler logs.

func (*AnthropicClient) Stream

func (c *AnthropicClient) Stream(ctx context.Context, system string, msgs []AnthropicMessage, tools []AnthropicTool, model string) <-chan StreamEvent

Stream POSTs the conversation to Anthropic with stream=true, parses the SSE event stream, and emits a flattened sequence of StreamEvent values on the returned channel. The channel is closed when the upstream connection closes; consumers typically iterate with `for ev := range ch`.

Cancel by canceling the supplied context — the underlying HTTP request is cancelable and the read loop honors ctx.Done.

Error handling: terminal errors (HTTP error from Anthropic, parse failure, network drop) are emitted as a final {Kind:"error"} event before the channel closes, never returned out-of-band. Callers don't need a separate err-check codepath.

func (*AnthropicClient) URL

func (c *AnthropicClient) URL() string

URL returns the configured endpoint URL (after defaults applied). Useful in handler logs to confirm which upstream is in use.

type AnthropicConfig

type AnthropicConfig struct {
	// APIKeyFile is the path the chat handler reads on every server
	// start to load the Anthropic API key. Mode 0600/0400 enforced.
	APIKeyFile string

	// URL overrides the upstream endpoint. Use for self-hosted proxies
	// (e.g. an LLM gateway with audit logging or a cache). Empty = use
	// DefaultAnthropicURL.
	URL string

	// Model overrides the default Anthropic model identifier. Empty =
	// DefaultAnthropicModel.
	Model string

	// HTTPClient is the http.Client to dispatch with. nil = a stock
	// client with a 60-second per-request timeout.
	HTTPClient *http.Client
}

AnthropicConfig configures the client. APIKeyFile is the on-disk path to a file holding the API key — the bytes are never expected to live in HTCondor config (which is publicly readable on the host). Empty APIKey + empty APIKeyFile = chat disabled.

type AnthropicContentBlock

type AnthropicContentBlock struct {
	Type      string          `json:"type"`
	Text      string          `json:"text,omitempty"`
	ID        string          `json:"id,omitempty"`
	Name      string          `json:"name,omitempty"`
	Input     json.RawMessage `json:"input,omitempty"`
	ToolUseID string          `json:"tool_use_id,omitempty"`
	Content   string          `json:"content,omitempty"`
	IsError   bool            `json:"is_error,omitempty"`
}

AnthropicContentBlock is one element inside a message's content array. Type discriminates the union:

  • "text" → Text is set
  • "tool_use" → ID, Name, Input set (assistant proposes a tool call)
  • "tool_result" → ToolUseID, Content (string) set (user replies with the result)

type AnthropicMessage

type AnthropicMessage struct {
	Role    string                  `json:"role"` // "user" or "assistant"
	Content []AnthropicContentBlock `json:"content"`
}

AnthropicMessage is one turn in the conversation. Content is polymorphic — for text-only turns it's a single TextBlock; turns that carry tool results from the previous round include ToolResultBlock entries; assistant turns can carry ToolUseBlock.

We expose the typed shape rather than json.RawMessage so the engine can construct messages without per-call JSON gymnastics.

type AnthropicTool

type AnthropicTool struct {
	Name        string          `json:"name"`
	Description string          `json:"description"`
	InputSchema json.RawMessage `json:"input_schema"`
}

AnthropicTool is the tool schema we hand the model. InputSchema is a JSON Schema describing the tool's arguments — we let callers build it once per tool registry construction.

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine drives one chat turn end-to-end. Construct with NewEngine once at startup and reuse — it's safe for concurrent calls because every Stream invocation owns its own state.

The engine knows nothing about specific pages; it just filters tools and looks up a system-prompt suffix by the page string the SPA sends. Adding a new page is purely additive: register the new tools (tagged with the page) and add a key to the pageInstructions map.

func NewEngine

func NewEngine(
	client *AnthropicClient,
	tools []Tool,
	pageInstructions map[string]string,
	operatorAddendum string,
) *Engine

NewEngine builds an engine over the supplied LLM client and tool registry.

pageInstructions maps an SPA page identifier (e.g. "jobs", "submit") to a chunk of system-prompt text that's appended to the base prompt whenever a request arrives with Page=<that key>. Pages NOT in this map are still allowed (the SPA can send anything) — they just get the base prompt with no page-specific guidance, plus only those tools that are universally tagged. Operators looking to add a third page should register tools and add an entry here together.

operatorAddendum is freeform text supplied by the server admin via configuration. It's appended to the system prompt on every turn, regardless of page. Use it for site-specific rules ("we don't allow LongJob requests over 24h", "our preferred default is Singularity") — anything you'd hand to an in-house user-help operator.

func (*Engine) Stream

func (e *Engine) Stream(ctx context.Context, w *Writer, actor string, req Request) error

Stream runs one chat turn, possibly looping internally to dispatch server-side tool calls until the assistant produces a final text reply or hands off a client-side tool. Output is a sequence of AI-SDK protocol parts written via the Writer.

`actor` is the authenticated username; tools use it for owner scoping. The engine never lets a tool see anything else.

Stream returns when the assistant's turn ends OR when a client-side tool needs the browser's help (the browser's next POST will include the resolved tool_result and the conversation continues).

func (*Engine) Tool

func (e *Engine) Tool(name string) Tool

Tool returns a tool by name, or nil. Used by the HTTP handler so it can recognize a confirmation continuation pointing at a tool the engine knows about.

type Request

type Request struct {
	Messages              []RequestMessage `json:"messages"`
	PreApprovedToolUseIDs []string         `json:"approved_tool_use_ids,omitempty"`
	// AutoApprove names tool kinds the user has flagged as "always
	// run without prompting" (e.g. ["release","hold"]). Same shape
	// as Tool.Name(); the engine checks membership before honoring
	// RequiresConfirmation.
	AutoApprove []string `json:"auto_approve,omitempty"`
	// Page is the SPA page identifier (e.g. "jobs", "submit") sent
	// with each request. The engine uses it to:
	//   - filter the advertised tool schema to tools tagged for this
	//     page (or untagged tools, which are universal), and
	//   - select the per-page system-prompt suffix from the engine's
	//     pageInstructions map.
	// The Page string is NEVER interpolated into prompt text directly;
	// it's only used as a map key, so a malicious browser sending a
	// crafted Page value cannot inject prompt content.
	Page string `json:"page,omitempty"`
	// PageContext is a free-form string the SPA can send to provide
	// per-request context the LLM should know about — e.g. on the
	// per-job page, the cluster.proc id, current job status, last
	// host. It's appended verbatim to the system prompt under a
	// "Page context" header.
	//
	// Trust model: the SPA is not adversarial against itself, and the
	// chat surface only ever shows the user their own data, so the
	// risk of "prompt injection from the page context" is the same as
	// the user typing the same content. Tools never read PageContext;
	// it exists purely to pre-populate the model's working memory so
	// the LLM doesn't waste tokens asking the user "which job?" on
	// every turn. Cap is enforced by the engine.
	PageContext string `json:"page_context,omitempty"`
}

Request is one /api/v1/chat call. Messages is the full conversation history (browser sends every turn so the server is stateless). Each message's content is the AI-SDK shape — see request_message.go.

PreApprovedToolUseIDs lists the tool_use ids the user has explicitly approved this turn. When the engine encounters a confirmation-required tool whose id is NOT in this set, it pauses and emits a confirmation prompt instead of executing.

type RequestMessage

type RequestMessage struct {
	ID    string               `json:"id,omitempty"`
	Role  string               `json:"role"` // "user" | "assistant"
	Parts []RequestMessagePart `json:"parts,omitempty"`
}

RequestMessage is one turn from the AI-SDK v6 `useChat()` payload. The browser sends every turn on every POST so the server stays stateless. Content is loosely-typed because the AI SDK packs "parts" of various kinds into the same array — text and tool invocations — and we only need to reshape them for Anthropic.

type RequestMessagePart

type RequestMessagePart struct {
	Type       string `json:"type"`
	Text       string `json:"text,omitempty"`
	ToolCallID string `json:"toolCallId,omitempty"`

	State     string          `json:"state,omitempty"`
	Input     json.RawMessage `json:"input,omitempty"`
	Output    json.RawMessage `json:"output,omitempty"`
	ErrorText string          `json:"errorText,omitempty"`
}

RequestMessagePart is one element inside RequestMessage.Parts.

AI SDK v6 packs each tool invocation into a single part whose Type is "tool-<toolName>" and whose lifecycle is tracked via State:

input-streaming         — args still streaming; ignore on server
input-available         — args complete; emit tool_use to model
input-approval-required — paused for user confirmation
output-available        — result back; emit tool_use + tool_result
output-error            — tool failed; emit tool_use + tool_result(err)

The same part holds Input (model args) and Output (executed result) once the lifecycle progresses, so we split it back into Anthropic's separate tool_use / tool_result blocks at flatten-time.

type StreamEvent

type StreamEvent struct {
	// Kind discriminates the event:
	//   "text_delta"    — incremental text from the assistant
	//   "tool_use"      — a complete tool_use block (gathered from input_json_delta deltas)
	//   "message_stop"  — assistant turn finished; StopReason is set
	//   "error"         — terminal error; Err is set
	Kind       string
	Text       string          // "text_delta"
	ToolUseID  string          // "tool_use"
	ToolName   string          // "tool_use"
	ToolInput  json.RawMessage // "tool_use"
	StopReason string          // "message_stop": "end_turn", "tool_use", "max_tokens"
	Err        error           // "error"
}

StreamEvent is one parsed Anthropic SSE event the engine emits to its caller. We surface a small fixed set of types (the others — "ping", "message_start", etc. — are absorbed internally) so the engine code stays linear.

type Tool

type Tool interface {
	Name() string
	Description() string
	InputSchema() json.RawMessage
	// ClientSide reports whether this tool runs in the browser. The
	// engine handles client-side tools differently: it forwards the
	// tool_use to the SPA and pauses until the SPA sends back a
	// tool_result on the next request.
	ClientSide() bool
	// RequiresConfirmation reports whether the engine should pause
	// for user approval before executing. Always false for read-only
	// or client-side tools; true for write actions (hold/release/
	// remove). The frontend renders an Approve / Reject UI for these.
	RequiresConfirmation() bool
	// AvailablePages reports which SPA pages this tool is exposed on.
	// Empty/nil means "every page" — i.e. universally available.
	// The engine filters tools by Request.Page before advertising the
	// schema to the LLM, so a tool tagged ["jobs"] is invisible from
	// the submit page (and vice versa). Use this to prevent the LLM
	// from calling a UI-mutation tool whose target component isn't
	// even mounted.
	AvailablePages() []string
	Execute(ctx context.Context, actor string, input json.RawMessage) (string, error)
}

Tool is the server-side interface every chat tool must satisfy. The engine consults Schema to advertise the tool to the LLM and calls Execute when the LLM emits a tool_use referencing this name.

Execute receives the authenticated user identity in `actor` so each implementation can enforce its own scoping (e.g. inject Owner == actor into a job-query constraint). The string return is the JSON payload the LLM will see as the tool result; if it contains non-JSON, the engine wraps it as { "result": <string> }. Errors surface to the LLM as a tool_result with is_error=true so the model can recover (or apologize) rather than crashing the turn.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer serializes engine output as the AI-SDK v6 UI-message-stream format `useChat()` reads via DefaultChatTransport. The on-wire shape is plain SSE (`data: {json}\n\n`) where each event's payload is one UIMessageChunk — the typed union from `ai` v6's TypeScript surface.

We emit a small subset of chunk types:

text-start / text-delta / text-end          — assistant prose
tool-input-start                             — opening a tool call
tool-input-available                         — tool call's args complete
tool-output-available / tool-output-error    — server-executed result
tool-approval-request                        — destructive-tool confirm
error                                        — terminal error

Buffered writer + flusher pinning ensures text characters reach the browser as they're produced; the SDK's stream parser doesn't require any padding or keep-alives.

func NewWriter

func NewWriter(rw http.ResponseWriter) *Writer

NewWriter wraps an http.ResponseWriter for the chat handler. Sets the SSE response headers if not already present.

IMPORTANT: we use http.NewResponseController(rw) instead of the classic `rw.(http.Flusher)` type assertion. The Go server pipes every request through wrapping middleware (accessLogMiddleware's responseWriter, metrics' statusRecorder, etc.) that captures status codes and byte counts by embedding http.ResponseWriter. None of those wrappers re-implement the Flusher interface explicitly — and a direct type assertion fails on an embedded-interface wrapper, so `flusher` would be nil and our Flush calls would be no-ops. Result: every text-delta sits in Go's chunked-encoding buffer until the response ends, so the entire turn appears at once in the browser no matter how slow the upstream LLM streamed.

http.NewResponseController (Go 1.20+) walks the Unwrap() chain to find a real Flusher, so as long as every wrapper exposes Unwrap() — which they do — we always get a working flush. Belt-and-braces: see ../server.go and ../metrics.go for the wrappers.

func (*Writer) Close

func (w *Writer) Close()

Close flushes any buffered output and closes any open text part. Defer this from the handler — engine doesn't own the writer.

func (*Writer) FinishText

func (w *Writer) FinishText()

FinishText emits the matching text-end for any open text part. Idempotent. Called automatically by Close, but the engine calls it explicitly between assistant turns and tool dispatches so the UI can render distinct message segments rather than one giant streaming blob.

func (*Writer) WriteConfirmationRequest

func (w *Writer) WriteConfirmationRequest(toolCallID, toolName string, args json.RawMessage)

WriteConfirmationRequest emits a tool-input-available followed by a tool-approval-request. The SDK's `addToolApprovalResponse` helper on the React side completes the round-trip — when the user clicks Approve, the SPA POSTs back with the approval response, and the engine's next call sees the tool-use ID in PreApprovedToolUseIDs.

The approvalId we send here is the same as the toolCallId so the frontend can correlate without an extra map.

func (*Writer) WriteError

func (w *Writer) WriteError(msg string)

WriteError emits a terminal error chunk. Caller should return after this — the stream is over.

func (*Writer) WriteFinish

func (w *Writer) WriteFinish(_ string)

WriteFinish balances any open text part. Optional but well- behaved; the SDK uses message boundaries to drop the typing indicator.

stopReason is preserved for log breadcrumbs in protocol.go's signature but no longer threaded into the wire format — the SDK no longer carries a finish-reason chunk.

func (*Writer) WriteStepFinish

func (w *Writer) WriteStepFinish()

WriteStepFinish emits a finish-step chunk balancing a prior WriteStepStart. The SDK uses it to flush per-step state (active text/reasoning parts); it doesn't push a part itself. Safe to skip, but emitting matches the SDK's server-side conventions.

func (*Writer) WriteStepStart

func (w *Writer) WriteStepStart()

WriteStepStart emits a start-step chunk. The AI-SDK on the client uses these as boundaries between LLM "turns" within a single streaming assistant message: lastAssistantMessageIsCompleteWithTool Calls (the auto-resubmit predicate) only inspects parts AFTER the most recent step-start. Without these, a turn that emits both a (resolved) tool call AND trailing text gets misclassified as "complete with tool calls" — the predicate fires forever and the chat loops on /api/v1/chat with no progress.

Pair with WriteStepFinish around each LLM call. Subsequent re- submits accumulate into the same assistant message client-side, so the step boundaries are what tells them apart.

func (*Writer) WriteText

func (w *Writer) WriteText(s string)

WriteText emits an incremental text delta. The first call implicitly emits a text-start; subsequent calls just emit deltas. Call FinishText (or Close) to balance with text-end.

func (*Writer) WriteToolCall

func (w *Writer) WriteToolCall(toolCallID, toolName string, args json.RawMessage, providerExecuted bool)

WriteToolCall emits the tool-input-start + tool-input-available pair. The two-step shape is the SDK's protocol (the start exists for streaming arg deltas, which we don't bother with — we always have the complete input in hand).

providerExecuted controls whether the SDK fires `onToolCall` on the client when the tool-input-available chunk arrives:

false — we're announcing a CLIENT-SIDE tool. The SPA is expected
        to execute it via its hooks bag and post the result back
        via addToolResult. onToolCall MUST fire.
true  — we're announcing a SERVER-SIDE tool. We're just informing
        the UI so it can render a "calling X..." indicator;
        execution is happening server-side and the result chunk
        follows shortly. onToolCall MUST NOT fire — if it did, the
        SPA's handler would see an unknown-tool name and reply
        with an error result that races the actual server result.

The SDK's tool-input-available handler skips onToolCall when providerExecuted is truthy (see ai/dist/index.mjs ~5669: `if (onToolCall && !chunk.providerExecuted)`).

func (*Writer) WriteToolError

func (w *Writer) WriteToolError(toolCallID, _, msg string)

WriteToolError emits a tool-output-error so the SDK can render the failure inline and the LLM gets is_error=true on the next turn. toolName unused on the wire; see WriteToolResult.

func (*Writer) WriteToolResult

func (w *Writer) WriteToolResult(toolCallID, _, result string)

WriteToolResult emits the result of a server-executed tool. The SDK keeps it in the message history; the assistant references it on the next turn.

toolName is accepted but currently unused on the wire — the SDK correlates results to calls via toolCallId. Kept on the signature so the call sites read symmetrically with WriteToolCall / WriteToolError, and so a future SDK that wants the name on the result chunk can be wired in without churning every caller.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL