Documentation
Overview
Package mcpharness is a testing toolkit for Go MCP server authors.
The Model Context Protocol (MCP) ecosystem has two competing Go server frameworks — mark3labs/mcp-go and modelcontextprotocol/go-sdk — plus a growing set of domain-specific MCP servers (github, grafana, k8s, terraform, …). Every project ends up hand-rolling roughly the same test plumbing: a client to drive the server in-process, a way to record real sessions for regression tests, an assertion harness for tool behaviour.
mcpharness fills that gap with a small SDK-neutral surface:
- Client is the test interface every adapter implements. Two adapters ship: github.com/ultramcu/mcpharness/mark3 for mark3labs/mcp-go, and github.com/ultramcu/mcpharness/sdk for the official modelcontextprotocol/go-sdk.
- Recorder wraps any Client and writes every call's request and response to a JSON Lines stream. NewReplay reads such a stream back and returns a deterministic ReplayClient that asserts each call matches the recorded sequence.
- FuzzCallTool plugs a Client and a tool name into Go's native `*testing.F` fuzz infrastructure. It fails on a panic, a hang, or a transport error, and treats IsError=true as a handled error.
- Snapshot is a golden-file regression helper for any value, with stable JSON canonicalisation. Set MCPHARNESS_UPDATE_SNAPSHOTS=1 to regenerate baselines after intentional behaviour changes.
Quick example using the mark3labs adapter:

	import (
		"context"
		"testing"

		"github.com/mark3labs/mcp-go/server"
		"github.com/ultramcu/mcpharness/mark3"
	)

	func TestEcho(t *testing.T) {
		srv := server.NewMCPServer("echo", "0.1.0")
		// ... register tools on srv ...

		client, err := mark3.New(srv)
		if err != nil {
			t.Fatal(err)
		}
		defer client.Close()

		ctx := context.Background()
		if _, err := client.Initialize(ctx); err != nil {
			t.Fatal(err)
		}
		tools, err := client.ListTools(ctx)
		if err != nil {
			t.Fatal(err)
		}
		if len(tools) == 0 {
			t.Fatal("no tools advertised")
		}
	}

See the mark3 and sdk subpackages for end-to-end adapter patterns, and the conformance subpackage for driving Anthropic's official conformance harness from go test.
Index
- Constants
- Variables
- func FuzzCallTool(f *testing.F, client Client, toolName string, seeds ...map[string]any)
- func Snapshot(t TestingT, name string, got any, opt ...SnapshotOption)
- type CallToolResult
- type Client
- type InitResult
- type Recorder
- func (r *Recorder) CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)
- func (r *Recorder) Close() error
- func (r *Recorder) Initialize(ctx context.Context) (*InitResult, error)
- func (r *Recorder) ListResources(ctx context.Context) ([]Resource, error)
- func (r *Recorder) ListTools(ctx context.Context) ([]Tool, error)
- func (r *Recorder) ReadResource(ctx context.Context, uri string) (*ResourceContents, error)
- type ReplayClient
- func (c *ReplayClient) CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)
- func (c *ReplayClient) Close() error
- func (c *ReplayClient) Initialize(ctx context.Context) (*InitResult, error)
- func (c *ReplayClient) ListResources(ctx context.Context) ([]Resource, error)
- func (c *ReplayClient) ListTools(ctx context.Context) ([]Tool, error)
- func (c *ReplayClient) ReadResource(ctx context.Context, uri string) (*ResourceContents, error)
- type Resource
- type ResourceContents
- type SnapshotOption
- type TestingT
- type Tool
Constants
const SnapshotDir = "testdata/snapshots"
SnapshotDir is the on-disk location for snapshot files, relative to the test's working directory (which Go conventionally sets to the package directory). Override per-call via WithDir.
const UpdateSnapshotsEnv = "MCPHARNESS_UPDATE_SNAPSHOTS"
UpdateSnapshotsEnv is the environment variable that, when set to a truthy value ("1", "true", "yes"), tells Snapshot to (re)write the on-disk snapshot file instead of comparing. Use it when you've intentionally changed behaviour and want to regenerate baselines:

	MCPHARNESS_UPDATE_SNAPSHOTS=1 go test ./...

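As an illustration of the documented truthy values, here is a minimal sketch of the check. The helper names `isTruthy` and `updateRequested` are hypothetical, and the documentation does not say whether matching is case-insensitive, so this version assumes exact, case-sensitive matches on "1", "true", and "yes".

```go
package main

import (
	"fmt"
	"os"
)

// updateEnv repeats the documented value of UpdateSnapshotsEnv.
const updateEnv = "MCPHARNESS_UPDATE_SNAPSHOTS"

// isTruthy reports whether v is one of the documented truthy spellings.
// Exact, case-sensitive matching is an assumption of this sketch.
func isTruthy(v string) bool {
	switch v {
	case "1", "true", "yes":
		return true
	default:
		return false
	}
}

// updateRequested is a hypothetical helper that reads the variable and
// applies isTruthy; an unset variable reads as "" and therefore as false.
func updateRequested() bool {
	return isTruthy(os.Getenv(updateEnv))
}

func main() {
	fmt.Println("regenerate snapshots:", updateRequested())
}
```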
Variables
var ErrNotInitialized = errors.New("mcpharness: Initialize not called")
ErrNotInitialized signals that a method was called before Initialize. The Client contract requires Initialize to be the first call, and adapters MAY return this error directly when a test breaks that contract.
Functions
func FuzzCallTool (added in v0.3.0)
func FuzzCallTool(f *testing.F, client Client, toolName string, seeds ...map[string]any)
FuzzCallTool wires a Client + a tool name into Go's native fuzz infrastructure. Given one or more seed argument maps, it marshals each to JSON, registers the bytes as fuzz seeds, then on each fuzz iteration unmarshals the mutated bytes back into a map and invokes the tool with a hard per-call timeout.
The test FAILS only on conditions a well-behaved server should never produce on adversarial input:
- The transport returns an error (treated as a protocol panic).
- The call hangs past the per-call timeout.
- The Go runtime panics inside the handler.
A tool returning `IsError = true` is acceptable: that is the spec-correct way to signal a handled tool-execution error, and exactly what well-formed validation should produce on garbage input.
Inputs that don't decode as a JSON object are silently skipped (via t.SkipNow inside the fuzz callback), so the corpus stays focused on "valid JSON, possibly hostile content". That is the realistic threat model for an MCP tool fed by a possibly misbehaving LLM.
Typical usage:

	func FuzzEchoTool(f *testing.F) {
		srv := buildEchoServer()
		client, err := mark3.New(srv)
		if err != nil {
			f.Fatal(err)
		}
		defer client.Close()

		mcpharness.FuzzCallTool(f, client, "echo",
			map[string]any{"text": "hello"},
			map[string]any{"text": ""},
			map[string]any{},
		)
	}

Run it from the package directory, since `go test -fuzz` accepts only a single package: `go test -fuzz=FuzzEchoTool -fuzztime=30s .`
func Snapshot (added in v0.3.0)
func Snapshot(t TestingT, name string, got any, opt ...SnapshotOption)
Snapshot compares got against a stored snapshot file named after `name`. Behaviour:
- On first run (file missing): the canonical JSON form of got is written to disk and the test is logged-but-not-failed.
- On subsequent runs: got is canonicalised the same way and compared byte-for-byte to the stored file; on divergence, Snapshot calls t.Fatalf with a line-aware diff.
- When UpdateSnapshotsEnv is set, the file is (re)written and the test is logged-but-not-failed regardless of any divergence.
Snapshots are canonicalised by `json.MarshalIndent` with stable (lexicographic) map key order, so unrelated changes in input map ordering do not produce false diffs.
The snapshot directory (`testdata/snapshots/` by default, or the one given to WithDir) is created on demand. Snapshot files are intended to be committed to the repo as test fixtures.
Types
type CallToolResult
CallToolResult mirrors `tools/call` results. Content is the raw content array from the server; each element is typically a map with at least a "type" key ("text", "image", "resource"). IsError is the MCP-spec flag that signals a tool-execution error, as distinct from a transport or protocol error, which the caller receives via err.
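Tests often need only the text of a tool result. This sketch assumes `Content` is a `[]any` of decoded JSON maps and `IsError` a `bool` (the field types are not listed above), and collects the "text" of every text block; `callToolResult` and `joinedText` are hypothetical mirrors, not the package's API.

```go
package main

import (
	"fmt"
	"strings"
)

// callToolResult mirrors the documented fields; the exact Go types used by
// mcpharness are not shown, so []any and bool are assumptions.
type callToolResult struct {
	Content []any
	IsError bool
}

// joinedText concatenates the "text" field of every content block whose
// "type" is "text", skipping images, resources, and unexpected shapes.
func joinedText(res *callToolResult) string {
	var parts []string
	for _, block := range res.Content {
		m, ok := block.(map[string]any)
		if !ok || m["type"] != "text" {
			continue
		}
		if s, ok := m["text"].(string); ok {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, "\n")
}

func main() {
	res := &callToolResult{Content: []any{
		map[string]any{"type": "text", "text": "hello"},
		map[string]any{"type": "image", "mimeType": "image/png"},
	}}
	fmt.Println(joinedText(res), res.IsError) // hello false
}
```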
type Client

	type Client interface {
		// Initialize performs the MCP initialize handshake and returns the
		// server's advertised name/version/capabilities. Must be called once
		// before any other method.
		Initialize(ctx context.Context) (*InitResult, error)
		// ListTools returns the server's advertised tools after handshake.
		ListTools(ctx context.Context) ([]Tool, error)
		// CallTool invokes a tool by name with structured arguments. The
		// returned result preserves the server's content blocks and the
		// IsError flag (which distinguishes a tool-execution error from a
		// protocol error; the latter comes back via the error return).
		CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)
		// ListResources returns the server's advertised resources.
		ListResources(ctx context.Context) ([]Resource, error)
		// ReadResource fetches the contents of a single resource by URI.
		ReadResource(ctx context.Context, uri string) (*ResourceContents, error)
		// Close releases the underlying client/transport. Safe to call
		// multiple times; subsequent calls return nil.
		Close() error
	}

Client is the SDK-neutral test interface that MCP server tests target.
Adapters in subpackages (mark3 and sdk) provide concrete implementations against the underlying server frameworks. Tests written against this interface survive a change of framework and minor version drift.
The interface intentionally covers the 80% of calls that tool tests actually need (Initialize, list/call tools, list/read resources). Prompts, sampling, completion, subscriptions, and logging are deferred until a real test demands them; keeping the surface small is the whole point.
type InitResult

	type InitResult struct {
		ServerName      string
		ServerVersion   string
		ProtocolVersion string
		Capabilities    map[string]any
	}

InitResult mirrors the subset of MCP InitializeResult that tests care about. Capabilities is the raw map from the server; tests can type-assert into it for capability-specific checks.
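For example, a capability check might look like this sketch. It assumes nested capability objects arrive as `map[string]any`, which is typical for decoded JSON but not guaranteed by the documentation above. The `tools.listChanged` flag comes from the MCP initialize result, and `hasToolListChanged` is a hypothetical helper.

```go
package main

import "fmt"

// hasToolListChanged digs into a raw capabilities map shaped like the MCP
// initialize result, e.g. {"tools": {"listChanged": true}}. Any missing key
// or unexpected type yields false.
func hasToolListChanged(caps map[string]any) bool {
	tools, ok := caps["tools"].(map[string]any)
	if !ok {
		return false
	}
	v, ok := tools["listChanged"].(bool)
	return ok && v
}

func main() {
	caps := map[string]any{"tools": map[string]any{"listChanged": true}}
	fmt.Println(hasToolListChanged(caps))             // true
	fmt.Println(hasToolListChanged(map[string]any{})) // false
}
```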
type Recorder

	type Recorder struct {
		// contains filtered or unexported fields
	}

Recorder wraps a Client and writes every method call's request and response to an io.Writer in JSON Lines format. One line per call.
The recorded stream is a deterministic regression artifact: capture a real server session once, commit the file (testdata/<name>.jsonl), then drive future tests with a ReplayClient from NewReplay over that file. If a later run issues a different sequence of calls, the test fails with a precise diff.
Recorder is safe for concurrent use only when the wrapped Client is.
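func NewRecorder
NewRecorder wraps inner and writes JSON Lines records to out. Typical usage in a test:

	f, err := os.Create("testdata/echo.jsonl") // the testdata directory must already exist
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	client := mcpharness.NewRecorder(realClient, f)
	defer client.Close() // runs before f.Close, since defers are LIFO
	// ... drive client as usual ...
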
func (*Recorder) CallTool
func (r *Recorder) CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)
CallTool records the call.
func (*Recorder) Close
func (r *Recorder) Close() error
Close closes the wrapped Client. It does not close the writer passed to NewRecorder: the caller owns that writer and must close it separately.
func (*Recorder) Initialize
func (r *Recorder) Initialize(ctx context.Context) (*InitResult, error)
Initialize records the call.
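func (*Recorder) ListResources
func (r *Recorder) ListResources(ctx context.Context) ([]Resource, error)
ListResources records the call.
func (*Recorder) ReadResource
func (r *Recorder) ReadResource(ctx context.Context, uri string) (*ResourceContents, error)
ReadResource records the call.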
type ReplayClient

	type ReplayClient struct {
		// contains filtered or unexported fields
	}

ReplayClient is a Client that returns deterministic responses from a previously-recorded JSON Lines stream. Each call asserts the method and params match the next recorded entry; if they don't, the test fails via t.Fatalf with a precise diff so the divergence is obvious.
ReplayClient is single-goroutine: it advances a position counter and is not safe for concurrent use. Recordings of concurrent runs are inherently non-deterministic and should not be replayed.
func NewReplay
func NewReplay(t TestingT, r io.Reader) *ReplayClient
NewReplay reads a JSON Lines stream from r and returns a ReplayClient that walks it on each call. The Client implementation reports any divergence (wrong method, wrong params, extra call, missing call) via t.Fatalf.
func (*ReplayClient) CallTool
func (c *ReplayClient) CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)
func (*ReplayClient) Close
func (c *ReplayClient) Close() error
Close marks the replay as finished and fails the test if any recorded entries went unused (missing calls). This catches the "test didn't drive everything we recorded" class of regression.
func (*ReplayClient) Initialize
func (c *ReplayClient) Initialize(ctx context.Context) (*InitResult, error)
func (*ReplayClient) ListResources
func (c *ReplayClient) ListResources(ctx context.Context) ([]Resource, error)
func (*ReplayClient) ListTools
func (c *ReplayClient) ListTools(ctx context.Context) ([]Tool, error)
func (*ReplayClient) ReadResource
func (c *ReplayClient) ReadResource(ctx context.Context, uri string) (*ResourceContents, error)
type ResourceContents
ResourceContents holds the body of a single resource read. Exactly one of Text or Blob is populated, depending on MimeType; callers can branch on `len(Blob) > 0` or inspect MimeType.
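A sketch of the branch described above. The `resourceContents` mirror and the `describe` helper are hypothetical; the sketch assumes `Blob` is a `[]byte` and the other fields are strings, and only the field names come from the documentation.

```go
package main

import "fmt"

// resourceContents mirrors the documented field names; the field types are
// assumptions about the real ResourceContents.
type resourceContents struct {
	MimeType string
	Text     string
	Blob     []byte
}

// describe checks Blob first, as suggested above, and falls back to Text
// for textual resources.
func describe(rc resourceContents) string {
	if len(rc.Blob) > 0 {
		return fmt.Sprintf("binary %s, %d bytes", rc.MimeType, len(rc.Blob))
	}
	return fmt.Sprintf("text %s: %q", rc.MimeType, rc.Text)
}

func main() {
	fmt.Println(describe(resourceContents{MimeType: "text/plain", Text: "hi"}))
	fmt.Println(describe(resourceContents{MimeType: "image/png", Blob: []byte{0x89, 'P', 'N', 'G'}}))
}
```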
type SnapshotOption (added in v0.3.0)
type SnapshotOption func(*snapshotConfig)
SnapshotOption configures Snapshot.
func WithDir (added in v0.3.0)
func WithDir(dir string) SnapshotOption
WithDir overrides the directory snapshots live in (default: testdata/snapshots).
func WithExt (added in v0.3.0)
func WithExt(ext string) SnapshotOption
WithExt overrides the on-disk file extension (default: ".json").
type TestingT

	type TestingT interface {
		Helper()
		Fatalf(format string, args ...any)
		Logf(format string, args ...any)
	}

TestingT is the subset of testing.TB that this package needs. Pass *testing.T in your test; any testing.TB satisfies it, and the narrow interface also lets you substitute a fake implementation when testing your own helpers.
Methods used:
- Helper: marks the caller as a helper for error attribution.
- Fatalf: report a failure and abort the test.
- Logf: emit non-fatal informational output (used by Snapshot's first-run / update paths).
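Because TestingT is an interface, a fake can stand in for *testing.T when you test your own helpers. In this sketch `fakeT`, `requireEqual`, and `run` are hypothetical. The fake's `Fatalf` calls `runtime.Goexit`, as `testing.T.FailNow` does, so each check runs in its own goroutine.

```go
package main

import (
	"fmt"
	"runtime"
)

// testingT repeats the documented method set so the sketch stands alone.
type testingT interface {
	Helper()
	Fatalf(format string, args ...any)
	Logf(format string, args ...any)
}

// fakeT records messages instead of reporting them to the test runner.
type fakeT struct {
	failed bool
	msgs   []string
}

func (f *fakeT) Helper() {}

func (f *fakeT) Logf(format string, args ...any) {
	f.msgs = append(f.msgs, fmt.Sprintf(format, args...))
}

// Fatalf records the failure and stops the calling goroutine.
func (f *fakeT) Fatalf(format string, args ...any) {
	f.failed = true
	f.msgs = append(f.msgs, fmt.Sprintf(format, args...))
	runtime.Goexit()
}

// requireEqual is a hypothetical helper written against the interface.
func requireEqual(t testingT, got, want string) {
	t.Helper()
	if got != want {
		t.Fatalf("got %q, want %q", got, want)
	}
}

// run executes fn in its own goroutine so Goexit ends only that goroutine;
// deferred calls, including close(done), still run.
func run(fn func(t testingT)) *fakeT {
	ft := &fakeT{}
	done := make(chan struct{})
	go func() {
		defer close(done)
		fn(ft)
	}()
	<-done
	return ft
}

func main() {
	ft := run(func(t testingT) { requireEqual(t, "a", "b") })
	fmt.Println(ft.failed, ft.msgs) // true [got "a", want "b"]
}
```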
Directories

| Path | Synopsis |
|---|---|
| conformance | Package conformance bridges Anthropic's official conformance test harness (https://github.com/modelcontextprotocol/conformance) to Go tests. |
| mark3 | Package mark3 is an mcpharness adapter for mark3labs/mcp-go. |
| sdk | Package sdk is an mcpharness adapter for modelcontextprotocol/go-sdk (the official Anthropic Go SDK). |