mcpharness

v0.3.0 (Latest)
Published: May 27, 2026 License: MIT Imports: 14 Imported by: 0

README

mcpharness


Testing toolkit for Go MCP server authors.

The Model Context Protocol (MCP) ecosystem in Go has two competing server frameworks (mark3labs/mcp-go and modelcontextprotocol/go-sdk) plus a growing set of domain-specific MCP servers (GitHub, Grafana, Kubernetes, Terraform, …). Every project ends up hand-rolling roughly the same test plumbing: an in-process client to drive the server, a way to record real sessions for regression tests, an assertion harness for tool behaviour.

mcpharness fills that gap with a small SDK-neutral surface.

Features

  • mcpharness.Client — neutral interface every adapter implements. Two adapters ship today: mark3 for mark3labs/mcp-go (8.7k ⭐, the de-facto Go MCP framework), and sdk for modelcontextprotocol/go-sdk (4.6k ⭐, the official Anthropic SDK).
  • Recorder wraps any Client and writes every call (initialize, tools/list, tools/call, resources/list, resources/read) to a JSON Lines stream.
  • Replay reads a recorded stream back and returns a deterministic Client that asserts each call matches the recording. Catches three regression classes: wrong method, wrong params, extra/missing calls.
  • FuzzCallTool plugs any Client + tool name into Go's native *testing.F fuzz infrastructure. Per-iteration timeout, fails on panic / hang / transport error, accepts IsError=true as a handled-error signal.
  • Snapshot is a golden-file regression helper for any value, with stable JSON canonicalisation. The first run creates the baseline; subsequent runs diff against it. Set MCPHARNESS_UPDATE_SNAPSHOTS=1 to regenerate all baselines at once.
  • conformance.Run — bridge to Anthropic's official conformance test harness. Drive npx @modelcontextprotocol/conformance from go test, fail loudly on any scenario regression. Skips automatically when Node.js is unavailable.

Why not just use the framework's own client?

You can, and you should for simple smoke tests. mcpharness exists for the moments when one client isn't enough:

  • Test against multiple framework versions without rewriting tests — the neutral Client interface stays put when the underlying framework's request types churn.
  • Record once, replay forever — capture a real session against your production server, commit the file, then run deterministic regression tests in CI without standing up the real server.
  • Catch divergence early: Replay fails loudly on wrong method, wrong params, or missing/extra calls, so behaviour drift surfaces as a precise test failure rather than as a silently wrong assertion.

Install

go get github.com/ultramcu/mcpharness@latest

Quick start (mark3labs adapter)

package mcpserver_test

import (
    "context"
    "testing"

    "github.com/mark3labs/mcp-go/mcp"
    "github.com/mark3labs/mcp-go/server"
    "github.com/ultramcu/mcpharness/mark3"
)

func TestEcho(t *testing.T) {
    srv := server.NewMCPServer("echo", "0.1.0")
    srv.AddTool(
        mcp.NewTool("echo", mcp.WithString("text", mcp.Required())),
        func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
            text, err := req.RequireString("text")
            if err != nil {
                return mcp.NewToolResultError(err.Error()), nil
            }
            return mcp.NewToolResultText(text), nil
        },
    )

    client, err := mark3.New(srv)
    if err != nil { t.Fatal(err) }
    defer client.Close()

    if _, err := client.Initialize(context.Background()); err != nil {
        t.Fatal(err)
    }
    res, err := client.CallTool(context.Background(), "echo", map[string]any{"text": "ping"})
    if err != nil { t.Fatal(err) }
    if res.IsError { t.Fatal("tool returned IsError") }
    // res.Content[0] == map[string]any{"type":"text", "text":"ping"}
}

Record and replay

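import (
    "context"
    "os"
    "testing"

    "github.com/ultramcu/mcpharness"
    "github.com/ultramcu/mcpharness/mark3"
)

// Phase 1: record a real session into testdata/echo.jsonl. Run it once (and
// again after intentional behaviour changes), then commit the file. Gate it
// behind an env var or build tag so CI only runs TestReplay.
func TestRecord(t *testing.T) {
    f, err := os.Create("testdata/echo.jsonl") // testdata/ must already exist
    if err != nil { t.Fatal(err) }
    defer f.Close()

    live, err := mark3.New(buildServer(t)) // buildServer: your own server constructor
    if err != nil { t.Fatal(err) }
    rec := mcpharness.NewRecorder(live, f)
    defer rec.Close() // runs before f.Close (defers are LIFO) and closes live

    ctx := context.Background()
    if _, err := rec.Initialize(ctx); err != nil { t.Fatal(err) }
    if _, err := rec.CallTool(ctx, "echo", map[string]any{"text": "ping"}); err != nil {
        t.Fatal(err)
    }
}

// Phase 2: in CI, replay deterministically without the real server
func TestReplay(t *testing.T) {
    f, err := os.Open("testdata/echo.jsonl")
    if err != nil { t.Fatal(err) }
    defer f.Close()

    replay := mcpharness.NewReplay(t, f)
    defer replay.Close() // fails the test if any recorded entries were not consumed

    ctx := context.Background()
    replay.Initialize(ctx)                                       // asserts seq=1 matches
    replay.CallTool(ctx, "echo", map[string]any{"text": "ping"}) // asserts seq=2 matches
}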

If the second-phase test calls a method that doesn't match the recording, or passes different params, Replay calls t.Fatalf with a precise diff — no silent drift.

Conformance bridge

import "github.com/ultramcu/mcpharness/conformance"

func TestMCPConformance(t *testing.T) {
    srv := startMyServerOnRandomPort(t) // your own HTTP transport setup
    conformance.Run(t, srv.URL)         // skips if npx not on PATH
}

Narrow the run to a single suite for faster iteration:

conformance.Run(t, srv.URL, conformance.WithSuite("core"))
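The "skips if npx not on PATH" behaviour can be reproduced in your own helpers with `exec.LookPath` from the standard library. This is a minimal sketch that assumes a PATH lookup is the relevant check; how conformance.Run actually detects Node.js is not documented here, and `toolOnPath` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os/exec"
)

// toolOnPath reports whether an executable named name can be found on PATH.
// Hypothetical helper: a test would call t.Skip when ok is false, mirroring
// the skip-if-npx-is-missing behaviour described above.
func toolOnPath(name string) (path string, ok bool) {
	p, err := exec.LookPath(name)
	if err != nil {
		return "", false
	}
	return p, true
}

func main() {
	if p, ok := toolOnPath("npx"); ok {
		fmt.Println("npx found at", p)
	} else {
		fmt.Println("npx not on PATH; a conformance test would be skipped")
	}
}
```

In a test you would write `if _, ok := toolOnPath("npx"); !ok { t.Skip("npx not on PATH") }` before starting the server, so the expensive setup is avoided entirely on machines without Node.js.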

Fuzz a tool

func FuzzEchoTool(f *testing.F) {
    srv := buildEchoServer() // your own constructor for the server under test
    client, err := mark3.New(srv)
    if err != nil { f.Fatal(err) }
    defer client.Close()

    mcpharness.FuzzCallTool(f, client, "echo",
        map[string]any{"text": "hello"},
        map[string]any{"text": ""},
        map[string]any{},
    )
}
// go test -fuzz=FuzzEchoTool -fuzztime=30s ./...

Inputs that don't decode as JSON objects are silently skipped. Inputs that make the tool panic, hang past the per-iteration timeout, or surface a transport error fail the fuzz iteration — but a tool returning IsError=true is treated as a valid handled-error path.

Snapshot a result

res, err := client.CallTool(ctx, "echo", map[string]any{"text": "ping"})
if err != nil { t.Fatal(err) }
mcpharness.Snapshot(t, "echo-ping", res)

First run writes testdata/snapshots/echo-ping.json and logs that a baseline was created. Subsequent runs compare byte-for-byte after stable JSON canonicalisation. To intentionally regenerate after a behaviour change, set the env var:

MCPHARNESS_UPDATE_SNAPSHOTS=1 go test ./...

Roadmap

  • v0.1: Client + Recorder + Replay + mark3labs adapter. (shipped)
  • v0.2: adapter for modelcontextprotocol/go-sdk; conformance.Run bridge to the official npx @modelcontextprotocol/conformance harness. (shipped)
  • v0.3 (this release): FuzzCallTool harness on top of Go's native *testing.F; Snapshot golden-file helper with MCPHARNESS_UPDATE_SNAPSHOTS env override.
  • v0.4+: HTTP-transport spawner helper to make the conformance bridge fully turnkey; resource-template support in the Client surface; multi-content ReadResource accessor.

Versioning

mcpharness follows SemVer. Until 1.0, any minor version bump may include breaking API changes (we'll keep them minimal and well-documented in the CHANGELOG).

Contributing

Issues and PRs welcome. Please open an issue first for any non-trivial change so we can align on direction before you spend time on a PR.

License

MIT © 2026 ultramcu

Documentation

Overview

Package mcpharness is a testing toolkit for Go MCP server authors.

The Model Context Protocol (MCP) ecosystem has two competing Go server frameworks — mark3labs/mcp-go and modelcontextprotocol/go-sdk — plus a growing set of domain-specific MCP servers (github, grafana, k8s, terraform, …). Every project ends up hand-rolling roughly the same test plumbing: a client to drive the server in-process, a way to record real sessions for regression tests, an assertion harness for tool behaviour.

mcpharness fills that gap with a small SDK-neutral surface:

  • Client is the test interface every adapter implements. Two adapters ship: github.com/ultramcu/mcpharness/mark3 for mark3labs/mcp-go, and github.com/ultramcu/mcpharness/sdk for the official modelcontextprotocol/go-sdk.

  • Recorder wraps any Client and writes every call's request and response to a JSON Lines stream. Replay reads such a stream back and returns a deterministic Client that asserts each call matches the recorded sequence.

  • FuzzCallTool plugs a Client + tool name into Go's native `*testing.F` fuzz infrastructure. Fails on panic, hang, or transport error; treats IsError=true as a handled error.

  • Snapshot is a golden-file regression helper for any value, with stable JSON canonicalisation. Set MCPHARNESS_UPDATE_SNAPSHOTS=1 to regenerate baselines after intentional behaviour changes.

Quick example using the mark3labs adapter:

import (
    "context"
    "testing"

    "github.com/mark3labs/mcp-go/server"
    "github.com/ultramcu/mcpharness/mark3"
)

func TestEcho(t *testing.T) {
    srv := server.NewMCPServer("echo", "0.1.0")
    // ... register tools on srv ...

    client, err := mark3.New(srv)
    if err != nil { t.Fatal(err) }
    defer client.Close()

    if _, err := client.Initialize(context.Background()); err != nil {
        t.Fatal(err)
    }
    tools, _ := client.ListTools(context.Background())
    if len(tools) == 0 {
        t.Fatal("no tools advertised")
    }
}

See the mark3 and sdk subpackages for adapter end-to-end patterns, and the conformance subpackage for driving Anthropic's official test harness from go test.

Index

Constants

const SnapshotDir = "testdata/snapshots"

SnapshotDir is the on-disk location for snapshot files, relative to the test's working directory (which Go conventionally sets to the package directory). Override per-call via WithDir.

const UpdateSnapshotsEnv = "MCPHARNESS_UPDATE_SNAPSHOTS"

UpdateSnapshotsEnv is the environment variable that, when set to a truthy value ("1", "true", "yes"), tells Snapshot to (re)write the on-disk snapshot file instead of comparing. Use it when you've intentionally changed behaviour and want to regenerate baselines:

MCPHARNESS_UPDATE_SNAPSHOTS=1 go test ./...

Variables

var ErrNotInitialized = errors.New("mcpharness: Initialize not called")

ErrNotInitialized signals a method was called before Initialize. Adapters MAY return this directly; the interface contract says Initialize must be the first call.

Functions

func FuzzCallTool added in v0.3.0

func FuzzCallTool(f *testing.F, client Client, toolName string, seeds ...map[string]any)

FuzzCallTool wires a Client + a tool name into Go's native fuzz infrastructure. Given one or more seed argument maps, it marshals each to JSON, registers the bytes as fuzz seeds, then on each fuzz iteration unmarshals the mutated bytes back into a map and invokes the tool with a hard per-call timeout.

The test FAILS only on conditions a well-behaved server should never produce on adversarial input:

  • The transport returns an error (treated as a protocol panic).
  • The call hangs past the per-call timeout.
  • The Go runtime panics inside the handler.

A tool returning `IsError = true` is acceptable: that's the spec-correct way to signal a handled tool execution error, and is exactly what well-formed validation should produce on garbage input.

Inputs that don't decode as a JSON object are silently skipped (via t.SkipNow inside the fuzz callback), so the corpus stays focused on "valid JSON, possibly hostile content" — which is the realistic threat model for an MCP tool fed by a possibly-misbehaving LLM.

Typical usage:

func FuzzEchoTool(f *testing.F) {
    srv := buildEchoServer() // your own constructor for the server under test
    client, err := mark3.New(srv)
    if err != nil { f.Fatal(err) }
    defer client.Close()
    mcpharness.FuzzCallTool(f, client, "echo",
        map[string]any{"text": "hello"},
        map[string]any{"text": ""},
        map[string]any{},
    )
}

Run with: `go test -fuzz=FuzzEchoTool -fuzztime=30s ./...`
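The seed and skip handling described above reduces to two JSON steps. A minimal sketch with hypothetical helper names: in a real fuzz target each seed would be passed to f.Add, and the callback given to f.Fuzz would call t.SkipNow whenever decodeArgs reports false.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// seedsToBytes marshals each seed argument map to JSON bytes, the form the
// fuzz engine mutates.
func seedsToBytes(seeds ...map[string]any) ([][]byte, error) {
	out := make([][]byte, 0, len(seeds))
	for _, s := range seeds {
		b, err := json.Marshal(s)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

// decodeArgs turns mutated bytes back into an argument map. ok is false for
// anything that is not a JSON object (arrays, scalars, null, malformed JSON).
func decodeArgs(data []byte) (args map[string]any, ok bool) {
	if err := json.Unmarshal(data, &args); err != nil || args == nil {
		return nil, false
	}
	return args, true
}

func main() {
	seeds, err := seedsToBytes(
		map[string]any{"text": "hello"},
		map[string]any{"text": ""},
		map[string]any{},
	)
	if err != nil {
		panic(err)
	}
	for _, s := range seeds {
		fmt.Println(string(s))
	}
	for _, in := range []string{`{"text":"x"}`, `[1,2]`, `null`, `{`} {
		_, ok := decodeArgs([]byte(in))
		fmt.Printf("%-14s decodes as object: %v\n", in, ok)
	}
}
```

The explicit `args == nil` check matters: `json.Unmarshal` accepts `null` into a map without error and simply leaves it nil, so without the check a literal `null` would slip through as an "object".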

func Snapshot added in v0.3.0

func Snapshot(t TestingT, name string, got any, opt ...SnapshotOption)

Snapshot compares got against a stored snapshot file named after `name`. Behaviour:

  • On first run (file missing): the canonical JSON form of got is written to disk and the test is logged-but-not-failed.
  • On subsequent runs: got is canonicalised the same way and compared byte-for-byte to the stored file; on divergence, Snapshot calls t.Fatalf with a line-aware diff.
  • When UpdateSnapshotsEnv is set, the file is (re)written and the test is logged-but-not-failed regardless of any divergence.

Snapshots are canonicalised by `json.MarshalIndent` with stable (lexicographic) map key order, so unrelated changes in input map ordering do not produce false diffs.

The directory `testdata/snapshots/` is created on demand. Snapshot files are intended to be committed to the repo as test fixtures.
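The canonicalisation relies on a property of encoding/json: map keys are always marshalled in sorted order. This sketch shows two maps built in different orders producing identical output; `canonical` is a hypothetical helper, and the two-space indent is an assumption about the exact format mcpharness writes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonical renders v as indented JSON. encoding/json sorts map keys, so maps
// with equal contents always produce identical bytes. Struct fields, by
// contrast, are emitted in declaration order.
func canonical(v any) (string, error) {
	b, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	a := map[string]any{"type": "text", "text": "ping"}
	b := map[string]any{}
	b["text"] = "ping"
	b["type"] = "text"

	ca, _ := canonical(a)
	cb, _ := canonical(b)
	fmt.Println(ca == cb)
	fmt.Println(ca)
}
```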

Types

type CallToolResult

type CallToolResult struct {
	Content []any
	IsError bool
}

CallToolResult mirrors `tools/call` results. Content is the raw content array from the server — each element is typically a map with at least a "type" key ("text", "image", "resource"). IsError is the MCP-spec flag that signals a tool-execution error (distinct from a transport or protocol error, which the caller gets via err).

type Client

type Client interface {
	// Initialize performs the MCP initialize handshake and returns the
	// server's advertised name/version/capabilities. Must be called once
	// before any other method.
	Initialize(ctx context.Context) (*InitResult, error)

	// ListTools returns the server's advertised tools after handshake.
	ListTools(ctx context.Context) ([]Tool, error)

	// CallTool invokes a tool by name with structured arguments. The
	// returned result preserves the server's content blocks and the
	// IsError flag (which distinguishes a tool-execution error from a
	// protocol error — the latter comes back via the error return).
	CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)

	// ListResources returns the server's advertised resources.
	ListResources(ctx context.Context) ([]Resource, error)

	// ReadResource fetches the contents of a single resource by URI.
	ReadResource(ctx context.Context, uri string) (*ResourceContents, error)

	// Close releases the underlying client/transport. Safe to call
	// multiple times; subsequent calls return nil.
	Close() error
}

Client is the SDK-neutral test interface that MCP server tests target.

Adapters in subpackages (mark3 for mark3labs/mcp-go, sdk for modelcontextprotocol/go-sdk) provide concrete implementations against the underlying server frameworks. Tests written against this interface survive a change of framework and minor version drift.

The interface intentionally covers the 80% of calls that tool tests actually need (Initialize, list/call tools, list/read resources). Prompts, sampling, completion, subscriptions, and logging are deferred until a real test demands them — keeping the surface small is the whole point.

type InitResult

type InitResult struct {
	ServerName      string
	ServerVersion   string
	ProtocolVersion string
	Capabilities    map[string]any
}

InitResult mirrors the subset of MCP InitializeResult that tests care about. Capabilities is the raw map from the server — tests can type-assert into it for capability-specific checks.

type Recorder

type Recorder struct {
	// contains filtered or unexported fields
}

Recorder wraps a Client and writes every method call's request and response to an io.Writer in JSON Lines format. One line per call.

The recorded stream is a deterministic regression artifact: capture a real server session once, commit the file (testdata/<name>.jsonl), then drive future tests with Replay against the file. If the server changes its behaviour, the test fails with a precise diff.

Recorder is safe for concurrent use only when the wrapped Client is.

func NewRecorder

func NewRecorder(inner Client, out io.Writer) *Recorder

NewRecorder wraps inner and writes JSON Lines records to out. Typical usage in a test:

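f, err := os.Create("testdata/echo.jsonl")
if err != nil { t.Fatal(err) }
defer f.Close()
client := mcpharness.NewRecorder(realClient, f) // realClient: any Client, e.g. from mark3.New
defer client.Close()                            // closes realClient; f is closed separately
// ... drive client as usual ...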

func (*Recorder) CallTool

func (r *Recorder) CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)

CallTool records the call.

func (*Recorder) Close

func (r *Recorder) Close() error

Close releases the underlying client. The recording writer is the caller's responsibility (we don't own it — the caller passed it in).

func (*Recorder) Initialize

func (r *Recorder) Initialize(ctx context.Context) (*InitResult, error)

Initialize records the call.

func (*Recorder) ListResources

func (r *Recorder) ListResources(ctx context.Context) ([]Resource, error)

ListResources records the call.

func (*Recorder) ListTools

func (r *Recorder) ListTools(ctx context.Context) ([]Tool, error)

ListTools records the call.

func (*Recorder) ReadResource

func (r *Recorder) ReadResource(ctx context.Context, uri string) (*ResourceContents, error)

ReadResource records the call.

type ReplayClient

type ReplayClient struct {
	// contains filtered or unexported fields
}

ReplayClient is a Client that returns deterministic responses from a previously-recorded JSON Lines stream. Each call asserts the method and params match the next recorded entry; if they don't, the test fails via t.Fatalf with a precise diff so the divergence is obvious.

Replay is single-goroutine: it advances a position counter and is not safe for concurrent use. Recordings from concurrent runs are inherently non-deterministic and should not be replayed.

func NewReplay

func NewReplay(t TestingT, r io.Reader) *ReplayClient

NewReplay reads a JSON Lines stream from r and returns a ReplayClient that walks it on each call. The Client implementation reports any divergence (wrong method, wrong params, extra call, missing call) via t.Fatalf.

func (*ReplayClient) CallTool

func (c *ReplayClient) CallTool(ctx context.Context, name string, args map[string]any) (*CallToolResult, error)

func (*ReplayClient) Close

func (c *ReplayClient) Close() error

Close marks the replay as finished and fails the test if any recorded entries went unused (missing calls). This catches the "test didn't drive everything we recorded" class of regression.

func (*ReplayClient) Initialize

func (c *ReplayClient) Initialize(ctx context.Context) (*InitResult, error)

func (*ReplayClient) ListResources

func (c *ReplayClient) ListResources(ctx context.Context) ([]Resource, error)

func (*ReplayClient) ListTools

func (c *ReplayClient) ListTools(ctx context.Context) ([]Tool, error)

func (*ReplayClient) ReadResource

func (c *ReplayClient) ReadResource(ctx context.Context, uri string) (*ResourceContents, error)

type Resource

type Resource struct {
	URI         string
	Name        string
	Description string
	MimeType    string
}

Resource is the SDK-neutral view of a resource entry from `resources/list`.

type ResourceContents

type ResourceContents struct {
	URI      string
	MimeType string
	Text     string
	Blob     []byte
}

ResourceContents holds the body of a single resource read. Exactly one of Text or Blob is populated based on MimeType — callers can switch on `len(Blob) > 0` or check MimeType.
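Following the advice to switch on `len(Blob) > 0`, a consumer can normalise either variant to bytes. The struct is mirrored locally so the sketch compiles on its own, and `body` is a hypothetical helper.

```go
package main

import "fmt"

// ResourceContents mirrors the mcpharness type of the same name.
type ResourceContents struct {
	URI      string
	MimeType string
	Text     string
	Blob     []byte
}

// body returns whichever payload is populated and whether it is binary.
func body(rc *ResourceContents) (data []byte, binary bool) {
	if len(rc.Blob) > 0 {
		return rc.Blob, true
	}
	return []byte(rc.Text), false
}

func main() {
	for _, rc := range []*ResourceContents{
		{URI: "file:///notes.txt", MimeType: "text/plain", Text: "hello"},
		{URI: "file:///logo.png", MimeType: "image/png", Blob: []byte{0x89, 'P', 'N', 'G'}},
	} {
		data, binary := body(rc)
		fmt.Printf("%s: %d bytes, binary=%v\n", rc.URI, len(data), binary)
	}
}
```

One caveat of the length check: an empty blob and an empty text body look the same, so when that distinction matters, branch on MimeType instead.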

type SnapshotOption added in v0.3.0

type SnapshotOption func(*snapshotConfig)

SnapshotOption configures Snapshot.

func WithDir added in v0.3.0

func WithDir(dir string) SnapshotOption

WithDir overrides the directory snapshots live in (default: testdata/snapshots).

func WithExt added in v0.3.0

func WithExt(ext string) SnapshotOption

WithExt overrides the on-disk file extension (default: ".json").

type TestingT

type TestingT interface {
	Helper()
	Fatalf(format string, args ...any)
	Logf(format string, args ...any)
}

TestingT is the subset of testing.TB that this package needs. Pass *testing.T in your test; the indirection lets the package stay free of a testing-only import path in production code.

Methods used:

  • Helper: marks the caller as a helper for error attribution.
  • Fatalf: report a failure and abort the test.
  • Logf: emit non-fatal informational output (used by Snapshot's first-run / update paths).

type Tool

type Tool struct {
	Name        string
	Description string
	InputSchema map[string]any
}

Tool is the SDK-neutral view of a tool entry from `tools/list`. InputSchema is the raw JSON schema, kept as a generic map so tests can assert on any field without binding to a schema-library type.

Directories

Path          Synopsis
conformance   Package conformance bridges Anthropic's official conformance test harness (https://github.com/modelcontextprotocol/conformance) to Go tests.
mark3         Package mark3 is an mcpharness adapter for mark3labs/mcp-go.
sdk           Package sdk is an mcpharness adapter for modelcontextprotocol/go-sdk (the official Anthropic Go SDK).
