iouring

v0.0.0-...-c23b921
Published: May 18, 2026 | License: Apache-2.0 | Imports: 9 | Imported by: 0

README

A minimal, zero-dependency Go vectored-writer library for append-heavy log and WAL workloads. Uses Linux io_uring on capable kernels and falls back transparently to writev(2) on other Unix platforms.

Why

The common pattern for log writers is one write(2) syscall per event. At 10 000 events/s that's 10 000 syscalls/s of pure overhead — audit events rarely exceed 500 bytes so device I/O is not the bottleneck; syscall cost is. This library collapses N events into one vectored submission, amortising the syscall overhead across the whole batch, and picks the fastest available primitive on the current kernel.
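The amortisation described above can be sketched as a small caller-side batcher. This is an illustration only: the batcher type, its fields, and the flushFunc parameter are hypothetical and not part of the package; flushFunc merely has the same shape as iouring.Writev so the sketch runs without the library.

```go
package main

import (
	"fmt"
	"io"
)

// flushFunc has the same shape as iouring.Writev. It is a parameter here so
// the sketch runs without the library or a real file descriptor.
type flushFunc func(fd int, bufs [][]byte) (int, error)

// batcher is a hypothetical caller-side helper; the package does not ship one.
type batcher struct {
	fd    int
	limit int // events per submission
	bufs  [][]byte
	flush flushFunc
}

// Add queues one event and submits the batch once it holds limit events.
func (b *batcher) Add(event []byte) error {
	b.bufs = append(b.bufs, event)
	if len(b.bufs) < b.limit {
		return nil
	}
	return b.Flush()
}

// Flush submits every queued event in a single call and resets the batch.
// A short write is reported as io.ErrShortWrite rather than retried.
func (b *batcher) Flush() error {
	if len(b.bufs) == 0 {
		return nil
	}
	total := 0
	for _, p := range b.bufs {
		total += len(p)
	}
	n, err := b.flush(b.fd, b.bufs)
	b.bufs = b.bufs[:0]
	if err == nil && n < total {
		err = io.ErrShortWrite
	}
	return err
}

func main() {
	calls := 0
	// fake stands in for iouring.Writev and only counts submissions.
	fake := func(fd int, bufs [][]byte) (int, error) {
		calls++
		n := 0
		for _, p := range bufs {
			n += len(p)
		}
		return n, nil
	}
	b := &batcher{fd: 1, limit: 4, flush: fake}
	for i := 0; i < 10; i++ {
		if err := b.Add([]byte("event\n")); err != nil {
			panic(err)
		}
	}
	if err := b.Flush(); err != nil {
		panic(err)
	}
	fmt.Printf("10 events, %d submissions\n", calls) // 10 events, 3 submissions
}
```

With a batch limit of 4, ten events cost three submissions instead of ten. A real batcher would also keep limit at or below MaxIovecs and flush on a timer so that a quiet period does not strand queued events.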

Install

go get github.com/axonops/audit/iouring

Requires Go 1.22+. No runtime dependencies beyond the Go standard library.

Quick start

Zero ceremony. Use it when you just want to write bytes and do not care about strategy selection or lifecycle:

import "github.com/axonops/audit/iouring"

f, err := os.OpenFile("audit.log",
    os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o600)
if err != nil {
    return err
}
defer f.Close()

// Writev takes int, not uintptr — cast the os.File fd.
n, err := iouring.Writev(int(f.Fd()), [][]byte{
    []byte("hello "),
    []byte("io_uring\n"),
})

Safe for concurrent use — an internal mutex on a lazily-initialised default Writer serialises calls.

Explicit instance

Construct your own Writer when you want a dedicated instance — typically to avoid sharing the default mutex under heavy concurrency, to force a specific strategy in tests, or to hook a *slog.Logger for startup diagnostics:

w, err := iouring.New()
if err != nil {
    return err
}
defer w.Close()
_, err = w.Writev(int(f.Fd()), bufs)

A Writer returned by New is NOT safe for concurrent Writev calls — callers must serialise.

Strategies

A Writer picks one of two strategies at construction:

Strategy        | Requirement                        | Used for
StrategyIouring | Linux 5.5+ with IORING_FEAT_NODROP | The fastest path on capable hosts
StrategyWritev  | Any Unix (Linux, Darwin, *BSD)     | Portable fallback

On Windows, New returns an error wrapping ErrUnsupported — the library exposes no vectored-write primitive there.

The negotiated strategy is reported by (w *Writer).Strategy() and is stable for the Writer's lifetime:

if w.Strategy() == iouring.StrategyIouring {
    // running on io_uring
}

Short writes

Writev may return (n, nil) with n less than the sum of buffer lengths — for example when writing to a pipe whose buffer is smaller than the iovec total. Callers are responsible for retrying the remainder:

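// total is the sum of len(b) over bufs; advance is a caller-supplied helper.
written := 0
for written < total {
    n, err := w.Writev(fd, bufs)
    if err != nil {
        return err
    }
    written += n
    bufs = advance(bufs, n) // drop fully-written iovecs, re-slice the partial one
}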
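The retry loop relies on an advance helper that the package does not export. A minimal sketch of one possible implementation follows; the name and signature are taken from the loop above, and the behaviour is an assumption based on writev(2) byte counting.

```go
package main

import "fmt"

// advance returns bufs with the first n bytes removed: iovecs that were
// written in full are dropped, and the first partially written iovec is
// re-sliced in place. It modifies the caller's outer slice element when it
// re-slices, so pass a copy if the original layout must be preserved.
func advance(bufs [][]byte, n int) [][]byte {
	for len(bufs) > 0 && n >= len(bufs[0]) {
		n -= len(bufs[0])
		bufs = bufs[1:]
	}
	if n > 0 && len(bufs) > 0 {
		bufs[0] = bufs[0][n:]
	}
	return bufs
}

func main() {
	bufs := [][]byte{[]byte("hello "), []byte("io_uring\n")}
	// Pretend the kernel accepted 8 of the 15 bytes.
	rest := advance(bufs, 8)
	fmt.Printf("%q\n", rest) // ["_uring\n"]
}
```

A production loop would typically also treat a return of (0, nil) as an error, otherwise a destination that makes no progress would spin forever.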

fd ownership

The library never takes ownership of the file descriptor passed to Writev / Write. The caller retains fd lifecycle responsibility: keep the fd open for the duration of each call and close it when finished. Closing the fd during a call is undefined behaviour.

Concurrency

  • Package-level iouring.Writev is safe for concurrent use via an internal mutex on the default writer.
  • (*Writer).Writev / .Write are NOT safe for concurrent calls; serialise them in the caller.
  • (*Writer).Close is idempotent and safe to call concurrently with itself, but not concurrently with a Writev or Write. Serialise the close against any in-flight writes.

For higher parallel throughput than the default mutex allows, construct one Writer per producer with New().

Benchmarks

On an AMD Ryzen 9 7950X, Linux kernel 6.14, /dev/shm target, 256-byte events, zero allocations across every path:

Strategy | Batch |   ns/op |  MB/s
iouring  |     1 |   6 404 |    40
iouring  |  1024 | 100 242 | 2 615
writev   |     1 |     591 |   433
writev   |  1024 |  89 274 | 2 936
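The MB/s column can be cross-checked from the ns/op column. The sketch below assumes that one op writes batch × 256 bytes and that 1 MB means 10^6 bytes; both assumptions are inferred from the figures rather than stated by the benchmark, and the function name is illustrative.

```go
package main

import "fmt"

// mbPerSec converts a ns/op figure into MB/s, assuming each op writes
// batch*eventBytes bytes and 1 MB = 1e6 bytes.
func mbPerSec(batch, eventBytes int, nsPerOp float64) float64 {
	bytesPerOp := float64(batch * eventBytes)
	// bytes/ns * 1e9 ns/s / 1e6 bytes/MB = bytes/ns * 1e3
	return bytesPerOp / nsPerOp * 1e3
}

func main() {
	rows := []struct {
		name  string
		batch int
		ns    float64
	}{
		{"iouring", 1, 6404},
		{"iouring", 1024, 100242},
		{"writev", 1, 591},
		{"writev", 1024, 89274},
	}
	for _, r := range rows {
		fmt.Printf("%-8s %5d %5.0f MB/s\n", r.name, r.batch, mbPerSec(r.batch, 256, r.ns))
	}
}
```

Under those assumptions the computed values round to 40, 2 615, 433, and 2 936 MB/s, matching the table.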

On tmpfs, writev(2) beats io_uring at every batch size measured here, because the SQE/CQE ring management costs more than a single writev(2) does for page-cache writes. io_uring earns its place on real disks, where submissions can overlap with in-flight I/O, a regime tmpfs cannot simulate. Run the benchmarks on your production storage target to see the real delta.

Full matrix in BENCHMARKS.md in the parent repo.

Correctness notes

  • CQE reordering. On Linux 6.x kernels with IORING_FEAT_NATIVE_WORKERS, the kernel is permitted to post CQEs in a different order than submissions. The library tags every SQE with a monotonic UserData counter and scans the CQ for the matching tag, discarding any earlier completions it encounters. This makes the library robust against kernel reordering while preserving the single-goroutine contract.
  • IORING_FEAT_NODROP required. A full CQ on a kernel without NODROP silently drops completions — an unacceptable failure mode for audit-grade writers. New fails fast on kernels that don't advertise NODROP (pre-5.5).
  • Ring poisoning on unrecoverable errors. A failed io_uring_enter leaves the SQE published to the kernel with no safe way to drain its future completion; the library sets a permanent closed flag and returns ErrClosed from subsequent calls. Recovery requires Close + New.
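The tag-matching idea from the CQE reordering note can be modelled in a few lines. This is a toy simulation over a plain slice, not the library's ring code: the cqe type and waitFor function are hypothetical names, and a real completion queue is a shared-memory ring rather than a Go slice.

```go
package main

import "fmt"

// cqe is a stand-in for a completion-queue entry: the tag copied from the
// submission (UserData) and the result of the operation.
type cqe struct {
	userData uint64
	res      int32
}

// waitFor walks completions in arrival order, discards every entry whose tag
// differs from want, and returns the result of the matching entry along with
// the number of entries it discarded.
func waitFor(cq []cqe, want uint64) (res int32, discarded int, ok bool) {
	for _, c := range cq {
		if c.userData == want {
			return c.res, discarded, true
		}
		discarded++
	}
	return 0, discarded, false
}

func main() {
	// A completion tagged 6 arrives ahead of the one being waited for (tag 7).
	cq := []cqe{{userData: 6, res: 10}, {userData: 7, res: 512}}
	res, discarded, ok := waitFor(cq, 7)
	fmt.Println(res, discarded, ok) // 512 1 true
}
```

Because each submission carries a unique, monotonically increasing tag, the result returned to the caller always belongs to the submission it just made, whatever order the completions arrive in.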

Platform support

Platform         | Strategy         | Notes
linux/amd64      | iouring / writev | 5.5+ with NODROP for iouring
linux/arm64      | iouring / writev | Same as amd64
darwin/arm64     | writev           | BSD writev(2)
darwin/amd64     | writev           | BSD writev(2)
freebsd, openbsd | writev           | Portable writev(2)
windows          | unsupported      | New returns ErrUnsupported

Stability

v0.x. The public API is small (12 exported symbols) and intentionally minimal. Future versions may add additional opcodes (fsync, read) additively; the current surface is stable.

License

Apache 2.0. See LICENSE.

Status and extraction

This library is developed in-repo as a submodule of github.com/axonops/audit; the source of truth today lives at audit/iouring/. Extraction to a standalone github.com/axonops/iouring repository is tracked as audit issue #674; at that point the package import path and API will remain unchanged, and this submodule will be removed from the parent repo in favour of a require line.

Documentation

Overview

Package iouring is a vectored-writer library for append-heavy log and WAL workloads. It uses Linux io_uring on capable kernels and falls back transparently to writev(2) on other Unix platforms. The selection is internal; callers do not need to probe the kernel or branch on platform.

Quick start

The zero-ceremony form — use it when you want to write bytes and do not care about strategy selection or lifecycle:

f, err := os.OpenFile("audit.log",
    os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o600)
if err != nil {
    return err
}
defer f.Close()

n, err := iouring.Writev(int(f.Fd()), [][]byte{a, b, c})

No construction, no Close, no options. The package holds a lazily-initialised default Writer that serves every call. It is safe for concurrent use (internally serialised) at the cost of an uncontended mutex per call.

Explicit instance

Construct your own Writer when you need a dedicated instance — typically to avoid sharing the default mutex under heavy concurrency, to force a specific Strategy in tests, or to attach a slog.Logger for startup diagnostics:

w, err := iouring.New()
if err != nil {
    return err
}
defer w.Close()
_, err = w.Writev(int(f.Fd()), bufs)

A Writer returned by New is NOT safe for concurrent use; callers must serialise Writer.Writev, Writer.Write, and Writer.Close. The race detector catches violations.

Strategies

A Writer picks one of two strategies at construction:

  • StrategyIouring — Linux 5.5+ with IORING_FEAT_NODROP; the fastest path for batched writes.
  • StrategyWritev — syscall writev(2) on Unix; portable and allocation-free.

On platforms without any vectored-write primitive (currently Windows), New returns an error wrapping ErrUnsupported.

The negotiated strategy is reported by Writer.Strategy and is stable for the writer's lifetime. Callers that want to test specifically for io_uring can call IouringSupported — but in the common case the package-level Writev handles everything.

File-descriptor contract

The library never takes ownership of the file descriptor passed to Writev / Writer.Writev / Writer.Write. The caller is responsible for keeping the fd open for the duration of each call and closing it when finished; closing the fd during a call is undefined behaviour.

Concurrency

The package-level Writev is safe for concurrent use via an internal mutex on the default writer. For higher throughput under heavy contention, construct one Writer per producer goroutine with New and serialise access to each writer.

Short writes

All Writev entry points may return (n, nil) with n less than the sum of buffer lengths (for example when the destination is a pipe whose buffer is smaller than the iovec total). Callers are responsible for retrying the remainder. Byte counting follows writev(2): count is total bytes written across all iovecs, so advancing past a short write means skipping completed iovecs and slicing the first incomplete one. A minimal retry loop looks like:

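// total is the sum of len(b) over bufs; advance is a caller-supplied helper.
written := 0
for written < total {
    n, err := iouring.Writev(fd, bufs)
    if err != nil {
        return err
    }
    written += n
    bufs = advance(bufs, n) // drop fully-written iovecs, re-slice the partial one
}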

Platform support

v0.x supports Linux (io_uring or writev), Darwin (writev), and the *BSDs (writev). Windows callers receive ErrUnsupported from New and from the package-level Writev. Applications needing Windows coverage should fall back to a buffered os.File.Write loop at their layer.
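For the Windows fallback mentioned above, one possible shape is a buffered write loop that an application runs when New reports ErrUnsupported. The helper names below are hypothetical and the sketch uses only the standard library; it is not part of the package.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// writeAllBuffered is a hypothetical fallback for platforms without a
// vectored-write strategy. It copies every buffer through one bufio.Writer
// and flushes once. The count is bytes accepted by the buffer; if Flush
// fails, fewer bytes may have reached the file.
func writeAllBuffered(f *os.File, bufs [][]byte) (int, error) {
	bw := bufio.NewWriter(f)
	total := 0
	for _, b := range bufs {
		n, err := bw.Write(b)
		total += n
		if err != nil {
			return total, err
		}
	}
	return total, bw.Flush()
}

// roundTrip writes bufs to a temporary file and returns the byte count and
// the resulting file contents, so the helper can be exercised end to end.
func roundTrip(bufs [][]byte) (int, string, error) {
	f, err := os.CreateTemp("", "fallback-*.log")
	if err != nil {
		return 0, "", err
	}
	defer os.Remove(f.Name()) // runs after Close
	defer f.Close()
	n, err := writeAllBuffered(f, bufs)
	if err != nil {
		return n, "", err
	}
	got, err := os.ReadFile(f.Name())
	return n, string(got), err
}

func main() {
	n, got, err := roundTrip([][]byte{[]byte("hello "), []byte("io_uring\n")})
	if err != nil {
		panic(err)
	}
	fmt.Printf("wrote %d bytes: %q\n", n, got) // wrote 15 bytes: "hello io_uring\n"
}
```

Unlike a vectored write, this path copies each buffer once into the bufio.Writer, which is usually acceptable for a compatibility fallback.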

Dependencies

This package has no runtime dependencies beyond the Go standard library. It is safe to vendor into audit-sensitive deployments.

Stability

v0.x exports writev-style submissions only. Future versions may add additional opcodes additively; the current surface is intentionally minimal.

Constants

const MaxIovecs = 1024

MaxIovecs is the maximum number of buffers accepted by a single Writev / Writer.Writev call. It matches the per-call iovec limit on the supported Unix platforms: UIO_MAXIOV on Linux and IOV_MAX on Darwin, FreeBSD, NetBSD, and OpenBSD, each defined as 1024. Callers batching more buffers must split the batch across multiple calls.
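Splitting an oversized batch can be sketched as follows. The chunks helper is hypothetical and not part of the package, and the local maxIovecs constant mirrors the documented value of MaxIovecs so the sketch runs standalone.

```go
package main

import "fmt"

// maxIovecs mirrors the documented value of iouring.MaxIovecs.
const maxIovecs = 1024

// chunks splits bufs into consecutive sub-slices holding at most limit
// buffers each. The sub-slices share backing storage with bufs; nothing is
// copied. A non-positive limit yields no chunks.
func chunks(bufs [][]byte, limit int) [][][]byte {
	if limit <= 0 {
		return nil
	}
	var out [][][]byte
	for len(bufs) > limit {
		out = append(out, bufs[:limit])
		bufs = bufs[limit:]
	}
	if len(bufs) > 0 {
		out = append(out, bufs)
	}
	return out
}

func main() {
	batch := make([][]byte, 2500)
	for i := range batch {
		batch[i] = []byte("event\n")
	}
	parts := chunks(batch, maxIovecs)
	fmt.Println(len(parts), len(parts[0]), len(parts[len(parts)-1])) // 3 1024 452
}
```

Each chunk would then be passed to its own Writev call, with short writes handled per call.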

Variables

var (
	// ErrUnsupported is returned when no vectored-write strategy
	// is available on the current platform, or when a caller
	// explicitly requests a strategy that is unavailable here.
	// Platform causes include:
	//   - non-Unix platforms (Windows) — no writev primitive;
	//   - [WithStrategy]([StrategyIouring]) on a non-Linux host;
	//   - [WithStrategy]([StrategyIouring]) on a Linux kernel
	//     older than 5.5 or without IORING_FEAT_NODROP.
	ErrUnsupported = errors.New("iouring: vectored I/O not supported on this platform")

	// ErrClosed is returned when an operation is attempted on a
	// [Writer] that has already been closed. The package-level
	// [Writev] never returns ErrClosed; it returns [ErrUnsupported]
	// instead if the platform has no vectored-write support.
	ErrClosed = errors.New("iouring: writer is closed")
)

Sentinel errors returned by the package. Kernel errors (for example syscall.EAGAIN, syscall.EBADF) are returned unwrapped so callers can match them with errors.Is; the sentinels below identify library-originated conditions.
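A caller might distinguish library sentinels from kernel errors along these lines. The sketch is standalone: errUnsupported is a local stand-in for iouring.ErrUnsupported, and classify is a hypothetical helper, so the block runs without the library.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// errUnsupported is a local stand-in for iouring.ErrUnsupported.
var errUnsupported = errors.New("vectored I/O not supported on this platform")

// classify maps an error from a write call to a coarse action. Sentinels are
// matched through any wrapping; kernel errnos are matched directly.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errUnsupported):
		return "unsupported"
	case errors.Is(err, syscall.EAGAIN):
		return "retry"
	case errors.Is(err, syscall.EBADF):
		return "bad fd"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(fmt.Errorf("new writer: %w", errUnsupported))) // unsupported
	fmt.Println(classify(syscall.EAGAIN))                               // retry
	fmt.Println(classify(syscall.EBADF))                                // bad fd
}
```

Using errors.Is for both cases keeps the call site the same whether or not a given error arrives wrapped.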

Functions

func IouringSupported

func IouringSupported() bool

IouringSupported reports whether the StrategyIouring path is available on this host. The first call performs a probe syscall (~200 μs) and caches the result. Callers using the default StrategyAuto do NOT need to call this — New and the package-level Writev handle negotiation internally.

On non-Linux platforms, IouringSupported always returns false.

func Writev

func Writev(fd int, bufs [][]byte) (int, error)

Writev writes bufs to fd using the process-wide default Writer. Safe for concurrent use via an internal mutex on the default writer. For the uncontended case the mutex overhead is ~20 ns; under heavy contention, construct a dedicated Writer per producer goroutine with New.

If the platform has no vectored-write support, Writev returns an error wrapping ErrUnsupported. Writev does not return ErrClosed — the default writer is never closed.

Example

Zero-ceremony: use the package-level Writev with a lazily-initialised default Writer. Safe for concurrent use.

package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/axonops/audit/iouring"
)

func main() {
	// Create an append-mode file to write to.
	path := filepath.Join(os.TempDir(), "iouring-example.log")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY|os.O_TRUNC, 0o600)
	if err != nil {
		panic(err)
	}
	defer func() {
		_ = f.Close()
		_ = os.Remove(path)
	}()

	// Writev takes int, not uintptr — cast the os.File fd.
	n, err := iouring.Writev(int(f.Fd()), [][]byte{
		[]byte("hello "),
		[]byte("io_uring\n"),
	})
	if err != nil {
		panic(err)
	}

	fmt.Println("wrote", n, "bytes")
}
Output:
wrote 15 bytes

Types

type Option

type Option func(*config)

Option configures New. All options are optional; New with zero options uses sensible defaults.

func WithLogger

func WithLogger(l *slog.Logger) Option

WithLogger configures a slog.Logger that receives exactly one log line at construction indicating the selected strategy. Nil (the default) disables logging. Writers never log on the hot path — the logger is used at construction only.

func WithRingDepth

func WithRingDepth(n uint32) Option

WithRingDepth sets the io_uring submission-queue depth. Must be a power of two between 1 and 4096 inclusive. Ignored by strategies other than StrategyIouring. Defaults to 16.

Passing an out-of-range or non-power-of-two value is a programmer error; New will surface the validation error unwrapped.
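The documented constraint (a power of two between 1 and 4096 inclusive) is cheap to check before calling New. The helper below is a caller-side sketch with a hypothetical name; it is not the library's own validation code.

```go
package main

import "fmt"

// validRingDepth reports whether n satisfies the documented WithRingDepth
// constraint: a power of two in the range [1, 4096]. A power of two has
// exactly one bit set, so n&(n-1) clears that bit and leaves zero.
func validRingDepth(n uint32) bool {
	return n >= 1 && n <= 4096 && n&(n-1) == 0
}

func main() {
	for _, n := range []uint32{0, 1, 16, 24, 4096, 8192} {
		fmt.Println(n, validRingDepth(n))
	}
}
```

Of the sample values, 1, 16, and 4096 pass; 0 and 8192 are out of range, and 24 is not a power of two.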

func WithStrategy

func WithStrategy(s Strategy) Option

WithStrategy forces a specific Strategy. Passing StrategyIouring on a host without io_uring causes New to return an error wrapping ErrUnsupported. Passing StrategyWritev forces the writev path even on io_uring-capable hosts — useful for benchmarking and A/B testing. The default is StrategyAuto.

Passing an out-of-range Strategy value is a programmer error; New will return an error that does not wrap any sentinel.

type Strategy

type Strategy int

Strategy identifies the vectored-write path a Writer uses. The concrete strategy is chosen at construction and is stable for the Writer's lifetime; callers retrieve it with Writer.Strategy.

The Strategy.String values are part of the public contract — consumers log and alert on them, so they do not change lightly.

const (
	// StrategyAuto requests the best available strategy at
	// construction. It is the default when no [WithStrategy]
	// option is supplied. After construction, [Writer.Strategy]
	// returns the strategy that was actually selected
	// ([StrategyIouring] or [StrategyWritev]) — never
	// StrategyAuto.
	StrategyAuto Strategy = iota

	// StrategyIouring is the Linux io_uring fast path. Requires
	// kernel 5.5+ with IORING_FEAT_NODROP. Passing this to
	// [WithStrategy] and calling [New] on an unsupported host
	// returns an error wrapping [ErrUnsupported].
	StrategyIouring

	// StrategyWritev is the portable writev(2) path. Available
	// on all Unix platforms (Linux, Darwin, the *BSDs). Passing
	// this to [WithStrategy] forces the writev path even on
	// io_uring-capable hosts — useful for benchmarking the
	// fallback and for operational A/B testing.
	StrategyWritev
)

func (Strategy) String

func (s Strategy) String() string

String returns a stable lowercase name for the strategy. These strings are public contract — consumers log and alert on them, so they do not change lightly.

An out-of-range Strategy value renders as "unknown".

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer performs vectored writes to file descriptors using the best available platform strategy. Writer.Writev and Writer.Write are NOT safe for concurrent use; callers must serialise access, including against Writer.Close. A racing Writev and Close is undefined behaviour.

Writer.Close is safe to call concurrently from any goroutine relative to itself (Close is idempotent).

A Writer does not take ownership of the file descriptor passed to Writer.Writev / Writer.Write; the caller retains fd lifecycle responsibility. The fd must remain open for the duration of each call.

For a zero-ceremony, concurrent-safe entry point, use the package-level Writev instead of constructing a Writer.

func New

func New(opts ...Option) (*Writer, error)

New constructs a Writer with the given options. The default strategy is StrategyAuto, which prefers io_uring on capable Linux hosts and falls back to writev(2) on other Unix platforms. On platforms without any vectored-write support (currently Windows), New returns an error wrapping ErrUnsupported.

The returned Writer MUST be released with Writer.Close when no longer needed, regardless of which strategy it uses.

func (*Writer) Close

func (w *Writer) Close() error

Close releases any resources held by the Writer. Close is idempotent and safe to call concurrently with itself. Calling Close concurrently with Writer.Writev / Writer.Write is undefined behaviour; callers must serialise.

func (*Writer) Strategy

func (w *Writer) Strategy() Strategy

Strategy reports the negotiated strategy this Writer is using. The result is stable for the Writer's lifetime and is one of StrategyIouring or StrategyWritev — never StrategyAuto.

func (*Writer) Write

func (w *Writer) Write(fd int, buf []byte) (int, error)

Write is a single-buffer convenience wrapper around Writer.Writev.

func (*Writer) Writev

func (w *Writer) Writev(fd int, bufs [][]byte) (int, error)

Writev writes bufs to fd as a single vectored operation and blocks until completion. Returns bytes written and any error. See the package documentation for short-write and atomicity semantics.

Writev is NOT safe for concurrent use and must not race with Writer.Close. Zero-length elements within bufs are skipped; all-empty bufs return (0, nil) without touching the kernel. len(bufs) must not exceed MaxIovecs.

Example

Explicit Writer: construct your own instance when you want a dedicated logger, a forced strategy, or to avoid contending on the default writer's mutex.

package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/axonops/audit/iouring"
)

func main() {
	// Force the syscall.writev path for reproducibility.
	w, err := iouring.New(iouring.WithStrategy(iouring.StrategyWritev))
	if err != nil {
		panic(err)
	}
	defer func() { _ = w.Close() }()

	path := filepath.Join(os.TempDir(), "iouring-example-writer.log")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY|os.O_TRUNC, 0o600)
	if err != nil {
		panic(err)
	}
	defer func() {
		_ = f.Close()
		_ = os.Remove(path)
	}()

	if _, err := w.Writev(int(f.Fd()), [][]byte{[]byte("one\n"), []byte("two\n")}); err != nil {
		panic(err)
	}

	got, _ := os.ReadFile(path)
	fmt.Print(string(got))
}
Output:
one
two
