dwcas

v0.1.2
Published: Jan 5, 2026 License: MIT Imports: 2 Imported by: 0

README

dwcas

Portable 128-bit (double-word) compare-and-swap (CAS) primitive for Go.

At a glance

  • Single atomic 128-bit compare-and-swap (CAS) on amd64/arm64.
  • Intended for lock-free algorithms (versioned state, ABA mitigation, composite updates).
  • Arm64 uses an LSE pair-CAS fast path by default; LL/SC is available via build tag for compatibility.

Install

go get code.hybscloud.com/dwcas

Quick start

// New returns a 16-byte aligned *Uint128 suitable for 128-bit compare-and-swap.
v := dwcas.New(1, 2)

old := dwcas.Uint128{Lo: 1, Hi: 2}
newv := dwcas.Uint128{Lo: 3, Hi: 4}

prev, swapped := v.AcqRel(old, newv)
fmt.Println(prev, swapped, v.Lo, v.Hi)

APIs

Core type

  • type Uint128 struct { Lo uint64; Hi uint64 }

Compare-exchange methods

Each method on *dwcas.Uint128 is a 128-bit compare-and-swap in compare-exchange form.

All methods return:

  • prev: the value observed in memory at the time of the CAS attempt
  • swapped: true if the swap happened

Methods:

  • (*Uint128) Relaxed(old, new Uint128) (prev Uint128, swapped bool)
  • (*Uint128) Acquire(old, new Uint128) (prev Uint128, swapped bool)
  • (*Uint128) Release(old, new Uint128) (prev Uint128, swapped bool)
  • (*Uint128) AcqRel(old, new Uint128) (prev Uint128, swapped bool)
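The compare-exchange form means a failed attempt already tells the caller what is in memory, so a retry loop can reuse prev instead of reloading. The sketch below illustrates that loop shape only. It does not import dwcas: `cell` and `compareExchange` are hypothetical mutex-backed stand-ins that mimic the documented `(prev, swapped)` return shape, and `addLo` is an illustrative helper.

```go
package main

import (
	"fmt"
	"sync"
)

// Uint128 mirrors the documented two-word layout (local stand-in type).
type Uint128 struct {
	Lo uint64
	Hi uint64
}

// cell is a hypothetical mutex-backed stand-in for an aligned *dwcas.Uint128.
type cell struct {
	mu sync.Mutex
	v  Uint128
}

// compareExchange stores next only if the cell still equals old.
// prev is the value observed; swapped reports whether next was stored.
func (c *cell) compareExchange(old, next Uint128) (prev Uint128, swapped bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	prev = c.v
	if prev == old {
		c.v = next
		return prev, true
	}
	return prev, false
}

// addLo adds delta to Lo and bumps Hi, retrying until the exchange succeeds.
// After a failed attempt, prev becomes the next expected value.
func addLo(c *cell, guess Uint128, delta uint64) Uint128 {
	old := guess
	for {
		next := Uint128{Lo: old.Lo + delta, Hi: old.Hi + 1}
		prev, swapped := c.compareExchange(old, next)
		if swapped {
			return next
		}
		old = prev
	}
}

func main() {
	c := &cell{v: Uint128{Lo: 10, Hi: 0}}
	// Start from a deliberately stale guess: the first attempt fails and
	// returns the real contents, the second attempt succeeds.
	got := addLo(c, Uint128{}, 5)
	fmt.Println(got.Lo, got.Hi)
}
```

With the real package, the same loop would presumably call one of the four ordering variants in place of the stand-in.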

Allocation and placement

  • func New(lo, hi uint64) *Uint128
  • func CanPlaceAlignedUint128(p []byte, off int) bool
  • func PlaceAlignedUint128(p []byte, off int) (n int, u128 *Uint128)

Memory ordering

Each method fixes its own memory ordering; the ordering on success may differ from the ordering on failure:

Method               Success   Failure
(*Uint128).Relaxed   relaxed   relaxed
(*Uint128).Acquire   acquire   relaxed
(*Uint128).Release   release   relaxed
(*Uint128).AcqRel    acq_rel   relaxed

Notes:

  • Some backends are stronger than requested.
    • On amd64, LOCKed operations are at least acquire-release on both success and failure.
    • On arm64 default (LSE) builds, Release/AcqRel place a release barrier before the CAS, which also orders a failed attempt.

Manual barriers

In rare cases, a caller may need an explicit ordering edge outside the CAS methods. dwcas provides three manual barriers:

  • dwcas.BarrierAcquire: arm64 emits DMB ISHLD; amd64 is a compiler barrier only.
  • dwcas.BarrierRelease: arm64 emits DMB ISHST; amd64 is a compiler barrier only.
  • dwcas.BarrierFull: arm64 emits DMB ISH; amd64 is a compiler barrier only.

Alignment requirement

The address of a *dwcas.Uint128 used with these methods must be 16-byte aligned. The pointer must also be non-nil.

  • Default builds perform no runtime checks.
  • Opt-in debug guard: build with -tags=dwcasdebug to panic on nil and misaligned pointers.

Use dwcas.New if you need a heap-allocated 16-byte aligned *Uint128.

If you need to place a *Uint128 inside a caller-provided byte buffer (for example, in a manually managed arena), use the placement helpers. In the worst case, 31 bytes must remain in the buffer from off: up to 15 bytes of alignment padding plus 16 bytes for the value.

buf := make([]byte, 256)
off := 7

if !dwcas.CanPlaceAlignedUint128(buf, off) {
	panic("insufficient space")
}

n, v := dwcas.PlaceAlignedUint128(buf, off)
*v = dwcas.Uint128{Lo: 1, Hi: 2}

_, _ = v.Relaxed(dwcas.Uint128{Lo: 1, Hi: 2}, dwcas.Uint128{Lo: 3, Hi: 4})

off += n // n is in [16..31]

Supported architectures and backends

  • amd64: implemented via CMPXCHG16B.
  • arm64:
    • default: LSE pair-CAS (CASP family) (best performance).
    • opt-in LL/SC: -tags=dwcas_llsc (portable baseline via LDXP with STXP or STLXP).

Safety notes

dwcas uses unsafe and architecture-specific assembly.

  • Keep a *Uint128 reachable as a Go pointer. Do not convert it to uintptr and back, and do not store it in untracked memory.
  • A copied Uint128 value does not carry alignment guarantees. Alignment is a property of the address you pass to these methods.

License

MIT — see LICENSE.

©2025 Hayabusa Cloud Co., Ltd.

Documentation

Overview

Package dwcas provides a portable 128-bit (double-word) compare-and-swap (CAS) primitive for Go.

The core operations are compare-and-swap methods on Uint128 that perform a single atomic read-modify-write on a contiguous 16-byte value.

Intended use cases

  • Lock-free algorithms that need to atomically update a value and a version/tag (e.g. ABA mitigation via versioned pointers).
  • Composite state machines where two 64-bit words must move together.
  • Low-level runtime-like data structures (queues, stacks) where a single 128-bit CAS reduces coordination overhead.
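To make the ABA use case concrete, the sketch below replays an A -> B -> A history and shows that a writer holding a stale snapshot is rejected once a version tag travels with the payload. It does not import dwcas: `cell`, `load`, `compareExchange`, and `set` are hypothetical mutex-backed stand-ins, and using Lo for the payload and Hi for the tag is an illustrative convention, not one the package prescribes.

```go
package main

import (
	"fmt"
	"sync"
)

// Uint128 mirrors the documented two-word layout (local stand-in type).
type Uint128 struct {
	Lo uint64 // payload, for example an index or handle
	Hi uint64 // version tag, bumped on every successful update
}

// cell is a hypothetical mutex-backed stand-in for an aligned *dwcas.Uint128.
type cell struct {
	mu sync.Mutex
	v  Uint128
}

func (c *cell) load() Uint128 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.v
}

// compareExchange stores next only if the cell still equals old.
func (c *cell) compareExchange(old, next Uint128) (prev Uint128, swapped bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	prev = c.v
	if prev == old {
		c.v = next
		return prev, true
	}
	return prev, false
}

// set installs payload and increments the version tag.
func set(c *cell, payload uint64) {
	old := c.load()
	for {
		prev, swapped := c.compareExchange(old, Uint128{Lo: payload, Hi: old.Hi + 1})
		if swapped {
			return
		}
		old = prev
	}
}

// staleUpdateSucceeds replays an A -> B -> A history and reports whether a
// writer holding the original snapshot can still commit.
func staleUpdateSucceeds() bool {
	c := &cell{v: Uint128{Lo: 0xA, Hi: 0}}
	snapshot := c.load() // slow writer observes payload A at version 0
	set(c, 0xB)          // A -> B, version 1
	set(c, 0xA)          // B -> A, version 2: payload matches the snapshot again
	_, swapped := c.compareExchange(snapshot, Uint128{Lo: 0xC, Hi: snapshot.Hi + 1})
	return swapped
}

func main() {
	fmt.Println(staleUpdateSucceeds())
}
```

A compare on the 64-bit payload alone would match here; comparing both words together is what lets the stale update be detected.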

Atomicity and memory ordering

Each method is a single atomic 128-bit compare-and-swap on supported architectures.

Return values

All methods return:

  • prev: the value observed in memory at the time of the CAS attempt
  • swapped: true if the swap happened

Ordering contracts

Success ordering vs failure ordering:

  • Relaxed: success = relaxed, failure = relaxed
  • Acquire: success = acquire, failure = relaxed
  • Release: success = release, failure = relaxed
  • AcqRel: success = acq_rel, failure = relaxed

Some architectures and backends may provide stronger ordering than requested. In particular:

  • amd64's LOCKed instructions are at least acquire-release for both success and failure.
  • arm64's default LSE backend uses CASPD plus explicit barriers; release-style methods place the release barrier before the CAS, which also orders a failed attempt.

Manual barriers

This package also exposes manual barriers (BarrierAcquire, BarrierRelease, BarrierFull) for callers who need an explicit ordering edge outside the CAS primitives. On arm64 they map to DMB ISH*; on amd64 they are compiler barriers (not MFENCE).

Alignment

The address of a *Uint128 passed to these methods MUST be 16-byte aligned. Misalignment is unsupported and may fault on some CPUs/instructions.

Helpers:

  • New returns a heap-allocated 16-byte aligned *Uint128.
  • CanPlaceAlignedUint128 / PlaceAlignedUint128 place a 16-byte aligned *Uint128 within a caller-provided byte buffer. In the worst case, 31 bytes must remain from off.

This package intentionally does not perform runtime alignment checks in normal builds. For a debug-only guard, build with `-tags=dwcasdebug` to make these methods panic when called with a misaligned pointer.
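Because normal builds skip the check, callers that manage their own memory may want an assertion of their own in tests. The following is a minimal sketch of such a check, not the package's debug guard: `isAligned16` and `countAligned` are hypothetical helpers written only against the standard library.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned16 reports whether p is non-nil and 16-byte aligned.
// Hypothetical helper; not part of dwcas.
func isAligned16(p unsafe.Pointer) bool {
	return p != nil && uintptr(p)&15 == 0
}

// countAligned returns how many of the first 16 byte addresses in buf
// are 16-byte aligned.
func countAligned(buf *[32]byte) int {
	n := 0
	for i := 0; i < 16; i++ {
		if isAligned16(unsafe.Pointer(&buf[i])) {
			n++
		}
	}
	return n
}

func main() {
	var buf [32]byte
	// Any window of 16 consecutive addresses contains exactly one
	// 16-byte boundary, wherever buf happens to be allocated.
	fmt.Println(countAligned(&buf))
	fmt.Println(isAligned16(nil))
}
```

The pointer is converted to uintptr only to inspect its low bits; it is never converted back, which keeps the check within the package's safety notes.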

Architecture support

  • amd64: implemented via CMPXCHG16B.
  • arm64: implemented via either LSE pair-CAS (CASP family; default) or LL/SC (opt-in).
  • other architectures: building succeeds, but all CAS methods panic at runtime.
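Since the CAS methods panic on other architectures, code that must run everywhere may want to pick a fallback up front. A minimal sketch, assuming the supported set is exactly the two architectures listed above; `hasDWCAS` is a hypothetical helper, not something the package exports.

```go
package main

import (
	"fmt"
	"runtime"
)

// hasDWCAS reports whether goarch is one of the architectures the
// documentation lists as supported. Hypothetical helper, not part of dwcas.
func hasDWCAS(goarch string) bool {
	switch goarch {
	case "amd64", "arm64":
		return true
	default:
		return false
	}
}

func main() {
	if hasDWCAS(runtime.GOARCH) {
		fmt.Println("128-bit CAS path available on", runtime.GOARCH)
	} else {
		fmt.Println("falling back to a lock-based path on", runtime.GOARCH)
	}
}
```

Build constraints on separate files would make the same choice at compile time; the runtime check is shown only because it fits in one file.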

Arm64 backend selection

  • default: LSE pair-CAS (CASP family; CASPAL semantics)
  • opt-in: `-tags=dwcas_llsc` (LL/SC via LDXP with STXP or STLXP)


Functions

func BarrierAcquire

func BarrierAcquire()

BarrierAcquire emits an acquire barrier.

This API is intentionally rare and expert-only. Prefer the ordering variants of Uint128 CAS methods when possible.

Semantics by architecture:

  • arm64: emits DMB ISHLD.
  • amd64: compiler barrier only (prevents compile-time reordering across the call). It is not an MFENCE and is not required for cache coherence.

func BarrierFull

func BarrierFull()

BarrierFull emits a full barrier.

This API is intentionally rare and expert-only. Prefer the ordering variants of Uint128 CAS methods when possible.

Semantics by architecture:

  • arm64: emits DMB ISH.
  • amd64: compiler barrier only (prevents compile-time reordering across the call). It is not an MFENCE and is not required for cache coherence.

func BarrierRelease

func BarrierRelease()

BarrierRelease emits a release barrier.

This API is intentionally rare and expert-only. Prefer the ordering variants of Uint128 CAS methods when possible.

Semantics by architecture:

  • arm64: emits DMB ISHST.
  • amd64: compiler barrier only (prevents compile-time reordering across the call). It is not an MFENCE and is not required for cache coherence.

func CanPlaceAlignedUint128

func CanPlaceAlignedUint128(p []byte, off int) bool

CanPlaceAlignedUint128 reports whether p has enough remaining capacity from off to place a 16-byte aligned *Uint128 at or after p[off].

In the worst case, 31 bytes must remain from off:

  • up to 15 bytes of padding to reach a 16-byte boundary
  • 16 bytes for the Uint128 itself
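The arithmetic behind the two bullets can be checked with a few lines of Go. This is a sketch under the assumption that the number of bytes consumed is the alignment padding plus 16, which matches the documented range of 16 to 31; `consumed` is a hypothetical function, not the package's implementation.

```go
package main

import "fmt"

// consumed returns how many bytes must be taken starting at addr so that
// they cover a 16-byte aligned, 16-byte wide region: padding plus 16.
func consumed(addr uintptr) int {
	pad := int(-addr & 15) // bytes needed to reach the next 16-byte boundary
	return pad + 16
}

func main() {
	for _, addr := range []uintptr{0x1000, 0x1001, 0x1008, 0x100f} {
		fmt.Printf("%#x -> %d\n", addr, consumed(addr))
	}
}
```

An address already on a boundary needs 16 bytes, while an address one byte past a boundary needs the full 31.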

Types

type Uint128

type Uint128 struct {
	Lo uint64
	Hi uint64
}

Uint128 is a 16-byte value used with 128-bit compare-and-swap.

Layout is stable and contiguous in memory:

word 0: Lo
word 1: Hi
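The layout can be confirmed with the standard `unsafe` package. The sketch below declares a local struct with the same two fields rather than importing dwcas, so it checks what the Go compiler does with this field order, not the package itself; `layout` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Uint128 repeats the documented declaration (local stand-in type).
type Uint128 struct {
	Lo uint64
	Hi uint64
}

// layout returns the size of Uint128 and the byte offsets of Lo and Hi.
func layout() (size, offLo, offHi uintptr) {
	var v Uint128
	return unsafe.Sizeof(v), unsafe.Offsetof(v.Lo), unsafe.Offsetof(v.Hi)
}

func main() {
	fmt.Println(layout()) // 16 0 8
}
```

Note that the struct's natural alignment is that of uint64, so the size and offsets say nothing about 16-byte alignment of any particular address.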

The address of a *Uint128 used with these methods MUST be 16-byte aligned.

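Helpers:

  • New returns a heap-allocated 16-byte aligned *Uint128.
  • CanPlaceAlignedUint128 / PlaceAlignedUint128 place a 16-byte aligned *Uint128 within a caller-provided byte buffer.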

func New

func New(lo, hi uint64) *Uint128

New returns a heap-allocated *Uint128 whose address is guaranteed to be 16-byte aligned.

This is primarily a convenience for algorithms that require 16-byte alignment but do not control allocator/layout details (e.g. when a Uint128 cannot be embedded into a manually-aligned struct).

Safety notes:

  • The returned pointer refers to Go-managed memory. Keep the *Uint128 reachable (do not convert it to uintptr and back).
  • Alignment is guaranteed, but only for the returned pointer itself. If you copy the value into another location, you must re-establish 16-byte alignment.

func PlaceAlignedUint128

func PlaceAlignedUint128(p []byte, off int) (n int, u128 *Uint128)

PlaceAlignedUint128 returns a *Uint128 placed within p, starting at or after p[off], such that the returned address is 16-byte aligned.

It returns n, the number of bytes the caller must "consume" starting from off to cover the aligned 16-byte region (including any alignment padding).

The caller must ensure that CanPlaceAlignedUint128(p, off) reports true; otherwise PlaceAlignedUint128 panics.

Safety notes:

  • The returned pointer refers to p's backing array. Keep p reachable.
  • Do not convert the returned pointer to uintptr and back.

func (*Uint128) AcqRel

func (p *Uint128) AcqRel(old, new Uint128) (prev Uint128, swapped bool)

AcqRel is a 128-bit compare-and-swap (CAS) with acquire-release ordering on success and relaxed ordering on failure.

It always returns:

  • prev: the value observed in memory at the time of the CAS attempt.
  • swapped: true if the swap happened.

Contract:

  • p must be non-nil.
  • p must be 16-byte aligned.
  • On unsupported architectures, AcqRel panics.

func (*Uint128) Acquire

func (p *Uint128) Acquire(old, new Uint128) (prev Uint128, swapped bool)

Acquire is a 128-bit compare-and-swap (CAS) with acquire ordering on success and relaxed ordering on failure.

It always returns:

  • prev: the value observed in memory at the time of the CAS attempt.
  • swapped: true if the swap happened.

Contract:

  • p must be non-nil.
  • p must be 16-byte aligned.
  • On unsupported architectures, Acquire panics.

func (*Uint128) Relaxed

func (p *Uint128) Relaxed(old, new Uint128) (prev Uint128, swapped bool)

Relaxed is a 128-bit compare-and-swap (CAS) with relaxed ordering on both success and failure.

It always returns:

  • prev: the value observed in memory at the time of the CAS attempt.
  • swapped: true if the swap happened.

Contract:

  • p must be non-nil.
  • p must be 16-byte aligned.
  • On unsupported architectures, Relaxed panics.

func (*Uint128) Release

func (p *Uint128) Release(old, new Uint128) (prev Uint128, swapped bool)

Release is a 128-bit compare-and-swap (CAS) with release ordering on success and relaxed ordering on failure.

It always returns:

  • prev: the value observed in memory at the time of the CAS attempt.
  • swapped: true if the swap happened.

Contract:

  • p must be non-nil.
  • p must be 16-byte aligned.
  • On unsupported architectures, Release panics.

Directories

Path Synopsis
internal
