sckit

package module
v0.3.1 Latest
Published: May 8, 2026 License: MIT Imports: 17 Imported by: 0

README

sckit-go

Pure-Go binding to macOS ScreenCaptureKit. No cgo. Sub-20ms frames. Display, window, app, region, and exclude-list capture — one library, one CLI.


Why

In macOS 15+ Apple deprecated CGDisplayCreateImage — the path every Go screenshot library has used for a decade (kbinani/screenshot 2.3k⭐, go-vgo/robotgo 10.2k⭐, and friends). On macOS 26 (Tahoe) it's gone. The replacement is ScreenCaptureKit, which is all-async, ObjC-block-heavy, and historically ugly to call from Go.

sckit-go closes that gap:

  • No cgo. Uses ebitengine/purego to call a small companion ObjC dylib that ships inside the module via //go:embed. go get and you're done.
  • Universal binary. The embedded dylib runs on both Intel and Apple Silicon out of the box.
  • Modern APIs. Built on SCStream, SCShareableContent, SCScreenshotManager (macOS 14+).
  • Sub-20ms frame latency. Persistent streams hit the display refresh rate cap (~17ms at 60Hz, ~8ms at 120Hz).
  • Idiomatic Go. context.Context on every blocking call, io.Closer resource model, functional options, sealed Target interface.
  • OCR + pixel diff in the same kit. sckit.OCR(png) returns recognized text regions (Vision framework, on-device, ~50–200 ms). sckit.DiffImages(before, after, 16, 16) returns a token-cheap pixel-delta grid for verifying that an action actually changed the screen — no vision-LLM round-trip required for "did anything happen?"

30-second quickstart — the sckit CLI

No Go code required:

go install github.com/LocalKinAI/sckit-go/cmd/sckit@latest
sckit list displays
sckit list windows --all
sckit list apps --json

sckit capture display                          # main display → auto-named PNG
sckit capture display 2 -o ~/Desktop/disp.png
sckit capture window 28533 --no-cursor
sckit capture app com.google.Chrome            # all Chrome windows composed
sckit capture region 100 100 640 480 -o crop.png

sckit stream display -n 60                     # pull 60 frames, report p50/p95
sckit stream display --fps 30 -n 90
sckit stream window 28533 -n 30
sckit stream app com.google.Chrome --fps 10

sckit bench                                    # full benchmark suite
sckit version

Sample sckit bench output on M-series, 1920×1080 display, macOS 26.3:

1. One-shot display capture
   min=130ms  avg=151ms  p50=132ms  p95=225ms

2. Stream open (cold)
   min=80ms   avg=82ms   p50=81ms   p95=85ms

3. Stream steady-state at 60 fps
   min=16ms   avg=17.3ms p50=17.4ms p95=18.2ms
   target = 16.7ms/frame

5. BGRA→RGBA conversion (1920×1080)
   min=2.2ms  avg=2.4ms  p50=2.4ms  p95=2.8ms

Install as a library

go get github.com/LocalKinAI/sckit-go

That's it. The ObjC companion dylib (~147 KB, universal arm64+x86_64) is embedded via //go:embed and auto-extracts to ~/Library/Caches/sckit-go/<content-hash>/libsckit_sync.dylib on first use. No make, no CGO_ENABLED, no PATH juggling.

Power users shipping custom-built or patched dylibs can override:

sckit.DylibPath = "/usr/local/lib/libsckit_sync.dylib"

Permission. First use triggers a macOS "Screen Recording" TCC prompt. Grant it in System Settings → Privacy & Security → Screen Recording, then re-run.


Usage

One-shot screenshot
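package main

import (
    "context"
    "log"

    "github.com/LocalKinAI/sckit-go"
)

func main() {
    ctx := context.Background()

    // Enumerate attached displays; this fails without Screen Recording permission.
    displays, err := sckit.ListDisplays(ctx)
    if err != nil {
        log.Fatal(err)
    }
    if len(displays) == 0 {
        log.Fatal("no displays found")
    }

    // Capture the first display and write it out as a PNG.
    if err := sckit.CaptureToFile(ctx, displays[0], "screenshot.png"); err != nil {
        log.Fatal(err)
    }
}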

Prefer raw image.Image?

img, _ := sckit.Capture(ctx, displays[0])
png.Encode(w, img)

Capture a single window
windows, _ := sckit.ListWindows(ctx)
for _, w := range windows {
    if w.OnScreen && w.App == "Google Chrome" {
        sckit.CaptureToFile(ctx, w, "chrome.png")
        break
    }
}

Capture an entire app (all its windows composed)
chrome := sckit.App{BundleID: "com.google.Chrome"}
sckit.CaptureToFile(ctx, chrome, "chrome.png")

Capture a region (cropped)
region := sckit.Region{
    Display: displays[0],
    Bounds:  image.Rect(100, 100, 900, 700),  // 800×600 crop
}
sckit.CaptureToFile(ctx, region, "crop.png")

Exclude specific windows (hide your own app, etc.)
myWindow := windows[0] // the window you want masked out
target := sckit.Exclude{
    Target:  displays[0],
    Windows: []sckit.Window{myWindow},
}
sckit.CaptureToFile(ctx, target, "desktop-minus-me.png")

Persistent stream (agents, UI automation, mirroring)
stream, err := sckit.NewStream(ctx, displays[0],
    sckit.WithFrameRate(60),
    sckit.WithCursor(true),
)
if err != nil { log.Fatal(err) }
defer stream.Close()

for {
    frameCtx, cancel := context.WithTimeout(ctx, time.Second)
    img, err := stream.NextFrame(frameCtx)
    cancel()
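    // A per-frame context deadline surfaces as ctx.Err(), not ErrTimeout, so check both.
    if errors.Is(err, sckit.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
        continue
    }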
    if err != nil { log.Fatal(err) }
    analyze(img) // *image.RGBA — fresh copy each call
}

Zero-copy BGRA (hot loop)

NextFrame allocates a fresh RGBA buffer for every frame: about 8 MB at 1920×1080 and about 33 MB at 4K (3840×2160 × 4 bytes per pixel). In hot loops where you'll JPEG-encode or send to a GPU anyway, use NextFrameBGRA:

frame, _ := stream.NextFrameBGRA(ctx)
// frame.Pixels is B,G,R,A,... — valid only until the NEXT call on this Stream
gpuUpload(frame.Pixels, frame.Width, frame.Height)

Channel-style convenience
frames, errs := stream.Frames(ctx)
for img := range frames {
    process(img)
}
if err := <-errs; err != nil { log.Fatal(err) }

What can I capture?

Five Target types, all interchangeable in Capture and NewStream:

| Target | What it is | Example |
|---|---|---|
| Display{ID} | A whole display | Display{ID: 2} |
| Window{ID} | A single window | Window{ID: 28533} |
| App{BundleID} | All windows of an app, composed | App{BundleID: "com.google.Chrome"} |
| Region{Display, Bounds} | A rectangle within a display | Region{Display: d, Bounds: image.Rect(0, 0, 800, 600)} |
| Exclude{Target, Windows} | Wrap any target, mask windows out | Exclude{Target: d, Windows: []Window{myWin}} |

The Target interface is sealed (unexported method) — only types in this package can satisfy it. This lets us evolve the C-boundary filter shape without worrying about external implementors.
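
Sealing an interface with an unexported method is a general Go idiom. The following is a minimal sketch of the technique, not sckit's source: the names target, display, window, and isTarget are hypothetical.

```go
package main

import "fmt"

// target is sealed: isTarget is unexported, so only types declared in
// this package can provide it. (Hypothetical names, not sckit's own.)
type target interface {
	isTarget()
}

type display struct{ ID uint32 }
type window struct{ ID uint32 }

func (display) isTarget() {}
func (window) isTarget()  {}

// describe accepts any sealed target and switches on the concrete kind.
func describe(t target) string {
	switch v := t.(type) {
	case display:
		return fmt.Sprintf("display %d", v.ID)
	case window:
		return fmt.Sprintf("window %d", v.ID)
	default:
		return "unknown target"
	}
}

func main() {
	fmt.Println(describe(display{ID: 2}))
	fmt.Println(describe(window{ID: 28533}))
}
```

Because the set of implementations is closed, the package can add a new target kind and extend its own type switch without breaking external code.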


Options

Functional options apply to both Capture and NewStream:

sckit.WithResolution(1920, 1080) // default: target's native size
sckit.WithFrameRate(30)          // streams only, default 60; display-refresh capped
sckit.WithCursor(false)          // default: true
sckit.WithColorSpace(sckit.ColorSpaceDisplayP3)  // default: sRGB
sckit.WithQueueDepth(5)          // SCStream internal buffer count, default 3

Benchmarks

On M-series Mac, 1920×1080 display, macOS 26.3:

| Operation | p50 | p95 | Notes |
|---|---|---|---|
| NextFrame steady-state @ 60 fps | 17.4 ms | 18.2 ms | = 1/60s display cap |
| NextFrame steady-state @ 30 fps | 34.0 ms | 41.0 ms | exactly as configured |
| NextFrame steady-state @ 10 fps | 100.9 ms | 102.0 ms | exactly as configured |
| NewStream (cold open) | 81 ms | 85 ms | first call pays ObjC + WindowServer handshake |
| Capture(Display) one-shot | 132 ms | 225 ms | includes SCShareableContent enumeration |
| Capture(Window) one-shot | 89 ms | 108 ms | SCScreenshotManager + BGRA copy |
| ListDisplays | 45 ms | 75 ms | enumerates displays only |
| ListWindows | 40 ms | 60 ms | with string pool serialization |
| BGRA→RGBA (pure Go, 1920×1080) | 2.4 ms | 2.8 ms | one conversion per NextFrame |

The NextFrame p50 floor is the display refresh interval — no library can go faster than the hardware. On a ProMotion display at 120Hz the same code hits ~8 ms. Use NextFrameBGRA to skip the 2.4ms conversion when you don't need image.Image.
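
To reproduce figures like these from your own NextFrame timings, a nearest-rank percentile over the collected latencies is enough. The sketch below is an assumption about method, not the code behind sckit bench; percentile, frameBudget, and demoSamples are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the nearest-rank p-th percentile (1..100) of the
// samples. It sorts a copy so the caller's slice is left untouched.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 || p < 1 || p > 100 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p/100 * n)
	return sorted[rank-1]
}

// frameBudget is the refresh interval that bounds steady-state latency.
func frameBudget(hz int) time.Duration {
	return time.Second / time.Duration(hz)
}

// demoSamples returns 100ms, 99ms, ..., 1ms (deliberately unsorted).
func demoSamples() []time.Duration {
	samples := make([]time.Duration, 0, 100)
	for i := 100; i >= 1; i-- {
		samples = append(samples, time.Duration(i)*time.Millisecond)
	}
	return samples
}

func main() {
	s := demoSamples()
	fmt.Println("p50:", percentile(s, 50), "p95:", percentile(s, 95))
	fmt.Println("60 Hz budget:", frameBudget(60), "120 Hz budget:", frameBudget(120))
}
```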

Stability: 3-minute test with stream reopens every 45s produces +72 KB heap growth total. A make stability-24h gate runs the full 24-hour leak detector before every release.


Architecture

  Go code
    │
    │  purego.RegisterLibFunc  (no cgo, no compiler toolchain needed downstream)
    ▼
  libsckit_sync.dylib  (~147KB universal, //go:embed'd)
    │
    │  11 plain C-ABI functions
    │  dispatch_semaphore wraps async block APIs
    ▼
  ScreenCaptureKit.framework + AppKit (CGS init)

Exported C functions (from objc/sckit_sync.m)

| Function | Purpose |
|---|---|
| sckit_list_displays | Enumerate attached displays |
| sckit_list_windows | Enumerate windows + app/title/bundle strings |
| sckit_capture_display | One-shot screenshot of a display |
| sckit_capture_window | One-shot screenshot of a single window |
| sckit_capture_app | One-shot screenshot of an app's composed windows |
| sckit_stream_start | Open persistent stream for a display |
| sckit_window_stream_start | Open persistent stream for a window |
| sckit_app_stream_start | Open persistent stream for an app |
| sckit_stream_dims | Report effective capture width/height |
| sckit_stream_next_frame | Block until next frame, copy BGRA out |
| sckit_stream_stop | Tear down stream |

Each one uses dispatch_semaphore_create + signal + wait to turn ScreenCaptureKit's completion-handler async style into blocking sync calls Go can invoke directly. The stream sink is a 40-line ObjC class implementing SCStreamOutput; it filters on SCStreamFrameInfoStatus so Idle/Blank frames re-deliver the last Complete buffer (the right semantics for static-screen capture).
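
The same async-to-sync shape can be illustrated in Go, with a buffered channel standing in for the semaphore. This is only an analogy for readers who do not write ObjC: the dylib itself is Objective-C, and await, result, and fakeAsync below are hypothetical names. Unlike the dylib calls, this sketch also stops waiting when the context is done.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// result carries the outcome of one asynchronous operation.
type result[T any] struct {
	val T
	err error
}

// await starts an asynchronous operation and blocks until its
// completion callback fires or ctx is done.
func await[T any](ctx context.Context, start func(done func(T, error))) (T, error) {
	ch := make(chan result[T], 1) // buffered so a late callback never blocks
	start(func(v T, err error) {
		select {
		case ch <- result[T]{val: v, err: err}:
		default: // ignore repeated callbacks
		}
	})
	select {
	case r := <-ch:
		return r.val, r.err
	case <-ctx.Done():
		var zero T
		return zero, ctx.Err()
	}
}

// fakeAsync stands in for a completion-handler API: it calls done from
// another goroutine after a short delay.
func fakeAsync(done func(int, error)) {
	go func() {
		time.Sleep(10 * time.Millisecond)
		done(42, nil)
	}()
}

// demoAwait waits for the callback to deliver its value.
func demoAwait() (int, error) {
	return await(context.Background(), fakeAsync)
}

// demoCanceled reports whether an already-canceled context wins.
func demoCanceled() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := await(ctx, fakeAsync)
	return errors.Is(err, context.Canceled)
}

func main() {
	v, err := demoAwait()
	fmt.Println(v, err)
	fmt.Println("canceled:", demoCanceled())
}
```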

See docs/API_DESIGN.md for the full design rationale, and docs/adr/ for the decision log.

Why not pure purego (no dylib at all)?

You can call SCShareableContent class methods from Go via purego/objc, but the methods take ObjC ^(args...) blocks as callbacks. purego has experimental block support, but wiring up delegate protocol conformance (SCStreamOutput), bridging CMSampleBuffer, and locking CVPixelBuffer from Go is ~500 lines of fragile boilerplate. A ~900-line dylib is smaller than the alternative, faster to audit, and compiles once.


Status

v0.3.0 (released 2026-05-07) — capture, OCR, and pixel-diff all shipped. Five target kinds, persistent + one-shot capture, OCR via Vision framework, DiffImages for token-cheap action verification. API stable; SemVer-protected from here.

Test Count Pass Coverage
Unit tests 50+ (pure Go)
Integration tests 19 (needs TCC permission)
go test -cover main package 78.8%
staticcheck ✅ 0 warnings
golangci-lint (9 linters) ✅ 0 issues
3-min stability (stream reopens × 4) ✅ +72 KB heap

Platform matrix

| Platform | Arch | Status |
|---|---|---|
| macOS 26 (Tahoe) | arm64 | ✅ Primary dev target |
| macOS 15 (Sequoia) | arm64 | Expected to work (CI target) |
| macOS 14 (Sonoma) | arm64 | Expected to work (CI target) |
| macOS 15/14 | x86_64 | Universal dylib ships x86_64; untested on real hardware |
| macOS 13 and earlier | any | ❌ SCScreenshotManager requires macOS 14+ |

CI runs on macos-14 + macos-15 GitHub Actions runners.


Roadmap

v0.1.0 — Capture (shipped 2026-04-22)
  • Display / window / app / region / exclude capture
  • Display / window / app streaming
  • go:embed dylib + universal binary
  • Functional options + context.Context on every blocking call
  • Zero-copy NextFrameBGRA + channel adapter Stream.Frames
  • 43 unit + 19 integration tests, 78.8% coverage
  • sckit CLI with list, capture, stream, bench, version
  • GitHub Actions CI (macOS 14 + 15)
  • golangci-lint 0 warnings, stability test harness
v0.2.0 — On-device OCR (shipped 2026-04-29)
  • sckit.OCR(imageBytes []byte) ([]TextRegion, error) via VNRecognizeTextRequest (Vision framework)
  • Top-left origin coordinates (matches CGImage / drawing convention)
  • Recognition level: Accurate; language correction: on
  • No additional dylib export — same companion lib
v0.3.0 — Pixel-grid diff (shipped 2026-05-07)
  • sckit.DiffImages(a, b, rows, cols) (*DiffGrid, error) — mean-abs-delta per cell, 0–255 scale
  • DiffGrid.Dirty(threshold) / BoundingBox(threshold) / Render(threshold) (ASCII heatmap for LLM prompts)
  • Pure Go — no dylib changes, lifted from kinclaw skill helpers
v0.4.0 (planned) — Performance + recording
  • Hardware H.264/HEVC encoding via VideoToolbox
  • io.Writer streaming: stream.RecordTo(w, duration) → mp4
  • SIMD BGRA→RGBA via golang.org/x/sys/cpu
  • Benchmark suite in /benchmarks with tracked history
v0.5.0 (planned) — Audio + cancellation
  • SCStreamOutputTypeAudio capture
  • Synchronized A/V streams (PCM + AAC)
  • Context cancellation aborts in-flight dylib calls (sckit_stream_cancel)
v1.0.0 — Stable
  • API frozen for 2+ months without breaking changes
  • 100+ external consumers or 500+ stars
  • Programmatic TCC permission request flow
  • Featured in awesome-go / Go Weekly

Comparison: sckit-go vs screenpipe vs kbinani

sckit-go screenpipe kbinani/screenshot
Language Go Rust Go
macOS 15+ support ❌ (broken; API removed)
Scope Library (capture only) Full product (capture + OCR + DB + audio + query) Library (capture only)
Install go get Install app + Rust go get (but broken)
cgo required ❌ (purego) N/A
Window capture
App capture
Region capture (via cropping)
Exclude lists ?
Audio capture ❌ (v0.5)
OCR / text extraction ✅ (v0.2 — Vision framework)
Pixel-grid diff ✅ (v0.3 — DiffImages)
24/7 persistent DB ❌ (out of scope)
License MIT NOASSERTION (custom) MIT
Repo size ~500 KB 407 MB ~200 KB
Go ecosystem native ✅ (was)

sckit-go is Layer 1 (primitive capture). screenpipe is Layer 4 (end-user product). We are complementary, not competitors — the right outcome is for future Go-based products like screenpipe to build on top of sckit-go.


Development

git clone https://github.com/LocalKinAI/sckit-go
cd sckit-go

make help              # list all targets
make dylib             # build universal libsckit_sync.dylib
make build             # go build ./...
make test              # unit tests only
make verify            # build + vet + one capture (CI-style smoke)
make examples          # run every example program
make stability-test    # 10-minute leak detector
make stability-24h     # full pre-release gate (24 hours)
make cli               # build ./sckit CLI binary
make install-cli       # install sckit to $GOBIN

Running tests
# Pure unit tests — no permissions required, runs anywhere:
go test -count=1 ./...

# Integration tests — require Screen Recording permission:
go test -tags integration -count=1 ./...

# Coverage:
go test -tags integration -count=1 -coverprofile=cov.out .
go tool cover -html=cov.out

Linting
go vet ./...
staticcheck ./...                  # go install honnef.co/go/tools/cmd/staticcheck@latest
golangci-lint run                  # https://golangci-lint.run/welcome/install/

License

MIT — see LICENSE. Contributions welcome under the same license.

See CONTRIBUTING.md before filing issues or PRs. See SECURITY.md for security-related reports. See docs/API_DESIGN.md + docs/adr/ for design rationale and historical decisions.


Built by LocalKin AI as the capture layer for KinClaw — open-sourced so nobody else has to rewrite the ScreenCaptureKit binding from scratch.

Documentation

Overview

Package sckit is a pure-Go binding to macOS ScreenCaptureKit.

sckit provides the modern replacement for the CGDisplayCreateImage path, which Apple deprecated in macOS 15+ and removed in macOS 26. It uses github.com/ebitengine/purego plus a small companion ObjC dylib, so downstream projects need no cgo.

Quick start

displays, _ := sckit.ListDisplays(ctx)
img, _ := sckit.Capture(ctx, displays[0])
png.Encode(w, img)

Persistent stream

stream, _ := sckit.NewStream(ctx, displays[0], sckit.WithFrameRate(60))
defer stream.Close()
for {
    img, err := stream.NextFrame(ctx)
    if err != nil { break }
    process(img)
}

Targets

Every capture function takes a Target describing what to record. Values of Display, Window, App, Region, and Exclude all satisfy Target. The interface is sealed; only types in this package can implement it.

Requirements

macOS 14 (Sonoma) or newer. First use triggers the "Screen Recording" TCC prompt; grant the permission in System Settings → Privacy & Security → Screen Recording, then rerun.

Dylib placement

sckit ships a universal (arm64+x86_64) companion dylib via go:embed. On the first call into the package, the embedded bytes are extracted to ~/Library/Caches/sckit-go/<hash>/libsckit_sync.dylib and Dlopened from there — downstream users never need to manage the dylib themselves. Set DylibPath to a non-empty value before the first call if you ship a custom-built or patched dylib.

Constants

const Version = "0.3.0"

Version is the semantic-version tag of this package. Kept in sync with git tags; updated per release.

Variables

var DylibPath = ""

DylibPath is an optional override for the location of libsckit_sync.dylib.

Default behavior (empty DylibPath): sckit extracts its embedded copy of the dylib to the user's cache directory (~/Library/Caches/sckit-go/<hash>/ on macOS) on first use, then Dlopens from there. Downstream users never need to manage the dylib themselves.

Set DylibPath to a non-empty string BEFORE the first call into this package if you ship a custom-built or patched dylib. Must be set before Load — subsequent changes are ignored (Load caches its result).

var ErrDisplayNotFound = errors.New("sckit: display not found")

ErrDisplayNotFound is returned when a target Display.ID does not match any currently-attached display.

var ErrNotImplemented = errors.New("sckit: not implemented in this version")

ErrNotImplemented is returned for Target kinds not yet implemented in this release (e.g. Window or App targets before v0.2.0).

var ErrPermissionDenied = errors.New("sckit: screen recording permission denied")

ErrPermissionDenied is returned when macOS Screen Recording permission has not been granted. Direct users to System Settings → Privacy & Security → Screen Recording.

var ErrStreamClosed = errors.New("sckit: stream closed")

ErrStreamClosed is returned when a method is called on a Stream after Close.

var ErrTimeout = errors.New("sckit: timeout")

ErrTimeout is returned when a blocking call exceeded its deadline with no data available (not to be confused with context cancellation, which returns ctx.Err()).

Functions

func Capture

func Capture(ctx context.Context, target Target, opts ...Option) (image.Image, error)

Capture takes a single screenshot of the given target and returns it as an image.Image (concretely an *image.RGBA).

Internally this uses SCScreenshotManager (macOS 14+). Supported targets: Display, Window, App, Region, and Exclude.

func CaptureToFile

func CaptureToFile(ctx context.Context, target Target, path string, opts ...Option) error

CaptureToFile captures a single screenshot and writes it to path. The output format is chosen by the file extension; currently only .png is supported.

func Load

func Load() error

Load explicitly loads the companion dylib. It's idempotent: subsequent calls return the same cached error (or nil).

Resolution order:

  1. If DylibPath is non-empty, use it (user override).
  2. Otherwise, extract the embedded universal dylib to the user's cache directory (~/Library/Caches/sckit-go/<sha256-prefix>/libsckit_sync.dylib on macOS) and Dlopen from there. Extraction is skipped if a file with the matching hash is already present.

Load is called automatically by every public function; the exported form exists so applications can fail fast at startup rather than on the first capture.
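
The extract-once step in the resolution order can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: extractOnce and demo are hypothetical, the 16-character hash prefix is a guess, and the temp-file-plus-rename step is one possible way to avoid exposing a half-written file. On macOS, os.UserCacheDir returns ~/Library/Caches, which would serve as the parent of the root directory used here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// extractOnce writes data to <root>/<hash-prefix>/<name> unless a file
// is already there, and returns the resulting path.
func extractOnce(root, name string, data []byte) (string, error) {
	sum := sha256.Sum256(data)
	dir := filepath.Join(root, hex.EncodeToString(sum[:])[:16])
	path := filepath.Join(dir, name)
	if _, err := os.Stat(path); err == nil {
		return path, nil // same content hash already extracted
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	// Write to a temp file and rename it into place so a concurrent
	// reader never sees a partial file.
	tmp, err := os.CreateTemp(dir, name+".tmp-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	if err := os.Rename(tmp.Name(), path); err != nil {
		return "", err
	}
	return path, nil
}

// demo extracts the same payload twice into a temp dir and reports
// whether both calls resolved to the same, intact file.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "sketch-cache-")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)
	p1, err := extractOnce(root, "lib.bin", []byte("payload"))
	if err != nil {
		return false, err
	}
	p2, err := extractOnce(root, "lib.bin", []byte("payload"))
	if err != nil {
		return false, err
	}
	got, err := os.ReadFile(p2)
	if err != nil {
		return false, err
	}
	return p1 == p2 && string(got) == "payload", nil
}

func main() {
	ok, err := demo()
	fmt.Println(ok, err)
}
```

Keying the directory on the content hash means a new module version with a different dylib lands in a fresh directory, so stale copies are never loaded by mistake.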

func ResolvedDylibPath

func ResolvedDylibPath() string

ResolvedDylibPath returns the filesystem path that Load used (or would use) to Dlopen the dylib. Call after Load for the path actually loaded. Intended for debugging — e.g. telling a user where to check permissions.

Types

type App

type App struct {
	BundleID string // e.g. "com.google.Chrome" — required for capture
	Name     string // display name, e.g. "Google Chrome"
	PID      int32
}

App describes a running application as a capture target. Capturing an App records all of its on-screen windows composed together on a single display (auto-picked as the display owning the largest share of the app's windows).

func ListApps

func ListApps(ctx context.Context) ([]App, error)

ListApps enumerates applications with at least one on-screen window. The result is derived from ListWindows — deduplicated by bundle identifier. BundleID may be empty for privileged system processes.

type ColorSpace

type ColorSpace int

ColorSpace identifies a color space for captured frames.

const (
	// ColorSpaceSRGB is the standard sRGB color space. Default.
	ColorSpaceSRGB ColorSpace = iota
	// ColorSpaceDisplayP3 is Apple's wide-gamut Display P3.
	ColorSpaceDisplayP3
	// ColorSpaceBT709 is the Rec. 709 HD video color space.
	ColorSpaceBT709
)

type DiffGrid added in v0.3.0

type DiffGrid struct {
	// Cells holds the per-cell mean-abs-delta. Cells[r][c] is the
	// average per-pixel intensity delta in that cell. Rows × Cols
	// equals the grid resolution requested in [DiffImages].
	Cells [][]float64
	// Rows / Cols echo the requested resolution (so callers don't
	// have to len() the slice every time).
	Rows, Cols int
	// Bounds is the image rectangle the diff was computed over. Used
	// by [BoundingBox] to map grid cells back to display-local px.
	Bounds image.Rectangle
}

DiffGrid is the result of DiffImages. Cells is a row-major [rows][cols] matrix of mean-abs-delta values per grid cell (0..255 scale). Use DiffGrid.Dirty / DiffGrid.BoundingBox / DiffGrid.Render for the common downstream operations.

func DiffImages added in v0.3.0

func DiffImages(a, b image.Image, rows, cols int) (*DiffGrid, error)

DiffImages compares two images of the same dimensions over a rows × cols grid and returns mean-abs-delta of grayscale intensity per cell. Used as a token-cheap alternative to "ask a vision LLM to compare two screenshots" for action verification — change in any cell above a threshold means the world's reacted; absence means the click was ignored / page didn't update / element didn't appear.

Sampling: every 4th pixel in each axis (per-cell). Keeps diff fast on retina-resolution captures while still catching text-shaped changes (text edges average out at sub-cell scale).

Errors:

  • dimension mismatch: a and b must have identical bounds.

Typical use:

before, _ := sckit.Capture(ctx, target)
// … action happens here, optional sleep …
after, _ := sckit.Capture(ctx, target)
grid, err := sckit.DiffImages(before, after, 16, 16)
if err != nil {
    log.Fatal(err) // e.g. the two captures have different bounds
}
if grid.Dirty(8) > 0 {
    bbox, _ := grid.BoundingBox(8)
    fmt.Println("UI changed in", bbox)
}

16×16 is the common default — fine enough to localize one button's worth of change, coarse enough to ignore antialiasing noise.
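
For intuition, the per-cell computation described above can be sketched with the standard library alone. This is a rough approximation of the idea, not the package's code: gridDiff and demoPair are hypothetical names, and the grayscale conversion via color.GrayModel is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"image"
	"image/color"
)

// gridDiff returns a rows x cols matrix of mean absolute grayscale
// deltas (0..255), sampling every 4th pixel on each axis inside a cell.
func gridDiff(a, b image.Image, rows, cols int) ([][]float64, error) {
	if rows <= 0 || cols <= 0 {
		return nil, errors.New("rows and cols must be positive")
	}
	if a.Bounds() != b.Bounds() {
		return nil, errors.New("dimension mismatch")
	}
	r := a.Bounds()
	cells := make([][]float64, rows)
	for row := range cells {
		cells[row] = make([]float64, cols)
		y0 := r.Min.Y + row*r.Dy()/rows
		y1 := r.Min.Y + (row+1)*r.Dy()/rows
		for col := range cells[row] {
			x0 := r.Min.X + col*r.Dx()/cols
			x1 := r.Min.X + (col+1)*r.Dx()/cols
			var sum float64
			var n int
			for y := y0; y < y1; y += 4 {
				for x := x0; x < x1; x += 4 {
					ga := color.GrayModel.Convert(a.At(x, y)).(color.Gray).Y
					gb := color.GrayModel.Convert(b.At(x, y)).(color.Gray).Y
					d := int(ga) - int(gb)
					if d < 0 {
						d = -d
					}
					sum += float64(d)
					n++
				}
			}
			if n > 0 {
				cells[row][col] = sum / float64(n)
			}
		}
	}
	return cells, nil
}

// demoPair builds a black 16x16 image and a copy with a white 8x8
// block in the top-left corner.
func demoPair() (image.Image, image.Image) {
	a := image.NewGray(image.Rect(0, 0, 16, 16))
	b := image.NewGray(image.Rect(0, 0, 16, 16))
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			b.SetGray(x, y, color.Gray{Y: 255})
		}
	}
	return a, b
}

func main() {
	a, b := demoPair()
	cells, err := gridDiff(a, b, 2, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(cells) // only the top-left cell is non-zero
}
```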

func (*DiffGrid) BoundingBox added in v0.3.0

func (g *DiffGrid) BoundingBox(threshold float64) (image.Rectangle, bool)

BoundingBox returns the union rectangle of all cells whose value crosses threshold, mapped back to display-local px coordinates (using g.Bounds + cell stride). Returns ok=false when nothing's dirty — caller should report "no change" rather than draw an empty rect.
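
The cell-to-pixel mapping can be sketched as below. It is an illustrative stand-in, assuming evenly divided cells and an inclusive (at or above) threshold; dirtyBounds, demoGrid, and demoBox are hypothetical names.

```go
package main

import (
	"fmt"
	"image"
)

// dirtyBounds returns the union of all cells whose value is at or
// above threshold, mapped into the pixel space of bounds.
func dirtyBounds(cells [][]float64, bounds image.Rectangle, threshold float64) (image.Rectangle, bool) {
	rows := len(cells)
	if rows == 0 || len(cells[0]) == 0 {
		return image.Rectangle{}, false
	}
	cols := len(cells[0])
	var box image.Rectangle
	found := false
	for r, row := range cells {
		for c, v := range row {
			if v < threshold {
				continue
			}
			cell := image.Rect(
				bounds.Min.X+c*bounds.Dx()/cols,
				bounds.Min.Y+r*bounds.Dy()/rows,
				bounds.Min.X+(c+1)*bounds.Dx()/cols,
				bounds.Min.Y+(r+1)*bounds.Dy()/rows,
			)
			box = box.Union(cell) // Union ignores the empty starting box
			found = true
		}
	}
	return box, found
}

// demoGrid is a 16x16 grid with two dirty cells: (row 2, col 3) and
// (row 4, col 5).
func demoGrid() [][]float64 {
	cells := make([][]float64, 16)
	for i := range cells {
		cells[i] = make([]float64, 16)
	}
	cells[2][3] = 20
	cells[4][5] = 20
	return cells
}

// demoBox maps demoGrid onto a 160x160 image, so each cell is 10x10 px.
func demoBox() (image.Rectangle, bool) {
	return dirtyBounds(demoGrid(), image.Rect(0, 0, 160, 160), 8)
}

func main() {
	box, ok := demoBox()
	fmt.Println(box, ok)
}
```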

func (*DiffGrid) Dirty added in v0.3.0

func (g *DiffGrid) Dirty(threshold float64) int

Dirty returns the number of cells whose mean-abs-delta is at or above threshold (0..255). Use this as a cheap "did anything actually change?" check before calling BoundingBox / Render.

func (*DiffGrid) Render added in v0.3.0

func (g *DiffGrid) Render(threshold float64) string

Render produces a textual heatmap of the grid for human / LLM inspection. '#' = dirty (≥ threshold), '.' = warm (≥ threshold/2), ' ' = quiet. One row per grid row, no header / footer — caller adds those if needed. Used by kinclaw's screen.diff_screenshots verb to send a token-cheap visual to the model.
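
A heatmap with the three glyphs described above can be produced in a few lines. The sketch below is hypothetical (renderGrid is not the package's function) and assumes rows are joined with newlines and no trailing newline.

```go
package main

import (
	"fmt"
	"strings"
)

// renderGrid draws one text row per grid row: '#' for cells at or
// above threshold, '.' for cells at or above threshold/2, ' ' otherwise.
func renderGrid(cells [][]float64, threshold float64) string {
	lines := make([]string, len(cells))
	for i, row := range cells {
		var sb strings.Builder
		for _, v := range row {
			switch {
			case v >= threshold:
				sb.WriteByte('#')
			case v >= threshold/2:
				sb.WriteByte('.')
			default:
				sb.WriteByte(' ')
			}
		}
		lines[i] = sb.String()
	}
	return strings.Join(lines, "\n")
}

func main() {
	cells := [][]float64{{10, 4, 0}, {0, 8, 3}}
	// %q makes the spaces and the newline visible.
	fmt.Printf("%q\n", renderGrid(cells, 8))
}
```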

type Display

type Display struct {
	ID     uint32
	Width  int
	Height int
	X, Y   int
}

Display describes an attached physical display. The ID field is a stable CGDirectDisplayID; positions (X, Y) are in the global coordinate space.

func ListDisplays

func ListDisplays(ctx context.Context) ([]Display, error)

ListDisplays returns all currently-attached displays.

type Exclude

type Exclude struct {
	Target  Target
	Windows []Window
}

Exclude wraps any Target and masks out a list of Windows from the captured output. Common use case: screenshotting your own app without including your own capture window in the result.

type Frame

type Frame struct {
	Pixels []byte
	Width  int
	Height int
}

Frame is a view into a Stream's internal BGRA buffer. It is valid only until the next call on the same Stream; do not retain.

Pixel layout: tightly-packed 32-bit BGRA, top-down, no row padding. Pixels has length Width*Height*4.
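
Given that layout, converting a frame into an image that is safe to retain is a byte-for-byte copy with the blue and red channels swapped. The helper below is a standalone sketch (bgraToRGBA is a hypothetical name, not part of sckit) that works on any buffer with this layout.

```go
package main

import (
	"fmt"
	"image"
)

// bgraToRGBA copies a tightly-packed, top-down BGRA buffer into a new
// *image.RGBA, swapping the blue and red channels. It returns nil if
// the buffer length does not match width*height*4.
func bgraToRGBA(pix []byte, width, height int) *image.RGBA {
	if width <= 0 || height <= 0 || len(pix) != width*height*4 {
		return nil
	}
	img := image.NewRGBA(image.Rect(0, 0, width, height))
	// image.NewRGBA also uses a stride of 4*width, so the two buffers
	// line up byte for byte.
	for i := 0; i < len(pix); i += 4 {
		img.Pix[i+0] = pix[i+2] // R
		img.Pix[i+1] = pix[i+1] // G
		img.Pix[i+2] = pix[i+0] // B
		img.Pix[i+3] = pix[i+3] // A
	}
	return img
}

func main() {
	// Two pixels in B,G,R,A order: opaque blue, then opaque red.
	bgra := []byte{255, 0, 0, 255, 0, 0, 255, 255}
	img := bgraToRGBA(bgra, 2, 1)
	fmt.Println(img.Pix) // [0 0 255 255 255 0 0 255]
}
```

Because the result owns its own pixel slice, it stays valid after the next call on the Stream overwrites the source buffer.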

type Option

type Option func(*config)

Option configures a Capture or NewStream call. Options are applied in the order given; later options override earlier ones.

Options use the functional-options pattern so the API can evolve without breaking existing callers.
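
For readers new to the pattern, here is a minimal self-contained sketch. The lowercase names and the config fields are hypothetical; only the default values (60 fps, cursor on, queue depth 3) come from the documentation above.

```go
package main

import "fmt"

// config holds the settings an option can change. The field names are
// assumptions made for this sketch.
type config struct {
	frameRate  int
	showCursor bool
	queueDepth int
}

// option mutates a config; constructors return closures over their argument.
type option func(*config)

func withFrameRate(fps int) option { return func(c *config) { c.frameRate = fps } }
func withCursor(show bool) option  { return func(c *config) { c.showCursor = show } }
func withQueueDepth(n int) option  { return func(c *config) { c.queueDepth = n } }

// newConfig starts from the defaults and applies options in order,
// so a later option overrides an earlier one.
func newConfig(opts ...option) config {
	c := config{frameRate: 60, showCursor: true, queueDepth: 3}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(withFrameRate(30), withCursor(false), withFrameRate(10))
	fmt.Printf("%+v\n", c)
}
```

A new setting only needs a new field and a new constructor, so existing call sites keep compiling unchanged.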

func WithColorSpace

func WithColorSpace(cs ColorSpace) Option

WithColorSpace selects the output color space. See ColorSpace. Default: ColorSpaceSRGB.

func WithCursor

func WithCursor(show bool) Option

WithCursor controls whether the hardware cursor is rendered into the captured frames. Default: true.

func WithFrameRate

func WithFrameRate(fps int) Option

WithFrameRate caps the maximum frame delivery rate. Applies to streams only; ignored by Capture. The effective rate is also bounded above by the display's refresh rate.

Default: 60.

func WithQueueDepth

func WithQueueDepth(n int) Option

WithQueueDepth sets the number of frame buffers the underlying SCStream keeps queued. Higher values tolerate consumer lag at the cost of memory; lower values reduce latency.

Default: 3. Valid range: 1–8.

func WithResolution

func WithResolution(width, height int) Option

WithResolution sets the output frame dimensions in pixels. Zero values (the default) mean "use the target's native resolution", which for a Display target means the CGDirectDisplay's pixel width and height.

Non-native values let the dylib downsample server-side, saving bandwidth over the Go/C boundary. Upsampling is not recommended.

type Region

type Region struct {
	Display Display
	Bounds  image.Rectangle
}

Region is a sub-rectangle of a Display, specified in display-local points.

Region capture shipped in v0.1.0; pass a Region to Capture like any other Target.

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

Stream is a persistent ScreenCaptureKit capture session. A Stream is NOT safe for concurrent use by multiple goroutines; protect with your own mutex if you fan frames out.

Always call Close when done — the underlying SCStream holds a connection to the WindowServer + ReplayKit daemon that will not release on its own.

func NewStream

func NewStream(ctx context.Context, target Target, opts ...Option) (*Stream, error)

NewStream opens a capture stream for the given Target. The call blocks until the underlying SCStream has started (or errored); subsequent frame retrieval is via Stream.NextFrame.

First frame typically arrives within ~150ms of this call returning.

Targets: Display, Window, and App streaming shipped in v0.1.0. A target kind that a given release does not implement returns ErrNotImplemented.

func (*Stream) Close

func (s *Stream) Close() error

Close shuts down the stream and releases all associated resources. Safe to call multiple times and from any goroutine.

func (*Stream) Frames

func (s *Stream) Frames(ctx context.Context) (<-chan image.Image, <-chan error)

Frames returns a convenience channel that delivers frames from the Stream until ctx is canceled or an error occurs. The returned error channel is closed after the frame channel; it yields at most one value (nil on clean ctx-cancel).

A single goroutine is spawned to drive NextFrame; if the consumer falls behind, frames are dropped inside the dylib, not buffered here.

Frames is built on top of NextFrame for the common producer pattern and is intentionally lightweight; for full control, call NextFrame directly.
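
The adapter shape described here (one goroutine, an unbuffered frame channel, an error channel that closes last) can be sketched generically. This is an illustration of the pattern under those stated semantics, not the package's source; frames and demo are hypothetical names and the fake source stands in for a real Stream.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// frames drives next in a single goroutine and forwards each value on
// an unbuffered channel. The error channel is closed after the frame
// channel and carries at most one error.
func frames[T any](ctx context.Context, next func(context.Context) (T, error)) (<-chan T, <-chan error) {
	out := make(chan T)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		defer close(out) // deferred last, so it runs first
		for {
			v, err := next(ctx)
			if err != nil {
				if ctx.Err() == nil { // report real failures only
					errs <- err
				}
				return
			}
			select {
			case out <- v:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out, errs
}

// demo pulls from a fake source that yields 1, 2, 3 and then fails.
func demo() ([]int, error) {
	n := 0
	next := func(context.Context) (int, error) {
		if n == 3 {
			return 0, errors.New("stream closed")
		}
		n++
		return n, nil
	}
	out, errs := frames(context.Background(), next)
	var got []int
	for v := range out {
		got = append(got, v)
	}
	return got, <-errs
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Because the frame channel is unbuffered, a slow consumer simply delays the next pull; nothing accumulates on the Go side.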

func (*Stream) Height

func (s *Stream) Height() int

Height returns the effective frame height in pixels.

func (*Stream) NextFrame

func (s *Stream) NextFrame(ctx context.Context) (image.Image, error)

NextFrame blocks until the next frame is available, then returns it as a freshly-allocated image.Image (concretely *image.RGBA).

Cancellation: if ctx is canceled or its deadline elapses, NextFrame returns ctx.Err() as soon as the current underlying dylib call completes. An in-flight dylib call cannot yet be aborted; mid-call cancellation is planned for v0.5.0 with the sckit_stream_cancel dylib entry.

func (*Stream) NextFrameBGRA

func (s *Stream) NextFrameBGRA(ctx context.Context) (Frame, error)

NextFrameBGRA returns the next frame as a zero-copy Frame pointing at the Stream's internal buffer. The buffer is overwritten on the next call — do not hold Frame.Pixels past the next NextFrame or NextFrameBGRA call.

Use this path in hot loops where the per-frame image.RGBA allocation and BGRA→RGBA conversion of NextFrame is a bottleneck (e.g. real-time VLM ingestion where the next step is JPEG-encoding the frame anyway).

func (*Stream) Width

func (s *Stream) Width() int

Width returns the effective frame width in pixels. May differ from a requested value if the target's native size was smaller.

type Target

type Target interface {
	// contains filtered or unexported methods
}

Target describes what to capture: a Display, Window, App, Region, or an Exclude composition. Target is a sealed interface — only types declared in this package can implement it. Construct targets with struct literals:

sckit.Display{ID: displayID}
sckit.Window{ID: windowID}
sckit.App{BundleID: "com.google.Chrome"}
sckit.Region{Display: d, Bounds: image.Rect(100, 100, 600, 400)}
sckit.Exclude{Target: t, Windows: toHide}

Target corresponds 1:1 to Apple's SCContentFilter abstraction.

type TextRegion added in v0.2.0

type TextRegion struct {
	Text       string  `json:"text"`
	X          int     `json:"x"`
	Y          int     `json:"y"`
	W          int     `json:"w"`
	H          int     `json:"h"`
	Confidence float64 `json:"conf"`
}

TextRegion is one piece of text Vision recognized in an image. The rectangle is in image-pixel coordinates, top-left origin (the convention CGImage / drawing systems use; Vision's native bottom-left coords have been converted).
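
Vision reports bounding boxes normalized to 0..1 with a bottom-left origin, so the conversion mentioned above amounts to scaling by the image size and flipping the Y axis. A sketch of that arithmetic follows; toTopLeft is a hypothetical helper, and rounding to the nearest pixel is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// toTopLeft converts a normalized, bottom-left-origin box into
// top-left-origin pixel coordinates for an imgW x imgH image.
func toTopLeft(nx, ny, nw, nh float64, imgW, imgH int) (x, y, w, h int) {
	x = int(math.Round(nx * float64(imgW)))
	// Flip the Y axis: the box's top edge sits ny+nh above the bottom.
	y = int(math.Round((1 - ny - nh) * float64(imgH)))
	w = int(math.Round(nw * float64(imgW)))
	h = int(math.Round(nh * float64(imgH)))
	return x, y, w, h
}

func main() {
	x, y, w, h := toTopLeft(0.25, 0.5, 0.5, 0.25, 200, 100)
	fmt.Println(x, y, w, h) // 50 25 100 25
}
```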

func OCR added in v0.2.0

func OCR(imageBytes []byte) ([]TextRegion, error)

OCR runs macOS Vision framework's VNRecognizeTextRequest against the provided image bytes (PNG / JPEG / TIFF / BMP — anything NSImage can decode), returning recognized text regions.

Why use this instead of routing screenshots through a vision LLM:

  • Local, offline, free (no API call cost)
  • Fast: ~50-200ms per screen-sized image
  • Deterministic
  • Returns precise pixel-coord bounding boxes for each text region

When you need it: extracting the value displayed in a calculator, reading static labels in a canvas-rendered UI, dumping screen text for fuzzy match before deciding which UI element to click.

When NOT to use it: if you need *understanding* of the screen content (intent / structure / what to do next) — that's still a vision LLM job. OCR returns text + boxes, nothing more.

Recognition level is set to Accurate (slower but higher quality on noisy screen captures); language correction is on. There's no knob for these in v0.2.0 — opinionated default for the agent use case.

Requires macOS 11+ (Vision framework). Returns ([]TextRegion, nil) on success; (nil, error) on decode/recognize failure. An empty slice means no text was recognized.

type Window

type Window struct {
	ID       uint32
	App      string          // owning application name, e.g. "Google Chrome"
	BundleID string          // e.g. "com.google.Chrome"
	Title    string          // window title, may be empty
	Frame    image.Rectangle // in global point coordinates
	OnScreen bool
	Layer    int
	PID      int32 // owning process ID
}

Window describes an individual on-screen window. Enumerate via ListWindows. Capture via Capture with a Window target; stream via NewStream with a Window target.

func ListWindows

func ListWindows(ctx context.Context) ([]Window, error)

ListWindows enumerates windows visible to the capture system. The result includes off-screen, minimized, and menu-bar windows; filter with Window.OnScreen and Window.Layer if you want only the obvious user-facing ones.

Directories

| Path | Synopsis |
|---|---|
| cmd/example-app-capture (command) | Capture every on-screen window of a single application (by bundle ID) composed together. |
| cmd/example-capture (command) | Minimal screenshot using the public sckit API. |
| cmd/example-region-capture (command) | Capture a sub-rectangle of a display. |
| cmd/example-stream (command) | Persistent capture, 30 frames, per-frame latency. |
| cmd/example-window-capture (command) | Capture a single window by ID and write PNG. |
| cmd/example-window-list (command) | Print every visible window on-screen, most obvious first (layer 0, onScreen=true). |
| cmd/example-window-stream (command) | Persistent capture of a single window. |
| cmd/poc-capture (command) | One-shot screenshot via sckit_capture_display, saved as PNG. |
| cmd/poc-list-displays (command) | End-to-end test: Go → purego → libsckit_sync.dylib → ScreenCaptureKit → back to Go as a flat [] of display structs. |
| cmd/poc-stream (command) | Persistent SCStream benchmark. |
| cmd/sckit (command) | The canonical command-line interface for the sckit-go library. |
| cmd/stability-test (command) | Long-running leak + regression detector. |
| internal/dylib | Package dylib embeds the ObjC companion library so downstream users can simply `go get` sckit-go without building C code on their machine. |
