Documentation ¶
Overview ¶
Package sckit is a pure-Go binding to macOS ScreenCaptureKit.
sckit provides the modern replacement for the deprecated [CGDisplayCreateImage] path removed in macOS 15+. It uses github.com/ebitengine/purego plus a small companion ObjC dylib — no cgo required in downstream projects.
Quick start ¶
displays, _ := sckit.ListDisplays(ctx)
img, _ := sckit.Capture(ctx, displays[0])
png.Encode(w, img)
Persistent stream ¶
stream, _ := sckit.NewStream(ctx, displays[0], sckit.WithFrameRate(60))
defer stream.Close()
for {
	img, err := stream.NextFrame(ctx)
	if err != nil {
		break
	}
	process(img)
}
Targets ¶
Every capture function takes a Target describing what to record. Values of Display, Window, App, Region, and Exclude all satisfy Target. The interface is sealed; only types in this package can implement it.
Requirements ¶
macOS 14 (Sonoma) or newer. First use triggers the "Screen Recording" TCC prompt; grant the permission in System Settings → Privacy & Security → Screen Recording, then rerun.
Dylib placement ¶
sckit ships a universal (arm64+x86_64) companion dylib via go:embed. On the first call into the package, the embedded bytes are extracted to ~/Library/Caches/sckit-go/<hash>/libsckit_sync.dylib and Dlopened from there — downstream users never need to manage the dylib themselves. Set DylibPath to a non-empty value before the first call if you ship a custom-built or patched dylib.
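The extract-once behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual code: extractTo, tempBase, the file name, and the 16-character hash prefix are all assumptions made for the example, and a scratch directory stands in for the user's cache directory.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// extractTo writes payload to <base>/<hash-prefix>/<name> unless a file is
// already present there. It reports the final path and whether it wrote.
// The 16-character prefix length is an assumption made for this sketch.
func extractTo(base, name string, payload []byte) (string, bool, error) {
	sum := sha256.Sum256(payload)
	dir := filepath.Join(base, hex.EncodeToString(sum[:])[:16])
	path := filepath.Join(dir, name)
	if _, err := os.Stat(path); err == nil {
		return path, false, nil // same content was extracted earlier
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", false, err
	}
	// Write under a temporary name first so an interrupted write never
	// leaves a partial file at the final path.
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, payload, 0o755); err != nil {
		return "", false, err
	}
	if err := os.Rename(tmp, path); err != nil {
		return "", false, err
	}
	return path, true, nil
}

// tempBase returns a fresh scratch directory standing in for the user's
// cache directory (os.UserCacheDir would supply the real one).
func tempBase() string {
	dir, err := os.MkdirTemp("", "extract-sketch")
	if err != nil {
		panic(err)
	}
	return dir
}

func main() {
	base := tempBase()
	defer os.RemoveAll(base)
	payload := []byte("pretend these are dylib bytes")
	for i := 0; i < 2; i++ {
		path, wrote, err := extractTo(base, "libexample.dylib", payload)
		if err != nil {
			fmt.Println("extract failed:", err)
			return
		}
		fmt.Println(filepath.Base(path), "wrote:", wrote)
	}
}
```

Because the directory name is derived from the content hash, a new release with different bytes lands in a new directory and never collides with an older extraction.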
Index ¶
- Constants
- Variables
- func Capture(ctx context.Context, target Target, opts ...Option) (image.Image, error)
- func CaptureToFile(ctx context.Context, target Target, path string, opts ...Option) error
- func Load() error
- func ResolvedDylibPath() string
- type App
- type ColorSpace
- type DiffGrid
- type Display
- type Exclude
- type Frame
- type Option
- type Region
- type Stream
- func (s *Stream) Close() error
- func (s *Stream) Frames(ctx context.Context) (<-chan image.Image, <-chan error)
- func (s *Stream) Height() int
- func (s *Stream) NextFrame(ctx context.Context) (image.Image, error)
- func (s *Stream) NextFrameBGRA(ctx context.Context) (Frame, error)
- func (s *Stream) Width() int
- type Target
- type TextRegion
- type Window
Constants ¶
const Version = "0.3.0"
Version is the semantic-version tag of this package. Kept in sync with git tags; updated per release.
Variables ¶
var DylibPath = ""
DylibPath is an optional override for the location of libsckit_sync.dylib.
Default behavior (empty DylibPath): sckit extracts its embedded copy of the dylib to the user's cache directory (~/Library/Caches/sckit-go/<hash>/ on macOS) on first use, then Dlopens from there. Downstream users never need to manage the dylib themselves.
Set DylibPath to a non-empty string BEFORE the first call into this package if you ship a custom-built or patched dylib. Must be set before Load — subsequent changes are ignored (Load caches its result).
var ErrDisplayNotFound = errors.New("sckit: display not found")
ErrDisplayNotFound is returned when a target Display.ID does not match any currently-attached display.
var ErrNotImplemented = errors.New("sckit: not implemented in this version")
ErrNotImplemented is returned for Target kinds not yet implemented in this release (e.g. Window or App targets before v0.2.0).
var ErrPermissionDenied = errors.New("sckit: screen recording permission denied")
ErrPermissionDenied is returned when macOS Screen Recording permission has not been granted. Direct users to System Settings → Privacy & Security → Screen Recording.
var ErrStreamClosed = errors.New("sckit: stream closed")
ErrStreamClosed is returned when a method is called on a Stream after Close.
var ErrTimeout = errors.New("sckit: timeout")
ErrTimeout is returned when a blocking call exceeded its deadline with no data available (not to be confused with context cancellation, which returns ctx.Err()).
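One way a caller might tell these sentinels apart from context cancellation is with errors.Is. The sketch below declares local stand-ins (errTimeout, errPermissionDenied) so it compiles on its own; classify and wrapped are hypothetical helpers, and the action names are illustrative only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local stand-ins for the package's sentinel errors, so the sketch is
// self-contained. Real code would compare against the exported variables.
var (
	errTimeout          = errors.New("sckit: timeout")
	errPermissionDenied = errors.New("sckit: screen recording permission denied")
)

// classify maps an error to a coarse action. errors.Is sees through
// wrapping added with %w.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, context.Canceled), errors.Is(err, context.DeadlineExceeded):
		return "stop" // the caller's own context ended
	case errors.Is(err, errTimeout):
		return "retry" // no data before the deadline
	case errors.Is(err, errPermissionDenied):
		return "ask-user" // point the user at System Settings
	default:
		return "fail"
	}
}

// wrapped simulates a caller adding context to a sentinel with %w.
func wrapped(err error) error { return fmt.Errorf("capture: %w", err) }

func main() {
	fmt.Println(classify(wrapped(errTimeout)))
	fmt.Println(classify(context.Canceled))
	fmt.Println(classify(wrapped(errPermissionDenied)))
}
```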
Functions ¶
func Capture ¶
Capture takes a single screenshot of the given target and returns it as an image.Image (concretely an *image.RGBA).
Internally this uses SCScreenshotManager (macOS 14+). Supported targets: Display, Window. App and Region return ErrNotImplemented and arrive in v0.2.0.
func CaptureToFile ¶
CaptureToFile captures a single screenshot and writes it to path. The output format is chosen by the file extension; currently only .png is supported.
func Load ¶
func Load() error
Load explicitly loads the companion dylib. It's idempotent: subsequent calls return the same cached error (or nil).
Resolution order:
- If DylibPath is non-empty, use it (user override).
- Otherwise, extract the embedded universal dylib to the user's cache directory (~/Library/Caches/sckit-go/<sha256-prefix>/libsckit_sync.dylib on macOS) and Dlopen from there. Extraction is skipped if a file with the matching hash is already present.
Load is called automatically by every public function; the exported form exists so applications can fail fast at startup rather than on the first capture.
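The "idempotent, cached result" contract is commonly built on sync.Once. The following sketch assumes that approach; the source does not say how Load is implemented, and libPath, openLibrary, and opens are hypothetical names. It also shows why changing the override after the first call has no effect.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	libPath  = "/tmp/libexample.dylib" // hypothetical override variable
	loadOnce sync.Once
	loadErr  error
	opens    int // how many times the expensive step actually ran
)

// openLibrary is a hypothetical stand-in for the real dlopen step.
func openLibrary(path string) error {
	opens++
	if path == "" {
		return errors.New("empty library path")
	}
	return nil
}

// load runs openLibrary at most once and replays the cached result on
// every later call.
func load() error {
	loadOnce.Do(func() { loadErr = openLibrary(libPath) })
	return loadErr
}

func main() {
	fmt.Println(load(), opens)
	libPath = "" // too late: the first result is already cached
	fmt.Println(load(), opens)
}
```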
func ResolvedDylibPath ¶
func ResolvedDylibPath() string
ResolvedDylibPath returns the filesystem path that Load used (or would use) to Dlopen the dylib. Call after Load for the path actually loaded. Intended for debugging — e.g. telling a user where to check permissions.
Types ¶
type App ¶
type App struct {
BundleID string // e.g. "com.google.Chrome" — required for capture
Name string // display name, e.g. "Google Chrome"
PID int32
}
App describes a running application as a capture target. Capturing an App records all of its on-screen windows composed together on a single display (auto-picked as the display owning the largest share of the app's windows).
type ColorSpace ¶
type ColorSpace int
ColorSpace identifies a color space for captured frames.
const (
	// ColorSpaceSRGB is the standard sRGB color space. Default.
	ColorSpaceSRGB ColorSpace = iota
	// ColorSpaceDisplayP3 is Apple's wide-gamut Display P3.
	ColorSpaceDisplayP3
	// ColorSpaceBT709 is the Rec. 709 HD video color space.
	ColorSpaceBT709
)
type DiffGrid ¶ added in v0.3.0
type DiffGrid struct {
// Cells holds the per-cell mean-abs-delta. Cells[r][c] is the
// average per-pixel intensity delta in that cell. Rows × Cols
// equals the grid resolution requested in [DiffImages].
Cells [][]float64
// Rows / Cols echo the requested resolution (so callers don't
// have to len() the slice every time).
Rows, Cols int
// Bounds is the image rectangle the diff was computed over. Used
// by [BoundingBox] to map grid cells back to display-local px.
Bounds image.Rectangle
}
DiffGrid is the result of DiffImages. Cells is a row-major [rows][cols] matrix of mean-abs-delta values per grid cell (0..255 scale). Use DiffGrid.Dirty / DiffGrid.BoundingBox / DiffGrid.Render for the common downstream operations.
func DiffImages ¶ added in v0.3.0
DiffImages compares two images of the same dimensions over a rows × cols grid and returns the mean absolute delta of grayscale intensity per cell. It is a token-cheap alternative to asking a vision LLM to compare two screenshots for action verification: a delta above the threshold in any cell means the UI reacted; no such cell means the click was ignored, the page didn't update, or the element didn't appear.
Sampling: every 4th pixel in each axis (per-cell). Keeps diff fast on retina-resolution captures while still catching text-shaped changes (text edges average out at sub-cell scale).
Errors:
- dimension mismatch: a and b must have identical bounds.
Typical use:
before, _ := sckit.Capture(ctx, target)
// … action happens here, optional sleep …
after, _ := sckit.Capture(ctx, target)
grid, err := sckit.DiffImages(before, after, 16, 16)
if err != nil {
	return err // e.g. the two captures had different bounds
}
if grid.Dirty(8) > 0 {
	bbox, _ := grid.BoundingBox(8)
	fmt.Println("UI changed in", bbox)
}
16×16 is the common default — fine enough to localize one button's worth of change, coarse enough to ignore antialiasing noise.
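To make the grid-diff idea concrete, here is a minimal standalone sketch of a per-cell mean absolute grayscale delta with every-4th-pixel sampling. It is an illustration of the technique under stated assumptions, not the package's implementation: diffCells, gray, absDelta, and solid are hypothetical names, and the cell boundaries use simple integer division.

```go
package main

import (
	"errors"
	"fmt"
	"image"
	"image/color"
)

const step = 4 // sample every 4th pixel on each axis

// diffCells returns a rows x cols matrix of mean absolute grayscale deltas
// (0..255 scale) between a and b.
func diffCells(a, b image.Image, rows, cols int) ([][]float64, error) {
	if a.Bounds() != b.Bounds() {
		return nil, errors.New("dimension mismatch")
	}
	if rows <= 0 || cols <= 0 {
		return nil, errors.New("rows and cols must be positive")
	}
	bd := a.Bounds()
	cells := make([][]float64, rows)
	for r := range cells {
		cells[r] = make([]float64, cols)
		y0, y1 := bd.Min.Y+r*bd.Dy()/rows, bd.Min.Y+(r+1)*bd.Dy()/rows
		for c := range cells[r] {
			x0, x1 := bd.Min.X+c*bd.Dx()/cols, bd.Min.X+(c+1)*bd.Dx()/cols
			var sum float64
			var n int
			for y := y0; y < y1; y += step {
				for x := x0; x < x1; x += step {
					sum += absDelta(gray(a, x, y), gray(b, x, y))
					n++
				}
			}
			if n > 0 {
				cells[r][c] = sum / float64(n)
			}
		}
	}
	return cells, nil
}

// gray converts the pixel at (x, y) to 8-bit grayscale intensity.
func gray(img image.Image, x, y int) uint8 {
	return color.GrayModel.Convert(img.At(x, y)).(color.Gray).Y
}

// absDelta returns |p - q| without unsigned underflow.
func absDelta(p, q uint8) float64 {
	if p > q {
		return float64(p - q)
	}
	return float64(q - p)
}

// solid builds a w x h grayscale image filled with intensity v.
func solid(w, h int, v uint8) *image.Gray {
	img := image.NewGray(image.Rect(0, 0, w, h))
	for i := range img.Pix {
		img.Pix[i] = v
	}
	return img
}

func main() {
	cells, err := diffCells(solid(64, 64, 10), solid(64, 64, 50), 4, 4)
	if err != nil {
		fmt.Println("diff failed:", err)
		return
	}
	fmt.Println(cells[0][0]) // every sampled pixel differs by 40
}
```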
func (*DiffGrid) BoundingBox ¶ added in v0.3.0
BoundingBox returns the union rectangle of all cells whose value crosses threshold, mapped back to display-local px coordinates (using g.Bounds + cell stride). Returns ok=false when nothing's dirty — caller should report "no change" rather than draw an empty rect.
func (*DiffGrid) Dirty ¶ added in v0.3.0
Dirty returns the number of cells whose mean-abs-delta is at or above threshold (0..255). Use this as a cheap "did anything actually change?" check before calling [BoundingBox] / [Render].
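The relationship between the dirty-cell count and the bounding box can be sketched as follows. This is a hedged illustration with a local grid type that mirrors the documented fields; newGrid, cellRect, dirty, and boundingBox are hypothetical, and treating "crosses threshold" as "at or above" is an assumption.

```go
package main

import (
	"fmt"
	"image"
)

// grid mirrors the documented DiffGrid fields for the purpose of this sketch.
type grid struct {
	Cells      [][]float64
	Rows, Cols int
	Bounds     image.Rectangle
}

// newGrid allocates an all-zero grid covering a w x h image.
func newGrid(rows, cols, w, h int) grid {
	cells := make([][]float64, rows)
	for r := range cells {
		cells[r] = make([]float64, cols)
	}
	return grid{Cells: cells, Rows: rows, Cols: cols, Bounds: image.Rect(0, 0, w, h)}
}

// cellRect maps grid cell (r, c) back to pixel coordinates.
func (g grid) cellRect(r, c int) image.Rectangle {
	b := g.Bounds
	return image.Rect(
		b.Min.X+c*b.Dx()/g.Cols, b.Min.Y+r*b.Dy()/g.Rows,
		b.Min.X+(c+1)*b.Dx()/g.Cols, b.Min.Y+(r+1)*b.Dy()/g.Rows,
	)
}

// dirty counts cells whose value is at or above threshold.
func (g grid) dirty(threshold float64) int {
	n := 0
	for _, row := range g.Cells {
		for _, v := range row {
			if v >= threshold {
				n++
			}
		}
	}
	return n
}

// boundingBox unions the pixel rectangles of all dirty cells. The bool is
// false when no cell reaches the threshold.
func (g grid) boundingBox(threshold float64) (image.Rectangle, bool) {
	var box image.Rectangle // zero value is empty, so Union adopts the first cell
	found := false
	for r, row := range g.Cells {
		for c, v := range row {
			if v >= threshold {
				box = box.Union(g.cellRect(r, c))
				found = true
			}
		}
	}
	return box, found
}

func main() {
	g := newGrid(4, 4, 400, 400)
	g.Cells[1][2] = 50
	g.Cells[2][3] = 9
	box, ok := g.boundingBox(8)
	fmt.Println(g.dirty(8), box, ok)
}
```

With two dirty cells at (row 1, col 2) and (row 2, col 3) on a 400×400 image, the union spans x 200..400 and y 100..300.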
func (*DiffGrid) Render ¶ added in v0.3.0
Render produces a textual heatmap of the grid for human / LLM inspection. '#' = dirty (≥ threshold), '.' = warm (≥ threshold/2), ' ' = quiet. One row per grid row, no header / footer — caller adds those if needed. Used by kinclaw's screen.diff_screenshots verb to send a token-cheap visual to the model.
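A heatmap with the three symbols described above takes only a few lines. The sketch below is an assumption about shape, not the package's code: render is a hypothetical free function over a plain [][]float64, and rows are joined with newlines without a trailing one.

```go
package main

import (
	"fmt"
	"strings"
)

// render draws one text line per grid row:
// '#' for values >= threshold, '.' for values >= threshold/2, ' ' otherwise.
func render(cells [][]float64, threshold float64) string {
	lines := make([]string, len(cells))
	for r, row := range cells {
		var sb strings.Builder
		for _, v := range row {
			switch {
			case v >= threshold:
				sb.WriteByte('#')
			case v >= threshold/2:
				sb.WriteByte('.')
			default:
				sb.WriteByte(' ')
			}
		}
		lines[r] = sb.String()
	}
	return strings.Join(lines, "\n")
}

func main() {
	cells := [][]float64{{10, 5, 0}, {0, 0, 8}}
	fmt.Printf("%q\n", render(cells, 8)) // prints "#. \n  #"
}
```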
type Display ¶
Display describes an attached physical display. The ID field is a stable CGDirectDisplayID; positions (X, Y) are in the global coordinate space.
type Exclude ¶
Exclude wraps any Target and masks out a list of Windows from the captured output. Common use case: screenshotting your own app without including your own capture window in the result.
type Frame ¶
Frame is a view into a Stream's internal BGRA buffer. It is valid only until the next call on the same Stream; do not retain.
Pixel layout: tightly-packed 32-bit BGRA, top-down, no row padding. Pixels has length Width*Height*4.
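Given that layout, a caller who needs to keep a frame past the next call can copy it into a standard *image.RGBA by swapping the B and R channels. The helper below is a hypothetical sketch, not part of the package; it assumes opaque pixels, so alpha premultiplication is not a concern.

```go
package main

import (
	"fmt"
	"image"
)

// bgraToRGBA copies a tightly-packed, top-down BGRA buffer into a new
// *image.RGBA. The result owns its memory, so it is safe to retain.
func bgraToRGBA(pix []byte, w, h int) (*image.RGBA, error) {
	if w < 0 || h < 0 || len(pix) != w*h*4 {
		return nil, fmt.Errorf("want %d bytes for %dx%d, got %d", w*h*4, w, h, len(pix))
	}
	img := image.NewRGBA(image.Rect(0, 0, w, h)) // stride is exactly 4*w
	for i := 0; i+3 < len(pix); i += 4 {
		img.Pix[i+0] = pix[i+2] // R comes from the third source byte
		img.Pix[i+1] = pix[i+1] // G stays in place
		img.Pix[i+2] = pix[i+0] // B comes from the first source byte
		img.Pix[i+3] = pix[i+3] // A stays in place
	}
	return img, nil
}

func main() {
	// One pixel: B=1, G=2, R=3, A=255.
	img, err := bgraToRGBA([]byte{1, 2, 3, 255}, 1, 1)
	if err != nil {
		fmt.Println("convert failed:", err)
		return
	}
	fmt.Println(img.Pix) // prints [3 2 1 255]
}
```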
type Option ¶
type Option func(*config)
Option configures a Capture or NewStream call. Options are applied in the order given; later options override earlier ones.
Options use the functional-options pattern so the API can evolve without breaking existing callers.
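For readers unfamiliar with the pattern, here is a small self-contained sketch of how functional options are typically applied. The config fields, the lowercase option constructors, and newConfig are assumptions for illustration; only the defaults (60 fps, cursor on, queue depth 3) come from this documentation.

```go
package main

import "fmt"

// config holds the settings an option can change. Field names are
// hypothetical; the real struct is unexported and not documented here.
type config struct {
	frameRate  int
	showCursor bool
	queueDepth int
}

// Option mutates a config in place.
type Option func(*config)

func withFrameRate(fps int) Option { return func(c *config) { c.frameRate = fps } }
func withCursor(show bool) Option  { return func(c *config) { c.showCursor = show } }

// newConfig starts from the documented defaults and applies opts in order,
// so a later option overrides an earlier one touching the same field.
func newConfig(opts ...Option) config {
	c := config{frameRate: 60, showCursor: true, queueDepth: 3}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(withFrameRate(30), withCursor(false), withFrameRate(24))
	fmt.Println(c.frameRate, c.showCursor, c.queueDepth) // prints 24 false 3
}
```

Adding a new setting later means adding one field and one constructor; existing call sites keep compiling unchanged.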
func WithColorSpace ¶
func WithColorSpace(cs ColorSpace) Option
WithColorSpace selects the output color space. See ColorSpace. Default: ColorSpaceSRGB.
func WithCursor ¶
WithCursor controls whether the hardware cursor is rendered into the captured frames. Default: true.
func WithFrameRate ¶
WithFrameRate caps the maximum frame delivery rate. Applies to streams only; ignored by Capture. The effective rate is also bounded above by the display's refresh rate.
Default: 60.
func WithQueueDepth ¶
WithQueueDepth sets the number of frame buffers the underlying SCStream keeps queued. Higher values tolerate consumer lag at the cost of memory; lower values reduce latency.
Default: 3. Valid range: 1–8.
func WithResolution ¶
WithResolution sets the output frame dimensions in pixels. Zero values (the default) mean "use the target's native resolution", which for a Display target means the CGDirectDisplay's pixel width and height.
Non-native values let the dylib downsample server-side, saving bandwidth over the Go/C boundary. Upsampling is not recommended.
type Region ¶
Region is a sub-rectangle of a Display, specified in display-local points.
Region target capture arrives in v0.2.0. For v0.1 use full-display capture and crop in Go.
type Stream ¶
type Stream struct {
// contains filtered or unexported fields
}
Stream is a persistent ScreenCaptureKit capture session. A Stream is NOT safe for concurrent use by multiple goroutines; protect with your own mutex if you fan frames out.
Always call Close when done — the underlying SCStream holds a connection to the WindowServer + ReplayKit daemon that will not release on its own.
func NewStream ¶
NewStream opens a capture stream for the given Target. The call blocks until the underlying SCStream has started (or errored); subsequent frame retrieval is via Stream.NextFrame.
First frame typically arrives within ~150ms of this call returning.
Targets: Display and Window work. App, Region, and Exclude return ErrNotImplemented; those arrive in v0.2.0.
func (*Stream) Close ¶
Close shuts down the stream and releases all associated resources. Safe to call multiple times and from any goroutine.
func (*Stream) Frames ¶
Frames returns a convenience channel that delivers frames from the Stream until ctx is canceled or an error occurs. The returned error channel is closed after the frame channel; it yields at most one value (nil on clean ctx-cancel).
A single goroutine is spawned to drive NextFrame; if the consumer falls behind, frames are dropped inside the dylib, not buffered here.
Frames is built on top of [NextFrame] for the common producer pattern and is intentionally lightweight — for full control, call NextFrame directly.
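The channel contract described above (frame channel closed first, then an error channel that yields at most one value) can be sketched with a generic pull function. This is an illustration under assumptions: frames, nextFunc, counter, and collect are hypothetical, and an int stands in for a frame.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

var errExhausted = errors.New("source exhausted")

// nextFunc is a stand-in for a blocking "give me the next frame" call.
type nextFunc func(ctx context.Context) (int, error)

// frames drives next from one goroutine and forwards results on a channel.
func frames(ctx context.Context, next nextFunc) (<-chan int, <-chan error) {
	out := make(chan int)       // unbuffered: nothing piles up on the Go side
	errc := make(chan error, 1) // holds the single terminal value
	go func() {
		defer close(errc) // registered first, so it runs last
		defer close(out)  // deferred calls run in reverse order: out closes first
		for {
			f, err := next(ctx)
			if err != nil {
				if ctx.Err() != nil {
					err = nil // clean cancellation is reported as nil
				}
				errc <- err
				return
			}
			select {
			case out <- f:
			case <-ctx.Done():
				errc <- nil
				return
			}
		}
	}()
	return out, errc
}

// counter yields 1..limit and then fails with errExhausted.
func counter(limit int) nextFunc {
	n := 0
	return func(ctx context.Context) (int, error) {
		if err := ctx.Err(); err != nil {
			return 0, err
		}
		if n >= limit {
			return 0, errExhausted
		}
		n++
		return n, nil
	}
}

// collect drains both channels in the order a consumer normally would.
func collect(limit int) ([]int, error) {
	out, errc := frames(context.Background(), counter(limit))
	var got []int
	for f := range out {
		got = append(got, f)
	}
	return got, <-errc
}

func main() {
	got, err := collect(3)
	fmt.Println(got, err) // prints [1 2 3] source exhausted
}
```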
func (*Stream) NextFrame ¶
NextFrame blocks until the next frame is available, then returns it as a freshly-allocated image.Image (concretely *image.RGBA).
Cancellation: if ctx is canceled or its deadline elapses, NextFrame returns ctx.Err() as soon as the current underlying dylib call completes. v0.1 cannot abort an in-flight dylib call; mid-call cancellation lands with the sckit_stream_cancel dylib entry in v0.2.
func (*Stream) NextFrameBGRA ¶
NextFrameBGRA returns the next frame as a zero-copy Frame pointing at the Stream's internal buffer. The buffer is overwritten on the next call — do not hold Frame.Pixels past the next NextFrame or NextFrameBGRA call.
Use this path in hot loops where the per-frame image.RGBA allocation and BGRA→RGBA conversion of [NextFrame] is a bottleneck (e.g. real-time VLM ingestion where the next step is JPEG-encoding the frame anyway).
type Target ¶
type Target interface {
// contains filtered or unexported methods
}
Target describes what to capture: a Display, Window, App, Region, or an Exclude composition. Target is a sealed interface — only types declared in this package can implement it. Construct targets with struct literals:
sckit.Display{ID: displayID}
sckit.Window{ID: windowID}
sckit.App{BundleID: "com.google.Chrome"}
sckit.Region{Display: d, Bounds: image.Rect(100, 100, 600, 400)}
sckit.Exclude{Target: t, Windows: toHide}
Target corresponds 1:1 to Apple's SCContentFilter abstraction.
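A sealed interface in Go is usually achieved with an unexported marker method. The sketch below assumes that mechanism; the method name isTarget and the describe helper are hypothetical, and the local Display and Window types carry only an ID.

```go
package main

import "fmt"

// Target has a single unexported method. Code in another package cannot
// name that method, so it cannot declare a type that satisfies Target.
type Target interface {
	isTarget() // hypothetical marker method name
}

type Display struct{ ID uint32 }
type Window struct{ ID uint32 }

func (Display) isTarget() {}
func (Window) isTarget()  {}

// describe switches over the closed set of implementations. Because the
// set is sealed, the default branch only guards against future additions.
func describe(t Target) string {
	switch v := t.(type) {
	case Display:
		return fmt.Sprintf("display %d", v.ID)
	case Window:
		return fmt.Sprintf("window %d", v.ID)
	default:
		return "unknown target"
	}
}

func main() {
	fmt.Println(describe(Display{ID: 1}))
	fmt.Println(describe(Window{ID: 42}))
}
```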
type TextRegion ¶ added in v0.2.0
type TextRegion struct {
Text string `json:"text"`
X int `json:"x"`
Y int `json:"y"`
W int `json:"w"`
H int `json:"h"`
Confidence float64 `json:"conf"`
}
TextRegion is one piece of text Vision recognized in an image. The rectangle is in image-pixel coordinates, top-left origin (the convention CGImage / drawing systems use; Vision's native bottom-left coords have been converted).
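Vision reports bounding boxes normalized to 0..1 with a bottom-left origin, so the conversion mentioned above amounts to a scale and a vertical flip. The function below is a hypothetical sketch of that arithmetic, assuming rounding to the nearest pixel; it is not taken from the package.

```go
package main

import (
	"fmt"
	"math"
)

// toTopLeft converts a normalized, bottom-left-origin rectangle
// (nx, ny, nw, nh in 0..1) into top-left-origin pixel coordinates for an
// image of imgW x imgH pixels.
func toTopLeft(nx, ny, nw, nh float64, imgW, imgH int) (x, y, w, h int) {
	W, H := float64(imgW), float64(imgH)
	x = int(math.Round(nx * W))
	w = int(math.Round(nw * W))
	h = int(math.Round(nh * H))
	// The box's top edge sits at ny+nh measured from the bottom, which is
	// 1-(ny+nh) measured from the top.
	y = int(math.Round((1 - ny - nh) * H))
	return x, y, w, h
}

func main() {
	x, y, w, h := toTopLeft(0.25, 0.5, 0.5, 0.25, 400, 200)
	fmt.Println(x, y, w, h) // prints 100 50 200 50
}
```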
func OCR ¶ added in v0.2.0
func OCR(imageBytes []byte) ([]TextRegion, error)
OCR runs macOS Vision framework's VNRecognizeTextRequest against the provided image bytes (PNG / JPEG / TIFF / BMP — anything NSImage can decode), returning recognized text regions.
Why use this instead of routing screenshots through a vision LLM:
- Local, offline, free (no API call cost)
- Fast: ~50-200ms per screen-sized image
- Deterministic
- Returns precise pixel-coord bounding boxes for each text region
When you need it: extracting the value displayed in a calculator, reading static labels in a canvas-rendered UI, dumping screen text for fuzzy match before deciding which UI element to click.
When NOT to use it: if you need *understanding* of the screen content (intent / structure / what to do next) — that's still a vision LLM job. OCR returns text + boxes, nothing more.
Recognition level is set to Accurate (slower but higher quality on noisy screen captures); language correction is on. There's no knob for these in v0.2.0 — opinionated default for the agent use case.
Requires macOS 11+ (Vision framework). Returns ([]TextRegion, nil) on success; (nil, error) on decode/recognize failure. An empty slice means no text was recognized.
type Window ¶
type Window struct {
ID uint32
App string // owning application name, e.g. "Google Chrome"
BundleID string // e.g. "com.google.Chrome"
Title string // window title, may be empty
Frame image.Rectangle // in global point coordinates
OnScreen bool
Layer int
PID int32 // owning process ID
}
Window describes an individual on-screen window. Enumerate via ListWindows. Capture via Capture with a Window target; stream via NewStream with a Window target.
Directories ¶
| Path | Synopsis |
|---|---|
| cmd/example-app-capture (command) | Capture every on-screen window of a single application (by bundle ID) composed together. |
| cmd/example-capture (command) | Minimal screenshot using the public sckit API. |
| cmd/example-region-capture (command) | Capture a sub-rectangle of a display. |
| cmd/example-stream (command) | Persistent capture, 30 frames, per-frame latency. |
| cmd/example-window-capture (command) | Capture a single window by ID and write PNG. |
| cmd/example-window-list (command) | Print every visible window on-screen, most obvious first (layer 0, onScreen=true). |
| cmd/example-window-stream (command) | Persistent capture of a single window. |
| cmd/poc-capture (command) | One-shot screenshot via sckit_capture_display, saved as PNG. |
| cmd/poc-list-displays (command) | End-to-end test: Go → purego → libsckit_sync.dylib → ScreenCaptureKit → back to Go as a flat [] of display structs. |
| cmd/poc-stream (command) | Persistent SCStream benchmark. |
| cmd/sckit (command) | The canonical command-line interface for the sckit-go library. |
| cmd/stability-test (command) | Long-running leak + regression detector. |
| internal/dylib | Package dylib embeds the ObjC companion library so downstream users can simply `go get` sckit-go without building C code on their machine. |