dicta

module
v0.1.0 Latest
Published: May 17, 2026 License: Apache-2.0

README

dicta

A Linux/Wayland-first voice dictation daemon written in pure Go.

dicta is two things:

  1. Type-mode — press Pause, talk, and the daemon types the transcribed text into whatever window has focus, committing each utterance on VAD silence. Press Pause again to stop.
  2. Clip-mode — press Scroll Lock, talk, and a small editable panel appears with the cleaned transcript. Press Enter to copy the buffer to the clipboard, Shift+Enter to insert a newline, Esc to cancel.

There is no push-to-talk (PTT), no wakeword, and no always-on listening. Capture starts when you press a key and stops when the session ends.

Status

Pre-1.0. The full v1 build (phases 1–13 of the design) is functional; this is the docs phase. Use it, but expect rough edges and please file issues.

Why

Speech-to-text is one of the few accessibility tools where Linux still has gaps. Existing options depend on commercial cloud APIs, require Python toolchains and GPU model files, or assume X11. dicta is a single static Go binary that:

  • Runs anywhere Wayland and PipeWire run.
  • Talks to any Wyoming-protocol ASR server (faster-whisper et al.) by default — no model download in v1.
  • Optionally talks to a local whisper-server (subprocess-managed), or any OpenAI-compatible transcription endpoint.
  • Optionally cleans transcripts with any OpenAI-compatible LLM (llama.cpp's server, vLLM, OpenAI itself).

Architecture in one diagram

   ┌──────────────┐    ┌──────────────┐    ┌────────────────┐    ┌──────────────┐
   │   Pause /    │ →  │    dictad    │ →  │   asrclient    │ →  │  Wyoming /   │
   │ Scroll Lock  │    │              │    │  (Go module)   │    │  whispercpp/ │
   │ (compositor) │    │  audio + VAD │ ←  │                │ ←  │    OpenAI    │
   └──────────────┘    │  state mach. │    └────────────────┘    └──────────────┘
                       │  control sock│
                       │              │    ┌──────────────┐
                       │              │ →  │   ydotool    │  (type-mode)
                       │              │    └──────────────┘
                       │              │    ┌──────────────┐    ┌──────────────┐
                       │              │ ↔  │ dicta-preview│ →  │   wl-copy    │  (clip-mode)
                       │              │    │   (Gio UI)   │    └──────────────┘
                       └──────────────┘    └──────────────┘

dictad is the daemon (long-lived). dicta is a thin CLI that talks to the daemon over a Unix socket. dicta-preview is the clip-mode panel, spawned on demand. ydotoold and the ASR backend are external.
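
The CLI-to-daemon hop described above can be sketched in a few lines of Go. This is a minimal illustration, not the project's client: the socket path is the one given for the control package ($XDG_RUNTIME_DIR/dicta.sock), but the command struct, its JSON field names, and the one-line-per-command framing are assumptions made for the example. The real wire shape lives in the proto package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// command is a HYPOTHETICAL wire shape used only for this sketch.
type command struct {
	Op   string `json:"op"`
	Mode string `json:"mode,omitempty"`
}

// socketPath returns the control socket location under the given runtime dir.
func socketPath(runtimeDir string) string {
	return filepath.Join(runtimeDir, "dicta.sock")
}

// encode renders a command as one newline-terminated JSON line (assumed framing).
func encode(c command) ([]byte, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// send dials the Unix socket, writes one command, and returns the first reply chunk.
func send(path string, c command) (string, error) {
	conn, err := net.DialTimeout("unix", path, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	line, err := encode(c)
	if err != nil {
		return "", err
	}
	if _, err := conn.Write(line); err != nil {
		return "", err
	}
	// Bound the wait so a wedged daemon cannot hang the CLI.
	if err := conn.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}
	buf := make([]byte, 4096)
	n, err := conn.Read(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	path := socketPath(os.Getenv("XDG_RUNTIME_DIR"))
	reply, err := send(path, command{Op: "toggle_talk", Mode: "type"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "dictad not reachable:", err)
		os.Exit(1)
	}
	fmt.Print(reply)
}
```

Keeping the client this thin is what lets a compositor keybinding run it on every key press: all session state stays in the long-lived daemon.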

Quick start

1. Install build deps
# Ubuntu / Debian
sudo ./scripts/install-deps-ubuntu.sh

# Fedora
sudo ./scripts/install-deps-fedora.sh

# Arch
sudo ./scripts/install-deps-arch.sh

These install: Go 1.24+, the Gio system libraries (Wayland, xkbcommon, GLES, EGL, libvulkan, libXcursor) for the preview panel, ydotool, and wl-clipboard.

2. Build everything
task build:all

Produces bin/dictad, bin/dicta, and bin/dicta-preview.

3. Install into your home directory
task install:user

Installs to ~/.local/bin and drops the systemd user unit into ~/.config/systemd/user/.

4. Bring up an ASR backend

The default backend is Wyoming. You can run any Wyoming-compatible service — most users want wyoming-faster-whisper. A common setup is its Docker image listening on tcp://localhost:10300.
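
For readers unfamiliar with Wyoming, the sketch below shows how an event is framed on the wire as the protocol's public documentation describes it: a one-line JSON header, followed by an optional binary payload whose size is announced in payload_length. It is an illustration only, not dicta's asr package. The function name encodeEvent is hypothetical, and the optional out-of-line data section (data_length) is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// header is the JSON line that starts every Wyoming event.
// Only the fields needed for this sketch are modelled.
type header struct {
	Type          string         `json:"type"`
	Data          map[string]any `json:"data,omitempty"`
	PayloadLength int            `json:"payload_length,omitempty"`
}

// encodeEvent returns the header line followed by the raw payload bytes.
func encodeEvent(typ string, data map[string]any, payload []byte) ([]byte, error) {
	h, err := json.Marshal(header{Type: typ, Data: data, PayloadLength: len(payload)})
	if err != nil {
		return nil, err
	}
	out := append(h, '\n')
	return append(out, payload...), nil
}

func main() {
	// One 80 ms frame of 16 kHz mono int16 audio (silence here).
	frame := make([]byte, 2560)
	format := map[string]any{"rate": 16000, "width": 2, "channels": 1}

	chunk, err := encodeEvent("audio-chunk", format, frame)
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	stop, err := encodeEvent("audio-stop", nil, nil)
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Printf("audio-chunk event: %d bytes\n", len(chunk))
	fmt.Printf("audio-stop event: %s", stop)
}
```

A client would write such events to the TCP connection (tcp://localhost:10300 in the setup above) and read transcript events back in the same framing.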

Other backends:

  • --asr-backend whispercpp — dicta supervises a local whisper-server subprocess. Requires you to install whisper.cpp/whisper-server and a model.
  • --asr-backend openai — point at any OpenAI-compatible /v1/audio/transcriptions endpoint. Requires an API key.

See CONFIGURATION.md for every flag.
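
To make the openai backend concrete, here is a rough sketch of the request an OpenAI-compatible /v1/audio/transcriptions endpoint expects: a multipart form with a model field and a file part, plus a bearer token. The function name, the base URL, the placeholder key, and the file name utterance.wav are all assumptions for illustration. dicta's own client may build the request differently.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"strings"
)

// newTranscriptionRequest builds (but does not send) a transcription request.
// baseURL is expected to end in /v1, e.g. "https://api.example.com/v1".
func newTranscriptionRequest(baseURL, apiKey, model string, wav []byte) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)

	if err := mw.WriteField("model", model); err != nil {
		return nil, err
	}
	part, err := mw.CreateFormFile("file", "utterance.wav")
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(wav); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is invalid without it.
	if err := mw.Close(); err != nil {
		return nil, err
	}

	url := strings.TrimRight(baseURL, "/") + "/audio/transcriptions"
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	// Placeholder values; a real call needs real WAV bytes and a real key.
	req, err := newTranscriptionRequest("https://api.example.com/v1", "sk-placeholder", "whisper-1", []byte("RIFF"))
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to http.DefaultClient.Do would perform the call. That step is omitted so the example runs without network access.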

5. Configure flags
systemctl --user edit dictad.service

In the editor that opens, add a drop-in override. The empty ExecStart= line clears the unit's original command before the second line sets the new one:

[Service]
ExecStart=
ExecStart=%h/.local/bin/dictad \
    --asr-backend wyoming \
    --asr-wyoming-addr tcp://localhost:10300 \
    --preview-binary %h/.local/bin/dicta-preview

6. Enable and start
systemctl --user enable --now dictad.service
journalctl --user -u dictad.service -f

7. Bind compositor shortcuts

Key          What it does              Command
Pause        Toggle type-mode session  dicta toggle_talk --mode type
Scroll Lock  Toggle clip-mode panel    dicta toggle_talk --mode clip

For GNOME, bind these via gsettings (the Settings GUI tries to nudge you toward chord shortcuts; bypassing it lets you use unmodified single keys). For Sway/Hyprland/KDE, bind in the compositor config.

Optional: LLM cleanup

Off by default. To enable in clip-mode (the preview panel will display cleaned text the user can still edit before pressing Enter):

ExecStart=%h/.local/bin/dictad \
    ... \
    --cleanup-enabled \
    --cleanup-endpoint http://my-llama-server.lan:8080/v1 \
    --cleanup-model qwen3-7b-instruct

The mechanical system prompt is a code constant and cannot be templated by user input. Cleanup is only invoked in clip-mode; type-mode always sends the raw transcript to ydotool.
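
One way to read the "code constant" rule is sketched below: the system prompt is a Go const, and the transcript only ever travels as the user message, so nothing the speaker says is interpolated into the instructions. This is an assumption-laden illustration. The prompt wording, the type names, and the choice of the chat/completions request shape are hypothetical, not taken from dicta's cleanup package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// systemPrompt is illustrative text; dicta's actual prompt is not shown in this README.
const systemPrompt = "Fix punctuation, casing, and obvious transcription errors. Return only the corrected text."

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest follows the OpenAI-compatible chat completions request shape.
type chatRequest struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	Temperature float64   `json:"temperature"`
}

// buildCleanupRequest places the transcript in the user message verbatim.
// The system message is always the constant above.
func buildCleanupRequest(model, transcript string) chatRequest {
	return chatRequest{
		Model: model,
		Messages: []message{
			{Role: "system", Content: systemPrompt},
			{Role: "user", Content: transcript},
		},
		Temperature: 0, // assumed: deterministic output suits mechanical cleanup
	}
}

func main() {
	req := buildCleanupRequest("qwen3-7b-instruct", "um so send the report by friday")
	body, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	// A client would POST this body to <cleanup-endpoint>/chat/completions.
	fmt.Println(string(body))
}
```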

Optional: audit log (debug mode)

Off by default. When enabled, JSONL transcripts (and optionally WAV captures) are written under $XDG_DATA_HOME/dicta/YYYY-MM-DD/:

ExecStart=%h/.local/bin/dictad \
    ... \
    --audit-enabled \
    --audit-keep-audio \
    --audit-retention-days 7

Capturing audio requires both --audit-enabled and --audit-keep-audio. Both default to off because transcripts and recordings are sensitive by definition.
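
The directory layout and retention window above can be sketched as follows. This is a guess at the mechanics, not the audit package itself. The function names are hypothetical, the ~/.local/share fallback is the XDG default for an unset XDG_DATA_HOME, and the boundary rule (a day's directory expires once it is strictly older than the retention window) is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

const dayLayout = "2006-01-02" // Go reference layout for YYYY-MM-DD

// dataHome resolves $XDG_DATA_HOME, falling back to ~/.local/share.
func dataHome() string {
	if d := os.Getenv("XDG_DATA_HOME"); d != "" {
		return d
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return filepath.Join(".local", "share")
	}
	return filepath.Join(home, ".local", "share")
}

// auditDir returns the per-day directory for audit records.
func auditDir(base, day string) string {
	return filepath.Join(base, "dicta", day)
}

// expired reports whether the directory named dirDay falls outside the
// retention window, measured back from today. Both arguments use YYYY-MM-DD.
func expired(dirDay, today string, retentionDays int) (bool, error) {
	d, err := time.Parse(dayLayout, dirDay)
	if err != nil {
		return false, err // not a date-named directory; leave it alone
	}
	t, err := time.Parse(dayLayout, today)
	if err != nil {
		return false, err
	}
	cutoff := t.AddDate(0, 0, -retentionDays)
	return d.Before(cutoff), nil
}

func main() {
	today := time.Now().Format(dayLayout)
	fmt.Println("today's records:", auditDir(dataHome(), today))

	old, err := expired("2026-05-01", "2026-05-17", 7)
	if err != nil {
		fmt.Println("parse:", err)
		return
	}
	fmt.Println("2026-05-01 expired with 7-day retention on 2026-05-17:", old)
}
```

Returning an error for names that are not dates keeps a cleanup pass from deleting anything it does not recognise.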

Hotkey philosophy

v1 ships exactly two compositor bindings (D17 in the design doc): Pause for type-mode, Scroll Lock for clip-mode. There is no global commit or cancel hotkey — clip-mode commits via panel-local Enter and type-mode commits per-utterance via VAD silence. PTT (push-to-talk) and wakeword are out of scope for v1 and are tracked in §14 of the design doc.
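
The "commit on VAD silence" behaviour can be pictured as a small segmenter that ends an utterance after a run of quiet frames. The sketch below is not dicta's VAD. It swaps in a plain energy threshold with a hangover counter, and the type name, threshold, and hangover length are all hypothetical values chosen for illustration.

```go
package main

import "fmt"

// segmenter ends an utterance after `hangover` consecutive silent frames.
type segmenter struct {
	threshold   int // mean |sample| below this counts as silence (assumed value)
	hangover    int // silent frames required to commit (assumed value)
	inSpeech    bool
	silentCount int
}

// meanAbs returns the mean absolute amplitude of one int16 frame.
func meanAbs(frame []int16) int {
	if len(frame) == 0 {
		return 0
	}
	sum := 0
	for _, s := range frame {
		v := int(s)
		if v < 0 {
			v = -v
		}
		sum += v
	}
	return sum / len(frame)
}

// push feeds one frame and reports whether an utterance just ended.
func (s *segmenter) push(frame []int16) bool {
	if meanAbs(frame) >= s.threshold {
		s.inSpeech = true
		s.silentCount = 0
		return false
	}
	if !s.inSpeech {
		return false // silence before any speech: nothing to commit
	}
	s.silentCount++
	if s.silentCount >= s.hangover {
		s.inSpeech = false
		s.silentCount = 0
		return true
	}
	return false
}

func main() {
	loud := make([]int16, 1280)
	for i := range loud {
		loud[i] = 2000
	}
	quiet := make([]int16, 1280)

	// 8 silent frames of 80 ms each is a 640 ms pause.
	seg := &segmenter{threshold: 500, hangover: 8}
	frames := [][]int16{quiet, loud, loud, loud}
	for i := 0; i < 10; i++ {
		frames = append(frames, quiet)
	}
	for i, f := range frames {
		if seg.push(f) {
			fmt.Println("commit utterance after frame", i)
		}
	}
}
```

A session toggled by Pause would run this loop until the key is pressed again, handing each committed utterance to the ASR backend and then to ydotool.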

Documentation

Building from source (no Taskfile)

# Daemon + CLI (pure Go, static)
CGO_ENABLED=0 go build -o bin/dictad ./cmd/dictad
CGO_ENABLED=0 go build -o bin/dicta ./cmd/dicta

# Preview panel (CGo, Wayland)
go build -tags nox11 -o bin/dicta-preview ./cmd/dicta-preview

The daemon and CLI MUST build with CGO_ENABLED=0 (D13). The MemoryDenyWriteExecute=true setting in the systemd unit relies on this.

Testing

task test       # unit tests
task test:race  # with race detector + goleak
task vet        # go vet
task check      # all of the above

internal/control ships a fuzz target for the wire-protocol parser:

go test -fuzz=FuzzCommandUnmarshal -fuzztime=1m ./internal/control

Contributing

The design doc's §13 lists the open decision points; everything else is locked. If you want to change a locked decision, file an issue explaining why before writing code — these were deliberate.

Bugs, typos, packaging contributions: PRs welcome.

License

Apache-2.0 — see LICENSE.

Directories

Path                    Synopsis
cmd/dicta               Command dicta is the thin CLI client.
cmd/dictad              Command dictad is the long-lived dicta daemon.
internal/asr            Package asr defines the pluggable ASR backend interface and v1 implementations (D2): wyoming (default, TCP), whispercpp (daemon-supervised whisper-server subprocess on loopback HTTP), openai (user-managed HTTP).
internal/audio          Package audio captures microphone input and produces 80 ms / 1280-sample / 2560-byte int16-LE mono frames at 16 kHz (D15).
internal/audit          Package audit writes JSONL session records and optional WAV captures with retention managed per config.
internal/cleanup        Package cleanup provides an OpenAI-protocol HTTP client for optional LLM cleanup of clip-mode transcripts.
internal/config         Package config loads and validates the typed TOML configuration.
internal/control        Package control implements the Unix socket server at $XDG_RUNTIME_DIR/dicta.sock (mode 0600).
internal/dispatch       Package dispatch wraps the external output side-effects: ydotool (type-mode keystroke synthesis), wl-copy (clip-mode clipboard), and notify-send (desktop notifications).
internal/errors         Package errors holds shared sentinel errors and error-wrapping helpers.
internal/log            Package log is the cross-cutting structured logger.
internal/mute           Package mute provides pluggable hardware-mute detection for dicta's --unmute-to-dictate watcher.
internal/mute/pcmzero   Package pcmzero implements a mute.Source that infers mute state by checking captured PCM frames for all-zero bytes.
internal/mute/pipewire  Package pipewire implements a mute.Source that observes mute state through PipeWire/WirePlumber's user-facing CLI surface (wpctl).
internal/whispersup     Package whispersup supervises a local whisper-server subprocess for the whispercpp ASR backend (D2).
                        Package proto holds the wire-shape types and one-shot client helpers for the dicta control protocol (§5.6 of the design doc).
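
The frame geometry quoted for the audio package follows directly from the sample rate: 16,000 samples/s × 0.080 s = 1280 samples, and 1280 samples × 2 bytes = 2560 bytes. The short sketch below reproduces that arithmetic and pairs it with the all-zero check that the pcmzero synopsis describes. The function names are hypothetical and the code is an illustration, not the packages' actual API.

```go
package main

import "fmt"

const (
	sampleRateHz   = 16000 // 16 kHz mono capture
	frameMillis    = 80    // frame duration
	bytesPerSample = 2     // int16 little-endian
)

// frameSamples returns how many samples fit in one frame.
func frameSamples(rateHz, millis int) int {
	return rateHz * millis / 1000
}

// frameBytes returns the frame size in bytes for the given sample width.
func frameBytes(rateHz, millis, width int) int {
	return frameSamples(rateHz, millis) * width
}

// allZero reports whether every byte of a PCM frame is zero, which is how a
// hardware-muted microphone typically appears in a digital capture.
func allZero(frame []byte) bool {
	for _, b := range frame {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	n := frameSamples(sampleRateHz, frameMillis)
	size := frameBytes(sampleRateHz, frameMillis, bytesPerSample)
	fmt.Println(n, size) // prints "1280 2560"

	frame := make([]byte, size)
	fmt.Println("muted:", allZero(frame))
	frame[100] = 1 // any non-zero byte means the mic is live
	fmt.Println("muted:", allZero(frame))
}
```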
