Voxray

v0.2.0 · Published: May 16, 2026 · License: Apache-2.0

Repository

github.com/Voxray-AI/Voxray

README

Voxray-AI

Build production-ready AI voice agents with a single JSON config. WebSocket & WebRTC · STT → LLM → TTS · Low-latency · Self-hostable

Config-driven Go server for building real-time voice agents. Wire together speech-to-text, LLM, and text-to-speech providers into low-latency streaming pipelines — no audio plumbing required.

Overview

Voxray-AI (github.com/Voxray-AI/Voxray) serves real-time voice agents over WebSocket and WebRTC. Pipelines, providers, and transports are all defined in JSON config, so you can swap services by editing config and deploy to your own infrastructure.

For architecture and pipeline details, see Architecture.

Quick Start

Get the server running end-to-end in under 5 minutes.

1. Prerequisites

go version    # Go 1.25+ required (see go.mod)
gcc --version # only needed for WebRTC/Opus — see Requirements

2. Clone and build

git clone https://github.com/Voxray-AI/Voxray.git
cd Voxray
go build -o voxray ./cmd/voxray
# or: make build

3. Configure

cp config.example.json config.json
# Set your API keys in config.json or via environment variables (e.g. OPENAI_API_KEY)

4. Run

./voxray -config config.json
# Windows: .\voxray.exe -config config.json

You can override config with flags: -config, -transport (webrtc, daily, twilio, telnyx, plivo, exotel), -port, -proxy (public hostname for telephony webhooks), -dialin (Daily PSTN; requires transport=daily). Use -init to scaffold config.json and dirs then exit, or run voxray init [config-path].
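The override flags and the one stated constraint (-dialin requires transport=daily) can be sketched with the standard flag package. This is an illustration only, not Voxray's actual flag handling: the options struct, the flag types, and the default values are assumptions.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// options mirrors the override flags listed above.
// Field types and defaults are assumptions made for this sketch.
type options struct {
	Config    string
	Transport string
	Port      int
	Proxy     string
	Dialin    bool
}

var errDialinNeedsDaily = errors.New("-dialin requires -transport daily")

// parseArgs parses the override flags and enforces the documented rule
// that -dialin is only valid together with transport=daily.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("voxray", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.StringVar(&o.Config, "config", "config.json", "path to config file")
	fs.StringVar(&o.Transport, "transport", "", "runner transport override")
	fs.IntVar(&o.Port, "port", 0, "port override (0 keeps the config value)")
	fs.StringVar(&o.Proxy, "proxy", "", "public hostname for telephony webhooks")
	fs.BoolVar(&o.Dialin, "dialin", false, "enable Daily PSTN dial-in")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.Dialin && o.Transport != "daily" {
		return o, errDialinNeedsDaily
	}
	return o, nil
}

func main() {
	o, err := parseArgs([]string{"-transport", "daily", "-dialin", "-port", "9000"})
	fmt.Println(o.Transport, o.Port, o.Dialin, err)
}
```

Checking the constraint at parse time means a misconfigured dial-in fails before any transport is started.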

5. Connect

Endpoint	Method	Description
`/ws`	GET	WebSocket transport (upgrade)
`/webrtc/offer`	POST	WebRTC signaling (SDP offer/answer)
`/health`	GET	Liveness
`/ready`	GET	Readiness
`/start`	POST	Create session (runner-style WebRTC)
`/sessions/:id/offer`, `/api/v1/sessions/:id/offer`	POST, PATCH	Session SDP offer (after `/start`)
`/telephony/ws`	GET	Telephony media WebSocket (when `runner_transport` is Twilio/Telnyx/Plivo/Exotel)
`/swagger/`	GET	Swagger UI (when built with swag)
`/metrics`	GET	Prometheus metrics

Runner and telephony behavior are detailed in docs/CONNECTIVITY.md.
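A deployment script might poll the liveness and readiness endpoints from the table above before sending traffic. The sketch below is a hypothetical probe helper, not part of Voxray. So that it runs offline, it probes a local stub server; the stub's status codes are placeholders and do not describe what a real Voxray server returns.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe issues a GET to each path under baseURL and records the status code.
func probe(client *http.Client, baseURL string, paths []string) (map[string]int, error) {
	codes := make(map[string]int, len(paths))
	for _, p := range paths {
		resp, err := client.Get(baseURL + p)
		if err != nil {
			return nil, fmt.Errorf("GET %s: %w", p, err)
		}
		resp.Body.Close()
		codes[p] = resp.StatusCode
	}
	return codes, nil
}

// demo runs probe against a stub that stands in for a running server.
// The stub answers 200 on /health and 503 on /ready (placeholder values).
func demo() (map[string]int, error) {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	stub := httptest.NewServer(mux)
	defer stub.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	return probe(client, stub.URL, []string{"/health", "/ready"})
}

func main() {
	codes, err := demo()
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("health:", codes["/health"], "ready:", codes["/ready"])
}
```

To probe a real instance, pass its base URL (for example http://localhost:8080) to probe in place of the stub URL.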

6. Try the WebRTC browser client (optional)

cd tests/frontend && python -m http.server 3000
# Open http://localhost:3000/webrtc-voice.html, set Server URL to http://localhost:8080, click Start

See tests/frontend/README.md for details.

Features

Low-latency pipelines — STT → LLM → TTS with configurable providers and models
Dual transports — WebSocket (/ws) and WebRTC via SmallWebRTC (/webrtc/offer)
Telephony & Daily.co — Twilio, Telnyx, Plivo, Exotel, and Daily.co (rooms + optional PSTN dial-in); media over WebSocket after provider webhooks
MCP tool integration — optional MCP server (configurable command/args) so the LLM can call tools
Wide provider support — OpenAI, Anthropic, Groq, Sarvam, AWS, Google, ElevenLabs, and more
Plugin system — custom processors and aggregators via an extensible framework
Config-driven — JSON configuration for all pipeline stages; API keys via config or environment variables
Conversation recording — mixed audio per session, uploaded asynchronously to S3
Transcript logging — per-message text logs to Postgres or MySQL
Observability — Prometheus metrics at /metrics
Voice over WebRTC — optional CGO/Opus build for real-time TTS audio delivery

Supported Providers

Provider sets and capability matrix are defined in pkg/services (SupportedSTTProviders, SupportedLLMProviders, SupportedTTSProviders in factory.go). Summary:

Stage	Provider	Notes
STT	OpenAI	Whisper via OpenAI API (e.g. `gpt-4o-mini-transcribe`)
	Groq	—
	Sarvam	Indian languages
	ElevenLabs	—
	AWS	Amazon Transcribe
	Google	Cloud Speech-to-Text
	Whisper	Direct Whisper integration
	Camb	—
	Gradium	—
	Soniox	—
LLM	OpenAI	GPT-4.1, GPT-4o, etc.
	Groq	—
	Grok	—
	Cerebras	—
	AWS	Amazon Bedrock
	Mistral	—
	DeepSeek	—
	Anthropic	Claude
	Google	Gemini
	Google Vertex	ADC-based authentication
	Ollama	Local/self-hosted models
	Qwen	—
	AsyncAI	—
	Fish	—
	Inworld	—
	Minimax	—
	Moondream	—
	OpenPipe	—
TTS	OpenAI	`alloy`, `nova`, etc.
	Groq	—
	Sarvam	Indian languages
	ElevenLabs	—
	AWS	Amazon Polly
	Google	Cloud Text-to-Speech
	Hume	—
	Inworld	—
	Minimax	—
	Neuphonic	—
	XTTS	Self-hosted Coqui XTTS

Architecture

Audio is received from web or native clients over WebSocket or WebRTC, processed through a configurable STT → LLM → TTS pipeline, and streamed back over the same transport. Each stage is pluggable — mix and match providers while keeping a consistent, low-latency pipeline.

flowchart TB
  subgraph Client["Client"]
    Browser["Browser / Native app"]
  end
  subgraph Server["Server"]
    HTTP["HTTP\n/ws, /webrtc/offer\n/metrics"]
  end
  subgraph Transport["Transport"]
    WS["WebSocket"]
    WebRTC["SmallWebRTC"]
  end
  subgraph Pipeline["Pipeline"]
    Runner["Runner"]
    Chain["Processors\nVAD → STT → LLM → TTS → Sink"]
  end
  subgraph Providers["External providers"]
    STT_API["STT API"]
    LLM_API["LLM API"]
    TTS_API["TTS API"]
  end
  Browser --> WS
  Browser --> WebRTC
  WS --> HTTP
  WebRTC --> HTTP
  HTTP --> Runner
  Runner --> Chain
  Chain --> STT_API
  Chain --> LLM_API
  Chain --> TTS_API
  Chain --> WS
  Chain --> WebRTC

Audio flows from clients (browser, runner, telephony, or Daily.co) into the server via WebSocket, SmallWebRTC, or telephony WebSocket. The runner wires each transport to the same pipeline (VAD → STT → LLM → TTS); external STT/LLM/TTS are called from pkg/services.

For a deeper dive, see docs/ARCHITECTURE.md, docs/SYSTEM_ARCHITECTURE.md, and docs/CONNECTIVITY.md.
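The idea of a chain of pluggable stages can be illustrated with Go channels. This is a conceptual sketch only: Frame, Stage, mapStage, and chain are hypothetical names and do not reflect the types in pkg/pipeline or pkg/frames, and the three stages are trivial string transforms standing in for real STT, LLM, and TTS calls.

```go
package main

import (
	"fmt"
	"strings"
)

// Frame is a hypothetical unit of data passed between stages.
type Frame struct {
	Kind string // "audio", "text", or "speech" in this sketch
	Data string
}

// Stage transforms a stream of frames.
type Stage func(in <-chan Frame) <-chan Frame

// mapStage builds a Stage that applies fn to every frame in its own goroutine.
func mapStage(fn func(Frame) Frame) Stage {
	return func(in <-chan Frame) <-chan Frame {
		out := make(chan Frame)
		go func() {
			defer close(out)
			for f := range in {
				out <- fn(f)
			}
		}()
		return out
	}
}

// chain wires stages left to right, feeding each stage's output to the next.
func chain(in <-chan Frame, stages ...Stage) <-chan Frame {
	for _, s := range stages {
		in = s(in)
	}
	return in
}

// run pushes inputs through stub STT, LLM, and TTS stages and collects the output.
func run(inputs []string) []Frame {
	stt := mapStage(func(f Frame) Frame {
		return Frame{Kind: "text", Data: strings.TrimPrefix(f.Data, "pcm:")}
	})
	llm := mapStage(func(f Frame) Frame {
		return Frame{Kind: "text", Data: "echo: " + f.Data}
	})
	tts := mapStage(func(f Frame) Frame {
		return Frame{Kind: "speech", Data: "pcm:" + f.Data}
	})

	src := make(chan Frame)
	go func() {
		defer close(src)
		for _, s := range inputs {
			src <- Frame{Kind: "audio", Data: s}
		}
	}()

	var got []Frame
	for f := range chain(src, stt, llm, tts) {
		got = append(got, f)
	}
	return got
}

func main() {
	for _, f := range run([]string{"pcm:hello"}) {
		fmt.Println(f.Kind, f.Data) // prints "speech pcm:echo: hello"
	}
}
```

Because every stage shares the same signature, swapping one provider for another only changes the function passed to mapStage, which is the property the config-driven design relies on.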

Requirements

Go 1.25+ is the only hard requirement for the default (WebSocket-only) build.

go version    # should be 1.25+ (see go.mod)

For voice over WebRTC (TTS audio via Opus), CGO and a C compiler (gcc) must also be on your PATH:

gcc --version # only needed for WebRTC/Opus builds

C compiler on Windows

CGO requires gcc on your PATH. Two options:

WinLibs (winget):

winget install BrechtSanders.WinLibs.POSIX.UCRT --accept-package-agreements
# Restart terminal, then verify:
gcc --version

MSYS2:

Install MSYS2, open MSYS2 UCRT64, then:

pacman -S mingw-w64-ucrt-x86_64-toolchain

Add C:\msys64\ucrt64\bin to PATH and verify with gcc --version.

Without CGO, WebRTC TTS reports `opus encoder unavailable (build without cgo)` and the server returns 503 for WebRTC offers.

Installation

The default build needs only the Go toolchain. The voice/WebRTC build also requires CGO and gcc (see Requirements).

Default build (WebSocket only, no Opus)

go build -o voxray ./cmd/voxray
# or:
make build && make run

Build with voice (WebRTC TTS + Opus)

Linux / macOS:

make build-voice
./voxray -config config.json
# or in one step:
make run-voice ARGS="-config config.json"

Windows (PowerShell):

# Build once, then run:
.\scripts\build-voice.ps1
.\voxray.exe -config config.json

# Or build and run in one step:
.\scripts\run-voice.ps1 -config config.json

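Manual (any OS; the commands below use POSIX shell syntax, so in PowerShell set $env:CGO_ENABLED = "1" first and run go build without the prefix):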

CGO_ENABLED=1 go build -o voxray ./cmd/voxray
./voxray -config config.json
# or:
CGO_ENABLED=1 go run ./cmd/voxray -config config.json

After a voice build, WebRTC offers succeed and TTS audio is delivered over the peer connection.

Configuration

Set the config path via the -config flag or the VOXRAY_CONFIG environment variable. Copy config.example.json to config.json to get started.

Top-level keys

Key	Type	Default	Description
`transport`	string	`"websocket"`	`"websocket"`, `"smallwebrtc"`, or `"both"`
`host`	string	`"0.0.0.0"`	Bind host
`port`	int	`8080`	Bind port
`stt_provider`	string	—	STT provider name (e.g. `"openai"`)
`llm_provider`	string	—	LLM provider name (e.g. `"openai"`)
`tts_provider`	string	—	TTS provider name (e.g. `"openai"`)
`api_keys`	object	—	Map of provider → API key
`metrics_enabled`	bool	`true`	Expose Prometheus `/metrics`
`webrtc_ice_servers`	array	—	ICE server config for WebRTC
`rtc_max_duration_secs`	float	`0`	Max lifetime for RTC/WebSocket voice sessions after first inbound audio; `0` disables
`recording`	object	—	S3 conversation recording (see below)
`transcripts`	object	—	Database transcript logging (see below)
`mcp`	object	—	MCP server: `command`, `args`, `tools_filter` (see pkg/config/README.md)

Additional config

Key	Description
`provider`	Default provider for STT/LLM/TTS when task-specific (`stt_provider`, etc.) not set
`runner_transport`	`webrtc` \| `daily` \| `twilio` \| `telnyx` \| `plivo` \| `exotel` \| `livekit` \| `""`
`runner_port`, `proxy_host`, `dialin`	Runner and telephony (e.g. public hostname for webhooks; Daily PSTN dial-in)
`plugins`, `plugin_options`	Pipeline plugins and options (see docs/EXTENSIONS.md)
`turn_detection`, `turn_stop_secs`, `turn_pre_speech_ms`, `turn_max_duration_secs`, `vad_*`, `user_turn_stop_timeout_secs`, `user_idle_timeout_secs`, `turn_async`	Turn detection and VAD
`allow_interruptions`, `interruption_strategy`, `min_words`	Barge-in / interruption behavior
`cors_allowed_origins`, `max_request_body_bytes`, `server_api_key`	Server and optional API key auth
`legacy_errors`, `shutdown_upload_timeout_secs`	Compatibility and shutdown tuning

See config.example.json and examples/voice/README.md for all options.
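The documented defaults and the provider fallback can be sketched with encoding/json. This is a minimal illustration covering a handful of keys, not the loader in pkg/config: the Config struct, parseConfig, and resolve are hypothetical names, and "elevenlabs" as a provider identifier is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config covers a subset of the keys from the tables above.
type Config struct {
	Transport      string            `json:"transport"`
	Host           string            `json:"host"`
	Port           int               `json:"port"`
	Provider       string            `json:"provider"`
	STTProvider    string            `json:"stt_provider"`
	LLMProvider    string            `json:"llm_provider"`
	TTSProvider    string            `json:"tts_provider"`
	APIKeys        map[string]string `json:"api_keys"`
	MetricsEnabled bool              `json:"metrics_enabled"`
}

// parseConfig seeds the documented defaults, then lets the JSON overwrite them.
// Keys absent from the JSON keep their default values.
func parseConfig(data []byte) (Config, error) {
	cfg := Config{
		Transport:      "websocket",
		Host:           "0.0.0.0",
		Port:           8080,
		MetricsEnabled: true,
	}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

// resolve returns the task-specific provider, or the shared default if unset.
func resolve(specific, fallback string) string {
	if specific != "" {
		return specific
	}
	return fallback
}

func main() {
	raw := []byte(`{"provider": "openai", "tts_provider": "elevenlabs", "port": 9000}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cfg.Transport, cfg.Port, cfg.MetricsEnabled)
	fmt.Println(resolve(cfg.STTProvider, cfg.Provider), resolve(cfg.TTSProvider, cfg.Provider))
}
```

Seeding defaults before unmarshalling is what lets a bool such as metrics_enabled default to true, since a missing key leaves the field untouched. encoding/json also skips unknown keys unless DisallowUnknownFields is set on a decoder.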

Recording (S3)

Voxray can record the full mixed conversation audio per session and upload it asynchronously to S3.

"recording": {
  "enable": true,
  "bucket": "your-recordings-bucket",
  "base_path": "recordings/",
  "format": "wav",
  "worker_count": 4
}

Field	Description
`enable`	Turn recording on for all sessions
`bucket`	S3 bucket name
`base_path`	Key prefix inside the bucket (default: `recordings/`)
`format`	File format — currently `wav` (16-bit PCM mono)
`worker_count`	Number of background upload workers

Each session is written locally and, on session end, a background job uploads it to:

<base_path>/yyyy/mm/dd/<session-id>.wav

AWS credentials are resolved via the standard AWS SDK v2 chain (env vars, shared config, IAM role, etc.).
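The object key layout above can be reproduced with a small helper. This is a sketch under assumptions, not the code in pkg/recording: recordingKey is a hypothetical name, and the choice of UTC for the date components is a guess, since the text does not say which time zone is used.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"time"
)

// sampleDay is a fixed date used by the example below.
var sampleDay = time.Date(2026, time.May, 16, 0, 0, 0, 0, time.UTC)

// recordingKey builds <base_path>/yyyy/mm/dd/<session-id>.wav.
// Slashes around basePath are trimmed so "recordings/" and "recordings"
// produce the same key. The date is taken in UTC (an assumption).
func recordingKey(basePath, sessionID string, t time.Time) string {
	prefix := strings.Trim(basePath, "/")
	return path.Join(prefix, t.UTC().Format("2006/01/02"), sessionID+".wav")
}

func main() {
	fmt.Println(recordingKey("recordings/", "abc123", sampleDay))
	// prints "recordings/2026/05/16/abc123.wav"
}
```

Date-partitioned keys keep listing by prefix cheap, for example when fetching one day's recordings.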

Transcripts (Postgres / MySQL)

Persist per-message text transcripts (user and assistant) to a relational database.

Postgres:

"transcripts": {
  "enable": true,
  "driver": "postgres",
  "dsn": "postgres://user:pass@localhost:5432/voxray?sslmode=disable",
  "table_name": "call_transcripts"
}

MySQL:

"transcripts": {
  "enable": true,
  "driver": "mysql",
  "dsn": "user:pass@tcp(localhost:3306)/voxray?parseTime=true",
  "table_name": "call_transcripts"
}

Expected schema (Postgres):

CREATE TABLE call_transcripts (
  id          BIGSERIAL PRIMARY KEY,
  session_id  TEXT NOT NULL,
  role        TEXT NOT NULL,   -- "user" or "assistant"
  text        TEXT NOT NULL,
  seq         BIGINT NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
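A writer targeting this schema has to account for the different placeholder styles of the two drivers: Postgres uses numbered placeholders ($1, $2, ...) while MySQL uses question marks. The helper below is a hypothetical sketch, not Voxray's transcript writer. It validates the configured table name because identifiers cannot be passed as query parameters.

```go
package main

import (
	"fmt"
	"regexp"
)

// identRe accepts plain SQL identifiers only, since the table name is
// concatenated into the statement rather than bound as a parameter.
var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// insertSQL returns an INSERT statement for the transcript schema above,
// using the placeholder style of the given driver.
func insertSQL(driver, table string) (string, error) {
	if !identRe.MatchString(table) {
		return "", fmt.Errorf("invalid table name %q", table)
	}
	const cols = "session_id, role, text, seq"
	switch driver {
	case "postgres":
		return "INSERT INTO " + table + " (" + cols + ") VALUES ($1, $2, $3, $4)", nil
	case "mysql":
		return "INSERT INTO " + table + " (" + cols + ") VALUES (?, ?, ?, ?)", nil
	default:
		return "", fmt.Errorf("unsupported driver %q", driver)
	}
}

func main() {
	for _, d := range []string{"postgres", "mysql"} {
		q, err := insertSQL(d, "call_transcripts")
		fmt.Println(q, err)
	}
}
```

The resulting string can be passed to database/sql's ExecContext together with the session ID, role, text, and sequence number. created_at is left to the column default.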

Prometheus metrics

The server exposes a Prometheus-compatible scrape endpoint at /metrics on the same host/port as /ws and /webrtc/offer.

"metrics_enabled": true (default) — records HTTP, WebRTC, STT, LLM, and TTS metrics.
"metrics_enabled": false — disables recording; /metrics returns 204 No Content so Prometheus scrape configs don't break.

Metrics are process-local; Prometheus aggregates across instances using instance/pod labels.
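The enabled/disabled behavior described above amounts to a small handler switch. The sketch below is an assumption about how such a toggle could look, not the code in pkg/metrics: metricsHandler is a hypothetical name, and a stub handler stands in for a real Prometheus exposition handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricsHandler returns the scrape handler when metrics are enabled and a
// handler that answers 204 No Content otherwise.
func metricsHandler(enabled bool, scrape http.Handler) http.Handler {
	if enabled {
		return scrape
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
}

// statusFor exercises the handler in-process and reports the status code.
func statusFor(enabled bool) int {
	// Stub standing in for a real metrics exposition handler.
	stub := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# stub exposition output")
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	metricsHandler(enabled, stub).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(true), statusFor(false)) // prints "200 204"
}
```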

Environment Variables

All config values can be overridden via environment variables. Unknown keys in config JSON are silently ignored.

Server

Variable	Description
`VOXRAY_CONFIG`	Path to config file (alternative to `-config` flag)
`VOXRAY_HOST`	Bind host
`VOXRAY_PORT` / `PORT`	Bind port
`VOXRAY_LOG_LEVEL`	Log level (`debug`, `info`, `warn`, `error`)
`VOXRAY_JSON_LOGS`	`true` to emit structured JSON logs
`VOXRAY_CORS_ORIGINS`	Comma-separated allowed CORS origins
`VOXRAY_MAX_BODY_BYTES`	Max HTTP request body size in bytes
`VOXRAY_SERVER_API_KEY`	Server-level API key for auth
`VOXRAY_PIPELINE_INPUT_QUEUE_CAP`	Input queue capacity for pipeline
`VOXRAY_WS_WRITE_COALESCE_*`	WebSocket write coalescing settings
`VOXRAY_VAD_BATCH_SIZE`	VAD processor batch size
`VOXRAY_DAILY_DIALIN_WEBHOOK_SECRET`	Daily.co dial-in webhook secret
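Environment overrides for the bind host and port can be sketched as follows. This is illustrative, not the logic in pkg/config: the Server struct, applyEnv, and fromMap are hypothetical, and the precedence of VOXRAY_PORT over PORT when both are set is an assumption, since the table only lists them as alternatives.

```go
package main

import (
	"fmt"
	"strconv"
)

// Server holds the two settings this sketch overrides.
type Server struct {
	Host string
	Port int
}

// lookupFunc has the same signature as os.LookupEnv, so either can be used.
type lookupFunc func(key string) (string, bool)

// applyEnv overlays environment values on top of config-file values.
// VOXRAY_PORT is checked before PORT (assumed precedence).
func applyEnv(s Server, lookup lookupFunc) (Server, error) {
	if v, ok := lookup("VOXRAY_HOST"); ok && v != "" {
		s.Host = v
	}
	for _, key := range []string{"VOXRAY_PORT", "PORT"} {
		v, ok := lookup(key)
		if !ok || v == "" {
			continue
		}
		port, err := strconv.Atoi(v)
		if err != nil {
			return s, fmt.Errorf("%s: %w", key, err)
		}
		s.Port = port
		break
	}
	return s, nil
}

// fromMap adapts a map to lookupFunc so the example is deterministic.
func fromMap(m map[string]string) lookupFunc {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

func main() {
	env := fromMap(map[string]string{"VOXRAY_HOST": "127.0.0.1", "PORT": "3000"})
	s, err := applyEnv(Server{Host: "0.0.0.0", Port: 8080}, env)
	fmt.Println(s.Host, s.Port, err) // prints "127.0.0.1 3000 <nil>"
}
```

In a real process, pass os.LookupEnv instead of fromMap; injecting the lookup keeps the override logic testable without mutating the environment.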

Recording

Variable	Description
`VOXRAY_RECORDING_ENABLE`	`true` to enable S3 recording
`VOXRAY_RECORDING_BUCKET`	S3 bucket name
`VOXRAY_RECORDING_BASE_PATH`	Key prefix inside the bucket
`VOXRAY_RECORDING_FORMAT`	File format (e.g. `wav`)
`VOXRAY_RECORDING_WORKER_COUNT`	Uploader thread pool size
`VOXRAY_RECORDING_QUEUE_CAP`	Upload job queue capacity
`VOXRAY_RECORDING_MAX_RETRIES`	Max upload retry attempts

Transcripts

Variable	Description
`VOXRAY_TRANSCRIPTS_ENABLE`	`true` to enable transcript logging
`VOXRAY_TRANSCRIPTS_DRIVER`	`postgres` or `mysql`
`VOXRAY_TRANSCRIPTS_DSN`	Database connection string
`VOXRAY_TRANSCRIPTS_TABLE`	Target table name

Examples

For provider/model-specific examples, see examples/voice/README.md. For the browser-based WebRTC client, see tests/frontend/README.md.

Complete example `config.json`

Copy this, fill in your API keys, and run:

{
  "transport": "both",
  "host": "0.0.0.0",
  "port": 8080,
  "metrics_enabled": true,

  "stt_provider": "openai",
  "stt_model": "gpt-4o-mini-transcribe",

  "llm_provider": "openai",
  "model": "gpt-4.1-mini",

  "tts_provider": "openai",
  "tts_voice": "alloy",

  "api_keys": {
    "openai": "YOUR_OPENAI_API_KEY"
  },

  "webrtc_ice_servers": [
    "stun:stun.l.google.com:19302"
  ]
}

Run with:

./voxray -config config.json

Then connect a WebSocket client to ws://localhost:8080/ws, or POST an SDP offer to http://localhost:8080/webrtc/offer (WebRTC).

Use Cases

AI call centers / IVR — conversational agents for inbound and outbound calls with low latency
In-app voice copilots — embed voice agents inside SaaS or productivity apps via WebSocket or WebRTC
Operations and support bots — voicebots for support, ops, and internal tooling on your own infrastructure
Realtime monitoring and control — voice interfaces for dashboards, observability tools, and control systems
On-prem / VPC assistants — self-hosted voice-AI stacks where data must stay within your cloud or datacenter

Roadmap

Near-term

More built-in STT/LLM/TTS providers and opinionated presets for common stacks
Deeper observability, tracing, and debugging tools for real-time pipelines

Planned

Deployment templates (Docker, Kubernetes)
Additional starter agent examples for popular voice-agent scenarios
Expanded documentation on scaling, deployment patterns, and production hardening

Documentation

Repository layout

Package	Description
`pkg/pipeline`	Pipeline, runner, source/sink, task, registry
`pkg/transport`	WebSocket, WebRTC, in-memory transports
`pkg/services`	LLM, STT, TTS interfaces and provider factory
`pkg/recording`	Conversation recording and S3 upload
`pkg/metrics`	Prometheus metrics
`pkg/config`	Configuration and env overrides
`pkg/processors`	Voice, echo, filters, aggregators
`pkg/runner`	Session store and runner args
`pkg/utils`	Backoff, notifier, sentence, aggregators
`pkg/frames`	Frame types and serialization
`pkg/audio`	VAD, turn detection, codecs, resample
`scripts`	Build, run, and maintenance scripts

Docs

docs/README.md — documentation index and reading order
docs/API_CLIENT.md — client integration (REST, WebSocket, auth, WebRTC)
docs/ARCHITECTURE.md — high-level architecture and pipeline
docs/SYSTEM_ARCHITECTURE.md — system view and entry points
docs/CONNECTIVITY.md — connectivity and transports
docs/DEPLOYMENT.md — deployment notes
docs/EXTENSIONS.md — extensions and plugins
docs/FRAMEWORKS.md — framework integration
docs/WEBSOCKET_SERVICES.md — WebSocket service reconnection
examples/voice/README.md — minimal voice pipeline and config samples
tests/frontend/README.md — WebRTC voice client

The OpenAPI spec is generated from the codebase (make swagger); Swagger UI is served at /swagger/ when available.

License

This project is licensed under the Apache License 2.0. Attribution details for distribution are provided in NOTICE.

Contributing

Contributions are welcome! Quick development setup:

go test ./...          # run all tests
make lint              # lint (or: ./scripts/pre-commit.sh)
make swagger           # regenerate API docs (requires swag)
make evals             # run eval scenarios (optional)

See CONTRIBUTING.md for full setup, testing, style, and pull request guidelines.

Directories

Path	Synopsis
cmd/evals (command)	evals runs eval scenarios from a JSON config (LLM-only pipeline, prompt → assert on output).
cmd/generate-dtmf (command)	generate-dtmf generates DTMF WAV files (0-9, *, #) using pkg/audio.
cmd/realtime-demo (command)
docs	Package docs Code generated by swaggo/swag.
examples/voice (command)
pkg/adapters/schemas
pkg/api	Package api provides shared REST API response envelope and error types.
pkg/audio	Package audio provides A-law (G.711 PCMA) encode/decode for 16-bit mono PCM.
pkg/audio/filters
pkg/audio/interruptions
pkg/audio/mixers
pkg/audio/turn	Package turn provides end-of-turn detection for audio conversations (base turn analyzer + silence-based smart turn).
pkg/audio/vad
pkg/config	Package config handles the application configuration, including environment variables and JSON files.
pkg/evals	Package evals provides a Go-native eval runner for voice pipeline scenarios.
pkg/extensions/ivr	Package ivr provides Interactive Voice Response (IVR) navigation components for automated IVR phone system navigation using LLM-based decision making and DTMF.
pkg/extensions/voicemail	Package voicemail provides ClassificationProcessor for voicemail vs conversation detection.
pkg/frames	Package frames defines DTMF and IVR-related frame types.
pkg/frames/proto/wire	Package wire provides frame wire formats.
pkg/frames/serialize	Package serialize provides frame serialization interfaces and implementations.
pkg/frames/serialize/exotel	Package exotel provides Exotel Media Streams WebSocket protocol serializer.
pkg/frames/serialize/genesys	Package genesys provides Genesys AudioHook WebSocket protocol serializer.
pkg/frames/serialize/plivo	Package plivo provides Plivo Audio Streaming WebSocket protocol serializer.
pkg/frames/serialize/telnyx	Package telnyx provides Telnyx WebSocket protocol serializer.
pkg/frames/serialize/twilio	Package twilio provides Twilio Media Streams WebSocket protocol serializer.
pkg/frames/serialize/vonage	Package vonage provides Vonage Audio Connector WebSocket serializer.
pkg/logger	Package logger provides minimal logging for Voxray.
pkg/mcp	Package mcp: Client connects to an MCP server, lists tools, converts schemas, and can register them with an LLMServiceWithTools.
pkg/metrics
pkg/observers	Package observers provides optional metrics (latency, token usage) and OpenTelemetry stub.
pkg/pipeline	Package pipeline provides ParallelPipeline for concurrent frame processing.
pkg/plugin
pkg/processors	Package processors: AIServiceBase provides a base for AI services with settings, Start/Stop/Cancel lifecycle, and optional metrics sync (mirrors upstream ai_service.py).
pkg/processors/aggregator	Package aggregator provides a processor that collects text frames and emits a single aggregated frame (e.g.
pkg/processors/aggregators/dtmf	Package dtmf provides a DTMF aggregator that accumulates InputDTMFFrame digits and emits TranscriptionFrame on timeout, termination digit (#), or EndFrame/CancelFrame.
pkg/processors/aggregators/gated	Package gated provides a gated aggregator that buffers frames when the gate is closed and releases them when the gate opens (custom open/close predicates).
pkg/processors/aggregators/gatedcontext	Package gatedcontext provides a processor that holds LLMContextFrame until a notifier signals release.
pkg/processors/aggregators/llmcontextsummarizer	Package llmcontextsummarizer provides a processor that monitors LLM context size and emits LLMContextSummaryRequestFrame when thresholds are exceeded; applies results from LLMContextSummaryResultFrame.
pkg/processors/aggregators/llmfullresponse	Package llmfullresponse provides a processor that aggregates LLM text between LLMFullResponseStartFrame and LLMFullResponseEndFrame and invokes a callback on completion or interruption.
pkg/processors/aggregators/llmtext	Package llmtext provides a processor that converts LLMTextFrame to AggregatedTextFrame using a configurable text aggregator (e.g.
pkg/processors/aggregators/userresponse	Package userresponse provides a processor that aggregates TranscriptionFrame into a single TextFrame when the user turn ends (e.g.
pkg/processors/audio	Package audio provides audio processors for the pipeline: VAD (voice activity detection) and an audio buffer processor that merges user and bot audio with optional turn-based and buffered callbacks.
pkg/processors/echo
pkg/processors/filters	Package filters provides frame-filtering processors for the pipeline, ported from upstream processors/filters: frame_filter, function_filter, identity_filter, null_filter, stt_mute_filter, wake_check_filter, wake_notifier_filter.
pkg/processors/frameworks	Package frameworks provides processor integrations for external runtimes and the RTVI protocol, ported from upstream processors/frameworks.
pkg/processors/frameworks/rtvi	Package rtvi implements the RTVI (Real-Time Voice Interface) protocol processor and message types.
pkg/processors/logger
pkg/processors/voice	Package voice provides processors that wire STT, LLM, and TTS into a pipeline.
pkg/realtime	Package realtime provides realtime session implementations (OpenAI Realtime API and shim).
pkg/recording
pkg/runner	Package runner provides Redis-backed session store for horizontal scaling.
pkg/runner/daily	Package daily provides Daily.co room and meeting token creation via the REST API (runner Daily integration).
pkg/runner/livekit	Package livekit provides LiveKit room URL and agent token configuration from environment (runner Livekit integration).
pkg/server	Package server provides transport servers (e.g.
pkg/services	Package services defines interfaces and implementations for LLM, STT, and TTS.
pkg/services/anthropic
pkg/services/asyncai
pkg/services/aws
pkg/services/camb	Package camb provides Camb AI speech-to-text.
pkg/services/cerebras	Package cerebras provides Cerebras inference API-backed LLM via OpenAI-compatible API.
pkg/services/deepseek	Package deepseek provides DeepSeek-backed LLM via OpenAI-compatible API.
pkg/services/elevenlabs
pkg/services/fish
pkg/services/google	Package google provides Google Gemini LLM, Vertex AI LLM, and Google Cloud STT/TTS services.
pkg/services/gradium	Package gradium provides Gradium speech-to-text (WebSocket or REST).
pkg/services/grok	Package grok provides xAI Grok-backed LLM via OpenAI-compatible API.
pkg/services/groq	Package groq provides Groq-backed LLM, STT, and TTS via OpenAI-compatible API.
pkg/services/hume	Package hume provides Hume (Hume AI) text-to-speech.
pkg/services/inworld	Package inworld provides Inworld text-to-speech (and LLM).
pkg/services/llmapi	Package llmapi defines LLM and tool-calling interfaces so that implementers (e.g.
pkg/services/minimax	Package minimax provides Minimax text-to-speech.
pkg/services/mistral	Package mistral provides Mistral AI-backed LLM via OpenAI-compatible API.
pkg/services/mock	Package mock provides mock STT, LLM, and TTS services for testing and stress testing without calling real APIs.
pkg/services/moondream
pkg/services/neuphonic	Package neuphonic provides Neuphonic text-to-speech (HTTP SSE streaming).
pkg/services/ollama	Package ollama provides Ollama-backed LLM via OpenAI-compatible API (localhost or custom base URL).
pkg/services/openai	Package openai provides OpenAI-based LLM (and optionally STT/TTS) for Voxray.
pkg/services/openpipe	Package openpipe provides OpenPipe-backed LLM via OpenAI-compatible API.
pkg/services/qwen	Package qwen provides Alibaba DashScope Qwen LLM via OpenAI-compatible API.
pkg/services/sarvam	Package sarvam provides Sarvam AI TTS and STT service implementations.
pkg/services/soniox	Package soniox provides Soniox speech-to-text (WebSocket API used for batch Transcribe).
pkg/services/stt	Package stt provides STT service implementations (OpenAI Whisper, Groq Whisper).
pkg/services/tts	Package tts provides TTS service implementations (OpenAI TTS, Groq TTS).
pkg/services/whisper	Package whisper provides Whisper API-backed STT (OpenAI or self-hosted compatible) with configurable base URL.
pkg/services/xtts	Package xtts provides Coqui XTTS text-to-speech via local streaming server.
pkg/sync/notifier	Package notifier provides a one-shot notifier for gate synchronization.
pkg/transcripts
pkg/transport	Package transport defines an optional base for transports with common fields (name, logger).
pkg/transport/memory	Package memory provides an in-memory transport for testing and stress testing.
pkg/transport/smallwebrtc	Package smallwebrtc provides a WebRTC transport for Voxray using pion/webrtc.
pkg/transport/websocket	Package websocket provides WebSocket transport (server and client) for Voxray.
pkg/transport/whatsapp	Package whatsapp provides WhatsApp Cloud API client and transport for Voxray.
pkg/utils	Package utils provides shared utilities (backoff, etc.).
pkg/utils/notifier	Package notifier provides a simple signal that one goroutine can wait on and another can trigger.
pkg/utils/patternaggregator	Package patternaggregator provides XML-style tag aggregation for LLM text streams.
pkg/utils/sentence	Package sentence provides helpers for sentence-boundary detection in aggregated text.
pkg/utils/textaggregator	Package textaggregator provides an interface and implementations for aggregating incremental text (e.g.