Voxray

v0.2.0
Published: May 16, 2026 License: Apache-2.0

README

Voxray-AI

Build production-ready AI voice agents with a single JSON config. WebSocket & WebRTC · STT → LLM → TTS · Low-latency · Self-hostable

Config-driven Go server for building real-time voice agents. Wire together speech-to-text, LLM, and text-to-speech providers into low-latency streaming pipelines — no audio plumbing required.


Overview

Voxray-AI (github.com/Voxray-AI/Voxray) is a config-driven Go server for building real-time voice agents over WebSocket and WebRTC. It wires together STT, LLM, and TTS providers into low-latency streaming pipelines. Pipelines, providers, and transports are defined via JSON config, making it easy to swap services and deploy to your own infrastructure.

For architecture and pipeline details, see Architecture.


Quick Start

Get the server running end-to-end in under 5 minutes.

1. Prerequisites

go version    # Go 1.25+ required (see go.mod)
gcc --version # only needed for WebRTC/Opus — see Requirements

2. Clone and build

git clone https://github.com/Voxray-AI/Voxray.git
cd Voxray
go build -o voxray ./cmd/voxray
# or: make build

3. Configure

cp config.example.json config.json
# Set your API keys in config.json or via environment variables (e.g. OPENAI_API_KEY)

4. Run

./voxray -config config.json
# Windows: .\voxray.exe -config config.json

Command-line flags override values from the config file:

  • -config: path to the config file
  • -transport: webrtc, daily, twilio, telnyx, plivo, or exotel
  • -port: bind port
  • -proxy: public hostname for telephony webhooks
  • -dialin: Daily PSTN dial-in; requires transport=daily

Use -init to scaffold config.json and its directories and then exit, or run voxray init [config-path].
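
To illustrate how overrides like these are commonly handled in Go, here is a minimal sketch using the standard flag package. Only the flag names -config, -transport, and -port come from this README; the options struct, the parseFlags function, and the default values are assumptions for illustration and are not taken from the Voxray source.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical holder for command-line overrides.
type options struct {
	configPath string
	transport  string
	port       int
}

// parseFlags parses args (without the program name) into options.
// The defaults below are illustrative assumptions, not Voxray's real defaults.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("voxray", flag.ContinueOnError)
	fs.StringVar(&o.configPath, "config", "config.json", "path to the JSON config")
	fs.StringVar(&o.transport, "transport", "", "transport override; empty keeps the config value")
	fs.IntVar(&o.port, "port", 0, "port override; 0 keeps the config value")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	o, err := parseFlags([]string{"-config", "prod.json", "-port", "9090"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("config=%s transport=%q port=%d\n", o.configPath, o.transport, o.port)
}
```

Using zero values ("" and 0) to mean "not overridden" keeps the merge with the config file simple, at the cost of not being able to pass port 0 explicitly.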

5. Connect

| Endpoint | Method | Description |
| --- | --- | --- |
| /ws | GET | WebSocket transport (upgrade) |
| /webrtc/offer | POST | WebRTC signaling (SDP offer/answer) |
| /health | GET | Liveness |
| /ready | GET | Readiness |
| /start | POST | Create session (runner-style WebRTC) |
| /sessions/:id/offer, /api/v1/sessions/:id/offer | POST, PATCH | Session SDP offer (after /start) |
| /telephony/ws | GET | Telephony media WebSocket (when runner_transport is Twilio/Telnyx/Plivo/Exotel) |
| /swagger/ | GET | Swagger UI (when built with swag) |
| /metrics | GET | Prometheus metrics |

Runner and telephony behavior are detailed in docs/CONNECTIVITY.md.

6. Try the WebRTC browser client (optional)

cd tests/frontend && python -m http.server 3000
# Open http://localhost:3000/webrtc-voice.html, set Server URL to http://localhost:8080, click Start

See tests/frontend/README.md for details.


Features

  • Low-latency pipelines — STT → LLM → TTS with configurable providers and models
  • Dual transports — WebSocket (/ws) and WebRTC via SmallWebRTC (/webrtc/offer)
  • Telephony & Daily.co — Twilio, Telnyx, Plivo, Exotel, and Daily.co (rooms + optional PSTN dial-in); media over WebSocket after provider webhooks
  • MCP tool integration — optional MCP server (configurable command/args) so the LLM can call tools
  • Wide provider support — OpenAI, Anthropic, Groq, Sarvam, AWS, Google, ElevenLabs, and more
  • Plugin system — custom processors and aggregators via an extensible framework
  • Config-driven — JSON configuration for all pipeline stages; API keys via config or environment variables
  • Conversation recording — mixed audio per session, uploaded asynchronously to S3
  • Transcript logging — per-message text logs to Postgres or MySQL
  • Observability — Prometheus metrics at /metrics
  • Voice over WebRTC — optional CGO/Opus build for real-time TTS audio delivery

Supported Providers

Provider sets and capability matrix are defined in pkg/services (SupportedSTTProviders, SupportedLLMProviders, SupportedTTSProviders in factory.go). Summary:

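| Stage | Provider | Notes |
| --- | --- | --- |
| STT | OpenAI | Whisper via OpenAI API (e.g. gpt-4o-mini-transcribe) |
| STT | Groq | |
| STT | Sarvam | Indian languages |
| STT | ElevenLabs | |
| STT | AWS | Amazon Transcribe |
| STT | Google | Cloud Speech-to-Text |
| STT | Whisper | Direct Whisper integration |
| STT | Camb | |
| STT | Gradium | |
| STT | Soniox | |
| LLM | OpenAI | GPT-4.1, GPT-4o, etc. |
| LLM | Groq | |
| LLM | Grok | |
| LLM | Cerebras | |
| LLM | AWS | Amazon Bedrock |
| LLM | Mistral | |
| LLM | DeepSeek | |
| LLM | Anthropic | Claude |
| LLM | Google | Gemini |
| LLM | Google Vertex | ADC-based authentication |
| LLM | Ollama | Local/self-hosted models |
| LLM | Qwen | |
| LLM | AsyncAI | |
| LLM | Fish | |
| LLM | Inworld | |
| LLM | Minimax | |
| LLM | Moondream | |
| LLM | OpenPipe | |
| TTS | OpenAI | alloy, nova, etc. |
| TTS | Groq | |
| TTS | Sarvam | Indian languages |
| TTS | ElevenLabs | |
| TTS | AWS | Amazon Polly |
| TTS | Google | Cloud Text-to-Speech |
| TTS | Hume | |
| TTS | Inworld | |
| TTS | Minimax | |
| TTS | Neuphonic | |
| TTS | XTTS | Self-hosted Coqui XTTS |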

Architecture

Audio is received from web or native clients over WebSocket or WebRTC, processed through a configurable STT → LLM → TTS pipeline, and streamed back over the same transport. Each stage is pluggable — mix and match providers while keeping a consistent, low-latency pipeline.

flowchart TB
  subgraph Client["Client"]
    Browser["Browser / Native app"]
  end
  subgraph Server["Server"]
    HTTP["HTTP\n/ws, /webrtc/offer\n/metrics"]
  end
  subgraph Transport["Transport"]
    WS["WebSocket"]
    WebRTC["SmallWebRTC"]
  end
  subgraph Pipeline["Pipeline"]
    Runner["Runner"]
    Chain["Processors\nVAD → STT → LLM → TTS → Sink"]
  end
  subgraph Providers["External providers"]
    STT_API["STT API"]
    LLM_API["LLM API"]
    TTS_API["TTS API"]
  end
  Browser --> WS
  Browser --> WebRTC
  WS --> HTTP
  WebRTC --> HTTP
  HTTP --> Runner
  Runner --> Chain
  Chain --> STT_API
  Chain --> LLM_API
  Chain --> TTS_API
  Chain --> WS
  Chain --> WebRTC

Audio flows from clients (browser, runner, telephony, or Daily.co) into the server via WebSocket, SmallWebRTC, or the telephony WebSocket. The runner wires each transport to the same pipeline (VAD → STT → LLM → TTS), and calls to external STT, LLM, and TTS providers are made from pkg/services. See docs/CONNECTIVITY.md for transport details.

For a deeper dive, see docs/ARCHITECTURE.md and docs/SYSTEM_ARCHITECTURE.md.
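
The idea of a chain of pluggable stages can be sketched in a few lines of Go. Everything in this example is hypothetical: the Frame type, the Processor interface, ProcessorFunc, and runChain are illustrative names, and the stub stages stand in for real providers. Voxray's actual interfaces live in pkg/processors and pkg/pipeline and may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// Frame is a hypothetical unit of data passed between stages.
type Frame struct {
	Kind    string // e.g. "audio" or "text"
	Payload string
}

// Processor is a hypothetical stage interface, not Voxray's real one.
type Processor interface {
	Process(in Frame) (Frame, error)
}

// ProcessorFunc adapts a plain function to the Processor interface.
type ProcessorFunc func(Frame) (Frame, error)

// Process calls the wrapped function.
func (f ProcessorFunc) Process(in Frame) (Frame, error) { return f(in) }

// runChain feeds a frame through each stage in order and stops at the first error.
func runChain(stages []Processor, in Frame) (Frame, error) {
	cur := in
	for i, s := range stages {
		out, err := s.Process(cur)
		if err != nil {
			return Frame{}, fmt.Errorf("stage %d: %w", i, err)
		}
		cur = out
	}
	return cur, nil
}

func main() {
	// Stub stages standing in for real STT, LLM, and TTS providers.
	stt := ProcessorFunc(func(f Frame) (Frame, error) {
		return Frame{Kind: "text", Payload: "hello"}, nil
	})
	llm := ProcessorFunc(func(f Frame) (Frame, error) {
		return Frame{Kind: "text", Payload: strings.ToUpper(f.Payload) + "!"}, nil
	})
	tts := ProcessorFunc(func(f Frame) (Frame, error) {
		return Frame{Kind: "audio", Payload: "pcm(" + f.Payload + ")"}, nil
	})
	out, err := runChain([]Processor{stt, llm, tts}, Frame{Kind: "audio", Payload: "raw-mic-bytes"})
	fmt.Println(out.Kind, out.Payload, err)
}
```

Because every stage shares one interface, swapping a provider means replacing one element of the slice. A real streaming pipeline would pass frames over channels rather than process one frame synchronously.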


Requirements

Go 1.25+ is the only hard requirement for the default (WebSocket-only) build.

go version    # should be 1.25+ (see go.mod)

For voice over WebRTC (TTS audio via Opus), CGO and a C compiler (gcc) must also be on your PATH:

gcc --version # only needed for WebRTC/Opus builds

C compiler on Windows

CGO requires gcc on your PATH. Two options:

WinLibs (winget):

winget install BrechtSanders.WinLibs.POSIX.UCRT --accept-package-agreements
# Restart terminal, then verify:
gcc --version

MSYS2:

Install MSYS2, open MSYS2 UCRT64, then:

pacman -S mingw-w64-ucrt-x86_64-toolchain

Add C:\msys64\ucrt64\bin to PATH and verify with gcc --version.

Without CGO, the WebRTC TTS path reports "opus encoder unavailable (build without cgo)" and the server returns 503 for WebRTC offers.


Installation

The default build needs only the Go toolchain. The voice/WebRTC build also requires CGO and gcc (see Requirements).

Default build (WebSocket only, no Opus)

go build -o voxray ./cmd/voxray
# or:
make build && make run

Build with voice (WebRTC TTS + Opus)

Linux / macOS:

make build-voice
./voxray -config config.json
# or in one step:
make run-voice ARGS="-config config.json"

Windows (PowerShell):

# Build once, then run:
.\scripts\build-voice.ps1
.\voxray.exe -config config.json

# Or build and run in one step:
.\scripts\run-voice.ps1 -config config.json

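Manual (any OS):

# POSIX shells (Linux, macOS, MSYS2):
CGO_ENABLED=1 go build -o voxray ./cmd/voxray
./voxray -config config.json
# or:
CGO_ENABLED=1 go run ./cmd/voxray -config config.json
# PowerShell sets the variable separately:
#   $env:CGO_ENABLED = "1"; go build -o voxray.exe ./cmd/voxray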

After a voice build, WebRTC offers succeed and TTS audio is delivered over the peer connection.


Configuration

Set the config path via the -config flag or the VOXRAY_CONFIG environment variable. Copy config.example.json to config.json to get started.

Top-level keys

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| transport | string | "websocket" | "websocket", "smallwebrtc", or "both" |
| host | string | "0.0.0.0" | Bind host |
| port | int | 8080 | Bind port |
| stt_provider | string | | STT provider name (e.g. "openai") |
| llm_provider | string | | LLM provider name (e.g. "openai") |
| tts_provider | string | | TTS provider name (e.g. "openai") |
| api_keys | object | | Map of provider → API key |
| metrics_enabled | bool | true | Expose Prometheus /metrics |
| webrtc_ice_servers | array | | ICE server config for WebRTC |
| rtc_max_duration_secs | float | 0 | Max lifetime for RTC/WebSocket voice sessions after first inbound audio; 0 disables the limit |
| recording | object | | S3 conversation recording (see below) |
| transcripts | object | | Database transcript logging (see below) |
| mcp | object | | MCP server: command, args, tools_filter (see pkg/config/README.md) |

Additional config

| Key | Description |
| --- | --- |
| provider | Default provider for STT/LLM/TTS when the task-specific key (stt_provider, etc.) is not set |
| runner_transport | One of webrtc, daily, twilio, telnyx, plivo, exotel, livekit, or "" |
| runner_port, proxy_host, dialin | Runner and telephony (e.g. public hostname for webhooks; Daily PSTN dial-in) |
| plugins, plugin_options | Pipeline plugins and options (see docs/EXTENSIONS.md) |
| turn_detection, turn_stop_secs, turn_pre_speech_ms, turn_max_duration_secs, vad_*, user_turn_stop_timeout_secs, user_idle_timeout_secs, turn_async | Turn detection and VAD |
| allow_interruptions, interruption_strategy, min_words | Barge-in / interruption behavior |
| cors_allowed_origins, max_request_body_bytes, server_api_key | Server and optional API key auth |
| legacy_errors, shutdown_upload_timeout_secs | Compatibility and shutdown tuning |

See config.example.json and examples/voice/README.md for all options.
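
As a rough illustration of how documented defaults and a JSON file can combine, the sketch below decodes a config on top of a pre-filled struct using encoding/json. The key names and default values come from the top-level keys listed above; the Config struct and loadConfig function are hypothetical and cover only a subset of keys. The real loader is in pkg/config and may behave differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors a subset of the documented top-level keys.
// It is an illustrative struct, not the one defined in pkg/config.
type Config struct {
	Transport      string            `json:"transport"`
	Host           string            `json:"host"`
	Port           int               `json:"port"`
	STTProvider    string            `json:"stt_provider"`
	LLMProvider    string            `json:"llm_provider"`
	TTSProvider    string            `json:"tts_provider"`
	APIKeys        map[string]string `json:"api_keys"`
	MetricsEnabled bool              `json:"metrics_enabled"`
}

// loadConfig starts from the documented defaults and overlays the JSON.
// Keys absent from the JSON keep their defaults; encoding/json ignores
// unknown keys unless told otherwise.
func loadConfig(raw []byte) (Config, error) {
	cfg := Config{
		Transport:      "websocket",
		Host:           "0.0.0.0",
		Port:           8080,
		MetricsEnabled: true,
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"transport": "both", "stt_provider": "openai", "api_keys": {"openai": "YOUR_OPENAI_API_KEY"}}`)
	cfg, err := loadConfig(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%s on %s:%d, stt=%s, metrics=%v\n",
		cfg.Transport, cfg.Host, cfg.Port, cfg.STTProvider, cfg.MetricsEnabled)
}
```

Pre-filling the struct before decoding is what lets an explicit "metrics_enabled": false in the file be distinguished from the key being absent.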

Recording (S3)

Voxray can record the full mixed conversation audio per session and upload it asynchronously to S3.

"recording": {
  "enable": true,
  "bucket": "your-recordings-bucket",
  "base_path": "recordings/",
  "format": "wav",
  "worker_count": 4
}

| Field | Description |
| --- | --- |
| enable | Turn recording on for all sessions |
| bucket | S3 bucket name |
| base_path | Key prefix inside the bucket (default: recordings/) |
| format | File format; currently wav (16-bit PCM mono) |
| worker_count | Background uploader thread pool size |

Each session is written locally and, on session end, a background job uploads it to:

<base_path>/yyyy/mm/dd/<session-id>.wav

AWS credentials are resolved via the standard AWS SDK v2 chain (env vars, shared config, IAM role, etc.).

Transcripts (Postgres / MySQL)

Persist per-message text transcripts (user and assistant) to a relational database.

Postgres:

"transcripts": {
  "enable": true,
  "driver": "postgres",
  "dsn": "postgres://user:pass@localhost:5432/voxray?sslmode=disable",
  "table_name": "call_transcripts"
}

MySQL:

"transcripts": {
  "enable": true,
  "driver": "mysql",
  "dsn": "user:pass@tcp(localhost:3306)/voxray?parseTime=true",
  "table_name": "call_transcripts"
}

Expected schema (Postgres):

CREATE TABLE call_transcripts (
  id          BIGSERIAL PRIMARY KEY,
  session_id  TEXT NOT NULL,
  role        TEXT NOT NULL,   -- "user" or "assistant"
  text        TEXT NOT NULL,
  seq         BIGINT NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
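
If you write to or backfill this table from your own Go tooling, note that the two drivers expect different placeholder styles. The sketch below builds an INSERT for the documented columns; insertSQL is a hypothetical helper, not Voxray's writer. The table name is interpolated into the statement because SQL identifiers cannot be bound as parameters, so it must come from trusted configuration.

```go
package main

import "fmt"

// insertSQL returns an INSERT statement for the documented transcript columns,
// using $1..$4 placeholders for postgres and ? placeholders for mysql.
// The table name must come from trusted config, never from user input.
func insertSQL(driver, table string) (string, error) {
	const cols = "session_id, role, text, seq"
	switch driver {
	case "postgres":
		return fmt.Sprintf("INSERT INTO %s (%s) VALUES ($1, $2, $3, $4)", table, cols), nil
	case "mysql":
		return fmt.Sprintf("INSERT INTO %s (%s) VALUES (?, ?, ?, ?)", table, cols), nil
	default:
		return "", fmt.Errorf("unsupported driver %q", driver)
	}
}

func main() {
	for _, driver := range []string{"postgres", "mysql"} {
		q, err := insertSQL(driver, "call_transcripts")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(q)
	}
}
```

The resulting string can be passed to database/sql's ExecContext together with the session ID, role, text, and sequence number; id and created_at are filled in by the column defaults in the schema above.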

Prometheus metrics

The server exposes a Prometheus-compatible scrape endpoint at /metrics on the same host/port as /ws and /webrtc/offer.

  • "metrics_enabled": true (default) — records HTTP, WebRTC, STT, LLM, and TTS metrics.
  • "metrics_enabled": false — disables recording; /metrics returns 204 No Content so Prometheus scrape configs don't break.

Metrics are process-local; Prometheus aggregates across instances using instance/pod labels.


Environment Variables

All config values can be overridden via environment variables. Unknown keys in config JSON are silently ignored.

Server

| Variable | Description |
| --- | --- |
| VOXRAY_CONFIG | Path to config file (alternative to -config flag) |
| VOXRAY_HOST | Bind host |
| VOXRAY_PORT / PORT | Bind port |
| VOXRAY_LOG_LEVEL | Log level (debug, info, warn, error) |
| VOXRAY_JSON_LOGS | true to emit structured JSON logs |
| VOXRAY_CORS_ORIGINS | Comma-separated allowed CORS origins |
| VOXRAY_MAX_BODY_BYTES | Max HTTP request body size in bytes |
| VOXRAY_SERVER_API_KEY | Server-level API key for auth |
| VOXRAY_PIPELINE_INPUT_QUEUE_CAP | Input queue capacity for pipeline |
| VOXRAY_WS_WRITE_COALESCE_* | WebSocket write coalescing settings |
| VOXRAY_VAD_BATCH_SIZE | VAD processor batch size |
| VOXRAY_DAILY_DIALIN_WEBHOOK_SECRET | Daily.co dial-in webhook secret |

Recording

| Variable | Description |
| --- | --- |
| VOXRAY_RECORDING_ENABLE | true to enable S3 recording |
| VOXRAY_RECORDING_BUCKET | S3 bucket name |
| VOXRAY_RECORDING_BASE_PATH | Key prefix inside the bucket |
| VOXRAY_RECORDING_FORMAT | File format (e.g. wav) |
| VOXRAY_RECORDING_WORKER_COUNT | Uploader thread pool size |
| VOXRAY_RECORDING_QUEUE_CAP | Upload job queue capacity |
| VOXRAY_RECORDING_MAX_RETRIES | Max upload retry attempts |

Transcripts

| Variable | Description |
| --- | --- |
| VOXRAY_TRANSCRIPTS_ENABLE | true to enable transcript logging |
| VOXRAY_TRANSCRIPTS_DRIVER | postgres or mysql |
| VOXRAY_TRANSCRIPTS_DSN | Database connection string |
| VOXRAY_TRANSCRIPTS_TABLE | Target table name |
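
The bind port is the one setting above with two variable names, VOXRAY_PORT and PORT. The sketch below shows one way such an override could be resolved. portFromEnv is a hypothetical helper, and checking VOXRAY_PORT before PORT is an assumption: the README does not state which one wins when both are set.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// portFromEnv returns the first valid port found in VOXRAY_PORT or PORT,
// or fallback when neither holds a valid port number.
// The VOXRAY_PORT-first order is an assumption, not documented behavior.
func portFromEnv(lookup func(string) (string, bool), fallback int) int {
	for _, name := range []string{"VOXRAY_PORT", "PORT"} {
		v, ok := lookup(name)
		if !ok {
			continue
		}
		if p, err := strconv.Atoi(v); err == nil && p > 0 && p < 65536 {
			return p
		}
	}
	return fallback
}

func main() {
	// 8080 is the documented default for the "port" config key.
	fmt.Println("port:", portFromEnv(os.LookupEnv, 8080))
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the helper easy to test without touching the process environment.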

Examples

For provider/model-specific examples, see examples/voice/README.md. For the browser-based WebRTC client, see tests/frontend/README.md.

Complete example config.json

Copy this, fill in your API keys, and run:

{
  "transport": "both",
  "host": "0.0.0.0",
  "port": 8080,
  "metrics_enabled": true,

  "stt_provider": "openai",
  "stt_model": "gpt-4o-mini-transcribe",

  "llm_provider": "openai",
  "model": "gpt-4.1-mini",

  "tts_provider": "openai",
  "tts_voice": "alloy",

  "api_keys": {
    "openai": "YOUR_OPENAI_API_KEY"
  },

  "webrtc_ice_servers": [
    "stun:stun.l.google.com:19302"
  ]
}

Run with:

./voxray -config config.json

Then connect a WebSocket client to ws://localhost:8080/ws, or POST an SDP offer to http://localhost:8080/webrtc/offer for WebRTC.


Use Cases

  • AI call centers / IVR — conversational agents for inbound and outbound calls with low latency
  • In-app voice copilots — embed voice agents inside SaaS or productivity apps via WebSocket or WebRTC
  • Operations and support bots — voicebots for support, ops, and internal tooling on your own infrastructure
  • Realtime monitoring and control — voice interfaces for dashboards, observability tools, and control systems
  • On-prem / VPC assistants — self-hosted voice-AI stacks where data must stay within your cloud or datacenter

Roadmap

Near-term

  • More built-in STT/LLM/TTS providers and opinionated presets for common stacks
  • Deeper observability, tracing, and debugging tools for real-time pipelines

Planned

  • Deployment templates (Docker, Kubernetes)
  • Additional starter agent examples for popular voice-agent scenarios
  • Expanded documentation on scaling, deployment patterns, and production hardening

Documentation

Repository layout

| Package | README |
| --- | --- |
| pkg/pipeline | Pipeline, runner, source/sink, task, registry |
| pkg/transport | WebSocket, WebRTC, in-memory transports |
| pkg/services | LLM, STT, TTS interfaces and provider factory |
| pkg/recording | Conversation recording and S3 upload |
| pkg/metrics | Prometheus metrics |
| pkg/config | Configuration and env overrides |
| pkg/processors | Voice, echo, filters, aggregators |
| pkg/runner | Session store and runner args |
| pkg/utils | Backoff, notifier, sentence, aggregators |
| pkg/frames | Frame types and serialization |
| pkg/audio | VAD, turn detection, codecs, resample |
| scripts | Build, run, and maintenance scripts |

Docs

The OpenAPI spec is generated from the codebase (make swagger); Swagger UI is served at /swagger/ when available.


License

This project is licensed under the Apache License 2.0. Attribution details for distribution are provided in NOTICE.


Contributing

Contributions are welcome! Quick development setup:

go test ./...          # run all tests
make lint              # lint (or: ./scripts/pre-commit.sh)
make swagger           # regenerate API docs (requires swag)
make evals             # run eval scenarios (optional)

See CONTRIBUTING.md for full setup, testing, style, and pull request guidelines.

Directories

| Path | Synopsis |
| --- | --- |
| cmd/evals (command) | evals runs eval scenarios from a JSON config (LLM-only pipeline, prompt → assert on output). |
| cmd/generate-dtmf (command) | generate-dtmf generates DTMF WAV files (0-9, *, #) using pkg/audio. |
| cmd/realtime-demo (command) | Package docs Code generated by swaggo/swag. |
| cmd/voice (command) | |
| pkg/api | Package api provides shared REST API response envelope and error types. |
| pkg/audio | Package audio provides A-law (G.711 PCMA) encode/decode for 16-bit mono PCM. |
| pkg/audio/turn | Package turn provides end-of-turn detection for audio conversations (base turn analyzer + silence-based smart turn). |
| pkg/config | Package config handles the application configuration, including environment variables and JSON files. |
| pkg/evals | Package evals provides a Go-native eval runner for voice pipeline scenarios. |
| pkg/extensions/ivr | Package ivr provides Interactive Voice Response (IVR) navigation components for automated IVR phone system navigation using LLM-based decision making and DTMF. |
| pkg/extensions/voicemail | Package voicemail provides ClassificationProcessor for voicemail vs conversation detection. |
| pkg/frames | Package frames defines DTMF and IVR-related frame types. |
| pkg/frames/proto/wire | Package wire provides frame wire formats. |
| pkg/frames/serialize | Package serialize provides frame serialization interfaces and implementations. |
| pkg/frames/serialize/exotel | Package exotel provides Exotel Media Streams WebSocket protocol serializer. |
| pkg/frames/serialize/genesys | Package genesys provides Genesys AudioHook WebSocket protocol serializer. |
| pkg/frames/serialize/plivo | Package plivo provides Plivo Audio Streaming WebSocket protocol serializer. |
| pkg/frames/serialize/telnyx | Package telnyx provides Telnyx WebSocket protocol serializer. |
| pkg/frames/serialize/twilio | Package twilio provides Twilio Media Streams WebSocket protocol serializer. |
| pkg/frames/serialize/vonage | Package vonage provides Vonage Audio Connector WebSocket serializer. |
| pkg/logger | Package logger provides minimal logging for Voxray. |
| pkg/mcp | Package mcp: Client connects to an MCP server, lists tools, converts schemas, and can register them with an LLMServiceWithTools. |
| pkg/observers | Package observers provides optional metrics (latency, token usage) and OpenTelemetry stub. |
| pkg/pipeline | Package pipeline provides ParallelPipeline for concurrent frame processing. |
| pkg/processors | Package processors: AIServiceBase provides a base for AI services with settings, Start/Stop/Cancel lifecycle, and optional metrics sync (mirrors upstream ai_service.py). |
| pkg/processors/aggregator | Package aggregator provides a processor that collects text frames and emits a single aggregated frame (e.g. |
| pkg/processors/aggregators/dtmf | Package dtmf provides a DTMF aggregator that accumulates InputDTMFFrame digits and emits TranscriptionFrame on timeout, termination digit (#), or EndFrame/CancelFrame. |
| pkg/processors/aggregators/gated | Package gated provides a gated aggregator that buffers frames when the gate is closed and releases them when the gate opens (custom open/close predicates). |
| pkg/processors/aggregators/gatedcontext | Package gatedcontext provides a processor that holds LLMContextFrame until a notifier signals release. |
| pkg/processors/aggregators/llmcontextsummarizer | Package llmcontextsummarizer provides a processor that monitors LLM context size and emits LLMContextSummaryRequestFrame when thresholds are exceeded; applies results from LLMContextSummaryResultFrame. |
| pkg/processors/aggregators/llmfullresponse | Package llmfullresponse provides a processor that aggregates LLM text between LLMFullResponseStartFrame and LLMFullResponseEndFrame and invokes a callback on completion or interruption. |
| pkg/processors/aggregators/llmtext | Package llmtext provides a processor that converts LLMTextFrame to AggregatedTextFrame using a configurable text aggregator (e.g. |
| pkg/processors/aggregators/userresponse | Package userresponse provides a processor that aggregates TranscriptionFrame into a single TextFrame when the user turn ends (e.g. |
| pkg/processors/audio | Package audio provides audio processors for the pipeline: VAD (voice activity detection) and an audio buffer processor that merges user and bot audio with optional turn-based and buffered callbacks. |
| pkg/processors/filters | Package filters provides frame-filtering processors for the pipeline, ported from upstream processors/filters: frame_filter, function_filter, identity_filter, null_filter, stt_mute_filter, wake_check_filter, wake_notifier_filter. |
| pkg/processors/frameworks | Package frameworks provides processor integrations for external runtimes and the RTVI protocol, ported from upstream processors/frameworks. |
| pkg/processors/frameworks/rtvi | Package rtvi implements the RTVI (Real-Time Voice Interface) protocol processor and message types. |
| pkg/processors/voice | Package voice provides processors that wire STT, LLM, and TTS into a pipeline. |
| pkg/realtime | Package realtime provides realtime session implementations (OpenAI Realtime API and shim). |
| pkg/runner | Package runner provides Redis-backed session store for horizontal scaling. |
| pkg/runner/daily | Package daily provides Daily.co room and meeting token creation via the REST API (runner Daily integration). |
| pkg/runner/livekit | Package livekit provides LiveKit room URL and agent token configuration from environment (runner Livekit integration). |
| pkg/server | Package server provides transport servers (e.g. |
| pkg/services | Package services defines interfaces and implementations for LLM, STT, and TTS. |
| pkg/services/camb | Package camb provides Camb AI speech-to-text. |
| pkg/services/cerebras | Package cerebras provides Cerebras inference API-backed LLM via OpenAI-compatible API. |
| pkg/services/deepseek | Package deepseek provides DeepSeek-backed LLM via OpenAI-compatible API. |
| pkg/services/google | Package google provides Google Gemini LLM, Vertex AI LLM, and Google Cloud STT/TTS services. |
| pkg/services/gradium | Package gradium provides Gradium speech-to-text (WebSocket or REST). |
| pkg/services/grok | Package grok provides xAI Grok-backed LLM via OpenAI-compatible API. |
| pkg/services/groq | Package groq provides Groq-backed LLM, STT, and TTS via OpenAI-compatible API. |
| pkg/services/hume | Package hume provides Hume (Hume AI) text-to-speech. |
| pkg/services/inworld | Package inworld provides Inworld text-to-speech (and LLM). |
| pkg/services/llmapi | Package llmapi defines LLM and tool-calling interfaces so that implementers (e.g. |
| pkg/services/minimax | Package minimax provides Minimax text-to-speech. |
| pkg/services/mistral | Package mistral provides Mistral AI-backed LLM via OpenAI-compatible API. |
| pkg/services/mock | Package mock provides mock STT, LLM, and TTS services for testing and stress testing without calling real APIs. |
| pkg/services/neuphonic | Package neuphonic provides Neuphonic text-to-speech (HTTP SSE streaming). |
| pkg/services/ollama | Package ollama provides Ollama-backed LLM via OpenAI-compatible API (localhost or custom base URL). |
| pkg/services/openai | Package openai provides OpenAI-based LLM (and optionally STT/TTS) for Voxray. |
| pkg/services/openpipe | Package openpipe provides OpenPipe-backed LLM via OpenAI-compatible API. |
| pkg/services/qwen | Package qwen provides Alibaba DashScope Qwen LLM via OpenAI-compatible API. |
| pkg/services/sarvam | Package sarvam provides Sarvam AI TTS and STT service implementations. |
| pkg/services/soniox | Package soniox provides Soniox speech-to-text (WebSocket API used for batch Transcribe). |
| pkg/services/stt | Package stt provides STT service implementations (OpenAI Whisper, Groq Whisper). |
| pkg/services/tts | Package tts provides TTS service implementations (OpenAI TTS, Groq TTS). |
| pkg/services/whisper | Package whisper provides Whisper API-backed STT (OpenAI or self-hosted compatible) with configurable base URL. |
| pkg/services/xtts | Package xtts provides Coqui XTTS text-to-speech via local streaming server. |
| pkg/sync/notifier | Package notifier provides a one-shot notifier for gate synchronization. |
| pkg/transport | Package transport defines an optional base for transports with common fields (name, logger). |
| pkg/transport/memory | Package memory provides an in-memory transport for testing and stress testing. |
| pkg/transport/smallwebrtc | Package smallwebrtc provides a WebRTC transport for Voxray using pion/webrtc. |
| pkg/transport/websocket | Package websocket provides WebSocket transport (server and client) for Voxray. |
| pkg/transport/whatsapp | Package whatsapp provides WhatsApp Cloud API client and transport for Voxray. |
| pkg/utils | Package utils provides shared utilities (backoff, etc.). |
| pkg/utils/notifier | Package notifier provides a simple signal that one goroutine can wait on and another can trigger. |
| pkg/utils/patternaggregator | Package patternaggregator provides XML-style tag aggregation for LLM text streams. |
| pkg/utils/sentence | Package sentence provides helpers for sentence-boundary detection in aggregated text. |
| pkg/utils/textaggregator | Package textaggregator provides an interface and implementations for aggregating incremental text (e.g. |
