meowcaller

package module
v0.0.0-...-0f1265d Latest
Published: Jun 26, 2026 License: MIT Imports: 35 Imported by: 0

README

meowcaller

meowcaller is a Go library for the WhatsApp Web VoIP stack. It is 100% pure Go, with no CGO and minimal dependencies. It includes MLow, a novel proprietary audio codec, written and validated entirely in Go. As a result, meowcaller does not rely on any native bindings and can run everywhere that Go can.

Discussion

Matrix room: #meowcaller:matrix.org.

Discord channel: #meowcaller in the WhiskeySockets Discord server.

You can find the underlying spec in the WhatsApp Calls Research Group. We are in the process of standardizing the spec and moving away from treating whatsapp-rust source comments as the source of truth.

Usage

The godoc includes docs for all methods.

There's a range of examples in the examples directory.

The API is easy to approach and implement: attach a Source to send media, a Sink to receive it, and register callbacks for call events.

A short example that shows the power and simplicity of the library:

// wa is a whatsmeow.Client
client := meowcaller.NewClient(wa)

client.OnIncomingCall(func(call *meowcaller.Call) {
    call.Answer()

    if mp3, err := meowcaller.MP3File("hello.mp3"); err == nil {
        call.Play(mp3)               // stream audio to the caller
    }
    if wav, err := meowcaller.WAVRecorder("caller.wav"); err == nil {
        call.Receive(wav)            // record their voice
    }
    if h264, err := meowcaller.AnnexBRecorder("caller.h264"); err == nil {
        call.ReceiveVideo(h264)      // record their video
    }
})

// Placing a call is just as short:
call, err := client.Call(ctx, "+15551234567")
if err != nil {
    log.Fatal(err)
}
call.Receive(meowcaller.SinkFunc(func(pcm []float32) { /* the peer's audio */ }))

Features

Core VoIP features are present:

  • Outbound calls
  • Inbound calls
  • Audio calls (the pure-Go MLow codec)
  • Video calls (ported from WaCalls; see Credits)

Things that are not yet implemented:

  • Opus codec fallback for clients not using MLow (in progress; testing edge cases)
  • Mid-call audio→video upgrade (from-start video calls work; the upgrade handshake is WIP)
  • Group calls (WIP)
  • Call signalling features (raise hand, lobby, reactions)

Credits

meowcaller relies heavily on primitives that are implemented in the WhatsApp Calls Research Group. I thank all the developers who have contributed to it.

meowcaller's video call implementation is built on the design of WaCalls by jotadev66. WaCalls in turn vendors meowcaller's MLow implementation.

Sponsoring and contribution

You may contribute to the maintenance of this library by sponsoring its maintainers on GitHub.

You may also submit pull requests and issues where relevant, provided you follow the contributor Code of Conduct.

License

This repository follows the MIT license, as stated in the LICENSE file.

Documentation

Index

Constants

const (
	// SampleRate is the codec's fixed sample rate (16 kHz mono).
	SampleRate = 16000
	// FrameSamples is the per-frame sample count (60 ms at 16 kHz).
	FrameSamples = 960
)

Audio in meowcaller is 16 kHz mono float32 PCM carried in 60 ms frames — the rate and framing the MLow codec encodes. Every AudioSource and AudioSink speaks this format; the built-in decoders convert foreign formats (WAV/MP3/Opus) into it.
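
The relationship between the two constants can be checked with a little arithmetic. The sketch below is standalone and redeclares the constants locally; FrameDuration and FramesFor are hypothetical helper names for illustration, not part of the package.

```go
package main

import (
	"fmt"
	"time"
)

const (
	SampleRate   = 16000 // Hz, mono (value from the constants above)
	FrameSamples = 960   // samples per frame (value from the constants above)
)

// FrameDuration is the playing time of one frame: 960 / 16000 s = 60 ms.
// Hypothetical helper, not a meowcaller identifier.
const FrameDuration = time.Second * FrameSamples / SampleRate

// FramesFor returns how many frames are needed to carry d of audio,
// rounding up so that a trailing partial frame is counted.
func FramesFor(d time.Duration) int {
	if d <= 0 {
		return 0
	}
	return int((d + FrameDuration - 1) / FrameDuration)
}

func main() {
	fmt.Println("frame duration:", FrameDuration)
	fmt.Println("frames for 1s:", FramesFor(time.Second))
	fmt.Println("bytes per frame as s16le:", FrameSamples*2)
}
```

One second of audio does not divide evenly into 60 ms frames, so a source that ends mid-frame has to pad or drop the remainder; how the built-in decoders handle that is not stated here.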

Variables

This section is empty.

Functions

This section is empty.

Types

type AudioCodec

type AudioCodec int8

AudioCodec identifies the wire audio codec negotiated for a call. WhatsApp 1:1 audio is carried in RTP payload type 120 regardless of codec; the codec itself is chosen by signaling (the server's voip_settings), not the RTP payload type.

const (
	// AudioCodecMlow is Meta's 16 kHz MLow codec (the default).
	AudioCodecMlow AudioCodec = iota
	// AudioCodecOpus is RFC 6716 Opus.
	AudioCodecOpus
)

func (AudioCodec) String

func (c AudioCodec) String() string

String renders the codec name for logs.

type AudioSink

type AudioSink interface {
	// WriteFrame consumes one decoded mono frame from the peer.
	WriteFrame(frame []float32) error
	// Close flushes and releases the sink. Safe to call more than once.
	Close() error
}

AudioSink consumes the 16 kHz mono PCM frames decoded from the peer's audio. Attach one with Call.Receive; built-ins record to a WAV file or forward to a callback. A CGO Speaker() sink lives in the meowcaller/audio/malgo subpackage.
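
Because AudioSink is a two-method interface, a custom sink is small. The following is a minimal sketch, assuming only the interface shown above (redeclared locally so the example compiles on its own); LevelMeter is a hypothetical type that tracks the loudest frame by RMS level.

```go
package main

import (
	"fmt"
	"math"
)

// AudioSink mirrors the documented interface so this sketch is standalone.
type AudioSink interface {
	WriteFrame(frame []float32) error
	Close() error
}

// LevelMeter is a hypothetical sink that records the highest per-frame RMS.
type LevelMeter struct {
	Frames int     // number of non-empty frames seen
	Peak   float64 // highest RMS level seen so far
}

// WriteFrame computes the RMS of one frame and updates the running peak.
func (m *LevelMeter) WriteFrame(frame []float32) error {
	if len(frame) == 0 {
		return nil
	}
	var sum float64
	for _, s := range frame {
		sum += float64(s) * float64(s)
	}
	rms := math.Sqrt(sum / float64(len(frame)))
	if rms > m.Peak {
		m.Peak = rms
	}
	m.Frames++
	return nil
}

// Close has nothing to release, so it is safe to call more than once.
func (m *LevelMeter) Close() error { return nil }

// Compile-time check that LevelMeter satisfies the interface.
var _ AudioSink = (*LevelMeter)(nil)

func main() {
	m := &LevelMeter{}
	m.WriteFrame(make([]float32, 960)) // one frame of silence
	loud := make([]float32, 960)
	for i := range loud {
		loud[i] = 0.5
	}
	m.WriteFrame(loud)
	fmt.Printf("frames=%d peak=%.2f\n", m.Frames, m.Peak)
}
```

With the real package, a value like this would be handed to Call.Receive.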

func WAVRecorder

func WAVRecorder(path string) (AudioSink, error)

WAVRecorder creates an AudioSink that records the decoded 16 kHz mono peer audio to a 16-bit PCM WAV file at path. Close finalizes the header size fields.

type AudioSource

type AudioSource interface {
	// ReadFrame returns the next FrameSamples-long mono frame, or io.EOF at the end.
	ReadFrame() ([]float32, error)
	// Close releases any decoder/file resources. Safe to call more than once.
	Close() error
}

AudioSource yields successive 16 kHz mono PCM frames of FrameSamples to play into a call. ReadFrame returns io.EOF when the source is exhausted (a Player then fires OnFinish). Built-in sources decode WAV/MP3/Opus/raw PCM; attach one to a call via a Player (Call.Subscribe / Call.Play).

func MP3File

func MP3File(path string) (AudioSource, error)

MP3File streams an MP3 file as 16 kHz mono FrameSamples frames, downmixing the decoder's s16le stereo output to mono and resampling to 16 kHz.
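
The downmix and resample steps can be illustrated as follows. This is a sketch under stated assumptions, not the package's code: the resampling algorithm meowcaller uses is not documented here, and plain linear interpolation (used below for brevity) applies no low-pass filter, so it aliases when downsampling real audio. DownmixStereo and ResampleLinear are hypothetical names.

```go
package main

import "fmt"

const SampleRate = 16000 // target rate in Hz

// DownmixStereo averages interleaved L/R int16 samples into mono float32
// samples in the range [-1, 1).
func DownmixStereo(interleaved []int16) []float32 {
	mono := make([]float32, len(interleaved)/2)
	for i := range mono {
		l := float32(interleaved[2*i])
		r := float32(interleaved[2*i+1])
		mono[i] = (l + r) / 2 / 32768
	}
	return mono
}

// ResampleLinear converts mono samples from inRate to outRate by linear
// interpolation between neighbouring input samples.
func ResampleLinear(in []float32, inRate, outRate int) []float32 {
	if len(in) == 0 || inRate <= 0 || outRate <= 0 {
		return nil
	}
	out := make([]float32, len(in)*outRate/inRate)
	ratio := float64(inRate) / float64(outRate)
	for i := range out {
		pos := float64(i) * ratio // position in the input, in samples
		j := int(pos)
		if j >= len(in) {
			j = len(in) - 1
		}
		next := j + 1
		if next >= len(in) {
			next = len(in) - 1
		}
		frac := float32(pos - float64(j))
		out[i] = in[j]*(1-frac) + in[next]*frac
	}
	return out
}

func main() {
	stereo := make([]int16, 2*44100) // one second of 44.1 kHz stereo silence
	mono := DownmixStereo(stereo)
	out := ResampleLinear(mono, 44100, SampleRate)
	fmt.Println(len(mono), len(out))
}
```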

func OpusFile

func OpusFile(path string) (AudioSource, error)

OpusFile streams an Ogg/Opus file as 16 kHz mono FrameSamples frames. It reads the Ogg pages, decodes each Opus packet to 16 kHz mono PCM, and frames the result.

func PCMStream

func PCMStream(r io.ReadCloser) AudioSource

PCMStream plays raw s16le mono 16 kHz PCM read from r. r is closed when the source is exhausted or Close is called.
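
For reference, "s16le mono 16 kHz" means each sample is two bytes, little-endian, signed. A minimal sketch of turning such a byte stream into float32 frames is shown below. It is an illustration only: zero-padding a short final frame is an assumption, not documented PCMStream behaviour, and DecodeS16LE, ReadPCMFrame and CountFrames are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const FrameSamples = 960

// DecodeS16LE converts little-endian signed 16-bit samples in b to float32
// values in [-1, 1), writing into dst. It returns the number of samples written.
func DecodeS16LE(b []byte, dst []float32) int {
	n := len(b) / 2
	if n > len(dst) {
		n = len(dst)
	}
	for i := 0; i < n; i++ {
		dst[i] = float32(int16(binary.LittleEndian.Uint16(b[2*i:]))) / 32768
	}
	return n
}

// ReadPCMFrame reads one frame from r. A short final frame is zero-padded
// (an assumption of this sketch); io.EOF is returned when no sample remains.
func ReadPCMFrame(r io.Reader) ([]float32, error) {
	buf := make([]byte, FrameSamples*2)
	n, err := io.ReadFull(r, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return nil, err // a genuine read error
	}
	frame := make([]float32, FrameSamples)
	if DecodeS16LE(buf[:n], frame) == 0 {
		return nil, io.EOF
	}
	return frame, nil
}

// CountFrames reports how many frames an in-memory PCM buffer yields.
func CountFrames(pcm []byte) int {
	r := bytes.NewReader(pcm)
	n := 0
	for {
		if _, err := ReadPCMFrame(r); err != nil {
			return n
		}
		n++
	}
}

func main() {
	// 961 samples: one full frame plus one sample that needs padding.
	fmt.Println(CountFrames(make([]byte, 961*2)))
}
```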

func WAVFile

func WAVFile(path string) (AudioSource, error)

WAVFile streams a RIFF/WAVE file as 16 kHz mono FrameSamples frames, downmixing and resampling as needed. 16-bit PCM is supported.

type Call

type Call struct {
	// contains filtered or unexported fields
}

Call is one live 1:1 call. Place one with Client.Call, or receive one (unanswered) in an OnIncomingCall listener. Attach outbound audio with Subscribe/Play and inbound audio with Receive, and lifecycle listeners with OnReady/OnEnd/OnStateChange. All methods are safe for concurrent use.

func (*Call) Answer

func (c *Call) Answer() error

Answer accepts an inbound call (preaccept + accept) and brings media up. It returns an error and does nothing if the call is not in a ringing state.

func (*Call) Hangup

func (c *Call) Hangup() error

Hangup ends the call (either direction) and tears down media.

func (*Call) ID

func (c *Call) ID() string

ID returns the call-id (32 uppercase hex chars).

func (*Call) IsVideo

func (c *Call) IsVideo() bool

IsVideo reports whether the inbound offer advertised video. Attach a VideoSink with ReceiveVideo to receive the peer's H.264.

func (*Call) OnEnd

func (c *Call) OnEnd(fn func(reason string))

OnEnd registers a callback fired when the call ends, with a short reason string.

func (*Call) OnReady

func (c *Call) OnReady(fn func())

OnReady registers a callback fired once media is flowing (relay bound, first frames exchanged).

func (*Call) OnStateChange

func (c *Call) OnStateChange(fn func(CallPhase))

OnStateChange registers a callback fired on each phase transition.

func (*Call) OnVideoState

func (c *Call) OnVideoState(fn func(VideoState))

OnVideoState registers a callback fired for each inbound <video> state stanza — the peer's video on/off, the audio→video upgrade, and device orientation (rotate by Orientation × 90°).

func (*Call) Peer

func (c *Call) Peer() types.JID

Peer returns the remote party's LID.

func (*Call) Play

func (c *Call) Play(src AudioSource) *Player

Play is a shortcut: it creates a Player, subscribes it, starts src, and returns the Player (use it for Pause/Stop/OnFinish).

func (*Call) Receive

func (c *Call) Receive(sink AudioSink)

Receive attaches a sink for the peer's decoded audio (16 kHz mono frames), replacing any previous one. Without a sink the inbound audio is decoded and discarded.

func (*Call) ReceiveVideo

func (c *Call) ReceiveVideo(sink VideoSink)

ReceiveVideo attaches a sink for the peer's H.264 video, delivered as Annex-B access units (one per frame, reassembled on the RTP marker), replacing any previous one. Without a sink the inbound video is discarded. The video analog of Receive; AnnexBRecorder records to a .h264 file, or use VideoSinkFunc to forward to a callback.

NOT VALIDATED: the inbound-video media path is unproven (no captured video-RTP vector).

func (*Call) Reject

func (c *Call) Reject() error

Reject declines an inbound call.

func (*Call) SendVideo

func (c *Call) SendVideo(accessUnit []byte) error

SendVideo sends one already-encoded H.264 access unit (Annex-B) to the peer — fed from an external encoder (browser WebCodecs, ffmpeg, hardware). Returns an error if the call has no active video media yet. meowcaller does not encode pixels (no pure-Go H.264 encoder); this is the video analog of writing a sample to a track.

NOT VALIDATED: the video send media path is unproven.

func (*Call) State

func (c *Call) State() CallPhase

State returns the call's current phase.

func (*Call) Subscribe

func (c *Call) Subscribe(p *Player)

Subscribe attaches p as the call's outbound audio player, replacing any previous one. While the player is Playing, its source frames are encoded and sent to the peer; otherwise silence is sent (the call must keep sending to hold the relay bridge).

type CallDirection

type CallDirection int

CallDirection is the originating direction of a call.

const (
	CallDirectionOutgoing CallDirection = iota
	CallDirectionIncoming
)

type CallPhase

type CallPhase int

CallPhase is the lifecycle phase of a call.

const (
	CallPhaseIdle CallPhase = iota
	CallPhaseCalling
	CallPhaseRinging
	CallPhaseConnecting
	CallPhaseActive
	CallPhaseEnded
)

type CallRegistry

type CallRegistry struct {
	// contains filtered or unexported fields
}

CallRegistry is a thread-safe map of active calls keyed by call-id, each optionally holding the cancel handle for its running media task.

func NewCallRegistry

func NewCallRegistry() *CallRegistry

NewCallRegistry returns an empty registry.

func (*CallRegistry) AbortAll

func (r *CallRegistry) AbortAll() int

AbortAll cancels every call's media task and clears the registry, returning the number cleared. Call on disconnect/reconnect.

func (*CallRegistry) ActiveCount

func (r *CallRegistry) ActiveCount() int

ActiveCount returns the number of registered calls.

func (*CallRegistry) Insert

func (r *CallRegistry) Insert(session *CallSession) bool

Insert registers a new call; returns false if the id already exists.

func (*CallRegistry) Phase

func (r *CallRegistry) Phase(callID string) (CallPhase, bool)

Phase returns the call's current phase, and whether the call is known.

func (*CallRegistry) Remove

func (r *CallRegistry) Remove(callID string) bool

Remove deletes a call, cancelling its media task; true if it existed.

func (*CallRegistry) SetMediaTask

func (r *CallRegistry) SetMediaTask(callID string, cancel context.CancelFunc)

SetMediaTask attaches (or replaces, cancelling the old) the media task's cancel handle for a call. If the call is unknown (e.g. already removed), the handle is cancelled immediately so its task can't outlive the call.
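
The attach, replace and orphan rules described for SetMediaTask can be sketched with a mutex-guarded map of cancel functions. This is an illustration of the stated behaviour, not the package's implementation; the Registry type below is a simplified stand-in that stores only cancel handles.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Registry is a simplified stand-in: call-id -> media task cancel handle
// (nil while no task is attached).
type Registry struct {
	mu    sync.Mutex
	tasks map[string]context.CancelFunc
}

func NewRegistry() *Registry {
	return &Registry{tasks: make(map[string]context.CancelFunc)}
}

// Insert registers a call id; false if it already exists.
func (r *Registry) Insert(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.tasks[id]; dup {
		return false
	}
	r.tasks[id] = nil
	return true
}

// SetMediaTask stores cancel for a known call, cancelling any previous handle.
// For an unknown call it cancels the new handle immediately.
func (r *Registry) SetMediaTask(id string, cancel context.CancelFunc) {
	r.mu.Lock()
	old, known := r.tasks[id]
	if known {
		r.tasks[id] = cancel
	}
	r.mu.Unlock()
	// Cancel outside the lock so a slow cancel cannot block other callers.
	if !known {
		cancel()
		return
	}
	if old != nil {
		old()
	}
}

// Remove deletes a call and cancels its task; true if it existed.
func (r *Registry) Remove(id string) bool {
	r.mu.Lock()
	cancel, known := r.tasks[id]
	delete(r.tasks, id)
	r.mu.Unlock()
	if cancel != nil {
		cancel()
	}
	return known
}

func main() {
	r := NewRegistry()
	r.Insert("CALL-1")
	ctx1, cancel1 := context.WithCancel(context.Background())
	r.SetMediaTask("CALL-1", cancel1)
	ctx2, cancel2 := context.WithCancel(context.Background())
	r.SetMediaTask("CALL-1", cancel2) // replaces and cancels the first task
	fmt.Println(ctx1.Err() != nil, ctx2.Err() != nil) // true false
	ctx3, cancel3 := context.WithCancel(context.Background())
	r.SetMediaTask("UNKNOWN", cancel3) // unknown call: cancelled at once
	fmt.Println(ctx3.Err() != nil)     // true
	r.Remove("CALL-1")
	fmt.Println(ctx2.Err() != nil) // true
}
```

Cancelling an orphaned handle immediately covers the race where a call is removed while its media task is still starting up.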

func (*CallRegistry) Snapshot

func (r *CallRegistry) Snapshot(callID string) (CallSession, bool)

Snapshot returns a copy of the call's session, and whether it is known.

func (*CallRegistry) Transition

func (r *CallRegistry) Transition(callID string, next CallPhase) bool

Transition advances a call's phase; false if unknown or the move is illegal.

type CallSession

type CallSession struct {
	CallID      string
	PeerJID     types.JID
	CallCreator types.JID
	Direction   CallDirection
	IsVideo     bool
	// contains filtered or unexported fields
}

CallSession is the per-call signaling state with validated phase transitions.

func NewIncomingSession

func NewIncomingSession(callID string, peerJID, callCreator types.JID, opts ...Option) *CallSession

NewIncomingSession starts an incoming call session in the Ringing phase.

func NewOutgoingSession

func NewOutgoingSession(callID string, peerJID, callCreator types.JID, opts ...Option) *CallSession

NewOutgoingSession starts an outgoing call session in the Idle phase.

func (*CallSession) IsActive

func (s *CallSession) IsActive() bool

IsActive reports whether the call is in the Active phase.

func (*CallSession) IsEnded

func (s *CallSession) IsEnded() bool

IsEnded reports whether the call has ended.

func (*CallSession) Phase

func (s *CallSession) Phase() CallPhase

Phase returns the current lifecycle phase.

func (*CallSession) TransitionTo

func (s *CallSession) TransitionTo(next CallPhase) bool

TransitionTo attempts a phase transition, returning false (no-op) if illegal. Ended is reachable from anything except Ended.
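
A validated transition of this kind can be sketched as a table lookup. Only two rules below come from the documentation: an illegal move is a no-op that returns false, and Ended is reachable from every phase except Ended. The rest of the table is an assumption made for illustration; the package's actual set of legal moves is not listed here.

```go
package main

import "fmt"

type Phase int

const (
	Idle Phase = iota
	Calling
	Ringing
	Connecting
	Active
	Ended
)

// allowed is an ASSUMED forward-only table, for illustration only.
var allowed = map[Phase][]Phase{
	Idle:       {Calling},
	Calling:    {Ringing, Connecting},
	Ringing:    {Connecting},
	Connecting: {Active},
}

// CanTransition reports whether from -> to is legal under the sketch's table.
func CanTransition(from, to Phase) bool {
	if to == Ended {
		return from != Ended // documented rule
	}
	for _, p := range allowed[from] {
		if p == to {
			return true
		}
	}
	return false
}

// Session holds just the phase; the real CallSession carries more state.
type Session struct{ phase Phase }

// TransitionTo applies the move if legal and leaves the phase untouched otherwise.
func (s *Session) TransitionTo(next Phase) bool {
	if !CanTransition(s.phase, next) {
		return false
	}
	s.phase = next
	return true
}

func main() {
	s := &Session{phase: Ringing} // an incoming call starts in Ringing
	fmt.Println(s.TransitionTo(Active))     // false: skips Connecting in this table
	fmt.Println(s.TransitionTo(Connecting)) // true
	fmt.Println(s.TransitionTo(Ended))      // true
	fmt.Println(s.TransitionTo(Ended))      // false: already ended
}
```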

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is the managed entry point to the WhatsApp 1:1 calling stack. It wraps a connected *whatsmeow.Client and drives the whole call lifecycle — signaling, keying, relay election, and media — under the hood, behind a small surface: place a call with Call, handle inbound calls from an OnIncomingCall listener, and attach a Player (outbound audio) and a sink (inbound audio) to each Call.

The library never configures logging; pass WithLogger to surface its debug/trace.

func NewClient

func NewClient(wa *whatsmeow.Client, opts ...Option) *Client

NewClient wraps a whatsmeow client and installs the call event handlers. Construct it before the whatsmeow client connects so that the low-level <ack>/<call> interception is in place before the receive loop starts.

func (*Client) Call

func (c *Client) Call(ctx context.Context, target string) (*Call, error)

Call places a 1:1 call to target (a phone number, a phone JID, or an @lid JID), returning the live Call once the offer is on the wire. Attach a Player and listeners to the returned Call; media starts automatically once the peer answers and the relay endpoint arrives.

func (*Client) OnIncomingCall

func (c *Client) OnIncomingCall(fn func(*Call))

OnIncomingCall registers the listener fired for each inbound call offer. The handler receives a Call that has not been answered yet; call Answer or Reject on it. Only the most recently registered listener is used.

type MediaPipeline

type MediaPipeline struct {
	// contains filtered or unexported fields
}

MediaPipeline composes the outbound (protect) and inbound (unprotect) E2E 1:1 media path. SFrame is omitted (plain Opus inside WAHKDF SRTP).

func NewMediaPipeline

func NewMediaPipeline(callKey []byte, selfJID, peerJID string, ssrc, samplesPerPacket uint32, opts ...Option) (*MediaPipeline, error)

NewMediaPipeline derives both directions from the 32-byte callKey: send keys from the self LID, recv keys from the peer LID (an interop-load-bearing convention).

func (*MediaPipeline) ProtectAudio

func (p *MediaPipeline) ProtectAudio(opusPayload []byte) ([]byte, error)

ProtectAudio wraps an Opus payload in an RTP WARP header, E2E-SRTP encrypts, and appends the WARP MI tag.

func (*MediaPipeline) ProtectRTP

func (p *MediaPipeline) ProtectRTP(header *rtp.RtpHeader, payload []byte) ([]byte, error)

ProtectRTP E2E-SRTP encrypts payload under the send keys for a caller-built RTP header and appends the WARP MI tag. It is the generic form of ProtectAudio used by the video send path, which manages its own PT-97 sequencer (header) per WaCalls.

NOT VALIDATED: the video send media path is unproven.

func (*MediaPipeline) UnprotectAudio

func (p *MediaPipeline) UnprotectAudio(packet []byte) (rtp.RtpHeader, []byte, bool)

UnprotectAudio strips the WARP MI tag (not verified), parses the header, and decrypts the payload, guessing the ROC from the recv tracker. ok=false on a malformed packet.
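
"Guessing the ROC" refers to the SRTP rollover counter: the RTP sequence number is only 16 bits, so the receiver has to estimate how many times it has wrapped. The sketch below follows the estimation rule in RFC 3711 section 3.3.1. Whether meowcaller's recv tracker uses exactly this rule is an assumption; GuessROC and PacketIndex are hypothetical names.

```go
package main

import "fmt"

// GuessROC estimates the rollover counter for a packet with sequence number
// seq, given the receiver's current counter roc and the highest sequence
// number seen so far, sL (RFC 3711 section 3.3.1).
func GuessROC(roc uint32, sL, seq uint16) uint32 {
	if sL < 32768 {
		if int(seq)-int(sL) > 32768 {
			return roc - 1 // late packet from before the last wrap (mod 2^32)
		}
		return roc
	}
	if int(sL)-32768 > int(seq) {
		return roc + 1 // sequence number has wrapped past 65535
	}
	return roc
}

// PacketIndex combines the counter and sequence number into the 48-bit
// SRTP packet index: 2^16 * ROC + SEQ.
func PacketIndex(roc uint32, seq uint16) uint64 {
	return uint64(roc)<<16 | uint64(seq)
}

func main() {
	fmt.Println(GuessROC(0, 65535, 2))  // 1: wrapped forward
	fmt.Println(GuessROC(1, 3, 65530))  // 0: late packet from the previous cycle
	fmt.Println(GuessROC(5, 100, 105))  // 5: same cycle
	fmt.Println(PacketIndex(1, 2))      // 65538
}
```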

type Option

type Option func(*config)

Option configures optional, non-behavioral aspects of the call/media types — currently the diagnostic logger. The zero configuration logs nothing.

func WithDiagnostics

func WithDiagnostics(rec *diag.Recorder) Option

WithDiagnostics attaches a developer-only *diag.Recorder that dumps exact, per-category call diagnostics (including raw secrets and media) to JSONL files. This is an opt-in maintainer carve-out from the library's sanitized logging and must never be enabled in production. Without it the recorder is nil and every diag emit is a no-op at zero cost.

func WithLogger

func WithLogger(l zerolog.Logger) Option

WithLogger sets the zerolog logger for debug/trace diagnostics. The library never configures logging itself; without this option the types are silent at zero cost. Pass the logger from a context, e.g. WithLogger(*zerolog.Ctx(ctx)).

type Player

type Player struct {
	// contains filtered or unexported fields
}

Player streams an AudioSource into a Call — the discord.js AudioPlayer analogue. Attach it with Call.Subscribe (or Call.Play as a shortcut); while Playing, the call pulls frames from the source on the codec's 60 ms cadence. When the source reaches io.EOF the player returns to PlayerIdle and fires OnFinish (queue the next source there for gapless playback).
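
Queueing the next source from OnFinish might look like the following sketch. It assumes only the documented Play and OnFinish methods, expressed as a small local interface so the example runs without the library; Queue, clip and fakePlayer are hypothetical names, and fakePlayer exists only to drive the demo.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// AudioSource mirrors the documented interface so this sketch is standalone.
type AudioSource interface {
	ReadFrame() ([]float32, error)
	Close() error
}

// player is the subset of the documented *Player methods this sketch needs.
type player interface {
	Play(src AudioSource)
	OnFinish(fn func())
}

// Queue plays sources back to back by starting the next one from OnFinish.
type Queue struct {
	mu      sync.Mutex
	p       player
	pending []AudioSource
}

// NewQueue registers the queue's advance step as the player's finish callback.
func NewQueue(p player, srcs ...AudioSource) *Queue {
	q := &Queue{p: p, pending: srcs}
	p.OnFinish(q.next)
	return q
}

// Start begins playback of the first pending source.
func (q *Queue) Start() { q.next() }

// next pops one source and plays it; it does nothing once the queue is empty.
func (q *Queue) next() {
	q.mu.Lock()
	if len(q.pending) == 0 {
		q.mu.Unlock()
		return
	}
	src := q.pending[0]
	q.pending = q.pending[1:]
	q.mu.Unlock()
	q.p.Play(src)
}

// clip is a placeholder source that is immediately exhausted.
type clip string

func (clip) ReadFrame() ([]float32, error) { return nil, io.EOF }
func (clip) Close() error                  { return nil }

// fakePlayer records what was played and lets the demo trigger a finish.
type fakePlayer struct {
	onFinish func()
	played   []AudioSource
}

func (f *fakePlayer) Play(src AudioSource) { f.played = append(f.played, src) }
func (f *fakePlayer) OnFinish(fn func())   { f.onFinish = fn }
func (f *fakePlayer) finish() {
	if f.onFinish != nil {
		f.onFinish()
	}
}

func main() {
	f := &fakePlayer{}
	q := NewQueue(f, clip("intro"), clip("menu"))
	q.Start()
	f.finish() // the first clip ended, so the queue starts the second
	f.finish() // nothing left to play
	fmt.Println(f.played)
}
```

Since OnFinish replaces any previous callback, a queue like this needs to own the player's finish hook.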

func NewPlayer

func NewPlayer() *Player

NewPlayer returns an idle Player.

func (*Player) OnFinish

func (p *Player) OnFinish(fn func())

OnFinish registers a callback fired when the active source is exhausted (the player transitions to PlayerIdle). Replaces any previous callback.

func (*Player) Pause

func (p *Player) Pause()

Pause suspends playback; the call sends silence until Resume.

func (*Player) Play

func (p *Player) Play(src AudioSource)

Play sets src as the active source and starts playback, replacing (and closing) any current source.

func (*Player) Resume

func (p *Player) Resume()

Resume continues a paused player.

func (*Player) State

func (p *Player) State() PlayerState

State returns the current PlayerState.

func (*Player) Stop

func (p *Player) Stop()

Stop halts playback and closes the active source.

type PlayerState

type PlayerState int

PlayerState is a Player's lifecycle state.

const (
	// PlayerIdle means no source is playing (none set, stopped, or finished).
	PlayerIdle PlayerState = iota
	// PlayerPlaying means the active source is streaming into the call.
	PlayerPlaying
	// PlayerPaused means playback is suspended (silence is sent in the meantime).
	PlayerPaused
)

type SinkFunc

type SinkFunc func(frame []float32)

SinkFunc adapts a plain function to an AudioSink (Close is a no-op).

func (SinkFunc) Close

func (f SinkFunc) Close() error

Close is a no-op for SinkFunc.

func (SinkFunc) WriteFrame

func (f SinkFunc) WriteFrame(frame []float32) error

WriteFrame calls f.

type VideoSink

type VideoSink interface {
	// WriteVideo consumes one Annex-B H.264 access unit from the peer.
	WriteVideo(accessUnit []byte) error
	// Close flushes and releases the sink. Safe to call more than once.
	Close() error
}

VideoSink consumes the encoded H.264 access units received from the peer. Attach one with Call.ReceiveVideo; the built-in AnnexBRecorder records to a raw .h264 file, or use VideoSinkFunc to forward to a callback. Without a sink the peer's video is discarded.
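
An Annex-B access unit is a sequence of NAL units, each preceded by a 00 00 01 or 00 00 00 01 start code. A sink callback that wants to inspect frames (for example, to spot SPS/PPS or IDR units) can split on those start codes. The sketch below is generic H.264 byte-stream handling, not meowcaller code; SplitAnnexB and NALType are hypothetical names.

```go
package main

import "fmt"

// SplitAnnexB returns the NAL units in an Annex-B buffer, without start codes.
// The returned slices alias au.
func SplitAnnexB(au []byte) [][]byte {
	var nals [][]byte
	start := -1 // index of the first byte of the current NAL unit
	emit := func(end int) {
		// Zero bytes before a start code belong to the byte stream, not the NAL.
		for end > start && au[end-1] == 0 {
			end--
		}
		if end > start {
			nals = append(nals, au[start:end])
		}
	}
	i := 0
	for i+3 <= len(au) {
		if au[i] == 0 && au[i+1] == 0 && au[i+2] == 1 {
			if start >= 0 {
				emit(i)
			}
			start = i + 3
			i += 3
			continue
		}
		i++
	}
	if start >= 0 {
		emit(len(au))
	}
	return nals
}

// NALType extracts nal_unit_type from the first header byte
// (7 = SPS, 8 = PPS, 5 = IDR slice, 1 = non-IDR slice).
func NALType(nal []byte) int {
	if len(nal) == 0 {
		return -1
	}
	return int(nal[0] & 0x1F)
}

func main() {
	au := []byte{0, 0, 0, 1, 0x67, 0xAA, 0, 0, 1, 0x68, 0xBB, 0, 0, 1, 0x65, 0xCC}
	for _, nal := range SplitAnnexB(au) {
		fmt.Printf("type=%d len=%d\n", NALType(nal), len(nal))
	}
}
```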

func AnnexBRecorder

func AnnexBRecorder(path string) (VideoSink, error)

AnnexBRecorder creates a VideoSink that records the peer's H.264 to a raw Annex-B .h264 file at path (playable directly by ffmpeg/VLC). Close finalizes it — the video analog of WAVRecorder.

type VideoSinkFunc

type VideoSinkFunc func(accessUnit []byte)

VideoSinkFunc adapts a plain function to a VideoSink (Close is a no-op).

func (VideoSinkFunc) Close

func (f VideoSinkFunc) Close() error

Close is a no-op for VideoSinkFunc.

func (VideoSinkFunc) WriteVideo

func (f VideoSinkFunc) WriteVideo(accessUnit []byte) error

WriteVideo calls f.

type VideoState

type VideoState struct {
	// Active reports the peer's camera is on (state == 1).
	Active bool
	// Upgrade reports a mid-call audio→video upgrade (state == 11).
	Upgrade bool
	// Orientation is the peer's device orientation (0..3); rotate the rendered video by
	// Orientation × 90° to display upright.
	Orientation int
	// Raw is the unmapped "state" attribute value.
	Raw int
}

VideoState is the peer's video state from a mid-call <video> stanza, delivered to Call.OnVideoState.
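
The field comments above imply a simple mapping from the raw stanza values. The sketch below restates it in code, under the assumption that the comments are the complete rule; FromRaw and RotationDegrees are hypothetical helpers, and the rotation direction is not specified in the documentation.

```go
package main

import "fmt"

// VideoState mirrors the documented struct so this sketch is standalone.
type VideoState struct {
	Active      bool
	Upgrade     bool
	Orientation int
	Raw         int
}

// FromRaw derives the flags from the raw "state" value, following the field
// comments: 1 means the camera is on, 11 means an audio-to-video upgrade.
func FromRaw(raw, orientation int) VideoState {
	return VideoState{
		Active:      raw == 1,
		Upgrade:     raw == 11,
		Orientation: orientation,
		Raw:         raw,
	}
}

// RotationDegrees returns Orientation x 90, normalised to 0, 90, 180 or 270.
func RotationDegrees(orientation int) int {
	return (orientation%4 + 4) % 4 * 90
}

func main() {
	st := FromRaw(1, 3)
	fmt.Println(st.Active, st.Upgrade, RotationDegrees(st.Orientation))
}
```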

Directories

Path Synopsis
audio
malgo module
Package diag is a developer-only diagnostic recorder for the meowcaller VoIP stack.
examples
mlow command
Command mlowtest encodes raw PCM to an MLow .bin and decodes an MLow .bin back to audio, so you can record from a mic and listen to the reconstruction for quality.
