# VideoAsCode (vac)
Convert Marp presentations with voiceovers to video files.
This tool takes a Marp markdown presentation with voiceover text (inline comments or JSON transcript), generates speech using text-to-speech (TTS), and creates a synchronized video recording of the presentation with optional subtitles.
Powered by OmniVoice, a unified interface for TTS/STT providers. Additional providers beyond those listed below are available via OmniVoice. Tested with:

- ElevenLabs - known for high-quality AI voices (TTS and STT)
- Deepgram - known for fast, accurate transcription (STT and TTS)

Both providers offer TTS and STT, so you can use either one for both functions; vac defaults to ElevenLabs for voice generation and Deepgram for subtitle transcription based on their respective strengths. See the OmniVoice repository for the full list of supported providers.
## Features

- Parse Marp presentations with voiceover in HTML comments
- JSON transcript support for multi-language voiceovers
- Text-to-speech via OmniVoice (ElevenLabs, Deepgram)
- Multi-language support with per-slide voice configuration
- Image-based rendering using Marp PNG export for reliable output
- Video generation with synchronized audio using ffmpeg
- Cross-platform support (macOS, Linux, Windows)
- Pause directives like `[PAUSE:1000]` for timing control
- Full orchestration - entire process automated in Go
- YouTube-ready combined video output with optional transitions
- Udemy-ready individual slide videos for course lectures
- Decoupled workflow - generate audio and video separately
- Subtitle generation - SRT/VTT from audio via OmniVoice (Deepgram STT)
- Browser demo recording - automated browser interactions with voiceover
- TTS audio caching - reuse generated audio across runs
- Hardware acceleration - fast encoding with VideoToolbox (macOS)
## Installation

### Prerequisites

- **Go 1.21+**

  ```sh
  go version
  ```

- **ffmpeg** (for video recording and processing)

  ```sh
  # macOS
  brew install ffmpeg
  # Linux
  sudo apt install ffmpeg
  # Windows: download from https://ffmpeg.org/download.html
  ```

- **Marp CLI** (for rendering presentations)

  ```sh
  npm install -g @marp-team/marp-cli
  ```

- **ElevenLabs API key** (for TTS): sign up at ElevenLabs and get your API key from the dashboard

- **Deepgram API key** (for subtitle generation): sign up at Deepgram and get your API key from the console
### Build from Source

```sh
git clone https://github.com/grokify/videoascode
cd videoascode
go build -o bin/vac ./cmd/vac
```
## Usage

vac provides two main command groups, plus utilities:

**Marp slides:**

- `vac slides video` - full pipeline: parse slides, generate TTS, record, combine
- `vac slides tts` - generate audio from a JSON transcript

**Browser recording:**

- `vac browser video` - record a browser demo with TTS voiceover
- `vac browser record` - record a browser demo (silent, no audio)

**Utilities:**

- `vac subtitle` - generate subtitles from audio using STT
### Quick Start (Full Pipeline)

```sh
# Set API keys
export ELEVENLABS_API_KEY="your-elevenlabs-key"  # for TTS
export DEEPGRAM_API_KEY="your-deepgram-key"      # for subtitles (optional)

# Using inline voiceover comments
vac slides video --input slides.md --output video.mp4
```
### Two-Step Workflow (Recommended for Multi-Language)

```sh
# Step 1: Generate audio from transcript
vac slides tts --transcript transcript.json --output audio/en-US/ --lang en-US

# Step 2: Generate video with pre-generated audio
vac slides video --input slides.md --manifest audio/en-US/manifest.json --output video/en-US.mp4
```
## Command: `vac slides tts`

Generate audio files from a JSON transcript.

```sh
vac slides tts [flags]
```

Flags:

```
-t, --transcript string  Transcript JSON file (required)
-o, --output string      Output directory for audio files (default "audio")
-l, --lang string        Language/locale code (e.g., en-US, es-ES)
    --provider string    TTS provider: elevenlabs or deepgram
```

Output:

- `audio/{lang}/slide_000.mp3`, `slide_001.mp3`, ... (one per slide)
- `audio/{lang}/manifest.json` (timing information for video recording)

Example:

```sh
# Generate audio for Spanish
vac slides tts --transcript transcript.json --output audio/es-ES/ --lang es-ES
```
## Command: `vac slides video`

Generate a video from a Marp presentation.

```sh
vac slides video [flags]
```

Flags:

```
-i, --input string             Input Marp markdown file (required)
-o, --output string            Output video file (default "output.mp4")
-m, --manifest string          Audio manifest file (from 'vac slides tts')
-k, --api-key string           ElevenLabs API key (or use ELEVENLABS_API_KEY env var)
-v, --voice string             ElevenLabs voice ID (default: Adam)
    --width int                Video width (default 1920)
    --height int               Video height (default 1080)
    --fps int                  Frame rate (default 30)
    --transition float         Transition duration in seconds
    --subtitles string         Subtitle file to embed (SRT or VTT)
    --subtitles-lang string    Subtitle language code (auto-detected from filename)
    --output-individual string Directory for individual slide videos
    --workdir string           Working directory for temp files
    --screen-device string     Screen capture device (macOS)
    --check                    Check dependencies and exit
```
## Command: `vac subtitle`

Generate subtitle files (SRT/VTT) from audio files using speech-to-text.

```sh
vac subtitle [flags]
```

Flags:

```
-a, --audio string     Audio directory containing manifest.json (required)
-o, --output string    Output directory for subtitle files (default "subtitles")
-l, --lang string      Language code (auto-detected from manifest if not specified)
    --provider string  STT provider: deepgram or elevenlabs (default: deepgram)
    --individual       Also generate individual subtitle files per slide
```

Output:

- `subtitles/{lang}.srt` - SRT format subtitle file
- `subtitles/{lang}.vtt` - WebVTT format subtitle file

Example:

```sh
# Generate French subtitles (language auto-detected from manifest)
vac subtitle --audio audio/fr-FR/

# Generate with explicit language and custom output
vac subtitle --audio audio/zh-Hans/ --lang zh-Hans --output subs/
```
## Command: `vac browser video`

Record browser-driven demos with AI-generated voiceover. This command automates browser interactions (navigation, clicks, scrolling) while generating synchronized narration.

```sh
vac browser video [flags]
```

Flags:

```
-c, --config string          Configuration file (YAML/JSON) with browser segments (required)
-o, --output string          Output video file (default "output.mp4")
-a, --audio-dir string       Save/reuse audio tracks in this directory (per-language subdirs)
-p, --provider string        TTS provider: elevenlabs or deepgram (default: auto-detect)
-v, --voice string           TTS voice ID (default: from config or provider default)
-l, --lang string            Languages to generate, comma-separated (default "en-US")
    --elevenlabs-api-key     ElevenLabs API key (or use ELEVENLABS_API_KEY env var)
    --deepgram-api-key       Deepgram API key (or use DEEPGRAM_API_KEY env var)
    --width int              Video width (default 1920)
    --height int             Video height (default 1080)
    --fps int                Video frame rate (default 30)
    --headless               Run browser in headless mode
    --transition float       Transition duration between segments (seconds)
    --subtitles              Generate subtitles from voiceover timing (no STT)
    --subtitles-stt          Generate word-level subtitles using STT (requires API)
    --subtitles-burn         Burn subtitles into video (permanent, requires FFmpeg with libass)
    --no-audio               Generate video without audio (TTS still used for timing/subtitles)
    --fast                   Use hardware-accelerated encoding (VideoToolbox on macOS)
    --limit int              Limit to first N segments (for testing)
    --limit-steps int        Limit browser segments to first N steps (for testing)
    --workdir string         Working directory for temp files
```
## Command: `vac browser record`

Record a browser session without audio (silent recording).

```sh
vac browser record [flags]
```

Flags:

```
-c, --config string   Configuration file (YAML/JSON) with segments
-s, --steps string    Steps file (JSON/YAML) defining browser actions
-u, --url string      Starting URL for the browser
-o, --output string   Output video file (default "recording.mp4")
    --width int       Browser viewport width (default 1920)
    --height int      Browser viewport height (default 1080)
    --fps int         Video frame rate (default 30)
    --headless        Run browser in headless mode
-t, --timing string   Output timing JSON file for transcript sync
    --workdir string  Working directory for temp files
```
Key features:

- **Multi-language support**: generate videos in multiple languages with `--lang en-US,fr-FR,zh-Hans`
- **Audio caching**: use `--audio-dir` to cache TTS audio and skip regeneration on subsequent runs
- **Pace to longest language**: video timing automatically matches the longest audio across all languages
- **Per-voiceover timing**: each browser step is paced to its corresponding voiceover duration
- **Subtitle generation**: create SRT/VTT subtitles from voiceover timing or word-level STT
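The pace-to-longest rule is simple to state in code. A hedged Go sketch (the `segmentDurationMs` helper is illustrative, not vac's actual API): for each segment, every language's video holds the segment for the maximum voiceover length across all requested languages, so the versions stay in sync.

```go
// Sketch of the "pace to longest language" rule described above.
package main

import "fmt"

// segmentDurationMs returns the paced on-screen duration for one segment,
// given the voiceover audio length (in milliseconds) per language.
func segmentDurationMs(perLang map[string]int) int {
	longest := 0
	for _, ms := range perLang {
		if ms > longest {
			longest = ms
		}
	}
	return longest
}

func main() {
	// French narration runs longest, so all language versions hold this
	// segment for 4800 ms.
	d := segmentDurationMs(map[string]int{"en-US": 4100, "fr-FR": 4800, "zh-Hans": 3900})
	fmt.Println(d) // 4800
}
```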
Example config (`demo.yaml`):

```yaml
metadata:
  title: "Product Demo"
  defaultLanguage: "en-US"
  defaultVoice:
    provider: "elevenlabs"
    voiceId: "pNInz6obpgDQGcFmaJgB"
segments:
  - id: "segment_000"
    type: "browser"
    browser:
      url: "https://example.com"
      steps:
        - action: "wait"
          duration: 1000
          voiceover:
            en-US: "Welcome to our product demo."
            fr-FR: "Bienvenue dans notre démonstration."
        - action: "click"
          selector: "#login-button"
          voiceover:
            en-US: "Click the login button to get started."
            fr-FR: "Cliquez sur le bouton de connexion."
        - action: "scroll"
          scrollY: 500
          voiceover:
            en-US: "Scroll down to see more features."
            fr-FR: "Faites défiler pour voir plus de fonctionnalités."
```
Example usage:

```sh
# Basic browser demo recording
vac browser video --config demo.yaml --output demo.mp4

# Multi-language with audio caching
vac browser video --config demo.yaml --output demo.mp4 \
  --audio-dir ./audio --lang en-US,fr-FR,zh-Hans

# With subtitles burned into video (requires FFmpeg with libass)
vac browser video --config demo.yaml --output demo.mp4 \
  --subtitles --subtitles-burn

# Video with burned subtitles but no audio (for silent demos)
vac browser video --config demo.yaml --output demo.mp4 \
  --subtitles --subtitles-burn --no-audio

# Using Deepgram instead of ElevenLabs
vac browser video --config demo.yaml --output demo.mp4 \
  --provider deepgram

# Silent browser recording (no audio)
vac browser record --url https://example.com --steps demo.json --output demo.mp4

# Fast encoding with hardware acceleration (macOS VideoToolbox)
vac browser video --config demo.yaml --output demo.mp4 --fast

# Test first 2 segments only (faster iteration)
vac browser video --config demo.yaml --output demo.mp4 --limit 2

# Test first 3 steps of browser segment (faster iteration)
vac browser video --config demo.yaml --output demo.mp4 --limit-steps 3
```
Output structure when using `--audio-dir` and multiple languages:

```
project/
├── demo.yaml
├── demo.mp4          # Primary language video
├── demo_fr-FR.mp4    # French version
├── demo_zh-Hans.mp4  # Chinese version
├── demo.srt          # Subtitles (if --subtitles)
└── audio/
    ├── en-US/
    │   ├── segment_000.mp3
    │   ├── segment_000.json  # Cached timing metadata
    │   └── combined.mp3
    ├── fr-FR/
    │   └── ...
    └── zh-Hans/
        └── ...
```
## Examples

Full pipeline with inline voiceovers:

```sh
vac slides video \
  --input presentation.md \
  --output youtube_video.mp4 \
  --transition 0.5
```
Multi-language workflow:

```sh
# Step 1: Generate audio for each language (directory matches locale code)
vac slides tts --transcript transcript.json --output audio/en-US/ --lang en-US
vac slides tts --transcript transcript.json --output audio/es-ES/ --lang es-ES
vac slides tts --transcript transcript.json --output audio/zh-Hans/ --lang zh-Hans

# Step 2: Generate subtitles for each language (uses Deepgram STT)
vac subtitle --audio audio/en-US/
vac subtitle --audio audio/es-ES/
vac subtitle --audio audio/zh-Hans/

# Step 3: Generate videos with embedded subtitles
vac slides video --input slides.md --manifest audio/en-US/manifest.json \
  --output video/en-US.mp4 --subtitles subtitles/en-US.srt
vac slides video --input slides.md --manifest audio/es-ES/manifest.json \
  --output video/es-ES.mp4 --subtitles subtitles/es-ES.srt
vac slides video --input slides.md --manifest audio/zh-Hans/manifest.json \
  --output video/zh-Hans.mp4 --subtitles subtitles/zh-Hans.srt
```
Directory structure (locale codes enable automation):

```
project/
├── presentation.md
├── transcript.json
├── audio/
│   ├── en-US/
│   │   ├── manifest.json
│   │   └── slide_*.mp3
│   ├── es-ES/
│   │   ├── manifest.json
│   │   └── slide_*.mp3
│   └── zh-Hans/
│       ├── manifest.json
│       └── slide_*.mp3
├── subtitles/
│   ├── en-US.srt
│   ├── en-US.vtt
│   ├── es-ES.srt
│   ├── es-ES.vtt
│   ├── zh-Hans.srt
│   └── zh-Hans.vtt
└── video/
    ├── en-US.mp4
    ├── es-ES.mp4
    └── zh-Hans.mp4
```
Generate individual videos for Udemy:

```sh
vac slides video \
  --input presentation.md \
  --output combined.mp4 \
  --output-individual ./udemy_videos/
```
## Check Dependencies

```sh
vac slides video --check
```

This verifies that all required tools (ffmpeg, marp) are installed.
## Voiceover Formats

vac supports two voiceover formats:

- **Inline HTML comments** - simple, single-language
- **JSON transcript** - multi-language, advanced TTS control

### Option 1: Inline Voiceover Comments

Add voiceover text in HTML comments before or after slide content:
```markdown
---
marp: true
---

<!--
This is the voiceover for the first slide.
It will be converted to speech using ElevenLabs.
[PAUSE:1000]
You can add pause directives for timing control.
-->

# First Slide

This is the visible content

---

<!--
Voiceover for slide 2...
-->

# Second Slide

More content
```
### Pause Directives

Use `[PAUSE:milliseconds]` to add pauses in the voiceover:

```markdown
<!--
First sentence.
[PAUSE:1000]
Second sentence after a 1-second pause.
[PAUSE:2000]
Third sentence after a 2-second pause.
-->
```

The pause directives are automatically removed from the spoken text.
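One way to implement this stripping is a regex pass that splits the voiceover into spoken segments with trailing pauses. This is an illustrative Go sketch under the directive format described above, not vac's actual parser:

```go
// Splits voiceover text on [PAUSE:milliseconds] directives: the directive
// text is removed from what gets spoken, and the pause length is recorded
// so the recorder can hold the slide after each segment.
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var pauseRe = regexp.MustCompile(`\[PAUSE:(\d+)\]`)

// Segment is one run of spoken text followed by a pause in milliseconds.
type Segment struct {
	Text    string
	PauseMs int
}

// splitPauses removes the directives and records where they occurred.
func splitPauses(voiceover string) []Segment {
	var segs []Segment
	last := 0
	for _, m := range pauseRe.FindAllStringSubmatchIndex(voiceover, -1) {
		ms, _ := strconv.Atoi(voiceover[m[2]:m[3]]) // captured digits
		segs = append(segs, Segment{Text: strings.TrimSpace(voiceover[last:m[0]]), PauseMs: ms})
		last = m[1]
	}
	if tail := strings.TrimSpace(voiceover[last:]); tail != "" {
		segs = append(segs, Segment{Text: tail})
	}
	return segs
}

func main() {
	for _, s := range splitPauses("First sentence. [PAUSE:1000] Second sentence.") {
		fmt.Printf("%q pause=%dms\n", s.Text, s.PauseMs)
	}
}
```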
### Option 2: JSON Transcript

For multi-language support and advanced TTS configuration, use a JSON transcript file:

```json
{
  "version": "1.0",
  "metadata": {
    "title": "My Presentation",
    "defaultLanguage": "en-US",
    "defaultVoice": {
      "provider": "elevenlabs",
      "voiceId": "pNInz6obpgDQGcFmaJgB",
      "voiceName": "Adam",
      "model": "eleven_multilingual_v2",
      "stability": 0.5,
      "similarityBoost": 0.75
    },
    "defaultVenue": "youtube"
  },
  "slides": [
    {
      "index": 0,
      "title": "Title Slide",
      "transcripts": {
        "en-US": {
          "segments": [
            { "text": "Welcome to the presentation.", "pause": 500 },
            { "text": "Let's get started." }
          ]
        },
        "es-ES": {
          "voice": {
            "voiceId": "onwK4e9ZLuTAKqWW03F9",
            "voiceName": "Daniel"
          },
          "segments": [
            { "text": "Bienvenido a la presentación.", "pause": 500 },
            { "text": "Comencemos." }
          ]
        }
      }
    }
  ]
}
```
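The per-language voice override in the example above (Spanish uses "Daniel" while the default is "Adam") suggests a simple resolution rule. A hedged Go sketch of types mirroring the JSON keys shown here; vac's actual types may differ:

```go
// Minimal Go types for the transcript schema above, plus the fallback rule:
// a language's voice override wins, otherwise metadata.defaultVoice applies.
package main

import "fmt"

type Voice struct {
	Provider        string  `json:"provider,omitempty"`
	VoiceID         string  `json:"voiceId,omitempty"`
	VoiceName       string  `json:"voiceName,omitempty"`
	Model           string  `json:"model,omitempty"`
	Stability       float64 `json:"stability,omitempty"`
	SimilarityBoost float64 `json:"similarityBoost,omitempty"`
}

type TranscriptSegment struct {
	Text  string `json:"text"`
	Pause int    `json:"pause,omitempty"` // milliseconds after this segment
}

type LangTranscript struct {
	Voice    *Voice              `json:"voice,omitempty"` // per-language override
	Segments []TranscriptSegment `json:"segments"`
}

// resolveVoice returns the per-language override if present,
// otherwise the document-level default voice.
func resolveVoice(t LangTranscript, def Voice) Voice {
	if t.Voice != nil {
		return *t.Voice
	}
	return def
}

func main() {
	def := Voice{Provider: "elevenlabs", VoiceName: "Adam"}
	es := LangTranscript{Voice: &Voice{VoiceName: "Daniel"}}
	fmt.Println(resolveVoice(es, def).VoiceName) // Daniel
}
```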
### Transcript Features
| Feature | Description |
|---|---|
| Multi-language | Per-slide transcripts for each locale (en-US, es-ES, etc.) |
| Voice override | Different voice per language or segment |
| Pause control | Pause after each segment (milliseconds) |
| Venue presets | Optimized settings for YouTube, Udemy, Coursera |
| TTS parameters | Stability, similarity boost, style exaggeration |
## Audio Manifest

When you run `vac slides tts`, a manifest is generated with timing info:

```json
{
  "version": "1.0",
  "language": "en-US",
  "generatedAt": "2024-01-01T12:00:00Z",
  "slides": [
    {
      "index": 0,
      "audioFile": "slide_000.mp3",
      "audioDurationMs": 5200,
      "pauseDurationMs": 500,
      "totalDurationMs": 5700
    }
  ]
}
```

This manifest is used by `vac slides video --manifest` for precise slide timing.
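For precise timing, the recorder needs each slide held for `audioDurationMs + pauseDurationMs`, and the sum over all slides is the video's runtime. A hedged Go sketch of parsing the manifest format above (illustrative types, not vac's actual code):

```go
// Parses the manifest format shown above and sums per-slide durations
// to get the presentation's overall runtime.
package main

import (
	"encoding/json"
	"fmt"
)

type ManifestSlide struct {
	Index           int    `json:"index"`
	AudioFile       string `json:"audioFile"`
	AudioDurationMs int    `json:"audioDurationMs"`
	PauseDurationMs int    `json:"pauseDurationMs"`
	TotalDurationMs int    `json:"totalDurationMs"`
}

type Manifest struct {
	Version  string          `json:"version"`
	Language string          `json:"language"`
	Slides   []ManifestSlide `json:"slides"`
}

// totalMs sums audio + pause per slide: each slide stays on screen for
// that long, so the sum is the combined video's length.
func totalMs(m Manifest) int {
	sum := 0
	for _, s := range m.Slides {
		sum += s.AudioDurationMs + s.PauseDurationMs
	}
	return sum
}

func main() {
	data := []byte(`{"version":"1.0","language":"en-US","slides":[
	  {"index":0,"audioFile":"slide_000.mp3","audioDurationMs":5200,
	   "pauseDurationMs":500,"totalDurationMs":5700}]}`)
	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		panic(err)
	}
	fmt.Println(totalMs(m)) // 5700
}
```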
## How It Works

### Pipeline Overview

vac supports three workflows:

**Workflow A: Marp slides, full pipeline (inline voiceovers)**

```
presentation.md → Parse → TTS → Render → Record → Combine → video.mp4
```

**Workflow B: Marp slides, two-step (JSON transcript)**

```
Step 1: transcript.json → vac slides tts → audio/{lang}/*.mp3 + manifest.json
Step 2: presentation.md + manifest.json → vac slides video → video/{lang}.mp4
```

**Workflow C: Browser demo with voiceover**

```
config.yaml → vac browser video → TTS + Record + Combine → demo.mp4
```
### Detailed Pipeline

```
┌─────────────────────────────────────────────────────────────────────
│ INPUT OPTIONS
│   A: presentation.md (inline voiceovers)
│      OR
│   B: transcript.json (multi-language) + presentation.md
└─────────────────────────────────────────────────────────────────────
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────
│ STEP 1: Parse / Load Transcript
│  • A: Extract voiceover from HTML comments + parse [PAUSE:ms]
│  • B: Load transcript.json, select language, resolve voice config
└─────────────────────────────────────────────────────────────────────
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────
│ STEP 2: Generate Audio (OmniVoice TTS)
│  • Send voiceover text to TTS provider (ElevenLabs)
│  • Apply voice settings (stability, similarity, style)
│  • Output: audio/{lang}/slide_000.mp3, slide_001.mp3, ...
│  • Output: audio/{lang}/manifest.json (timing for video recording)
└─────────────────────────────────────────────────────────────────────
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────
│ STEP 3: Render HTML (Marp CLI)
│  • Execute: marp presentation.md -o presentation.html --html
│  • Creates navigable HTML presentation with all slides
└─────────────────────────────────────────────────────────────────────
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────
│ STEP 4: Record Slides (Browser + ffmpeg)
│  • Launch headless browser via Rod (Chromium)
│  • Load HTML presentation
│  • For each slide:
│     ├─ Navigate to slide
│     ├─ Record for: audioDurationMs + pauseDurationMs (from manifest)
│     └─ Save: video/slide_000.mp4, slide_001.mp4, ...
└─────────────────────────────────────────────────────────────────────
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────
│ STEP 5: Combine Videos (ffmpeg)
│  • Concatenate all slide videos in sequence
│  • Optional: Apply crossfade transitions (--transition flag)
│  • Output: video.mp4
└─────────────────────────────────────────────────────────────────────
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────
│ STEP 6: Export Individual Videos (Optional)
│  • Copy individual slide videos to output directory
│  • For Udemy courses: --output-individual ./lectures/
└─────────────────────────────────────────────────────────────────────
```
### Step Details

| Step | Component | Tool | Input | Output |
|---|---|---|---|---|
| 1 | Parser | Go | slides.md | Slides + voiceovers |
| 2 | TTS | OmniVoice (ElevenLabs) | Voiceover text | slide_*.mp3 |
| 3 | Renderer | Marp CLI | slides.md | presentation.html |
| 4 | Recorder | Rod + ffmpeg | HTML + MP3 | slide_*.mp4 |
| 5 | Combiner | ffmpeg | slide_*.mp4 | output.mp4 |
| 6 | Exporter | Go | slide_*.mp4 | Individual files |
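Step 5's concatenation is typically driven by ffmpeg's concat demuxer, which takes a list file of inputs. A hedged Go sketch of this step (the helper names are illustrative, not vac's actual code); note that `-c copy` concatenates without re-encoding, whereas crossfade transitions would instead require the `xfade` filter and a re-encode:

```go
// Builds the list file consumed by `ffmpeg -f concat` and assembles the
// command line that would concatenate the per-slide videos.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// concatList renders the list-file format expected by ffmpeg's concat demuxer:
// one "file '<path>'" line per input, in playback order.
func concatList(files []string) string {
	var b strings.Builder
	for _, f := range files {
		fmt.Fprintf(&b, "file '%s'\n", f)
	}
	return b.String()
}

func main() {
	list := concatList([]string{"slide_000.mp4", "slide_001.mp4"})
	if err := os.WriteFile("concat.txt", []byte(list), 0o644); err != nil {
		panic(err)
	}
	cmd := exec.Command("ffmpeg", "-f", "concat", "-safe", "0",
		"-i", "concat.txt", "-c", "copy", "output.mp4")
	fmt.Println(strings.Join(cmd.Args, " ")) // shown instead of cmd.Run()
}
```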
## Architecture

```
vac/
├── cmd/vac/            # CLI (Cobra-based)
│   ├── main.go         # Entry point
│   ├── root.go         # Root command
│   ├── tts.go          # TTS subcommand
│   └── video.go        # Video subcommand
├── pkg/
│   ├── parser/         # Marp markdown parser
│   ├── transcript/     # JSON transcript types
│   ├── tts/            # TTS generation + manifest
│   ├── omnivoice/      # OmniVoice TTS/STT provider wrappers
│   ├── renderer/       # Marp HTML renderer & browser control
│   ├── audio/          # Audio utilities
│   ├── video/          # Video recording & combination
│   └── orchestrator/   # Main workflow coordinator
├── examples/           # Example presentations
│   └── intro/          # Self-documenting example
│       ├── presentation.md
│       ├── transcript.json
│       └── README.md
└── docs/               # MkDocs documentation
```
## Platform-Specific Recording

### macOS (including Apple Silicon M1/M2/M3)

Fully compatible with Apple Silicon Macs. Uses avfoundation for screen capture:

```sh
ffmpeg -f avfoundation -i "<device>:none" ...
```

#### Required Permissions

Screen Recording permission is required. Before running vac, grant permission to your terminal app:

1. Open System Settings (or System Preferences on older macOS)
2. Navigate to Privacy & Security > Screen Recording
3. Enable your terminal application (Terminal, iTerm2, VS Code, etc.)
4. Restart the terminal after granting permission

Without this permission, ffmpeg will fail with "Could not find video device" or similar errors.

#### Screen Device Auto-Detection

The tool automatically detects the correct screen capture device. On Macs with external displays or connected iPhones, the device number varies. To list available devices:

```sh
ffmpeg -f avfoundation -list_devices true -i ""
```

You can manually specify the device if needed:

```sh
vac slides video --input slides.md --output video.mp4 --screen-device "4:none"
```
### Linux

Uses x11grab for screen capture:

```sh
ffmpeg -f x11grab -i ":0.0" ...
```

### Windows

Uses gdigrab for screen capture:

```sh
ffmpeg -f gdigrab -i "desktop" ...
```
## Output Format & Platform Compatibility

Videos are encoded with settings optimized for direct upload to YouTube and Udemy, so no re-encoding is required.

### Video Specifications
| Setting | Value | Notes |
|---|---|---|
| Container | MP4 | Universal compatibility |
| Video Codec | H.264 (libx264) | Required by YouTube & Udemy |
| Resolution | 1920x1080 | Full HD (configurable) |
| Frame Rate | 30fps | Standard (configurable) |
| Quality | CRF 23 | Good quality/size balance |
| Pixel Format | yuv420p | Maximum compatibility |
| Audio Codec | AAC | Required by both platforms |
| Audio Bitrate | 192kbps | Clear speech audio |
### YouTube Upload

The combined video (`--output`) is ready for direct upload:

- Includes optional crossfade transitions (`--transition 0.5`)
- Single file containing all slides with narration
- No processing or re-encoding needed

### Udemy Upload

Individual slide videos (`--output-individual`) are designed for Udemy courses:

- Each slide saved as a separate file (slide_000.mp4, slide_001.mp4, etc.)
- Upload as individual lectures in your course curriculum
- Sequential naming for easy organization

Tip: Udemy recommends lectures be at least 2 minutes long. For short slides, consider:

- Adding longer pause directives (`[PAUSE:5000]`)
- Combining related slides into single lectures
- Using more detailed voiceover scripts
## Examples

The `examples/` directory contains self-contained examples:

```
examples/
├── intro/                 # Introduction to vac
│   ├── presentation.md    # Marp markdown source (13 slides)
│   ├── transcript.json    # Multi-language transcript (en-US, en-GB, es-ES)
│   ├── README.md          # Detailed usage instructions
│   └── audio/             # Generated audio (after running tts)
│       ├── en-US/
│       │   ├── manifest.json
│       │   └── slide_*.mp3
│       └── es-ES/
│           ├── manifest.json
│           └── slide_*.mp3
└── README.md
```
### Running the Intro Example

Option A: full pipeline (inline voiceovers)

```sh
vac slides video \
  --input examples/intro/presentation.md \
  --output examples/intro/output.mp4
```

Option B: two-step with transcript (multi-language)

```sh
# Generate audio for English
vac slides tts \
  --transcript examples/intro/transcript.json \
  --output examples/intro/audio/en-US/ \
  --lang en-US

# Generate video
vac slides video \
  --input examples/intro/presentation.md \
  --manifest examples/intro/audio/en-US/manifest.json \
  --output examples/intro/video/en-US.mp4

# Generate Spanish version
vac slides tts \
  --transcript examples/intro/transcript.json \
  --output examples/intro/audio/es-ES/ \
  --lang es-ES

vac slides video \
  --input examples/intro/presentation.md \
  --manifest examples/intro/audio/es-ES/manifest.json \
  --output examples/intro/video/es-ES.mp4
```

The intro example is a self-documenting presentation that explains what vac does, using vac itself.
### Additional Example

See `example_presentation.md` for a complete example with:

- Custom Marp theme
- Voiceover comments on each slide
- Pause directives for timing
## Troubleshooting

### "ffmpeg not found"

Install ffmpeg using your package manager (see Prerequisites).

### "marp CLI not found"

Install Marp CLI: `npm install -g @marp-team/marp-cli`

### "ElevenLabs API error"

- Verify your API key is correct
- Check that your ElevenLabs account has sufficient credits
- Ensure you have access to the voice ID you specified
### Subtitle burning fails (--subtitles-burn)

The `--subtitles-burn` flag requires FFmpeg compiled with libass support. If you see an error like "FFmpeg subtitles filter not available", your FFmpeg installation needs to be updated.

Check whether your FFmpeg has subtitle support:

```sh
ffmpeg -filters 2>&1 | grep subtitles
```

If nothing is returned, install FFmpeg with libass:

```sh
# macOS: use the homebrew-ffmpeg tap (includes libass by default)
brew uninstall ffmpeg
brew tap homebrew-ffmpeg/ffmpeg
brew install homebrew-ffmpeg/ffmpeg/ffmpeg

# Linux (Ubuntu/Debian)
sudo apt install ffmpeg libass-dev

# Verify installation
ffmpeg -filters 2>&1 | grep subtitles
# Should show: subtitles V->V Render text subtitles...
```

Alternative: use `--subtitles` without `--subtitles-burn` to generate a separate .srt file that video players can load.
### Recording issues on macOS

If you see "recording failed: exit status 1" or other ffmpeg errors:

1. **Grant Screen Recording permission** (the most common issue):
   - Go to System Settings > Privacy & Security > Screen Recording
   - Enable your terminal app (Terminal, iTerm2, VS Code, etc.)
   - Restart your terminal after granting permission

2. **Verify ffmpeg can access the screen:**

   ```sh
   ffmpeg -f avfoundation -list_devices true -i ""
   ```

   You should see "Capture screen 0" or similar in the output.

3. **Use verbose mode to see ffmpeg errors:**

   ```sh
   vac slides video --input slides.md --output video.mp4 --verbose
   ```

4. **Other tips:**
   - Ensure the browser window is visible during recording
   - Try reducing video resolution if performance is poor
   - Manually specify the screen device with `--screen-device "1:none"`
## Development

### Running Tests

```sh
go test ./...
```

### Building

```sh
go build -o bin/vac ./cmd/vac
```

### Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.
## License

MIT License - see the LICENSE file for details.
## Acknowledgments
- Marp - Markdown presentation ecosystem
- OmniVoice - Unified TTS/STT provider interface
- ElevenLabs - AI voice generation (TTS)
- Deepgram - Speech-to-text (STT) for subtitles
- Rod - Browser automation framework
- ffmpeg - Multimedia processing
## Related Projects
- Marptalk - Node.js-based Marp-to-video tool using Google Cloud TTS. Features browser-based TTS fallback for quick iteration without API costs, YouTube chapter markers generation, and LLM-assisted presentation drafting.
## Roadmap
- Custom voice settings (stability, similarity, style)
- Video transitions between slides
- Individual slide video export (for Udemy)
- JSON transcript for multi-language support
- Decoupled TTS workflow (separate audio generation)
- Audio manifest with timing information
- Progress bar during conversion
- Add subtitle/caption generation
- Browser demo recording with voiceover
- Multi-language video generation
- TTS audio caching for faster iterations
- Support for background music
- Batch processing of multiple presentations
- Web UI for easier configuration
- Export to different video formats
- Avatar integration (HeyGen, Synthesia)
## Directories

| Path | Synopsis |
|---|---|
| cmd/vac | The vac command. |
| pkg/media | Package media provides utilities for working with audio and video files. |
| pkg/omnivoice/stt | Package stt provides OmniVoice-based speech-to-text for marp2video. |
| pkg/omnivoice/tts | Package tts provides OmniVoice-based text-to-speech for marp2video. |
| pkg/segment | Package segment provides abstractions for content units that can be rendered to video. |
| pkg/source | Package source provides interfaces and implementations for loading content from various sources (Marp markdown, transcript JSON, config YAML). |