marp2video
Convert Marp presentations with voiceovers to video files.
This tool takes a Marp markdown presentation with voiceover text (inline comments or JSON transcript), generates speech using text-to-speech (TTS), and creates a synchronized video recording of the presentation with optional subtitles.
Powered by OmniVoice - a unified interface for TTS/STT providers. Tested with:
- ElevenLabs - Known for high-quality AI voices (TTS and STT available)
- Deepgram - Known for fast, accurate transcription (STT and TTS available)
Both providers offer TTS and STT capabilities. You can use either one for both functions, though marp2video defaults to ElevenLabs for voice generation and Deepgram for subtitle transcription based on their respective strengths.
Features
- Parse Marp presentations with voiceover in HTML comments
- JSON transcript support for multi-language voiceovers
- Text-to-speech via OmniVoice (ElevenLabs, Deepgram)
- Multi-language support with per-slide voice configuration
- Image-based rendering using Marp PNG export for reliable output
- Video generation with synchronized audio using ffmpeg
- Cross-platform support (macOS, Linux, Windows)
- Pause directives like `[PAUSE:1000]` for timing control
- Full orchestration - entire process automated in Go
- YouTube-ready combined video output with optional transitions
- Udemy-ready individual slide videos for course lectures
- Decoupled workflow - generate audio and video separately
- Subtitle generation - SRT/VTT from audio via OmniVoice (Deepgram STT)
- Browser demo recording - automated browser interactions with voiceover
- TTS audio caching - reuse generated audio across runs
- Hardware acceleration - fast encoding with VideoToolbox (macOS)
Installation
Prerequisites
- Go 1.21+ (verify with `go version`)
- ffmpeg (for video recording and processing)
  ```sh
  # macOS
  brew install ffmpeg
  # Linux
  sudo apt install ffmpeg
  # Windows: download from https://ffmpeg.org/download.html
  ```
- Marp CLI (for rendering presentations)
  ```sh
  npm install -g @marp-team/marp-cli
  ```
- ElevenLabs API key (for TTS)
  - Sign up at ElevenLabs
  - Get your API key from the dashboard
- Deepgram API key (for subtitle generation)
  - Sign up at Deepgram
  - Get your API key from the console
Build from Source
git clone https://github.com/grokify/marp2video
cd marp2video
go build -o bin/marp2video ./cmd/marp2video
Usage
marp2video provides two main command groups, plus utilities:
Marp Slides:
- `marp2video slides video` - Full pipeline: parse slides, generate TTS, record, combine
- `marp2video slides tts` - Generate audio from JSON transcript
Browser Recording:
- `marp2video browser video` - Record browser demo with TTS voiceover
- `marp2video browser record` - Record browser demo (silent, no audio)
Utilities:
- `marp2video subtitle` - Generate subtitles from audio using STT
Quick Start (Full Pipeline)
# Set API keys
export ELEVENLABS_API_KEY="your-elevenlabs-key" # For TTS
export DEEPGRAM_API_KEY="your-deepgram-key" # For subtitles (optional)
# Using inline voiceover comments
marp2video slides video --input slides.md --output video.mp4
Two-Step Workflow (Recommended for Multi-Language)
# Step 1: Generate audio from transcript
marp2video slides tts --transcript transcript.json --output audio/en-US/ --lang en-US
# Step 2: Generate video with pre-generated audio
marp2video slides video --input slides.md --manifest audio/en-US/manifest.json --output video/en-US.mp4
Command: marp2video slides tts
Generate audio files from a JSON transcript.
marp2video slides tts [flags]
Flags:
-t, --transcript string Transcript JSON file (required)
-o, --output string Output directory for audio files (default "audio")
-l, --lang string Language/locale code (e.g., en-US, es-ES)
--provider string TTS provider: elevenlabs or deepgram
Output:
- `audio/{lang}/slide_000.mp3`, `slide_001.mp3`, ... (one per slide)
- `audio/{lang}/manifest.json` (timing information for video recording)
Example:
# Generate audio for Spanish
marp2video slides tts --transcript transcript.json --output audio/es-ES/ --lang es-ES
Command: marp2video slides video
Generate video from Marp presentation.
marp2video slides video [flags]
Flags:
-i, --input string Input Marp markdown file (required)
-o, --output string Output video file (default "output.mp4")
-m, --manifest string Audio manifest file (from 'marp2video slides tts')
-k, --api-key string ElevenLabs API key (or use ELEVENLABS_API_KEY env var)
-v, --voice string ElevenLabs voice ID (default: Adam)
--width int Video width (default 1920)
--height int Video height (default 1080)
--fps int Frame rate (default 30)
--transition float Transition duration in seconds
--subtitles string Subtitle file to embed (SRT or VTT)
--subtitles-lang string Subtitle language code (auto-detected from filename)
--output-individual string Directory for individual slide videos
--workdir string Working directory for temp files
--screen-device string Screen capture device (macOS)
--check Check dependencies and exit
Command: marp2video subtitle
Generate subtitle files (SRT/VTT) from audio files using speech-to-text.
marp2video subtitle [flags]
Flags:
-a, --audio string Audio directory containing manifest.json (required)
-o, --output string Output directory for subtitle files (default "subtitles")
-l, --lang string Language code (auto-detected from manifest if not specified)
--provider string STT provider: deepgram or elevenlabs (default: deepgram)
--individual Also generate individual subtitle files per slide
Output:
- `subtitles/{lang}.srt` - SRT format subtitle file
- `subtitles/{lang}.vtt` - WebVTT format subtitle file
Example:
# Generate French subtitles (language auto-detected from manifest)
marp2video subtitle --audio audio/fr-FR/
# Generate with explicit language and custom output
marp2video subtitle --audio audio/zh-Hans/ --lang zh-Hans --output subs/
Command: marp2video browser video
Record browser-driven demos with AI-generated voiceover. This command automates browser interactions (navigation, clicks, scrolling) while generating synchronized narration.
marp2video browser video [flags]
Flags:
-c, --config string Configuration file (YAML/JSON) with browser segments (required)
-o, --output string Output video file (default "output.mp4")
-a, --audio-dir string Save/reuse audio tracks in this directory (per-language subdirs)
-p, --provider string TTS provider: elevenlabs or deepgram (default: auto-detect)
-v, --voice string TTS voice ID (default: from config or provider default)
-l, --lang string Languages to generate, comma-separated (default "en-US")
      --elevenlabs-api-key string   ElevenLabs API key (or use ELEVENLABS_API_KEY env var)
      --deepgram-api-key string     Deepgram API key (or use DEEPGRAM_API_KEY env var)
--width int Video width (default 1920)
--height int Video height (default 1080)
--fps int Video frame rate (default 30)
--headless Run browser in headless mode
--transition float Transition duration between segments (seconds)
--subtitles Generate subtitles from voiceover timing (no STT)
--subtitles-stt Generate word-level subtitles using STT (requires API)
--subtitles-burn Burn subtitles into video (permanent, requires FFmpeg with libass)
--no-audio Generate video without audio (TTS still used for timing/subtitles)
--fast Use hardware-accelerated encoding (VideoToolbox on macOS)
--limit int Limit to first N segments (for testing)
--limit-steps int Limit browser segments to first N steps (for testing)
--workdir string Working directory for temp files
Command: marp2video browser record
Record browser session without audio (silent recording).
marp2video browser record [flags]
Flags:
-c, --config string Configuration file (YAML/JSON) with segments
-s, --steps string Steps file (JSON/YAML) defining browser actions
-u, --url string Starting URL for the browser
-o, --output string Output video file (default "recording.mp4")
--width int Browser viewport width (default 1920)
--height int Browser viewport height (default 1080)
--fps int Video frame rate (default 30)
--headless Run browser in headless mode
-t, --timing string Output timing JSON file for transcript sync
--workdir string Working directory for temp files
Key Features:
- Multi-language support: Generate videos in multiple languages with `--lang en-US,fr-FR,zh-Hans`
- Audio caching: Use `--audio-dir` to cache TTS audio and skip regeneration on subsequent runs
- Pace to longest language: Video timing automatically matches the longest audio across all languages
- Per-voiceover timing: Each browser step is paced to its corresponding voiceover duration
- Subtitle generation: Create SRT/VTT subtitles from voiceover timing or word-level STT
Example Config (demo.yaml):
metadata:
title: "Product Demo"
defaultLanguage: "en-US"
defaultVoice:
provider: "elevenlabs"
voiceId: "pNInz6obpgDQGcFmaJgB"
segments:
- id: "segment_000"
type: "browser"
browser:
url: "https://example.com"
steps:
- action: "wait"
duration: 1000
voiceover:
en-US: "Welcome to our product demo."
        fr-FR: "Bienvenue dans notre démonstration."
- action: "click"
selector: "#login-button"
voiceover:
en-US: "Click the login button to get started."
fr-FR: "Cliquez sur le bouton de connexion."
- action: "scroll"
scrollY: 500
voiceover:
en-US: "Scroll down to see more features."
        fr-FR: "Faites défiler pour voir plus de fonctionnalités."
Example Usage:
# Basic browser demo recording
marp2video browser video --config demo.yaml --output demo.mp4
# Multi-language with audio caching
marp2video browser video --config demo.yaml --output demo.mp4 \
--audio-dir ./audio --lang en-US,fr-FR,zh-Hans
# With subtitles burned into video (requires FFmpeg with libass)
marp2video browser video --config demo.yaml --output demo.mp4 \
--subtitles --subtitles-burn
# Video with burned subtitles but no audio (for silent demos)
marp2video browser video --config demo.yaml --output demo.mp4 \
--subtitles --subtitles-burn --no-audio
# Using Deepgram instead of ElevenLabs
marp2video browser video --config demo.yaml --output demo.mp4 \
--provider deepgram
# Silent browser recording (no audio)
marp2video browser record --url https://example.com --steps demo.json --output demo.mp4
# Fast encoding with hardware acceleration (macOS VideoToolbox)
marp2video browser video --config demo.yaml --output demo.mp4 --fast
# Test first 2 segments only (faster iteration)
marp2video browser video --config demo.yaml --output demo.mp4 --limit 2
# Test first 3 steps of browser segment (faster iteration)
marp2video browser video --config demo.yaml --output demo.mp4 --limit-steps 3
Output Structure:
When using --audio-dir and multiple languages:
project/
├── demo.yaml
├── demo.mp4              # Primary language video
├── demo_fr-FR.mp4        # French version
├── demo_zh-Hans.mp4      # Chinese version
├── demo.srt              # Subtitles (if --subtitles)
└── audio/
    ├── en-US/
    │   ├── segment_000.mp3
    │   ├── segment_000.json   # Cached timing metadata
    │   └── combined.mp3
    ├── fr-FR/
    │   └── ...
    └── zh-Hans/
        └── ...
Examples
Full pipeline with inline voiceovers:
marp2video slides video \
--input presentation.md \
--output youtube_video.mp4 \
--transition 0.5
Multi-language workflow:
# Step 1: Generate audio for each language (directory matches locale code)
marp2video slides tts --transcript transcript.json --output audio/en-US/ --lang en-US
marp2video slides tts --transcript transcript.json --output audio/es-ES/ --lang es-ES
marp2video slides tts --transcript transcript.json --output audio/zh-Hans/ --lang zh-Hans
# Step 2: Generate subtitles for each language (uses Deepgram STT)
marp2video subtitle --audio audio/en-US/
marp2video subtitle --audio audio/es-ES/
marp2video subtitle --audio audio/zh-Hans/
# Step 3: Generate videos with embedded subtitles
marp2video slides video --input slides.md --manifest audio/en-US/manifest.json \
--output video/en-US.mp4 --subtitles subtitles/en-US.srt
marp2video slides video --input slides.md --manifest audio/es-ES/manifest.json \
--output video/es-ES.mp4 --subtitles subtitles/es-ES.srt
marp2video slides video --input slides.md --manifest audio/zh-Hans/manifest.json \
--output video/zh-Hans.mp4 --subtitles subtitles/zh-Hans.srt
Directory structure (locale codes enable automation):
project/
├── presentation.md
├── transcript.json
├── audio/
│   ├── en-US/
│   │   ├── manifest.json
│   │   └── slide_*.mp3
│   ├── es-ES/
│   │   ├── manifest.json
│   │   └── slide_*.mp3
│   └── zh-Hans/
│       ├── manifest.json
│       └── slide_*.mp3
├── subtitles/
│   ├── en-US.srt
│   ├── en-US.vtt
│   ├── es-ES.srt
│   ├── es-ES.vtt
│   ├── zh-Hans.srt
│   └── zh-Hans.vtt
└── video/
    ├── en-US.mp4
    ├── es-ES.mp4
    └── zh-Hans.mp4
Generate individual videos for Udemy:
marp2video slides video \
--input presentation.md \
--output combined.mp4 \
--output-individual ./udemy_videos/
Check Dependencies
marp2video slides video --check
This will verify that all required tools (ffmpeg, marp) are installed.
Voiceover Formats
marp2video supports two voiceover formats:
- Inline HTML comments - Simple, single-language
- JSON transcript - Multi-language, advanced TTS control
Option 1: Inline Voiceover Comments
Add voiceover text in HTML comments before or after slide content:
---
marp: true
---
<!--
This is the voiceover for the first slide.
It will be converted to speech using ElevenLabs.
[PAUSE:1000]
You can add pause directives for timing control.
-->
# First Slide
This is the visible content
---
<!--
Voiceover for slide 2...
-->
# Second Slide
More content
Pause Directives
Use [PAUSE:milliseconds] to add pauses in the voiceover:
<!--
First sentence.
[PAUSE:1000]
Second sentence after a 1-second pause.
[PAUSE:2000]
Third sentence after a 2-second pause.
-->
The pause directives are automatically removed from the spoken text.
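Stripping the directives is essentially a regular-expression pass over the voiceover text. The sketch below illustrates the idea; the function name and whitespace handling are assumptions for this example, not marp2video's actual parser code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// pauseRe matches directives of the form [PAUSE:1000].
var pauseRe = regexp.MustCompile(`\[PAUSE:(\d+)\]`)

// stripPauses removes pause directives from voiceover text and returns
// the cleaned text plus the total requested pause in milliseconds.
func stripPauses(s string) (string, int) {
	total := 0
	for _, m := range pauseRe.FindAllStringSubmatch(s, -1) {
		ms, _ := strconv.Atoi(m[1])
		total += ms
	}
	clean := pauseRe.ReplaceAllString(s, "")
	// Collapse whitespace left behind by removed directives.
	return strings.Join(strings.Fields(clean), " "), total
}

func main() {
	text, pause := stripPauses("First sentence. [PAUSE:1000] Second sentence.")
	fmt.Printf("%q %d\n", text, pause) // "First sentence. Second sentence." 1000
}
```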
Option 2: JSON Transcript
For multi-language support and advanced TTS configuration, use a JSON transcript file:
{
"version": "1.0",
"metadata": {
"title": "My Presentation",
"defaultLanguage": "en-US",
"defaultVoice": {
"provider": "elevenlabs",
"voiceId": "pNInz6obpgDQGcFmaJgB",
"voiceName": "Adam",
"model": "eleven_multilingual_v2",
"stability": 0.5,
"similarityBoost": 0.75
},
"defaultVenue": "youtube"
},
"slides": [
{
"index": 0,
"title": "Title Slide",
"transcripts": {
"en-US": {
"segments": [
{ "text": "Welcome to the presentation.", "pause": 500 },
{ "text": "Let's get started." }
]
},
"es-ES": {
"voice": {
"voiceId": "onwK4e9ZLuTAKqWW03F9",
"voiceName": "Daniel"
},
"segments": [
{ "text": "Bienvenido a la presentaciΓ³n.", "pause": 500 },
{ "text": "Comencemos." }
]
}
}
}
]
}
Transcript Features
| Feature | Description |
|---|---|
| Multi-language | Per-slide transcripts for each locale (en-US, es-ES, etc.) |
| Voice override | Different voice per language or segment |
| Pause control | Pause after each segment (milliseconds) |
| Venue presets | Optimized settings for YouTube, Udemy, Coursera |
| TTS parameters | Stability, similarity boost, style exaggeration |
Audio Manifest
When using `marp2video slides tts`, a manifest is generated with timing info:
{
"version": "1.0",
"language": "en-US",
"generatedAt": "2024-01-01T12:00:00Z",
"slides": [
{
"index": 0,
"audioFile": "slide_000.mp3",
"audioDurationMs": 5200,
"pauseDurationMs": 500,
"totalDurationMs": 5700
}
]
}
This manifest is used by `marp2video slides video --manifest` for precise slide timing.
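If you build tooling around the manifest, it decodes into a pair of small structs. This is a hedged sketch: the struct and helper names are illustrative, not marp2video's internal API; per the example above, each slide's total is its audio length plus the trailing pause.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SlideTiming mirrors one entry of manifest.json.
type SlideTiming struct {
	Index           int    `json:"index"`
	AudioFile       string `json:"audioFile"`
	AudioDurationMs int    `json:"audioDurationMs"`
	PauseDurationMs int    `json:"pauseDurationMs"`
	TotalDurationMs int    `json:"totalDurationMs"`
}

// Manifest mirrors the top-level manifest.json document.
type Manifest struct {
	Version  string        `json:"version"`
	Language string        `json:"language"`
	Slides   []SlideTiming `json:"slides"`
}

// videoDurationMs sums per-slide totals: each slide is recorded for its
// audio length plus the trailing pause.
func videoDurationMs(m Manifest) int {
	sum := 0
	for _, s := range m.Slides {
		sum += s.AudioDurationMs + s.PauseDurationMs
	}
	return sum
}

func main() {
	raw := `{"version":"1.0","language":"en-US","slides":[
	  {"index":0,"audioFile":"slide_000.mp3","audioDurationMs":5200,
	   "pauseDurationMs":500,"totalDurationMs":5700}]}`
	var m Manifest
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		panic(err)
	}
	fmt.Println(videoDurationMs(m)) // 5700
}
```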
How It Works
Pipeline Overview
marp2video supports two workflows:
Workflow A: Marp Slides - Full Pipeline (inline voiceovers)
presentation.md β Parse β TTS β Render β Record β Combine β video.mp4
Workflow B: Marp Slides - Two-Step (JSON transcript)
Step 1: transcript.json β marp2video slides tts β audio/{lang}/*.mp3 + manifest.json
Step 2: presentation.md + manifest.json β marp2video slides video β video/{lang}.mp4
Workflow C: Browser Demo with Voiceover
config.yaml β marp2video browser video β TTS + Record + Combine β demo.mp4
Detailed Pipeline
┌──────────────────────────────────────────────────────────────────────────┐
│                              INPUT OPTIONS                               │
│  A: presentation.md (inline voiceovers)                                  │
│                                   OR                                     │
│  B: transcript.json (multi-language) + presentation.md                   │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ STEP 1: Parse / Load Transcript                                          │
│  • A: Extract voiceover from HTML comments + parse [PAUSE:ms]            │
│  • B: Load transcript.json, select language, resolve voice config        │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ STEP 2: Generate Audio (OmniVoice TTS)                                   │
│  • Send voiceover text to TTS provider (ElevenLabs)                      │
│  • Apply voice settings (stability, similarity, style)                   │
│  • Output: audio/{lang}/slide_000.mp3, slide_001.mp3, ...                │
│  • Output: audio/{lang}/manifest.json (timing for video recording)       │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ STEP 3: Render HTML (Marp CLI)                                           │
│  • Execute: marp presentation.md -o presentation.html --html             │
│  • Creates navigable HTML presentation with all slides                   │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ STEP 4: Record Slides (Browser + ffmpeg)                                 │
│  • Launch headless browser via Rod (Chromium)                            │
│  • Load HTML presentation                                                │
│  • For each slide:                                                       │
│    ├─ Navigate to slide                                                  │
│    ├─ Record for: audioDurationMs + pauseDurationMs (from manifest)      │
│    └─ Save: video/slide_000.mp4, slide_001.mp4, ...                      │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ STEP 5: Combine Videos (ffmpeg)                                          │
│  • Concatenate all slide videos in sequence                              │
│  • Optional: Apply crossfade transitions (--transition flag)             │
│  • Output: video.mp4                                                     │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ STEP 6: Export Individual Videos (Optional)                              │
│  • Copy individual slide videos to output directory                      │
│  • For Udemy courses: --output-individual ./lectures/                    │
└──────────────────────────────────────────────────────────────────────────┘
Step Details
| Step | Component | Tool | Input | Output |
|---|---|---|---|---|
| 1 | Parser | Go | slides.md | Slides + voiceovers |
| 2 | TTS | OmniVoice (ElevenLabs) | Voiceover text | slide_*.mp3 |
| 3 | Renderer | Marp CLI | slides.md | presentation.html |
| 4 | Recorder | Rod + ffmpeg | HTML + MP3 | slide_*.mp4 |
| 5 | Combiner | ffmpeg | slide_*.mp4 | output.mp4 |
| 6 | Exporter | Go | slide_*.mp4 | Individual files |
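Step 5 above boils down to assembling an ffmpeg invocation. The sketch below shows one common way to do that with ffmpeg's concat demuxer and encoding settings matching this tool's H.264/yuv420p/AAC output; the function names are hypothetical helpers for illustration, not marp2video's internal code.

```go
package main

import (
	"fmt"
	"strings"
)

// concatListFile renders the concat demuxer's input list: one
// "file '<path>'" line per slide video, in playback order.
func concatListFile(slides []string) string {
	var b strings.Builder
	for _, s := range slides {
		fmt.Fprintf(&b, "file '%s'\n", s)
	}
	return b.String()
}

// concatArgs builds the ffmpeg argument list that concatenates the
// videos referenced by listFile into a single MP4.
func concatArgs(listFile, output string) []string {
	return []string{
		"-f", "concat", // use the concat demuxer
		"-safe", "0", // allow arbitrary paths in the list file
		"-i", listFile,
		"-c:v", "libx264", "-crf", "23", // H.264 at a good quality/size balance
		"-pix_fmt", "yuv420p", // maximum player compatibility
		"-c:a", "aac", "-b:a", "192k", // AAC audio for YouTube/Udemy
		output,
	}
}

func main() {
	fmt.Print(concatListFile([]string{"slide_000.mp4", "slide_001.mp4"}))
	fmt.Println("ffmpeg " + strings.Join(concatArgs("list.txt", "output.mp4"), " "))
}
```

In a real pipeline the list would be written to a temp file and the args passed to `os/exec`.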
Architecture
marp2video/
├── cmd/marp2video/      # CLI (Cobra-based)
│   ├── main.go          # Entry point
│   ├── root.go          # Root command
│   ├── tts.go           # TTS subcommand
│   └── video.go         # Video subcommand
├── pkg/
│   ├── parser/          # Marp markdown parser
│   ├── transcript/      # JSON transcript types
│   ├── tts/             # TTS generation + manifest
│   ├── omnivoice/       # OmniVoice TTS/STT provider wrappers
│   ├── renderer/        # Marp HTML renderer & browser control
│   ├── audio/           # Audio utilities
│   ├── video/           # Video recording & combination
│   └── orchestrator/    # Main workflow coordinator
├── examples/            # Example presentations
│   └── intro/           # Self-documenting example
│       ├── presentation.md
│       ├── transcript.json
│       └── README.md
└── docs/                # MkDocs documentation
Platform-Specific Recording
macOS (including Apple Silicon M1/M2/M3)
Fully compatible with Apple Silicon Macs. Uses avfoundation for screen capture:
ffmpeg -f avfoundation -i "<device>:none" ...
Required Permissions
Screen Recording permission is required. Before running marp2video, grant permission to your terminal app:
- Open System Settings (or System Preferences on older macOS)
- Navigate to Privacy & Security > Screen Recording
- Enable your terminal application (Terminal, iTerm2, VS Code, etc.)
- Restart the terminal after granting permission
Without this permission, ffmpeg will fail with "Could not find video device" or similar errors.
Screen Device Auto-Detection
The tool automatically detects the correct screen capture device. On Macs with external displays or connected iPhones, the device number varies. To list available devices:
ffmpeg -f avfoundation -list_devices true -i ""
You can manually specify the device if needed:
marp2video slides video --input slides.md --output video.mp4 --screen-device "4:none"
Linux
Uses x11grab for screen capture:
ffmpeg -f x11grab -i ":0.0" ...
Windows
Uses gdigrab for screen capture:
ffmpeg -f gdigrab -i "desktop" ...
Output Format & Platform Compatibility
Videos are encoded with settings optimized for direct upload to YouTube and Udemy - no re-encoding required.
Video Specifications
| Setting | Value | Notes |
|---|---|---|
| Container | MP4 | Universal compatibility |
| Video Codec | H.264 (libx264) | Required by YouTube & Udemy |
| Resolution | 1920x1080 | Full HD (configurable) |
| Frame Rate | 30fps | Standard (configurable) |
| Quality | CRF 23 | Good quality/size balance |
| Pixel Format | yuv420p | Maximum compatibility |
| Audio Codec | AAC | Required by both platforms |
| Audio Bitrate | 192kbps | Clear speech audio |
YouTube Upload
The combined video (--output) is ready for direct upload:
- Includes optional crossfade transitions (`--transition 0.5`)
- Single file containing all slides with narration
- No processing or re-encoding needed
Udemy Upload
Individual slide videos (--output-individual) are designed for Udemy courses:
- Each slide saved as separate file (slide_000.mp4, slide_001.mp4, etc.)
- Upload as individual lectures in your course curriculum
- Sequential naming for easy organization
Tip for Udemy: Udemy recommends lectures be 2+ minutes. For short slides, consider:
- Adding longer pause directives (`[PAUSE:5000]`)
- Combining related slides into single lectures
- Using more detailed voiceover scripts
Examples
The examples/ directory contains self-contained examples:
examples/
├── intro/                  # Introduction to marp2video
│   ├── presentation.md     # Marp markdown source (13 slides)
│   ├── transcript.json     # Multi-language transcript (en-US, en-GB, es-ES)
│   ├── README.md           # Detailed usage instructions
│   └── audio/              # Generated audio (after running tts)
│       ├── en-US/
│       │   ├── manifest.json
│       │   └── slide_*.mp3
│       └── es-ES/
│           ├── manifest.json
│           └── slide_*.mp3
└── README.md
Running the Intro Example
Option A: Full pipeline (inline voiceovers)
marp2video slides video \
--input examples/intro/presentation.md \
--output examples/intro/output.mp4
Option B: Two-step with transcript (multi-language)
# Generate audio for English
marp2video slides tts \
--transcript examples/intro/transcript.json \
--output examples/intro/audio/en-US/ \
--lang en-US
# Generate video
marp2video slides video \
--input examples/intro/presentation.md \
--manifest examples/intro/audio/en-US/manifest.json \
--output examples/intro/video/en-US.mp4
# Generate Spanish version
marp2video slides tts \
--transcript examples/intro/transcript.json \
--output examples/intro/audio/es-ES/ \
--lang es-ES
marp2video slides video \
--input examples/intro/presentation.md \
--manifest examples/intro/audio/es-ES/manifest.json \
--output examples/intro/video/es-ES.mp4
The intro example is a self-documenting presentation that explains what marp2video does - using marp2video itself.
Additional Example
See example_presentation.md for a complete example with:
- Custom Marp theme
- Voiceover comments on each slide
- Pause directives for timing
Troubleshooting
"ffmpeg not found"
Install ffmpeg using your package manager (see Prerequisites)
"marp CLI not found"
Install Marp CLI: npm install -g @marp-team/marp-cli
"ElevenLabs API error"
- Verify your API key is correct
- Check your ElevenLabs account has sufficient credits
- Ensure you have access to the voice ID you specified
Subtitle burning fails (--subtitles-burn)
The --subtitles-burn flag requires FFmpeg compiled with libass support. If you see an error like "FFmpeg subtitles filter not available", your FFmpeg installation needs to be updated.
Check if your FFmpeg has subtitle support:
ffmpeg -filters 2>&1 | grep subtitles
If nothing is returned, install FFmpeg with libass:
# macOS: Use homebrew-ffmpeg tap (includes libass by default)
brew uninstall ffmpeg
brew tap homebrew-ffmpeg/ffmpeg
brew install homebrew-ffmpeg/ffmpeg/ffmpeg
# Linux (Ubuntu/Debian)
sudo apt install ffmpeg libass-dev
# Verify installation
ffmpeg -filters 2>&1 | grep subtitles
# Should show: subtitles V->V Render text subtitles...
Alternative: Use --subtitles without --subtitles-burn to generate a separate .srt file that video players can load.
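This check can also be automated. The illustrative Go helper below parses saved `ffmpeg -filters` output rather than invoking ffmpeg, so it stays testable without ffmpeg installed; the function name and approach are assumptions for this sketch, not part of marp2video (a real check would capture the output via `os/exec`).

```go
package main

import (
	"fmt"
	"strings"
)

// hasSubtitlesFilter reports whether the captured output of
// `ffmpeg -filters` lists the libass-backed "subtitles" filter.
// Each filter line has the form: "<flags> <name> <io> <description>",
// so the filter name is the second whitespace-separated field.
func hasSubtitlesFilter(filtersOutput string) bool {
	for _, line := range strings.Split(filtersOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == "subtitles" {
			return true
		}
	}
	return false
}

func main() {
	sample := " ... subtitles         V->V       Render text subtitles onto input video using the libass library."
	fmt.Println(hasSubtitlesFilter(sample))
}
```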
Recording issues on macOS
"recording failed: exit status 1" or ffmpeg errors:
1. Grant Screen Recording permission (most common issue):
   - Go to System Settings > Privacy & Security > Screen Recording
   - Enable your terminal app (Terminal, iTerm2, VS Code, etc.)
   - Restart your terminal after granting permission
2. Verify ffmpeg can access the screen:
   ffmpeg -f avfoundation -list_devices true -i ""
   You should see "Capture screen 0" or similar in the output.
3. Use verbose mode to see ffmpeg errors:
   marp2video slides video --input slides.md --output video.mp4 --verbose
4. Other tips:
   - Ensure the browser window is visible during recording
   - Try reducing video resolution if performance is poor
   - Manually specify the screen device with --screen-device "1:none"
Development
Running Tests
go test ./...
Building
go build -o bin/marp2video ./cmd/marp2video
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
MIT License - see LICENSE file for details
Acknowledgments
- Marp - Markdown presentation ecosystem
- OmniVoice - Unified TTS/STT provider interface
- ElevenLabs - AI voice generation (TTS)
- Deepgram - Speech-to-text (STT) for subtitles
- Rod - Browser automation framework
- ffmpeg - Multimedia processing
Related Projects
- Marptalk - Node.js-based Marp-to-video tool using Google Cloud TTS. Features browser-based TTS fallback for quick iteration without API costs, YouTube chapter markers generation, and LLM-assisted presentation drafting.
Roadmap
- Custom voice settings (stability, similarity, style)
- Video transitions between slides
- Individual slide video export (for Udemy)
- JSON transcript for multi-language support
- Decoupled TTS workflow (separate audio generation)
- Audio manifest with timing information
- Progress bar during conversion
- Add subtitle/caption generation
- Browser demo recording with voiceover
- Multi-language video generation
- TTS audio caching for faster iterations
- Support for background music
- Batch processing of multiple presentations
- Web UI for easier configuration
- Export to different video formats
- Avatar integration (HeyGen, Synthesia)
Directories
| Path | Synopsis |
|---|---|
| cmd | |
| cmd/marp2video | command |
| pkg | |
| pkg/media | Package media provides utilities for working with audio and video files. |
| pkg/omnivoice/stt | Package stt provides OmniVoice-based speech-to-text for marp2video. |
| pkg/omnivoice/tts | Package tts provides OmniVoice-based text-to-speech for marp2video. |
| pkg/segment | Package segment provides abstractions for content units that can be rendered to video. |
| pkg/source | Package source provides interfaces and implementations for loading content from various sources (Marp markdown, transcript JSON, config YAML). |