marp2video

v0.3.0 · Published: Feb 21, 2026 · License: MIT


Convert Marp presentations with voiceovers to video files.

This tool takes a Marp markdown presentation with voiceover text (inline comments or JSON transcript), generates speech using text-to-speech (TTS), and creates a synchronized video recording of the presentation with optional subtitles.

Powered by OmniVoice - a unified interface for TTS/STT providers. Tested with:

  • ElevenLabs - Known for high-quality AI voices (TTS and STT available)
  • Deepgram - Known for fast, accurate transcription (STT and TTS available)

Both providers offer TTS and STT capabilities. You can use either one for both functions, though marp2video defaults to ElevenLabs for voice generation and Deepgram for subtitle transcription based on their respective strengths.

Features

  • πŸ“ Parse Marp presentations with voiceover in HTML comments
  • 🌐 JSON transcript support for multi-language voiceovers
  • πŸŽ™οΈ Text-to-speech via OmniVoice (ElevenLabs, Deepgram)
  • πŸ—£οΈ Multi-language support with per-slide voice configuration
  • πŸ–ΌοΈ Image-based rendering using Marp PNG export for reliable output
  • 🎬 Video generation with synchronized audio using ffmpeg
  • πŸ’» Cross-platform support (macOS, Linux, Windows)
  • ⏱️ Pause directives like [PAUSE:1000] for timing control
  • βš™οΈ Full orchestration - entire process automated in Go
  • ▢️ YouTube-ready combined video output with optional transitions
  • πŸŽ“ Udemy-ready individual slide videos for course lectures
  • πŸ”„ Decoupled workflow - generate audio and video separately
  • πŸ“œ Subtitle generation - SRT/VTT from audio via OmniVoice (Deepgram STT)
  • 🌐 Browser demo recording - automated browser interactions with voiceover
  • 🎯 TTS audio caching - reuse generated audio across runs
  • ⚑ Hardware acceleration - fast encoding with VideoToolbox (macOS)

Installation

Prerequisites
  1. Go 1.21+

    go version
    
  2. ffmpeg (for video recording and processing)

    # macOS
    brew install ffmpeg
    
    # Linux
    sudo apt install ffmpeg
    
    # Windows
    # Download from https://ffmpeg.org/download.html
    
  3. Marp CLI (for rendering presentations)

    npm install -g @marp-team/marp-cli
    
  4. ElevenLabs API Key (for TTS)

    • Sign up at ElevenLabs
    • Get your API key from the dashboard
  5. Deepgram API Key (for subtitle generation)

    • Sign up at Deepgram
    • Get your API key from the console
Build from Source
git clone https://github.com/grokify/marp2video
cd marp2video
go build -o bin/marp2video ./cmd/marp2video

Usage

marp2video provides three command groups:

Marp Slides:

  • marp2video slides video - Full pipeline: parse slides, generate TTS, record, combine
  • marp2video slides tts - Generate audio from JSON transcript

Browser Recording:

  • marp2video browser video - Record browser demo with TTS voiceover
  • marp2video browser record - Record browser demo (silent, no audio)

Utilities:

  • marp2video subtitle - Generate subtitles from audio using STT
Quick Start (Full Pipeline)
# Set API keys
export ELEVENLABS_API_KEY="your-elevenlabs-key"  # For TTS
export DEEPGRAM_API_KEY="your-deepgram-key"      # For subtitles (optional)

# Using inline voiceover comments
marp2video slides video --input slides.md --output video.mp4

# Or use the two-step workflow with a JSON transcript:
# Step 1: Generate audio from transcript
marp2video slides tts --transcript transcript.json --output audio/en-US/ --lang en-US

# Step 2: Generate video with pre-generated audio
marp2video slides video --input slides.md --manifest audio/en-US/manifest.json --output video/en-US.mp4
Command: marp2video slides tts

Generate audio files from a JSON transcript.

marp2video slides tts [flags]

Flags:
  -t, --transcript string   Transcript JSON file (required)
  -o, --output string       Output directory for audio files (default "audio")
  -l, --lang string         Language/locale code (e.g., en-US, es-ES)
      --provider string     TTS provider: elevenlabs or deepgram

Output:

  • audio/{lang}/slide_000.mp3, slide_001.mp3, ... (one per slide)
  • audio/{lang}/manifest.json (timing information for video recording)

Example:

# Generate audio for Spanish
marp2video slides tts --transcript transcript.json --output audio/es-ES/ --lang es-ES
Command: marp2video slides video

Generate video from Marp presentation.

marp2video slides video [flags]

Flags:
  -i, --input string              Input Marp markdown file (required)
  -o, --output string             Output video file (default "output.mp4")
  -m, --manifest string           Audio manifest file (from 'marp2video slides tts')
  -k, --api-key string            ElevenLabs API key (or use ELEVENLABS_API_KEY env var)
  -v, --voice string              ElevenLabs voice ID (default: Adam)
      --width int                 Video width (default 1920)
      --height int                Video height (default 1080)
      --fps int                   Frame rate (default 30)
      --transition float          Transition duration in seconds
      --subtitles string          Subtitle file to embed (SRT or VTT)
      --subtitles-lang string     Subtitle language code (auto-detected from filename)
      --output-individual string  Directory for individual slide videos
      --workdir string            Working directory for temp files
      --screen-device string      Screen capture device (macOS)
      --check                     Check dependencies and exit
Command: marp2video subtitle

Generate subtitle files (SRT/VTT) from audio files using speech-to-text.

marp2video subtitle [flags]

Flags:
  -a, --audio string        Audio directory containing manifest.json (required)
  -o, --output string       Output directory for subtitle files (default "subtitles")
  -l, --lang string         Language code (auto-detected from manifest if not specified)
      --provider string     STT provider: deepgram or elevenlabs (default: deepgram)
      --individual          Also generate individual subtitle files per slide

Output:

  • subtitles/{lang}.srt - SRT format subtitle file
  • subtitles/{lang}.vtt - WebVTT format subtitle file

Example:

# Generate French subtitles (language auto-detected from manifest)
marp2video subtitle --audio audio/fr-FR/

# Generate with explicit language and custom output
marp2video subtitle --audio audio/zh-Hans/ --lang zh-Hans --output subs/
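Under the hood, cue timing is simple millisecond arithmetic over the manifest durations. A minimal sketch of the SRT timestamp and cue format (illustrative only, not marp2video's actual subtitle writer; the function names are ours):

```go
package main

import "fmt"

// FormatSRT renders a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
// The WebVTT form is identical except the comma becomes a period.
func FormatSRT(ms int) string {
	return fmt.Sprintf("%02d:%02d:%02d,%03d",
		ms/3600000, ms%3600000/60000, ms%60000/1000, ms%1000)
}

// Cue renders one numbered SRT cue spanning [startMs, endMs).
func Cue(n, startMs, endMs int, text string) string {
	return fmt.Sprintf("%d\n%s --> %s\n%s\n",
		n, FormatSRT(startMs), FormatSRT(endMs), text)
}

func main() {
	fmt.Print(Cue(1, 0, 5700, "Welcome to the presentation."))
}
```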
Command: marp2video browser video

Record browser-driven demos with AI-generated voiceover. This command automates browser interactions (navigation, clicks, scrolling) while generating synchronized narration.

marp2video browser video [flags]

Flags:
  -c, --config string           Configuration file (YAML/JSON) with browser segments (required)
  -o, --output string           Output video file (default "output.mp4")
  -a, --audio-dir string        Save/reuse audio tracks in this directory (per-language subdirs)
  -p, --provider string         TTS provider: elevenlabs or deepgram (default: auto-detect)
  -v, --voice string            TTS voice ID (default: from config or provider default)
  -l, --lang string             Languages to generate, comma-separated (default "en-US")
      --elevenlabs-api-key      ElevenLabs API key (or use ELEVENLABS_API_KEY env var)
      --deepgram-api-key        Deepgram API key (or use DEEPGRAM_API_KEY env var)
      --width int               Video width (default 1920)
      --height int              Video height (default 1080)
      --fps int                 Video frame rate (default 30)
      --headless                Run browser in headless mode
      --transition float        Transition duration between segments (seconds)
      --subtitles               Generate subtitles from voiceover timing (no STT)
      --subtitles-stt           Generate word-level subtitles using STT (requires API)
      --subtitles-burn          Burn subtitles into video (permanent, requires FFmpeg with libass)
      --no-audio                Generate video without audio (TTS still used for timing/subtitles)
      --fast                    Use hardware-accelerated encoding (VideoToolbox on macOS)
      --limit int               Limit to first N segments (for testing)
      --limit-steps int         Limit browser segments to first N steps (for testing)
      --workdir string          Working directory for temp files
Command: marp2video browser record

Record browser session without audio (silent recording).

marp2video browser record [flags]

Flags:
  -c, --config string     Configuration file (YAML/JSON) with segments
  -s, --steps string      Steps file (JSON/YAML) defining browser actions
  -u, --url string        Starting URL for the browser
  -o, --output string     Output video file (default "recording.mp4")
      --width int         Browser viewport width (default 1920)
      --height int        Browser viewport height (default 1080)
      --fps int           Video frame rate (default 30)
      --headless          Run browser in headless mode
  -t, --timing string     Output timing JSON file for transcript sync
      --workdir string    Working directory for temp files

Key Features:

  • Multi-language support: Generate videos in multiple languages with --lang en-US,fr-FR,zh-Hans
  • Audio caching: Use --audio-dir to cache TTS audio and skip regeneration on subsequent runs
  • Pace to longest language: Video timing automatically matches the longest audio across all languages
  • Per-voiceover timing: Each browser step is paced to its corresponding voiceover duration
  • Subtitle generation: Create SRT/VTT subtitles from voiceover timing or word-level STT
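The pace-to-longest rule above reduces to a per-step maximum across languages. A sketch with made-up durations (the function name is ours, not part of the tool's API):

```go
package main

import "fmt"

// LongestMs implements "pace to the longest language": each step's display
// time is the maximum voiceover duration across all generated languages,
// so every language's video shares one timeline.
func LongestMs(durationsByLang map[string]int) int {
	longest := 0
	for _, d := range durationsByLang {
		if d > longest {
			longest = d
		}
	}
	return longest
}

func main() {
	// Hypothetical per-step voiceover durations in milliseconds.
	step := map[string]int{"en-US": 4200, "fr-FR": 5100, "zh-Hans": 4800}
	fmt.Println(LongestMs(step)) // 5100
}
```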

Example Config (demo.yaml):

metadata:
  title: "Product Demo"
  defaultLanguage: "en-US"

defaultVoice:
  provider: "elevenlabs"
  voiceId: "pNInz6obpgDQGcFmaJgB"

segments:
  - id: "segment_000"
    type: "browser"
    browser:
      url: "https://example.com"
      steps:
        - action: "wait"
          duration: 1000
          voiceover:
            en-US: "Welcome to our product demo."
            fr-FR: "Bienvenue dans notre démonstration."
        - action: "click"
          selector: "#login-button"
          voiceover:
            en-US: "Click the login button to get started."
            fr-FR: "Cliquez sur le bouton de connexion."
        - action: "scroll"
          scrollY: 500
          voiceover:
            en-US: "Scroll down to see more features."
            fr-FR: "Faites défiler pour voir plus de fonctionnalités."

Example Usage:

# Basic browser demo recording
marp2video browser video --config demo.yaml --output demo.mp4

# Multi-language with audio caching
marp2video browser video --config demo.yaml --output demo.mp4 \
  --audio-dir ./audio --lang en-US,fr-FR,zh-Hans

# With subtitles burned into video (requires FFmpeg with libass)
marp2video browser video --config demo.yaml --output demo.mp4 \
  --subtitles --subtitles-burn

# Video with burned subtitles but no audio (for silent demos)
marp2video browser video --config demo.yaml --output demo.mp4 \
  --subtitles --subtitles-burn --no-audio

# Using Deepgram instead of ElevenLabs
marp2video browser video --config demo.yaml --output demo.mp4 \
  --provider deepgram

# Silent browser recording (no audio)
marp2video browser record --url https://example.com --steps demo.json --output demo.mp4

# Fast encoding with hardware acceleration (macOS VideoToolbox)
marp2video browser video --config demo.yaml --output demo.mp4 --fast

# Test first 2 segments only (faster iteration)
marp2video browser video --config demo.yaml --output demo.mp4 --limit 2

# Test first 3 steps of browser segment (faster iteration)
marp2video browser video --config demo.yaml --output demo.mp4 --limit-steps 3

Output Structure:

When using --audio-dir and multiple languages:

project/
├── demo.yaml
├── demo.mp4              # Primary language video
├── demo_fr-FR.mp4        # French version
├── demo_zh-Hans.mp4      # Chinese version
├── demo.srt              # Subtitles (if --subtitles)
└── audio/
    ├── en-US/
    │   ├── segment_000.mp3
    │   ├── segment_000.json    # Cached timing metadata
    │   └── combined.mp3
    ├── fr-FR/
    │   └── ...
    └── zh-Hans/
        └── ...
Examples

Full pipeline with inline voiceovers:

marp2video slides video \
  --input presentation.md \
  --output youtube_video.mp4 \
  --transition 0.5

Multi-language workflow:

# Step 1: Generate audio for each language (directory matches locale code)
marp2video slides tts --transcript transcript.json --output audio/en-US/ --lang en-US
marp2video slides tts --transcript transcript.json --output audio/es-ES/ --lang es-ES
marp2video slides tts --transcript transcript.json --output audio/zh-Hans/ --lang zh-Hans

# Step 2: Generate subtitles for each language (uses Deepgram STT)
marp2video subtitle --audio audio/en-US/
marp2video subtitle --audio audio/es-ES/
marp2video subtitle --audio audio/zh-Hans/

# Step 3: Generate videos with embedded subtitles
marp2video slides video --input slides.md --manifest audio/en-US/manifest.json \
  --output video/en-US.mp4 --subtitles subtitles/en-US.srt
marp2video slides video --input slides.md --manifest audio/es-ES/manifest.json \
  --output video/es-ES.mp4 --subtitles subtitles/es-ES.srt
marp2video slides video --input slides.md --manifest audio/zh-Hans/manifest.json \
  --output video/zh-Hans.mp4 --subtitles subtitles/zh-Hans.srt

Directory structure (locale codes enable automation):

project/
├── presentation.md
├── transcript.json
├── audio/
│   ├── en-US/
│   │   ├── manifest.json
│   │   └── slide_*.mp3
│   ├── es-ES/
│   │   ├── manifest.json
│   │   └── slide_*.mp3
│   └── zh-Hans/
│       ├── manifest.json
│       └── slide_*.mp3
├── subtitles/
│   ├── en-US.srt
│   ├── en-US.vtt
│   ├── es-ES.srt
│   ├── es-ES.vtt
│   ├── zh-Hans.srt
│   └── zh-Hans.vtt
└── video/
    ├── en-US.mp4
    ├── es-ES.mp4
    └── zh-Hans.mp4

Generate individual videos for Udemy:

marp2video slides video \
  --input presentation.md \
  --output combined.mp4 \
  --output-individual ./udemy_videos/
Check Dependencies
marp2video slides video --check

This will verify that all required tools (ffmpeg, marp) are installed.

Voiceover Formats

marp2video supports two voiceover formats:

  1. Inline HTML comments - Simple, single-language
  2. JSON transcript - Multi-language, advanced TTS control
Option 1: Inline Voiceover Comments

Add voiceover text in HTML comments before or after slide content:

---
marp: true
---

<!--
This is the voiceover for the first slide.
It will be converted to speech using ElevenLabs.
[PAUSE:1000]
You can add pause directives for timing control.
-->

# First Slide

This is the visible content

---

<!--
Voiceover for slide 2...
-->

# Second Slide

More content
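Extraction of these comments can be sketched with two regular expressions. This is a simplified illustration, not marp2video's actual parser (which also handles front matter and Marp directives):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// slideSep matches the "---" separator on its own line.
	slideSep = regexp.MustCompile(`(?m)^---\s*$`)
	// commentRe captures the body of an HTML comment.
	commentRe = regexp.MustCompile(`(?s)<!--(.*?)-->`)
)

// ExtractVoiceovers splits a Marp document on slide separators and returns
// the trimmed HTML-comment text for each slide ("" when a slide has none).
// Simplification: front matter is not treated specially here.
func ExtractVoiceovers(markdown string) []string {
	var out []string
	for _, slide := range slideSep.Split(markdown, -1) {
		if m := commentRe.FindStringSubmatch(slide); m != nil {
			out = append(out, strings.TrimSpace(m[1]))
		} else {
			out = append(out, "")
		}
	}
	return out
}

func main() {
	doc := "<!-- Voiceover one. -->\n# One\n---\n<!-- Voiceover two. -->\n# Two"
	fmt.Println(ExtractVoiceovers(doc))
}
```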
Pause Directives

Use [PAUSE:milliseconds] to add pauses in the voiceover:

<!--
First sentence.
[PAUSE:1000]
Second sentence after a 1-second pause.
[PAUSE:2000]
Third sentence after a 2-second pause.
-->

The pause directives are automatically removed from the spoken text.
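A sketch of how such directives can be stripped and accumulated (illustrative, not the tool's exact implementation):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// pauseRe matches [PAUSE:ms] directives and captures the millisecond value.
var pauseRe = regexp.MustCompile(`\[PAUSE:(\d+)\]`)

// SplitPauses removes [PAUSE:ms] directives from voiceover text, returning
// the clean spoken text and the total requested pause in milliseconds.
func SplitPauses(text string) (spoken string, totalPauseMs int) {
	for _, m := range pauseRe.FindAllStringSubmatch(text, -1) {
		ms, _ := strconv.Atoi(m[1])
		totalPauseMs += ms
	}
	spoken = pauseRe.ReplaceAllString(text, "")
	// Collapse the whitespace left behind by removed directives.
	spoken = strings.Join(strings.Fields(spoken), " ")
	return spoken, totalPauseMs
}

func main() {
	s, p := SplitPauses("First sentence. [PAUSE:1000] Second sentence.")
	fmt.Printf("%q pause=%dms\n", s, p)
}
```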

Option 2: JSON Transcript

For multi-language support and advanced TTS configuration, use a JSON transcript file:

{
  "version": "1.0",
  "metadata": {
    "title": "My Presentation",
    "defaultLanguage": "en-US",
    "defaultVoice": {
      "provider": "elevenlabs",
      "voiceId": "pNInz6obpgDQGcFmaJgB",
      "voiceName": "Adam",
      "model": "eleven_multilingual_v2",
      "stability": 0.5,
      "similarityBoost": 0.75
    },
    "defaultVenue": "youtube"
  },
  "slides": [
    {
      "index": 0,
      "title": "Title Slide",
      "transcripts": {
        "en-US": {
          "segments": [
            { "text": "Welcome to the presentation.", "pause": 500 },
            { "text": "Let's get started." }
          ]
        },
        "es-ES": {
          "voice": {
            "voiceId": "onwK4e9ZLuTAKqWW03F9",
            "voiceName": "Daniel"
          },
          "segments": [
            { "text": "Bienvenido a la presentación.", "pause": 500 },
            { "text": "Comencemos." }
          ]
        }
      }
    }
  ]
}
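The per-language voice override can be modeled as a simple fallback. A sketch with struct shapes inferred from the JSON above (these are not marp2video's exported types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Voice and LangTranscript mirror the transcript JSON; field names are
// inferred from the sample above, not a published Go API.
type Voice struct {
	Provider  string `json:"provider,omitempty"`
	VoiceID   string `json:"voiceId,omitempty"`
	VoiceName string `json:"voiceName,omitempty"`
}

type Segment struct {
	Text  string `json:"text"`
	Pause int    `json:"pause,omitempty"`
}

type LangTranscript struct {
	Voice    *Voice    `json:"voice,omitempty"`
	Segments []Segment `json:"segments"`
}

// ResolveVoice applies the fallback rule: a per-language voice overrides
// the metadata-level default voice.
func ResolveVoice(def Voice, t LangTranscript) Voice {
	if t.Voice != nil {
		return *t.Voice
	}
	return def
}

func main() {
	raw := `{"voice":{"voiceId":"onwK4e9ZLuTAKqWW03F9","voiceName":"Daniel"},
	         "segments":[{"text":"Bienvenido.","pause":500}]}`
	var es LangTranscript
	if err := json.Unmarshal([]byte(raw), &es); err != nil {
		panic(err)
	}
	def := Voice{Provider: "elevenlabs", VoiceID: "pNInz6obpgDQGcFmaJgB"}
	fmt.Println(ResolveVoice(def, es).VoiceName)
}
```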
Transcript Features
| Feature | Description |
|---|---|
| Multi-language | Per-slide transcripts for each locale (en-US, es-ES, etc.) |
| Voice override | Different voice per language or segment |
| Pause control | Pause after each segment (milliseconds) |
| Venue presets | Optimized settings for YouTube, Udemy, Coursera |
| TTS parameters | Stability, similarity boost, style exaggeration |
Audio Manifest

When using marp2video slides tts, a manifest is generated with timing info:

{
  "version": "1.0",
  "language": "en-US",
  "generatedAt": "2024-01-01T12:00:00Z",
  "slides": [
    {
      "index": 0,
      "audioFile": "slide_000.mp3",
      "audioDurationMs": 5200,
      "pauseDurationMs": 500,
      "totalDurationMs": 5700
    }
  ]
}

This manifest is used by marp2video slides video --manifest for precise slide timing.

How It Works

Pipeline Overview

marp2video supports two workflows:

Workflow A: Marp Slides - Full Pipeline (inline voiceovers)

presentation.md → Parse → TTS → Render → Record → Combine → video.mp4

Workflow B: Marp Slides - Two-Step (JSON transcript)

Step 1: transcript.json → marp2video slides tts → audio/{lang}/*.mp3 + manifest.json
Step 2: presentation.md + manifest.json → marp2video slides video → video/{lang}.mp4

Workflow C: Browser Demo with Voiceover

config.yaml → marp2video browser video → TTS + Record + Combine → demo.mp4
Detailed Pipeline
┌─────────────────────────────────────────────────────────────────────────┐
│  INPUT OPTIONS                                                          │
│  ┌─────────────────────────┐    ┌─────────────────────────────────────┐ │
│  │ A: presentation.md      │ OR │ B: transcript.json (multi-language) │ │
│  │    (inline voiceovers)  │    │    + presentation.md                │ │
│  └─────────────────────────┘    └─────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STEP 1: Parse / Load Transcript                                        │
│  • A: Extract voiceover from HTML comments + parse [PAUSE:ms]           │
│  • B: Load transcript.json, select language, resolve voice config       │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STEP 2: Generate Audio (OmniVoice TTS)                                 │
│  • Send voiceover text to TTS provider (ElevenLabs)                     │
│  • Apply voice settings (stability, similarity, style)                  │
│  • Output: audio/{lang}/slide_000.mp3, slide_001.mp3, ...               │
│  • Output: audio/{lang}/manifest.json (timing for video recording)      │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STEP 3: Render HTML (Marp CLI)                                         │
│  • Execute: marp presentation.md -o presentation.html --html            │
│  • Creates navigable HTML presentation with all slides                  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STEP 4: Record Slides (Browser + ffmpeg)                               │
│  • Launch headless browser via Rod (Chromium)                           │
│  • Load HTML presentation                                               │
│  • For each slide:                                                      │
│    ├─ Navigate to slide                                                 │
│    ├─ Record for: audioDurationMs + pauseDurationMs (from manifest)     │
│    └─ Save: video/slide_000.mp4, slide_001.mp4, ...                     │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STEP 5: Combine Videos (ffmpeg)                                        │
│  • Concatenate all slide videos in sequence                             │
│  • Optional: Apply crossfade transitions (--transition flag)            │
│  • Output: video.mp4                                                    │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STEP 6: Export Individual Videos (Optional)                            │
│  • Copy individual slide videos to output directory                     │
│  • For Udemy courses: --output-individual ./lectures/                   │
└─────────────────────────────────────────────────────────────────────────┘
Step Details
| Step | Component | Tool | Input | Output |
|---|---|---|---|---|
| 1 | Parser | Go | slides.md | Slides + voiceovers |
| 2 | TTS | OmniVoice (ElevenLabs) | Voiceover text | slide_*.mp3 |
| 3 | Renderer | Marp CLI | slides.md | presentation.html |
| 4 | Recorder | Rod + ffmpeg | HTML + MP3 | slide_*.mp4 |
| 5 | Combiner | ffmpeg | slide_*.mp4 | output.mp4 |
| 6 | Exporter | Go | slide_*.mp4 | Individual files |

Architecture

marp2video/
├── cmd/marp2video/          # CLI (Cobra-based)
│   ├── main.go              # Entry point
│   ├── root.go              # Root command
│   ├── tts.go               # TTS subcommand
│   └── video.go             # Video subcommand
├── pkg/
│   ├── parser/              # Marp markdown parser
│   ├── transcript/          # JSON transcript types
│   ├── tts/                 # TTS generation + manifest
│   ├── omnivoice/           # OmniVoice TTS/STT provider wrappers
│   ├── renderer/            # Marp HTML renderer & browser control
│   ├── audio/               # Audio utilities
│   ├── video/               # Video recording & combination
│   └── orchestrator/        # Main workflow coordinator
├── examples/                # Example presentations
│   └── intro/               # Self-documenting example
│       ├── presentation.md
│       ├── transcript.json
│       └── README.md
└── docs/                    # MkDocs documentation

Platform-Specific Recording

macOS (including Apple Silicon M1/M2/M3)

Fully compatible with Apple Silicon Macs. Uses avfoundation for screen capture:

ffmpeg -f avfoundation -i "<device>:none" ...
Required Permissions

Screen Recording permission is required. Before running marp2video, grant permission to your terminal app:

  1. Open System Settings (or System Preferences on older macOS)
  2. Navigate to Privacy & Security > Screen Recording
  3. Enable your terminal application (Terminal, iTerm2, VS Code, etc.)
  4. Restart the terminal after granting permission

Without this permission, ffmpeg will fail with "Could not find video device" or similar errors.

Screen Device Auto-Detection

The tool automatically detects the correct screen capture device. On Macs with external displays or connected iPhones, the device number varies. To list available devices:

ffmpeg -f avfoundation -list_devices true -i ""

You can manually specify the device if needed:

marp2video slides video --input slides.md --output video.mp4 --screen-device "4:none"
Linux

Uses x11grab for screen capture:

ffmpeg -f x11grab -i ":0.0" ...
Windows

Uses gdigrab for screen capture:

ffmpeg -f gdigrab -i "desktop" ...
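The three platform capture commands differ only in their input arguments, which suggests a small helper. A sketch (function name is ours; device numbering on macOS varies by machine):

```go
package main

import (
	"fmt"
	"runtime"
)

// CaptureInput returns the ffmpeg input arguments for screen capture on
// each platform, mirroring the commands shown above.
func CaptureInput(goos, macDevice string) []string {
	switch goos {
	case "darwin":
		return []string{"-f", "avfoundation", "-i", macDevice + ":none"}
	case "linux":
		return []string{"-f", "x11grab", "-i", ":0.0"}
	case "windows":
		return []string{"-f", "gdigrab", "-i", "desktop"}
	}
	return nil // unsupported platform
}

func main() {
	fmt.Println(CaptureInput(runtime.GOOS, "1"))
}
```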

Output Format & Platform Compatibility

Videos are encoded with settings optimized for direct upload to YouTube and Udemy; no re-encoding is required.

Video Specifications
| Setting | Value | Notes |
|---|---|---|
| Container | MP4 | Universal compatibility |
| Video Codec | H.264 (libx264) | Required by YouTube & Udemy |
| Resolution | 1920x1080 | Full HD (configurable) |
| Frame Rate | 30fps | Standard (configurable) |
| Quality | CRF 23 | Good quality/size balance |
| Pixel Format | yuv420p | Maximum compatibility |
| Audio Codec | AAC | Required by both platforms |
| Audio Bitrate | 192kbps | Clear speech audio |
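These settings map to standard ffmpeg flags. A sketch of the argument list (illustrative; marp2video's exact invocation may differ, and flag order here is only one valid arrangement):

```go
package main

import "fmt"

// EncodeArgs builds ffmpeg arguments matching the specification table:
// H.264 at CRF 23, yuv420p, 30 fps, AAC audio at 192 kbps.
func EncodeArgs(in, out string) []string {
	return []string{
		"-i", in,
		"-c:v", "libx264", "-crf", "23", "-pix_fmt", "yuv420p",
		"-r", "30",
		"-c:a", "aac", "-b:a", "192k",
		out,
	}
}

func main() {
	fmt.Println(EncodeArgs("combined_raw.mp4", "output.mp4"))
}
```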
YouTube Upload

The combined video (--output) is ready for direct upload:

  • Includes optional crossfade transitions (--transition 0.5)
  • Single file containing all slides with narration
  • No processing or re-encoding needed
Udemy Upload

Individual slide videos (--output-individual) are designed for Udemy courses:

  • Each slide saved as separate file (slide_000.mp4, slide_001.mp4, etc.)
  • Upload as individual lectures in your course curriculum
  • Sequential naming for easy organization

Tip for Udemy: Udemy recommends lectures be 2+ minutes. For short slides, consider:

  • Adding longer pause directives ([PAUSE:5000])
  • Combining related slides into single lectures
  • Using more detailed voiceover scripts

Examples

The examples/ directory contains self-contained examples:

examples/
├── intro/                    # Introduction to marp2video
│   ├── presentation.md       # Marp markdown source (13 slides)
│   ├── transcript.json       # Multi-language transcript (en-US, en-GB, es-ES)
│   ├── README.md             # Detailed usage instructions
│   └── audio/                # Generated audio (after running tts)
│       ├── en-US/
│       │   ├── manifest.json
│       │   └── slide_*.mp3
│       └── es-ES/
│           ├── manifest.json
│           └── slide_*.mp3
└── README.md
Running the Intro Example

Option A: Full pipeline (inline voiceovers)

marp2video slides video \
  --input examples/intro/presentation.md \
  --output examples/intro/output.mp4

Option B: Two-step with transcript (multi-language)

# Generate audio for English
marp2video slides tts \
  --transcript examples/intro/transcript.json \
  --output examples/intro/audio/en-US/ \
  --lang en-US

# Generate video
marp2video slides video \
  --input examples/intro/presentation.md \
  --manifest examples/intro/audio/en-US/manifest.json \
  --output examples/intro/video/en-US.mp4

# Generate Spanish version
marp2video slides tts \
  --transcript examples/intro/transcript.json \
  --output examples/intro/audio/es-ES/ \
  --lang es-ES

marp2video slides video \
  --input examples/intro/presentation.md \
  --manifest examples/intro/audio/es-ES/manifest.json \
  --output examples/intro/video/es-ES.mp4

The intro example is a self-documenting presentation that explains what marp2video does, using marp2video itself.

Additional Example

See example_presentation.md for a complete example with:

  • Custom Marp theme
  • Voiceover comments on each slide
  • Pause directives for timing

Troubleshooting

"ffmpeg not found"

Install ffmpeg using your package manager (see Prerequisites).

"marp CLI not found"

Install Marp CLI: npm install -g @marp-team/marp-cli

"ElevenLabs API error"
  • Verify your API key is correct
  • Check your ElevenLabs account has sufficient credits
  • Ensure you have access to the voice ID you specified
Subtitle burning fails (--subtitles-burn)

The --subtitles-burn flag requires FFmpeg compiled with libass support. If you see an error like "FFmpeg subtitles filter not available", your FFmpeg installation needs to be updated.

Check if your FFmpeg has subtitle support:

ffmpeg -filters 2>&1 | grep subtitles

If nothing is returned, install FFmpeg with libass:

# macOS: Use homebrew-ffmpeg tap (includes libass by default)
brew uninstall ffmpeg
brew tap homebrew-ffmpeg/ffmpeg
brew install homebrew-ffmpeg/ffmpeg/ffmpeg

# Linux (Ubuntu/Debian)
sudo apt install ffmpeg libass-dev

# Verify installation
ffmpeg -filters 2>&1 | grep subtitles
# Should show: subtitles V->V Render text subtitles...

Alternative: Use --subtitles without --subtitles-burn to generate a separate .srt file that video players can load.

Recording issues on macOS

"recording failed: exit status 1" or ffmpeg errors:

  1. Grant Screen Recording permission (most common issue):

    • Go to System Settings > Privacy & Security > Screen Recording
    • Enable your terminal app (Terminal, iTerm2, VS Code, etc.)
    • Restart your terminal after granting permission
  2. Verify ffmpeg can access the screen:

    ffmpeg -f avfoundation -list_devices true -i ""
    

    You should see "Capture screen 0" or similar in the output.

  3. Use verbose mode to see ffmpeg errors:

    marp2video slides video --input slides.md --output video.mp4 --verbose
    
  4. Other tips:

    • Ensure the browser window is visible during recording
    • Try reducing video resolution if performance is poor
    • Manually specify screen device with --screen-device "1:none"

Development

Running Tests
go test ./...
Building
go build -o bin/marp2video ./cmd/marp2video

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - see LICENSE file for details

Acknowledgments

  • Marp - Markdown presentation ecosystem
  • OmniVoice - Unified TTS/STT provider interface
  • ElevenLabs - AI voice generation (TTS)
  • Deepgram - Speech-to-text (STT) for subtitles
  • Rod - Browser automation framework
  • ffmpeg - Multimedia processing
  • Marptalk - Node.js-based Marp-to-video tool using Google Cloud TTS. Features browser-based TTS fallback for quick iteration without API costs, YouTube chapter markers generation, and LLM-assisted presentation drafting.

Roadmap

  • Custom voice settings (stability, similarity, style)
  • Video transitions between slides
  • Individual slide video export (for Udemy)
  • JSON transcript for multi-language support
  • Decoupled TTS workflow (separate audio generation)
  • Audio manifest with timing information
  • Progress bar during conversion
  • Add subtitle/caption generation
  • Browser demo recording with voiceover
  • Multi-language video generation
  • TTS audio caching for faster iterations
  • Support for background music
  • Batch processing of multiple presentations
  • Web UI for easier configuration
  • Export to different video formats
  • Avatar integration (HeyGen, Synthesia)

Directories

cmd/marp2video       marp2video command
pkg/media            Package media provides utilities for working with audio and video files.
pkg/omnivoice/stt    Package stt provides OmniVoice-based speech-to-text for marp2video.
pkg/omnivoice/tts    Package tts provides OmniVoice-based text-to-speech for marp2video.
pkg/segment          Package segment provides abstractions for content units that can be rendered to video.
pkg/source           Package source provides interfaces and implementations for loading content from various sources (Marp markdown, transcript JSON, config YAML).
pkg/tts
