mdextract

v0.0.1
Published: Jan 17, 2026 | License: MIT | Imports: 4 | Imported by: 0

A Go module for extracting content under specific headings from markdown documents.

Features

  • Extract content under any markdown heading (# through ######)
  • Supports both string and stream input
  • Case-insensitive heading matching
  • Preserves formatting, code blocks, lists, and other markdown elements
  • Stops extraction at the next heading of the same or higher level
  • List all headings in a document

Installation

go get github.com/subhash/mdextract

Usage

Basic Example

package main

import (
    "fmt"
    "log"
    
    "github.com/subhash/mdextract"
)

func main() {
    markdown := `# My Document

## Introduction

This is the introduction section.
It has multiple paragraphs.

## Features

- Feature 1
- Feature 2
- Feature 3

## Conclusion

Final thoughts here.`

    extractor := mdextract.New(markdown)
    
    // Extract content under "## Features"
    content, err := extractor.GetContent("## Features")
    if err != nil {
        log.Fatal(err)
    }
    
    fmt.Println(content)
    // Output:
    // - Feature 1
    // - Feature 2
    // - Feature 3
}

Extract from Stream

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    
    "github.com/subhash/mdextract"
)

func main() {
    file, err := os.Open("document.md")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    
    scanner := bufio.NewScanner(file)
    extractor := mdextract.NewFromStream(scanner)
    
    content, err := extractor.GetContent("## Installation")
    if err != nil {
        log.Fatal(err)
    }
    
    fmt.Println(content)
}

Get All Headings

extractor := mdextract.New(markdown)
headings := extractor.GetAllHeadings()

for _, heading := range headings {
    fmt.Println(heading)
}
Nested Headings

When extracting content under a heading, lower-level headings and their content are included. Extraction stops when a heading of the same or higher level is encountered:

markdown := `## Section 1

Content before subsection.

### Subsection 1.1

Subsection content.

### Subsection 1.2

More subsection content.

## Section 2

Different section.`

extractor := mdextract.New(markdown)
content, err := extractor.GetContent("## Section 1")
if err != nil {
    log.Fatal(err)
}

fmt.Println(content)
// Output:
// Content before subsection.
// 
// ### Subsection 1.1
// 
// Subsection content.
// 
// ### Subsection 1.2
// 
// More subsection content.
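
The stopping rule described above can be sketched with the standard library alone. This is a minimal illustration and not the module's actual implementation: headingLevel and extractSection are hypothetical names, and the sketch does not treat fenced code blocks specially.

```go
package main

import (
	"fmt"
	"strings"
)

// headingLevel returns the number of leading '#' characters (1-6) if line is
// an ATX-style heading such as "## Title", or 0 if it is not a heading.
func headingLevel(line string) int {
	trimmed := strings.TrimSpace(line)
	level := 0
	for level < len(trimmed) && trimmed[level] == '#' {
		level++
	}
	if level == 0 || level > 6 {
		return 0
	}
	// The hashes must be followed by a space, unless the heading is empty.
	if level < len(trimmed) && trimmed[level] != ' ' {
		return 0
	}
	return level
}

// extractSection returns the text under the first heading equal to target
// (ignoring case), stopping at the next heading of the same or higher level.
// The boolean reports whether the heading was found.
func extractSection(markdown, target string) (string, bool) {
	targetLevel := headingLevel(target)
	if targetLevel == 0 {
		return "", false
	}
	var body []string
	found := false
	for _, line := range strings.Split(markdown, "\n") {
		level := headingLevel(line)
		if !found {
			if level == targetLevel &&
				strings.EqualFold(strings.TrimSpace(line), strings.TrimSpace(target)) {
				found = true
			}
			continue
		}
		// Fewer '#' characters means a higher-level heading.
		if level != 0 && level <= targetLevel {
			break
		}
		body = append(body, line)
	}
	return strings.TrimSpace(strings.Join(body, "\n")), found
}

func main() {
	doc := "## Section 1\n\nIntro.\n\n### Subsection 1.1\n\nDetails.\n\n## Section 2\n\nOther."
	content, ok := extractSection(doc, "## section 1")
	fmt.Println(ok)
	fmt.Println(content)
}
```

In this sketch the level-3 subsection stays in the result because its level number is greater than the target's, while "## Section 2" ends the extraction.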

API

New(markdown string) *Extractor

Creates a new Extractor from a markdown string.

NewFromStream(scanner *bufio.Scanner) *Extractor

Creates a new Extractor from a buffered scanner (useful for reading from files or streams).

GetContent(heading string) (string, error)

Extracts content under a specific heading until the next heading of the same or higher level.

  • heading: The heading to search for (e.g., "## Section Name")
  • Returns: The content without the heading itself, or an error if the heading is not found
  • Heading matching is case-insensitive
  • Content extraction stops at the next heading of equal or higher level
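
Since a missing heading is reported through the error value, a caller can try several candidate headings in turn. The sketch below relies only on the documented GetContent signature. The contentGetter interface, the stubDoc type, and firstAvailable are hypothetical names introduced for illustration, and stubDoc exists only so the example runs without the module.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// contentGetter mirrors the documented GetContent signature, so a
// *mdextract.Extractor would satisfy it. The interface itself is hypothetical.
type contentGetter interface {
	GetContent(heading string) (string, error)
}

// stubDoc is a stand-in used only to make this example runnable.
type stubDoc map[string]string

// GetContent looks up a heading, ignoring case, and returns an error
// when no entry matches.
func (d stubDoc) GetContent(heading string) (string, error) {
	for h, body := range d {
		if strings.EqualFold(h, heading) {
			return body, nil
		}
	}
	return "", errors.New("heading not found: " + heading)
}

// firstAvailable returns the content of the first heading that exists,
// or the last error seen if none of them do.
func firstAvailable(src contentGetter, headings ...string) (string, error) {
	err := errors.New("no headings given")
	for _, h := range headings {
		content, getErr := src.GetContent(h)
		if getErr == nil {
			return content, nil
		}
		err = getErr
	}
	return "", err
}

func main() {
	doc := stubDoc{"## Installation": "go get example.com/pkg"}
	content, err := firstAvailable(doc, "## Setup", "## installation")
	fmt.Println(content, err)
}
```

Accepting a small interface instead of the concrete type keeps the fallback logic testable without a real markdown document.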

GetAllHeadings() []string

Returns all headings found in the document.

Testing

Run the test suite:

go test

Run with verbose output:

go test -v

Run benchmarks:

go test -bench=.

License

MIT

Documentation

Types

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor provides methods to extract content from markdown documents.

func New

func New(markdown string) *Extractor

New creates a new Extractor from a markdown string.

func NewFromStream

func NewFromStream(scanner *bufio.Scanner) *Extractor

NewFromStream creates a new Extractor from a *bufio.Scanner, which can wrap any io.Reader such as a file or network stream.

func (*Extractor) GetAllHeadings

func (e *Extractor) GetAllHeadings() []string

GetAllHeadings returns all headings in the document.

func (*Extractor) GetContent

func (e *Extractor) GetContent(heading string) (string, error)

GetContent extracts the content under a specific heading, stopping at the next heading of the same or higher level. The heading should be in the format "# Heading", "## Heading", etc. It returns the content without the heading itself, or an error if the heading is not found.
