docxsmith

module
v1.0.1
Published: Nov 7, 2025 License: MIT

README

DocxSmith - The Document Forge

A powerful and elegant Go library and CLI tool for manipulating .docx and .pdf files

Features

DOCX Support
  • Create new .docx documents from scratch
  • Read and parse existing .docx files
  • Modify document content programmatically
  • Add paragraphs with rich formatting (bold, italic, colors, sizes)
  • Delete paragraphs or ranges of content
  • Find and replace text throughout documents
  • Table support (create, modify, delete)
  • Extract text content from documents
PDF Support ✨ NEW
  • Create new PDF documents from scratch
  • Read and parse existing PDF files
  • Add text content with styling (bold, italic, colors, sizes)
  • Extract text from PDFs
  • Table support in PDF generation
  • Metadata management (title, author, subject)
Format Conversion
  • Convert DOCX to PDF with formatting preservation
  • Convert PDF to DOCX for editing
Additional Features
  • CLI tool for command-line operations
  • Scalable architecture for easy extension
  • Well-tested with comprehensive test coverage

Installation

As a Library
go get github.com/Palaciodiego008/docxsmith
As a CLI Tool
go install github.com/Palaciodiego008/docxsmith/cmd/docxsmith@latest

Or build from source:

git clone https://github.com/Palaciodiego008/docxsmith.git
cd docxsmith
go build -o docxsmith ./cmd/docxsmith

Quick Start

Using as a Library
package main

import (
    "log"

    "github.com/Palaciodiego008/docxsmith/pkg/docx"
)

func main() {
    // Create a new document
    doc := docx.New()

    // Add content
    doc.AddParagraph("Welcome to DocxSmith!")
    doc.AddParagraph("This is bold text", docx.WithBold())
    doc.AddParagraph("This is colored text", docx.WithColor("FF0000"))

    // Save the document
    if err := doc.Save("output.docx"); err != nil {
        log.Fatal(err)
    }
}
Using the CLI
DOCX Operations
# Create a new document
docxsmith create -output hello.docx -text "Hello, World!"

# Add content to an existing document
docxsmith add -input hello.docx -output hello2.docx -text "New paragraph" -bold

# Find text in a document
docxsmith find -input hello.docx -text "World"

# Replace text
docxsmith replace -input hello.docx -output hello3.docx -old "World" -new "DocxSmith"

# Extract text
docxsmith extract -input hello.docx

# Create a table
docxsmith table -input hello.docx -output table.docx -create -rows 3 -cols 4
PDF Operations ✨
# Create a new PDF
docxsmith pdf-create -output hello.pdf -text "Hello PDF!" -title "My Document"

# Add content to a PDF
docxsmith pdf-add -input hello.pdf -output hello2.pdf -text "New content" -bold -size 14

# Extract text from PDF
docxsmith pdf-extract -input document.pdf

# Get PDF information
docxsmith pdf-info -input document.pdf
Format Conversion
# Convert DOCX to PDF
docxsmith convert -input document.docx -output document.pdf

# Convert PDF to DOCX
docxsmith convert -input document.pdf -output document.docx

# Convert with custom options
docxsmith convert -input doc.docx -output doc.pdf -font-size 14 -font-family "Times"
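
The same convert subcommand appears in both directions above, so the direction is presumably inferred from the file names. The README does not say how. The sketch below shows one plausible approach based on file extensions; detectDirection and the direction type are hypothetical names for illustration, not part of DocxSmith.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// direction is a hypothetical enum for this sketch; it is not a DocxSmith type.
type direction int

const (
	docxToPDF direction = iota
	pdfToDocx
)

// detectDirection picks a conversion direction from the two file extensions,
// ignoring case. Any other pairing is reported as an error.
func detectDirection(input, output string) (direction, error) {
	in := strings.ToLower(filepath.Ext(input))
	out := strings.ToLower(filepath.Ext(output))
	switch {
	case in == ".docx" && out == ".pdf":
		return docxToPDF, nil
	case in == ".pdf" && out == ".docx":
		return pdfToDocx, nil
	}
	return 0, fmt.Errorf("unsupported conversion: %q to %q", in, out)
}

func main() {
	pairs := [][2]string{
		{"document.docx", "document.pdf"},
		{"document.pdf", "document.docx"},
		{"notes.txt", "notes.pdf"},
	}
	for _, pair := range pairs {
		dir, err := detectDirection(pair[0], pair[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s: direction %d\n", pair[0], pair[1], dir)
	}
}
```

Rejecting unknown pairings up front keeps a typo in a file name from silently producing the wrong kind of output.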

Library API

Creating Documents
// Create a new empty document
doc := docx.New()

// Create from an existing template
doc, err := docx.CreateFromTemplate("template.docx")

// Open an existing document
doc, err = docx.Open("existing.docx")
Working with Paragraphs
// Add a simple paragraph
doc.AddParagraph("Simple text")

// Add with formatting
doc.AddParagraph("Bold text", docx.WithBold())
doc.AddParagraph("Italic text", docx.WithItalic())
doc.AddParagraph("Colored text", docx.WithColor("0000FF"))
doc.AddParagraph("Large text", docx.WithSize("32"))
doc.AddParagraph("Centered text", docx.WithAlignment("center"))

// Combine multiple options
doc.AddParagraph("Fancy text",
    docx.WithBold(),
    docx.WithItalic(),
    docx.WithColor("FF0000"),
    docx.WithSize("28"))

// Add paragraph at specific position
doc.AddParagraphAt(2, "Inserted text")

// Delete a paragraph
doc.DeleteParagraph(0)

// Delete a range of paragraphs
doc.DeleteParagraphsRange(0, 5)
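
Calls such as docx.WithBold() and docx.WithColor("FF0000") look like Go's functional options pattern: each option is a function that adjusts a style value, and AddParagraph applies them in order. The sketch below illustrates that pattern in isolation. The runStyle struct, the Option type, and newStyle are assumptions made for this example; the library's actual types may differ.

```go
package main

import "fmt"

// runStyle is a hypothetical stand-in for whatever DocxSmith stores per paragraph.
type runStyle struct {
	Bold   bool
	Italic bool
	Color  string // hex without "#", e.g. "FF0000"
	Size   string // passed through as given, e.g. "28"
}

// Option mutates a style; options are applied left to right.
type Option func(*runStyle)

func WithBold() Option            { return func(s *runStyle) { s.Bold = true } }
func WithItalic() Option          { return func(s *runStyle) { s.Italic = true } }
func WithColor(hex string) Option { return func(s *runStyle) { s.Color = hex } }
func WithSize(size string) Option { return func(s *runStyle) { s.Size = size } }

// newStyle starts from the zero value and applies every option in turn.
func newStyle(opts ...Option) runStyle {
	var s runStyle
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	style := newStyle(WithBold(), WithItalic(), WithColor("FF0000"), WithSize("28"))
	fmt.Printf("%+v\n", style)
}
```

The pattern lets a call site list only the formatting it needs, and new options can be added later without changing the function signature.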
Text Operations
// Find text in document
indices := doc.FindText("search term")
// Returns slice of paragraph indices where text was found

// Replace all occurrences
count := doc.ReplaceText("old", "new")

// Replace in specific paragraph
doc.ReplaceTextInParagraph(2, "old", "new")

// Get all text content
text := doc.GetText()

// Get text from specific paragraph
text, err := doc.GetParagraphText(0)
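
To make the return values above concrete, here is a minimal sketch of paragraph-level find and replace over a plain []string. It assumes case-sensitive substring matching and a replacement count measured in occurrences; the README confirms neither, and findText and replaceText are hypothetical helpers rather than the library's code.

```go
package main

import (
	"fmt"
	"strings"
)

// findText returns the indices of paragraphs that contain term.
func findText(paragraphs []string, term string) []int {
	var indices []int
	for i, p := range paragraphs {
		if strings.Contains(p, term) {
			indices = append(indices, i)
		}
	}
	return indices
}

// replaceText rewrites paragraphs in place and returns how many
// occurrences were replaced. An empty search string is a no-op.
func replaceText(paragraphs []string, old, replacement string) int {
	if old == "" {
		return 0
	}
	count := 0
	for i, p := range paragraphs {
		if n := strings.Count(p, old); n > 0 {
			paragraphs[i] = strings.ReplaceAll(p, old, replacement)
			count += n
		}
	}
	return count
}

func main() {
	paragraphs := []string{"Hello, World!", "No match here", "World after World"}
	fmt.Println(findText(paragraphs, "World"))
	fmt.Println(replaceText(paragraphs, "World", "DocxSmith"))
	fmt.Println(paragraphs)
}
```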
Working with Tables
// Create a table
table := doc.AddTable(3, 4) // 3 rows, 4 columns

// Set cell content
table.SetCellText(0, 0, "Header 1")
table.SetCellText(0, 1, "Header 2")

// Get cell content
text, err := table.GetCellText(1, 1)

// Add a row
table.AddRow()

// Delete a row
table.DeleteRow(1)

// Get table dimensions
rows := table.GetRowCount()
cols := table.GetColumnCount()

// Delete entire table
doc.DeleteTable(0)
Document Information
// Get counts
paraCount := doc.GetParagraphCount()
tableCount := doc.GetTableCount()

// Clear all content
doc.Clear()

// Clone document
newDoc := doc.Clone()
Saving Documents
// Save to file
err := doc.Save("output.docx")

// Save to a different file
err = doc.SaveAs("copy.docx")

// Get document as bytes
data, err := doc.ToBytes()

PDF Library API ✨

Creating PDF Documents
import "github.com/Palaciodiego008/docxsmith/pkg/pdf"

// Create a new PDF
pdfDoc := pdf.New()

// Set metadata
pdfDoc.SetMetadata("My Document", "Author Name", "Subject")

// Add a page
page := pdfDoc.AddPage()

// Add text
page.AddText("Hello PDF", 20, 30, 12)

// Add styled text
style := pdf.TextStyle{
    FontSize:   14,
    FontFamily: "Arial",
    Bold:       true,
    Italic:     false,
    Color:      "FF0000", // Red
}
page.AddTextStyled("Important Text", 20, 50, style)

// Save
pdfDoc.Save("output.pdf")
Reading PDF Documents
// Open existing PDF
pdfDoc, err := pdf.Open("document.pdf")

// Get page count
pageCount := pdfDoc.GetPageCount()

// Extract all text
text := pdfDoc.GetAllText()

// Get specific page
page, err := pdfDoc.GetPage(0)
pageText := page.GetText()
Converting Between Formats
import "github.com/Palaciodiego008/docxsmith/pkg/converter"

// Convert DOCX to PDF
opts := converter.DefaultOptions()
opts.FontSize = 12
opts.FontFamily = "Arial"

err := converter.ConvertDocxToPDF("input.docx", "output.pdf", opts)

// Convert PDF to DOCX
err = converter.ConvertPDFToDocx("input.pdf", "output.docx", opts)

CLI Commands

create - Create a new document
docxsmith create -output file.docx [-text "content"]

Options:

  • -output: Output file path (required)
  • -text: Initial text content (optional)
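
The single-dash flags used throughout match the conventions of Go's standard flag package. As a hedged illustration of how a subcommand like create could be routed and validated, the sketch below uses flag.NewFlagSet. The run function and its messages are assumptions for this example and do not reproduce DocxSmith's internal/cli code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// run dispatches on the first argument and parses the rest with a
// per-command FlagSet. Only "create" is sketched here.
func run(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("usage: docxsmith <command> [options]")
	}
	switch args[0] {
	case "create":
		fs := flag.NewFlagSet("create", flag.ContinueOnError)
		fs.SetOutput(io.Discard) // suppress the flag package's own error printing
		output := fs.String("output", "", "output file path (required)")
		text := fs.String("text", "", "initial text content (optional)")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		if *output == "" {
			return "", errors.New("create: -output is required")
		}
		// A real implementation would build and save the document here.
		return fmt.Sprintf("create %s (text: %q)", *output, *text), nil
	default:
		return "", fmt.Errorf("unknown command %q", args[0])
	}
}

func main() {
	msg, err := run([]string{"create", "-output", "hello.docx", "-text", "Hello, World!"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```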
add - Add content
docxsmith add -input in.docx -output out.docx -text "content" [options]

Options:

  • -input: Input file path (required)
  • -output: Output file path (required)
  • -text: Text to add (required)
  • -at: Insert at specific index (optional)
  • -bold: Make text bold
  • -italic: Make text italic
  • -size: Font size in half-points (e.g., "24" for 12pt)
  • -color: Text color as a six-digit hex value without # (e.g., FF0000)
  • -align: Alignment (left, center, right, or both for justified)
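
The -size example ("24" for 12pt) implies half-point units, which is how Office Open XML stores font sizes. The helpers below sketch how a caller might prepare -size and -color values. halfPoints and normalizeColor are hypothetical names, and stripping a leading "#" is a convenience assumed here, not documented DocxSmith behavior.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// halfPoints converts a size in points to the half-point string form,
// rounding to the nearest half point.
func halfPoints(pt float64) string {
	return strconv.Itoa(int(math.Round(pt * 2)))
}

// normalizeColor trims whitespace and an optional leading "#", upper-cases
// the value, and checks that exactly six hex digits remain.
func normalizeColor(s string) (string, error) {
	s = strings.ToUpper(strings.TrimPrefix(strings.TrimSpace(s), "#"))
	if len(s) != 6 {
		return "", fmt.Errorf("color %q must have six hex digits", s)
	}
	if _, err := strconv.ParseUint(s, 16, 32); err != nil {
		return "", fmt.Errorf("color %q is not valid hex", s)
	}
	return s, nil
}

func main() {
	fmt.Println(halfPoints(12))
	fmt.Println(normalizeColor("#ff0000"))
	fmt.Println(normalizeColor("red"))
}
```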
delete - Delete content
docxsmith delete -input in.docx -output out.docx [options]

Options:

  • -input: Input file path (required)
  • -output: Output file path (required)
  • -paragraph: Paragraph index to delete
  • -start & -end: Delete range of paragraphs
  • -table: Table index to delete
replace - Replace text
docxsmith replace -input in.docx -output out.docx -old "text" -new "replacement"

Options:

  • -input: Input file path (required)
  • -output: Output file path (required)
  • -old: Text to replace (required)
  • -new: Replacement text (required)
  • -paragraph: Only replace in specific paragraph
find - Find text
docxsmith find -input file.docx -text "search"

Options:

  • -input: Input file path (required)
  • -text: Text to find (required)
extract - Extract text
docxsmith extract -input file.docx [-output text.txt]

Options:

  • -input: Input file path (required)
  • -output: Output text file (optional, prints to stdout if omitted)
table - Table operations
docxsmith table -input in.docx -output out.docx [options]

Options:

  • -input: Input file path (required)
  • -output: Output file path (required)
  • -create: Create a new table
  • -rows: Number of rows (default: 2)
  • -cols: Number of columns (default: 2)
  • -set: Set cell text (format: "tableIdx,row,col,text")
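
The -set value packs four fields into one string. A minimal parsing sketch follows, assuming zero-based non-negative indices and that any commas after the third belong to the cell text; the README specifies neither point, and cellSpec and parseCellSpec are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cellSpec holds the parsed pieces of a "tableIdx,row,col,text" value.
type cellSpec struct {
	Table int
	Row   int
	Col   int
	Text  string
}

// parseCellSpec splits on the first three commas only, so the text
// field may itself contain commas.
func parseCellSpec(s string) (cellSpec, error) {
	parts := strings.SplitN(s, ",", 4)
	if len(parts) != 4 {
		return cellSpec{}, fmt.Errorf("expected tableIdx,row,col,text but got %q", s)
	}
	var nums [3]int
	for i := 0; i < 3; i++ {
		n, err := strconv.Atoi(strings.TrimSpace(parts[i]))
		if err != nil || n < 0 {
			return cellSpec{}, fmt.Errorf("field %d of %q must be a non-negative integer", i+1, s)
		}
		nums[i] = n
	}
	return cellSpec{Table: nums[0], Row: nums[1], Col: nums[2], Text: parts[3]}, nil
}

func main() {
	spec, err := parseCellSpec("0,1,2,Total, incl. tax")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", spec)
}
```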
info - Document information
docxsmith info -input file.docx

Options:

  • -input: Input file path (required)
clear - Clear all content
docxsmith clear -input in.docx -output out.docx

Options:

  • -input: Input file path (required)
  • -output: Output file path (required)

Examples

See the examples directory for more comprehensive examples:

# Run the basic usage example
cd examples
go run basic_usage.go

This will generate several example documents demonstrating various features.

Testing

Run the test suite:

go test ./...

Run tests with coverage:

go test -cover ./...

Run tests with verbose output:

go test -v ./pkg/docx

Project Structure

docxsmith/
├── cmd/
│   └── docxsmith/          # CLI entry point
│       └── main.go         # Minimal main function
├── internal/
│   └── cli/                # CLI command implementations
│       ├── cli.go          # CLI router and usage
│       ├── create.go       # Create command
│       ├── content.go      # Add, delete, clear commands
│       ├── text.go         # Find, replace, extract commands
│       ├── table.go        # Table operations
│       └── info.go         # Info command
├── pkg/
│   └── docx/               # Core library (public API)
│       ├── document.go     # Document structure
│       ├── reader.go       # Reading .docx files
│       ├── writer.go       # Writing .docx files
│       ├── operations.go   # Document operations
│       ├── table.go        # Table operations
│       ├── creator.go      # Document creation
│       └── *_test.go       # Tests
├── examples/               # Usage examples
├── testdata/               # Test fixtures
├── go.mod
└── README.md

How It Works

A .docx file is a ZIP archive of XML parts. To edit one, DocxSmith:

  1. Unzips the .docx file
  2. Parses the XML content (mainly word/document.xml)
  3. Manipulates the XML structure
  4. Serializes back to XML
  5. Repackages as a ZIP file with .docx extension

The library handles all the complexity of the Office Open XML format while providing a simple, intuitive API.
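
The steps above can be tried with the standard library alone. The sketch below is not DocxSmith's implementation: it writes a tiny ZIP archive holding only word/document.xml (not a complete .docx that Word would open), reads that part back, and collects the text inside w:t elements. The helper names writeArchive, readDocumentXML, and extractText are made up for this example.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// writeArchive packs the given parts into an in-memory ZIP archive.
func writeArchive(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(content); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readDocumentXML opens the archive and returns the raw word/document.xml part.
func readDocumentXML(archive []byte) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	for _, f := range zr.File {
		if f.Name != "word/document.xml" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		defer rc.Close()
		return io.ReadAll(rc)
	}
	return nil, fmt.Errorf("word/document.xml not found")
}

// extractText concatenates character data inside <t> elements and ends each
// <p> with a newline. It matches on local names only, ignoring namespaces.
func extractText(documentXML []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(documentXML))
	var sb strings.Builder
	inText := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			inText = t.Name.Local == "t"
		case xml.EndElement:
			inText = false
			if t.Name.Local == "p" {
				sb.WriteByte('\n')
			}
		case xml.CharData:
			if inText {
				sb.Write(t)
			}
		}
	}
	return sb.String(), nil
}

func main() {
	const body = `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
		`<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p><w:p><w:r><w:t>World</w:t></w:r></w:p></w:body></w:document>`
	archive, err := writeArchive(map[string][]byte{"word/document.xml": []byte(body)})
	if err != nil {
		panic(err)
	}
	raw, err := readDocumentXML(archive)
	if err != nil {
		panic(err)
	}
	text, err := extractText(raw)
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```

A real .docx also carries parts such as [Content_Types].xml and relationship files, which a library has to preserve when repackaging.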

Limitations

  • Currently focuses on document content (paragraphs and tables)
  • Advanced features like images, charts, and headers/footers are not yet supported
  • Complex formatting and styles have limited support
  • Does not preserve all metadata from original documents

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - feel free to use this project for any purpose.

Author

Diego Palacio (@Palaciodiego008)

Acknowledgments

  • Built with Go's standard library
  • Inspired by the need for simple .docx manipulation
  • Name inspired by blacksmiths who forge powerful tools

DocxSmith - Forging documents with precision and elegance.

Directories

Path            Synopsis
cmd/docxsmith   command
internal/cli
pkg/pdf
