excelmetadata

package module
v1.2.6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 22, 2025 License: MIT Imports: 9 Imported by: 2

README

ExcelMetadata

A Go library for extracting comprehensive metadata from Excel files (.xlsx) using excelize. This library allows you to extract all structural information, styles, formulas, and content from Excel files and export them as JSON.

Features

  • 📊 Complete Excel Structure Extraction

    • Document properties (title, author, dates, etc.)
    • Sheet information (names, visibility, dimensions)
    • Cell data, formulas, and types
    • Merged cells with values
    • Row heights and column widths
  • 🎨 Style Information

    • Font styles (bold, italic, color, size, etc.)
    • Fill patterns and colors
    • Borders and alignment
    • Number formats
    • Cell protection settings
  • 🔗 Rich Content Support

    • Hyperlinks
    • Data validations
    • Images with formatting details
    • Named ranges (defined names)
  • Performance Options

    • Configurable extraction options
    • Cell limit per sheet
    • Selective feature extraction

Installation

go get github.com/prongbang/excelmetadata
CLI
go install github.com/prongbang/excelmetadata/cmd/excelmetadata@v1.2.6
  • Extraction

GO

excelmetadata extract -o sample_metadata.go sample.xlsx

JSON

excelmetadata extract -o sample.metadata.json sample.xlsx

Requirements

  • Go 1.18 or higher
  • github.com/xuri/excelize/v2 v2.9.1+

Quick Start

Basic Usage
package main

import (
    "fmt"
    "log"
    "github.com/prongbang/excelmetadata"
)

func main() {
    // Extract metadata from Excel file
    metadata, err := excelmetadata.QuickExtract("sample.xlsx")
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("File: %s\n", metadata.Filename)
    fmt.Printf("Created by: %s\n", metadata.Properties.Creator)
    fmt.Printf("Sheets: %d\n", len(metadata.Sheets))
}
Export to JSON
// Export to JSON string
jsonStr, err := excelmetadata.QuickExtractToJSON("sample.xlsx", true)
if err != nil {
    log.Fatal(err)
}
fmt.Println(jsonStr)

// Export directly to file
err = excelmetadata.QuickExtractToFile("sample.xlsx", "metadata.json", true)
if err != nil {
    log.Fatal(err)
}
Advanced Usage with Options
// Configure extraction options
options := &excelmetadata.Options{
    IncludeCellData:       true,
    IncludeStyles:         true,
    IncludeImages:         true,
    IncludeDefinedNames:   true,
    IncludeDataValidation: true,
    MaxCellsPerSheet:      1000, // Limit cells per sheet
}

// Create extractor with options
extractor, err := excelmetadata.New("large_file.xlsx", options)
if err != nil {
    log.Fatal(err)
}
defer extractor.Close()

// Extract metadata
metadata, err := extractor.Extract()
if err != nil {
    log.Fatal(err)
}

// Process metadata...

Data Structures

Metadata

The main structure containing all extracted information:

type Metadata struct {
    Filename     string               // Original filename
    Properties   DocumentProperties   // Document properties
    Sheets       []SheetMetadata      // Sheet information
    DefinedNames []DefinedName        // Named ranges
    Styles       map[int]StyleDetails // Unique styles
    ExtractedAt  time.Time           // Extraction timestamp
}
SheetMetadata

Information about each sheet:

type SheetMetadata struct {
    Index           int                // Sheet index
    Name            string             // Sheet name
    Visible         bool               // Visibility status
    Dimensions      SheetDimensions    // Used range
    MergedCells     []MergedCell       // Merged cells
    DataValidations []DataValidation   // Validation rules
    Protection      *SheetProtection   // Protection settings
    RowHeights      map[int]float64    // Custom row heights
    ColWidths       map[string]float64 // Custom column widths
    Cells           []CellMetadata     // Cell data
    Images          []ImageMetadata    // Embedded images
}
CellMetadata

Individual cell information:

type CellMetadata struct {
    Address   string            // Cell address (e.g., "A1")
    Value     interface{}       // Cell value
    Formula   string            // Formula if present
    StyleID   int               // Style reference
    Type      excelize.CellType // Cell type
    Hyperlink *Hyperlink        // Hyperlink if present
}

Extraction Options

Option Description Default
IncludeCellData Extract cell values and formulas true
IncludeStyles Extract style information true
IncludeImages Extract embedded images true
IncludeDefinedNames Extract named ranges true
IncludeDataValidation Extract data validation rules true
MaxCellsPerSheet Maximum cells to extract per sheet (0 = unlimited) 0

JSON Output Example

{
  "filename": "sample.xlsx",
  "properties": {
    "title": "Sales Report",
    "creator": "John Doe",
    "created": "2024-01-15T10:30:00Z",
    "modified": "2024-01-20T14:45:00Z"
  },
  "sheets": [
    {
      "index": 0,
      "name": "Sheet1",
      "visible": true,
      "dimensions": {
        "startCell": "A1",
        "endCell": "D10",
        "rowCount": 10,
        "colCount": 4
      },
      "cells": [
        {
          "address": "A1",
          "value": "Product",
          "styleId": 1,
          "type": 3
        },
        {
          "address": "B2",
          "value": "100",
          "formula": "SUM(C2:D2)",
          "styleId": 2,
          "type": 2
        }
      ],
      "mergedCells": [
        {
          "startCell": "A1",
          "endCell": "B1",
          "value": "Header"
        }
      ]
    }
  ],
  "styles": {
    "1": {
      "font": {
        "bold": true,
        "size": 14,
        "color": "#000000"
      },
      "fill": {
        "type": "pattern",
        "pattern": 1,
        "color": ["#E0E0E0"]
      }
    }
  },
  "extractedAt": "2024-01-20T15:00:00Z"
}

Use Cases

  1. Excel to JSON Conversion - Convert Excel files to JSON for web applications
  2. Document Analysis - Analyze Excel file structure and complexity
  3. Version Control - Track changes in Excel files by comparing metadata
  4. Data Migration - Extract data with complete formatting information
  5. Backup and Archival - Create searchable metadata for Excel archives
  6. Template Generation - Extract structure to create new Excel templates

Performance Considerations

  • For large files, use MaxCellsPerSheet to limit extraction
  • Disable unnecessary features (styles, images) for faster extraction
  • Image extraction includes binary data, which can significantly increase JSON size

Limitations

  • Currently supports .xlsx files only (not .xls)
  • Some advanced Excel features may not be captured
  • Images are extracted as binary data (base64 encoded in JSON)

Error Handling

The library provides detailed error messages:

extractor, err := excelmetadata.New("file.xlsx", nil)
if err != nil {
    // Handle file opening errors
    log.Printf("Failed to open file: %v", err)
    return
}
defer extractor.Close()

metadata, err := extractor.Extract()
if err != nil {
    // Handle extraction errors
    log.Printf("Failed to extract metadata: %v", err)
    return
}

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

This project uses the excelize library, which is licensed under the BSD 3-Clause License.

Acknowledgments

  • Built on top of excelize
  • Inspired by the need for better Excel file introspection tools

Example Projects

1. Excel Diff Tool
// Compare two Excel files
metadata1, _ := excelmetadata.QuickExtract("version1.xlsx")
metadata2, _ := excelmetadata.QuickExtract("version2.xlsx")

// Compare sheet counts, cell values, styles, etc.
2. Excel Search Engine
// Index Excel files for searching
files := []string{"report1.xlsx", "report2.xlsx", "report3.xlsx"}
var index []excelmetadata.Metadata

for _, file := range files {
    metadata, _ := excelmetadata.QuickExtract(file)
    index = append(index, *metadata)
}

// Search through metadata for specific content
3. Excel Structure Validator
// Validate Excel files against a template
template, _ := excelmetadata.QuickExtract("template.xlsx")
submission, _ := excelmetadata.QuickExtract("submission.xlsx")

// Check if submission matches template structure
if len(template.Sheets) != len(submission.Sheets) {
    log.Println("Sheet count mismatch")
}

Support

For issues, questions, or contributions, please visit the GitHub repository.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func QuickExtractToFile

func QuickExtractToFile(excelFile, file string, pretty bool) error

QuickExtractToFile is a convenience function for extracting to a JSON file

func QuickExtractToGO added in v1.2.3

func QuickExtractToGO(filename string) (string, error)

QuickExtractToGO is a convenience function for extracting to GO

func QuickExtractToJSON

func QuickExtractToJSON(filename string, pretty bool) (string, error)

QuickExtractToJSON is a convenience function for extracting to JSON

Types

type AlignmentStyle

type AlignmentStyle struct {
	Horizontal   string `json:"horizontal,omitempty"`
	Vertical     string `json:"vertical,omitempty"`
	WrapText     bool   `json:"wrapText,omitempty"`
	TextRotation int    `json:"textRotation,omitempty"`
	Indent       int    `json:"indent,omitempty"`
	ShrinkToFit  bool   `json:"shrinkToFit,omitempty"`
}

AlignmentStyle represents text alignment

type BorderStyle

type BorderStyle struct {
	Type  string `json:"type,omitempty"`
	Color string `json:"color,omitempty"`
	Style int    `json:"style,omitempty"`
}

BorderStyle represents cell border formatting

type CellMetadata

type CellMetadata struct {
	Address   string            `json:"address"`
	Value     interface{}       `json:"value,omitempty"`
	Formula   string            `json:"formula,omitempty"`
	StyleID   int               `json:"styleId,omitempty"`
	Type      excelize.CellType `json:"type"`
	Hyperlink *Hyperlink        `json:"hyperlink,omitempty"`
}

CellMetadata contains metadata for a single cell

type DataValidation

type DataValidation struct {
	Range        string  `json:"range"`
	Type         string  `json:"type"`
	Operator     string  `json:"operator,omitempty"`
	Formula1     string  `json:"formula1,omitempty"`
	Formula2     string  `json:"formula2,omitempty"`
	ShowError    bool    `json:"showError"`
	ErrorTitle   *string `json:"errorTitle,omitempty"`
	ErrorMessage *string `json:"errorMessage,omitempty"`
}

DataValidation represents data validation rules

type DefinedName

type DefinedName struct {
	Name     string `json:"name"`
	RefersTo string `json:"refersTo"`
	Scope    string `json:"scope,omitempty"`
}

DefinedName represents a named range

type DocumentProperties

type DocumentProperties struct {
	Title          string `json:"title,omitempty"`
	Subject        string `json:"subject,omitempty"`
	Creator        string `json:"creator,omitempty"`
	Keywords       string `json:"keywords,omitempty"`
	Description    string `json:"description,omitempty"`
	LastModifiedBy string `json:"lastModifiedBy,omitempty"`
	Category       string `json:"category,omitempty"`
	Version        string `json:"version,omitempty"`
	Created        string `json:"created,omitempty"`
	Modified       string `json:"modified,omitempty"`
}

DocumentProperties contains Excel document properties

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor is the main interface for extracting Excel metadata

func New

func New(filename string, options *Options) (*Extractor, error)

New creates a new Extractor instance

func (*Extractor) Close

func (e *Extractor) Close() error

Close closes the underlying Excel file

func (*Extractor) Extract

func (e *Extractor) Extract() (*Metadata, error)

Extract performs the metadata extraction

func (*Extractor) ExtractToFile

func (e *Extractor) ExtractToFile(outputPath string, pretty bool) error

ExtractToFile extracts metadata and saves it to a JSON or GO file

func (*Extractor) ExtractToGO added in v1.2.3

func (e *Extractor) ExtractToGO() (string, error)

ExtractToGO extracts metadata and returns it as GO string

func (*Extractor) ExtractToJSON

func (e *Extractor) ExtractToJSON(pretty bool) (string, error)

ExtractToJSON extracts metadata and returns it as JSON string

type FillStyle

type FillStyle struct {
	Type    string   `json:"type,omitempty"`
	Pattern int      `json:"pattern,omitempty"`
	Color   []string `json:"color,omitempty"`
}

FillStyle represents cell fill formatting

type FontStyle

type FontStyle struct {
	Bold      bool    `json:"bold,omitempty"`
	Italic    bool    `json:"italic,omitempty"`
	Underline string  `json:"underline,omitempty"`
	Strike    bool    `json:"strike,omitempty"`
	Family    string  `json:"family,omitempty"`
	Size      float64 `json:"size,omitempty"`
	Color     string  `json:"color,omitempty"`
}

FontStyle represents font formatting

type Hyperlink struct {
	Link string `json:"link"`
}

Hyperlink represents a cell hyperlink

type ImageFormat

type ImageFormat struct {
	AltText             string
	PrintObject         *bool
	Locked              *bool
	LockAspectRatio     bool
	AutoFit             bool
	AutoFitIgnoreAspect bool
	OffsetX             int
	OffsetY             int
	ScaleX              float64
	ScaleY              float64
	Hyperlink           string
	HyperlinkType       string
	Positioning         string
}

type ImageMetadata

type ImageMetadata struct {
	Cell       string       `json:"cell"`
	File       []byte       `json:"file"`
	Extension  string       `json:"extension"`
	InsertType byte         `json:"insertType"`
	Format     *ImageFormat `json:"format"`
}

ImageMetadata represents image information

type MergedCell

type MergedCell struct {
	StartCell string `json:"startCell"`
	EndCell   string `json:"endCell"`
	Value     string `json:"value,omitempty"`
}

MergedCell represents a merged cell range

type Metadata

type Metadata struct {
	Filename     string               `json:"filename"`
	Properties   DocumentProperties   `json:"properties"`
	Sheets       []SheetMetadata      `json:"sheets"`
	DefinedNames []DefinedName        `json:"definedNames,omitempty"`
	Styles       map[int]StyleDetails `json:"styles,omitempty"`
	ExtractedAt  time.Time            `json:"extractedAt"`
}

Metadata represents the complete Excel file metadata

func QuickExtract

func QuickExtract(filename string) (*Metadata, error)

QuickExtract is a convenience function for simple extraction

type Options

type Options struct {
	IncludeCellData       bool
	IncludeStyles         bool
	IncludeImages         bool
	IncludeDefinedNames   bool
	IncludeDataValidation bool
	MaxCellsPerSheet      int
}

Options configures the extraction behavior

func DefaultOptions

func DefaultOptions() *Options

DefaultOptions returns recommended default options

type Protection

type Protection struct {
	Hidden bool `json:"hidden,omitempty"`
	Locked bool `json:"locked,omitempty"`
}

Protection represents cell protection settings

type SheetDimensions

type SheetDimensions struct {
	StartCell string `json:"startCell"`
	EndCell   string `json:"endCell"`
	RowCount  int    `json:"rowCount"`
	ColCount  int    `json:"colCount"`
}

SheetDimensions represents the used range of a sheet

type SheetMetadata

type SheetMetadata struct {
	Index           int                `json:"index"`
	Name            string             `json:"name"`
	Visible         bool               `json:"visible"`
	Dimensions      SheetDimensions    `json:"dimensions"`
	MergedCells     []MergedCell       `json:"mergedCells,omitempty"`
	DataValidations []DataValidation   `json:"dataValidations,omitempty"`
	Protection      *SheetProtection   `json:"protection,omitempty"`
	RowHeights      map[int]float64    `json:"rowHeights,omitempty"`
	ColWidths       map[string]float64 `json:"colWidths,omitempty"`
	Cells           []CellMetadata     `json:"cells,omitempty"`
	Images          []ImageMetadata    `json:"images,omitempty"`
}

SheetMetadata contains metadata for a single sheet

type SheetProtection

type SheetProtection struct {
	Protected           bool   `json:"protected"`
	Password            string `json:"password,omitempty"`
	EditObjects         bool   `json:"editObjects"`
	EditScenarios       bool   `json:"editScenarios"`
	SelectLockedCells   bool   `json:"selectLockedCells"`
	SelectUnlockedCells bool   `json:"selectUnlockedCells"`
}

SheetProtection represents sheet protection settings

type StyleDetails

type StyleDetails struct {
	Font         *FontStyle      `json:"font,omitempty"`
	Fill         *FillStyle      `json:"fill,omitempty"`
	Border       []BorderStyle   `json:"border,omitempty"`
	Alignment    *AlignmentStyle `json:"alignment,omitempty"`
	NumberFormat int             `json:"numberFormat,omitempty"`
	Protection   *Protection     `json:"protection,omitempty"`
}

StyleDetails contains detailed style information

Directories

Path Synopsis
cmd
excelmetadata command

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL