archive

package
v0.0.0-...-0c118e2 Latest
Published: May 2, 2026 License: MIT Imports: 18 Imported by: 0

Documentation

Overview

Package archive owns local archive operations: format identification, compact-extraction (with up to MaxCompactFlattenLayers of duplicate-folder flattening), creation, and listing.

The package is deliberately isolated from CLI concerns — the cmd layer resolves sources (URL fetch, git clone, cwd auto-pick) and then feeds concrete local paths into the functions defined here. That separation is what lets the same archive engine power Slice 3 of the downloader feature later without an import cycle.

Format coverage (via github.com/mholt/archives):

read  + write : zip, tar, tar.gz, tar.bz2, tar.xz, tar.zst, gz, bz2, xz, zst
read-only     : 7z, rar

7z/rar writing is rejected with a clear error in CreateArchive so the CLI can surface "use zip/tar.* for outputs" without crashing.

Package archive — write side. Builds zip / tar / tar.* archives from a heterogeneous list of local source paths using mholt/archives.

Compression mode → library knobs:

Best     → DEFLATE max  / gzip 9 / bz2 9
Standard → DEFLATE def  / gzip default / bz2 default
Fast     → DEFLATE 1    / gzip 1 / bz2 1

Filtering: optional include / exclude glob lists run against the in-archive name (NameInArchive). An entry survives when either no includes are set OR it matches at least one include, AND it does NOT match any exclude.
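
The filtering rule above can be sketched as a small predicate. This is an illustration only: keepEntry and matchesAny are hypothetical names, and the use of path.Match is an assumption, since the documentation does not say which glob dialect the package uses.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether name matches at least one glob pattern.
// Malformed patterns are treated as non-matching in this sketch.
func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// keepEntry applies the documented rule:
// (no includes set OR matches an include) AND matches no exclude.
func keepEntry(nameInArchive string, includes, excludes []string) bool {
	included := len(includes) == 0 || matchesAny(includes, nameInArchive)
	return included && !matchesAny(excludes, nameInArchive)
}

func main() {
	includes := []string{"*.go", "docs/*"}
	excludes := []string{"*_test.go"}
	for _, name := range []string{"main.go", "main_test.go", "docs/intro.md", "README"} {
		fmt.Printf("%-14s keep=%v\n", name, keepEntry(name, includes, excludes))
	}
}
```

Excludes take priority in this sketch: an entry that matches both an include and an exclude is dropped.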

Package archive — source resolution helpers used by the cmd layer to turn user-supplied strings (local paths, HTTPS URLs, git URLs) into concrete on-disk paths the extract / create engines can consume.

Network operations are deliberately kept small: we shell out to aria2c when available, fall back to net/http otherwise, and shell out to git for clone. The downloader package will replace this with the full engine in a later slice — until then this keeps `gitmap uzc <url>` and `gitmap zip <git-url>` functional.

Variables

var ErrUnknownFormat = errors.New("unknown archive format")

ErrUnknownFormat is returned by CreateArchive when the output extension is not recognized. It is exported as a sentinel error so the cmd layer can match it with errors.Is and translate it into a friendly user message.
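
A caller might translate the sentinel into a user-facing message roughly as follows. This is a sketch, not the cmd layer's code: errUnknownFormat, createArchive, and friendlyMessage are local stand-ins, and the message text is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnknownFormat stands in for archive.ErrUnknownFormat.
var errUnknownFormat = errors.New("unknown archive format")

// createArchive is a hypothetical stand-in that only accepts ".zip"
// and wraps the sentinel for everything else.
func createArchive(outputPath string) error {
	if !strings.HasSuffix(outputPath, ".zip") {
		return fmt.Errorf("create %q: %w", outputPath, errUnknownFormat)
	}
	return nil
}

// friendlyMessage maps the sentinel to a message; errors.Is matches
// whether the sentinel is returned bare or wrapped with %w.
func friendlyMessage(err error) string {
	if errors.Is(err, errUnknownFormat) {
		return "unrecognized output extension"
	}
	return err.Error()
}

func main() {
	if err := createArchive("backup.foo"); err != nil {
		fmt.Println(friendlyMessage(err))
	}
}
```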

Functions

func AutoDetectSingleArchive

func AutoDetectSingleArchive(dir string) (string, error)

AutoDetectSingleArchive scans dir for exactly one file with a recognized archive extension. Returns the absolute path on success, an error describing 0 or N>1 matches otherwise. Used by `gitmap uzc` when the user passes no explicit source.
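
The "exactly one match" contract could look like the sketch below. The extension list is an assumption copied from the format table in the overview, and pickSingle and autoDetectSingleArchive are hypothetical names rather than the package's internals.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// archiveExts is an assumed list of recognized extensions.
var archiveExts = []string{
	".zip", ".tar", ".tar.gz", ".tar.bz2", ".tar.xz", ".tar.zst",
	".gz", ".bz2", ".xz", ".zst", ".7z", ".rar",
}

func hasArchiveExt(name string) bool {
	lower := strings.ToLower(name)
	for _, ext := range archiveExts {
		if strings.HasSuffix(lower, ext) {
			return true
		}
	}
	return false
}

// pickSingle returns the only archive name in names, or an error
// describing the 0 or N>1 case.
func pickSingle(names []string) (string, error) {
	var matches []string
	for _, n := range names {
		if hasArchiveExt(n) {
			matches = append(matches, n)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no archive found")
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("found %d archives, expected exactly one: %s",
			len(matches), strings.Join(matches, ", "))
	}
}

// autoDetectSingleArchive lists regular entries in dir and resolves
// the single match to an absolute path.
func autoDetectSingleArchive(dir string) (string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	name, err := pickSingle(names)
	if err != nil {
		return "", fmt.Errorf("%s: %w", dir, err)
	}
	return filepath.Abs(filepath.Join(dir, name))
}

func main() {
	p, err := autoDetectSingleArchive(".")
	fmt.Println(p, err)
}
```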

func CleanupResolved

func CleanupResolved(r ResolvedSource)

CleanupResolved removes any temp workspace recorded on the source. Always safe to call.

func FlateLevelForMode

func FlateLevelForMode(mode CompressionMode) int

FlateLevelForMode returns the DEFLATE level that mode maps to. It is exported so the cmd layer's --list banner can show users which level their chosen mode selects.
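
A mapping consistent with the compression table in the overview might look like this. The mode string values ("best", "standard", "fast") are assumptions, since the documentation does not list the CompressionMode constants; the levels come from the standard compress/flate package.

```go
package main

import (
	"compress/flate"
	"fmt"
)

type CompressionMode string

// Hypothetical values; the real constants are not shown in the docs.
const (
	ModeBest     CompressionMode = "best"
	ModeStandard CompressionMode = "standard"
	ModeFast     CompressionMode = "fast"
)

// flateLevelForMode maps a mode to a compress/flate level.
// Unknown modes fall back to the library default in this sketch.
func flateLevelForMode(mode CompressionMode) int {
	switch mode {
	case ModeBest:
		return flate.BestCompression // 9
	case ModeFast:
		return flate.BestSpeed // 1
	default:
		return flate.DefaultCompression // -1
	}
}

func main() {
	for _, m := range []CompressionMode{ModeBest, ModeStandard, ModeFast} {
		fmt.Printf("%-8s level=%d\n", m, flateLevelForMode(m))
	}
}
```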

func ListEntries

func ListEntries(ctx context.Context, path string) ([]Entry, Format, error)

ListEntries returns up to maxListEntries entries plus the detected format. Used by `gitmap uzc --list <archive>`.

Types

type CompressionMode

type CompressionMode string

CompressionMode is the user-facing knob persisted in ArchiveHistory.CompressionMode.

type CreateOptions

type CreateOptions struct {
	OutputPath string
	Sources    []string // absolute local paths
	Mode       CompressionMode
	Includes   []string // optional glob list
	Excludes   []string // optional glob list
}

CreateOptions bundles every knob `gitmap zip` exposes.

type CreateResult

type CreateResult struct {
	OutputPath     string
	Format         Format
	EntriesWritten int
}

CreateResult is returned to the cmd layer for printing + history rows.

func CreateArchive

func CreateArchive(ctx context.Context, opts CreateOptions) (CreateResult, error)

CreateArchive walks every source, applies include/exclude filters, and writes the archive to opts.OutputPath using the format derived from the output extension.

type Entry

type Entry struct {
	Path string
	Size int64
	Dir  bool
}

Entry is one item in the flat list that ListEntries builds for the `--list` mode: the in-archive path, its size, and whether it is a directory. ListEntries bounds the list internally to 50_000 entries to keep a malicious archive from exhausting memory.
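
The bounded listing can be illustrated with the standard library's archive/zip instead of mholt/archives, which is a deliberate substitution. listZip and demoZip are hypothetical helpers, and the limit is a parameter here rather than the package's internal constant.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

type entry struct {
	Path string
	Size int64
	Dir  bool
}

// listZip returns at most limit entries from an in-memory zip.
func listZip(data []byte, limit int) ([]entry, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var out []entry
	for _, f := range zr.File {
		if len(out) >= limit {
			break // stop collecting once the cap is reached
		}
		out = append(out, entry{
			Path: f.Name,
			Size: int64(f.UncompressedSize64),
			Dir:  f.FileInfo().IsDir(),
		})
	}
	return out, nil
}

// demoZip builds a small zip with n five-byte files for the demo.
func demoZip(n int) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for i := 0; i < n; i++ {
		w, err := zw.Create(fmt.Sprintf("file%d.txt", i))
		if err != nil {
			panic(err)
		}
		if _, err := w.Write([]byte("hello")); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	entries, err := listZip(demoZip(5), 3)
	fmt.Println(len(entries), err)
}
```

Note that zip.NewReader parses the whole central directory before the loop runs, so the cap in this sketch only bounds the returned slice. A streaming walk would be needed to bound memory during parsing as well.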

type ExtractResult

type ExtractResult struct {
	OutputDir       string
	Format          Format
	EntriesWritten  int
	UsedTempDir     bool
	FlattenedLayers int
}

ExtractResult is what a compact-extract returns to the caller so it can be persisted into ArchiveHistory and printed to the user.

func CompactExtract

func CompactExtract(ctx context.Context, srcArchive, destBaseDir string) (ExtractResult, error)

CompactExtract extracts srcArchive into a single normalized directory under destBaseDir, named after the archive's base name (sans extension).

Algorithm: temp-dir-then-move. We always extract into a fresh temp dir inside destBaseDir, then walk it to find the "real root" — the first directory that either holds >1 entry OR holds at least one non-dir entry. That real root is then moved (or its contents merged) into `<destBaseDir>/<archiveBaseName>/`. This guarantees:

  1. xap.zip → xap/xap/<files> becomes destBaseDir/xap/<files> (any number of duplicate-name layers up to MaxCompactFlattenLayers is collapsed; we do not require the inner names to match xap — we just promote single-child directories until we hit content.)

  2. xlt.zip → <files> becomes destBaseDir/xlt/<files> (no flatten, just a wrap.)

  3. mixed.zip → README + src/ becomes destBaseDir/mixed/{README,src} (no flatten, the temp dir contents move directly under the wrap.)

The temp dir is always cleaned, even on failure mid-extract.
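
The "real root" search described above can be sketched against an fs.FS, which keeps the example runnable without touching disk. findRealRoot and demoTree are hypothetical names, maxFlattenLayers is a placeholder for MaxCompactFlattenLayers (whose value is not documented here), and the move or merge step is omitted.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// maxFlattenLayers is an assumed placeholder value.
const maxFlattenLayers = 8

// findRealRoot descends from start while the current directory holds
// exactly one entry and that entry is a directory. It stops at the
// first directory with more than one entry or with a non-dir entry,
// or once maxLayers directories have been promoted.
func findRealRoot(fsys fs.FS, start string, maxLayers int) (string, int, error) {
	cur := start
	for layers := 0; ; layers++ {
		entries, err := fs.ReadDir(fsys, cur)
		if err != nil {
			return "", 0, err
		}
		singleDir := len(entries) == 1 && entries[0].IsDir()
		if !singleDir || layers == maxLayers {
			return cur, layers, nil
		}
		cur = path.Join(cur, entries[0].Name())
	}
}

// demoTree mimics the extracted temp dir for case 1: xap/xap/<files>.
func demoTree() fs.FS {
	return fstest.MapFS{
		"xap/xap/main.go": {Data: []byte("package main")},
		"xap/xap/README":  {Data: []byte("readme")},
	}
}

func main() {
	root, layers, err := findRealRoot(demoTree(), ".", maxFlattenLayers)
	fmt.Println(root, layers, err)
}
```

For the demo tree the search descends through both xap directories before it reaches files. How the package counts FlattenedLayers relative to this descent is not stated, so the layer count here is only the number of directories promoted.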

type Format

type Format string

Format is a string tag persisted in ArchiveHistory.ArchiveFormat. It reads cleanly in PascalCase logs ("Zip", "TarGz") yet round-trips through the canonical extension via FormatFromPath / Format.Extension.

const (
	FormatZip     Format = "Zip"
	FormatTar     Format = "Tar"
	FormatTarGz   Format = "TarGz"
	FormatTarBz2  Format = "TarBz2"
	FormatTarXz   Format = "TarXz"
	FormatTarZst  Format = "TarZst"
	FormatGz      Format = "Gz"
	FormatBz2     Format = "Bz2"
	FormatXz      Format = "Xz"
	FormatZst     Format = "Zst"
	Format7z      Format = "SevenZip"
	FormatRar     Format = "Rar"
	FormatUnknown Format = ""
)

func FormatFromPath

func FormatFromPath(p string) Format

FormatFromPath inspects a file name and returns the matching Format, or FormatUnknown when nothing matches. Multi-extension forms (".tar.gz", ".tar.bz2", ".tar.xz", ".tar.zst") are checked first so a plain ".gz" never wins over ".tar.gz".
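
The ordering concern can be shown with a suffix table that lists compound extensions before their single-extension tails. This is a sketch under assumptions: formatFromPath and extTable are hypothetical, only a subset of formats is included, and the case-insensitive comparison is a guess about the real behavior.

```go
package main

import (
	"fmt"
	"strings"
)

type Format string

// extTable is checked in order, so ".tar.gz" must precede ".gz".
var extTable = []struct {
	ext string
	f   Format
}{
	{".tar.gz", "TarGz"},
	{".tar.bz2", "TarBz2"},
	{".tar.xz", "TarXz"},
	{".tar.zst", "TarZst"},
	{".zip", "Zip"},
	{".tar", "Tar"},
	{".gz", "Gz"},
	{".bz2", "Bz2"},
	{".xz", "Xz"},
	{".zst", "Zst"},
}

// formatFromPath returns the first matching format, or "" (the
// FormatUnknown value) when no suffix matches.
func formatFromPath(p string) Format {
	lower := strings.ToLower(p)
	for _, e := range extTable {
		if strings.HasSuffix(lower, e.ext) {
			return e.f
		}
	}
	return ""
}

func main() {
	for _, p := range []string{"src.tar.gz", "log.gz", "notes.txt"} {
		fmt.Printf("%-12s %q\n", p, formatFromPath(p))
	}
}
```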

func IdentifyArchive

func IdentifyArchive(ctx context.Context, path string) (Format, error)

IdentifyArchive opens the file and asks mholt/archives to sniff the magic bytes. Used as the authoritative format check after extension-based guesses, since a misnamed file (foo.zip that is really a tarball) would otherwise produce a misleading ArchiveHistory.ArchiveFormat row.
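
To show what magic-byte sniffing means in practice, the sketch below compares a file's leading bytes against a hand-written signature table. It does not call mholt/archives; sniff and sniffFile are hypothetical, and the table covers only formats whose signature sits at offset 0.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// magics lists well-known leading signatures.
var magics = []struct {
	name string
	sig  []byte
}{
	{"Zip", []byte("PK\x03\x04")},
	{"Gz", []byte{0x1f, 0x8b}},
	{"Bz2", []byte("BZh")},
	{"Xz", []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}},
	{"Zst", []byte{0x28, 0xb5, 0x2f, 0xfd}},
	{"SevenZip", []byte{'7', 'z', 0xbc, 0xaf, 0x27, 0x1c}},
	{"Rar", []byte("Rar!\x1a\x07")},
}

// sniff returns the name of the first signature that prefixes header,
// or "" when nothing matches.
func sniff(header []byte) string {
	for _, m := range magics {
		if bytes.HasPrefix(header, m.sig) {
			return m.name
		}
	}
	return ""
}

// sniffFile reads up to 8 bytes from path and sniffs them.
func sniffFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	buf := make([]byte, 8)
	n, err := io.ReadFull(f, buf)
	if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
		return "", err
	}
	return sniff(buf[:n]), nil
}

func main() {
	fmt.Println(sniff([]byte("PK\x03\x04demo")))
	if len(os.Args) > 1 {
		fmt.Println(sniffFile(os.Args[1]))
	}
}
```

Two limits of this sketch are worth keeping in mind: a plain tar carries its "ustar" marker at offset 257 rather than at the start, and the outer signature alone cannot tell a .tar.gz from a bare .gz, so a real identifier has to look further than the first few bytes.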

func (Format) Extension

func (f Format) Extension() string

Extension returns the canonical extension (with leading dot) the CreateArchive path uses to construct mholt/archives Format objects.

type ResolvedSource

type ResolvedSource struct {
	Original   string
	Kind       SourceKind
	LocalPath  string
	CleanupDir string
}

ResolvedSource is the materialized form of one user-supplied input. LocalPath is always populated; CleanupDir, when non-empty, must be removed by the caller after the operation completes (this is how the HTTP and git branches signal they used a temp workspace).

func ResolveSource

func ResolveSource(ctx context.Context, raw string) (ResolvedSource, error)

ResolveSource turns one input string into a usable local path. The caller is responsible for invoking CleanupResolved afterwards.
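
The resolve-then-cleanup contract pairs naturally with defer. The sketch below uses local stand-ins: resolved mirrors two fields of ResolvedSource, resolveToTemp fakes a download by writing bytes into a temp workspace, and cleanup mirrors the "always safe to call" behavior documented for CleanupResolved.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

type resolved struct {
	LocalPath  string
	CleanupDir string // empty when no temp workspace was used
}

// resolveToTemp simulates the HTTP/git branches: it materializes
// data under a fresh temp dir and records that dir for cleanup.
func resolveToTemp(name string, data []byte) (resolved, error) {
	dir, err := os.MkdirTemp("", "resolve-*")
	if err != nil {
		return resolved{}, err
	}
	p := filepath.Join(dir, name)
	if err := os.WriteFile(p, data, 0o600); err != nil {
		_ = os.RemoveAll(dir)
		return resolved{}, err
	}
	return resolved{LocalPath: p, CleanupDir: dir}, nil
}

// cleanup removes the temp workspace if one was recorded. It is a
// no-op for local sources and for the zero value.
func cleanup(r resolved) {
	if r.CleanupDir != "" {
		_ = os.RemoveAll(r.CleanupDir)
	}
}

// exists is a small helper for the demo.
func exists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	r, err := resolveToTemp("demo.zip", []byte("placeholder"))
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	defer cleanup(r) // runs even if later steps fail
	fmt.Println("working on", r.LocalPath, "exists:", exists(r.LocalPath))
}
```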

type SourceKind

type SourceKind int

SourceKind classifies one entry on the `gitmap zip` / `gitmap uzc` command line. The cmd layer dispatches per-kind; the archive engine only ever sees concrete local paths.

const (
	SourceLocal SourceKind = iota
	SourceHTTP
	SourceGit
)

func ClassifySource

func ClassifySource(s string) SourceKind

ClassifySource is the cheap, pure-function classifier the command layer uses BEFORE doing any IO. Decision order matters: a path like `git@github.com:foo/bar.git` parses as a URL with no scheme, so we detect git first.
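
One way to realize the "git first" decision order is sketched below. The concrete rules (the git@, git:// and ssh:// prefixes, and a .git suffix on HTTP paths) are assumptions for illustration, not the package's documented heuristics, and classifySource is a local stand-in.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

type sourceKind int

const (
	sourceLocal sourceKind = iota
	sourceHTTP
	sourceGit
)

// classifySource is pure: it inspects the string and does no IO.
func classifySource(s string) sourceKind {
	// Git first: scp-style remotes carry no http scheme, so they
	// must be caught before any URL-based check.
	if strings.HasPrefix(s, "git@") ||
		strings.HasPrefix(s, "git://") ||
		strings.HasPrefix(s, "ssh://") {
		return sourceGit
	}
	u, err := url.Parse(s)
	if err != nil || u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
		return sourceLocal
	}
	// An http(s) URL whose path ends in .git is treated as a repo.
	if strings.HasSuffix(u.Path, ".git") {
		return sourceGit
	}
	return sourceHTTP
}

func main() {
	names := map[sourceKind]string{sourceLocal: "local", sourceHTTP: "http", sourceGit: "git"}
	for _, s := range []string{
		"git@github.com:foo/bar.git",
		"https://github.com/foo/bar.git",
		"https://example.com/a.zip",
		"./downloads/a.zip",
	} {
		fmt.Printf("%-32s %s\n", s, names[classifySource(s)])
	}
}
```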
