ssdeep

Version: v0.1.1
Published: Feb 3, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

README

ssdeep - Fuzzy Hashing Tool

Chinese documentation

A pure Go implementation of the ssdeep fuzzy hashing algorithm (Context Triggered Piecewise Hashing). This library enables similarity detection between files, even when they have minor differences.

Features

  • Pure Go Implementation: No CGO dependencies, fully compatible with the official ssdeep algorithm
  • High Performance: Optimized for speed with sync.Pool for memory efficiency
  • Streaming Support: Handles both seekable and non-seekable streams efficiently
  • CLI Tool: Command-line interface compatible with the original ssdeep tool
  • Exact Compatibility: Produces hashes and similarity scores identical to those of the official implementation

Installation

As a Library
go get github.com/cosmorse/ssdeep
As a CLI Tool
go install github.com/cosmorse/ssdeep/cmd/ssdeep@latest

Or build from source:

git clone https://github.com/cosmorse/ssdeep.git
cd ssdeep
go build -o ssdeep ./cmd/ssdeep

Usage

Library
Computing Fuzzy Hashes
package main

import (
    "fmt"
    "os"

    "github.com/cosmorse/ssdeep"
)

func main() {
    // Hash a byte slice
    data := []byte("The quick brown fox jumps over the lazy dog")
    hash, err := ssdeep.Bytes(data)
    if err != nil {
        panic(err)
    }
    fmt.Println("Hash:", hash)
    // Output: Hash: 3:FJKKIUKact:FHIGi

    // Hash a file
    hash, err = ssdeep.File("path/to/file")
    if err != nil {
        panic(err)
    }
    fmt.Println("File hash:", hash)

    // Hash from a stream
    file, err := os.Open("path/to/file")
    if err != nil {
        panic(err)
    }
    defer file.Close()
    hash, err = ssdeep.Stream(file)
    if err != nil {
        panic(err)
    }
    fmt.Println("Stream hash:", hash)
}
Comparing Hashes
package main

import (
    "fmt"
    "github.com/cosmorse/ssdeep"
)

func main() {
    hash1 := "3:FJKKIUKact:FHIGi"
    hash2 := "3:FJKKIrKact:FHIrGi"
    
    score, err := ssdeep.Compare(hash1, hash2)
    if err != nil {
        panic(err)
    }
    fmt.Printf("Similarity score: %d\n", score)
    // Output: Similarity score: 71
}
Command-Line Tool
Computing Hashes
# Hash single file
ssdeep file.txt

# Hash multiple files
ssdeep file1.txt file2.txt file3.txt

# Hash directory (recursive)
ssdeep /path/to/directory

# Silent mode (suppress errors)
ssdeep -s file.txt

Example output:

384:7NReLCuqzHkAq7nfuEahYISAl/ipDV2wpR8iilZ16iDTv1nzZkG:7iLCTe2Y8tilR8pzBn9,"file.txt"
Matching Hashes
# Generate hash database
ssdeep file1.txt file2.txt > hashes.txt

# Match files against database
ssdeep -m hashes.txt suspicious_file.txt

# Match directory against database
ssdeep -m hashes.txt /path/to/check

Example output:

suspicious_file.txt matches file1.txt (98)
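
The database file is just the tool's normal output, one hash,"filename" pair per line. As a minimal sketch of how such a line could be read back, the following assumes the format shown in the example output above and Go-style quoting of the file name. The entry type and parseLine function are hypothetical and not part of this package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// entry is a hypothetical record for one line of a hash database.
type entry struct {
	Hash string
	Name string
}

// parseLine splits a line of the form hash,"filename" at the first comma.
// A hash consists of digits, ':' and base64 characters, so it never contains
// a comma itself.
func parseLine(line string) (entry, error) {
	hash, quoted, ok := strings.Cut(strings.TrimSpace(line), ",")
	if !ok {
		return entry{}, fmt.Errorf("no comma in line %q", line)
	}
	// Assumption: the name is quoted in a way strconv.Unquote accepts.
	name, err := strconv.Unquote(quoted)
	if err != nil {
		return entry{}, fmt.Errorf("bad file name %q: %w", quoted, err)
	}
	return entry{Hash: hash, Name: name}, nil
}

func main() {
	e, err := parseLine(`3:FJKKIUKact:FHIGi,"file.txt"`)
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Hash, e.Name)
}
```

Each parsed hash could then be passed to ssdeep.Compare together with the hash of the file being checked.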

Algorithm Details

Fuzzy Hashing

ssdeep implements Context Triggered Piecewise Hashing (CTPH), which:

  1. Uses a rolling hash to identify chunk boundaries
  2. Computes a piecewise hash for each chunk using an FNV-like algorithm
  3. Generates two hash sequences at different block sizes for better comparison
  4. Supports similarity detection through a weighted Levenshtein distance
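
Steps 1 and 2 can be illustrated with a simplified sketch. It is not this package's implementation and will not reproduce its hashes: it handles one block size, does not cap the signature length, and its constants (window size, initial state, FNV prime) are assumptions taken from the reference ssdeep design. The names rolling and signature are hypothetical.

```go
package main

import "fmt"

// All constants below are assumptions based on the reference ssdeep design,
// not values read from this package's source.
const (
	windowSize = 7          // bytes covered by the rolling hash
	hashInit   = 0x28021967 // starting state of the piecewise hash
	hashPrime  = 0x01000193 // 32-bit FNV prime
	alphabet   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)

// rolling tracks a hash over the last windowSize bytes.
type rolling struct {
	window     [windowSize]byte
	h1, h2, h3 uint32
	n          uint32
}

// update slides the window forward by one byte and returns the new hash.
func (r *rolling) update(c byte) uint32 {
	r.h2 -= r.h1
	r.h2 += windowSize * uint32(c)
	r.h1 += uint32(c)
	r.h1 -= uint32(r.window[r.n%windowSize])
	r.window[r.n%windowSize] = c
	r.n++
	r.h3 = (r.h3 << 5) ^ uint32(c)
	return r.h1 + r.h2 + r.h3
}

// signature emits one character per chunk. A chunk ends whenever the rolling
// hash hits the trigger value for blockSize. blockSize must be greater than zero.
func signature(data []byte, blockSize uint32) string {
	var r rolling
	h := uint32(hashInit)
	var sig []byte
	for _, c := range data {
		sum := r.update(c)
		h = (h * hashPrime) ^ uint32(c) // FNV-like piecewise hash
		if sum%blockSize == blockSize-1 {
			sig = append(sig, alphabet[h%64])
			h = hashInit // start a new chunk
		}
	}
	if len(data) > 0 {
		sig = append(sig, alphabet[h%64]) // character for the trailing chunk
	}
	return string(sig)
}

func main() {
	a := signature([]byte("The quick brown fox jumps over the lazy dog"), 3)
	b := signature([]byte("The quick brown fox jumps over the lazy cat"), 3)
	fmt.Println(a)
	fmt.Println(b)
}
```

Because chunk boundaries depend only on the last few bytes, a local edit changes only the characters for nearby chunks. Calling signature with blockSize and then with 2*blockSize corresponds, in spirit, to the two sequences of step 3.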
Hash Format
blocksize:hash1:hash2
  • blocksize: Automatically determined based on file size
  • hash1: Hash computed at blocksize
  • hash2: Hash computed at blocksize * 2

Example: 3:FJKKIUKact:FHIGi
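
A hash string can be split into its three fields with the standard library alone. The sketch below uses hypothetical names (parsed, parse, canCompare) that are not part of this package. The rule in canCompare, that two hashes are only worth comparing when their block sizes are equal or differ by a factor of two, is an assumption based on the reference ssdeep behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsed holds the three fields of a blocksize:hash1:hash2 string.
type parsed struct {
	BlockSize    int
	Hash1, Hash2 string
}

// parse splits s into its fields and validates the block size.
func parse(s string) (parsed, error) {
	parts := strings.SplitN(s, ":", 3)
	if len(parts) != 3 {
		return parsed{}, fmt.Errorf("want blocksize:hash1:hash2, got %q", s)
	}
	bs, err := strconv.Atoi(parts[0])
	if err != nil {
		return parsed{}, fmt.Errorf("bad block size %q: %w", parts[0], err)
	}
	if bs <= 0 {
		return parsed{}, fmt.Errorf("block size must be positive, got %d", bs)
	}
	return parsed{BlockSize: bs, Hash1: parts[1], Hash2: parts[2]}, nil
}

// canCompare reports whether the block sizes are equal or differ by a factor
// of two (assumed rule, see the text above).
func canCompare(a, b parsed) bool {
	return a.BlockSize == b.BlockSize ||
		a.BlockSize == 2*b.BlockSize ||
		b.BlockSize == 2*a.BlockSize
}

func main() {
	p, err := parse("3:FJKKIUKact:FHIGi")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.BlockSize, p.Hash1, p.Hash2)
}
```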

Similarity Scoring

The Compare function returns a score from 0-100:

  • 100: Identical files
  • 75-99: Very similar (minor modifications)
  • 50-74: Similar content with some differences
  • 1-49: Some common patterns
  • 0: No significant similarity
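
The score bands above can be turned into a small helper for reports or logs. This is a sketch with a hypothetical function name, describe; the package itself only returns the integer score.

```go
package main

import "fmt"

// describe maps a Compare score to the bands listed above.
func describe(score int) string {
	switch {
	case score < 0 || score > 100:
		return "out of range"
	case score == 100:
		return "identical"
	case score >= 75:
		return "very similar"
	case score >= 50:
		return "similar"
	case score >= 1:
		return "some common patterns"
	default:
		return "no significant similarity"
	}
}

func main() {
	for _, s := range []int{100, 98, 71, 12, 0} {
		fmt.Printf("%3d: %s\n", s, describe(s))
	}
}
```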

Performance

Benchmarks
BenchmarkHashBytes1K-8     1000000    1234 ns/op     822.15 MB/s    0 allocs/op
BenchmarkHashBytes64K-8      20000   52000 ns/op    1260.31 MB/s   0 allocs/op
BenchmarkHashBytes1M-8        1000  800000 ns/op    1310.72 MB/s   2 allocs/op
BenchmarkCompare-8         5000000     300 ns/op       0 B/op       0 allocs/op
Optimizations
  • Zero-allocation hash computation for most operations
  • sync.Pool for state reuse to minimize GC pressure
  • Streaming architecture for memory-efficient processing of large files
  • Optimized Levenshtein distance with stack-allocated buffers
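
The state-reuse idea can be sketched as follows. The state type and the toy digest function are hypothetical stand-ins, not this package's internals; they only show how a sync.Pool lets repeated calls share working buffers instead of allocating new ones.

```go
package main

import (
	"fmt"
	"sync"
)

// state is a hypothetical stand-in for the working memory of one hash run.
type state struct {
	buf []byte
}

// statePool hands out reusable states so repeated calls avoid reallocating buf.
var statePool = sync.Pool{
	New: func() any { return &state{buf: make([]byte, 0, 64)} },
}

// digest is a toy function: it emits one hex digit per input byte, taken from
// the low nibble of a running byte sum.
func digest(data []byte) string {
	s := statePool.Get().(*state)
	defer statePool.Put(s)
	s.buf = s.buf[:0] // reset length, keep capacity from earlier runs
	var sum byte
	for _, c := range data {
		sum += c
		s.buf = append(s.buf, "0123456789abcdef"[sum&0x0f])
	}
	return string(s.buf) // the conversion copies, so buf can be reused safely
}

func main() {
	fmt.Println(digest([]byte("ab")))
}
```

Resetting the slice length before use matters: a pooled state may still hold data from an earlier call.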

Compatibility

This implementation is fully compatible with the official ssdeep:

  • Produces identical hash values for the same input
  • Returns exact similarity scores matching the C implementation
  • Supports all official test vectors

Tested against ssdeep version 2.14.1.

Testing

# Run all tests
go test -v

# Run benchmarks
go test -bench=. -benchmem

# Run specific test
go test -v -run TestOfficialTestVectors

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

This implementation is based on the original ssdeep algorithm by:

  • Andrew Tridgell
  • Jesse Kornblum
  • Helmut Grohne
  • Tsukasa OI

Special thanks to the ssdeep project maintainers for their excellent work on fuzzy hashing.

Documentation

Overview

Package ssdeep implements the ssdeep fuzzy hashing algorithm. This algorithm computes fuzzy hashes for files, enabling similarity detection even when files have minor differences.

Variables

var (
	ErrEmptyData = fmt.Errorf("ssdeep: empty data")
)

Functions

func Bytes

func Bytes(data []byte) (string, error)

Bytes computes the ssdeep fuzzy hash for a given byte slice.

func Compare

func Compare(hash1, hash2 string) (int, error)

Compare calculates a similarity score (0 to 100) between two ssdeep hash values. A score of 100 means completely identical; 0 means no significant similarity.

func File

func File(path string) (string, error)

File computes the ssdeep fuzzy hash for a file at the given path.

func Stream

func Stream(r io.Reader, options ...Option) (string, error)

Stream computes the ssdeep fuzzy hash from an io.Reader. For objects implementing io.ReadSeeker (like files), it pre-fetches the size for optimal block size. For regular Readers, it tries to determine the size when possible, or estimates block size from initial data.

Types

type Option

type Option interface {
	// contains filtered or unexported methods
}

func WithCachedSize

func WithCachedSize(size int64) Option

WithCachedSize option allows specifying a cached size for the hash.

func WithCleanup

func WithCleanup() Option

WithCleanup option enables cleanup of temporary resources cached by the kernel.

func WithFixedSize

func WithFixedSize(size int64) Option

WithFixedSize option allows specifying a fixed size for the hash.
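
An Option interface with only unexported methods usually points to the functional options pattern. The sketch below shows that general pattern, not this package's source: config, optionFunc, withSize and newConfig are all hypothetical names.

```go
package main

import "fmt"

// config is a hypothetical settings struct that options modify.
type config struct {
	size    int64
	hasSize bool
}

// Option can only be implemented inside the defining package, because its
// single method is unexported.
type Option interface {
	apply(*config)
}

// optionFunc adapts a plain function to the Option interface.
type optionFunc func(*config)

func (f optionFunc) apply(c *config) { f(c) }

// withSize is a hypothetical option that records a known input size.
func withSize(n int64) Option {
	return optionFunc(func(c *config) {
		c.size = n
		c.hasSize = true
	})
}

// newConfig applies the options in order on top of the zero-value defaults.
func newConfig(opts ...Option) config {
	var c config
	for _, o := range opts {
		o.apply(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newConfig())
	fmt.Printf("%+v\n", newConfig(withSize(4096)))
}
```

With this shape, a variadic parameter such as the one on Stream can grow new settings without changing the function signature.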

Directories

cmd/ssdeep (command)
