duplo

Version: v0.0.0-...-751e882
Published: Jul 3, 2022 License: MIT Imports: 12 Imported by: 10

README

Duplo - Detect Similar or Duplicate Images

This Go library allows you to perform a visual query on a set of images, returning the results in the order of similarity. This allows you to effectively detect duplicates with minor modifications (e.g. some colour correction or watermarks).

It is an implementation of Fast Multiresolution Image Querying by Jacobs et al. which uses truncated Haar wavelet transforms to create visual hashes of the images. The same method has previously been used in the imgSeek software and the retrievr website.

Installation

go get github.com/rivo/duplo

Usage

import (
	"sort"

	"github.com/rivo/duplo"
)

// Create an empty store.
store := duplo.New()

// Add image "img" to the store.
hash, _ := duplo.CreateHash(img)
store.Add("myimage", hash)

// Query the store based on image "query".
hash, _ = duplo.CreateHash(query)
matches := store.Query(hash)
sort.Sort(matches)
// matches[0] is the best match.

Documentation

http://godoc.org/github.com/rivo/duplo

Possible Applications

  • Identify copyright violations
  • Save disk space by detecting and removing duplicate images
  • Search for images by similarity

Projects Using This Package

  • imgdup2go: A visual image duplicate finder.

More Information

For more information, please go to http://rentafounder.com/find-similar-images-with-duplo/ or get in touch.

Documentation

Overview

Package duplo provides tools to efficiently query large sets of images for visual duplicates. The technique is based on the paper "Fast Multiresolution Image Querying" by Charles E. Jacobs, Adam Finkelstein, and David H. Salesin, with a few modifications and additions, such as the addition of a width to height ratio, the dHash metric by Dr. Neal Krawetz as well as some histogram-based metrics.

Querying the data structure returns a list of potential matches, which can be sorted by the score described in the main paper. Searching for duplicates can be made stricter by additionally filtering on the other metrics.
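The stricter filtering mentioned above could look roughly like the following sketch. To keep it runnable without the library, it uses a local `match` struct that mirrors the documented fields of Match (with a string ID for simplicity). The helper `filterStrict` and both cut-off values are hypothetical and would need tuning against a real image set.

```go
package main

import (
	"fmt"
	"sort"
)

// match is a stand-in that mirrors the documented fields of duplo.Match.
// It is not the library type.
type match struct {
	ID                string
	Score             float64
	RatioDiff         float64
	DHashDistance     int
	HistogramDistance int
}

// Hypothetical cut-offs, chosen only for illustration.
const (
	maxRatioDiff     = 0.1
	maxDHashDistance = 10
)

// filterStrict sorts the candidates by score (lower is better) and keeps
// only those that also pass the ratio and dHash cut-offs.
func filterStrict(candidates []match) []match {
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Score < candidates[j].Score
	})
	var kept []match
	for _, m := range candidates {
		if m.RatioDiff <= maxRatioDiff && m.DHashDistance <= maxDHashDistance {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	candidates := []match{
		{ID: "b", Score: -20, RatioDiff: 0.5, DHashDistance: 40},
		{ID: "a", Score: -60, RatioDiff: 0.01, DHashDistance: 3},
		{ID: "c", Score: -35, RatioDiff: 0.02, DHashDistance: 8},
	}
	for _, m := range filterStrict(candidates) {
		fmt.Println(m.ID, m.Score)
	}
}
```

With real query results, the same loop would run over the slice returned by Query after sorting it.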

Example

Package example.

// Create some example JPEG images.
addA, _ := jpeg.Decode(base64.NewDecoder(base64.StdEncoding, strings.NewReader(imgA)))
addB, _ := jpeg.Decode(base64.NewDecoder(base64.StdEncoding, strings.NewReader(imgB)))
query, _ := jpeg.Decode(base64.NewDecoder(base64.StdEncoding, strings.NewReader(imgC)))

// Create the store.
store := New()

// Turn two images into hashes and add them to the store.
hashA, _ := CreateHash(addA)
hashB, _ := CreateHash(addB)
store.Add("imgA", hashA)
store.Add("imgB", hashB)

// Query the store for our third image (which is most similar to "imgA").
queryHash, _ := CreateHash(query)
matches := store.Query(queryHash)
sort.Sort(matches) // Query returns unsorted results; put the best match first.
fmt.Println(matches[0].ID)
Output:

imgA

Constants

const (
	// ImageScale is the width and height to which images are resized before they
	// are processed.
	ImageScale = 128
)

Variables

var (
	// TopCoefs is the number of top coefficients (per colour channel), ordered
	// by absolute value, that will be kept. Coefficients that rank lower will
	// be discarded. Change this only once when the package is initialized.
	TopCoefs = 40
)

Types

type Hash

type Hash struct {
	haar.Matrix

	// Thresholds contains the coefficient thresholds. If you discard all
	// coefficients with abs(coef) < threshold, you end up with TopCoefs
	// coefficients.
	Thresholds haar.Coef

	// Ratio is image width / image height or 0 if height is 0.
	Ratio float64

	// DHash is a 128-bit vector where each bit value depends on the monotonicity
	// of two adjacent pixels. The first 64 bits are based on an 8x8 version of
	// the Y colour channel. The remaining 64 bits consist of two 32-bit halves,
	// based on 8x4 versions of the Cb and Cr colour channels, respectively.
	DHash [2]uint64

	// Histogram is a histogram quantized into 64 bits (32 for Y and 16 each for
	// Cb and Cr). A bit is set to 1 if the intensity's occurrence count is larger
	// than the median (for that colour channel) and set to 0 otherwise.
	Histogram uint64

	// HistoMax is the maximum value of the histogram (for each channel Y, Cb,
	// and Cr).
	HistoMax [3]float32
}

Hash represents the visual hash of an image.
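The median-based quantization described for the Histogram field can be sketched as follows for a single channel. The helpers `median` and `quantize` are hypothetical, and the median convention for an even number of bins (mean of the two middle values) and the bit order are assumptions, not necessarily what the package does.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of counts, using the mean of the two middle
// values when the number of bins is even (an assumption of this sketch).
func median(counts []int) float64 {
	n := len(counts)
	if n == 0 {
		return 0
	}
	s := append([]int(nil), counts...)
	sort.Ints(s)
	if n%2 == 1 {
		return float64(s[n/2])
	}
	return float64(s[n/2-1]+s[n/2]) / 2
}

// quantize sets bit i when counts[i] is strictly larger than the median.
// Bins beyond the 64th are ignored.
func quantize(counts []int) uint64 {
	med := median(counts)
	var bits uint64
	for i, c := range counts {
		if i >= 64 {
			break
		}
		if float64(c) > med {
			bits |= 1 << uint(i)
		}
	}
	return bits
}

func main() {
	counts := []int{1, 9, 3, 7} // median is 5, so bins 1 and 3 are set
	fmt.Printf("%04b\n", quantize(counts)) // prints 1010
}
```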

func CreateHash

func CreateHash(img image.Image) (Hash, image.Image)

CreateHash calculates and returns the visual hash of the provided image, together with a resized version of the image (ImageScale x ImageScale), which may be discarded if it is not needed.

type Match

type Match struct {
	// The ID of the matched image, as specified in the Store.Add() function.
	ID interface{}

	// The score calculated during the similarity query. The lower, the better
	// the match.
	Score float64

	// The absolute difference between the two image ratios' log values.
	RatioDiff float64

	// The hamming distance between the two dHash bit vectors.
	DHashDistance int

	// The hamming distance between the two histogram bit vectors.
	HistogramDistance int
}

Match represents an image matched by a similarity query.
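The two distance fields can be illustrated with the standard library alone. In the sketch below, `hamming128` counts differing bits between two 128-bit vectors of the same shape as DHash, and `ratioDiff` computes the absolute difference of the logarithms of two ratios. Both function names are hypothetical, and returning +Inf for a non-positive ratio is an assumption of this sketch, not documented library behaviour.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// hamming128 returns the number of bit positions in which a and b differ.
func hamming128(a, b [2]uint64) int {
	return bits.OnesCount64(a[0]^b[0]) + bits.OnesCount64(a[1]^b[1])
}

// ratioDiff returns |log(r1) - log(r2)|. Working with logarithms makes the
// measure symmetric: 2:1 versus 1:1 is as far apart as 1:2 versus 1:1.
// Non-positive ratios yield +Inf (an assumption of this sketch).
func ratioDiff(r1, r2 float64) float64 {
	if r1 <= 0 || r2 <= 0 {
		return math.Inf(1)
	}
	return math.Abs(math.Log(r1) - math.Log(r2))
}

func main() {
	a := [2]uint64{0b1011, 0}
	b := [2]uint64{0b0001, 1}
	fmt.Println(hamming128(a, b)) // prints 3
	fmt.Println(ratioDiff(2, 1) == ratioDiff(1, 2))
}
```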

func (*Match) String

func (m *Match) String() string

type Matches

type Matches []*Match

Matches is a slice of match results.

func (Matches) Len

func (m Matches) Len() int

func (Matches) Less

func (m Matches) Less(i, j int) bool

func (Matches) Swap

func (m Matches) Swap(i, j int)
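For readers unfamiliar with sort.Interface, the following sketch shows how a slice type with these three methods plugs into sort.Sort. The types `result` and `results` and the helper `best` are hypothetical stand-ins; ordering by ascending Score is assumed here because a lower score is documented as the better match.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a stand-in for a match with an ID and a score.
type result struct {
	ID    string
	Score float64
}

// results implements sort.Interface, ordering by ascending score.
type results []*result

func (r results) Len() int           { return len(r) }
func (r results) Less(i, j int) bool { return r[i].Score < r[j].Score }
func (r results) Swap(i, j int)      { r[i], r[j] = r[j], r[i] }

// best sorts r in place and returns the entry with the lowest score,
// or nil if r is empty.
func best(r results) *result {
	if len(r) == 0 {
		return nil
	}
	sort.Sort(r)
	return r[0]
}

func main() {
	r := results{
		{ID: "far", Score: 5},
		{ID: "near", Score: -2},
		{ID: "mid", Score: 1},
	}
	fmt.Println(best(r).ID) // prints near
}
```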

type Store

type Store struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

Store is a data structure that holds the visual hashes of images together with references (IDs) to them. The images themselves are not held in the data structure.

A store can hold no more than 4,294,967,295 images because it uses uint32 indices to save RAM. This limit may be easy to lift by modifying its data structures to hold uint64 indices instead.

Store's methods are concurrency safe. Store implements the GobDecoder and GobEncoder interfaces.

func New

func New() *Store

New returns a new, empty image store.

func (*Store) Add

func (store *Store) Add(id interface{}, hash Hash)

Add adds an image (via its hash) to the store. The provided ID is the value that will be returned as the result of a similarity query. If an ID is already in the store, it is not added again.

func (*Store) Delete

func (store *Store) Delete(id interface{})

Delete removes an image from the store so it will not be returned during a query anymore. Note that the candidate slot still remains occupied but its index will be removed from all index lists. This also means that Size() will not decrease. This is an expensive operation. If the provided ID could not be found, nothing happens.

func (*Store) Exchange

func (store *Store) Exchange(oldID, newID interface{}) error

Exchange exchanges the ID of an image for a new one. If the old ID could not be found, nothing happens. If the new ID already existed prior to the exchange, an error is returned.

func (*Store) GobDecode

func (store *Store) GobDecode(from []byte) error

GobDecode reconstructs the store from a binary representation. You may need to register any types that you put into the store in order for them to be decoded successfully. Example:

gob.Register(YourType{})
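Why registration matters can be seen in a self-contained round trip that does not involve the store at all. In this sketch, `imageRef` is a hypothetical custom ID type and `envelope` is a hypothetical container that, like the store, keeps IDs as interface{} values. Without the gob.Register call, encoding the custom type inside an interface value would fail.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// imageRef is a hypothetical custom ID type.
type imageRef struct {
	Path string
	Page int
}

// envelope is a hypothetical container that stores IDs as interface values.
type envelope struct {
	IDs []interface{}
}

func init() {
	// Concrete types carried inside interface values must be registered
	// before encoding or decoding. Basic types such as string need no
	// registration.
	gob.Register(imageRef{})
}

// roundTrip encodes in with gob and decodes it into a fresh envelope.
func roundTrip(in envelope) (envelope, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return envelope{}, err
	}
	var out envelope
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	in := envelope{IDs: []interface{}{imageRef{Path: "a.jpg", Page: 1}, "plain"}}
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Printf("%+v\n", out.IDs)
}
```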

func (*Store) GobEncode

func (store *Store) GobEncode() ([]byte, error)

GobEncode places a binary representation of the store in a byte slice.

func (*Store) Has

func (store *Store) Has(id interface{}) bool

Has checks if an image (via its ID) is already contained in the store.

func (*Store) IDs

func (store *Store) IDs() (ids []interface{})

IDs returns a list of IDs of all images contained in the store. This list is created during the call so it may be modified without affecting the store.

func (*Store) Modified

func (store *Store) Modified() bool

Modified indicates whether this store has been modified since it was loaded or created.

func (*Store) Query

func (store *Store) Query(hash Hash) Matches

Query performs a similarity search on the given image hash and returns all potential matches. The returned slice is not sorted, but it implements sort.Interface: calling sort.Sort on it orders the matches so that the one with the best score is the first element.

func (*Store) Size

func (store *Store) Size() int

Size returns the number of images currently in the store.

Directories

Path	Synopsis
haar	Package haar provides a Haar wavelet function for bitmap images.
