bm25

Version: v0.0.0-...-dbf469a
Published: Mar 4, 2026 | License: BSD-2-Clause | Imports: 12 | Imported by: 0

README

A simple Okapi BM25 ranking function implemented in Go. It supports document retrieval, parallel scoring, custom tokenizers, and collection serialization.

The implementation was derived from An Introduction to Information Retrieval, Manning et al., page 233.

Usage

package main

import (
        "fmt"
        "log"

        "github.com/djfritz/bm25"
)

func main() {
        // Create a collection and set a tokenizer
        c := new(bm25.Collection)
        c.SetTokenizer(bm25.Tokenize)

        // Add documents; AddDocument returns an error, so check it
        for _, text := range []string{
                "the quick brown fox jumps over the lazy dog",
                "the lazy dog sat on the porch",
                "the fox is quick and brown",
        } {
                if err := c.AddDocument(bm25.NewTextDocument(text)); err != nil {
                        log.Fatal(err)
                }
        }

        // Score documents against a query
        results, err := c.Score("quick fox", 3)
        if err != nil {
                log.Fatal(err)
        }

        for _, r := range results {
                fmt.Printf("score: %.4f  text: %s\n", r.S, r.D.Text())
        }
}

Custom Document Types

Any type that implements the Document interface can be added to a collection:

type Document interface {
        Text() string
}
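
As a rough illustration of the interface above, the sketch below defines a hypothetical Article type that carries metadata alongside its text. The Document interface is redeclared locally only so the example compiles without the module; real code would use bm25.Document. Article, its fields, and the choice to join title and body are assumptions made for this example.

```go
package main

import "fmt"

// Document mirrors the interface shown above so this sketch compiles
// on its own; real code would use bm25.Document instead.
type Document interface {
	Text() string
}

// Article is a hypothetical type that keeps metadata next to its body.
type Article struct {
	ID    int
	Title string
	Body  string
}

// Text returns the string to be tokenized and scored.
// Joining title and body is a choice made for this example.
func (a *Article) Text() string {
	return a.Title + " " + a.Body
}

// Compile-time check that *Article satisfies Document.
var _ Document = (*Article)(nil)

func main() {
	var d Document = &Article{ID: 7, Title: "Quick foxes", Body: "the fox is quick and brown"}
	fmt.Println(d.Text())

	// A value held as a Document can be converted back with a type
	// assertion to recover the metadata.
	if a, ok := d.(*Article); ok {
		fmt.Println("id:", a.ID)
	}
}
```

Because ScoredDocument.D is declared as a Document, the same type assertion should recover the original value from a scoring result.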

Parallel Scoring

By default, scoring uses runtime.NumCPU() goroutines. Override with:

c.SetParallel(4) // use 4 goroutines
c.SetParallel(0) // reset to NumCPU (default)
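
For readers curious what a fixed number of scoring goroutines can look like, here is a minimal worker-pool sketch. It is not the package's implementation: scoreAll and its parameters are hypothetical names, and unlike SetParallel, which returns an error, this sketch simply treats any value below 1 as "use runtime.NumCPU()".

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// scoreAll applies f to every item using p goroutines.
// Values of p below 1 fall back to runtime.NumCPU() (a simplification
// made for this sketch).
func scoreAll(items []string, p int, f func(string) float64) []float64 {
	if p < 1 {
		p = runtime.NumCPU()
	}
	out := make([]float64, len(items))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < p; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker,
				// so writes to out never overlap.
				out[i] = f(items[i])
			}
		}()
	}

	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	lengths := scoreAll([]string{"a", "bb", "ccc"}, 2, func(s string) float64 {
		return float64(len(s))
	})
	fmt.Println(lengths) // prints [1 2 3]
}
```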

Documentation

Overview

Package bm25 implements the Okapi BM25 ranking function.

The implementation was derived from "An Introduction to Information Retrieval, Manning et al., page 233".
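
The sketch below works through the BM25 formula as it is commonly stated in that text: each query term contributes log(N/df) * ((k1+1)*tf) / (k1*((1-b) + b*len/avglen) + tf). It is an independent illustration, not this package's code. The helper names, the whitespace tokenizer, and the parameter values k1 = 1.2 and b = 0.75 are assumptions; the values the package actually uses are not stated here.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// termScore is one query term's contribution to a document's score.
func termScore(tf, df, n int, docLen, avgLen, k1, b float64) float64 {
	if tf == 0 || df == 0 {
		return 0
	}
	idf := math.Log(float64(n) / float64(df))
	norm := k1*((1-b)+b*docLen/avgLen) + float64(tf)
	return idf * (k1 + 1) * float64(tf) / norm
}

// counts is a hypothetical tokenizer: lowercase, split on whitespace.
func counts(text string) map[string]int {
	m := make(map[string]int)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		m[w]++
	}
	return m
}

// score returns one BM25 score per document for the given query.
func score(query string, docs []string, k1, b float64) []float64 {
	tfs := make([]map[string]int, len(docs))
	lens := make([]float64, len(docs))
	df := make(map[string]int)
	var total float64
	for i, d := range docs {
		tfs[i] = counts(d)
		for term, c := range tfs[i] {
			df[term]++ // document frequency: one per document containing term
			lens[i] += float64(c)
		}
		total += lens[i]
	}
	avg := total / float64(len(docs))

	out := make([]float64, len(docs))
	for i := range docs {
		for term := range counts(query) {
			out[i] += termScore(tfs[i][term], df[term], len(docs), lens[i], avg, k1, b)
		}
	}
	return out
}

func main() {
	docs := []string{
		"the quick brown fox jumps over the lazy dog",
		"the lazy dog sat on the porch",
		"the fox is quick and brown",
	}
	for i, s := range score("quick fox", docs, 1.2, 0.75) {
		fmt.Printf("doc %d: %.4f\n", i, s)
	}
}
```

With these inputs the second document scores zero because it contains neither query term, and the third outranks the first because it matches the same terms in fewer words, which is the length normalization controlled by b.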

Index

Constants

This section is empty.

Variables

var (
	ErrInvalidN        = errors.New("n must be > 0")
	ErrInvalidQ        = errors.New("query must be non-zero length")
	ErrInvalidParallel = errors.New("number of threads must be >= 0")
)
var ErrNilTokenizer = errors.New("nil tokenizer")

Functions

func Tokenize

func Tokenize(text string) (map[string]int, error)

Types

type Collection

type Collection struct {
	// contains filtered or unexported fields
}

func Load

func Load(f string) (*Collection, error)

func (*Collection) AddDocument

func (c *Collection) AddDocument(d Document) error

func (*Collection) GobDecode

func (c *Collection) GobDecode(data []byte) error

func (*Collection) GobEncode

func (c *Collection) GobEncode() ([]byte, error)

func (*Collection) Save

func (c *Collection) Save(f string) error

func (*Collection) Score

func (c *Collection) Score(q string, n int) ([]*ScoredDocument, error)

func (*Collection) SetParallel

func (c *Collection) SetParallel(p int) error

func (*Collection) SetTokenizer

func (c *Collection) SetTokenizer(t Tokenizer)

type Document

type Document interface {
	Text() string
}

type ScoredDocument

type ScoredDocument struct {
	S float64
	D Document
}

type TextDocument

type TextDocument struct {
	// contains filtered or unexported fields
}

func NewTextDocument

func NewTextDocument(text string) *TextDocument

func (*TextDocument) GobDecode

func (t *TextDocument) GobDecode(data []byte) error

func (*TextDocument) GobEncode

func (t *TextDocument) GobEncode() ([]byte, error)

func (*TextDocument) Text

func (t *TextDocument) Text() string
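
TextDocument keeps its fields unexported, which is presumably why it provides GobEncode and GobDecode: encoding/gob cannot see unexported fields on its own. The sketch below shows that general pattern with a hypothetical note type. It is not the package's code, and the wire format chosen here (a single gob-encoded string) is an assumption.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// note is a hypothetical type with only unexported fields.
type note struct {
	text string
}

// GobEncode implements gob.GobEncoder by encoding the private field.
func (n *note) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(n.text); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// GobDecode implements gob.GobDecoder by restoring the private field.
func (n *note) GobDecode(data []byte) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(&n.text)
}

// roundTrip encodes a note and decodes it into a fresh value.
func roundTrip(in *note) (*note, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	out := new(note)
	if err := gob.NewDecoder(&buf).Decode(out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip(&note{text: "the lazy dog sat on the porch"})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out.text)
}
```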

type Tokenizer

type Tokenizer func(string) (map[string]int, error)
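
A function with this signature can be passed to SetTokenizer in place of Tokenize. The following is a minimal sketch of one such function, assuming the returned map holds per-term occurrence counts. StopwordTokenizer and its stopword list are hypothetical, and the behavior of the package's own Tokenize is not described here, so this should not be read as equivalent to it. The Tokenizer type is redeclared locally only so the example compiles without the module.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenizer mirrors the package's type so this sketch compiles on its own.
type Tokenizer func(string) (map[string]int, error)

// stopwords is a small, hypothetical list of terms to ignore.
var stopwords = map[string]bool{"the": true, "a": true, "and": true, "is": true}

// StopwordTokenizer lowercases the text, splits on anything that is not
// a letter or digit, drops stopwords, and counts the remaining terms.
// It never fails, so the returned error is always nil.
func StopwordTokenizer(text string) (map[string]int, error) {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	counts := make(map[string]int)
	for _, w := range words {
		if !stopwords[w] {
			counts[w]++
		}
	}
	return counts, nil
}

// Compile-time check that the function matches the Tokenizer signature.
var _ Tokenizer = StopwordTokenizer

func main() {
	counts, err := StopwordTokenizer("The fox, the FOX, and a dog!")
	if err != nil {
		fmt.Println("tokenize failed:", err)
		return
	}
	fmt.Println(counts["fox"], counts["dog"], counts["the"]) // prints 2 1 0
}
```

With the real package, the equivalent call would be c.SetTokenizer(StopwordTokenizer).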
