minisearch

v1.1.0
Published: Feb 25, 2023 License: MIT

MiniSearch

RESTful, in-memory, full-text search engine written in Go.

✅ Features

  • Full-text indexing of multiple fields in a document
  • Boolean queries with AND, OR operators between subqueries
  • Exact phrase search
  • Document ranking based on TF-IDF
  • Vector similarity search for semantic search
  • Stemming-based query expansion for many languages
  • Document deletion and updating with index garbage collection
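
Ranking by TF-IDF can be illustrated in a few lines of Go. The sketch below uses one common textbook formulation (term frequency normalized by document length, multiplied by ln(N/df)). This README does not say which variant minisearch uses, so the `tfidf` and `tokenize` functions are assumptions for illustration only, not the project's implementation.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tokenize is a deliberately naive tokenizer: lowercase, split on whitespace.
func tokenize(s string) []string {
	return strings.Fields(strings.ToLower(s))
}

// tfidf scores term for doc against corpus.
// tf  = occurrences of term in doc / number of tokens in doc
// idf = ln(number of documents / number of documents containing term)
func tfidf(term string, doc []string, corpus [][]string) float64 {
	count := 0
	for _, t := range doc {
		if t == term {
			count++
		}
	}
	if count == 0 {
		return 0
	}
	df := 0
	for _, d := range corpus {
		for _, t := range d {
			if t == term {
				df++
				break
			}
		}
	}
	if df == 0 {
		// doc is not part of corpus; avoid dividing by zero.
		return 0
	}
	tf := float64(count) / float64(len(doc))
	idf := math.Log(float64(len(corpus)) / float64(df))
	return tf * idf
}

func main() {
	texts := []string{"The silicon brain", "The human brain"}
	corpus := make([][]string, len(texts))
	for i, t := range texts {
		corpus[i] = tokenize(t)
	}
	for _, term := range []string{"silicon", "brain"} {
		fmt.Printf("%s: %.3f\n", term, tfidf(term, corpus[0], corpus))
	}
}
```

A term that appears in every document gets an idf of zero under this formulation, so it contributes nothing to the ranking, while a term unique to one document scores highest there.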

🛠️ Installation

Download binary

To download and run minisearch from a precompiled binary:

  1. Download a precompiled version of minisearch from GitHub.
  2. Run the server binary:
$ ./server

Run with Docker

To run minisearch with Docker, use the minisearch Docker image:

$ docker run -d --name minisearch -p 3000:3000 micpst/minisearch:latest

Build from source

To build and run minisearch from the source code:

  1. Install the requirements: Go and make
  2. Install dependencies:
$ make setup
  3. Build:
$ make build
  4. Run the server binary:
$ ./bin/server

📘 Usage

Add documents

Create a new document and add it to the index.

$ curl -X POST localhost:3000/api/v1/documents \
    -H 'Content-Type: application/json' \
    -d '{ 
      "title": "The Silicon Brain", 
      "url": "https://micpst.com/posts/silicon-brain", 
      "abstract": "The human brain is often described as complex..." 
    }'
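
The same request can be issued from Go. The sketch below is a minimal client, assuming the server listens on localhost:3000 as in the curl example. The `Document` type and the `newAddDocumentRequest` helper are hypothetical names introduced here, not part of minisearch, and the response body is not decoded because its format is not described in this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Document mirrors the JSON fields used in the curl example (illustrative type).
type Document struct {
	Title    string `json:"title"`
	URL      string `json:"url"`
	Abstract string `json:"abstract"`
}

// newAddDocumentRequest builds, but does not send, the POST request.
// The helper name and the baseURL parameter are assumptions of this sketch.
func newAddDocumentRequest(baseURL string, doc Document) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/documents", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newAddDocumentRequest("http://localhost:3000", Document{
		Title:    "The Silicon Brain",
		URL:      "https://micpst.com/posts/silicon-brain",
		Abstract: "The human brain is often described as complex...",
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	// A timeout keeps the client from hanging if the server is unreachable.
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("send request:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```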

Upload document dumps

Fill the index with many documents at once by uploading one or more document dumps.

$ curl -X POST localhost:3000/api/v1/upload \
    -H 'Content-Type: multipart/form-data' \
    -F 'file[]=@/path/to/dataset1.xml.gz' \
    -F 'file[]=@/path/to/dataset2.xml.gz'
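
For a Go client, the multipart body can be assembled with the standard `mime/multipart` package. This is a sketch that reuses the `file[]` field name and the endpoint from the curl example. The `newUploadRequest` helper and its map-based signature are hypothetical, and the placeholder bytes in `main` are not a real dump.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newUploadRequest builds a multipart POST that carries each dump under the
// "file[]" form field. Map iteration order is random, so parts are written
// in no particular order. The helper itself is an illustrative assumption.
func newUploadRequest(baseURL string, dumps map[string][]byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	for name, data := range dumps {
		part, err := w.CreateFormFile("file[]", name)
		if err != nil {
			return nil, err
		}
		if _, err := part.Write(data); err != nil {
			return nil, err
		}
	}
	// Close writes the terminating boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/upload", &body)
	if err != nil {
		return nil, err
	}
	// FormDataContentType includes the boundary the server needs for parsing.
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newUploadRequest("http://localhost:3000", map[string][]byte{
		"dataset1.xml.gz": []byte("placeholder bytes, not a real dump"),
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	// Send with http.DefaultClient.Do(req) once a server is running.
	fmt.Println(req.Method, req.URL, req.Header.Get("Content-Type"))
}
```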

The dump should have the following structure:

<docs>
  <doc>
    <title>...</title>
    <url>...</url>
    <abstract>...</abstract>
  </doc>
  <doc>
    <title>...</title>
    <url>...</url>
    <abstract>...</abstract>
  </doc>
</docs>
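
A dump with this structure can be produced from Go with `encoding/xml` and `compress/gzip`. The sketch below assumes the dump is gzip-compressed XML, which is inferred from the `.xml.gz` file names in the upload example. The `Doc` and `Docs` types and the `encodeDump` helper are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/xml"
	"fmt"
)

// Doc maps to one <doc> element (illustrative type).
type Doc struct {
	Title    string `xml:"title"`
	URL      string `xml:"url"`
	Abstract string `xml:"abstract"`
}

// Docs maps to the <docs> root element.
type Docs struct {
	XMLName xml.Name `xml:"docs"`
	Docs    []Doc    `xml:"doc"`
}

// encodeDump serializes docs as indented XML and gzip-compresses the result.
func encodeDump(docs []Doc) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	enc := xml.NewEncoder(zw)
	enc.Indent("", "  ")
	if err := enc.Encode(Docs{Docs: docs}); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	dump, err := encodeDump([]Doc{{
		Title:    "The Silicon Brain",
		URL:      "https://micpst.com/posts/silicon-brain",
		Abstract: "The human brain is often described as complex...",
	}})
	if err != nil {
		fmt.Println("encode dump:", err)
		return
	}
	fmt.Printf("dump size: %d bytes\n", len(dump))
}
```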

Update a document

Update an existing document and re-index it with the new fields.

$ curl -X PUT localhost:3000/api/v1/documents/<id> \
    -H 'Content-Type: application/json' \
    -d '{ 
      "title": "The Silicon Brain", 
      "url": "https://micpst.com/posts/silicon-brain", 
      "abstract": "The human brain is often described as complex..." 
    }'

Remove a document

Permanently delete a document and remove it from the index.

$ curl -X DELETE localhost:3000/api/v1/documents/<id>
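
The update and remove requests differ from the add request only in method and in the document ID in the path. The sketch below builds both in Go. The `Document` type and the two helpers are hypothetical names, and the ID `42` is a placeholder, since the ID format is not specified in this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Document mirrors the JSON fields used in the curl examples (illustrative type).
type Document struct {
	Title    string `json:"title"`
	URL      string `json:"url"`
	Abstract string `json:"abstract"`
}

// newUpdateRequest builds a PUT request that replaces the document with the given ID.
func newUpdateRequest(baseURL, id string, doc Document) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	// PathEscape keeps an ID with special characters from altering the path.
	target := baseURL + "/api/v1/documents/" + url.PathEscape(id)
	req, err := http.NewRequest(http.MethodPut, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// newDeleteRequest builds a DELETE request for the document with the given ID.
func newDeleteRequest(baseURL, id string) (*http.Request, error) {
	target := baseURL + "/api/v1/documents/" + url.PathEscape(id)
	return http.NewRequest(http.MethodDelete, target, nil)
}

func main() {
	const base = "http://localhost:3000"
	put, err := newUpdateRequest(base, "42", Document{Title: "The Silicon Brain"})
	if err != nil {
		fmt.Println("build update:", err)
		return
	}
	del, err := newDeleteRequest(base, "42")
	if err != nil {
		fmt.Println("build delete:", err)
		return
	}
	// Send either request with http.DefaultClient.Do once a server is running.
	fmt.Println(put.Method, put.URL)
	fmt.Println(del.Method, del.URL)
}
```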
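
Search the index

To search the index for documents that contain specific words, use the following request. In this example, query holds the search terms, properties lists the document fields to search, and bool_mode selects the Boolean operator (AND or OR) applied between the terms: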

$ curl -G localhost:3000/api/v1/search \
    -d query=silicon%20brain \
    -d properties=title,abstract \
    -d bool_mode=AND
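
When calling the search endpoint from Go, the query string is easiest to build with `url.Values`, which handles escaping. The sketch below uses the parameter names from the curl example. The `buildSearchURL` helper is a hypothetical name, and the response is left undecoded because its format is not described in this README.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildSearchURL assembles the search URL from its three query parameters.
// properties is joined with commas, matching the curl example.
func buildSearchURL(baseURL, query string, properties []string, boolMode string) string {
	params := url.Values{}
	params.Set("query", query)
	params.Set("properties", strings.Join(properties, ","))
	params.Set("bool_mode", boolMode)
	// Encode sorts keys alphabetically and percent-encodes the values.
	return baseURL + "/api/v1/search?" + params.Encode()
}

func main() {
	u := buildSearchURL("http://localhost:3000", "silicon brain", []string{"title", "abstract"}, "AND")
	fmt.Println(u)
	// prints http://localhost:3000/api/v1/search?bool_mode=AND&properties=title%2Cabstract&query=silicon+brain
}
```

`Encode` writes the space as `+` and the comma as `%2C`, whereas the curl example sends `%20` and a literal comma. Under standard query-string parsing these decode to the same values.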

📄 License

All my code is MIT licensed. Libraries follow their respective licenses.

Directories

Path        Synopsis
cmd/server  command
pkg/lib