MiniSearch

Restful, in-memory, full-text search engine written in Go.
β
Features
- Full-text indexing of multiple fields in a document
- Boolean queries with AND, OR operators between subqueries
- Exact phrase search
- Document ranking based on TF-IDF
- Vector similarity search for semantic search
- Stemming-based query expansion for many languages
- Document deletion and updating with index garbage collection
π οΈ Installation
Download binary
To download and run minisearch from a precompiled binary:
- Download a precompiled version of minisearch from GitHub.
- Run the server binary:
$ ./server
Run with Docker
To run minisearch with Docker, use the minisearch Docker image:
$ docker run -d --name minisearch -p 3000:3000 micpst/minisearch:latest
Build from source
To build and run minisearch from the source code:
- Requirements: go & make
- Install dependencies:
$ make setup
- Build:
$ make build
- Run the server binary:
$ ./bin/server
π Usage
Add documents
Create a new document and add it to the index.
$ curl -X POST localhost:3000/api/v1/documents \
-H 'Content-Type: application/json' \
-d '{
"title": "The Silicon Brain",
"url": "https://micpst.com/posts/silicon-brain",
"abstract": "The human brain is often described as complex..."
}'
Upload document dumps
Fill the index with a large number of documents at once by uploading a document dumps.
$ curl -X POST localhost:3000/api/v1/upload \
-H 'Content-Type: multipart/form-data' \
-F 'file[]=@/path/to/dataset1.xml.gz' \
-F 'file[]=@/path/to/dataset2.xml.gz'
The dump should have the following structure:
<docs>
<doc>
<title>...</title>
<url>...</url>
<abstract>...</abstract>
</doc>
<doc>
<title>...</title>
<url>...</url>
<abstract>...</abstract>
</doc>
</docs>
Update the document
Update the existing document and re-index it with the new fields.
$ curl -X PUT localhost:3000/api/v1/documents/<id> \
-H 'Content-Type: application/json' \
-d '{
"title": "The Silicon Brain",
"url": "https://micpst.com/posts/silicon-brain",
"abstract": "The human brain is often described as complex..."
}'
Remove the document
Permanently delete the document and remove it from the index.
$ curl -X DELETE localhost:3000/api/v1/documents/<id>
Search the index
To search the index for documents that contain specific words, use the following request:
$ curl -G localhost:3000/api/v1/search \
-d query=silicon%20brain \
-d properties=title,abstract \
-d bool_mode=AND
π License
All my code is MIT licensed. Libraries follow their respective licenses.