srake

Module version: v0.1.0-alpha.1
Published: Feb 12, 2026 License: MIT


SRAKE - SRA Knowledge Engine

Pronounced like Japanese sake (酒) — "srah-keh"


SRAKE is a tool for ingesting, searching, and serving NCBI SRA (Sequence Read Archive) metadata. It streams multi-gigabyte compressed archives directly into a local SQLite database, supports full-text and semantic vector search, and exposes results via the CLI, a REST API, or the Model Context Protocol (MCP).

Pre-alpha software — developed at BioHackathon 2025, Mie, Japan. Not production-ready. APIs, schemas, and behavior may change without notice.

Installation

From source (requires Go 1.25+, CGO, SQLite3)
git clone https://github.com/nishad/srake.git
cd srake
go build -tags "sqlite_fts5,search" -o srake ./cmd/srake
Pre-built binaries

Download from the releases page. Available for linux/amd64, linux/arm64, darwin/amd64, and darwin/arm64.

Docker
docker pull ghcr.io/nishad/srake:latest
docker run -v "$(pwd)/data:/data" ghcr.io/nishad/srake:latest --help

Quick start

# Ingest SRA metadata (auto-selects best source from NCBI)
srake ingest --auto

# Ingest a specific local archive
srake ingest --file /path/to/archive.tar.gz

# Search
srake search "homo sapiens" --limit 10

# Start the API server
srake server --port 8080

# Start MCP server for AI assistants
srake mcp --transport stdio

Commands

srake ingest — Ingest SRA metadata

Streams tar.gz archives from NCBI (or local files) directly into SQLite without intermediate extraction.

srake ingest --auto                # auto-select best source
srake ingest --daily               # latest daily update
srake ingest --monthly             # full monthly dataset
srake ingest --file archive.tar.gz # local file
srake ingest --list                # list available files on NCBI

Filtering during ingest:

srake ingest --file archive.tar.gz \
  --taxon-ids 9606 \
  --platforms ILLUMINA \
  --strategies RNA-Seq \
  --date-from 2024-01-01 \
  --min-reads 1000000

# Preview what would be ingested without inserting
srake ingest --file archive.tar.gz --taxon-ids 9606 --stats-only
srake search — Search metadata

Supports multiple search modes: database (SQLite FTS5), text (Bleve), vector (semantic embeddings), hybrid (combined), and auto (default).

srake search "breast cancer" --limit 20
srake search "RNA-Seq" --organism "homo sapiens" --platform ILLUMINA
srake search "tumor gene expression" --search-mode vector
srake search "mouse brain" --similarity-threshold 0.7 --show-confidence
srake search "covid" --format json --output results.json

Output formats: table (default), json, csv, tsv.

srake server — REST API server
srake server --port 8080 --enable-cors

Endpoints:

Method  Path                               Description
GET     /api/v1/search?query=...           Search metadata
GET     /api/v1/studies/{accession}        Get study
GET     /api/v1/experiments/{accession}    Get experiment
GET     /api/v1/samples/{accession}        Get sample
GET     /api/v1/runs/{accession}           Get run
GET     /api/v1/stats                      Database statistics
GET     /api/v1/health                     Health check
POST    /api/v1/export                     Export results
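A client needs only `net/http` and `net/url` to call these endpoints. The sketch below URL-encodes the `query` parameter for `GET /api/v1/search` and returns the raw body, because the response schema is not documented here; the base URL assumes `srake server --port 8080` on localhost, and `searchURL`/`searchSRA` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// searchURL builds {base}/api/v1/search?query=... with proper escaping.
func searchURL(base, query string) string {
	return strings.TrimRight(base, "/") + "/api/v1/search?" + url.Values{"query": {query}}.Encode()
}

// searchSRA calls the search endpoint and returns the unparsed response body.
func searchSRA(client *http.Client, base, query string) ([]byte, error) {
	resp, err := client.Get(searchURL(base, query))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("search failed: %s: %s", resp.Status, body)
	}
	return body, nil
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}
	body, err := searchSRA(client, "http://localhost:8080", "homo sapiens")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}
```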
srake mcp — MCP server for AI assistants

Implements the Model Context Protocol (MCP) over stdio or HTTP.

srake mcp --transport stdio    # for Claude Desktop, etc.
srake mcp --transport http --port 8081

Exposes the MCP tools search_sra, get_metadata, find_similar, and export_results.
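MCP messages are JSON-RPC 2.0, and on the stdio transport each message is one line of JSON with no embedded newlines. A tool is invoked with the `tools/call` method, passing the tool `name` and an `arguments` object. The sketch below builds such a request for `search_sra`; the argument name `query` is an assumption, since the tool's input schema is not listed here (a client would normally discover it via `tools/list`), and the required `initialize` handshake is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// rpcRequest is a JSON-RPC 2.0 request envelope as used by MCP.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// toolCallParams holds the params of an MCP "tools/call" request.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// encodeToolCall returns one newline-terminated JSON message, matching the
// framing of the MCP stdio transport. json.Marshal emits compact JSON, so
// the message itself contains no newlines.
func encodeToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	msg, err := json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
	if err != nil {
		return nil, err
	}
	return append(msg, '\n'), nil
}

func main() {
	// "query" is an assumed argument name for search_sra.
	line, err := encodeToolCall(1, "search_sra", map[string]any{"query": "breast cancer"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(line))
}
```

Writing this line to the stdin of `srake mcp --transport stdio`, after initialization, would be the client's half of a tool call.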

srake metadata — Accession lookup
srake metadata SRR12345678 --format json
srake metadata SRP123456 SRX654321 --format yaml
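`srake metadata` accepts mixed accession types in a single call. SRA accessions encode the record type in their prefix: the first letter names the INSDC archive (S for NCBI, E for EBI, D for DDBJ), followed by R, and the third letter gives the type (A submission, P study, X experiment, S sample, R run, Z analysis). The sketch below classifies an accession that way and maps it to the core table names listed under "Database schema"; it illustrates the naming convention and is not srake's own parser (`classifyAccession` is hypothetical).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// accessionRE matches SRA-style accessions: archive letter, "R", type letter, digits.
var accessionRE = regexp.MustCompile(`^([SED])R([APSXRZ])(\d+)$`)

// recordTable maps the type letter to the corresponding core table.
var recordTable = map[string]string{
	"A": "submissions",
	"P": "studies",
	"X": "experiments",
	"S": "samples",
	"R": "runs",
	"Z": "analyses",
}

// classifyAccession returns the core table an SRA accession belongs to.
func classifyAccession(acc string) (string, error) {
	m := accessionRE.FindStringSubmatch(strings.ToUpper(strings.TrimSpace(acc)))
	if m == nil {
		return "", fmt.Errorf("not an SRA accession: %q", acc)
	}
	return recordTable[m[2]], nil
}

func main() {
	for _, acc := range []string{"SRR12345678", "SRP123456", "SRX654321", "GSE12345"} {
		table, err := classifyAccession(acc)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s\n", acc, table)
	}
}
```

GEO, BioProject, and BioSample identifiers fall outside this pattern; the `converter` package listed under Directories handles those.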
srake db — Database management
srake db info                  # show database statistics
srake db stats --rebuild       # rebuild pre-computed statistics
srake db export -o out.sqlite  # export to SRAmetadb-compatible format
srake db export -o out.sqlite --fts-version 3  # FTS3 for legacy compatibility
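FTS3 and FTS5 are separate SQLite modules. Their basic virtual-table declarations differ only in the module name, but a database containing FTS5 tables cannot be queried by a SQLite build that lacks FTS5, which is presumably why an FTS3 option is offered for legacy compatibility. A sketch of generating either declaration; the table and column names are hypothetical and not taken from srake's export code, and identifiers are assumed to be trusted (they are not quoted).

```go
package main

import (
	"fmt"
	"strings"
)

// ftsDDL builds a CREATE VIRTUAL TABLE statement for SQLite FTS3 or FTS5.
func ftsDDL(version int, table string, columns []string) (string, error) {
	var module string
	switch version {
	case 3:
		module = "fts3"
	case 5:
		module = "fts5"
	default:
		return "", fmt.Errorf("unsupported FTS version %d", version)
	}
	return fmt.Sprintf("CREATE VIRTUAL TABLE %s USING %s(%s);",
		table, module, strings.Join(columns, ", ")), nil
}

func main() {
	cols := []string{"study_accession", "study_title"} // hypothetical columns
	for _, v := range []int{3, 5} {
		ddl, err := ftsDDL(v, "study_fts", cols)
		if err != nil {
			panic(err)
		}
		fmt.Println(ddl)
	}
}
```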
srake models — Embedding model management
srake models list
srake models download Xenova/SapBERT-from-PubMedBERT-fulltext
srake models test <model-id> "sample text"

SRAKE supports semantic search using biomedical embeddings (SapBERT via ONNX Runtime). This requires downloading a model and building an index with embeddings enabled.

srake models download Xenova/SapBERT-from-PubMedBERT-fulltext
srake search "metabolic pathway analysis" --search-mode vector

Environment variables

Variable             Description
SRAKE_DB_PATH        Path to the metadata database
SRAKE_INDEX_PATH     Path to the search index directory
SRAKE_CONFIG_DIR     Configuration directory (default: ~/.config/srake)
SRAKE_DATA_DIR       Data directory (default: ~/.local/share/srake)
SRAKE_CACHE_DIR      Cache directory (default: ~/.cache/srake)
SRAKE_MODEL_VARIANT  Embedding model variant (full or quantized)
NO_COLOR             Disable colored output

Default directories follow the XDG Base Directory Specification.

Database schema

Core tables: studies, experiments, samples, runs, submissions, analyses. Supporting tables: experiment_samples (a junction table linking experiments to samples) and statistics (pre-computed database statistics). Full-text search uses SQLite FTS5 virtual tables.

Development

go build -tags "sqlite_fts5,search" ./...
go test -tags "sqlite_fts5,search" ./...

# Build with version injection
go build -tags "sqlite_fts5,search" \
  -ldflags="-X main.Version=$(git describe --tags) -X main.Commit=$(git rev-parse --short HEAD) -X main.BuildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  -o srake ./cmd/srake
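The linker's `-X importpath.name=value` flag only sets package-level string variables that are uninitialized or initialized to a constant string, so the `main` package needs variables with exactly these names. A minimal sketch of the receiving side; the default values and the `versionString` helper are assumptions, not taken from srake's source.

```go
package main

import "fmt"

// Overwritten at link time, e.g.
//   -ldflags="-X main.Version=... -X main.Commit=... -X main.BuildDate=..."
// The defaults apply to a plain `go build`.
var (
	Version   = "dev"
	Commit    = "none"
	BuildDate = "unknown"
)

// versionString formats build metadata for a --version style output.
func versionString() string {
	return fmt.Sprintf("srake %s (commit %s, built %s)", Version, Commit, BuildDate)
}

func main() {
	fmt.Println(versionString())
}
```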

License

MIT — Nishad Thalhath, 2025

Directories

Path Synopsis
cmd
server command
srake command
test_embeddings command
examples
basic command
internal
api
Package api provides HTTP handlers for the srake REST API, exposing search, metadata retrieval, statistics, and export endpoints.
cli
converter
Package converter provides accession ID conversion between SRA, GEO, BioProject, and BioSample identifier systems using local database lookups and NCBI E-utilities as a fallback.
database
Package database provides SQLite-backed storage for SRA metadata records including studies, experiments, samples, runs, submissions, and analyses.
errors
Package errors provides error handling utilities for SRAKE.
export
Package export handles exporting SRA metadata from the internal database into SRAdb-compatible SQLite database files, with optional gzip compression.
mcp
processor
Package processor provides streaming ingestion of SRA metadata from tar.gz archives, supporting both HTTP URLs and local files.
query
Package query provides a unified search engine that combines full-text (Bleve), structured metadata (SQLite), and vector similarity search into a single interface.
service
Package service provides high-level business logic for querying and managing SRA metadata, including study, experiment, sample, and run access.
testutil
Package testutil provides testing utilities for SRAKE packages.
ui
