solr-index-fetch

command module
v0.0.0-...-6970577
Published: Feb 17, 2017 License: Apache-2.0 Imports: 10 Imported by: 0

README

Solr index fetcher

Retrieves the most recent (current) Solr index version via the Solr HTTP API. Downloads run concurrently, so that large indices do not block or slow down the process.
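As a rough illustration of the first step, the sketch below parses an index version and generation out of an XML reply. The response shape, the `sampleBody` text, and the names `parseIndexVersion`, `longField` and `replicationResponse` are assumptions made for this example; they are not taken from this tool's source, and the numbers are simply the ones that appear in the example log further down.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// sampleBody is an ASSUMED response shape, not captured from a real server.
const sampleBody = `<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int></lst>
<long name="indexversion">1401508582278</long>
<long name="generation">13</long>
</response>`

// longField is one <long name="...">value</long> element.
type longField struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

// replicationResponse collects the <long> children of the root element.
type replicationResponse struct {
	Longs []longField `xml:"long"`
}

// parseIndexVersion (hypothetical helper) extracts both numbers from body.
func parseIndexVersion(body string) (indexVersion, generation int64, err error) {
	var r replicationResponse
	if err = xml.Unmarshal([]byte(body), &r); err != nil {
		return 0, 0, err
	}
	seen := map[string]int64{}
	for _, l := range r.Longs {
		n, perr := strconv.ParseInt(strings.TrimSpace(l.Value), 10, 64)
		if perr != nil {
			return 0, 0, perr
		}
		seen[l.Name] = n
	}
	iv, okV := seen["indexversion"]
	gen, okG := seen["generation"]
	if !okV || !okG {
		return 0, 0, fmt.Errorf("indexversion or generation missing from response")
	}
	return iv, gen, nil
}

func main() {
	iv, gen, err := parseIndexVersion(sampleBody)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("indexversion=%d generation=%d\n", iv, gen)
}
```

Against a live server the body would come from an HTTP response rather than a constant, and the element names should be checked against what that Solr version actually returns.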

Go leaves buffering of the file downloads to the operating system. Whether the data stays in memory or is buffered out to a temporary file is the OS's decision, including whether it keeps an entire file in memory. Note that when writing to output files, memory use does not grow with the file size: io.Copy moves data through a 32 KB buffer, so the copy buffers should add only about 32 KB per concurrent download (for example, 128 KB with four workers), along with channel and similar overhead, and whatever the OS adds for the file downloads.
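The streaming behaviour described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: `saveTo`, `fetchFile` and `demo` are hypothetical names, and `demo` uses an in-memory reader as a stand-in for an HTTP response body so the example runs without a Solr server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// saveTo streams src into a new file at path and returns the bytes written.
// io.Copy moves the data in fixed-size chunks, so memory use in this
// function does not depend on how large the source is.
func saveTo(path string, src io.Reader) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	n, err := io.Copy(f, src)
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	return n, err
}

// fetchFile (hypothetical) GETs url and streams the response body to path.
func fetchFile(url, path string) (int64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return saveTo(path, resp.Body)
}

// demo exercises saveTo with an in-memory reader in place of a response body.
func demo() (int64, error) {
	dir, err := os.MkdirTemp("", "solr-fetch-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	return saveTo(filepath.Join(dir, "segments_d"), strings.NewReader("example payload"))
}

func main() {
	n, err := demo()
	if err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	fmt.Printf("wrote %d bytes\n", n)
}
```

Because the body is never read fully into a byte slice, the size of a single index file should not affect the program's own memory footprint.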

Implementation and Dependencies

This script was built with Go 1.2.1, and is suitable for retrieving the latest version of a Solr instance's index.

By default, the script uses only half of your cores for concurrency; i.e., if you have 8 cores, you will get 4 workers.
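A minimal sketch of that sizing rule with a small worker pool is shown below. The names `workerCount` and `runPool` are hypothetical, and clamping to at least one worker on a single-core machine is an assumption of this example rather than documented behaviour.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// workerCount returns half of the given cores.
// ASSUMPTION: never fewer than one worker, so a single-core host still runs.
func workerCount(cores int) int {
	n := cores / 2
	if n < 1 {
		n = 1
	}
	return n
}

// runPool feeds jobs to n goroutines and returns how many jobs were handled.
func runPool(jobs []string, n int, work func(string)) int {
	ch := make(chan string)
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range ch {
				work(j)
				mu.Lock()
				done++
				mu.Unlock()
			}
		}()
	}
	for _, j := range jobs {
		ch <- j
	}
	close(ch)
	wg.Wait()
	return done
}

func main() {
	n := workerCount(runtime.NumCPU())
	files := []string{"_8.fdt", "_8.fdx", "segments_d"}
	// The work function is a placeholder; a real one would download the file.
	handled := runPool(files, n, func(name string) {})
	fmt.Printf("%d workers handled %d files\n", n, handled)
}
```

With this rule, `workerCount(8)` gives 4, matching the example in the text.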

Works with Solr 4.6.1, and should work just as well with any version of Solr that implements the same HTTP API and answers queries using the same XML response format.

Build

go build solr-fetch.go

Usage

Options:

  • -l=<URL to Solr Admin page>
  • -o=<Output Path for the downloaded Solr Index>
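For readers curious how two such options are typically read in Go, here is a small sketch using the standard `flag` package. The `options` type and `parseArgs` function are hypothetical, and treating both flags as required is an assumption of this example.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the two command-line settings listed above.
type options struct {
	solrURL string
	outDir  string
}

// parseArgs (hypothetical) parses -l and -o from args.
// ASSUMPTION: both flags are required.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("solr-fetch", flag.ContinueOnError)
	fs.StringVar(&o.solrURL, "l", "", "URL to Solr Admin page")
	fs.StringVar(&o.outDir, "o", "", "output path for the downloaded Solr index")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.solrURL == "" || o.outDir == "" {
		return o, errors.New("both -l and -o are required")
	}
	return o, nil
}

func main() {
	// A real command would pass os.Args[1:]; fixed arguments keep this runnable.
	args := []string{"-l=http://172.20.20.20:8983/solr", "-o=/var/lib/solr/backup"}
	o, err := parseArgs(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("url=%s out=%s\n", o.solrURL, o.outDir)
}
```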

Example:

localhost:solr-fetch Mo$ ./solr-fetch -l=http://172.20.20.20:8983/solr -o=/var/lib/solr/backup

2014/06/01 09:05:15 Beginning fetch of Solr index...
2014/06/01 09:05:15 {Url:http://172.20.20.20:8983/solr/replication?command=filecontent&file=segments_d&generation=13&indexversion=1401508582278&wt=filestream StatusCode:200}
...

localhost:solr-fetch Mo$ ls results/
_8.fdt      _8.si       _8_Lucene41_0.pos
_8.fdx      _8.tvd      _8_Lucene41_0.tim
_8.fnm      _8.tvx      _8_Lucene41_0.tip
_8.nvd      _8_1.del    segments_d
_8.nvm      _8_Lucene41_0.doc
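The request URL logged in the example run lists its query parameters in alphabetical order, which is what Go's `url.Values.Encode` produces. The sketch below rebuilds that URL from its parts; `fileContentURL` is a hypothetical helper, and whether the tool builds its URLs this way is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// fileContentURL (hypothetical) builds a replication "filecontent" request
// for one index file. Parameter names are copied from the example log line.
func fileContentURL(base, file string, generation, indexVersion int64) string {
	q := url.Values{}
	q.Set("command", "filecontent")
	q.Set("file", file)
	q.Set("generation", strconv.FormatInt(generation, 10))
	q.Set("indexversion", strconv.FormatInt(indexVersion, 10))
	q.Set("wt", "filestream")
	// Encode sorts the parameters by key.
	return base + "/replication?" + q.Encode()
}

func main() {
	fmt.Println(fileContentURL("http://172.20.20.20:8983/solr", "segments_d", 13, 1401508582278))
}
```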

Notes/TODO

Not implemented:

  • A friendly delay between requests to the Solr server. Solr has no problem serving these requests in development without a delay, though this could change as load on the Solr server increases.
  • On failure, clear the partial download and retry, up to N attempts as requested on the command line.
  • Health-check the Solr server before beginning retrieval.
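None of the items above exist in the tool. As one possible shape for the retry and delay items, the sketch below wraps an operation in a bounded retry loop with a pause between attempts. The names `retry` and `errTransient` are hypothetical, and the simulated failure in `main` stands in for a real download.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a placeholder error used to simulate a failed download.
var errTransient = errors.New("transient failure")

// retry calls fn up to attempts times, sleeping delay between failed tries.
// fn is expected to remove any partial output itself before returning an error.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // simulate two failed downloads
		}
		return nil
	})
	if err != nil {
		fmt.Println("failed:", err)
		return
	}
	fmt.Printf("succeeded after %d calls\n", calls)
}
```

The same `delay` value could double as the "friendly delay" between successful requests, though that choice is left open here.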

Documentation

Overview

Script to retrieve the latest Solr index data from a server.
