microblob

Version: v0.1.2
Published: Feb 23, 2017 License: GPL-3.0


Serve JSON from a file via HTTP. Instead of storing the blobs a second time in a key-value store, microblob stores only the offset and length of each document inside the file.
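
The offset-and-length idea can be sketched in a few lines of Go. This is a minimal illustration, not microblob's actual code: the `Entry` type and the `indexLines` and `indexString` helpers are hypothetical names, and the input is assumed to be newline-delimited with one document per line.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Entry records where one document lives inside the blob file.
// (Hypothetical type, introduced for this sketch only.)
type Entry struct {
	Offset int64 // byte position of the first byte of the line
	Length int64 // number of bytes, including the trailing newline if present
}

// indexLines scans newline-delimited documents from r and returns
// one Entry per line, in file order.
func indexLines(r io.Reader) ([]Entry, error) {
	br := bufio.NewReader(r)
	var entries []Entry
	var offset int64
	for {
		line, err := br.ReadBytes('\n')
		if len(line) > 0 {
			n := int64(len(line))
			entries = append(entries, Entry{Offset: offset, Length: n})
			offset += n
		}
		if err == io.EOF {
			return entries, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// indexString is a convenience wrapper for in-memory input.
func indexString(s string) ([]Entry, error) {
	return indexLines(strings.NewReader(s))
}

func main() {
	data := "{\"id\":\"a\"}\n{\"id\":\"bb\"}\n"
	entries, err := indexString(data)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("offset=%d length=%d doc=%s", e.Offset, e.Length, data[e.Offset:e.Offset+e.Length])
	}
}
```

Each `Entry` is just a pair of integers, so the index stays small regardless of how large the individual documents are.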

           +---------------------------------------------------------------------------------+
           |                                                                                 |
           |                                                                                 |
+------->  | HTTP request  +-------> lookup offset and length +---------------->  LevelDB <-------+
           |                                                                                 |    |
           |                                              +   <----------------+             |    |
           |                                              |                                  |    |
           |                                              |                                  |    |
<-------+  | HTTP response <-------+ seek and read <------+                                  |    |
           |                                                                                 |    |
           |                           ^  +                                                  |    |
           |                           |  |                                  --microblob     |    |
           +---------------------------------------------------------------------------------+    |
                                       |  |                                                       |
                                       |  |          +--------------------------------------------+
                                       |  |          |
                                       |  |          |
                                       |  v          |
                                                     +
                    $ microblob -file blobfile -db data.db -serve -addr 0.0.0.0:8820
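
The request path in the diagram can be approximated as follows. This is a sketch under stated assumptions, not the project's implementation: a plain map stands in for the LevelDB lookup, the key is taken to be the URL path without its leading slash, and the `Content-Type` header is a guess. `Entry`, `blobHandler`, `fetch` and `demoHandler` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Entry is the offset and length of one document in the blob file.
type Entry struct {
	Offset int64
	Length int64
}

// blobHandler serves documents out of blob. The map is an in-memory
// stand-in for the key-value store (an assumption made for brevity).
func blobHandler(blob io.ReaderAt, index map[string]Entry) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.URL.Path, "/")
		e, ok := index[key]
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json") // assumed header
		// The SectionReader starts at Offset and yields at most Length bytes.
		section := io.NewSectionReader(blob, e.Offset, e.Length)
		if _, err := io.Copy(w, section); err != nil {
			log.Printf("copy failed for key %q: %v", key, err)
		}
	})
}

// fetch runs a single GET against h and returns status code and body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

// demoHandler builds a handler over a two-document in-memory blob.
func demoHandler() http.Handler {
	blob := "{\"id\":\"a\"}\n{\"id\":\"bb\"}\n"
	index := map[string]Entry{
		"a":  {Offset: 0, Length: 11},
		"bb": {Offset: 11, Length: 12},
	}
	return blobHandler(strings.NewReader(blob), index)
}

func main() {
	code, body := fetch(demoHandler(), "/bb")
	fmt.Println(code, body)
}
```

With a real file, an `*os.File` could be passed as the `io.ReaderAt`; repeated reads of the same region are then answered from the OS page cache once it is warm.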

The goal is to serve a large number of keys while being memory efficient and fast to index. Creating a blob database with 120 million entries takes about an hour and consumes a few GB of memory during creation and only a few GB on disk. Lookups are served fast from memory as soon as the OS has cached parts of the blob file.

It should be possible to use this setup as is for twice as many keys or more.

Usage

$ microblob -h
Usage of microblob:
  -addr string
          address to serve (default "127.0.0.1:8820")
  -backend string
          backend to use, currently only leveldb (default "leveldb")
  -batch int
          number of lines in a batch (default 100000)
  -db string
          filename to use for backend (default "data.db")
  -file string
          file to index or serve
  -key string
          key to extract
  -r string
          regular expression to use as key extractor
  -serve
          serve file
  -version
          show version and exit
$ microblob -db data.db -file fixtures/1000.ldj -key finc.record_id
$ microblob -db data.db -file fixtures/1000.ldj -serve
$ curl -s localhost:8820/ai-121-b2FpOmFyWGl2Lm9yZzowNzA0LjAwMjQ | jq .
{
  "finc.format": "ElectronicArticle",
  "finc.mega_collection": "Arxiv",
  "finc.record_id": "ai-121-b2FpOmFyWGl2Lm9yZzowNzA0LjAwMjQ",
  "finc.source_id": "121",
  "rft.atitle": "Formation of quasi-solitons in transverse confined ferromagnetic film   media",
  "rft.jtitle": "Arxiv",
  ...
  "url": [
    "http://arxiv.org/abs/0704.0024"
  ],
  "x.subjects": [
    "Nonlinear Sciences - Pattern Formation and Solitons"
  ]
}
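
The `-key` and `-r` flags choose how a key is derived from each line. The following is a minimal sketch of the two approaches, not microblob's code. It assumes that `-key` names a top-level JSON field holding a string (as `finc.record_id` does in the record above) and that `-r` uses the first match of the pattern; the tool's real behaviour may differ. The function names and the example pattern are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// keyFromField decodes line as a JSON object and returns the string
// value stored under the given top-level field name.
func keyFromField(line, field string) (string, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal([]byte(line), &doc); err != nil {
		return "", err
	}
	raw, ok := doc[field]
	if !ok {
		return "", fmt.Errorf("field %q not found", field)
	}
	var key string
	if err := json.Unmarshal(raw, &key); err != nil {
		return "", fmt.Errorf("field %q is not a string: %w", field, err)
	}
	return key, nil
}

// keyFromPattern returns the first match of pattern in line. A real
// indexer would compile the pattern once rather than on every call.
func keyFromPattern(line, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindString(line)
	if m == "" {
		return "", fmt.Errorf("pattern %q did not match", pattern)
	}
	return m, nil
}

func main() {
	line := `{"finc.record_id": "ai-121-b2FpOmFyWGl2Lm9yZzowNzA0LjAwMjQ", "finc.source_id": "121"}`

	byField, err := keyFromField(line, "finc.record_id")
	fmt.Println(byField, err)

	// Hypothetical pattern, chosen to fit the identifier shown above.
	byPattern, err := keyFromPattern(line, `ai-[0-9]+-[A-Za-z0-9_-]+`)
	fmt.Println(byPattern, err)
}
```

The regular-expression variant avoids a full JSON decode per line, at the cost of depending on the shape of the identifiers.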

Performance

$ ll -h fixtures/example.ldj
-rw-rw-r-- 1 zzz zzz 120G Feb 22 15:35 fixtures/example.ldj

$ wc -l fixtures/example.ldj
118627938 fixtures/example.ldj

$ time microblob -db data.db -file fixtures/example.ldj -key finc.record_id
...
real    68m26.039s
user    58m47.116s
sys      3m21.976s
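
As a rough check on the indexing rate, the line count and the wall-clock time above can be combined. The helper name `docsPerSecond` is made up for this example; the two figures are copied from the `wc -l` and `time` output.

```go
package main

import "fmt"

// docsPerSecond returns the average number of documents indexed per
// second, given a document count and an elapsed time in seconds.
func docsPerSecond(docs int64, seconds float64) float64 {
	return float64(docs) / seconds
}

func main() {
	const docs = 118627938            // from wc -l
	const elapsed = 68*60.0 + 26.039 // "real 68m26.039s", in seconds
	fmt.Printf("%.0f docs/s\n", docsPerSecond(docs, elapsed))
}
```

This works out to roughly 29,000 documents per second, in line with the estimate of about an hour for 120 million entries.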

$ ab -c 10 -n 10000 http://127.0.0.1:8820/ai-121-b2FpOmFyWGl2Lm9yZzowNzA0LjAwNTA
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:
Server Hostname:        127.0.0.1
Server Port:            8820

Document Path:          /ai-121-b2FpOmFyWGl2Lm9yZzowNzA0LjAwNTA
Document Length:        1576 bytes

Concurrency Level:      10
Time taken for tests:   0.445 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      16950000 bytes
HTML transferred:       15760000 bytes
Requests per second:    22480.30 [#/sec] (mean)
Time per request:       0.445 [ms] (mean)
Time per request:       0.044 [ms] (mean, across all concurrent requests)
Transfer rate:          37211.04 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.1      0       2
Waiting:        0    0   0.1      0       2
Total:          0    0   0.1      0       3

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      1
  95%      1
  98%      1
  99%      1
 100%      3 (longest request)

$ hey -n 10000 http://localhost:8820/ai-48-R0xJUF9fTmpneU9UTTFPVUJBUURZNE1qa3pOVGs
All requests done.

Summary:
  Total:	0.2991 secs
  Slowest:	0.0326 secs
  Fastest:	0.0001 secs
  Average:	0.0014 secs
  Requests/sec:	33433.4975

Status code distribution:
  [200]	10000 responses

Response time histogram:
  0.000 [1]	|
  0.003 [9369]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.007 [519]	|∎∎
  0.010 [75]	|
  0.013 [25]	|
  0.016 [7]	|
  0.020 [2]	|
  0.023 [0]	|
  0.026 [1]	|
  0.029 [0]	|
  0.033 [1]	|

Latency distribution:
  10% in 0.0003 secs
  25% in 0.0006 secs
  50% in 0.0011 secs
  75% in 0.0017 secs
  90% in 0.0028 secs
  95% in 0.0036 secs
  99% in 0.0068 secs
