README
BadgerDB

BadgerDB is an embeddable, persistent, simple and fast key-value (KV) database written in pure Go. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.
Project Status
Badger v1.0 was released in Nov 2017. Check the Changelog for the full details.
We introduced transactions in v0.9.0 which involved a major API change. If you have a Badger datastore prior to that, please use v0.8.1, but we strongly urge you to upgrade. Upgrading from both v0.8 and v0.9 will require you to take backups and restore using the new version.
Table of Contents
- Getting Started
- Resources
- Contact
- Design
- Other Projects Using Badger
- Frequently Asked Questions
Getting Started
Installing
To start using Badger, install Go 1.8 or above and run go get:
$ go get github.com/dgraph-io/badger/...
This will retrieve the library and install the badger_info command line utility into your $GOBIN path.
Opening a database
The top-level object in Badger is a DB. It represents multiple files on disk in specific directories, which contain the data for a single database.
To open your database, use the badger.Open() function, with the appropriate options. The Dir and ValueDir options are mandatory and must be specified by the client. They can be set to the same value to simplify things.
package main
import (
	"log"
	"github.com/dgraph-io/badger"
)
func main() {
	// Open the Badger database located in the /tmp/badger directory.
	// It will be created if it doesn't exist.
	opts := badger.DefaultOptions
	opts.Dir = "/tmp/badger"
	opts.ValueDir = "/tmp/badger"
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// Your code here…
}
Please note that Badger obtains a lock on the directories so multiple processes cannot open the same database at the same time.
Transactions
Read-only transactions
To start a read-only transaction, you can use the DB.View() method:
err := db.View(func(txn *badger.Txn) error {
	// Your code here…
	return nil
})
You cannot perform any writes or deletes within this transaction. Badger ensures that you get a consistent view of the database within this closure. Any writes that happen elsewhere after the transaction has started will not be seen by calls made within the closure.
Read-write transactions
To start a read-write transaction, you can use the DB.Update() method:
err := db.Update(func(txn *badger.Txn) error {
	// Your code here…
	return nil
})
All database operations are allowed inside a read-write transaction.
Always check the returned error value. If you return an error within your closure it will be passed through.
An ErrConflict error will be reported in case of a conflict. Depending on the state of your application, you have the option to retry the operation if you receive this error.
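A retry on conflict can be sketched as a small loop. The sketch below is deliberately Badger-free so that it runs on its own: errConflict is a hypothetical stand-in for badger.ErrConflict, and retryOnConflict and the update closure are illustrative names. In real code the closure would wrap a DB.Update() call.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for badger.ErrConflict.
var errConflict = errors.New("transaction conflict")

// retryOnConflict calls update until it succeeds, fails with an error other
// than errConflict, or maxAttempts attempts have been made.
func retryOnConflict(maxAttempts int, update func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = update()
		if !errors.Is(err, errConflict) {
			return err // nil on success, or an error that a retry will not fix
		}
	}
	return err // still a conflict after the last attempt
}

func main() {
	calls := 0
	err := retryOnConflict(5, func() error {
		calls++
		if calls < 3 {
			return errConflict // simulate two conflicting commits
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

Whether retrying is appropriate depends on the application: the closure runs again from scratch, so it must be safe to repeat.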
An ErrTxnTooBig error will be reported if the number of pending writes/deletes in the transaction exceeds a certain limit. In that case, it is best to commit the transaction and start a new transaction immediately. Here is an example (we are not checking for errors in some places for simplicity):
updates := make(map[string]string)
txn := db.NewTransaction(true)
for k, v := range updates {
	if err := txn.Set([]byte(k), []byte(v)); err == badger.ErrTxnTooBig {
		// The transaction is full: commit it and retry the write in a fresh one.
		_ = txn.Commit(nil)
		txn = db.NewTransaction(true)
		_ = txn.Set([]byte(k), []byte(v))
	}
}
_ = txn.Commit(nil)
Managing transactions manually
The DB.View() and DB.Update() methods are wrappers around the DB.NewTransaction() and Txn.Commit() methods (or Txn.Discard() in case of read-only transactions). These helper methods will start the transaction, execute a function, and then safely discard your transaction if an error is returned. This is the recommended way to use Badger transactions.
However, sometimes you may want to manually create and commit your transactions. You can use the DB.NewTransaction() function directly, which takes in a boolean argument to specify whether a read-write transaction is required. For read-write transactions, it is necessary to call Txn.Commit() to ensure the transaction is committed. For read-only transactions, calling Txn.Discard() is sufficient. Txn.Commit() also calls Txn.Discard() internally to clean up the transaction, so just calling Txn.Commit() is sufficient for read-write transactions. However, if your code doesn't call Txn.Commit() for some reason (e.g. it returns prematurely with an error), then please make sure you call Txn.Discard() in a defer block. Refer to the code below.
// Start a writable transaction.
txn := db.NewTransaction(true)
defer txn.Discard()
// Use the transaction...
err := txn.Set([]byte("answer"), []byte("42"))
if err != nil {
	return err
}
// Commit the transaction and check for error.
if err := txn.Commit(nil); err != nil {
	return err
}
The first argument to DB.NewTransaction() is a boolean stating if the transaction should be writable.
Badger allows an optional callback to the Txn.Commit() method. Normally, the callback can be set to nil, and the method will return after all the writes have succeeded. However, if this callback is provided, the Txn.Commit() method returns as soon as it has checked for any conflicts. The actual writing to the disk happens asynchronously, and the callback is invoked once the writing has finished, or an error has occurred. This can improve the throughput of the application in some cases. But it also means that a transaction is not durable until the callback has been invoked with a nil error value.
Using key/value pairs
To save a key/value pair, use the Txn.Set() method:
err := db.Update(func(txn *badger.Txn) error {
	err := txn.Set([]byte("answer"), []byte("42"))
	return err
})
This will set the value of the "answer" key to "42". To retrieve this value, we can use the Txn.Get() method:
err := db.View(func(txn *badger.Txn) error {
	item, err := txn.Get([]byte("answer"))
	if err != nil {
		return err
	}
	val, err := item.Value()
	if err != nil {
		return err
	}
	fmt.Printf("The answer is: %s\n", val)
	return nil
})
Txn.Get() returns ErrKeyNotFound if the value is not found.
Please note that values returned from Get() are only valid while the transaction is open. If you need to use a value outside of the transaction then you must use copy() to copy it to another byte slice.
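The reason a copy is needed is ordinary Go slice aliasing: a byte slice is a view onto memory that someone else may later overwrite. The sketch below shows the effect with plain slices and no Badger dependency. buf is a stand-in for memory owned by the storage layer and cloneBytes is an illustrative helper, not part of the Badger API. (The index further down also lists Item.ValueCopy(), which copies the value into a slice you provide.)

```go
package main

import "fmt"

// cloneBytes returns a copy of b that does not share memory with b.
func cloneBytes(b []byte) []byte {
	out := make([]byte, len(b))
	copy(out, b)
	return out
}

func main() {
	buf := []byte("42") // stand-in for memory owned by the storage layer
	val := buf[:2]      // a view onto buf, like a value handed out inside a transaction
	saved := cloneBytes(val)

	copy(buf, "XX") // the owner reuses its memory once the transaction is over

	fmt.Printf("view=%s copy=%s\n", val, saved) // view=XX copy=42
}
```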
Use the Txn.Delete() method to delete a key.
Monotonically increasing integers
To get unique monotonically increasing integers with strong durability, you can use the DB.GetSequence method. This method returns a Sequence object, which is thread-safe and can be used concurrently via various goroutines.
Badger would lease a range of integers to hand out from memory, with the bandwidth provided to DB.GetSequence. The frequency at which disk writes are done is determined by this lease bandwidth and the frequency of Next invocations. Setting the bandwidth too low would cause more disk writes; setting it too high would result in wasted integers if Badger is closed or crashes. To avoid wasted integers, call Release before closing Badger.
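seq, err := db.GetSequence(key, 1000)
if err != nil {
	return err
}
defer seq.Release()
for {
	if _, err := seq.Next(); err != nil { // assign the integer instead of _ to use it
		return err
	}
}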
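The trade-off between disk writes and wasted integers can be seen in a toy model of a lease. This is a simplified sketch under stated assumptions, not Badger's implementation: store stands in for the lease value persisted on disk, and openSequence, Next and Release only mimic the behaviour described above.

```go
package main

import "fmt"

// store stands in for the persisted lease; writes counts simulated disk writes.
type store struct {
	leased uint64
	writes int
}

// sequence hands out integers from memory until its lease is used up.
type sequence struct {
	st           *store
	bandwidth    uint64
	next, leased uint64
}

// openSequence resumes from whatever lease value was last persisted.
func openSequence(st *store, bandwidth uint64) *sequence {
	return &sequence{st: st, bandwidth: bandwidth, next: st.leased, leased: st.leased}
}

// Next returns the next integer, extending the lease (one write) when needed.
func (s *sequence) Next() uint64 {
	if s.next >= s.leased {
		s.leased = s.next + s.bandwidth
		s.st.leased = s.leased
		s.st.writes++
	}
	v := s.next
	s.next++
	return v
}

// Release persists the next unused integer so that none are wasted.
func (s *sequence) Release() {
	s.st.leased = s.next
	s.st.writes++
}

func main() {
	for _, bw := range []uint64{10, 1000} {
		st := &store{}
		s := openSequence(st, bw)
		for i := 0; i < 1000; i++ {
			s.Next()
		}
		fmt.Printf("bandwidth=%d lease writes=%d\n", bw, st.writes)
	}

	// A crash (no Release) after three integers skips the rest of the lease.
	crashed := &store{}
	s := openSequence(crashed, 1000)
	s.Next()
	s.Next()
	s.Next()
	fmt.Println("after crash:", openSequence(crashed, 1000).Next()) // 1000

	// Calling Release first lets the next run continue without a gap.
	clean := &store{}
	s = openSequence(clean, 1000)
	s.Next()
	s.Next()
	s.Next()
	s.Release()
	fmt.Println("after release:", openSequence(clean, 1000).Next()) // 3
}
```

In this model a bandwidth of 10 costs 100 lease writes for 1000 integers, while a bandwidth of 1000 costs one write but loses 997 integers on a crash.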
Merge Operations
Badger provides support for unordered merge operations. You can define a func of type MergeFunc which takes in an existing value, and a value to be merged with it. It returns a new value which is the result of the merge operation. All values are specified in byte arrays. For example, here is a merge function (add) which adds a uint64 value to an existing uint64 value.
func uint64ToBytes(i uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], i)
	return buf[:]
}
func bytesToUint64(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}
// Merge function to add two uint64 numbers
func add(existing, new []byte) []byte {
	return uint64ToBytes(bytesToUint64(existing) + bytesToUint64(new))
}
This function can then be passed to the DB.GetMergeOperator() method, along with a key, and a duration value. The duration specifies how often the merge function is run on values that have been added using the MergeOperator.Add() method.
The MergeOperator.Get() method can be used to retrieve the cumulative value of the key associated with the merge operation.
key := []byte("merge")
m := db.GetMergeOperator(key, add, 200*time.Millisecond)
defer m.Stop()
m.Add(uint64ToBytes(1))
m.Add(uint64ToBytes(2))
m.Add(uint64ToBytes(3))
res, err := m.Get() // res should have value 6 encoded
fmt.Println(bytesToUint64(res))
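To see what the merge function computes independently of Badger, the same helpers can be folded over a few values by hand. This is a standalone sketch using only the standard library; the parameter is named delta here to avoid shadowing the built-in new. Because the text describes merges as unordered, a function whose result does not depend on argument order, such as addition, is a natural fit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uint64ToBytes encodes i as 8 big-endian bytes.
func uint64ToBytes(i uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], i)
	return buf[:]
}

// bytesToUint64 decodes 8 big-endian bytes.
func bytesToUint64(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}

// add merges two encoded uint64 values by summing them.
func add(existing, delta []byte) []byte {
	return uint64ToBytes(bytesToUint64(existing) + bytesToUint64(delta))
}

func main() {
	acc := uint64ToBytes(0)
	for _, v := range []uint64{1, 2, 3} {
		acc = add(acc, uint64ToBytes(v))
	}
	fmt.Println(bytesToUint64(acc)) // 6
}
```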
Setting Time To Live (TTL) and User Metadata on Keys
Badger allows setting an optional Time to Live (TTL) value on keys. Once the TTL has elapsed, the key will no longer be retrievable and will be eligible for garbage collection. A TTL can be set as a time.Duration value using the Txn.SetWithTTL() API method.
An optional user metadata value can be set on each key. A user metadata value is represented by a single byte. It can be used to set certain bits along with the key to aid in interpreting or decoding the key-value pair. User metadata can be set using the Txn.SetWithMeta() API method.
Txn.SetEntry() can be used to set the key, value, user metadata and TTL, all at once.
Iterating over keys
To iterate over keys, we can use an Iterator, which can be obtained using the Txn.NewIterator() method. Iteration happens in byte-wise lexicographical sorting order.
err := db.View(func(txn *badger.Txn) error {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchSize = 10
	it := txn.NewIterator(opts)
	defer it.Close()
	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()
		k := item.Key()
		v, err := item.Value()
		if err != nil {
			return err
		}
		fmt.Printf("key=%s, value=%s\n", k, v)
	}
	return nil
})
The iterator allows you to move to a specific point in the list of keys and move forward or backward through the keys one at a time.
By default, Badger prefetches the values of the next 100 items. You can adjust that with the IteratorOptions.PrefetchSize field. However, setting it to a value higher than GOMAXPROCS (which we recommend to be 128 or higher) shouldn't give any additional benefits. You can also turn off the fetching of values altogether. See the section below on key-only iteration.
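The byte-wise lexicographical order mentioned above matters when keys contain numbers. The sketch below uses only the standard library to show the effect: decimal strings sort as text, while fixed-width big-endian encodings sort numerically. sortKeys, beKey and keyToUint are illustrative helpers, not Badger functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// sortKeys orders keys byte-wise, the order an iterator is described as using.
func sortKeys(keys [][]byte) {
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
}

// beKey encodes n as a fixed-width big-endian key.
func beKey(n uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, n)
	return b
}

// keyToUint decodes a key produced by beKey.
func keyToUint(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}

func main() {
	text := [][]byte{[]byte("9"), []byte("10"), []byte("2")}
	sortKeys(text)
	for _, k := range text {
		fmt.Printf("%s ", k) // 10 2 9
	}
	fmt.Println()

	nums := [][]byte{beKey(9), beKey(10), beKey(2)}
	sortKeys(nums)
	for _, k := range nums {
		fmt.Printf("%d ", keyToUint(k)) // 2 9 10
	}
	fmt.Println()
}
```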
Prefix scans
To iterate over a key prefix, you can combine Seek() and ValidForPrefix():
db.View(func(txn *badger.Txn) error {
	it := txn.NewIterator(badger.DefaultIteratorOptions)
	defer it.Close()
	prefix := []byte("1234")
	for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
		item := it.Item()
		k := item.Key()
		v, err := item.Value()
		if err != nil {
			return err
		}
		fmt.Printf("key=%s, value=%s\n", k, v)
	}
	return nil
})
Key-only iteration
Badger supports a unique mode of iteration called key-only iteration. It is several orders of magnitude faster than regular iteration, because it involves access to the LSM-tree only, which is usually resident entirely in RAM. To enable key-only iteration, you need to set the IteratorOptions.PrefetchValues field to false. This can also be used to do sparse reads for selected keys during an iteration, by calling item.Value() only when required.
err := db.View(func(txn *badger.Txn) error {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchValues = false
	it := txn.NewIterator(opts)
	defer it.Close()
	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()
		k := item.Key()
		fmt.Printf("key=%s\n", k)
	}
	return nil
})
Garbage Collection
Badger values need to be garbage collected, for two reasons:
- Badger keeps values separately from the LSM tree. This means that the compaction operations that clean up the LSM tree do not touch the values at all. Values need to be cleaned up separately.
- Concurrent read/write transactions could leave behind multiple values for a single key, because they are stored with different versions. These could accumulate, and take up unneeded space beyond the time these older versions are needed.
Badger relies on the client to perform garbage collection at a time of their choosing. It provides the following methods, which can be invoked at an appropriate time:
- DB.PurgeOlderVersions(): Is no longer needed since v1.5.0. Badger's LSM tree automatically discards older/invalid versions of keys.
- DB.RunValueLogGC(): This method is designed to do garbage collection while Badger is online. Along with randomly picking a file, it uses statistics generated by the LSM-tree compactions to pick files that are likely to lead to maximum space reclamation.
It is recommended that this method be called during periods of low activity in your system, or periodically. One call would only result in removal of at most one log file. As an optimization, you could also immediately re-run it whenever it returns nil error (indicating a successful value log GC).
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
again:
	err := db.RunValueLogGC(0.7)
	if err == nil {
		goto again
	}
}
Database backup
There are two public API methods DB.Backup() and DB.Load() which can be used to do online backups and restores. Badger v0.9 provides a CLI tool badger, which can do offline backup/restore. Make sure you have $GOPATH/bin in your PATH to use this tool.
The command below will create a version-agnostic backup of the database, to a file badger.bak in the current working directory:
badger backup --dir <path/to/badgerdb>
To restore badger.bak in the current working directory to a new database:
badger restore --dir <path/to/badgerdb>
See badger --help for more details.
If you have a Badger database that was created using v0.8 (or below), you can use the badger_backup tool provided in v0.8.1, and then restore it using the command above to upgrade your database to work with the latest version.
badger_backup --dir <path/to/badgerdb> --backup-file badger.bak
Memory usage
Badger's memory usage can be managed by tweaking several options available in the Options struct that is passed in when opening the database using badger.Open().
- Options.ValueLogLoadingMode can be set to options.FileIO (instead of the default options.MemoryMap) to avoid memory-mapping log files. This can be useful in environments with low RAM.
- Number of memtables (Options.NumMemtables)
  - If you modify Options.NumMemtables, also adjust Options.NumLevelZeroTables and Options.NumLevelZeroTablesStall accordingly.
- Number of concurrent compactions (Options.NumCompactors)
- Mode in which LSM tree is loaded (Options.TableLoadingMode)
- Size of table (Options.MaxTableSize)
- Size of value log file (Options.ValueLogFileSize)
If you want to decrease the memory usage of a Badger instance, tweak these options (ideally one at a time) until you achieve the desired memory usage.
Statistics
Badger records metrics using the expvar package, which is included in the Go standard library. All the metrics are documented in the y/metrics.go file.
The expvar package adds a handler to the default HTTP server (which has to be started explicitly), and serves up the metrics at the /debug/vars endpoint. These metrics can then be collected by a system like Prometheus, to get better visibility into what Badger is doing.
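What the /debug/vars endpoint returns can be checked without starting a server. The sketch below uses only the standard library: it calls expvar.Handler() through an httptest recorder and looks for two keys in the JSON output. example_counter is a hypothetical metric published by this sketch, not one of Badger's metrics, and fetchVars and hasMetric are illustrative helpers. A long-running program would instead start the default HTTP server, for example with http.ListenAndServe and a nil handler.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http/httptest"
	"strings"
)

// exampleCounter is a hypothetical metric, published once at package init.
var exampleCounter = expvar.NewInt("example_counter")

// fetchVars renders the expvar JSON exactly as /debug/vars would serve it.
func fetchVars() string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/debug/vars", nil)
	expvar.Handler().ServeHTTP(rec, req)
	return rec.Body.String()
}

// hasMetric reports whether the JSON body contains a variable with this name.
func hasMetric(body, name string) bool {
	return strings.Contains(body, `"`+name+`"`)
}

func main() {
	exampleCounter.Add(1)
	body := fetchVars()
	fmt.Println(hasMetric(body, "memstats"))        // true: published by expvar itself
	fmt.Println(hasMetric(body, "example_counter")) // true: published above
}
```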
Resources
Blog Posts
- Introducing Badger: A fast key-value store written natively in Go
- Make Badger crash resilient with ALICE
- Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
- Concurrent ACID Transactions in Badger
Design
Badger was written with these design goals in mind:
- Write a key-value database in pure Go.
- Use latest research to build the fastest KV database for data sets spanning terabytes.
- Optimize for SSDs.
Badger’s design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.
Comparisons
Feature | Badger | RocksDB | BoltDB |
---|---|---|---|
Design | LSM tree with value log | LSM tree only | B+ tree |
High Read throughput | Yes | No | Yes |
High Write throughput | Yes | Yes | No |
Designed for SSDs | Yes (with latest research [1]) | Not specifically [2] | No |
Embeddable | Yes | Yes | Yes |
Sorted KV access | Yes | Yes | Yes |
Pure Go (no Cgo) | Yes | No | Yes |
Transactions | Yes, ACID, concurrent with SSI [3] | Yes (but non-ACID) | Yes, ACID |
Snapshots | Yes | Yes | Yes |
TTL support | Yes | Yes | No |
[1] The WiscKey paper (on which Badger is based) saw big wins with separating values from keys, significantly reducing the write amplification compared to a typical LSM tree.
[2] RocksDB is an SSD optimized version of LevelDB, which was designed specifically for rotating disks. As such RocksDB's design isn't aimed at SSDs.
[3] SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger.
Benchmarks
We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code and the detailed logs for the benchmarks can be found in the badger-bench repo. More explanation, including graphs, can be found in the blog posts (linked above).
Other Projects Using Badger
Below is a list of known projects that use Badger:
- 0-stor - Single device object store.
- Dgraph - Distributed graph database.
- Sandglass - distributed, horizontally scalable, persistent, time sorted message queue.
- Usenet Express - Serving over 300TB of data with Badger.
- go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.
- gorush - A push notification server written in Go.
- emitter - Scalable, low latency, distributed pub/sub broker with message storage, uses MQTT, gossip and badger.
- GarageMQ - AMQP server written in Go.
If you are using Badger in a project please send a pull request to add it to the list.
Frequently Asked Questions
- My writes are getting stuck. Why?
This can happen if a long running iteration has Prefetch set to false, but an Item::Value call is made internally in the loop. That causes Badger to acquire read locks over the value log files to avoid value log GC removing the file from underneath. As a side effect, this also blocks a new value log GC file from being created, when the value log file boundary is hit.
Please see GitHub issues #293 and #315.
There are multiple workarounds during iteration:
- Use Item::ValueCopy instead of Item::Value when retrieving value.
- Set Prefetch to true. Badger would then copy over the value and release the file lock immediately.
- When Prefetch is false, don't call Item::Value and do a pure key-only iteration. This might be useful if you just want to delete a lot of keys.
- Do the writes in a separate transaction after the reads.
- My writes are really slow. Why?
Are you creating a new transaction for every single key update, and waiting for it to Commit fully before creating a new one? This will lead to very low throughput. To get best write performance, batch up multiple writes inside a transaction using a single DB.Update() call. You could also have multiple such DB.Update() calls being made concurrently from multiple goroutines.
The way to achieve the highest write throughput via Badger is to do serial writes and use callbacks in txn.Commit, like so:
che := make(chan error, 1)
storeErr := func(err error) {
	if err == nil {
		return
	}
	select {
	case che <- err:
	default:
	}
}
getErr := func() error {
	select {
	case err := <-che:
		return err
	default:
		return nil
	}
}
var wg sync.WaitGroup
for _, kv := range kvs {
	wg.Add(1)
	txn := db.NewTransaction(true)
	handle(txn.Set(kv.Key, kv.Value))
	handle(txn.Commit(func(err error) {
		storeErr(err)
		wg.Done()
	}))
}
wg.Wait()
return getErr()
In this code, we passed a callback function to txn.Commit, which can pick up and return the first error encountered, if any. Callbacks can be made to do more things, like retrying commits etc.
- I don't see any disk write. Why?
If you're using Badger with SyncWrites=false, then your writes might not be written to value log and won't get synced to disk immediately. Writes to LSM tree are done in memory first, before they get compacted to disk. The compaction would only happen once MaxTableSize has been reached. So, if you're doing a few writes and then checking, you might not see anything on disk. Once you Close the database, you'll see these writes on disk.
- Reverse iteration doesn't give me the right results.
Just like forward iteration goes to the first key which is equal to or greater than the SEEK key, reverse iteration goes to the first key which is equal to or less than the SEEK key. Therefore, SEEK key would not be part of the results. You can typically add a 0xff byte as a suffix to the SEEK key to include it in the results. See the following issues: #436 and #347.
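The effect of the 0xff suffix can be illustrated with a sorted list of keys. This is a model of the seek rule described above, not Badger's iterator: seekReverse is an illustrative helper that returns the position of the largest key that is less than or equal to the seek key. Keys that merely start with the seek key sort after it, so a reverse seek on the bare prefix skips them.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// seekReverse returns the index of the largest key <= seek in sorted keys,
// or -1 if every key is greater than seek.
func seekReverse(keys [][]byte, seek []byte) int {
	firstGreater := sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], seek) > 0
	})
	return firstGreater - 1
}

func main() {
	// Keys in byte-wise sorted order.
	keys := [][]byte{[]byte("a1"), []byte("a2"), []byte("b1")}
	prefix := []byte("a")

	// "a" sorts before "a1", so a reverse seek on the bare prefix finds nothing.
	fmt.Println(seekReverse(keys, prefix)) // -1

	// Appending 0xff moves the seek point past every key starting with "a".
	withSuffix := append(append([]byte{}, prefix...), 0xff)
	fmt.Println(seekReverse(keys, withSuffix)) // 1
}
```

Index 1 is "a2", the last key with the prefix, which is where a reverse scan over that prefix would want to start.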
- Which instances should I use for Badger?
We recommend using instances which provide local SSD storage, without any limit on the maximum IOPS. In AWS, these are storage optimized instances like i3. They provide local SSDs which clock 100K IOPS over 4KB blocks easily.
- I'm getting a closed channel error. Why?
panic: close of closed channel
panic: send on closed channel
If you're seeing panics like above, this would be because you're operating on a closed DB. This can happen if you call Close() before sending a write, or multiple times. You should ensure that you only call Close() once, and all your read/write operations finish before closing.
- Are there any Go specific settings that I should use?
We highly recommend setting a high number for GOMAXPROCS, which allows Go to observe the full IOPS throughput provided by modern SSDs. In Dgraph, we have set it to 128. For more details, see this thread.
- Are there any Linux specific settings that I should use?
We recommend setting max file descriptors to a high number depending upon the expected size of your data.
Contact
- Please use discuss.dgraph.io for questions, feature requests and discussions.
- Please use Github issue tracker for filing bugs or feature requests.
- Follow us on Twitter @dgraphlabs.
Documentation
Overview
Package badger implements an embeddable, simple and fast key-value database, written in pure Go. It is designed to be highly performant for both reads and writes simultaneously. Badger uses Multi-Version Concurrency Control (MVCC), and supports transactions. It runs transactions concurrently, with serializable snapshot isolation guarantees.
Badger uses an LSM tree along with a value log to separate keys from values, hence reducing both write amplification and the size of the LSM tree. This allows LSM tree to be served entirely from RAM, while the values are served from SSD.
Usage
Badger has the following main types: DB, Txn, Item and Iterator. DB contains keys that are associated with values. It must be opened with the appropriate options before it can be accessed.
All operations happen inside a Txn. Txn represents a transaction, which can be read-only or read-write. Read-only transactions can read values for a given key (which are returned inside an Item), or iterate over a set of key-value pairs using an Iterator (which are returned as Item type values as well). Read-write transactions can also update and delete keys from the DB.
See the examples for more usage details.
Index
- Constants
- Variables
- type DB
- func (db *DB) Backup(w io.Writer, since uint64) (uint64, error)
- func (db *DB) Close() (err error)
- func (db *DB) GetMergeOperator(key []byte, f MergeFunc, dur time.Duration) *MergeOperator
- func (db *DB) GetSequence(key []byte, bandwidth uint64) (*Sequence, error)
- func (db *DB) Load(r io.Reader) error
- func (db *DB) MaxBatchCount() int64
- func (db *DB) MaxBatchSize() int64
- func (db *DB) NewTransaction(update bool) *Txn
- func (db *DB) RunValueLogGC(discardRatio float64) error
- func (db *DB) Size() (lsm int64, vlog int64)
- func (db *DB) Tables() []TableInfo
- func (db *DB) Update(fn func(txn *Txn) error) error
- func (db *DB) View(fn func(txn *Txn) error) error
- type Entry
- type Item
- func (item *Item) DiscardEarlierVersions() bool
- func (item *Item) EstimatedSize() int64
- func (item *Item) ExpiresAt() uint64
- func (item *Item) IsDeletedOrExpired() bool
- func (item *Item) Key() []byte
- func (item *Item) KeyCopy(dst []byte) []byte
- func (item *Item) String() string
- func (item *Item) ToString() string
- func (item *Item) UserMeta() byte
- func (item *Item) Value() ([]byte, error)
- func (item *Item) ValueCopy(dst []byte) ([]byte, error)
- func (item *Item) Version() uint64
- type Iterator
- type IteratorOptions
- type ManagedDB
- type Manifest
- type MergeFunc
- type MergeOperator
- type Options
- type Sequence
- type TableInfo
- type Txn
- func (txn *Txn) Commit(callback func(error)) error
- func (txn *Txn) CommitAt(commitTs uint64, callback func(error)) error
- func (txn *Txn) Delete(key []byte) error
- func (txn *Txn) Discard()
- func (txn *Txn) Get(key []byte) (item *Item, rerr error)
- func (txn *Txn) NewIterator(opt IteratorOptions) *Iterator
- func (txn *Txn) Set(key, val []byte) error
- func (txn *Txn) SetEntry(e *Entry) error
- func (txn *Txn) SetWithDiscard(key, val []byte, meta byte) error
- func (txn *Txn) SetWithMeta(key, val []byte, meta byte) error
- func (txn *Txn) SetWithTTL(key, val []byte, dur time.Duration) error
Constants
const (
	// ManifestFilename is the filename for the manifest file.
	ManifestFilename = "MANIFEST"
)
Variables
var (
	// ErrValueLogSize is returned when opt.ValueLogFileSize option is not within the valid
	// range.
	ErrValueLogSize = errors.New("Invalid ValueLogFileSize, must be between 1MB and 2GB")
	// ErrValueThreshold is returned when ValueThreshold is set to a value close to or greater than
	// uint16.
	ErrValueThreshold = errors.New("Invalid ValueThreshold, must be lower than uint16.")
	// ErrKeyNotFound is returned when key isn't found on a txn.Get.
	ErrKeyNotFound = errors.New("Key not found")
	// ErrTxnTooBig is returned if too many writes are fit into a single transaction.
	ErrTxnTooBig = errors.New("Txn is too big to fit into one request")
	// ErrConflict is returned when a transaction conflicts with another transaction. This can happen if
	// the read rows had been updated concurrently by another transaction.
	ErrConflict = errors.New("Transaction Conflict. Please retry")
	// ErrReadOnlyTxn is returned if an update function is called on a read-only transaction.
	ErrReadOnlyTxn = errors.New("No sets or deletes are allowed in a read-only transaction")
	// ErrDiscardedTxn is returned if a previously discarded transaction is re-used.
	ErrDiscardedTxn = errors.New("This transaction has been discarded. Create a new one")
	// ErrEmptyKey is returned if an empty key is passed on an update function.
	ErrEmptyKey = errors.New("Key cannot be empty")
	// ErrRetry is returned when a log file containing the value is not found.
	// This usually indicates that it may have been garbage collected, and the
	// operation needs to be retried.
	ErrRetry = errors.New("Unable to find log file. Please retry")
	// ErrThresholdZero is returned if threshold is set to zero, and value log GC is called.
	// In such a case, GC can't be run.
	ErrThresholdZero = errors.New(
		"Value log GC can't run because threshold is set to zero")
	// ErrNoRewrite is returned if a call for value log GC doesn't result in a log file rewrite.
	ErrNoRewrite = errors.New(
		"Value log GC attempt didn't result in any cleanup")
	// ErrRejected is returned if a value log GC is called either while another GC is running, or
	// after DB::Close has been called.
	ErrRejected = errors.New("Value log GC request rejected")
	// ErrInvalidRequest is returned if the user request is invalid.
	ErrInvalidRequest = errors.New("Invalid request")
	// ErrManagedTxn is returned if the user tries to use an API which isn't
	// allowed due to external management of transactions, when using ManagedDB.
	ErrManagedTxn = errors.New(
		"Invalid API request. Not allowed to perform this action using ManagedDB")
	// ErrInvalidDump if a data dump made previously cannot be loaded into the database.
	ErrInvalidDump = errors.New("Data dump cannot be read")
	// ErrZeroBandwidth is returned if the user passes in zero bandwidth for sequence.
	ErrZeroBandwidth = errors.New("Bandwidth must be greater than zero")
	// ErrInvalidLoadingMode is returned when opt.ValueLogLoadingMode option is not
	// within the valid range
	ErrInvalidLoadingMode = errors.New("Invalid ValueLogLoadingMode, must be FileIO or MemoryMap")
	// ErrReplayNeeded is returned when opt.ReadOnly is set but the
	// database requires a value log replay.
	ErrReplayNeeded = errors.New("Database was not properly closed, cannot open read-only")
	// ErrWindowsNotSupported is returned when opt.ReadOnly is used on Windows
	ErrWindowsNotSupported = errors.New("Read-only mode is not supported on Windows")
	// ErrTruncateNeeded is returned when the value log gets corrupt, and requires truncation of
	// corrupt data to allow Badger to run properly.
	ErrTruncateNeeded = errors.New("Value log truncate required to run DB. This might result in data loss.")
	// ErrBlockedWrites is returned if the user called DropAll. During the process of dropping all
	// data from Badger, we stop accepting new writes, by returning this error.
	ErrBlockedWrites = errors.New("Writes are blocked possibly due to DropAll")
)
var DefaultIteratorOptions = IteratorOptions{
	PrefetchValues: true,
	PrefetchSize:   100,
	Reverse:        false,
	AllVersions:    false,
}
DefaultIteratorOptions contains default options when iterating over Badger key-value stores.
var DefaultOptions = Options{
	DoNotCompact:            false,
	LevelOneSize:            256 << 20,
	LevelSizeMultiplier:     10,
	TableLoadingMode:        options.LoadToRAM,
	ValueLogLoadingMode:     options.MemoryMap,
	MaxLevels:               7,
	MaxTableSize:            64 << 20,
	NumCompactors:           3,
	NumLevelZeroTables:      5,
	NumLevelZeroTablesStall: 10,
	NumMemtables:            5,
	SyncWrites:              true,
	NumVersionsToKeep:       1,
	ValueLogFileSize:        1<<30 - 1,
	ValueLogMaxEntries:      1000000,
	ValueThreshold:          32,
	Truncate:                false,
}
DefaultOptions sets a list of recommended options for good performance. Feel free to modify these to suit your needs.
var LSMOnlyOptions = Options{}
LSMOnlyOptions follows from DefaultOptions, but sets a higher ValueThreshold so values would be colocated with the LSM tree, with value log largely acting as a write-ahead log only. These options would reduce the disk usage of value log, and make Badger act like a typical LSM tree.
Functions
This section is empty.
Types
type DB (added in v0.9.0)
type DB struct {
	sync.RWMutex // Guards list of inmemory tables, not individual reads and writes.
	// contains filtered or unexported fields
}
DB provides the various functions required to interact with Badger. DB is thread-safe.
func (*DB) Backup (added in v0.9.0)
Backup dumps a protobuf-encoded list of all entries in the database that are newer than the specified version into the given writer. It returns a timestamp indicating when the entries were dumped, which can be passed into a later invocation to generate an incremental dump of entries that have been added/modified since the last invocation of DB.Backup().
This can be used to backup the data in a database at a given point in time.
func (*DB) Close (added in v0.9.0)
Close closes a DB. It's crucial to call it to ensure all the pending updates make their way to disk. Calling DB.Close() multiple times is not safe and would cause panic.
func (*DB) GetMergeOperator (added in v1.4.0)
GetMergeOperator creates a new MergeOperator for a given key and returns a pointer to it. It also fires off a goroutine that performs a compaction using the merge function that runs periodically, as specified by dur.
func (*DB) GetSequence (added in v1.3.0)
GetSequence would initiate a new sequence object, generating it from the stored lease, if available, in the database. Sequence can be used to get a list of monotonically increasing integers. Multiple sequences can be created by providing different keys. Bandwidth sets the size of the lease, determining how many Next() requests can be served from memory.
func (*DB) Load (added in v0.9.0)
Load reads a protobuf-encoded list of all entries from a reader and writes them to the database. This can be used to restore the database from a backup made by calling DB.Backup().
DB.Load() should be called on a database that is not running any other concurrent transactions while it is running.
func (*DB) MaxBatchCount ¶ added in v1.5.4
MaxBatchCount returns the maximum possible number of entries in a batch.
func (*DB) MaxBatchSize ¶ added in v1.5.4
MaxBatchSize returns the maximum possible batch size.
func (*DB) NewTransaction ¶ added in v0.9.0
NewTransaction creates a new transaction. Badger supports concurrent execution of transactions, providing serializable snapshot isolation, avoiding write skews. Badger achieves this by tracking the keys read and at Commit time, ensuring that these read keys weren't concurrently modified by another transaction.
For read-only transactions, set update to false. In this mode, we don't track the rows read for any changes. Thus, any long running iterations done in this mode wouldn't pay this overhead.
Running transactions concurrently is OK. However, a transaction itself isn't thread safe, and should only be run serially. It doesn't matter if a transaction is created by one goroutine and passed down to another, as long as the Txn APIs are called serially.
When you create a new transaction, it is absolutely essential to call Discard(). This should be done irrespective of what the update param is set to. Commit API internally runs Discard, but running it twice wouldn't cause any issues.
txn := db.NewTransaction(false)
defer txn.Discard()
// Call various APIs.
func (*DB) RunValueLogGC ¶ added in v0.9.0
RunValueLogGC triggers a value log garbage collection.
It picks value log files to perform GC based on statistics that are collected during compactions. If no such statistics are available, then log files are picked in random order. The process stops as soon as the first log file is encountered which does not result in garbage collection.
When a log file is picked, it is first sampled. If the sample shows that we can discard at least discardRatio space of that file, it would be rewritten.
If a call to RunValueLogGC results in no rewrites, then ErrNoRewrite is returned, indicating that the call resulted in no file rewrites.
We recommend setting discardRatio to 0.5, thus indicating that a file be rewritten if half the space can be discarded. This results in a lifetime value log write amplification of 2 (1 from original write + 0.5 rewrite + 0.25 + 0.125 + ... = 2). Setting it to a higher value would result in fewer space reclaims, while setting it to a lower value would result in more space reclaims at the cost of increased activity on the LSM tree. discardRatio must be in the range (0.0, 1.0), both endpoints excluded, otherwise an ErrInvalidRequest is returned.
Only one GC is allowed at a time. If another value log GC is running, or DB has been closed, this would return an ErrRejected.
Note: Every time GC is run, it would produce a spike of activity on the LSM tree.
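The write amplification arithmetic given for discardRatio can be generalized with a short calculation. The sketch below is an extrapolation, not something Badger computes: it assumes every rewrite happens exactly when a discardRatio fraction of a file is garbage, so a (1 - discardRatio) fraction is written again each time. The function name lifetimeWriteAmp and the error value errInvalidRatio are hypothetical; the latter only stands in for ErrInvalidRequest.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidRatio is a stand-in for Badger's ErrInvalidRequest in this sketch.
var errInvalidRatio = errors.New("discardRatio must be in the open interval (0, 1)")

// lifetimeWriteAmp estimates lifetime value log write amplification under the
// assumption stated above. With k = 1 - discardRatio, the geometric series
// 1 + k + k^2 + ... sums to 1 / discardRatio.
func lifetimeWriteAmp(discardRatio float64) (float64, error) {
	// Written as a negated conjunction so that NaN is rejected as well.
	if !(discardRatio > 0 && discardRatio < 1) {
		return 0, errInvalidRatio
	}
	return 1 / discardRatio, nil
}

func main() {
	for _, r := range []float64{0.25, 0.5, 0.75} {
		amp, err := lifetimeWriteAmp(r)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("discardRatio=%.2f -> estimated write amplification %.2f\n", r, amp)
	}
}
```

For discardRatio 0.5 the estimate is 2, which agrees with the series in the text. Lower ratios give larger estimates, reflecting the extra rewrite activity.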
func (*DB) Size ¶ added in v1.3.0
Size returns the size of lsm and value log files in bytes. It can be used to decide how often to call RunValueLogGC.
type Entry ¶
type Entry struct {
	Key       []byte
	Value     []byte
	UserMeta  byte
	ExpiresAt uint64 // time.Unix
	// contains filtered or unexported fields
}
Entry provides Key, Value, UserMeta and ExpiresAt. This struct can be used by the user to set data.
type Item ¶ added in v0.9.0
type Item struct {
// contains filtered or unexported fields
}
Item is returned during iteration. The output of both Key() and Value() is valid only until iterator.Next() is called.
func (*Item) DiscardEarlierVersions ¶ added in v1.5.0
func (*Item) EstimatedSize ¶ added in v0.9.0
EstimatedSize returns approximate size of the key-value pair.
This can be called while iterating through a store to quickly estimate the size of a range of key-value pairs (without fetching the corresponding values).
func (*Item) ExpiresAt ¶ added in v1.0.0
ExpiresAt returns a Unix time value indicating when the item will be considered expired. 0 indicates that the item will never expire.
func (*Item) IsDeletedOrExpired ¶ added in v1.4.0
IsDeletedOrExpired returns true if item contains deleted or expired value.
func (*Item) Key ¶ added in v0.9.0
Key returns the key.
Key is only valid as long as item is valid, or transaction is valid. If you need to use it outside its validity, please use KeyCopy
func (*Item) KeyCopy ¶ added in v1.4.0
KeyCopy returns a copy of the key of the item, writing it to dst slice. If nil is passed, or capacity of dst isn't sufficient, a new slice would be allocated and returned.
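The dst contract described here (reuse the slice when it is large enough, otherwise allocate) matches a common Go idiom. The sketch below shows that idiom in isolation; copyInto is a hypothetical helper, not a function exported by the package, and it is not claimed to be how KeyCopy is implemented.

```go
package main

import "fmt"

// copyInto copies src into dst's backing array when dst has enough capacity,
// and allocates a new slice otherwise. Passing nil as dst always allocates.
func copyInto(dst, src []byte) []byte {
	return append(dst[:0], src...)
}

func main() {
	buf := make([]byte, 0, 16)

	// buf is large enough, so its backing array is reused.
	k1 := copyInto(buf, []byte("user:1"))

	// nil destination: a fresh slice is allocated.
	k2 := copyInto(nil, []byte("user:2"))

	fmt.Printf("%s (cap %d), %s\n", k1, cap(k1), k2)
}
```

Reusing one buffer across calls avoids an allocation per key, but each call overwrites the previous contents, so a key that must outlive the next call needs its own slice.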
func (*Item) UserMeta ¶ added in v0.9.0
UserMeta returns the userMeta set by the user. Typically, this byte, optionally set by the user, is used to interpret the value.
func (*Item) Value ¶ added in v0.9.0
Value retrieves the value of the item from the value log.
This method must be called within a transaction. Calling it outside a transaction is considered undefined behavior. If an iterator is being used, then Item.Value() is defined in the current iteration only, because items are reused.
If you need to use a value outside a transaction, please use Item.ValueCopy instead, or copy it yourself. Value might change once discard or commit is called. Use ValueCopy if you want to do a Set after Get.
func (*Item) ValueCopy ¶ added in v1.1.0
ValueCopy returns a copy of the value of the item from the value log, writing it to dst slice. If nil is passed, or capacity of dst isn't sufficient, a new slice would be allocated and returned. Tip: It might make sense to reuse the returned slice as dst argument for the next call.
This function is useful in long running iterate/update transactions to avoid a write deadlock. See Github issue: https://github.com/dgraph-io/badger/issues/315
type Iterator ¶
type Iterator struct {
// contains filtered or unexported fields
}
Iterator helps iterating over the KV pairs in a lexicographically sorted order.
func (*Iterator) Close ¶
func (it *Iterator) Close()
Close would close the iterator. It is important to call this when you're done with iteration.
func (*Iterator) Item ¶
Item returns pointer to the current key-value pair. This item is only valid until it.Next() gets called.
func (*Iterator) Next ¶
func (it *Iterator) Next()
Next would advance the iterator by one. Always check it.Valid() after a Next() to ensure you have access to a valid it.Item().
func (*Iterator) Rewind ¶
func (it *Iterator) Rewind()
Rewind would rewind the iterator cursor all the way to zero-th position, which would be the smallest key if iterating forward, and largest if iterating backward. It does not keep track of whether the cursor started with a Seek().
func (*Iterator) Seek ¶
Seek would seek to the provided key if present. If absent, it would seek to the smallest key greater than the provided key if iterating in the forward direction. Behavior would be reversed if iterating backwards.
func (*Iterator) ValidForPrefix ¶
ValidForPrefix returns false when iteration is done or when the current key is not prefixed by the specified prefix.
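The forward Seek and ValidForPrefix semantics can be modeled over a plain sorted slice of keys. This is only a model using the standard library: seekForward and scanPrefix are hypothetical helpers, and no Badger iterator is involved. It shows why a Seek to the prefix followed by a prefix check visits exactly the keys sharing that prefix.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// seekForward returns the index of the first key >= target in a
// lexicographically sorted key list, or len(keys) if no such key exists.
func seekForward(keys [][]byte, target []byte) int {
	return sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], target) >= 0
	})
}

// scanPrefix seeks to prefix and collects keys until one no longer carries
// the prefix or the key list is exhausted.
func scanPrefix(keys [][]byte, prefix []byte) []string {
	var out []string
	for i := seekForward(keys, prefix); i < len(keys) && bytes.HasPrefix(keys[i], prefix); i++ {
		out = append(out, string(keys[i]))
	}
	return out
}

func main() {
	keys := [][]byte{[]byte("a:1"), []byte("b:1"), []byte("b:2"), []byte("c:1")}

	// Prefix scan over the "b:" keys.
	fmt.Println(scanPrefix(keys, []byte("b:")))

	// Seeking to an absent key lands on the next greater key.
	i := seekForward(keys, []byte("b:15"))
	fmt.Println(string(keys[i]))
}
```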
type IteratorOptions ¶
type IteratorOptions struct {
	// Indicates whether we should prefetch values during iteration and store them.
	PrefetchValues bool
	// How many KV pairs to prefetch while iterating. Valid only if PrefetchValues is true.
	PrefetchSize int
	Reverse      bool // Direction of iteration. False is forward, true is backward.
	AllVersions  bool // Fetch all valid versions of the same key.
	// contains filtered or unexported fields
}
IteratorOptions is used to set options when iterating over Badger key-value stores.
This package provides DefaultIteratorOptions which contains options that should work for most applications. Consider using that as a starting point before customizing it for your own needs.
type ManagedDB ¶ added in v0.9.0
type ManagedDB struct {
*DB
}
ManagedDB allows end users to manage the transactions themselves. Transaction start and commit timestamps are set by end-user.
This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.
WARNING: This is an experimental feature and may be changed significantly in a future release. So please proceed with caution.
func OpenManaged ¶ added in v0.9.0
OpenManaged returns a new ManagedDB, which allows more control over setting transaction timestamps.
This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.
func (*ManagedDB) DropAll ¶ added in v1.5.4
DropAll would drop all the data stored in Badger. It does this in the following way:
- Stop accepting new writes.
- Pause the compactions.
- Pick all tables from all levels, create a changeset to delete all these tables and apply it to manifest. Do not pick up the latest table from level 0, to preserve the (persistent) badgerHead key.
- Iterate over the KVs in Level 0, and run deletes on them via transactions.
- The deletions are done at the same timestamp as the latest version of the key. Thus, we could write the keys back at the same timestamp as before.
func (*ManagedDB) GetSequence ¶ added in v1.3.0
GetSequence is not supported on ManagedDB. Calling this would result in a panic.
func (*ManagedDB) NewTransaction ¶ added in v0.9.0
NewTransaction overrides DB.NewTransaction() and panics when invoked. Use NewTransactionAt() instead.
func (*ManagedDB) NewTransactionAt ¶ added in v0.9.0
NewTransactionAt follows the same logic as DB.NewTransaction(), but uses the provided read timestamp.
This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.
func (*ManagedDB) SetDiscardTs ¶ added in v1.5.4
SetDiscardTs sets a timestamp at or below which, any invalid or deleted versions can be discarded from the LSM tree, and thence from the value log to reclaim disk space.
type Manifest ¶
type Manifest struct {
	Levels []levelManifest
	Tables map[uint64]tableManifest

	// Contains total number of creation and deletion changes in the manifest -- used to compute
	// whether it'd be useful to rewrite the manifest.
	Creations int
	Deletions int
}
Manifest represents the contents of the MANIFEST file in a Badger store.
The MANIFEST file describes the startup state of the db -- all LSM files and what level they're at.
It consists of a sequence of ManifestChangeSet objects. Each of these is treated atomically, and contains a sequence of ManifestChange objects (file creations/deletions) which we use to reconstruct the manifest at startup.
func ReplayManifestFile ¶
ReplayManifestFile reads the manifest file and constructs two manifest objects. (We need one immutable copy and one mutable copy of the manifest. Easiest way is to construct two of them.) Also, returns the last offset after a completely read manifest entry -- the file must be truncated at that point before further appends are made (if there is a partial entry after that). In normal conditions, truncOffset is the file size.
type MergeFunc ¶ added in v1.4.0
MergeFunc accepts two byte slices, one representing an existing value, and another representing a new value that needs to be ‘merged’ into it. MergeFunc contains the logic to perform the ‘merge’ and return an updated value. MergeFunc could perform operations like integer addition, list appends etc. Note that the ordering of the operands is unspecified, so the merge func should either be agnostic to ordering or do additional handling if ordering is required.
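As a sketch of the integer-addition case mentioned above, the function below takes two byte slices and returns the merged value. It assumes the two-slices-in, one-slice-out shape described in the text. The 8-byte big-endian encoding and the names addUint64, encodeUint64 and decodeUint64 are choices made for this example only. Addition is commutative, so the unspecified operand ordering does not affect the result.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUint64 stores n as 8 big-endian bytes.
func encodeUint64(n uint64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, n)
	return buf
}

// decodeUint64 reads 8 big-endian bytes. Input of any other length is
// treated as 0 in this sketch; real code may prefer to surface an error.
func decodeUint64(b []byte) uint64 {
	if len(b) != 8 {
		return 0
	}
	return binary.BigEndian.Uint64(b)
}

// addUint64 merges two encoded counters by adding them. Because addition is
// commutative, swapping the operands yields the same merged value.
func addUint64(existing, val []byte) []byte {
	return encodeUint64(decodeUint64(existing) + decodeUint64(val))
}

func main() {
	a, b := encodeUint64(3), encodeUint64(4)
	fmt.Println(decodeUint64(addUint64(a, b)), decodeUint64(addUint64(b, a)))
}
```

A merge such as a list append is not order-agnostic, so it would need extra handling (for example, sorting or tagging elements) to cope with unspecified operand order.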
type MergeOperator ¶ added in v1.4.0
MergeOperator represents a Badger merge operator.
func (*MergeOperator) Add ¶ added in v1.4.0
func (op *MergeOperator) Add(val []byte) error
Add records a value in Badger which will eventually be merged by a background routine into the values that were recorded by previous invocations to Add().
func (*MergeOperator) Get ¶ added in v1.4.0
func (op *MergeOperator) Get() ([]byte, error)
Get returns the latest value for the merge operator, which is derived by applying the merge function to all the values added so far.
If Add has not been called even once, Get will return ErrKeyNotFound.
func (*MergeOperator) Stop ¶ added in v1.4.0
func (op *MergeOperator) Stop()
Stop waits for any pending merge to complete and then stops the background goroutine.
type Options ¶
type Options struct {
	// 1. Mandatory flags
	// -------------------
	// Directory to store the data in. Should exist and be writable.
	Dir string
	// Directory to store the value log in. Can be the same as Dir. Should
	// exist and be writable.
	ValueDir string

	// 2. Frequently modified flags
	// -----------------------------
	// Sync all writes to disk. Setting this to true would slow down data
	// loading significantly.
	SyncWrites bool

	// How should LSM tree be accessed.
	TableLoadingMode options.FileLoadingMode

	// How should value log be accessed.
	ValueLogLoadingMode options.FileLoadingMode

	// How many versions to keep per key.
	NumVersionsToKeep int

	// 3. Flags that user might want to review
	// ----------------------------------------
	// The following affect all levels of LSM tree.
	MaxTableSize        int64 // Each table (or file) is at most this size.
	LevelSizeMultiplier int   // Equals SizeOf(Li+1)/SizeOf(Li).
	MaxLevels           int   // Maximum number of levels of compaction.
	// If value size >= this threshold, only store value offsets in tree.
	ValueThreshold int
	// Maximum number of tables to keep in memory, before stalling.
	NumMemtables int
	// The following affect how we handle LSM tree L0.
	// Maximum number of Level 0 tables before we start compacting.
	NumLevelZeroTables int

	// If we hit this number of Level 0 tables, we will stall until L0 is
	// compacted away.
	NumLevelZeroTablesStall int

	// Maximum total size for L1.
	LevelOneSize int64

	// Size of single value log file.
	ValueLogFileSize int64

	// Max number of entries a value log file can hold (approximately). A value log file would be
	// determined by the smaller of its file size and max entries.
	ValueLogMaxEntries uint32

	// Number of compaction workers to run concurrently.
	NumCompactors int

	// 4. Flags for testing purposes
	// ------------------------------
	DoNotCompact bool // Stops LSM tree from compactions.

	// Open the DB as read-only. With this set, multiple processes can
	// open the same Badger DB. Note: if the DB being opened had crashed
	// before and has vlog data to be replayed, ReadOnly will cause Open
	// to fail with an appropriate message.
	ReadOnly bool

	// Truncate value log to delete corrupt data, if any. Would not truncate if ReadOnly is set.
	Truncate bool
	// contains filtered or unexported fields
}
Options are params for creating DB object.
This package provides DefaultOptions which contains options that should work for most applications. Consider using that as a starting point before customizing it for your own needs.
type Sequence ¶ added in v1.3.0
Sequence represents a Badger sequence.
type Txn ¶ added in v0.9.0
type Txn struct {
// contains filtered or unexported fields
}
Txn represents a Badger transaction.
func (*Txn) Commit ¶ added in v0.9.0
Commit commits the transaction, following these steps:
1. If there are no writes, return immediately.
2. Check if read rows were updated since txn started. If so, return ErrConflict.
3. If no conflict, generate a commit timestamp and update written rows' commit ts.
4. Batch up all writes, write them to value log and LSM tree.
5. If callback is provided, Badger will return immediately after checking for conflicts. Writes to the database will happen in the background. If there is a conflict, an error will be returned and the callback will not run. If there are no conflicts, the callback will be called in the background upon successful completion of writes or any error during write.
If error is nil, the transaction is successfully committed. In case of a non-nil error, the LSM tree won't be updated, so there's no need for any rollback.
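Steps 1 to 4 above can be traced in a toy model. The sketch below is not Badger's code: the store and txn types are hypothetical, the model keeps a single version per key, it is not goroutine-safe, and the callback path from step 5 is omitted. It only illustrates how tracking read keys lets a commit detect that another transaction modified one of them after this transaction started.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for Badger's ErrConflict in this model.
var errConflict = errors.New("transaction conflict")

// store remembers, per key, the commit timestamp of its latest write.
type store struct {
	ts         uint64            // last commit timestamp handed out
	lastCommit map[string]uint64 // key -> commit timestamp of latest write
	data       map[string]string
}

// txn records its start timestamp, the keys it read, and its pending writes.
type txn struct {
	s      *store
	readTs uint64
	reads  map[string]bool
	writes map[string]string
}

func newStore() *store {
	return &store{lastCommit: map[string]uint64{}, data: map[string]string{}}
}

func (s *store) begin() *txn {
	return &txn{s: s, readTs: s.ts, reads: map[string]bool{}, writes: map[string]string{}}
}

// get tracks the key as read and returns the pending or stored value.
func (t *txn) get(key string) string {
	t.reads[key] = true
	if v, ok := t.writes[key]; ok {
		return v
	}
	return t.s.data[key]
}

func (t *txn) set(key, val string) { t.writes[key] = val }

func (t *txn) commit() error {
	if len(t.writes) == 0 {
		return nil // Step 1: nothing to write.
	}
	for k := range t.reads {
		if t.s.lastCommit[k] > t.readTs {
			return errConflict // Step 2: a read key changed since the txn started.
		}
	}
	t.s.ts++ // Step 3: generate a commit timestamp.
	for k, v := range t.writes {
		// Step 4: apply the writes at the commit timestamp.
		t.s.data[k] = v
		t.s.lastCommit[k] = t.s.ts
	}
	return nil
}

func main() {
	s := newStore()
	a, b := s.begin(), s.begin()
	a.set("balance", a.get("balance")+"+10")
	b.set("balance", b.get("balance")+"+20")
	fmt.Println("a:", a.commit())
	fmt.Println("b:", b.commit())
}
```

Both transactions read the same key before either commits, so the second commit is rejected and its writes are never applied. This mirrors the statement that no rollback is needed after a non-nil error.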
func (*Txn) CommitAt ¶ added in v0.9.0
CommitAt commits the transaction, following the same logic as Commit(), but at the given commit timestamp. This will panic if not used with ManagedDB.
This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.
func (*Txn) Delete ¶ added in v0.9.0
Delete deletes a key.
This is done by adding a delete marker for the key at commit timestamp. Any reads happening before this timestamp would be unaffected. Any reads after this commit would see the deletion.
The current transaction keeps a reference to the key byte slice argument. Users must not modify the key until the end of the transaction.
func (*Txn) Discard ¶ added in v0.9.0
func (txn *Txn) Discard()
Discard discards a created transaction. This method is very important and must be called. Commit method calls this internally, however, calling this multiple times doesn't cause any issues. So, this can safely be called via a defer right when transaction is created.
NOTE: If any operations are run on a discarded transaction, ErrDiscardedTxn is returned.
func (*Txn) Get ¶ added in v0.9.0
Get looks for key and returns corresponding Item. If key is not found, ErrKeyNotFound is returned.
func (*Txn) NewIterator ¶ added in v0.9.0
func (txn *Txn) NewIterator(opt IteratorOptions) *Iterator
NewIterator returns a new iterator. Depending upon the options, either only keys, or both keys and values would be fetched. The keys are returned in lexicographically sorted order. Using prefetch is highly recommended if you're doing a long running iteration. Avoid long running iterations in update transactions.
Example ¶
Output: Counted 1000 elements
func (*Txn) Set ¶ added in v0.9.0
Set adds a key-value pair to the database.
It will return ErrReadOnlyTxn if update flag was set to false when creating the transaction.
The current transaction keeps a reference to the key and val byte slice arguments. Users must not modify key and val until the end of the transaction.
func (*Txn) SetEntry ¶ added in v1.2.0
SetEntry takes an Entry struct and adds the key-value pair in the struct, along with other metadata to the database.
The current transaction keeps a reference to the entry passed in argument. Users must not modify the entry until the end of the transaction.
func (*Txn) SetWithDiscard ¶ added in v1.5.0
SetWithDiscard acts like SetWithMeta, but adds a marker to discard earlier versions of the key.
This method is only useful if you have set a higher limit for options.NumVersionsToKeep. The default setting is 1, in which case, this function doesn't add any more benefit than just calling the normal SetWithMeta (or Set) function. If however, you have a higher setting for NumVersionsToKeep (in Dgraph, we set it to infinity), you can use this method to indicate that all the older versions can be discarded and removed during compactions.
The current transaction keeps a reference to the key and val byte slice arguments. Users must not modify key and val until the end of the transaction.
func (*Txn) SetWithMeta ¶ added in v1.0.0
SetWithMeta adds a key-value pair to the database, along with a metadata byte.
This byte is stored alongside the key, and can be used as an aid to interpret the value or store other contextual bits corresponding to the key-value pair.
The current transaction keeps a reference to the key and val byte slice arguments. Users must not modify key and val until the end of the transaction.
func (*Txn) SetWithTTL ¶ added in v1.0.0
SetWithTTL adds a key-value pair to the database, along with a time-to-live (TTL) setting. A key stored with a TTL would automatically expire after the time has elapsed, and be eligible for garbage collection.
The current transaction keeps a reference to the key and val byte slice arguments. Users must not modify key and val until the end of the transaction.