rsdic

package module
v0.0.0-...-d821cc5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 16, 2020 License: MIT Imports: 3 Imported by: 1

README

RSDic

rsdic is a Go package for rank/select dictionary supporting rank/select operations efficiently, and space efficient for both sparse and dense bit arrays. (Such data structures are also called as fully indexable dictionary in CS literatures (FID)).

Conceptually, rsdic represents a bit vector B[0...num), B[i] = 0 or 1, and bits are provided PushBack operation (Thus RSDic supports dynamic addition).

All operations (Bit, Rank, Select, BitAndRank, RunZeros) are supported in O(1) time. For example, rsdic can solve all above operation within 1 micro second for a bit vector of length 10^8 (100M bit).

RSDic combines the idea of on-the-fly decoding of enumrative code, and utilization of rank/select samplings, which is discussed as a future work in the paper [1].

In RSDic, a bit vector is stored in compressed form (Note, we don't need to decode all at operations). To achieve this, a bit vector is divided into small blocks of length 64, and each small block is compressed using enumurative code. For example, a small block contains 10 ones and 54 zeros will be compressed in 38 bits (See enumCode.go for detail). This achieves not only its information theoretic bound, but also achieves more compression if bits are clusterd. rsdic stores information at most 1.3 bit per original bit including its indicies, and compress more if bit vector is "compresible".

This Go version is based on the C++ implementation [2]. But this Go version supports PushBack so that it can support dynamic addition.

Usage

import "github.com/hillbig/rsdic"

rsd := rsdic.New()
rsd.PushBack(true)
rsd.PushBack(false)
rsd.PushBack(true)
rsd.PushBack(true)
// rsd = 1011
fmt.Printf("%d %d %d\n", rsd.Num(), rsd.OneNum(), rsd.ZeroNum()) // 4 3 1

// Bit(pos uint64) returns B[pos]
fmt.Printf("%v\n", rsd.Bit(2)) // true

// Rank(pos uint64, bit bool) returns the number of bit's in B[0...pos)
fmt.Printf("%d %d\n", rsd.Rank(2, false), rsd.Rank(4, true)) // 1 3

// Select(rank uint64, bit bool) returns the position of (rank+1)-th occurence of bit in B.
oneNum := rsd.OneNum()
for i := uint64(0); i < oneNum; i++ {
	fmt.Printf("%d:%d\n", rsd.Select(i, true))
}
// 0:0
// 1:2
// 2:3

rsd.PushBack(false) // You can add anytime

// Use MarshalBinary() and UnmarshalBinary() for serialize/deserialize RSDic.
bytes, err := rsd.MarshalBinary()
newrsd := rsdic.NewRSDic()
err := newrsd.UnmarshalBinary(bytes)

// Enjoy !

Benchmark

  • 1.7 GHz Intel Core i7
  • OS X 10.9.2
  • 8GB 1600 MHz DDR3
  • go version go1.3 darwin/amd64

The results shows that RSDic operations require always (almost) constant time with regard to the length and one's ratio.

go test -bench=.

// RSDic
// A bit vector of length 10^6 with one's ratio = 0.5
// Allocated size: 1.27 bits per original bit.
BenchmarkDenseRSDicBit	20000000	       129 ns/op
BenchmarkDenseRSDicRank	20000000	       127 ns/op
BenchmarkDenseRSDicSelect	 5000000	       315 ns/op

// RSDic
// A bit vector of length 10^8 with one's ratio = 0.5
BenchmarkBit	 5000000	       274 ns/op
BenchmarkDenseRSDicRank	 5000000	       283 ns/op
BenchmarkDenseRSDicSelect	 5000000	       551 ns/op
BenchmarkSparseRSDicBit	10000000	       238 ns/op
BenchmarkSparseRSDicRank	10000000	       279 ns/op
BenchmarkSparseRSDicSelect	 5000000	       593 ns/op

// RSDic
// A bit vector of length 10^6 with one's ratios = 0.01
// Allocated size: 0.32 bits per original bit.
BenchmarkSparseRSDicBit	10000000	       190 ns/op
BenchmarkSparseRSDicRank	10000000	       197 ns/op
BenchmarkSparseRSDicSelect	 5000000	       495 ns/op

// RSDic
// A bit vector of length 10^8 with one's ratios = 0.01
// Allocated size: 0.32 bits per original bit.
BenchmarkSparseRSDicBit	10000000	       247 ns/op
BenchmarkSparseRSDicRank	10000000	       264 ns/op
BenchmarkSparseRSDicSelect	 5000000	       655 ns/op

// []uint8 (for comparison)
// A raw bit vector of length 10^6 with one's ratio = 0.5
// Bit requires O(1) time and Rank, Select require O(n) time
BenchmarkDenseRawBit	2000000000	         1.86 ns/op
BenchmarkDenseRawRank	    5000	    629768 ns/op
BenchmarkDenseRawSelect	     500	   5089415 ns/op

// []uint8 (for comparison)
// A raw bit vector of length 10^8 with one's ratio = 0.5
BenchmarkDenseRawBit	2000000000	         1.82 ns/op
BenchmarkDenseRawRank	      50	  67224404 ns/op
BenchmarkDenseRawSelect	       5	 477187054 ns/op

Documentation

Overview

Package rsdic provides a rank/select dictionary supporting many basic operations in constant time using very small working space (smaller than original).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type RSDic

type RSDic struct {
	// contains filtered or unexported fields
}

RSDic is a Rank/Select struct

func New

func New() *RSDic

New returns RSDic with a bit array of length 0.

func (RSDic) AllocSize

func (rs RSDic) AllocSize() int

AllocSize returns the allocated size in bytes.

func (RSDic) Bit

func (rs RSDic) Bit(pos uint64) bool

Bit returns the (pos+1)-th bit in bits, i.e. bits[pos]

func (RSDic) BitAndRank

func (rs RSDic) BitAndRank(pos uint64) (bool, uint64)

BitAndRank returns the (pos+1)-th bit (=b) and Rank(pos, b) Although this is equivalent to b := Bit(pos), r := Rank(pos, b), BitAndRank is faster.

func (RSDic) MarshalBinary

func (rs RSDic) MarshalBinary() (out []byte, err error)

MarshalBinary encodes the RSDic into a binary form and returns the result.

func (RSDic) Num

func (rs RSDic) Num() uint64

Num returns the number of bits

func (RSDic) OneNum

func (rs RSDic) OneNum() uint64

OneNum returns the number of ones in bits

func (*RSDic) PushBack

func (rs *RSDic) PushBack(bit bool)

PushBack appends the bit to the end of B

func (RSDic) Rank

func (rs RSDic) Rank(pos uint64, bit bool) uint64

Rank returns the number of bit's in B[0...pos)

func (RSDic) Select

func (rs RSDic) Select(rank uint64, bit bool) uint64

Select returns the position of (rank+1)-th occurence of bit in B Select returns num if rank+1 is larger than the possible range. (i.e. Select(oneNum, true) = num, Select(zeroNum, false) = num)

func (RSDic) Select0

func (rs RSDic) Select0(rank uint64) uint64

Select0 is

func (RSDic) Select1

func (rs RSDic) Select1(rank uint64) uint64

Select1 is

func (*RSDic) UnmarshalBinary

func (rs *RSDic) UnmarshalBinary(in []byte) (err error)

UnmarshalBinary decodes the RSDic from a binary from generated MarshalBinary

func (RSDic) ZeroNum

func (rs RSDic) ZeroNum() uint64

ZeroNum returns the number of zeros in bits

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL