mybench

mybench is a benchmark authoring library that helps you create your own database benchmark with Golang. The central features of mybench include:

  • A library approach to database benchmarking
  • Discretized precise rate control: the rate at which the events run is discretized to a relatively low frequency (default: 50 Hz), as Linux + Golang cannot reliably maintain 100~1000 Hz loops. The number of events run on each iteration is determined by sampling a uniform or Poisson distribution (see the sketch after this list). The rate control is very precise and has achieved standard deviations of <0.2% of the desired rate.
  • Ability to parallelize a single workload into multiple goroutines, each with its own connection.
  • Ability to run multiple workloads simultaneously with data being logged from all workloads.
  • Uses HDR Histogram to keep track of latency online.
  • Web UI for live monitoring throughput and latency of the current benchmark.
  • A simple interface for implementing the data loader (which creates the tables and seeds them with data) and the benchmark driver.
  • A number of built-in data generators, including thread-safe auto incrementing generators.
  • Command line wrapper: A wrapper library to help build command line apps for the benchmark.
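
As a rough illustration of the rate control feature, here is a minimal, self-contained sketch (not mybench's actual looper) of a discretized event loop: an outer loop ticks at a low frequency, and each tick runs a batch of events whose size is sampled so the long-run average matches the desired rate.

package sketch

import (
	"math/rand"
	"time"
)

// runDiscretized drives event at approximately eventRate calls per second by
// waking up only outerLoopRate times per second and running a sampled batch
// of events on each wakeup.
func runDiscretized(eventRate, outerLoopRate float64, event func()) {
	perTick := eventRate / outerLoopRate // expected events per wakeup
	ticker := time.NewTicker(time.Duration(float64(time.Second) / outerLoopRate))
	defer ticker.Stop()

	for range ticker.C {
		// Stochastic rounding keeps the long-run average at perTick; the real
		// library samples from a uniform or Poisson distribution instead.
		n := int(perTick)
		if rand.Float64() < perTick-float64(n) {
			n++
		}
		for i := 0; i < n; i++ {
			event()
		}
	}
}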

Design

For more details, see the design doc. Some of the information in this section may eventually move there.

There are a few important structs defined in this library, and they are:

  • Benchmark: The main "entrypoint" to running a benchmark. This keeps track of multiple Workloads and performs data aggregation across all the Workloads and their BenchmarkWorkers.
  • WorkloadInterface: an interface implemented by the end-user who wants to create a benchmark. Notably, the end-user implements an Event() function that is called at a specified EventRate (concurrently from a number of goroutines).
  • Workload: Responsible for creating and running the workers (goroutines) to call the Event() function of the WorkloadInterface.
  • BenchmarkWorker: Responsible for setting up the Looper and keeping track of the worker-local statistics (such as the event latency/histograms for the local goroutine).
  • Looper: Responsible for discretizing the desired event rate into something that's achievable on Linux. Actually calls the Event() function. It can also perform complex discretization such as Poisson-distribution based event sampling.
  • BenchmarkDataLoader: A data loader helper that helps you easily load data concurrently by specifying only a few options, such as the number of rows and the type of data generator for each column.
  • BenchmarkApp[T]: A wrapper to help create a command line app for a benchmark.
  • Table: An object that helps you create the database tables and track a default set of data generators.

Data collection and flow

The benchmark system mainly collects data about the throughput and latency of the Event() function call, which contains custom logic (usually MySQL calls). Since Event() can be called from a large number of BenchmarkWorkers, each BenchmarkWorker collects its own statistics for performance reasons. The data collected by each BenchmarkWorker is:

  • The count and rate of Event()
  • The latency distribution of Event() as tracked via the HDR Histogram.
  • Unimplemented:
    • How long the worker spent in "saturation" (i.e. Event() is slower than the requested event rate). This is probably an important metric for later.
    • The amount of time spent sleeping (could be useful to debug saturation problems in case the looper is incorrectly implemented).
    • Everything in OuterLoopStat: wakeup latency, event batch size. This is probably less important than the above.

Having all this data in hundreds of independent goroutines (BenchmarkWorkers) is not particularly useful. The data must be aggregated. This aggregation is done at the workload level by the Workload, and then at the Benchmark level via the data logger. This description may make it sound like the data collection is initiated by the BenchmarkWorkers -- it is not. Instead, every few seconds, the data logger calls the appropriate functions to aggregate the data. During data collection, a lock is taken for each BenchmarkWorker, which allows for the safe reading of its data. This is fine, as each BenchmarkWorker has its own mutex and there's never a lot of contention. If this ever becomes a problem, lockless programming may be a better approach.
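
A rough sketch of this pattern (with hypothetical types, not the library's internals): each worker guards its own statistics with its own mutex, and the data logger periodically walks every worker and merges under that worker's lock, keeping each critical section short.

package sketch

import "sync"

// workerStats stands in for a BenchmarkWorker's local statistics.
type workerStats struct {
	mu    sync.Mutex
	count int64
}

// aggregate is called every few seconds by the data logger; it holds each
// per-worker lock only long enough to read that worker's data.
func aggregate(workers []*workerStats) int64 {
	var total int64
	for _, w := range workers {
		w.mu.Lock()
		total += w.count
		w.mu.Unlock()
	}
	return total
}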

Run a benchmark

  • Shopify orders benchmark: make examplebench && build/examplebench -host mysql-1 -user sys.admin_rw -pass hunter2 -bench -eventrate 3000
    • Change the host
    • Change the event rate. The command above specifies 3000 events/s.
  • Go to https://localhost:8005 to see the monitoring web UI.

Write your own benchmark

See benchmarks for examples and read the docs.
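
As a starting point, here is a minimal sketch of a complete benchmark assembled from the types documented below. It assumes the import path github.com/Shopify/mybench and a trivial SELECT 1 event; whether NewBenchmarkConfig registers command-line flags is not shown on this page, so treat this as a sketch and adapt it to your schema.

package main

import "github.com/Shopify/mybench"

// selectWorkload embeds WorkloadConfig (providing Config()) and NoContextData
// (providing NewContextData()), so only Event() needs to be defined.
type selectWorkload struct {
	mybench.WorkloadConfig
	mybench.NoContextData
}

func (w *selectWorkload) Event(ctx mybench.WorkerContext[mybench.NoContextData]) error {
	_, err := ctx.Conn.Execute("SELECT 1")
	return err
}

// myBenchmark embeds *BenchmarkConfig, which provides Config().
type myBenchmark struct {
	*mybench.BenchmarkConfig
}

func (b *myBenchmark) Name() string { return "mybenchmark" }

func (b *myBenchmark) RunLoader() error {
	// Create and seed tables here, e.g. via Table.ReloadData.
	return nil
}

func (b *myBenchmark) Workloads() ([]mybench.AbstractWorkload, error) {
	workload := mybench.NewWorkload[mybench.NoContextData](&selectWorkload{
		WorkloadConfig: mybench.WorkloadConfig{Name: "select1"},
	})
	return []mybench.AbstractWorkload{workload}, nil
}

func main() {
	config := mybench.NewBenchmarkConfig()
	// Config() must return a validated config with defaults filled in.
	if err := config.ValidateAndSetDefaults(); err != nil {
		panic(err)
	}
	if err := mybench.Run(&myBenchmark{config}); err != nil {
		panic(err)
	}
}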

Documentation

Index

Constants

This section is empty.

Variables

var VersionString = "1.0"

Functions

func QuestionMarksStringList

func QuestionMarksStringList(n int) string

func Run

func Run(benchmarkInterface BenchmarkInterface) error

Runs a custom-defined benchmark that implements the BenchmarkInterface.

Types

type AbstractWorkload

type AbstractWorkload interface {
	Run(context.Context, *sync.WaitGroup, time.Time)
	Config() WorkloadConfig

	// We need to set the RateControlConfig on the Workload object, because the
	// data logger needs to know the concurrency/event rate of the workload.
	// See comments in Benchmark.Start for more details.
	FinishInitialization(DatabaseConfig, RateControlConfig)

	// The DataLogger needs to iterate through all the OnlineHistograms for each
	// worker so it can perform the double buffer swap. Since the DataLogger only
	// has access to a map of AbstractWorkload, it doesn't have access to the
	// underlying BenchmarkWorker array (which is templated). So this
	// AbstractWorkload needs to provide a way to iterate through all the
	// OnlineHistograms. The naive way to implement this would be to return a
	// slice of OnlineHistograms. That is not a good approach, because it would
	// allocate memory, which can take unbounded time. Since the code that
	// iterates through all the online histograms is in a "critical" section that
	// should take as little time as possible, this is not acceptable. Thus, we
	// basically create an iterator. Since Golang doesn't have a built-in iterator
	// pattern, the functional pattern is chosen.
	// TODO: check that the functional pattern doesn't introduce unnecessary
	// overhead/memory allocations.
	ForEachOnlineHistogram(func(int, *OnlineHistogram))

	// The DataLogger needs the rate control config to make allocations and record
	// desired event rates. See comments in Benchmark.Start for more details.
	RateControlConfig() RateControlConfig
}

We want the workload to be templated so the context data can be transparently passed from the workload to the Event() function without going through runtime type selection.

However, we don't want to store a templated workload in the Benchmark object, as it would start to infect every other struct (like DataLogger). Further, since the Workload is stored in a map in the Benchmark, templating it would prevent having a different ContextDataT per workload.

Is this kind of a hack? Maybe. I haven't decided yet.
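
In isolation, the callback-iterator pattern described in the comment above looks like this (container and field names are hypothetical): the container invokes a caller-supplied function per element instead of returning a freshly allocated slice.

package sketch

type onlineHistogram struct{ /* ... */ }

// workerSet stands in for whatever owns the per-worker histograms.
type workerSet struct {
	histograms []*onlineHistogram
}

// ForEachOnlineHistogram visits every element without allocating a result
// slice, so it can run inside a short critical section.
func (s *workerSet) ForEachOnlineHistogram(f func(int, *onlineHistogram)) {
	for i, h := range s.histograms {
		f(i, h)
	}
}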

type AutoIncrementGenerator

type AutoIncrementGenerator struct {
	// contains filtered or unexported fields
}

Atomically generates an auto-incrementing value from the client side.

SampleFromExisting samples uniformly between the min value and the current value. There is no guarantee that it will land on an existing value if values have been deleted.

TODO: track deletions, but this is problematic too, because Golang doesn't offer a concurrent-write map.
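
A plausible sketch of the pattern this generator describes (not the actual source): Generate advances an atomic counter, while SampleFromExisting draws uniformly from [min, current].

package sketch

import (
	"math/rand"
	"sync/atomic"
)

type autoIncrement struct {
	min     int64
	current atomic.Int64 // advanced atomically, so it is safe across goroutines
}

func (g *autoIncrement) generate() int64 {
	return g.current.Add(1)
}

// sampleFromExisting is uniform over [min, current]; if rows were deleted,
// the sampled value may no longer exist in the database.
func (g *autoIncrement) sampleFromExisting(r *rand.Rand) int64 {
	cur := g.current.Load()
	return g.min + r.Int63n(cur-g.min+1)
}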

func NewAutoIncrementGenerator

func NewAutoIncrementGenerator(min, current int64) *AutoIncrementGenerator

func NewAutoIncrementGeneratorFromDatabase

func NewAutoIncrementGeneratorFromDatabase(databaseConfig DatabaseConfig, table, column string) (*AutoIncrementGenerator, error)

func (*AutoIncrementGenerator) Current

func (g *AutoIncrementGenerator) Current() int64

Get the current value without generating a new value.

func (*AutoIncrementGenerator) Generate

func (g *AutoIncrementGenerator) Generate(r *Rand) interface{}

func (*AutoIncrementGenerator) GenerateTyped

func (g *AutoIncrementGenerator) GenerateTyped(r *Rand) int64

func (*AutoIncrementGenerator) Min

func (g *AutoIncrementGenerator) Min() int64

func (*AutoIncrementGenerator) SampleFromExisting

func (g *AutoIncrementGenerator) SampleFromExisting(r *Rand) interface{}

func (*AutoIncrementGenerator) SampleFromExistingTyped

func (g *AutoIncrementGenerator) SampleFromExistingTyped(r *Rand) int64

type Benchmark

type Benchmark struct {
	BenchmarkConfig

	Name        string
	LogInterval time.Duration // TODO: make these two values configurable (move into BenchmarkConfig maybe)
	LogRingSize int
	// contains filtered or unexported fields
}

func NewBenchmark

func NewBenchmark(benchmarkName string, benchmarkConfig BenchmarkConfig) (*Benchmark, error)

func (*Benchmark) AddWorkload

func (b *Benchmark) AddWorkload(workload AbstractWorkload)

func (*Benchmark) DataSnapshots

func (b *Benchmark) DataSnapshots() []*DataSnapshot

func (*Benchmark) Start

func (b *Benchmark) Start()

func (*Benchmark) StopAndWait

func (b *Benchmark) StopAndWait()

type BenchmarkConfig

type BenchmarkConfig struct {
	Bench    bool
	Load     bool
	Duration time.Duration
	LogFile  string
	LogTable string
	Note     string

	DatabaseConfig DatabaseConfig

	RateControlConfig RateControlConfig

	HttpPort int
}

func NewBenchmarkConfig

func NewBenchmarkConfig() *BenchmarkConfig

func (*BenchmarkConfig) Config

func (c *BenchmarkConfig) Config() BenchmarkConfig

func (*BenchmarkConfig) ValidateAndSetDefaults

func (c *BenchmarkConfig) ValidateAndSetDefaults() error

type BenchmarkInterface

type BenchmarkInterface interface {
	// Returns the name of the benchmark
	Name() string

	// Returns a list of constructed workloads (or an error) for this benchmark.
	Workloads() ([]AbstractWorkload, error)

	// This function is called if the benchmark ran with the --load flag. This
	// should load the database when called.
	RunLoader() error

	// Returns the benchmark app configuration. The configuration returned must
	// have already been validated with defaults filled in. Make sure to call
	// .Validate() on it during the construction of the object that implements
	// this interface.
	//
	// If you implement BenchmarkInterface with BenchmarkConfig embedded in it,
	// you won't need to define this method as the BenchmarkConfig object already
	// defines this method for you, and embedding it in your struct will cause
	// your struct to inherit the method implemented on BenchmarkConfig.
	Config() BenchmarkConfig
}

This is the interface that the benchmark application needs to implement

type BenchmarkWorker

type BenchmarkWorker[ContextDataT any] struct {
	// contains filtered or unexported fields
}

A single goroutine worker that loops and benchmarks MySQL

func NewBenchmarkWorker

func NewBenchmarkWorker[ContextDataT any](workloadIface WorkloadInterface[ContextDataT], databaseConfig DatabaseConfig, rateControlConfig RateControlConfig) (*BenchmarkWorker[ContextDataT], error)

func (*BenchmarkWorker[ContextDataT]) Run

func (b *BenchmarkWorker[ContextDataT]) Run(ctx context.Context, workerInitializationWg *sync.WaitGroup, startTime time.Time) error

type Column

type Column struct {
	// Name of the column
	Name string

	// SQL definition of the column
	Definition string

	// The data generator for the data of this column.
	Generator DataGenerator
}

type Connection

type Connection struct {
	*client.Conn
	// contains filtered or unexported fields
}

A thin wrapper around https://pkg.go.dev/github.com/go-mysql-org/go-mysql/client#Conn for now. It is possible in the future to extend this to support databases other than MySQL.

This should only be initialized via DatabaseConfig.Connection().

func (*Connection) Close

func (c *Connection) Close() error

func (*Connection) GetRoundRobinConnection

func (c *Connection) GetRoundRobinConnection() *client.Conn

type DataGenerator

type DataGenerator interface {
	Generate(*Rand) interface{}
	SampleFromExisting(*Rand) interface{}
}

An interface for the data generator.

There are two ways to generate data:

  1. Generate a new value to be inserted into the database. This is generated via the Generate call.
  2. Generate an "existing" value to be used in the WHERE clause of a SQL statement. This is generated via the SampleFromExisting call. Note, most generators cannot guarantee that an existing value is generated, as it would be prohibitively expensive to keep track of all the existing data. Consult the documentation of the specific generators for details.
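
For example, a minimal custom generator satisfying this interface might look like the following sketch, which uses the documented Rand.UniformInt helper; since it keeps no memory of inserted values, SampleFromExisting simply reuses Generate.

package sketch

import "github.com/Shopify/mybench"

// rangeGenerator generates uniform integers in [min, max).
type rangeGenerator struct {
	min, max int64
}

func (g rangeGenerator) Generate(r *mybench.Rand) interface{} {
	return r.UniformInt(g.min, g.max)
}

// Best effort only: the sampled value is not guaranteed to exist in the
// database.
func (g rangeGenerator) SampleFromExisting(r *mybench.Rand) interface{} {
	return g.Generate(r)
}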

type DataLogger

type DataLogger struct {
	Interval       time.Duration
	RingSize       int
	OutputFilename string
	TableName      string
	Note           string
	Benchmark      *Benchmark
	// contains filtered or unexported fields
}

func NewDataLogger

func NewDataLogger(dataLogger *DataLogger) (*DataLogger, error)

func (*DataLogger) DataSnapshots

func (d *DataLogger) DataSnapshots() []*DataSnapshot

func (*DataLogger) Run

func (d *DataLogger) Run(ctx context.Context, startTime time.Time)

type DataSnapshot

type DataSnapshot struct {
	// Time since start of the test.
	Time float64

	// The throughput and latency data for all monitored benchmarks merged
	// together.
	AllWorkloadData WorkloadDataSnapshot

	// The throughput and latency data for individual monitored benchmarks,
	// indexed by the workload name.
	PerWorkloadData map[string]WorkloadDataSnapshot
}

type DatabaseConfig

type DatabaseConfig struct {
	// TODO: Unix socket
	Host     string
	Port     int
	User     string
	Pass     string
	Database string

	// If this is set, a connection will not be established. This is useful for
	// non-database-related tests such as selfbench.
	// TODO: this is kind of a hack...
	NoConnection bool

	// The number of underlying connections per Connection object, implemented as a
	// static pool from which connections are fetched in a round-robin sequence with
	// each successive request. The sole purpose of this feature is to multiply the
	// number of open connections to the database to assess any performance impact
	// specific to the overall number of open database connections.
	//
	// Note: this feature does not work on all benchmarks at this moment. It only
	// works with benchmarks that uses the mybench.Connection.GetRoundRobinConnection
	// function to get their connections and execute queries.
	ConnectionMultiplier int

	// Enable CLIENT_MULTI_STATEMENTS options on the client
	ClientMultiStatements bool
}

The database config object that can be turned into a single connection (without connection pooling).

func (DatabaseConfig) Connection

func (cfg DatabaseConfig) Connection() (*Connection, error)

Returns a connection object based on the database configuration

func (DatabaseConfig) CreateDatabaseIfNeeded

func (cfg DatabaseConfig) CreateDatabaseIfNeeded() error

Creates a new database if it doesn't exist

type DatetimeInterval

type DatetimeInterval struct {
	Start time.Time
	End   time.Time
}

type DiscretizedLooper

type DiscretizedLooper struct {
	EventRate       float64
	OuterLoopRate   float64
	LooperType      LooperType
	DebugIdentifier string

	// The context passed is a trace context from runtime/trace package
	Event          func(context.Context) error
	TraceEvent     func(EventStat)
	TraceOuterLoop func(OuterLoopStat)
	// contains filtered or unexported fields
}

func (*DiscretizedLooper) Run

func (l *DiscretizedLooper) Run(ctx context.Context) error

type EnumGenerator

type EnumGenerator[T any] struct {
	// contains filtered or unexported fields
}

Generates values from a discrete set of possible values.

Sample from existing is exactly the same as generation, which means it is possible to produce values that are in the configured set but do not exist in the database.

func NewEnumGenerator

func NewEnumGenerator[T any](values []T) *EnumGenerator[T]

func (*EnumGenerator[T]) Generate

func (g *EnumGenerator[T]) Generate(r *Rand) interface{}

func (*EnumGenerator[T]) GenerateTyped

func (g *EnumGenerator[T]) GenerateTyped(r *Rand) T

func (*EnumGenerator[T]) SampleFromExisting

func (g *EnumGenerator[T]) SampleFromExisting(r *Rand) interface{}

func (*EnumGenerator[T]) SampleFromExistingTyped

func (g *EnumGenerator[T]) SampleFromExistingTyped(r *Rand) T

type EventStat

type EventStat struct {
	TimeTaken time.Duration
}

type ExtendedHdrHistogram

type ExtendedHdrHistogram struct {
	// contains filtered or unexported fields
}

This extends the HDR histogram so it can track:

  • Start time
  • Under- and overflow counts

func NewExtendedHdrHistogram

func NewExtendedHdrHistogram(startTime time.Time) *ExtendedHdrHistogram

func (*ExtendedHdrHistogram) IntervalData

func (h *ExtendedHdrHistogram) IntervalData(endTime time.Time, histMin, histMax, histSize int64) IntervalData

func (*ExtendedHdrHistogram) Merge

func (h *ExtendedHdrHistogram) Merge(other *ExtendedHdrHistogram)

Should only be called from the data logger, after the histogram is copied away from the double buffer. Maybe some of these "read" methods should be defined on a different type, so they can never be called on an object that could be in the double buffer.

func (*ExtendedHdrHistogram) RecordValue

func (h *ExtendedHdrHistogram) RecordValue(v int64)

func (*ExtendedHdrHistogram) ResetDataOnly

func (h *ExtendedHdrHistogram) ResetDataOnly()

func (*ExtendedHdrHistogram) ResetStartTime

func (h *ExtendedHdrHistogram) ResetStartTime(startTime time.Time)

type HistogramCardinalityStringGenerator

type HistogramCardinalityStringGenerator struct {
	// contains filtered or unexported fields
}

Generates a fixed number of unique strings whose frequencies follow a histogram distribution. For example, if the histogram covers 10 integers, this generator will generate 10 distinct string values, each appearing with the frequency of its bin.

Sample from existing is the same as generate, which means it may not sample an existing value.

func NewHistogramCardinalityStringGenerator

func NewHistogramCardinalityStringGenerator(binsEndPoints, frequency []float64, length int) *HistogramCardinalityStringGenerator

See NewHistogramDistribution for documentation of the arguments to this function. Note each integer generated by the histogram will be mapped to a string. To make sure integers such as 1, 2, 3, 4 are generated, the binsEndPoints must be 0.5, 1.5, 2.5, 3.5, 4.5.
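
For instance, a hypothetical construction of four distinct strings of length 8, where the first string appears about half the time:

package sketch

import "github.com/Shopify/mybench"

func exampleHistogramStrings() string {
	g := mybench.NewHistogramCardinalityStringGenerator(
		[]float64{0.5, 1.5, 2.5, 3.5, 4.5}, // end points so the integers 1..4 are produced
		[]float64{0.50, 0.25, 0.15, 0.10},  // relative frequency of each of the 4 strings
		8,                                  // length of each generated string
	)
	return g.GenerateTyped(mybench.NewRand())
}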

func (*HistogramCardinalityStringGenerator) Generate

func (g *HistogramCardinalityStringGenerator) Generate(r *Rand) interface{}

func (*HistogramCardinalityStringGenerator) GenerateTyped

func (g *HistogramCardinalityStringGenerator) GenerateTyped(r *Rand) string

func (*HistogramCardinalityStringGenerator) SampleFromExisting

func (g *HistogramCardinalityStringGenerator) SampleFromExisting(r *Rand) interface{}

func (*HistogramCardinalityStringGenerator) SampleFromExistingTyped

func (g *HistogramCardinalityStringGenerator) SampleFromExistingTyped(r *Rand) string

type HistogramDistribution

type HistogramDistribution struct {
	// contains filtered or unexported fields
}

This generates float64 values based on a discrete probability distribution (represented via a histogram) via the inverse transform sampling algorithm (https://en.wikipedia.org/wiki/Inverse_transform_sampling). Specifically, the steps followed are:

  1. Normalize the frequency values of the histogram to values of between 0 and 1.
  2. Compute the cumulative distribution for the normalized histogram such that its output value is also between 0 and 1. So we have the function cdf(bin_value) -> [0, 1].
  3. Generate a random value, x, between 0 and 1. This value represents a sampled number from the cdf function output. If we compute the inverse function cdf^-1(x) -> bin_value, we will obtain a randomly sampled bin value that will be randomly sampled according to the frequency specified in the histogram.
  4. The inverse function cdf^-1(x) is calculated via linear interpolation.

Note that the ExistingValue for this distribution is the same as NextValue and thus has no memory of past generated values.
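
The steps above condense into a short, illustrative sketch (not the library's code; it assumes all frequencies are positive):

package sketch

import "math/rand"

// sampleHistogram draws a value from the distribution described by the
// histogram: bins holds len(freq)+1 sorted end points.
func sampleHistogram(r *rand.Rand, bins, freq []float64) float64 {
	// Steps 1 and 2: normalize and accumulate into a CDF over the end points.
	cdf := make([]float64, len(freq)+1)
	var total float64
	for _, f := range freq {
		total += f
	}
	for i, f := range freq {
		cdf[i+1] = cdf[i] + f/total
	}

	// Step 3: sample x uniformly in [0, 1) and locate its bin in the CDF.
	x := r.Float64()
	for i := 1; i < len(cdf); i++ {
		if x <= cdf[i] {
			// Step 4: invert within the bin via linear interpolation.
			t := (x - cdf[i-1]) / (cdf[i] - cdf[i-1])
			return bins[i-1] + t*(bins[i]-bins[i-1])
		}
	}
	return bins[len(bins)-1]
}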

func NewHistogramDistribution

func NewHistogramDistribution(binsEndPoints, frequency []float64) HistogramDistribution

Creates a histogram distribution which is used by Rand to generate random numbers.

The value at frequency[i] corresponds to the bin starting at [bins[i], bins[i+1]). Thus, len(bins) == len(frequency) + 1.

Also, the values in bins must be sorted.

type HistogramFloatGenerator

type HistogramFloatGenerator struct {
	// contains filtered or unexported fields
}

Generates floating point values according to a histogram distribution.

Sample from existing does not keep track of values already generated but samples from the same distribution as Generate. This means it is possible to generate values that don't exist in the database.

func NewHistogramFloatGenerator

func NewHistogramFloatGenerator(binsEndPoints, frequency []float64) *HistogramFloatGenerator

See NewHistogramDistribution for documentation the arguments for this function.

func (*HistogramFloatGenerator) Generate

func (g *HistogramFloatGenerator) Generate(r *Rand) interface{}

func (*HistogramFloatGenerator) GenerateTyped

func (g *HistogramFloatGenerator) GenerateTyped(r *Rand) float64

func (*HistogramFloatGenerator) SampleFromExisting

func (g *HistogramFloatGenerator) SampleFromExisting(r *Rand) interface{}

func (*HistogramFloatGenerator) SampleFromExistingTyped

func (g *HistogramFloatGenerator) SampleFromExistingTyped(r *Rand) float64

type HistogramIntGenerator

type HistogramIntGenerator struct {
	// contains filtered or unexported fields
}

Generates integers according to a histogram distribution. One possible use case of this is when you want to distribute a foreign key/id with a particular distribution. For example, a `posts` table can have many posts, with 50% of the rows having one `user_id`, and then 25% of the rows with another `user_id`.

Sample from existing does not keep track of values already generated but samples from the same distribution as Generate. This means it is possible to generate values that don't exist in the database.

func NewHistogramIntGenerator

func NewHistogramIntGenerator(binsEndPoints, frequency []float64) *HistogramIntGenerator

See NewHistogramDistribution for documentation of the arguments to this function. To make sure integers such as 1, 2, 3, 4 are generated, the binsEndPoints must be 0.5, 1.5, 2.5, 3.5, 4.5.
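
A hypothetical construction matching the user_id example above: roughly 50% of generated ids are 1, 25% are 2, and the rest split between 3 and 4.

package sketch

import "github.com/Shopify/mybench"

func exampleUserID() int64 {
	g := mybench.NewHistogramIntGenerator(
		[]float64{0.5, 1.5, 2.5, 3.5, 4.5}, // end points so the integers 1..4 are produced
		[]float64{0.50, 0.25, 0.15, 0.10},
	)
	return g.GenerateTyped(mybench.NewRand()) // one of 1, 2, 3, 4
}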

func (*HistogramIntGenerator) Generate

func (g *HistogramIntGenerator) Generate(r *Rand) interface{}

func (*HistogramIntGenerator) GenerateTyped

func (g *HistogramIntGenerator) GenerateTyped(r *Rand) int64

func (*HistogramIntGenerator) SampleFromExisting

func (g *HistogramIntGenerator) SampleFromExisting(r *Rand) interface{}

func (*HistogramIntGenerator) SampleFromExistingTyped

func (g *HistogramIntGenerator) SampleFromExistingTyped(r *Rand) int64

type HistogramLengthStringGenerator

type HistogramLengthStringGenerator struct {
	// contains filtered or unexported fields
}

Generates a random string with length selected by a histogram distribution.

Sample from existing is the same as generate and does not keep track of existing values. Since there is a very large number of possible strings, there is almost no chance that an existing value will be generated, so don't expect good results from that method.

func NewHistogramLengthStringGenerator

func NewHistogramLengthStringGenerator(binsEndPoints, frequency []float64) *HistogramLengthStringGenerator

See NewHistogramDistribution for documentation of the arguments to this function. Note each integer generated by the histogram is used as a string length. To make sure lengths such as 1, 2, 3, 4 are generated, the binsEndPoints must be 0.5, 1.5, 2.5, 3.5, 4.5.

func (*HistogramLengthStringGenerator) Generate

func (g *HistogramLengthStringGenerator) Generate(r *Rand) interface{}

func (*HistogramLengthStringGenerator) GenerateTyped

func (g *HistogramLengthStringGenerator) GenerateTyped(r *Rand) string

func (*HistogramLengthStringGenerator) SampleFromExisting

func (g *HistogramLengthStringGenerator) SampleFromExisting(r *Rand) interface{}

func (*HistogramLengthStringGenerator) SampleFromExistingTyped

func (g *HistogramLengthStringGenerator) SampleFromExistingTyped(r *Rand) string

type HttpServer

type HttpServer struct {
	// contains filtered or unexported fields
}

func NewHttpServer

func NewHttpServer(benchmark *Benchmark, note string, port int) *HttpServer

func (*HttpServer) Run

func (h *HttpServer) Run()

type IntervalData

type IntervalData struct {
	StartTime time.Time
	EndTime   time.Time
	Count     int64
	Delta     float64
	Rate      float64

	// All data in microseconds
	Min          int64
	Mean         float64
	Max          int64
	Percentile25 int64
	Percentile50 int64
	Percentile75 int64
	Percentile90 int64
	Percentile99 int64

	UnderflowCount int64
	OverflowCount  int64

	UniformHist *UniformHistogram
}

type JSONGenerator

type JSONGenerator struct {
	// contains filtered or unexported fields
}

Generates the same JSON document every time. This is based on map[string]string.

func NewJSONGenerator

func NewJSONGenerator(objLength, valueLength int) *JSONGenerator

func (*JSONGenerator) Generate

func (g *JSONGenerator) Generate(r *Rand) interface{}

func (*JSONGenerator) GenerateTyped

func (g *JSONGenerator) GenerateTyped(r *Rand) string

func (*JSONGenerator) SampleFromExisting

func (g *JSONGenerator) SampleFromExisting(r *Rand) interface{}

func (*JSONGenerator) SampleFromExistingTyped

func (g *JSONGenerator) SampleFromExistingTyped(r *Rand) string

type LockedDoubleBuffer

type LockedDoubleBuffer[T any] struct {
	// contains filtered or unexported fields
}

This is a double buffer implemented using a lock. The target usage is as follows:

  1. Single consumer single producer.
  2. The producer goroutine writes data frequently.
  3. The consumer goroutine reads data infrequently.
  4. The consumer goroutine will first swap the buffer. It gets the non-active data during the swap. After the swap, it will read the data and then reset the data to 0, so it can be swapped again.
  5. We can never swap before the non-active data is reset.
  6. While the producer goroutine is writing to the data, the swap is not allowed to occur.
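
A small usage sketch of this pattern with a hypothetical counter type: the producer writes under SafeActiveWrite, and the consumer swaps, reads, and resets the returned buffer (steps 4 and 5 above).

package sketch

import "github.com/Shopify/mybench"

type counter struct{ n int64 }

func exampleDoubleBuffer() int64 {
	buf := mybench.NewLockedDoubleBuffer(func() *counter { return &counter{} })

	// Producer side (called frequently, possibly from a hot loop).
	buf.SafeActiveWrite(func(c *counter) { c.n++ })

	// Consumer side (called infrequently): swap, then read and reset the
	// buffer that was swapped out so it can be swapped back in later.
	swapped := buf.Swap(func(nonActive *counter) {
		// Optional pre-swap work on the buffer about to become active.
	})
	n := swapped.n
	swapped.n = 0
	return n
}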

func NewLockedDoubleBuffer

func NewLockedDoubleBuffer[T any](newT func() T) *LockedDoubleBuffer[T]

func (*LockedDoubleBuffer[T]) SafeActiveWrite

func (b *LockedDoubleBuffer[T]) SafeActiveWrite(f func(T))

Since we need to prevent swapping from happening while writing, we are using a lock. Lock-less could work as well but will be more complex and may not be necessary.

func (*LockedDoubleBuffer[T]) Swap

func (b *LockedDoubleBuffer[T]) Swap(preSwapCallback func(nonActiveData T)) T

Swap the active and non-active data with a lock. Returns the non-active data.

type LooperType

type LooperType int
const (
	LooperTypeUniform LooperType = iota
	LooperTypePoisson
)

type NoContextData

type NoContextData struct{}

This is a convenience type defined to indicate that there's no context data. If you don't need a custom context data for your workload, use this.

Further, the workload should embed this struct so the NewContextData function does not need to be defined

func (NoContextData) NewContextData

func (NoContextData) NewContextData(*Connection) (NoContextData, error)

Any struct that embeds NoContextData will inherit this method. If this struct is also trying to implement WorkloadInterface, then it does not have to define this method manually.

type NormalFloatGenerator

type NormalFloatGenerator struct {
	// contains filtered or unexported fields
}

Generates a floating point number with a given normal distribution.

Sample from existing is the same as generating a number, which means it is not guaranteed to land on an existing value.

func NewNormalFloatGenerator

func NewNormalFloatGenerator(mean, stddev float64) *NormalFloatGenerator

func (*NormalFloatGenerator) Generate

func (g *NormalFloatGenerator) Generate(r *Rand) interface{}

func (*NormalFloatGenerator) GenerateTyped

func (g *NormalFloatGenerator) GenerateTyped(r *Rand) float64

func (*NormalFloatGenerator) SampleFromExisting

func (g *NormalFloatGenerator) SampleFromExisting(r *Rand) interface{}

func (*NormalFloatGenerator) SampleFromExistingTyped

func (g *NormalFloatGenerator) SampleFromExistingTyped(r *Rand) float64

type NormalIntGenerator

type NormalIntGenerator struct {
	// contains filtered or unexported fields
}

Generates a random integer value according to a normal distribution.

Sample from existing is the same as generation.

func NewNormalIntGenerator

func NewNormalIntGenerator(mean, stddev int64) *NormalIntGenerator

func (*NormalIntGenerator) Generate

func (g *NormalIntGenerator) Generate(r *Rand) interface{}

func (*NormalIntGenerator) GenerateTyped

func (g *NormalIntGenerator) GenerateTyped(r *Rand) int64

func (*NormalIntGenerator) SampleFromExisting

func (g *NormalIntGenerator) SampleFromExisting(r *Rand) interface{}

func (*NormalIntGenerator) SampleFromExistingTyped

func (g *NormalIntGenerator) SampleFromExistingTyped(r *Rand) int64

type NullGenerator

type NullGenerator struct{}

A boring generator that only generates null values.

func NewNullGenerator

func NewNullGenerator() NullGenerator

func (NullGenerator) Generate

func (NullGenerator) Generate(*Rand) interface{}

func (NullGenerator) SampleFromExisting

func (NullGenerator) SampleFromExisting(*Rand) interface{}

type OnlineHistogram

type OnlineHistogram struct {
	*LockedDoubleBuffer[*ExtendedHdrHistogram]
}

func NewOnlineHistogram

func NewOnlineHistogram(startTime time.Time) *OnlineHistogram

func (*OnlineHistogram) RecordValue

func (h *OnlineHistogram) RecordValue(v int64)

func (*OnlineHistogram) Swap

func (h *OnlineHistogram) Swap(preSwapCallback func(nonActiveData *ExtendedHdrHistogram)) *ExtendedHdrHistogram

type OuterLoopStat

type OuterLoopStat struct {
	// The desired wake up time.
	DesiredWakeupTime time.Time

	// The start time of the outer loop iteration and the actual wake up time.
	ActualWakeupTime time.Time

	// Number of Event() calls to make in this outer loop iteration.
	EventBatchSize int64

	// The end time of the outer loop iteration, but doesn't actually include the
	// time taken to calculate the next wakeup time, as that code is assumed to be
	// very fast and negligible.
	EventsEnd time.Time

	// The total time taken to process all Event() calls, including the TraceEvent calls.
	EventsLatency time.Duration

	// The next desired wakeup time
	NextDesiredWakeupTime time.Time

	// The next expected event's activation time. If this timestamp is really
	// behind the actual wakeup time, then the system is very backlogged.
	NextExpectedEventTime time.Time

	// The cumulative number of events executed.
	CumulativeNumberOfEvents int64
}

type Rand

type Rand struct {
	*rand.Rand
}

func NewRand

func NewRand() *Rand

Creates a new Rand object

func (*Rand) HistFloat

func (r *Rand) HistFloat(hist HistogramDistribution) float64

func (*Rand) HistInt

func (r *Rand) HistInt(hist HistogramDistribution) int64

func (*Rand) NormalFloat

func (r *Rand) NormalFloat(mean, stddev float64) float64

func (*Rand) NormalInt

func (r *Rand) NormalInt(mean, stddev int64) int64

func (*Rand) UniformFloat

func (r *Rand) UniformFloat(min, max float64) float64

func (*Rand) UniformInt

func (r *Rand) UniformInt(min, max int64) int64

type RateControlConfig

type RateControlConfig struct {
	// The total rate at which Event is called in hz, across all goroutines.
	EventRate float64

	// Number of goroutines to drive the EventRate. Each goroutine will get the
	// same portion of the EventRate. This needs to be increased if a single
	// goroutine cannot drive the EventRate.  If not specified, it will be
	// calculated from EventRate and MaxEventRatePerWorker.
	Concurrency int

	// The maximum event rate per goroutine, used to calculate Concurrency.
	// If not specified, it will default to 100.
	MaxEventRatePerWorker float64

	// The desired rate of the outer loop that batches events. Default: 50.
	OuterLoopRate float64

	// The type of looper used. Default to Uniform looper.
	LooperType LooperType
}

type Ring

type Ring[T any] struct {
	// contains filtered or unexported fields
}

A terrible implementation of a ring, based on the Golang ring, which is not thread-safe and does not offer a nice API.

I can't believe there is no simple ring buffer data structure in Golang with generics.

func NewRing

func NewRing[T any](capacity int) *Ring[T]

func (*Ring[T]) Push

func (r *Ring[T]) Push(data T)

func (*Ring[T]) ReadAllOrdered

func (r *Ring[T]) ReadAllOrdered() []T

type StatusData

type StatusData struct {
	CurrentTime   float64
	Note          string
	Workloads     []string
	DataSnapshots []*DataSnapshot
}

type Table

type Table struct {
	// The name of the table
	Name string

	// The list of columns. The order of the columns in the table follows the
	// order of this slice.
	Columns []*Column

	// The columns for the primary key for the table. This is a slice as it
	// supports a composite primary key.
	PrimaryKey []string

	// A list of columns for indices.
	Indices [][]string

	// A list of columns for unique indices.
	UniqueKeys [][]string

	// Additional table options appended to the end of the CREATE TABLE
	// statements, such as compression settings, auto increment settings, and so
	// on.
	TableOptions string
	// contains filtered or unexported fields
}

This struct provides helpers for creating and seeding a table.

func InitializeTable

func InitializeTable(t Table) Table

func (Table) CreateTableQuery

func (t Table) CreateTableQuery() string

func (Table) DropTableQuery

func (t Table) DropTableQuery() string

func (Table) Generate

func (t Table) Generate(r *Rand, column string) interface{}

func (Table) InsertQuery

func (t Table) InsertQuery(r *Rand, batchSize int, valueOverride map[string]interface{}) (string, []interface{})

func (Table) InsertQueryList

func (t Table) InsertQueryList(r *Rand, valueOverrides []map[string]interface{}) (string, []interface{})

func (Table) ReloadData

func (t Table) ReloadData(databaseConfig DatabaseConfig, totalrows int64, batchSize int64, concurrency int)

Drop and recreate the table with data seeded via the data generators.

totalrows specifies the total number of rows to insert into the new table. batchSize controls how many rows are inserted in one INSERT statement (200 is usually a good starting point). concurrency is the number of goroutines used to insert the data.

If concurrency is 0, it is set by default to 16. This allows the loader to reuse the -concurrency flag (which is default 0).
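
A hypothetical loader sketch tying this together with the Table and generator APIs documented on this page:

package sketch

import "github.com/Shopify/mybench"

func loadUsers(dbConfig mybench.DatabaseConfig) {
	table := mybench.InitializeTable(mybench.Table{
		Name: "users",
		Columns: []*mybench.Column{
			{Name: "id", Definition: "BIGINT NOT NULL AUTO_INCREMENT", Generator: mybench.NewAutoIncrementGenerator(0, 0)},
			{Name: "name", Definition: "VARCHAR(255)", Generator: mybench.NewUniformLengthStringGenerator(5, 20)},
		},
		PrimaryKey: []string{"id"},
	})

	// 1,000,000 rows total, 200 rows per INSERT, concurrency 0 = default of 16.
	table.ReloadData(dbConfig, 1000000, 200, 0)
}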

func (Table) SampleFromExisting

func (t Table) SampleFromExisting(r *Rand, column string) interface{}

type UniformCardinalityStringGenerator

type UniformCardinalityStringGenerator struct {
	// contains filtered or unexported fields
}

Generates a fixed number of unique strings with a uniform distribution. For example, if cardinality is 10, then this generator will generate 10 distinct string values. The frequency of the strings is uniform.

Sample from existing is the same as generate, which means it may not sample an existing value.

func NewUniformCardinalityStringGenerator

func NewUniformCardinalityStringGenerator(cardinality, length int) *UniformCardinalityStringGenerator

func (*UniformCardinalityStringGenerator) Generate

func (g *UniformCardinalityStringGenerator) Generate(r *Rand) interface{}

func (*UniformCardinalityStringGenerator) GenerateTyped

func (g *UniformCardinalityStringGenerator) GenerateTyped(r *Rand) string

func (*UniformCardinalityStringGenerator) SampleFromExisting

func (g *UniformCardinalityStringGenerator) SampleFromExisting(r *Rand) interface{}

func (*UniformCardinalityStringGenerator) SampleFromExistingTyped

func (g *UniformCardinalityStringGenerator) SampleFromExistingTyped(r *Rand) string

type UniformDatetimeGenerator

type UniformDatetimeGenerator struct {
	// contains filtered or unexported fields
}

Generates a date time value in two modes:

  1. GenerateNow == true will cause Generate to return time.Now.
  2. GenerateNow == false will cause Generate to generate a random time between the intervals specified in Intervals with uniform probability distribution.

SampleFromExisting always samples from the Intervals. However, if GenerateNow == true, it will also sample from an extra interval spanning from when Generate() was first called to the moment the SampleFromExisting call is made.

Generate and SampleFromExisting will return a string of the time formatted with YYYY-MM-DD hh:mm:ss, which is what SQL expects. GenerateTyped and SampleFromExistingTyped will return time.Time.
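
A usage sketch drawing datetimes uniformly from the year 2023 (the interval values are hypothetical):

package sketch

import (
	"time"

	"github.com/Shopify/mybench"
)

func exampleDatetime() interface{} {
	g := mybench.NewUniformDatetimeGenerator([]mybench.DatetimeInterval{{
		Start: time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC),
		End:   time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC),
	}}, false)
	return g.Generate(mybench.NewRand()) // "YYYY-MM-DD hh:mm:ss" string
}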

func NewNowGenerator

func NewNowGenerator() *UniformDatetimeGenerator

func NewUniformDatetimeGenerator

func NewUniformDatetimeGenerator(intervals []DatetimeInterval, generateNow bool) *UniformDatetimeGenerator

func (*UniformDatetimeGenerator) Generate

func (g *UniformDatetimeGenerator) Generate(r *Rand) interface{}

func (*UniformDatetimeGenerator) GenerateTyped

func (g *UniformDatetimeGenerator) GenerateTyped(r *Rand) time.Time

func (*UniformDatetimeGenerator) SampleFromExisting

func (g *UniformDatetimeGenerator) SampleFromExisting(r *Rand) interface{}

func (*UniformDatetimeGenerator) SampleFromExistingTyped

func (g *UniformDatetimeGenerator) SampleFromExistingTyped(r *Rand) time.Time

type UniformDecimalGenerator

type UniformDecimalGenerator struct {
	// contains filtered or unexported fields
}

Generates a random decimal value. TODO: can this be folded into the UniformFloatGenerator?

Sampling from existing is the same as generation, which means it is not guaranteed to generate an existing value if the number of rows in the database is small or the decimal has a high precision.

func NewUniformDecimalGenerator

func NewUniformDecimalGenerator(precision, scale int) *UniformDecimalGenerator

func (*UniformDecimalGenerator) Generate

func (g *UniformDecimalGenerator) Generate(r *Rand) interface{}

func (*UniformDecimalGenerator) GenerateTyped

func (g *UniformDecimalGenerator) GenerateTyped(r *Rand) string

func (*UniformDecimalGenerator) SampleFromExisting

func (g *UniformDecimalGenerator) SampleFromExisting(r *Rand) interface{}

func (*UniformDecimalGenerator) SampleFromExistingTyped

func (g *UniformDecimalGenerator) SampleFromExistingTyped(r *Rand) string

type UniformFloatGenerator

type UniformFloatGenerator struct {
	// contains filtered or unexported fields
}

Generates a random floating point value according to a uniform distribution between min (inclusive) and max (exclusive).

Sampling from existing is the same as generation. Since there is a large number of possible floating point values, it is unlikely to generate an exact value that has been used before. However, the generated value may still be useful in WHERE clauses that use greater-than or less-than operators.

func NewUniformFloatGenerator

func NewUniformFloatGenerator(min, max float64) *UniformFloatGenerator

func (*UniformFloatGenerator) Generate

func (g *UniformFloatGenerator) Generate(r *Rand) interface{}

func (*UniformFloatGenerator) GenerateTyped

func (g *UniformFloatGenerator) GenerateTyped(r *Rand) float64

func (*UniformFloatGenerator) SampleFromExisting

func (g *UniformFloatGenerator) SampleFromExisting(r *Rand) interface{}

func (*UniformFloatGenerator) SampleFromExistingTyped

func (g *UniformFloatGenerator) SampleFromExistingTyped(r *Rand) float64

type UniformHistogram

type UniformHistogram struct {
	Buckets []hdrhistogram.Bar
	// contains filtered or unexported fields
}

func NewUniformHistogram

func NewUniformHistogram(histMin, histMax, histSize int64) *UniformHistogram

func (*UniformHistogram) RecordValues

func (h *UniformHistogram) RecordValues(v int64, count int64)

type UniformIntGenerator

type UniformIntGenerator struct {
	// contains filtered or unexported fields
}

Generates an integer value between min (inclusive) and max (exclusive) with a uniform distribution.

Sampling from existing is the same as generation, which means it is not guaranteed to generate an existing value if the number of rows in the database is small.

func NewUniformIntGenerator

func NewUniformIntGenerator(min, max int64) *UniformIntGenerator

func (*UniformIntGenerator) Generate

func (g *UniformIntGenerator) Generate(r *Rand) interface{}

func (*UniformIntGenerator) GenerateTyped

func (g *UniformIntGenerator) GenerateTyped(r *Rand) int64

func (*UniformIntGenerator) SampleFromExisting

func (g *UniformIntGenerator) SampleFromExisting(r *Rand) interface{}

func (*UniformIntGenerator) SampleFromExistingTyped

func (g *UniformIntGenerator) SampleFromExistingTyped(r *Rand) int64

type UniformLengthStringGenerator

type UniformLengthStringGenerator struct {
	// contains filtered or unexported fields
}

Generates a random string with a length selected uniformly between the specified min and max.

Sample from existing is the same as generate and does not keep track of existing values. Since there is a very large number of possible strings, there is almost no chance that an existing value will be generated, so don't expect good results from that method.

func NewUniformLengthStringGenerator

func NewUniformLengthStringGenerator(minLength, maxLength int) *UniformLengthStringGenerator

func (*UniformLengthStringGenerator) Generate

func (g *UniformLengthStringGenerator) Generate(r *Rand) interface{}

func (*UniformLengthStringGenerator) GenerateTyped

func (g *UniformLengthStringGenerator) GenerateTyped(r *Rand) string

func (*UniformLengthStringGenerator) SampleFromExisting

func (g *UniformLengthStringGenerator) SampleFromExisting(r *Rand) interface{}

func (*UniformLengthStringGenerator) SampleFromExistingTyped

func (g *UniformLengthStringGenerator) SampleFromExistingTyped(r *Rand) string

type UniqueStringGenerator

type UniqueStringGenerator struct {
	// contains filtered or unexported fields
}

Generates a unique string with a fixed length every time Generate is called. The internal generation is based on an atomic, incrementing integer. Each integer is converted into a string (via a hash function).

Sample from existing will generate a value that has previously been generated. However, the value may have been deleted from the database, so the generated value is not guaranteed to exist in the database.

func NewUniqueStringGenerator

func NewUniqueStringGenerator(length int, min, current int64) *UniqueStringGenerator

length is the length of the string to be generated. min and current are the integer values used to generate the strings. For loading data (when there is nothing in the database), min and current should both be 0. When there is already data in the database, min and current should be set to the min and max integer values used to generate the strings that already exist in the database.
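
For example, for an initial data load (nothing in the database yet), both values start at 0:

package sketch

import "github.com/Shopify/mybench"

func exampleUniqueString() string {
	// Fresh load: min = 0, current = 0; each call yields a distinct
	// 16-character string.
	g := mybench.NewUniqueStringGenerator(16, 0, 0)
	return g.GenerateTyped(mybench.NewRand())
}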

func NewUniqueStringGeneratorFromDatabase

func NewUniqueStringGeneratorFromDatabase(databaseConfig DatabaseConfig, table, column string) (*UniqueStringGenerator, error)

func (*UniqueStringGenerator) Generate

func (g *UniqueStringGenerator) Generate(r *Rand) interface{}

func (*UniqueStringGenerator) GenerateTyped

func (g *UniqueStringGenerator) GenerateTyped(r *Rand) string

func (*UniqueStringGenerator) SampleFromExisting

func (g *UniqueStringGenerator) SampleFromExisting(r *Rand) interface{}

func (*UniqueStringGenerator) SampleFromExistingTyped

func (g *UniqueStringGenerator) SampleFromExistingTyped(r *Rand) string

type UuidGenerator

type UuidGenerator struct {
	Version int
}

Generates UUIDs. SampleFromExisting is basically broken, as this should only very rarely generate a duplicate UUID. Version 1 UUIDs have the timestamp at which they were generated embedded in them; version 4 UUIDs are random.

func NewUuidGenerator

func NewUuidGenerator(version int) *UuidGenerator

NewUuidGenerator: only version 1 (time-based) and version 4 (random) are supported.

func (*UuidGenerator) Generate

func (g *UuidGenerator) Generate(r *Rand) interface{}

func (*UuidGenerator) GenerateTyped

func (g *UuidGenerator) GenerateTyped(r *Rand) string

func (*UuidGenerator) SampleFromExisting

func (g *UuidGenerator) SampleFromExisting(r *Rand) interface{}

func (*UuidGenerator) SampleFromExistingTyped

func (g *UuidGenerator) SampleFromExistingTyped(r *Rand) string

type VisualizationConfig

type VisualizationConfig struct {
	// The min, max, and size of the visualization histogram that tracks the
	// per-event latency. This is not used to actually track the latency as the code
	// internally uses HDR histogram to track a much wider range. Instead, this is
	// used for the final data logging and web UI display.
	// The min and max have units of microseconds and default to 0 and 100000.
	LatencyHistMin  int64
	LatencyHistMax  int64
	LatencyHistSize int64
}

type WorkerContext

type WorkerContext[T any] struct {
	// A worker-specific connection object to the database.
	Conn *Connection

	// A worker-specific Rand object. This is needed because the global rand.*
	// functions in Go use a global mutex under the hood and can cause
	// severe performance problems within mybench due to lock contention.
	Rand *Rand

	// User-defined custom data. If no custom data is needed, set T to be
	// mybench.NoContextData.
	Data T

	// A context.Context object used for tracing only. The tracing used is
	// Golang's runtime/trace package. Each Event executed is already in a task
	// called OuterLoopIteration{WorkloadName}. By the time the Event function is
	// called, it is done so inside a trace region called `Event`. Nested regions
	// can be created by the user.
	TraceCtx context.Context
}

This is the object type that holds the thread-local context data for each benchmark worker. Each benchmark worker has its own copy of this. This object is passed by the benchmark worker to the workload Event() function through the Event() function argument.

The WorkerContext also holds arbitrary data with a user-defined type under the Data attribute. This allows the user to store custom thread-local data on each benchmark worker for their workloads. A common use case is to store a prepared statement object in a user-defined struct.
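
A sketch of that common use case, assuming the go-mysql client's Prepare/Execute methods (reachable through the embedded client.Conn) and a hypothetical users table:

package sketch

import (
	"github.com/Shopify/mybench"
	"github.com/go-mysql-org/go-mysql/client"
)

// stmtData is the user-defined, per-worker context data.
type stmtData struct {
	stmt *client.Stmt
}

type pointSelect struct {
	mybench.WorkloadConfig
}

// NewContextData prepares the statement once per worker/connection.
func (w *pointSelect) NewContextData(conn *mybench.Connection) (stmtData, error) {
	stmt, err := conn.Prepare("SELECT * FROM users WHERE id = ?")
	return stmtData{stmt: stmt}, err
}

// Event reuses the worker-local prepared statement; it is never shared
// across goroutines.
func (w *pointSelect) Event(ctx mybench.WorkerContext[stmtData]) error {
	_, err := ctx.Data.stmt.Execute(ctx.Rand.UniformInt(1, 1000000))
	return err
}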

type Workload

type Workload[ContextDataT any] struct {
	WorkloadConfig
	// contains filtered or unexported fields
}

The actual benchmark struct for a single workload.

func NewWorkload

func NewWorkload[ContextDataT any](workloadIface WorkloadInterface[ContextDataT]) *Workload[ContextDataT]

func (*Workload[ContextDataT]) Config

func (w *Workload[ContextDataT]) Config() WorkloadConfig

func (*Workload[ContextDataT]) FinishInitialization

func (w *Workload[ContextDataT]) FinishInitialization(databaseConfig DatabaseConfig, rateControlConfig RateControlConfig)

func (*Workload[ContextDataT]) ForEachOnlineHistogram

func (w *Workload[ContextDataT]) ForEachOnlineHistogram(f func(int, *OnlineHistogram))

func (*Workload[ContextDataT]) RateControlConfig

func (w *Workload[ContextDataT]) RateControlConfig() RateControlConfig

func (*Workload[ContextDataT]) Run

func (w *Workload[ContextDataT]) Run(ctx context.Context, workerInitializationWg *sync.WaitGroup, startTime time.Time)

type WorkloadConfig

type WorkloadConfig struct {
	// The name of the workload, for identification purposes only.
	Name string

	// scales the workload by the given percentage
	// this currently scales various RateControl parameters
	WorkloadScale float64

	// Configures the visualization for this workload.
	// Some workload may know the latency bounds, and may wish to choose a better scale for the histograms
	Visualization VisualizationConfig
}

Config used to create the Workload

func (WorkloadConfig) Config

func (w WorkloadConfig) Config() WorkloadConfig

type WorkloadDataSnapshot

type WorkloadDataSnapshot struct {
	// Throughput and latency data
	IntervalData

	// The desired throughput
	DesiredRate float64
}

Merges the IntervalData with other data. Represents all the stats collected for a single workload.

type WorkloadInterface

type WorkloadInterface[ContextDataT any] interface {
	// Returns the config for the workload. Should always return the same
	// value.
	//
	// If the struct that implements this interface embeds WorkloadConfig, it
	// does not need to manually define this method as WorkloadConfig already
	// defines this method and structs that embeds WorkloadConfig will inherit
	// that method.
	Config() WorkloadConfig

	// Run the code for a single event. Run your queries here. This function is
	// called repeatedly at the desired rate.
	//
	// This function can be called concurrently from many goroutines. Each
	// goroutine will pass a different connection object to this function, so the
	// connection should be safe to use normally. However, if you rely on any
	// variables set in the Prepare function, make sure it is thread safe but also
	// performant.
	Event(WorkerContext[ContextDataT]) error

	// Since the Event() function can be called from multiple goroutines, there is
	// no convenient place to store goroutine-specific data. Storing it on the
	// WorkloadInterface won't work because that object is shared between all
	// BenchmarkWorkers (and hence goroutines and threads).
	//
	// Using the WorkerContext.Data field, each BenchmarkWorker can store its own
	// goroutine-local data. This is passed to the Event() function call on that
	// goroutine only, which means use of the data is thread-safe, as it is not
	// accessed from any other threads.
	//
	// This method creates a new context data object and it is called once per
	// goroutine/BenchmarkWorker before the main event loop.
	//
	// If a workload does not need context data, the struct that implements
	// workload interface should simply embed NoContextData, which defines this
	// method.
	NewContextData(*Connection) (ContextDataT, error)
}

An interface for implementing the workload. Although only one Workload struct exists, the functions defined in this interface can be called concurrently from many (thousands of) goroutines. Care must be taken to ensure there are no data races (use atomics or mutexes, although the latter could be slow as the number of goroutines increases).

An example of a workload could be selecting data from a single table. Another workload could be inserting into the same table. This is similar in meaning to the "class" described in Section 6.2 of the paper "OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases": https://www.cs.cmu.edu/~pavlo/papers/oltpbench-vldb.pdf.

Each benchmark supports multiple workloads that can be scaled and reported on separately, allowing for evolving workload mixtures.

Directories

Path Synopsis
benchmarks
examples
