batch

v0.1.0, published May 2, 2022. License: MIT.

Repository

github.com/elgopher/batch

README

What can it be used for?

To speed up application performance without sacrificing data consistency or durability, and without making the source code or architecture complex.

The batch package simplifies writing Go applications that process incoming requests (HTTP, gRPC, etc.) in batches: instead of processing each request separately, the application groups incoming requests into a batch and runs the whole group at once. This can significantly speed up the application and reduce disk, network, or CPU usage.

The batch package can be used to write any type of server that handles thousands of requests per second. With this small library, you can write relatively simple code without resorting to low-level data structures.

Why does batch processing improve performance?

Normally, a web application uses the following pattern to modify data in the database:

Load the resource from the database. A resource is some portion of data, such as a record or a document. Lock the entire resource, either pessimistically or optimistically (by reading a version number).
Apply the change to the data.
Save the resource to the database, then release the pessimistic lock, or run an atomic update with a version check (optimistic lock).

But such an architecture does not scale well when the number of requests for a single resource is very high (hundreds or thousands of requests per second). Lock contention becomes very high and the database is significantly overloaded. In practice, this limits the number of concurrent requests the application can serve.
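The per-request pattern and its contention can be sketched with an in-memory stand-in for the database. Everything here (the `record` and `store` types, `handleRequest`, `runRequests`) is hypothetical and only illustrates optimistic locking with a version check; it is not code from the batch package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// record is a hypothetical resource with a version used for optimistic locking.
type record struct {
	Version int
	Counter int
}

// store is an in-memory stand-in for a database holding a single record.
// It counts loads, successful saves, and rejected saves.
type store struct {
	mu        sync.Mutex
	rec       record
	loads     int
	saves     int
	conflicts int
}

var errVersionConflict = errors.New("version conflict")

// load returns a copy of the stored record.
func (s *store) load() record {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.loads++
	return s.rec
}

// save stores r only if nobody else has saved since r was loaded.
func (s *store) save(r record) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if r.Version != s.rec.Version {
		s.conflicts++
		return errVersionConflict
	}
	r.Version++
	s.rec = r
	s.saves++
	return nil
}

// handleRequest loads, modifies, and saves the record, retrying on conflict.
func handleRequest(s *store) {
	for {
		r := s.load()
		r.Counter++
		if err := s.save(r); err == nil {
			return
		}
	}
}

// runRequests handles n concurrent requests against a fresh store.
func runRequests(n int) *store {
	s := &store{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			handleRequest(s)
		}()
	}
	wg.Wait()
	return s
}

func main() {
	s := runRequests(1000)
	fmt.Println("counter:", s.rec.Counter, "loads:", s.loads,
		"saves:", s.saves, "conflicts:", s.conflicts)
}
```

Every request costs at least one load and one save, and every conflict adds another load. The number of conflicts varies from run to run, which is the contention described above.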

One solution to this problem is to reduce the number of costly operations. Because a single resource is loaded and saved thousands of times per second, we can instead:

Load the resource once (let's say once per second)
Execute all the requests from this period of time on an already loaded resource. Run them all sequentially.
Save the resource and send responses to all clients if data was stored successfully.

Such a solution could improve performance by up to a factor of 1000 (one load and one save per second instead of a thousand of each), and the resource is still stored in a consistent state.
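To make the saving concrete, the following sketch counts storage round trips for both approaches. The `account` and `db` types and the helper functions are hypothetical and single-threaded; they illustrate the idea only, not how the batch package is implemented.

```go
package main

import "fmt"

// account is a hypothetical resource.
type account struct{ Balance int }

// db is an in-memory stand-in for a database that counts round trips.
type db struct {
	stored       account
	loads, saves int
}

func (d *db) load() account  { d.loads++; return d.stored }
func (d *db) save(a account) { d.saves++; d.stored = a }

// processOneByOne loads and saves the resource for every operation.
func processOneByOne(d *db, ops []func(*account)) {
	for _, op := range ops {
		a := d.load()
		op(&a)
		d.save(a)
	}
}

// processAsBatch loads once, applies all operations sequentially, saves once.
func processAsBatch(d *db, ops []func(*account)) {
	a := d.load()
	for _, op := range ops {
		op(&a)
	}
	d.save(a)
}

// deposits returns n operations that each add 1 to the balance.
func deposits(n int) []func(*account) {
	ops := make([]func(*account), n)
	for i := range ops {
		ops[i] = func(a *account) { a.Balance++ }
	}
	return ops
}

func main() {
	naive, batched := &db{}, &db{}
	processOneByOne(naive, deposits(1000))
	processAsBatch(batched, deposits(1000))
	fmt.Println(naive.stored.Balance, naive.loads, naive.saves)       // 1000 1000 1000
	fmt.Println(batched.stored.Balance, batched.loads, batched.saves) // 1000 1 1
}
```

Both variants end with the same balance, but the batched one performs a single load and a single save for the whole group of operations.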

The batch package implements this approach. You configure the duration of the batch window and provide functions to load and save the resource; when a request comes in, you run a function:

// set up the batch processor:
processor := batch.StartProcessor(
    batch.Options[*YourResource]{ // YourResource is your Go struct
        MinDuration:  100 * time.Millisecond,
        LoadResource: ...,
        SaveResource: ...,
    },
)

// the following code runs in an HTTP/gRPC handler;
// resourceKey uniquely identifies the resource
err := processor.Run(resourceKey, func(r *YourResource) {
    // code placed here is executed inside the batch
})

For a real-life example, see the example web application in the _example directory.

Installation

# Add batch to your Go module:
go get github.com/elgopher/batch

Please note that Go 1.18 or later is required, because the package API uses generics.

Scaling out

A single Go HTTP server can handle roughly 10,000 to 50,000 requests per second on commodity hardware. This is a lot, but very often you also need:

high availability (if one server goes down, the others take over the traffic)
capacity for hundreds of thousands or millions of requests per second

In both cases you need to deploy multiple servers and put a load balancer in front of them. Note, though, that the load-balancing algorithm must be configured carefully. Round-robin is not an option here, because sooner or later you will run into locking problems (multiple server instances will run batches on the same resource). The ideal solution is to route requests based on request parameters or the URL. For example, an HTTP parameter could carry the resource key. You can instruct the load balancer to calculate a hash of this parameter and always route requests with the same parameter value to the same backend (as long as the set of available backends does not change).
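The routing rule can be sketched as a function that hashes the resource key and picks a backend. This is an illustration of the idea only: `backendFor`, the backend addresses, and the keys are hypothetical, and in practice the hashing is configured in the load balancer rather than written by hand.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// backendFor picks a backend for the given resource key by hashing the key.
// The same key always maps to the same backend while the list is unchanged.
// It panics if backends is empty.
func backendFor(resourceKey string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(resourceKey)) // Write on a hash.Hash never returns an error
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	// hypothetical backend addresses and resource keys
	backends := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}
	for _, key := range []string{"train-1", "train-2", "train-1"} {
		fmt.Println(key, "->", backendFor(key, backends))
	}
}
```

Plain modulo hashing remaps most keys whenever a backend is added or removed. Load balancers that support consistent hashing reduce that reshuffling, which matters here because a remapped key can briefly be batched on two servers.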

Documentation

Index

Variables
func GoroutineNumberForKey(key string, goroutines int) int
type Options
type Processor
- func StartProcessor[Resource any](options Options[Resource]) *Processor[Resource]
- func (p *Processor[Resource]) Run(key string, op func(Resource)) error
- func (p *Processor[Resource]) Stop()

Variables

var ProcessorStopped = errors.New("run failed: processor is stopped")
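Because ProcessorStopped is an exported sentinel error, callers can presumably test for it with `errors.Is`. The sketch below assumes that Run returns this error (possibly wrapped) once the processor has been stopped, which the error message suggests but this page does not state. To stay self-contained it uses a local stand-in, `errProcessorStopped`, instead of importing the package; the status-code mapping in `statusFor` is likewise hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errProcessorStopped is a local stand-in for batch.ProcessorStopped,
// so that this sketch compiles without the batch module.
var errProcessorStopped = errors.New("run failed: processor is stopped")

// statusFor maps an error returned by Run to an HTTP status code.
// The mapping is an assumption made for illustration.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errProcessorStopped):
		// the server is shutting down; the client may retry elsewhere
		return http.StatusServiceUnavailable
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(statusFor(nil))                                            // 200
	fmt.Println(statusFor(fmt.Errorf("handler: %w", errProcessorStopped))) // 503
	fmt.Println(statusFor(errors.New("save failed")))                      // 500
}
```

In real code the comparison would be `errors.Is(err, batch.ProcessorStopped)`.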

Functions

func GoroutineNumberForKey

func GoroutineNumberForKey(key string, goroutines int) int

Types

type Options

type Options[Resource any] struct {
	MinDuration           time.Duration
	MaxDuration           time.Duration
	LoadResource          func(_ context.Context, key string) (Resource, error)
	SaveResource          func(_ context.Context, key string, _ Resource) error
	GoRoutines            int
	GoRoutineNumberForKey func(_ string, goroutines int) int
}

type Processor

type Processor[Resource any] struct {
	// contains filtered or unexported fields
}

func StartProcessor

func StartProcessor[Resource any](options Options[Resource]) *Processor[Resource]

func (*Processor[Resource]) Run

func (p *Processor[Resource]) Run(key string, op func(Resource)) error

Run executes an operation together with other operations in a single batch (as a single atomic transaction). If there is no pending batch, a new one is started. Operations within a batch run sequentially.

Run returns when the entire batch has ended.
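The behaviour described for Run can be illustrated with a deliberately simplified, single-resource batcher. All names (`miniBatcher`, `startMiniBatcher`, `run`, `runDemo`) are hypothetical, and this is not the package's implementation: it ignores resource keys, MaxDuration, goroutine pools, and stopping. It only shows that the first operation starts a batch, operations run sequentially on one loaded resource, and every caller is released after the single save.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// counter is a hypothetical resource.
type counter struct{ Value int }

// request carries one operation and a channel for the batch result.
type request struct {
	op   func(*counter)
	done chan error
}

// miniBatcher is a simplified, single-resource stand-in for a batch processor.
type miniBatcher struct {
	requests chan request
	window   time.Duration
	load     func() (*counter, error)
	save     func(*counter) error
}

func startMiniBatcher(window time.Duration, load func() (*counter, error), save func(*counter) error) *miniBatcher {
	b := &miniBatcher{requests: make(chan request), window: window, load: load, save: save}
	go b.loop()
	return b
}

// loop starts a batch on the first request, keeps applying operations until
// the window elapses, then saves once and reports the result to every caller.
func (b *miniBatcher) loop() {
	for first := range b.requests {
		res, err := b.load()
		waiting := []chan error{first.done}
		if err == nil {
			first.op(res)
		}
		timeout := time.After(b.window)
	collect:
		for {
			select {
			case r := <-b.requests:
				waiting = append(waiting, r.done)
				if err == nil {
					r.op(res)
				}
			case <-timeout:
				break collect
			}
		}
		if err == nil {
			err = b.save(res)
		}
		for _, done := range waiting {
			done <- err
		}
	}
}

// run blocks until the batch containing op has been saved or has failed.
func (b *miniBatcher) run(op func(*counter)) error {
	done := make(chan error, 1)
	b.requests <- request{op: op, done: done}
	return <-done
}

// runDemo issues n concurrent increments and reports the stored value
// and the number of saves that were needed.
func runDemo(n int) (value, saves int) {
	stored := counter{}
	b := startMiniBatcher(20*time.Millisecond,
		func() (*counter, error) { c := stored; return &c, nil },
		func(c *counter) error { stored = *c; saves++; return nil },
	)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = b.run(func(c *counter) { c.Value++ })
		}()
	}
	wg.Wait()
	return stored.Value, saves
}

func main() {
	value, saves := runDemo(100)
	fmt.Println("value:", value, "saves:", saves)
}
```

The stored value always reaches the number of operations, while the number of saves depends on how many operations arrive within each window. When all of them arrive within one window, a single save suffices.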

func (*Processor[Resource]) Stop

func (p *Processor[Resource]) Stop()

Stop ends all running batches. No new operations will be accepted.

Directories

Path	Synopsis
_example
http
store
train
