retry

package
v0.0.1
Published: Oct 25, 2023 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Package retry provides a flexible, configurable retry mechanism with a focus on safety and preventing retry amplification in distributed systems. It is designed to allow arbitrary functions to be retried until they succeed or a maximum number of attempts is reached.

The main type in this package is Retryer, which encapsulates the retry logic. The primary method on Retryer is Do, which takes a function and attempts to execute it until it succeeds or the maximum number of attempts is reached.

Usage:

To use the retry package, create a Retryer with the desired configuration and call its Do method with the function you want to retry:

retryer := retry.NewRetryer(retry.NewDefaultConfig())
err := retryer.Do(ctx, func(ctx context.Context) error {
    // Your code here. Return nil on success or a non-nil error on failure.
    return nil
})

If the function succeeds (i.e., returns nil), then Do also returns nil. If the function fails (i.e., returns an error) and the maximum number of attempts is reached, then Do returns an error.

Design:

The retry package uses a token bucket to limit retries. Once the token bucket is exhausted, no more retries are allowed, but calls without retries can still go through. This approach, inspired by the AWS SDK, prevents retry amplification and helps to maintain system stability during periods of high load or partial outages.

Algorithm details when configured with the default option, RateLimitFirstRequestDisabled:

  1. When a function is passed to the Do method of a Retryer, the function is executed.
  2. If the function succeeds (returns nil), the Do method also returns nil. The token bucket is then incremented by a fixed amount (the NoRetryIncrement), rewarding the successful operation.
  3. If the function fails (returns an error), the Retryer checks if the maximum number of attempts has been reached. If so, it returns an error.
  4. If the maximum number of attempts has not been reached, the Retryer attempts to get a retry token from the token bucket, deducting a certain cost (the RetryCost or RetryTimeoutCost, depending on the type of error).
  5. If a retry token is successfully obtained, the function is retried. If the retry is successful, the cost of the retry token is refunded to the token bucket.
  6. The process repeats until the function succeeds or the maximum number of attempts is reached.

The retry package also includes an optional circuit breaker mode, which can be enabled by passing the WithRateLimitFirstRequestEnabled option when creating a Retryer. In this mode, once the token bucket is exhausted, the Retryer also rate limits first request attempts (to ProbeRateLimit calls per second), effectively opening the circuit breaker.

Unlike other retry or circuit breaker packages, the retry package is designed with lessons learned from operating large-scale distributed systems. While libraries like hashicorp/go-retryablehttp offer retry mechanisms, they often lack sophisticated controls to prevent retry amplification, and they are typically specific to HTTP clients. On the other hand, our retry package is not tied to any specific protocol and incorporates a token bucket approach to enforce request quotas, a strategy inspired by the AWS SDK, which itself is based on over two decades of experience operating services and providing SDKs to customers at AWS.

In contrast to the Google SRE book's method of retry budgets, which requires coordination between different parts of the system, our retry package provides a self-contained mechanism for safe retries. This makes it easier to use and integrate into your system without needing to coordinate with other teams or services.

Furthermore, while libraries like sony/gobreaker provide basic circuit breaker functionality, they often use simple failure ratios to open circuit breakers. Our retry package, however, offers an optional circuit breaker mode that is more sophisticated. It starts rate limiting the first request attempt once the token bucket is exhausted, effectively opening the circuit breaker.

Configuration:

The behavior of the Retryer can be customized by providing a Config struct when creating it. The Config struct includes options for the maximum number of attempts, the maximum backoff delay, the set of retryable checks, the set of timeout checks, and various parameters for the token bucket.

For more details, see the documentation for the Retryer, Config, and related types.

Constants

const (
	// DefaultMaxAttempts is the maximum number of attempts for a request.
	DefaultMaxAttempts int = 3

	// DefaultMaxBackoff is the maximum back off delay between attempts.
	DefaultMaxBackoff = time.Second
)
const (
	// DefaultRetryRateTokens is the number of tokens in the token bucket for the retryRateLimiter.
	//
	// With the defaults, you get 100 failed retries before you are rate limited, or 50 failed retries due to
	// timeouts before you are rate limited.
	DefaultRetryRateTokens uint = 500

	// DefaultRetryCost is the cost of a single failed retry attempt. If you retry, and you succeed, you get a refund.
	// But if the retry fails you lose the tokens.
	DefaultRetryCost uint = 5

	// DefaultRetryTimeoutCost is the cost of a single failed retry attempt due to a timeout error.
	// If you retry and you succeed, you get a refund. But if the retry fails you lose the tokens.
	DefaultRetryTimeoutCost uint = 10

	// DefaultNoRetryIncrement is the number of tokens to add to the token bucket for a successful attempt.
	DefaultNoRetryIncrement uint = 1

	// DefaultProbeRateLimit is the calls per second to allow if the retry token bucket is exhausted and it is
	// also being used to rate limit the first attempt.
	DefaultProbeRateLimit uint = 1
)

Default retry token quota values.
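
The "100 failed retries" and "50 failed retries due to timeouts" figures in the comment above follow directly from dividing the bucket size by the per-retry cost. The small check below reproduces that arithmetic; failedRetriesBeforeLimit is a hypothetical helper, and the constants are local copies of the default values listed above.

```go
package main

import "fmt"

// Local copies of the documented default values.
const (
	retryRateTokens  uint = 500
	retryCost        uint = 5
	retryTimeoutCost uint = 10
)

// failedRetriesBeforeLimit returns how many failed retries a full bucket
// can pay for. cost must be non-zero.
func failedRetriesBeforeLimit(tokens, cost uint) uint {
	return tokens / cost
}

func main() {
	fmt.Println(failedRetriesBeforeLimit(retryRateTokens, retryCost))        // 100
	fmt.Println(failedRetriesBeforeLimit(retryRateTokens, retryTimeoutCost)) // 50
}
```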

Variables

var DefaultRetryables = []awsretry.IsErrorRetryable{

	awsretry.RetryableConnectionError{},

	awsretry.RetryableHTTPStatusCode{
		Codes: defaultRetryableHTTPStatusCodes,
	},
	awsretry.RetryableHTTPStatusCode{
		Codes: defaultThrottleHTTPStatusCodes,
	},
	RetryableConnectErrorCode{
		Codes: defaultRetryableConnectErrorCodes,
	},
	RetryableConnectErrorCode{
		Codes: defaultThrottleConnectErrorCodes,
	},
	RetryableConnectErrorCode{
		Codes: defaultTimeoutConnectErrorCodes,
	},
}

DefaultRetryables provides the set of retryable checks that are used by default.

Functions

This section is empty.

Types

type Config

type Config struct {
	// Maximum number of attempts that should be made.
	MaxAttempts int

	// MaxBackoff duration between retried attempts.
	MaxBackoff time.Duration

	// Retryables is the set of retryable checks that should be used.
	Retryables awsretry.IsErrorRetryables

	// Timeouts is the set of timeout checks that should be used.
	Timeouts awsretry.IsErrorTimeouts

	// RetryRateTokens is the number of tokens in the token bucket for the retryRateLimiter.
	RetryRateTokens uint

	// The cost to deduct from the retryRateLimiter's token bucket per retry.
	RetryCost uint

	// The cost to deduct from the retryRateLimiter's token bucket per retry caused
	// by timeout error.
	RetryTimeoutCost uint

	// The cost to payback to the retryRateLimiter's token bucket for successful
	// attempts.
	NoRetryIncrement uint

	// ProbeRateLimit is the calls per second to allow if the retry token bucket is exhausted and it is
	// also being used to rate limit the first attempt. This is used as the max and refill rate for the
	// probeRateLimiter.
	ProbeRateLimit uint
}

func NewDefaultConfig

func NewDefaultConfig() Config

type RetryableConnectErrorCode

type RetryableConnectErrorCode struct {
	Codes map[connect_go.Code]struct{}
}

RetryableConnectErrorCode determines if an attempt should be retried based on the Connect error code [1].

[1] https://connectrpc.com/docs/protocol#error-codes

func (RetryableConnectErrorCode) IsErrorRetryable

func (r RetryableConnectErrorCode) IsErrorRetryable(err error) aws.Ternary

IsErrorRetryable reports whether the error is retryable based on the Connect error code.

type Retryer

type Retryer struct {
	// contains filtered or unexported fields
}

func NewRetryer

func NewRetryer(
	config Config,
	opts ...RetryerOption,
) *Retryer

func (*Retryer) Do

func (r *Retryer) Do(ctx context.Context, f func(context.Context) error) error

Do attempts to execute the provided function until it succeeds or the maximum number of attempts is reached.

func (*Retryer) GetRetryToken

func (r *Retryer) GetRetryToken(ctx context.Context, opErr error, isProbe bool) (func(error) error, error)

GetRetryToken attempts to deduct the retry cost from the retry token pool. It returns the token release function, or an error.

If isProbe is true, then this is a probe request and we are treating the token bucket as a circuit breaker. In this case, we are allowed to make a request despite the retry token bucket being exhausted, because we need to allow through a small, safe rate of traffic to determine when it is safe to resume normal traffic.

type RetryerOption

type RetryerOption func(*Retryer)

func WithLogger

func WithLogger(logger zerolog.Logger) RetryerOption

func WithRateLimitFirstRequestDisabled

func WithRateLimitFirstRequestDisabled() RetryerOption

func WithRateLimitFirstRequestEnabled

func WithRateLimitFirstRequestEnabled() RetryerOption

func WithSleep

func WithSleep(sleep func(time.Duration)) RetryerOption

WithSleep sets the sleep function used to sleep between retries. This is exposed for testing.
