replds

v0.0.0-...-b6e6e3c
Warning: This package is not in the latest version of its module.

Published: Sep 23, 2023 License: LGPL-3.0 Imports: 15 Imported by: 1

README

replds

Maintains a (small) set of files, replicated across multiple servers. It is targeted at small datasets that are managed by automation workflows and need to be propagated to machines at runtime.

Data replication is eventually consistent, and conflict resolution applies last-write-wins semantics. Writes are immediately forwarded to all peers, but it is enough for at least one copy to succeed for the write to be acknowledged. The most recently written data will appear on all nodes as soon as network partitions are resolved.

Given this replication model, replds is not safe to use with multiple writers on overlapping key spaces. For read-modify-write workflows, it is best to implement a separate locking mechanism so that only a single workflow accesses the data at any given time: the service itself provides no locking, so this is necessary to prevent unexpected, out-of-order updates.

There is no dynamic cluster control: the full list of peers must be provided to each daemon, which makes a configuration management system the natural way to generate the daemon configuration.

Configuration

The replds tool requires a YAML-encoded configuration file (which you can specify using the --config command-line option). This file should contain the following attributes:

  • client - configuration for the replds client commands
    • url - service URL (the hostname can resolve to multiple IP addresses)
    • tls - TLS configuration for the client
      • cert - path to the certificate
      • key - path to the private key
      • ca - path to the CA file
  • server - configuration for the replds server command
    • path - path of the locally managed repository
    • peers - list of URLs of cluster peers
    • tls_client - TLS configuration for the peer-to-peer client
      • cert - path to the certificate
      • key - path to the private key
      • ca - path to the CA file
  • http_server - configuration for the HTTP server
    • tls - server-side TLS configuration
      • cert - path to the server certificate
      • key - path to the server's private key
      • ca - path to the CA used to validate clients
      • acl - TLS-based access controls, a list of entries with the following attributes:
        • path - regular expression matching the request URL path
        • cn - regular expression that must match the CommonName (CN) in the subject of the client certificate
    • max_inflight_requests - maximum number of in-flight requests to allow before server-side throttling kicks in

TLS Setup

For safe usage, you will want to secure both peer-to-peer and client-to-peer communication with TLS, using separate credentials for each. You can then set ACLs so that only peers can access the /api/internal/ URL prefix, while all clients can access everything else under /api/.

Service integration

The replication strategy adopted by replds puts severe limits on how it can be used; however, there are at least two useful use cases that we'd like to examine in more detail. In both cases, a single master server controls the workflow (i.e. the key space is not partitioned).

Letsencrypt automation

In this scenario, SSL certificates are automatically generated at runtime with Letsencrypt (from a cron job), and we need to propagate them to front-end servers.

This scenario is relatively simple because the timeouts and delays involved in the workflow are much larger than both propagation delays and expected fault durations, so data convergence is not an issue: when we refresh an SSL certificate 30 days before its expiration, it's fine if application servers pick it up a day or more later.

The workflow is going to look like this:

  • A cron job (on a single node) examines the local repository to find certificates that are about to expire, and renews them using the ACME API. We are ignoring the details of the challenge/response validation process as they are not relevant to data propagation issues.
  • The cron job stores the results in replds.
  • Periodically, the application servers are reloaded to pick up the new certificates, possibly via another cron job.

Because the application reload cycle is independent of data propagation, the application may be reloaded at a moment when the certificate and the private key do not (yet) match. One possible strategy for handling this situation is for the service to crash and rely on an automatic restart policy to keep retrying until the data is up to date: perhaps not optimal, but simple and guaranteed to converge.
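A minimal sketch of the crash-and-restart strategy for a Go front-end, assuming hypothetical certificate paths: tls.LoadX509KeyPair rejects a private key that does not match the certificate's public key, so the process exits and leaves retrying to the service manager (for example, a systemd unit with Restart=on-failure).

```go
package main

import (
	"crypto/tls"
	"fmt"
	"log"
	"net/http"
)

// loadCertificate loads a certificate/key pair. tls.LoadX509KeyPair fails if
// either file is missing or unparseable, or if the private key does not match
// the certificate, which covers the "not yet converged" case.
func loadCertificate(certFile, keyFile string) (tls.Certificate, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return tls.Certificate{}, fmt.Errorf("loading %s and %s: %w", certFile, keyFile, err)
	}
	return cert, nil
}

func main() {
	// Hypothetical paths inside the replds-managed repository.
	cert, err := loadCertificate("/var/lib/replds/certs/www.pem", "/var/lib/replds/certs/www.key")
	if err != nil {
		// Crash on purpose: the restart policy retries until replds has
		// delivered a matching pair.
		log.Fatal(err)
	}
	srv := &http.Server{
		Addr:      ":8443",
		TLSConfig: &tls.Config{Certificates: []tls.Certificate{cert}},
	}
	// Empty file names: the certificate is already in TLSConfig.
	log.Fatal(srv.ListenAndServeTLS("", ""))
}
```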

Package repository

Here, we need to propagate a Debian package repository across multiple servers for redundancy. The incoming packages are sent to the master repository server (in our case, over SSH), where some processing takes place that results in a bunch of files being updated (the new packages, and the repository metadata). This processing stage needs to access the entire repository.

We're wrapping external functionality and tools, and they may be complex enough that we can't simply make them use the replds API, so we're going to let the tools use the local filesystem as they normally would. At the same time, we can't just run the repository tools on the filesystem copy managed by replds itself, because in that case we would not be able to detect changes. So we use a separate staging directory to run the repository tools on, and the final workflow is:

  • rsync data from the replds-managed dir to the staging dir;
  • run the metadata-generation tools on the staging dir;
  • synchronize the data back to replds using the sync command.
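These three steps could be driven by a small wrapper like the sketch below, which stops at the first failing step so that a broken metadata run is never synced back. The rsync flags are real (-a preserves file attributes, --delete removes files missing from the source, and trailing slashes copy directory contents). However, update-repo-metadata is a hypothetical stand-in for the actual repository tools, the directories are placeholders, and the argument syntax shown for replds sync is an assumption; check the command's own usage output.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// steps returns the commands of the staging workflow, in order.
func steps(replDir, stagingDir, configFile string) [][]string {
	return [][]string{
		// 1. Copy the replds-managed data into the staging dir.
		{"rsync", "-a", "--delete", replDir + "/", stagingDir + "/"},
		// 2. Regenerate repository metadata (hypothetical tool name).
		{"update-repo-metadata", stagingDir},
		// 3. Push changes back to replds (argument syntax assumed).
		{"replds", "sync", "--config", configFile, stagingDir},
	}
}

// run executes the commands in order and stops at the first failure.
func run(cmds [][]string) error {
	for _, argv := range cmds {
		cmd := exec.Command(argv[0], argv[1:]...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("%v: %w", argv, err)
		}
	}
	return nil
}

func main() {
	err := run(steps("/var/lib/replds/foo", "/srv/staging/debian", "/etc/replds/foo.yml"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```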

Usage

The Debian package comes with a replds-instance-create script that can be used to set up multiple replds instances. For an instance named foo, the script will set up the replds@foo systemd service and create the replds-foo user and group. Add users that need to read the repository files to that group. The configuration will be read from /etc/replds/foo.yml.

Note that files created by the daemon will be world-readable by default. Set the process umask if you wish to restrict this further.
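The umask works as a bit mask: permission bits set in the umask are cleared from the mode requested when a file is created. The sketch below assumes that replds requests mode 0666 for new files, which this documentation does not confirm. Under the common default umask of 022, that yields world-readable files, while a umask of 027 restricts them to the owner and group. For a replds@foo instance, systemd's UMask= setting (for example in a drop-in file) is one way to apply this.

```go
package main

import (
	"fmt"
	"os"
)

// effectiveMode shows how the process umask filters the permission bits
// requested at file creation: bits set in the umask are cleared.
func effectiveMode(requested, umask os.FileMode) os.FileMode {
	return requested &^ umask
}

func main() {
	// Assumes new files are requested with mode 0666.
	fmt.Println(effectiveMode(0o666, 0o022)) // -rw-r--r--
	fmt.Println(effectiveMode(0o666, 0o027)) // -rw-r-----
}
```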

Documentation

Types

type FS

type FS struct {
	// contains filtered or unexported fields
}

FS implements the 'storage' interface on the local file system.

type HTTPServer

type HTTPServer struct {
	*Server
}

HTTPServer wraps a Server with an HTTP interface.

func NewHTTPServer

func NewHTTPServer(s *Server) *HTTPServer

NewHTTPServer creates a new HTTPServer.

func (*HTTPServer) Handler

func (s *HTTPServer) Handler() http.Handler

Handler returns the http.Handler for this server.

type Node

type Node struct {
	Path  string `json:"path"`
	Value []byte `json:"value"`

	Timestamp time.Time `json:"timestamp"`

	Deleted bool `json:"deleted,omitempty"`
}

Node is an annotated path/value entry.

func (*Node) Copy

func (n *Node) Copy() *Node

type PublicClient

type PublicClient interface {
	SetNodes(context.Context, *SetNodesRequest) (*SetNodesResponse, error)
}

PublicClient for the public HTTP API.

func NewPublicClient

func NewPublicClient(config *clientutil.BackendConfig) (PublicClient, error)

NewPublicClient returns a new client for the public HTTP API.

type Server

type Server struct {
	// contains filtered or unexported fields
}

Server for the replicated filesync.

func NewServer

func NewServer(peers []string, dir string, tlsConfig *clientutil.TLSClientConfig, readonly bool) (*Server, error)

NewServer creates a new Server with the given peers and backends.

func (*Server) Close

func (s *Server) Close()

Close closes the server and all its associated resources, waiting for the poll goroutines to terminate.

type SetNodesRequest

type SetNodesRequest struct {
	Nodes []*Node `json:"nodes"`
}

SetNodesRequest is the request type for the SetNodes method.

type SetNodesResponse

type SetNodesResponse struct {
	HostsOk  int `json:"hosts_ok"`
	HostsErr int `json:"hosts_err"`
}

SetNodesResponse is the response returned by the SetNodes method.

Directories

cmd
