discovery

package
Version: v2.1.1+incompatible Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 22, 2017 License: BSD-3-Clause Imports: 22 Imported by: 0

Documentation

Overview

Package discovery provides a way to discover all tablets e.g. within a specific shard and monitor their current health.

Use the HealthCheck object to query for tablets and their health.

For an example how to use the HealthCheck object, see worker/topo_utils.go.

Tablets have to be manually added to the HealthCheck using AddTablet(). Alternatively, use a Watcher implementation which will constantly watch a source (e.g. the topology) and add and remove tablets as they are added or removed from the source. For a Watcher example have a look at NewShardReplicationWatcher().

Each HealthCheck has a HealthCheckStatsListener that will receive notification of when tablets go up and down. TabletStatsCache is one implementation, that caches the known tablets and the healthy ones per keyspace/shard/tabletType.

Internally, the HealthCheck module is connected to each tablet and has a streaming RPC (StreamHealth) open to receive periodic health infos.

Index

Constants

View Source
const (
	DefaultHealthCheckConnTimeout = 1 * time.Minute
	DefaultHealthCheckRetryDelay  = 5 * time.Second
	DefaultHealthCheckTimeout     = 1 * time.Minute
)

See the documentation for NewHealthCheck below for an explanation of these parameters.

View Source
const (
	// DefaultTopoReadConcurrency can be used as default value for the topoReadConcurrency parameter of a TopologyWatcher.
	DefaultTopoReadConcurrency int = 5
	// DefaultTopologyWatcherRefreshInterval can be used as the default value for
	// the refresh interval of a topology watcher.
	DefaultTopologyWatcherRefreshInterval = 1 * time.Minute

	// HealthCheckTemplate is the HTML code to display a TabletsCacheStatusList
	HealthCheckTemplate = `` /* 649-byte string literal not displayed */

)

Variables

This section is empty.

Functions

func IsReplicationLagHigh

func IsReplicationLagHigh(tabletStats *TabletStats) bool

IsReplicationLagHigh verifies that the given TabletStats refers to a tablet with high replication lag, i.e. higher than the configured discovery_low_replication_lag flag.

func TabletToMapKey

func TabletToMapKey(tablet *topodatapb.Tablet) string

TabletToMapKey creates a key to the map from tablet's host and ports. It should only be used in discovery and related module.

func TrivialStatsUpdate

func TrivialStatsUpdate(o, n *TabletStats) bool

TrivialStatsUpdate returns true iff the old and new TabletStats haven't changed enough to warrant re-calling FilterByReplicationLag.

Types

type FakeHealthCheck

type FakeHealthCheck struct {
	// contains filtered or unexported fields
}

FakeHealthCheck implements discovery.HealthCheck.

func NewFakeHealthCheck

func NewFakeHealthCheck() *FakeHealthCheck

NewFakeHealthCheck returns the fake healthcheck object.

func (*FakeHealthCheck) AddTablet

func (fhc *FakeHealthCheck) AddTablet(tablet *topodatapb.Tablet, name string)

AddTablet adds the tablet and calls the listener.

func (*FakeHealthCheck) AddTestTablet

func (fhc *FakeHealthCheck) AddTestTablet(cell, host string, port int32, keyspace, shard string, tabletType topodatapb.TabletType, serving bool, reparentTS int64, err error) *sandboxconn.SandboxConn

AddTestTablet inserts a fake entry into FakeHealthCheck. The Tablet can be talked to using the provided connection. The Listener is called, as if AddTablet had been called.

func (*FakeHealthCheck) CacheStatus

func (fhc *FakeHealthCheck) CacheStatus() TabletsCacheStatusList

CacheStatus is not implemented.

func (*FakeHealthCheck) Close

func (fhc *FakeHealthCheck) Close() error

Close is not implemented.

func (*FakeHealthCheck) GetAllTablets

func (fhc *FakeHealthCheck) GetAllTablets() map[string]*topodatapb.Tablet

GetAllTablets returns all the tablets we have.

func (*FakeHealthCheck) GetConnection

func (fhc *FakeHealthCheck) GetConnection(key string) queryservice.QueryService

GetConnection returns the TabletConn of the given tablet.

func (*FakeHealthCheck) RegisterStats

func (fhc *FakeHealthCheck) RegisterStats()

RegisterStats is not implemented.

func (*FakeHealthCheck) RemoveTablet

func (fhc *FakeHealthCheck) RemoveTablet(tablet *topodatapb.Tablet)

RemoveTablet removes the tablet.

func (*FakeHealthCheck) Reset

func (fhc *FakeHealthCheck) Reset()

Reset cleans up the internal state.

func (*FakeHealthCheck) SetListener

func (fhc *FakeHealthCheck) SetListener(listener HealthCheckStatsListener, sendDownEvents bool)

SetListener is not implemented.

func (*FakeHealthCheck) WaitForInitialStatsUpdates

func (fhc *FakeHealthCheck) WaitForInitialStatsUpdates()

WaitForInitialStatsUpdates is not implemented.

type FilterByShard

type FilterByShard struct {
	// contains filtered or unexported fields
}

FilterByShard is a TabletRecorder filter that filters tablets by keyspace/shard.

func NewFilterByShard

func NewFilterByShard(tr TabletRecorder, filters []string) (*FilterByShard, error)

NewFilterByShard creates a new FilterByShard on top of an existing TabletRecorder. Each filter is a keyspace|shard entry, where shard can either be a shard name, or a keyrange. All tablets that match at least one keyspace|shard tuple will be forwarded to the underlying TabletRecorder.

func (*FilterByShard) AddTablet

func (fbs *FilterByShard) AddTablet(tablet *topodatapb.Tablet, name string)

AddTablet is part of the TabletRecorder interface.

func (*FilterByShard) RemoveTablet

func (fbs *FilterByShard) RemoveTablet(tablet *topodatapb.Tablet)

RemoveTablet is part of the TabletRecorder interface.

type HealthCheck

type HealthCheck interface {
	// TabletRecorder interface adds AddTablet and RemoveTablet methods.
	// AddTablet adds the tablet, and starts health check on it.
	// RemoveTablet removes the tablet, and stops its StreamHealth RPC.
	TabletRecorder

	// RegisterStats registers the connection counts stats.
	// It can only be called on one Healthcheck object per process.
	RegisterStats()
	// SetListener sets the listener for healthcheck
	// updates. sendDownEvents is used when a tablet changes type
	// (from replica to master for instance). If the listener
	// wants two events (Up=false on old type, Up=True on new
	// type), sendDownEvents should be set. Otherwise, the
	// healthcheck will only send one event (Up=true on new type).
	//
	// Note that the default implementation requires to set the
	// listener before any tablets are added to the healthcheck.
	SetListener(listener HealthCheckStatsListener, sendDownEvents bool)
	// WaitForInitialStatsUpdates waits until all tablets added via
	// AddTablet() call were propagated to the listener via corresponding
	// StatsUpdate() calls. Note that code path from AddTablet() to
	// corresponding StatsUpdate() is asynchronous but not cancelable, thus
	// this function is also non-cancelable and can't return error. Also
	// note that all AddTablet() calls should happen before calling this
	// method. WaitForInitialStatsUpdates won't wait for StatsUpdate() calls
	// corresponding to AddTablet() calls made during its execution.
	WaitForInitialStatsUpdates()
	// GetConnection returns the TabletConn of the given tablet.
	GetConnection(key string) queryservice.QueryService
	// CacheStatus returns a displayable version of the cache.
	CacheStatus() TabletsCacheStatusList
	// Close stops the healthcheck.
	Close() error
}

HealthCheck defines the interface of health checking module. The goal of this object is to maintain a StreamHealth RPC to a lot of tablets. Tablets are added / removed by calling the AddTablet / RemoveTablet methods (other discovery module objects can for instance watch the topology and call these).

Updates to the health of all registered tablet can be watched by registering a listener. To get the underlying "TabletConn" object which is used for each tablet, use the "GetConnection()" method below and pass in the Key string which is also sent to the listener in each update (as it is part of TabletStats).

func NewDefaultHealthCheck

func NewDefaultHealthCheck() HealthCheck

NewDefaultHealthCheck creates a new HealthCheck object with a default configuration.

func NewHealthCheck

func NewHealthCheck(connTimeout, retryDelay, healthCheckTimeout time.Duration) HealthCheck

NewHealthCheck creates a new HealthCheck object. Parameters: connTimeout.

The duration to wait until a health-check streaming connection is up.
0 means it should establish the connection in the background and return immediately.

retryDelay.

The duration to wait before retrying to connect (e.g. after a failed connection
attempt).

healthCheckTimeout.

The duration for which we consider a health check response to be 'fresh'. If we don't get
a health check response from a tablet for more than this duration, we consider the tablet
not healthy.

type HealthCheckImpl

type HealthCheckImpl struct {
	// contains filtered or unexported fields
}

HealthCheckImpl performs health checking and notifies downstream components about any changes.

func (*HealthCheckImpl) AddTablet

func (hc *HealthCheckImpl) AddTablet(tablet *topodatapb.Tablet, name string)

AddTablet adds the tablet, and starts health check. It does not block on making connection. name is an optional tag for the tablet, e.g. an alternative address.

func (*HealthCheckImpl) CacheStatus

func (hc *HealthCheckImpl) CacheStatus() TabletsCacheStatusList

CacheStatus returns a displayable version of the cache.

func (*HealthCheckImpl) Close

func (hc *HealthCheckImpl) Close() error

Close stops the healthcheck. After Close() returned, it's guaranteed that the listener isn't currently executing and won't be called again.

func (*HealthCheckImpl) GetConnection

func (hc *HealthCheckImpl) GetConnection(key string) queryservice.QueryService

GetConnection returns the TabletConn of the given tablet.

func (*HealthCheckImpl) RegisterStats

func (hc *HealthCheckImpl) RegisterStats()

RegisterStats registers the connection counts stats

func (*HealthCheckImpl) RemoveTablet

func (hc *HealthCheckImpl) RemoveTablet(tablet *topodatapb.Tablet)

RemoveTablet removes the tablet, and stops the health check. It does not block.

func (*HealthCheckImpl) SetListener

func (hc *HealthCheckImpl) SetListener(listener HealthCheckStatsListener, sendDownEvents bool)

SetListener sets the listener for healthcheck updates. It must be called after NewHealthCheck and before any tablets are added (either through AddTablet or through a Watcher).

func (*HealthCheckImpl) WaitForInitialStatsUpdates

func (hc *HealthCheckImpl) WaitForInitialStatsUpdates()

WaitForInitialStatsUpdates waits until all tablets added via AddTablet() call were propagated to downstream via corresponding StatsUpdate() calls.

type HealthCheckStatsListener

type HealthCheckStatsListener interface {
	// StatsUpdate is called when:
	// - a new tablet is known to the HealthCheck, and its first
	//   streaming healthcheck is returned. (then ts.Up is true).
	// - a tablet is removed from the list of tablets we watch
	//   (then ts.Up is false).
	// - a tablet dynamically changes its type. When registering the
	//   listener, if sendDownEvents is true, two events are generated
	//   (ts.Up false on the old type, ts.Up true on the new type).
	//   If it is false, only one event is sent (ts.Up true on the new
	//   type).
	StatsUpdate(*TabletStats)
}

HealthCheckStatsListener is the listener to receive health check stats update.

type TabletRecorder

type TabletRecorder interface {
	// AddTablet adds the tablet.
	// Name is an alternate name, like an address.
	AddTablet(tablet *topodatapb.Tablet, name string)

	// RemoveTablet removes the tablet.
	RemoveTablet(tablet *topodatapb.Tablet)
}

TabletRecorder is the part of the HealthCheck interface that can add or remove tablets. We define it as a sub-interface here so we can add filters on tablets if needed.

type TabletStats

type TabletStats struct {
	// Key uniquely identifies that serving tablet. It is computed
	// from the Tablet's record Hostname and PortMap. If a tablet
	// is restarted on different ports, its Key will be different.
	// Key is computed using the TabletToMapKey method below.
	// key can be used in GetConnection().
	Key string
	// Tablet is the tablet object that was sent to HealthCheck.AddTablet.
	Tablet *topodatapb.Tablet
	// Name is an optional tag (e.g. alternative address) for the
	// tablet.  It is supposed to represent the tablet as a task,
	// not as a process.  For instance, it can be a
	// cell+keyspace+shard+tabletType+taskIndex value.
	Name string
	// Target is the current target as returned by the streaming
	// StreamHealth RPC.
	Target *querypb.Target
	// Up describes whether the tablet is added or removed.
	Up bool
	// Serving describes if the tablet can be serving traffic.
	Serving bool
	// TabletExternallyReparentedTimestamp is the last timestamp
	// that this tablet was either elected the master, or received
	// a TabletExternallyReparented event. It is set to 0 if the
	// tablet doesn't think it's a master.
	TabletExternallyReparentedTimestamp int64
	// Stats is the current health status, as received by the
	// StreamHealth RPC (replication lag, ...).
	Stats *querypb.RealtimeStats
	// LastError is the error we last saw when trying to get the
	// tablet's healthcheck.
	LastError error
}

TabletStats is returned when getting the set of tablets.

func FilterByReplicationLag

func FilterByReplicationLag(tabletStatsList []*TabletStats) []*TabletStats

FilterByReplicationLag filters the list of TabletStats by TabletStats.Stats.SecondsBehindMaster. The algorithm (TabletStats that is non-serving or has error is ignored): - Return the list if there is 0 or 1 tablet. - Return the list if all tablets have <=30s lag. - Filter by replication lag: for each tablet, if the mean value without it is more than 0.7 of the mean value across all tablets, it is valid. - Make sure we return at least two tablets (if there is one with <2h replication lag). - If one tablet is removed, run above steps again in case there are two tablets with high replication lag. (It should cover most cases.) For example, lags of (5s, 10s, 15s, 120s) return the first three; lags of (30m, 35m, 40m, 45m) return all.

func RemoveUnhealthyTablets

func RemoveUnhealthyTablets(tabletStatsList []TabletStats) []TabletStats

RemoveUnhealthyTablets filters all unhealthy tablets out. NOTE: Non-serving tablets are considered healthy.

func (*TabletStats) DeepEqual

func (e *TabletStats) DeepEqual(f *TabletStats) bool

DeepEqual compares two TabletStats. Since we include protos, we need to use proto.Equal on these.

func (*TabletStats) String

func (e *TabletStats) String() string

String is defined because we want to print a []*TabletStats array nicely.

type TabletStatsCache

type TabletStatsCache struct {
	// contains filtered or unexported fields
}

TabletStatsCache is a HealthCheckStatsListener that keeps both the current list of available TabletStats, and a serving list: - for master tablets, only the current master is kept. - for non-master tablets, we filter the list using FilterByReplicationLag. It keeps entries for all tablets in the cell it's configured to serve for, and for the master independently of which cell it's in. Note the healthy tablet computation is done when we receive a tablet update only, not at serving time. Also note the cache may not have the last entry received by the tablet. For instance, if a tablet was healthy, and is still healthy, we do not keep its new update.

func NewTabletStatsCache

func NewTabletStatsCache(hc HealthCheck, cell string) *TabletStatsCache

NewTabletStatsCache creates a TabletStatsCache, and registers it as HealthCheckStatsListener of the provided healthcheck. Note we do the registration in this code to guarantee we call SetListener with sendDownEvents=true, as we need these events to maintain the integrity of our cache.

func NewTabletStatsCacheDoNotSetListener

func NewTabletStatsCacheDoNotSetListener(cell string) *TabletStatsCache

NewTabletStatsCacheDoNotSetListener is identical to NewTabletStatsCache but does not automatically set the returned object as listener for "hc". Instead, it's up to the caller to ensure that TabletStatsCache.StatsUpdate() gets called properly. This is useful for chaining multiple listeners. When the caller sets its own listener on "hc", they must make sure that they set the parameter "sendDownEvents" to "true" or this cache won't properly remove tablets whose tablet type changes.

func (*TabletStatsCache) GetHealthyTabletStats

func (tc *TabletStatsCache) GetHealthyTabletStats(keyspace, shard string, tabletType topodatapb.TabletType) []TabletStats

GetHealthyTabletStats returns only the healthy targets. The returned array is owned by the caller. For TabletType_MASTER, this will only return at most one entry, the most recent tablet of type master.

func (*TabletStatsCache) GetTabletStats

func (tc *TabletStatsCache) GetTabletStats(keyspace, shard string, tabletType topodatapb.TabletType) []TabletStats

GetTabletStats returns the full list of available targets. The returned array is owned by the caller.

func (*TabletStatsCache) ResetForTesting

func (tc *TabletStatsCache) ResetForTesting()

ResetForTesting is for use in tests only.

func (*TabletStatsCache) StatsUpdate

func (tc *TabletStatsCache) StatsUpdate(ts *TabletStats)

StatsUpdate is part of the HealthCheckStatsListener interface.

func (*TabletStatsCache) WaitForAllServingTablets

func (tc *TabletStatsCache) WaitForAllServingTablets(ctx context.Context, ts topo.SrvTopoServer, cell string, types []topodatapb.TabletType) error

WaitForAllServingTablets waits for at least one healthy serving tablet in the given cell for all keyspaces / shards before returning. It will return ctx.Err() if the context is canceled.

func (*TabletStatsCache) WaitForTablets

func (tc *TabletStatsCache) WaitForTablets(ctx context.Context, cell, keyspace, shard string, types []topodatapb.TabletType) error

WaitForTablets waits for at least one tablet in the given cell / keyspace / shard before returning. The tablets do not have to be healthy. It will return ctx.Err() if the context is canceled.

type TabletStatsList

type TabletStatsList []*TabletStats

TabletStatsList is used for sorting.

func (TabletStatsList) Len

func (tsl TabletStatsList) Len() int

Len is part of sort.Interface.

func (TabletStatsList) Less

func (tsl TabletStatsList) Less(i, j int) bool

Less is part of sort.Interface

func (TabletStatsList) Swap

func (tsl TabletStatsList) Swap(i, j int)

Swap is part of sort.Interface

type TabletsCacheStatus

type TabletsCacheStatus struct {
	Cell         string
	Target       *querypb.Target
	TabletsStats TabletStatsList
}

TabletsCacheStatus is the current tablets for a cell/target.

func (*TabletsCacheStatus) StatusAsHTML

func (tcs *TabletsCacheStatus) StatusAsHTML() template.HTML

StatusAsHTML returns an HTML version of the status.

type TabletsCacheStatusList

type TabletsCacheStatusList []*TabletsCacheStatus

TabletsCacheStatusList is used for sorting.

func (TabletsCacheStatusList) Len

func (tcsl TabletsCacheStatusList) Len() int

Len is part of sort.Interface.

func (TabletsCacheStatusList) Less

func (tcsl TabletsCacheStatusList) Less(i, j int) bool

Less is part of sort.Interface

func (TabletsCacheStatusList) Swap

func (tcsl TabletsCacheStatusList) Swap(i, j int)

Swap is part of sort.Interface

type TopologyWatcher

type TopologyWatcher struct {
	// contains filtered or unexported fields
}

TopologyWatcher polls tablet from a configurable set of tablets periodically. When tablets are added / removed, it calls the TabletRecorder AddTablet / RemoveTablet interface appropriately.

func NewCellTabletsWatcher

func NewCellTabletsWatcher(topoServer topo.Server, tr TabletRecorder, cell string, refreshInterval time.Duration, topoReadConcurrency int) *TopologyWatcher

NewCellTabletsWatcher returns a TopologyWatcher that monitors all the tablets in a cell, and starts refreshing.

func NewShardReplicationWatcher

func NewShardReplicationWatcher(topoServer topo.Server, tr TabletRecorder, cell, keyspace, shard string, refreshInterval time.Duration, topoReadConcurrency int) *TopologyWatcher

NewShardReplicationWatcher returns a TopologyWatcher that monitors the tablets in a cell/keyspace/shard, and starts refreshing.

func NewTopologyWatcher

func NewTopologyWatcher(topoServer topo.Server, tr TabletRecorder, cell string, refreshInterval time.Duration, topoReadConcurrency int, getTablets func(tw *TopologyWatcher) ([]*topodatapb.TabletAlias, error)) *TopologyWatcher

NewTopologyWatcher returns a TopologyWatcher that monitors all the tablets in a cell, and starts refreshing.

func (*TopologyWatcher) Stop

func (tw *TopologyWatcher) Stop()

Stop stops the watcher. It does not clean up the tablets added to TabletRecorder.

func (*TopologyWatcher) WaitForInitialTopology

func (tw *TopologyWatcher) WaitForInitialTopology() error

WaitForInitialTopology waits until the watcher reads all of the topology data for the first time and transfers the information to TabletRecorder via its AddTablet() method.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL