reb

package
v1.4.6 Latest
Warning

This package is not in the latest version of its module.

Published: May 27, 2026 License: MIT Imports: 37 Imported by: 0

README

Global Rebalance - Execution Flow (sketch)

Global rebalance uses a single transport name (trname = "reb") for cross-target object streaming. Because the trname is fixed, at most one data mover (DM) may be registered against the transport at any moment. This document describes how that constraint is maintained across rebalance generations.

Cleanup mode is a separate rebalance generation mode; it reuses the *Reb lifecycle but does not open data streams (see the Cleanup mode section below). Unless explicitly stated otherwise, the lifecycle and invariants below describe regular, data-moving rebalance generations that use DM/transport streaming.

Lifecycle

The *Reb service is constructed once at target startup and reused across all rebalance generations.

The DM, however, is per-streaming generation: a new DM is constructed at the start of each Run() that needs streams, and torn down before Run() returns.

A generation that runs in cleanup mode does not open streams and therefore constructs no DM.

New()
    initialize Reb service state only; no DM

Run()
    preempt previous rebalance, if any
    compute haveStreams
    initRenew()

_renew()                            // under reb.mu
    if haveStreams:
        NewDM
        RegRecv      on that DM
        Open         that same DM

endStreams()                        // under reb.mu (called from fini)
    Close        that same DM
    UnregRecv    that same DM
    reb.dm = nil

fini()
    endStreams (above) before xreb.Finish(),
    so EndTime is the safe signal for the next generation

Invariants

  1. Single ownership. reb.dm != nil if and only if this Run is past a successful _renew and has not yet reached endStreams. No other code path sets or clears reb.dm.

  2. Atomic generation start. NewDM + RegRecv + Open runs under reb.mu inside _renew. No other generation can observe a half-constructed DM, and any failure in this sequence unregisters and zeros reb.dm before returning.

  3. Atomic generation end. Close + UnregRecv + (reb.dm = nil) runs under reb.mu inside endStreams, the sole DM teardown site.

  4. Preempt waits up to preemptRetries seconds for full cleanup. The next generation's preempt polls oxreb.EndTime().IsZero(), not IsDone(). EndTime becomes non-zero only after xreb.Finish(), which runs strictly after endStreams completes. Therefore, when preempt observes a non-zero EndTime, the previous generation's UnregRecv has already happened and the trname slot is free.
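
The first three invariants can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the `transport`, `dm`, and `reb` types and their lowercase methods are simplified stand-ins written for this example and are not the package's actual implementation. The sketch publishes `reb.dm` only after the full start sequence succeeds, which is one way to guarantee that a failed start leaves it nil.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const trname = "reb" // the single, fixed transport name

// transport is a hypothetical registry: at most one receiver per trname.
type transport struct {
	mu    sync.Mutex
	slots map[string]*dm
}

func (t *transport) regRecv(name string, d *dm) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.slots[name]; ok {
		return errors.New("transport name already registered: " + name)
	}
	t.slots[name] = d
	return nil
}

func (t *transport) unregRecv(name string) {
	t.mu.Lock()
	delete(t.slots, name)
	t.mu.Unlock()
}

// dm is a hypothetical data mover; failOpen simulates an Open error.
type dm struct{ failOpen, opened bool }

func (d *dm) open() error {
	if d.failOpen {
		return errors.New("open failed")
	}
	d.opened = true
	return nil
}

func (d *dm) close() { d.opened = false }

type reb struct {
	mu sync.Mutex
	tr *transport
	dm *dm
}

// renew runs NewDM + RegRecv + Open as one step under reb.mu.
// A failed Open unregisters the same DM; reb.dm stays nil on any failure.
func (r *reb) renew(haveStreams, failOpen bool) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !haveStreams {
		return nil // e.g. cleanup mode: no DM at all
	}
	d := &dm{failOpen: failOpen}
	if err := r.tr.regRecv(trname, d); err != nil {
		return err
	}
	if err := d.open(); err != nil {
		r.tr.unregRecv(trname)
		return err
	}
	r.dm = d
	return nil
}

// endStreams is the sole teardown site: Close + UnregRecv + (dm = nil).
func (r *reb) endStreams() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.dm == nil {
		return
	}
	r.dm.close()
	r.tr.unregRecv(trname)
	r.dm = nil
}

func main() {
	r := &reb{tr: &transport{slots: map[string]*dm{}}}
	fmt.Println("failed start:", r.renew(true, true), "dm set:", r.dm != nil)
	fmt.Println("clean start:", r.renew(true, false), "dm set:", r.dm != nil)
	r.endStreams()
	fmt.Println("next generation:", r.renew(true, false))
}
```

In this sketch a second registration under the same trname fails until endStreams has run, which is the property the fixed transport name relies on.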

Preempt timeout

The preemptRetries polling budget in _preempt() is currently hardcoded: 16 seconds.

Within this time the previous (already aborted) generation must fully exit, which entails:

  • abort propagation through joggers and (optional) nwp workers
  • in-flight transport, and
  • fini() quiesce

Under degraded disks or heavy load, abort propagation alone can approach this bound.

The timeout value is a compromise: long enough to cover typical cleanup, yet short enough that Smap flicker (nodes repeatedly leaving and rejoining) does not stack waiters.

Cleanup mode

Cleanup mode is a rebalance generation that reuses the *Reb lifecycle but does not open data streams and does not migrate object payloads.

The motivation is scalability. A regular data-moving rebalance may temporarily leave extra local copies while the cluster converges. Tracking every migrated object at runtime, only to remove the old copy later, would not scale for large clusters and buckets with millions or billions of objects.

Cleanup mode is therefore out-of-band. It performs a separate local walk, recomputes the expected HRW owner for each object, verifies the object at that expected location, and removes the local misplaced copy only when it is safe to do so.
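
The walk described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `hrwOwner` uses FNV-1a from the standard library as a stand-in hash, and the in-memory `cluster` map replaces both the local mountpath walk and the remote existence check.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hrwOwner picks the target with the highest hash weight for uname
// (highest random weight). FNV-1a is an assumption made for this sketch.
func hrwOwner(uname string, targets []string) string {
	var (
		best string
		top  uint64
	)
	for _, tid := range targets {
		h := fnv.New64a()
		h.Write([]byte(tid))
		h.Write([]byte{0}) // separator between target ID and object name
		h.Write([]byte(uname))
		if w := h.Sum64(); best == "" || w > top {
			best, top = tid, w
		}
	}
	return best
}

// cluster is a hypothetical model: target ID => set of object names.
type cluster map[string]map[string]bool

// cleanup removes self's copy of an object only when the expected HRW
// owner is another target and that owner is verified to have the object.
func cleanup(self string, targets []string, c cluster) []string {
	var removed []string
	for uname := range c[self] {
		owner := hrwOwner(uname, targets)
		if owner == self {
			continue // correctly placed
		}
		if !c[owner][uname] {
			continue // not verified at the owner: keep the local copy
		}
		delete(c[self], uname)
		removed = append(removed, uname)
	}
	sort.Strings(removed)
	return removed
}

func main() {
	targets := []string{"t1", "t2", "t3"}
	c := cluster{"t1": {}, "t2": {}, "t3": {}}
	for _, name := range []string{"a", "b", "c", "d"} {
		c["t1"][name] = true                   // leftover copy on t1
		c[hrwOwner(name, targets)][name] = true // copy at the HRW owner
	}
	fmt.Println("removed from t1:", cleanup("t1", targets, c))
}
```

The verification step is the safety condition: a misplaced copy that cannot be confirmed at its expected location is left in place.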

Documentation

Overview

Package reb provides global cluster-wide rebalance upon adding/removing storage nodes.

  • Copyright (c) 2018-2026, NVIDIA CORPORATION. All rights reserved.

Package ais provides core functionality for the AIStore object storage.

  • Copyright (c) 2018-2024, NVIDIA CORPORATION. All rights reserved.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsGFN added in v1.3.18

func IsGFN() bool

func OffTimedGFN added in v1.3.18

func OffTimedGFN(detail string)

func OnTimedGFN added in v1.3.18

func OnTimedGFN()

Types

type ExtArgs added in v1.3.26

type ExtArgs struct {
	Tstats cos.StatsUpdater
	Notif  *xact.NotifXact
	Bck    *meta.Bck // advanced usage, limited scope
	Prefix string    // ditto
	Oxid   string    // oldRMD g[version]
	NID    int64     // newRMD version
	Flags  uint32    // xact.ArgsMsg.Flags
}

type Reb

type Reb struct {
	// contains filtered or unexported fields
}

func New

func New(config *cmn.Config) *Reb

func (*Reb) AbortLocal

func (reb *Reb) AbortLocal(olderSmapV int64, err error)

(limited usage; compare with `abortAll` below)

func (*Reb) FilterAdd

func (reb *Reb) FilterAdd(uname []byte)

func (*Reb) RebStatus

func (reb *Reb) RebStatus(status *Status)

RebStatus is called via GET /v1/health (apc.Health). It is lock-free: all fields it reads are independently atomic. This path is used by peer health/status polling and must not contend with writers in fini/_renew/cleanup.
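
A lock-free status read of this kind can be sketched with independent atomics. The `rebState` and `status` types and the `start` and `snapshot` methods are hypothetical and exist only for this example; the actual fields and their types are not shown on this page.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rebState is a hypothetical model: writers serialize on mu, while
// every field a status reader needs is an independent atomic.
type rebState struct {
	mu      sync.Mutex // taken by lifecycle writers only
	id      atomic.Int64
	stage   atomic.Uint32
	running atomic.Bool
	aborted atomic.Bool
}

type status struct {
	RebID   int64
	Stage   uint32
	Running bool
	Aborted bool
}

// start models a writer: it updates the fields under mu.
func (r *rebState) start(id int64) {
	r.mu.Lock()
	r.id.Store(id)
	r.stage.Store(1)
	r.aborted.Store(false)
	r.running.Store(true)
	r.mu.Unlock()
}

// snapshot models the reader: it never touches mu, so it cannot
// block behind a writer that is holding the lock.
func (r *rebState) snapshot(s *status) {
	s.RebID = r.id.Load()
	s.Stage = r.stage.Load()
	s.Running = r.running.Load()
	s.Aborted = r.aborted.Load()
}

func main() {
	r := &rebState{}
	r.start(7)

	r.mu.Lock() // simulate a writer in the middle of a long transition
	var s status
	r.snapshot(&s) // returns immediately: the read path takes no lock
	r.mu.Unlock()

	fmt.Printf("%+v\n", s)
}
```

The trade-off in this design is that the fields are read one at a time, so a snapshot taken during a transition may mix values from before and after it; for health polling that is usually acceptable.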

func (*Reb) Run added in v1.3.31

func (reb *Reb) Run(smap *meta.Smap, extArgs *ExtArgs)

Run() is the main method: it is serialized to execute one at a time (while possibly _preempting_ a currently running rebalance) and goes through controlled enumerated stages.

Prior to starting to run over this target's buckets (of user data), there's a certain startup phase that also entails constructing and opening a new data mover (DM), whereby (NewDM + RegRecv + Open) is an atomic generation-start transition.

No other rebalance generation may observe or replace the DM between its registration and open; any failure in that sequence unregisters the same DM instance and zeros it before returning.

A note on stage management:

  1. Non-EC and EC rebalances run in parallel
  2. Execution starts after the `Reb` sets the current stage to rebStageTraverse
  3. Only EC rebalance changes the current stage
  4. Global rebalance performs checks such as `stage > rebStageTraverse` or `stage < rebStagePostTraverse`. Since all EC stages are between `Traverse` and `PostTraverse`, non-EC rebalance does not "notice" stage changes.

See also: README.md in this package.

func (*Reb) RunCleanup added in v1.4.5

func (reb *Reb) RunCleanup(smap *meta.Smap, extArgs *ExtArgs, force bool)

RunCleanup walks mountpaths and removes local copies of objects whose HRW target already has them. Piggy-backs on the rebalance lifecycle (xreg slot, markers, smap snapshot, abort plumbing) but is not a migration: no DM, no streams, no GFN, no cross-target post-traverse synchronization.

Cleanup mode is the post-#288 / post-lomAcks-removal recovery tool: an operator-driven pass that reclaims source-side leftovers from a prior rebalance using HeadObjT2T as the per-LOM safety check.

See also: 'ais space-cleanup' (recommended for routine use).

type Status

type Status struct {
	Targets     meta.Nodes `json:"targets"`             // targets I'm waiting for ACKs from
	SmapVersion int64      `json:"smap_version,string"` // current Smap version (via smapOwner)
	RebVersion  int64      `json:"reb_version,string"`  // Smap version of *this* rebalancing op
	RebID       int64      `json:"reb_id,string"`       // rebalance ID
	Stats       core.Stats `json:"stats"`               // transmitted/received totals
	Stage       uint32     `json:"stage"`               // the current stage - see enum above
	Aborted     bool       `json:"aborted"`             // aborted?
	Running     bool       `json:"running"`             // running?
}
