anser

v0.0.0-...-dfd9f30

This package is not in the latest version of its module.

Published: Jan 15, 2020 License: Apache-2.0 Imports: 23 Imported by: 0

README

=======================================
``anser`` -- Database Migration Toolkit
=======================================

Summary
-------

Anser is a toolkit for managing evolving data sets for
applications. It focuses on online data transformations and on
providing higher-level tools to support data modeling and access.

For the Evergreen project, anser lets us treat routine data
migrations, data backfills, and retroactive schema changes to legacy
data as part of application code rather than as one-off shell
scripts.

Overview
--------

In general, anser migrations take a two-phase approach. First, a
generator runs with some configuration and an input query to collect
input documents and create migration jobs. Then, the jobs produced by
these generators are executed in parallel.

You can define generators either directly in your own code, *or*
with the configuration-file-based approach for more flexibility.

Concepts
~~~~~~~~

There are three major types of migrations: 

- ``simple``: these migrations perform their transformations using
  MongoDB's update syntax. Use them for very basic migrations,
  particularly when you want to throttle the rate of migrations and
  avoid large, difficult-to-index multi-updates.

- ``manual``: these migrations call a user-defined function on a
  ``bson.RawDoc`` representation of the document to migrate. Use them
  for more complex transformations or for migrations that you want to
  write in application code.

- ``stream``: these migrations are similar to manual migrations;
  however, they pass a database session *and* an iterator over all
  documents affected by the migration. These jobs offer the most
  flexibility.
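
The following is a rough, in-memory illustration of how the three styles differ in what they hand to your code. ``Doc``, ``applySet``, ``ManualOp``, and ``StreamOp`` are hypothetical. Real anser migrations work against MongoDB and BSON. Here, a slice of documents stands in for the session and iterator that stream migrations receive.

```go
package main

import "fmt"

// Doc is a hypothetical in-memory document.
type Doc map[string]interface{}

// applySet mimics a "simple" migration driven by a {"$set": {...}}
// update document: it only assigns fields and runs no custom logic.
func applySet(d Doc, set map[string]interface{}) {
	for k, v := range set {
		d[k] = v
	}
}

// ManualOp mimics a "manual" migration: a user-defined function
// applied to one document at a time.
type ManualOp func(Doc) error

// StreamOp mimics a "stream" migration: the function sees every
// affected document and controls the iteration itself.
type StreamOp func(docs []Doc) error

// renameUserToOwner is an example manual operation.
var renameUserToOwner ManualOp = func(d Doc) error {
	if v, ok := d["user"]; ok {
		d["owner"] = v
		delete(d, "user")
	}
	return nil
}

// markAllMigrated is an example stream operation.
var markAllMigrated StreamOp = func(docs []Doc) error {
	for _, d := range docs {
		d["migrated"] = true
	}
	return nil
}

func main() {
	docs := []Doc{{"user": "alice"}, {"user": "bob"}}

	applySet(docs[0], map[string]interface{}{"version": 2}) // simple
	for _, d := range docs {
		if err := renameUserToOwner(d); err != nil { // manual
			fmt.Println("error:", err)
		}
	}
	if err := markAllMigrated(docs); err != nil { // stream
		fmt.Println("error:", err)
	}
	fmt.Println(docs)
}
```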
  
Internally, these jobs execute on amboy infrastructure, which makes it
possible to express dependencies between migrations. Additionally, the
`MovingAverageRateLimitedWorkers
<https://godoc.org/github.com/mongodb/amboy/pool#NewMovingAverageRateLimitedWorkers>`_
and `SimpleRateLimitedWorkers
<https://godoc.org/github.com/mongodb/amboy/pool#NewSimpleRateLimitedWorkers>`_
pools were developed to support anser migrations, as was the `adaptive
ordering local queue
<https://godoc.org/github.com/mongodb/amboy/queue#NewAdaptiveOrderedLocalQueue>`_,
which respects dependency-driven ordering.

Considerations
~~~~~~~~~~~~~~

While it's possible to do any kind of migration with anser, we have
found it useful to keep the following properties in mind when
building migrations:

- Make your migration implementations idempotent, so that running
  them multiple times has the same effect as running them once.

- Ensure that generator queries are supported by indexes; otherwise,
  the generator processes will force collection scans.

- Rate limiting, which you configure through the underlying amboy
  infrastructure, limits the number of migration (or generator) jobs
  executed rather than limiting jobs based on their impact.

- Use batch limits. Generators have limits to control the number of
  jobs that they will produce. This is particularly useful for tests,
  but may have adverse effects on job dependencies, particularly if
  logical migrations are split across more than one generator
  function.
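
The following hypothetical pair of in-memory migrations makes the idempotency advice concrete. ``incrementCount`` and ``copyNameOnce`` are illustrative names, not anser APIs. The first changes the document every time it runs. The second records a schema version and skips documents it has already migrated, so a retried or duplicated job is harmless.

```go
package main

import "fmt"

// Doc is a hypothetical in-memory document.
type Doc map[string]interface{}

// incrementCount is NOT idempotent: every run changes the result.
func incrementCount(d Doc) {
	n, _ := d["count"].(int)
	d["count"] = n + 1
}

// copyNameOnce is idempotent: the schema marker makes reruns no-ops.
func copyNameOnce(d Doc) {
	if d["schema"] == 2 {
		return // already migrated
	}
	if name, ok := d["name"].(string); ok {
		d["display_name"] = name
	}
	d["schema"] = 2
}

func main() {
	a := Doc{"count": 0}
	incrementCount(a)
	incrementCount(a)
	fmt.Println(a["count"]) // 2: the rerun changed the data

	b := Doc{"name": "evergreen"}
	copyNameOnce(b)
	copyNameOnce(b)
	fmt.Println(b["display_name"], b["schema"]) // evergreen 2
}
```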

Installation
------------

Anser uses `grip <https://github.com/mongodb/grip>`_ for logging and
`amboy <https://github.com/mongodb/amboy>`_ for task
management. Because anser does not vendor these dependencies, you
should vendor them in your own project.

Resources
---------

Please consult the godoc for most usage. Most of the API is in the `top
level package <https://godoc.org/github.com/mongodb/anser>`_; however,
please also consider the `model
<https://godoc.org/github.com/mongodb/anser/model>`_
and `bsonutil <https://godoc.org/github.com/mongodb/anser/bsonutil>`_ packages.

Additionally, you can use the interfaces in the `db
<https://godoc.org/github.com/mongodb/anser/db>`_ package, a wrapper
around `mgo <https://godoc.org/github.com/mongodb/anser>`_, to access
MongoDB. This allows you to use `mocks
<https://godoc.org/github.com/mongodb/anser/mocks>`_ as needed for
testing without depending on a running database instance.
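
Depending on interfaces rather than a concrete driver is what makes mocks possible. The sketch below uses a hypothetical one-method ``Session`` interface and ``MockSession`` type. Anser's ``db`` and ``mock`` packages define their own, much larger interfaces, so treat the names here as illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// Session is a hypothetical, minimal database interface. Application
// code depends on it instead of on a concrete MongoDB driver.
type Session interface {
	Insert(collection string, doc interface{}) error
}

// MockSession records inserts in memory instead of talking to MongoDB.
type MockSession struct {
	Inserted map[string][]interface{}
	Fail     bool // when true, every insert returns an error
}

// Insert stores doc under collection, creating the map lazily.
func (m *MockSession) Insert(collection string, doc interface{}) error {
	if m.Fail {
		return errors.New("mock insert failure")
	}
	if m.Inserted == nil {
		m.Inserted = map[string][]interface{}{}
	}
	m.Inserted[collection] = append(m.Inserted[collection], doc)
	return nil
}

// recordMigration is application code under test.
func recordMigration(s Session, id string) error {
	return s.Insert("migrations", map[string]string{"_id": id})
}

func main() {
	s := &MockSession{}
	if err := recordMigration(s, "backfill-1"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(len(s.Inserted["migrations"])) // 1
}
```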

Project
-------

Please file feature requests and bug reports in the `MAKE project
<https://jira.mongodb.com/browse/MAKE>`_ of the MongoDB Jira
instance. This is also the place to file related amboy and grip
requests.

Future anser development will focus on supporting additional migration
workflows, supporting additional MongoDB and BSON utilities, and
providing tools to support easier data life-cycle management.

Documentation

Overview

Package anser provides a document transformation and processing tool to support data migrations.

Application

The anser.Application is the primary interface through which migrations are defined and executed. Applications are constructed with a list of Generators and relevant options. Then the Setup method configures the application with an anser.Environment, which sets up and collects dependency information. Finally, the Run method executes the migrations in two phases: first by generating migration jobs, and then by running all migration jobs.

The ordering of migrations is derived from the dependency information between generators and the jobs that they generate. When possible, jobs are executed in parallel, but how migration operations are executed is a property of the queue object configured in the anser.Environment.

Dependency Manager

The anser package provides a custom amboy/dependency.Manager object, which allows migrations to express dependencies on other migrations. Its State() method ensures that all migration IDs specified as edges are satisfied before it reports the migration as "ready" for work.
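
Here is a minimal sketch of the readiness rule described above. The State type, its two values, and migrationDependency are hypothetical simplifications. amboy's dependency package defines its own set of states, and the real manager consults recorded migration state rather than an in-memory set.

```go
package main

import "fmt"

// State is a hypothetical two-value readiness state.
type State int

const (
	Blocked State = iota
	Ready
)

func (s State) String() string {
	if s == Ready {
		return "ready"
	}
	return "blocked"
}

// migrationDependency holds the IDs of the migrations it depends on
// (its edges).
type migrationDependency struct {
	edges []string
}

// State reports Ready only when every edge is in the completed set.
func (d migrationDependency) State(completed map[string]bool) State {
	for _, id := range d.edges {
		if !completed[id] {
			return Blocked
		}
	}
	return Ready
}

func main() {
	dep := migrationDependency{edges: []string{"m1", "m2"}}
	fmt.Println(dep.State(map[string]bool{"m1": true}))             // blocked
	fmt.Println(dep.State(map[string]bool{"m1": true, "m2": true})) // ready
}
```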

Migration Execution Environment

Anser provides the Environment interface, with a global instance accessible via the exported GetEnvironment() function, to provide access to runtime configuration state: database connections, amboy.Queue objects, and registries for task implementations.

The Environment is an interface: you can build a mock, or use one provided for testing purposes by anser (coming soon).

Generator

Generators create migration operations and are the first step in an anser migration. They are a superset of the amboy.Job interface.

The current limitation is that the generated jobs must be stored within the implementation of the generator job. This means they must either all fit in memory *or* be serializable independently (e.g. fit within the 16 MB document limit if using a MongoDB-backed queue).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ResetEnvironment

func ResetEnvironment()

ResetEnvironment resets the global environment object. Use this only in testing (and only when you must). It is not safe for concurrent use.
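
The global-instance pattern behind GetEnvironment and ResetEnvironment can be sketched as follows. environment, getEnvironment, and resetEnvironment are hypothetical lowercase stand-ins, not anser's implementation.

```go
package main

import "fmt"

// environment is a hypothetical stand-in for an Environment
// implementation; it only tracks registered operation names.
type environment struct {
	operations map[string]bool
}

func newEnvironment() *environment {
	return &environment{operations: map[string]bool{}}
}

// globalEnv is the process-wide instance.
var globalEnv = newEnvironment()

// getEnvironment hands out a pointer to the shared instance, so all
// callers see the same state.
func getEnvironment() *environment { return globalEnv }

// resetEnvironment swaps in a fresh instance for tests. Like the
// documented ResetEnvironment, it is not safe for concurrent use.
func resetEnvironment() { globalEnv = newEnvironment() }

func main() {
	getEnvironment().operations["rename-field"] = true
	fmt.Println(getEnvironment().operations["rename-field"]) // true
	resetEnvironment()
	fmt.Println(getEnvironment().operations["rename-field"]) // false
}
```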

Types

type Application

type Application struct {
	Generators []Generator
	Options    model.ApplicationOptions
	// contains filtered or unexported fields
}

Application defines the root level of a database migration. Construct a migration application, pass an anser.Environment object to the Setup method to initialize the application, and then call Run to execute it.

Anser migrations run in two phases: a generation phase, which runs the jobs defined in the Generators field, and an execution phase, which runs all of the generated migration operations.

The ordering of migrations is determined by the dependencies: there are dependencies between generator functions, and if a generator function has dependencies, then the migrations it produces will depend on all migrations produced by the generator's dependencies.
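
The rule above can be expressed as a small graph computation. migrationEdges is a hypothetical helper written only to illustrate the rule: a migration inherits a dependency on every migration produced by its generator's dependencies. It is not how anser stores or resolves edges internally.

```go
package main

import (
	"fmt"
	"sort"
)

// migrationEdges takes generator-level dependencies (genDeps) and the
// migrations each generator produced, and returns, for each migration,
// the sorted list of migrations it must wait for.
func migrationEdges(genDeps map[string][]string, produced map[string][]string) map[string][]string {
	edges := map[string][]string{}
	for gen, migrations := range produced {
		var upstream []string
		for _, dep := range genDeps[gen] {
			upstream = append(upstream, produced[dep]...)
		}
		sort.Strings(upstream)
		for _, m := range migrations {
			// Copy so that migrations do not share one backing array.
			edges[m] = append([]string(nil), upstream...)
		}
	}
	return edges
}

func main() {
	genDeps := map[string][]string{"backfill": {"rename"}}
	produced := map[string][]string{
		"rename":   {"rename-1", "rename-2"},
		"backfill": {"backfill-1"},
	}
	edges := migrationEdges(genDeps, produced)
	fmt.Println(edges["backfill-1"]) // [rename-1 rename-2]
}
```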

If the DryRun option is set, then the application will run all of the migration.

If the Limit option is set to a value greater than 0, the application will only run *that* number of jobs.

func NewApplication

func NewApplication(env Environment, conf *model.Configuration) (*Application, error)

NewApplication constructs and sets up an application instance from a configuration structure, presumably loaded from a configuration file.

If you do not want to define the migrations with the configuration structures, you can instead construct an application instance with default initializers.

func (*Application) Run

func (a *Application) Run(ctx context.Context) error

func (*Application) Setup

func (a *Application) Setup(e Environment) error

Setup takes a configured anser.Environment implementation and configures all generators.

You can only run this function once; subsequent calls return an error but otherwise have no effect.

type Environment

type Environment interface {
	Setup(amboy.Queue, client.Client, db.Session) error
	GetSession() (db.Session, error)
	GetClient() (client.Client, error)
	GetQueue() (amboy.Queue, error)
	GetDependencyNetwork() (model.DependencyNetworker, error)
	MetadataNamespace() model.Namespace

	RegisterLegacyManualMigrationOperation(string, db.MigrationOperation) error
	GetLegacyManualMigrationOperation(string) (db.MigrationOperation, bool)
	RegisterLegacyDocumentProcessor(string, db.Processor) error
	GetLegacyDocumentProcessor(string) (db.Processor, bool)

	RegisterManualMigrationOperation(string, client.MigrationOperation) error
	GetManualMigrationOperation(string) (client.MigrationOperation, bool)
	RegisterDocumentProcessor(string, client.Processor) error
	GetDocumentProcessor(string) (client.Processor, bool)

	NewDependencyManager(string) dependency.Manager
	RegisterCloser(func() error)
	Close() error

	SetPreferedDB(interface{})
	PreferClient() bool
}

Environment exposes the execution environment for the migration utility. It is the means by which (potentially serialized) job definitions gain access to the database and by which generator jobs gain access to the queue.

Implementations should be thread-safe, and are not required to be reconfigurable after their initial configuration.

func GetEnvironment

func GetEnvironment() Environment

GetEnvironment returns the global environment object. Because this produces a pointer to the global object, make sure that you have a way to replace it with a mock as needed for testing.

type Generator

type Generator interface {
	// Jobs produces job objects for the results of the
	// generator.
	Jobs() <-chan amboy.Job

	// Generators are themselves amboy.Jobs.
	amboy.Job
}

Generator is an amboy.Job superset used to store implementations that generate other jobs. Internally, generators construct and store their member jobs.

This interface may eventually be useful for any kind of job-generating operation.
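
A generator that keeps its jobs in memory and exposes them through a receive-only channel might look like the sketch below. The Job interface here is a hypothetical two-method subset, not amboy.Job. sliceGenerator and namedJob are illustrative names.

```go
package main

import "fmt"

// Job is a hypothetical, minimal job interface.
type Job interface {
	ID() string
	Run()
}

// namedJob is a trivial member job.
type namedJob struct{ id string }

func (j namedJob) ID() string { return j.id }

// Run does nothing; a real migration job would modify documents.
func (j namedJob) Run() {}

// sliceGenerator is itself a Job and keeps the jobs it produces in memory.
type sliceGenerator struct {
	id       string
	inputs   []string // stand-in for documents matched by a query
	produced []Job
}

var _ Job = (*sliceGenerator)(nil)

func (g *sliceGenerator) ID() string { return g.id }

// Run builds one member job per input.
func (g *sliceGenerator) Run() {
	for _, in := range g.inputs {
		g.produced = append(g.produced, namedJob{id: g.id + "." + in})
	}
}

// Jobs returns a closed, buffered channel holding every stored job, so
// callers can simply range over it.
func (g *sliceGenerator) Jobs() <-chan Job {
	out := make(chan Job, len(g.produced))
	for _, j := range g.produced {
		out <- j
	}
	close(out)
	return out
}

func main() {
	g := &sliceGenerator{id: "gen", inputs: []string{"a", "b", "c"}}
	g.Run()
	for j := range g.Jobs() {
		fmt.Println(j.ID())
	}
}
```

Buffering the channel to the number of stored jobs means Jobs never blocks. It also makes the in-memory limitation described in the Generator overview explicit.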

func NewManualMigrationGenerator

func NewManualMigrationGenerator(e Environment, opts model.GeneratorOptions, opName string) Generator

func NewSimpleMigrationGenerator

func NewSimpleMigrationGenerator(e Environment, opts model.GeneratorOptions, update map[string]interface{}) Generator

func NewStreamMigrationGenerator

func NewStreamMigrationGenerator(e Environment, opts model.GeneratorOptions, opName string) Generator

type Migration

type Migration amboy.Job

Migration is a named interface type based on amboy.Job (a type definition, not an alias), used to identify migration operations as distinct from other kinds of amboy.Jobs.

func NewManualMigration

func NewManualMigration(e Environment, m model.Manual) Migration

func NewSimpleMigration

func NewSimpleMigration(e Environment, m model.Simple) Migration

func NewStreamMigration

func NewStreamMigration(e Environment, m model.Stream) Migration

type MigrationHelper

type MigrationHelper interface {
	Env() Environment

	// Migrations need to record their state to help resolve
	// dependencies to the database.
	FinishMigration(context.Context, string, *job.Base)
	SaveMigrationEvent(context.Context, *model.MigrationMetadata) error

	// The migration helper provides a model/interface for
	// interacting with the database to check the state of a
	// migration operation, helpful in dependency approval.
	PendingMigrationOperations(context.Context, model.Namespace, map[string]interface{}) int
	GetMigrationEvents(context.Context, map[string]interface{}) MigrationMetadataIterator
}

MigrationHelper is an interface embedded in all jobs as an "extended base" for migrations, on top of amboy's existing job.Base type, which implements most job functionality.

MigrationHelper implementations should not require construction: getter methods should initialize nil values at runtime.
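
The "no construction required" rule above means a zero value must be usable, with getters filling in nil fields on first use. helper, getEvents, and saveEvent below are hypothetical and only demonstrate that pattern.

```go
package main

import "fmt"

// helper shows a zero value that works without a constructor.
type helper struct {
	events map[string][]string
}

// getEvents creates the map on first use instead of in a constructor.
func (h *helper) getEvents() map[string][]string {
	if h.events == nil {
		h.events = map[string][]string{}
	}
	return h.events
}

// saveEvent records an event for a migration, relying on getEvents.
func (h *helper) saveEvent(migration, event string) {
	ev := h.getEvents()
	ev[migration] = append(ev[migration], event)
}

func main() {
	var h helper // no constructor call
	h.saveEvent("m1", "started")
	h.saveEvent("m1", "finished")
	fmt.Println(h.getEvents()["m1"]) // [started finished]
}
```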

func NewClientMigrationHelper

func NewClientMigrationHelper(e Environment) MigrationHelper

func NewLegacyMigrationHelper

func NewLegacyMigrationHelper(e Environment) MigrationHelper

NewLegacyMigrationHelper constructs a new migration helper instance. Use this to inject environment instances into tasks.

func NewMigrationHelper

func NewMigrationHelper(e Environment) MigrationHelper

type MigrationMetadataIterator

type MigrationMetadataIterator interface {
	Next(context.Context) bool
	Item() *model.MigrationMetadata
	Err() error
	Close() error
}

MigrationMetadataIterator wraps a query response for data about a migration.

Directories

Path        Synopsis
bsonutil    Package bsonutil provides a number of simple common utilities for interacting with bson-tagged structs in Go.
cmd
db          Package db provides tools for using MongoDB databases.
mock        Package mock contains mocked implementations of the interfaces defined in the anser package.
model       Package model provides public data structures and interfaces to represent migration operations.
