btmtd

command
v0.0.0-...-ba1c585 Latest
Published: Jun 29, 2017 License: BSD-3-Clause Imports: 13 Imported by: 0

README

Mounttable on Cloud Bigtable

This package and its sub-packages contain a mounttable server implementation that uses Google's Cloud Bigtable service for storage. It is fast and scalable to millions of nodes and, with enough replicas, millions of requests per second.

Schema

Bigtable is not a relational database. Each table is indexed by a single row key, and operations are atomic only at the row level. There is no way to mutate multiple rows together atomically.

See the Overview of Cloud Bigtable for more information.

Our table has one row per node. The row key is a hash of the node name followed by the name itself. This spreads rows evenly across all tablet servers with no risk of name collision.
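
As a rough illustration of this key layout, the sketch below prefixes the node name with an 8-hex-digit hash. The function name rowKey, the choice of FNV-1a, and the "/"-prefixed node names are assumptions made for this example; the text above does not say which hash btmtd actually uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rowKey returns a 32-bit hash of the node name, rendered as 8 hex
// digits, followed by the name itself.
// Assumption: FNV-1a stands in for whatever hash the real server uses.
func rowKey(name string) string {
	h := fnv.New32a()
	h.Write([]byte(name))
	return fmt.Sprintf("%08x%s", h.Sum32(), name)
}

func main() {
	// Names that share a prefix still get unrelated key prefixes, so
	// their rows are spread across the key space.
	for _, name := range []string{"/", "/foo", "/foo/bar"} {
		fmt.Println(rowKey(name))
	}
}
```

Because the full name is appended after the hash, two different names can never produce the same key, even if their hashes collide.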

The table has three column families:

  • Metadata m: used to store information about the row:
    • ID i: A 4-byte ID for the node. Doesn't have to be globally unique.
    • Version v: The version changes every time the row is mutated. It is used to detect conflicts related to concurrent access.
    • Creator c: The cell's value is the name of its creator and its timestamp is the node creation time.
    • Sticky s: When this column is present, the node is not automatically garbage collected.
    • Permissions p: The access.Permissions of the node.
  • Servers s: Each mounted server has its own column. The column name is the server's address. The value contains the mount flags. The timestamp is the mount deadline.
  • Children c: Each child has its own column. The column name is the name of the child, without the path. The timestamp is the child creation time.

Example:

| Key | ID | Version | Creator | Sticky | Permissions | Mounted Server... | Child... |
|---|---|---|---|---|---|---|---|
| 540f1a56/ | id1 | 54321 | user | 1 | {"Admin":... | | (id2)foo |
| 1234abcd/foo | id2 | 123 | user | | {"Admin":... | | (id3)bar |
| 46d523e3/foo/bar | id3 | 5436 | user | | {"Admin":... | /example.com:123 (deadline) | |

Counters are stored in another table, one row with one column per counter.

Mutations

All operations use optimistic concurrency control. If a conflicting change happens during a mutation, the whole operation is restarted.

  • Mutation on N
    • Get Node N
    • Check caller's permissions
    • Apply mutation on N if node version hasn't changed

If the node version changed, it means that another mutation was applied between the time when we retrieved the node and when we tried to apply our mutation. When that happens, we restart the whole operation, starting with retrieving the node again.
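
The retry loop described above can be sketched with an in-memory stand-in for the table. Everything here (the table and node types, checkAndMutate, mutateWithRetry, and the use of an incrementing integer as the version) is hypothetical and only illustrates the pattern; the real server performs the conditional write against Cloud Bigtable.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// node is a simplified stand-in for one row.
type node struct {
	version int // assumption: the version is a counter bumped on each mutation
	servers map[string]string
}

// table simulates row-level atomicity with a mutex.
type table struct {
	mu   sync.Mutex
	rows map[string]*node
}

// get returns the current version of the row, if it exists.
func (t *table) get(key string) (int, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	n, ok := t.rows[key]
	if !ok {
		return 0, false
	}
	return n.version, true
}

// checkAndMutate applies mutate only if the row still has the expected
// version. It reports whether the mutation was applied.
func (t *table) checkAndMutate(key string, expected int, mutate func(*node)) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	n, ok := t.rows[key]
	if !ok || n.version != expected {
		return false
	}
	mutate(n)
	n.version++
	return true
}

// mutateWithRetry gets the node, checks permissions, and applies the
// mutation, restarting from the top whenever the version has changed.
func mutateWithRetry(t *table, key string, allowed func() bool, mutate func(*node)) error {
	for {
		version, ok := t.get(key)
		if !ok {
			return errors.New("node not found")
		}
		if !allowed() {
			return errors.New("permission denied")
		}
		if t.checkAndMutate(key, version, mutate) {
			return nil
		}
		// Conflict: another mutation won the race. Start over.
	}
}

func main() {
	t := &table{rows: map[string]*node{"k": {servers: map[string]string{}}}}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			addr := fmt.Sprintf("/example.com:%d", i)
			_ = mutateWithRetry(t, "k", func() bool { return true },
				func(n *node) { n.servers[addr] = "" })
		}(i)
	}
	wg.Wait()
	fmt.Println(t.rows["k"].version, len(t.rows["k"].servers))
}
```

Each of the ten concurrent writers eventually succeeds, and no update is lost, because a writer that loses the race re-reads the node before trying again.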

Mounting / Unmounting a server

Mounting a server consists of adding a new cell to the node's row. The column family is s, the column name is the address of the server, the timestamp is the mount deadline, and the value contains the mount flags.

Unmounting a server consists of deleting the server's column.
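
A minimal in-memory sketch of the servers column family follows. The row and serverCell types, the uint32 encoding of the mount flags, and the active helper that hides servers whose deadline has passed are all assumptions for illustration; the text above only states what each part of the cell holds.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// serverCell models one cell in the "s" column family.
type serverCell struct {
	flags    uint32    // cell value: mount flags (encoding is an assumption)
	deadline time.Time // cell timestamp: mount deadline
}

// row models only the servers of a node, keyed by column name
// (the server's address).
type row struct {
	servers map[string]serverCell
}

// mount adds or overwrites the column for addr.
func (r *row) mount(addr string, flags uint32, deadline time.Time) {
	r.servers[addr] = serverCell{flags: flags, deadline: deadline}
}

// unmount deletes the column for addr.
func (r *row) unmount(addr string) {
	delete(r.servers, addr)
}

// active returns the addresses whose deadline is still in the future.
// Assumption: expired mounts are simply ignored by readers.
func (r *row) active(now time.Time) []string {
	var out []string
	for addr, c := range r.servers {
		if c.deadline.After(now) {
			out = append(out, addr)
		}
	}
	sort.Strings(out)
	return out
}

// demo mounts one server and reports how many servers are active before
// and after the deadline, and how many columns remain after unmounting.
func demo() (before, after, left int) {
	now := time.Now()
	r := &row{servers: map[string]serverCell{}}
	r.mount("/example.com:123", 0, now.Add(time.Minute))
	before = len(r.active(now))
	after = len(r.active(now.Add(2 * time.Minute)))
	r.unmount("/example.com:123")
	return before, after, len(r.servers)
}

func main() {
	fmt.Println(demo())
}
```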

Adding / Removing a child node

Adding or removing a node requires two mutations: one on the parent, one on the child.

When adding a node, we first add it to the parent, and then create a new row with the same timestamp.

When deleting a node, we first delete the row, and then delete the child column on the parent.

If the server process dies between the two mutations, it will leave the parent with a reference to a child row that doesn't exist. As a consequence, the parent will never be seen as "empty" and will not be automatically garbage collected. This will be corrected when:

  • the child is re-created, or
  • the parent is forcibly deleted.
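
The ordering of the two mutations, and the dangling reference left by a crash between them, can be simulated as follows. The store and row types, the crash flag, the integer timestamps, and the dangling helper are hypothetical; rows are keyed by plain node names here rather than hashed row keys.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a simplified node row.
type row struct {
	created  int64            // creation timestamp (units are illustrative)
	children map[string]int64 // child name -> child creation timestamp
}

// store maps node names to rows.
type store map[string]*row

// join builds the child's full name from the parent's name.
func join(parent, name string) string {
	if parent == "/" {
		return "/" + name
	}
	return parent + "/" + name
}

// addChild updates the parent first, then creates the child row with the
// same timestamp. crash simulates the process dying between the two.
func (s store) addChild(parent, name string, ts int64, crash bool) {
	s[parent].children[name] = ts
	if crash {
		return
	}
	s[join(parent, name)] = &row{created: ts, children: map[string]int64{}}
}

// removeChild deletes the child row first, then the parent's column.
func (s store) removeChild(parent, name string, crash bool) {
	delete(s, join(parent, name))
	if crash {
		return
	}
	delete(s[parent].children, name)
}

// dangling lists children of parent whose rows do not exist.
func (s store) dangling(parent string) []string {
	var out []string
	for name := range s[parent].children {
		if _, ok := s[join(parent, name)]; !ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	s := store{"/": {children: map[string]int64{}}}
	s.addChild("/", "foo", 1, true)  // crash after the first mutation
	fmt.Println(s.dangling("/"))     // [foo]
	s.addChild("/", "foo", 2, false) // re-creating the child repairs it
	fmt.Println(s.dangling("/"))     // []
}
```

In both directions the parent keeps its reference for at least as long as the child row exists, so a crash can leave an extra reference on the parent but never a child row that its parent does not list.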

Hot rows & caching

Some nodes are expected to be accessed significantly more often than others. For example, the root node and its immediate children are traversed more often than nodes further down the tree. The Bigtable rows associated with these nodes are "hotter", which can lead to traffic imbalance and poor performance.

This problem is alleviated by a small cache in the Bigtable client. High-frequency or concurrent requests for the same rows can be bundled together, reducing both latency and Bigtable load.
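
One way to bundle concurrent requests for the same row is to let later callers wait on a fetch that is already in flight. The sketch below shows only that part of the idea. The rowCache type, its fields, and the demo function are hypothetical and are not taken from the btmtd source; a real cache would presumably also keep recent results for a short time to absorb high-frequency sequential reads.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// call tracks one in-flight fetch that several callers may share.
type call struct {
	done chan struct{}
	val  string
	err  error
}

// rowCache bundles concurrent reads of the same row key into one fetch.
type rowCache struct {
	mu       sync.Mutex
	inflight map[string]*call
	bundled  int // requests that piggybacked on another caller's fetch
	fetch    func(key string) (string, error)
}

func newRowCache(fetch func(string) (string, error)) *rowCache {
	return &rowCache{inflight: map[string]*call{}, fetch: fetch}
}

// get returns the row for key, sharing the result of any fetch for the
// same key that is already in progress.
func (c *rowCache) get(key string) (string, error) {
	c.mu.Lock()
	if cl, ok := c.inflight[key]; ok {
		c.bundled++
		c.mu.Unlock()
		<-cl.done
		return cl.val, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = c.fetch(key)

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done)
	return cl.val, cl.err
}

// demo starts one slow fetch of the root row, lets four more readers
// join it, and reports how many fetches and bundled requests occurred.
func demo() (fetches, bundled int) {
	started, release := make(chan struct{}), make(chan struct{})
	c := newRowCache(func(key string) (string, error) {
		fetches++
		close(started)
		<-release
		return "row " + key, nil
	})
	var wg sync.WaitGroup
	read := func() { defer wg.Done(); c.get("/") }
	wg.Add(5)
	go read()
	<-started
	for i := 0; i < 4; i++ {
		go read()
	}
	for { // wait until all four followers have joined the in-flight fetch
		c.mu.Lock()
		n := c.bundled
		c.mu.Unlock()
		if n == 4 {
			break
		}
		runtime.Gosched()
	}
	close(release)
	wg.Wait()
	return fetches, c.bundled
}

func main() {
	fetches, bundled := demo()
	fmt.Println("fetches:", fetches, "bundled:", bundled)
}
```

In the demo, five readers of the root row result in a single fetch, which is the effect that matters for hot rows.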

Opportunistic garbage collection

A node can be garbage-collected when it has no children, no mounted servers, and hasn't been marked as sticky. A node is sticky when someone explicitly called SetPermissions on it.

Garbage collection happens opportunistically. When a mounttable server accesses a node that is eligible for garbage collection while processing a request, the node is removed before the ongoing request completes.
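
The eligibility rule and the opportunistic removal can be sketched as below. The node type, collectable, and access are hypothetical names. What the in-progress request returns for a collected node, and the removal of the node's column from its parent, are not covered by the text above and are left out or chosen arbitrarily here.

```go
package main

import "fmt"

// node holds just the fields that decide garbage-collection eligibility.
type node struct {
	children map[string]bool
	servers  map[string]bool
	sticky   bool // set when SetPermissions was called explicitly
}

// collectable reports whether the node has no children, no mounted
// servers, and is not sticky.
func collectable(n *node) bool {
	return len(n.children) == 0 && len(n.servers) == 0 && !n.sticky
}

// access looks up a node while serving a request. If the node turns out
// to be eligible for garbage collection, it is removed on the spot.
// Assumption: the caller is simply told that the node was collected.
func access(tbl map[string]*node, name string) (n *node, collected bool) {
	n, ok := tbl[name]
	if !ok {
		return nil, false
	}
	if collectable(n) {
		delete(tbl, name)
		return nil, true
	}
	return n, false
}

func main() {
	tbl := map[string]*node{
		"/empty":  {},
		"/sticky": {sticky: true},
		"/busy":   {servers: map[string]bool{"/example.com:123": true}},
	}
	for _, name := range []string{"/empty", "/sticky", "/busy"} {
		_, collected := access(tbl, name)
		fmt.Println(name, "collected:", collected)
	}
	fmt.Println("rows left:", len(tbl))
}
```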

Documentation

Overview

Runs the mounttable service.

Usage:

btmtd [flags]
btmtd [flags] <command>

The btmtd commands are:

setup       Creates and sets up the table
destroy     Destroy the table
dump        Dump the table
fsck        Check the table consistency
help        Display help for commands or topics

The btmtd flags are:

-cluster=
  The Cloud Bigtable cluster name
-in-memory-test=false
  If true, use an in-memory bigtable server (for testing only)
-key-file=
  The file that contains the Google Cloud JSON credentials to use
-max-nodes-per-user=10000
  The maximum number of nodes that a single user can create.
-max-servers-per-user=10000
  The maximum number of servers that a single user can mount.
-name=
  If provided, causes the mount table to mount itself under this name.
-permissions-file=
  The file that contains the initial node permissions.
-project=
  The Google Cloud project of the Cloud Bigtable cluster
-table=mounttable
  The name of the table to use
-zone=
  The Google Cloud zone of the Cloud Bigtable cluster

The global flags are:

-alsologtostderr=true
  log to standard error as well as files
-log_backtrace_at=:0
  when logging hits line file:N, emit a stack trace
-log_dir=
  if non-empty, write log files to this directory
-logtostderr=false
  log to standard error instead of files
-max_stack_buf_size=4292608
  max size in bytes of the buffer to use for logging stack traces
-metadata=<just specify -metadata to activate>
  Displays metadata for the program and exits.
-stderrthreshold=2
  logs at or above this threshold go to stderr
-time=false
  Dump timing information to stderr before exiting the program.
-v=0
  log level for V logs
-v23.credentials=
  directory to use for storing security credentials
-v23.i18n-catalogue=
  18n catalogue files to load, comma separated
-v23.namespace.root=[/(dev.v.io:r:vprod:service:mounttabled)@ns.dev.v.io:8101]
  local namespace root; can be repeated to provided multiple roots
-v23.permissions.file=map[]
  specify a perms file as <name>:<permsfile>
-v23.permissions.literal=
  explicitly specify the runtime perms as a JSON-encoded access.Permissions.
  Overrides all --v23.permissions.file flags.
-v23.proxy=
  object name of proxy service to use to export services across network
  boundaries
-v23.tcp.address=
  address to listen on
-v23.tcp.protocol=wsh
  protocol to listen with
-v23.vtrace.cache-size=1024
  The number of vtrace traces to store in memory.
-v23.vtrace.collect-regexp=
  Spans and annotations that match this regular expression will trigger trace
  collection.
-v23.vtrace.dump-on-shutdown=true
  If true, dump all stored traces on runtime shutdown.
-v23.vtrace.sample-rate=0
  Rate (from 0.0 to 1.0) to sample vtrace traces.
-v23.vtrace.v=0
  The verbosity level of the log messages to be captured in traces
-vmodule=
  comma-separated list of globpattern=N settings for filename-filtered logging
  (without the .go suffix).  E.g. foo/bar/baz.go is matched by patterns baz or
  *az or b* but not by bar/baz or baz.go or az or b.*
-vpath=
  comma-separated list of regexppattern=N settings for file pathname-filtered
  logging (without the .go suffix).  E.g. foo/bar/baz.go is matched by patterns
  foo/bar/baz or fo.*az or oo/ba or b.z but not by foo/bar/baz.go or fo*az
