samplr

package module
v0.0.0-...-89f4574 Latest
Published: Sep 29, 2022 License: Apache-2.0 Imports: 15 Imported by: 0

README

samplr

samplr is a service designed to take a set of GitHub repositories, scan their history, and produce a set of Snippets which are exposed over an API.

An example snippet:

// [ START helloworld_snippet ]
public static void Main(string[] args) {
    System.Console.WriteLine("Hello world!");
}
// [ END helloworld_snippet]
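
As a rough illustration of how such region tags could be pulled out of a file, here is a minimal sketch in Go. It is not samplr's actual parser: the function name extractRegions and the regular expressions are assumptions made for this example, and the patterns simply tolerate optional spaces inside the brackets, as seen in the snippet above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for the START/END markers shown in the example.
// They allow optional whitespace inside the square brackets.
var (
	startTag = regexp.MustCompile(`\[\s*START\s+([A-Za-z0-9_]+)\s*\]`)
	endTag   = regexp.MustCompile(`\[\s*END\s+([A-Za-z0-9_]+)\s*\]`)
)

// extractRegions returns the lines found between each START/END pair,
// keyed by the region tag name. Marker lines themselves are not included.
func extractRegions(src string) map[string][]string {
	regions := make(map[string][]string)
	open := make(map[string]bool) // tags that are currently open
	scanner := bufio.NewScanner(strings.NewReader(src))
	for scanner.Scan() {
		line := scanner.Text()
		if m := startTag.FindStringSubmatch(line); m != nil {
			open[m[1]] = true
			continue
		}
		if m := endTag.FindStringSubmatch(line); m != nil {
			delete(open, m[1])
			continue
		}
		// A line belongs to every region that is open when it is read.
		for name := range open {
			regions[name] = append(regions[name], line)
		}
	}
	return regions
}

func main() {
	src := `// [ START helloworld_snippet ]
public static void Main(string[] args) {
    System.Console.WriteLine("Hello world!");
}
// [ END helloworld_snippet]`
	for name, lines := range extractRegions(src) {
		fmt.Printf("%s: %d lines\n", name, len(lines))
	}
}
```

Tracking open tags in a map lets the sketch handle nested or overlapping regions, at the cost of ignoring malformed input such as an END without a START.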

Main Components

samplrd

This service clones a GitHub repository, iterates through the commits on the master branch, and exposes a gRPC API to query those samples. Once instantiated, it periodically checks the repository for new commits and, when there are new commits, updates its set of snippets.

samplr-sprvsr

This process is a "supervisor" to the rest of the cluster. It reads a list of repositories to track from a Cloud Storage bucket, then interacts with the Kubernetes API (within the cluster) to dynamically add and delete Deployments and Services for each repository listed in the file.

Note: because this service runs in the cluster, the service account it runs as must have permissions to edit and delete Deployments and Services.

samplr-rtr

This service is the "entrypoint" to the cluster. It is secured behind Cloud Endpoints, and exposes a gRPC reverse-proxy which inspects the incoming request and forwards it to the Service in the cluster that is responsible for handling that repository.

Other Tools

samplrctl

This is a command line tool, built with Cobra commands, that queries a repository checked out on disk for Snippets.

e.g.

samplrctl snippets list /tmp/local-repository

Testing

To test the samplr code, cd into the samplr directory and run

go test -v -race ./...

This will run the tests in the samplr directory as well as all subdirectories in it, recursively.

Integration Tests

The integration tests take a bit longer to run, and require network access, so they require a special command line option to run: -tags integration

go test -v -tags integration ./...

This uses Go build tags (build constraints), which are enabled by putting a comment of the following form near the top of a file, before the package clause:

// +build TAGNAME

Deploying

The Makefile has several variables that can be overridden via environment variables in order to customize how the deployment occurs.

# The (gcloud) test cluster that is being worked against
GCP_CLUSTER_NAME ?= devrel-dev-cluster
GCP_CLUSTER_ZONE ?= us-central1-a
# The service account to run as
SERVICE_ACCOUNT_SECRET_NAME ?= service-account-maintnerd
# Bucket settings for Repositories
GCS_BUCKET_NAME ?= devrel-dev-settings
REPOS_FILE_NAME ?= public_repos.json

These defaults are largely the same for dev and prod, but there are notable differences.


Important! After deploying, you must find the container that is responsible for running googleapis/google-cloud-java and manually edit its memory requests and limits to be 3.5G and 4G respectively. This repository has much higher memory requirements than all the others.


Deploy to DEV

# Ensure you are using the dev project
make deploy

Deploy to PROD

# Ensure you are using the prod project
# Override the Cluster Name
export GCP_CLUSTER_NAME=devrel-services
# Override the bucket name
export GCS_BUCKET_NAME=devrel-prod-settings
make deploy

Notes

The Deployment process is done via Cloud Build.

The Deployment process also uses the source code on your machine; as such, local, potentially uncommitted or unreviewed changes may be pushed.

Debugging

Finding Problematic Deployments

If you know that repository bar in organization foo is experiencing problems, finding the deployment that is responsible for handling that Repository can be done by running:

kubectl get deployments -l owner=foo,repository=bar,samplr-sprvsr-autogen=true

That should return the single Deployment responsible for that Repository.

Go to the Kubernetes Engine Section of the Cloud Console, click the "Workloads" tab and search for Name:{DEPLOYMENT_NAME}. Clicking on it will bring you to a drill down view where you can see certain metrics about the Deployment (most interesting are CPU and Memory utilization). There are also links to the Container's logs.

Key Things to Look At

  • The number of times a Pod has been restarted. If that number is relatively high (more than about 10), this might indicate a deeper issue with either memory consumption or resource allocation.
  • CrashLoopBackOff errors indicate that the Pod started successfully and ran for a while, but encountered an error and returned a non-zero exit code, was restarted, and continued to return non-zero exit codes. This usually indicates a code-related problem that can be reproduced by running samplr locally.

Restarting Problematic Pods

There are two ways to "restart" a problematic pod.

Scaling

This is the "more correct" way to restart a pod: scale the Deployment to 0, then scale it back to 1.

kubectl scale deployment --replicas=0 {DEPLOYMENT_NAME}

Wait for the scale to complete. Then run

kubectl scale deployment --replicas=1 {DEPLOYMENT_NAME}

Deleting Pods

Because the replica count for the Deployment is set to 1, simply deleting the pod will cause Kubernetes to create a new one to satisfy the desired state of 1 replica.

Get pod name

kubectl get pods -l owner=foo,repository=bar,samplr-sprvsr-autogen=true

Delete pod

kubectl delete pod {POD_NAME}

Running Locally

When running locally, it's usually best to run samplr in a container (though it is possible to run it without one).

To build the Docker images locally, cd into the samplr directory and run make build. Then run

docker run -p 3009:3009 -it samplrd:dev samplrd --owner=foo --repository=bar

This runs an instance of samplr in a container on your local machine. The command also forwards port 3009 on your machine to port 3009 on the container, which allows you to use tools such as BloomRPC to inspect the state of the container.

If you are debugging issues with history parsing, it might be useful to mount your /tmp/samplr directory to the /tmp directory in the container in order to debug how the git repositories are being parsed.

docker run -p 3009:3009 -v /tmp/samplr:/tmp -it samplrd:dev samplrd --owner=foo --repository=bar

A useful tool to run while the container is running is the docker stats command.

docker stats

This will bring up a TUI which displays statistics for your running containers. For samplr, the most interesting (and important) statistic is memory usage.

This is best run in a separate window or tmux session.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func FormatLog

func FormatLog(f logrus.Formatter)

FormatLog sets the log's formatter to the given one

func VerboseLog

func VerboseLog()

VerboseLog sets the log level to DebugLevel

Types

type Corpus

type Corpus struct {
	// contains filtered or unexported fields
}

Corpus holds all of a project's metadata.

func (*Corpus) ForEachRepo

func (c *Corpus) ForEachRepo(fn func(repo WatchedRepository) error) error

ForEachRepo iterates over the set of repositories, performs the given function on each, and returns the first non-nil error it receives.

func (*Corpus) ForEachRepoF

func (c *Corpus) ForEachRepoF(fn func(repo WatchedRepository) error, filter func(repo WatchedRepository) bool) error

ForEachRepoF iterates over the set of repositories that match the given filter, performs the given function on each of them, and returns the first non-nil error it receives.
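
To make the filter-then-apply contract concrete, here is a small self-contained sketch of the same pattern. It does not use the samplr package: the repo type is a hypothetical stand-in for WatchedRepository, forEachF is an illustrative helper, and the sketch assumes iteration stops at the first error, which the documentation above does not state explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// repo is a hypothetical stand-in for WatchedRepository, reduced to the
// owner and repository name.
type repo struct{ owner, name string }

// forEachF calls fn for every item accepted by filter and returns the
// first non-nil error. This sketch assumes iteration stops at that error.
func forEachF(repos []repo, fn func(repo) error, filter func(repo) bool) error {
	for _, r := range repos {
		if !filter(r) {
			continue
		}
		if err := fn(r); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	repos := []repo{{"foo", "bar"}, {"foo", "baz"}, {"other", "qux"}}
	err := forEachF(repos,
		func(r repo) error {
			if r.name == "baz" {
				return errors.New("failed on " + r.name)
			}
			fmt.Println("visited", r.owner+"/"+r.name)
			return nil
		},
		// Only repositories owned by "foo" are passed to the callback.
		func(r repo) bool { return r.owner == "foo" },
	)
	fmt.Println("result:", err)
}
```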

func (*Corpus) Initialize

func (c *Corpus) Initialize(ctx context.Context) error

Initialize should be the first call to the corpus to do the initial clone and synchronizing of the corpus's repository set.

func (*Corpus) RLock

func (c *Corpus) RLock()

RLock grabs the corpus's read lock. Grabbing the read lock prevents any concurrent writes from mutating the corpus. This is only necessary if the application is querying the corpus and calling its Update method concurrently.
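
The locking discipline described here follows the usual reader/writer pattern. The sketch below illustrates it with a hypothetical corpus type built on sync.RWMutex; the real Corpus has unexported fields, so the field names, the update method, and the snapshot helper are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"sync"
)

// corpus is a hypothetical stand-in for Corpus, holding only a list of
// repository names guarded by a reader/writer mutex.
type corpus struct {
	mu    sync.RWMutex
	repos []string
}

func (c *corpus) RLock()   { c.mu.RLock() }
func (c *corpus) RUnlock() { c.mu.RUnlock() }

// update replaces the repository list while holding the write lock, so it
// waits for any readers that currently hold the read lock.
func (c *corpus) update(repos []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.repos = repos
}

// snapshot copies the repository list while holding the read lock, so a
// concurrent update cannot change it halfway through the copy.
func snapshot(c *corpus) []string {
	c.RLock()
	defer c.RUnlock()
	return append([]string(nil), c.repos...)
}

func main() {
	c := &corpus{repos: []string{"foo/bar"}}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		c.update([]string{"foo/bar", "foo/baz"})
	}()

	// This read may run before or after the update, but never during it.
	fmt.Println("repositories seen:", len(snapshot(c)))
	wg.Wait()
	fmt.Println("repositories after update:", len(snapshot(c)))
}
```

Pairing RLock with a deferred RUnlock keeps the read lock from being held past the end of the query, even if the query returns early.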

func (*Corpus) RUnlock

func (c *Corpus) RUnlock()

RUnlock unlocks the corpus's read lock.

func (*Corpus) SetDebug

func (c *Corpus) SetDebug()

SetDebug instructs the Corpus to run in debug mode

func (*Corpus) SetVerbose

func (c *Corpus) SetVerbose(v bool)

SetVerbose enables or disables verbose logging.

func (*Corpus) Sync

func (c *Corpus) Sync(ctx context.Context) error

Sync instructs the Corpus to iterate over its tracked repositories and update all of them.

func (*Corpus) TrackGit

func (c *Corpus) TrackGit(url string, branch string) error

TrackGit instructs the Corpus to track a Git Repository at the given url and branch

type File

type File struct {
	FilePath  string
	GitCommit *GitCommit
	Size      int64
}

File represents a file at a git commit

type GitCommit

type GitCommit struct {
	Body           string
	Subject        string
	AuthorEmail    string
	AuthoredTime   time.Time
	CommitterEmail string
	CommittedTime  time.Time
	Hash           string
	Name           string
}

GitCommit represents a commit in git

type SampleMeta

type SampleMeta struct {
	Title       string           `yaml:"title"`
	Description string           `yaml:"description"`
	Usage       string           `yaml:"usage"`
	APIVersion  string           `yaml:"api_version"`
	Snippets    []SnippetMetaRef `yaml:"snippets"`
}

SampleMeta stores structured metadata about a singular Sample. It can have several Snippets associated with it.

type SampleMetadata

type SampleMetadata struct {
	Meta SampleMeta `yaml:"sample-metadata"`
}

SampleMetadata is the root node of a SampleMeta

type Snippet

type Snippet struct {
	Name     string
	Language string
	Versions []SnippetVersion
	Primary  SnippetVersion
}

Snippet represents a snippet of code

func CalculateSnippets

func CalculateSnippets(o, r string, iter git.CommitIter) ([]*Snippet, error)

CalculateSnippets scans the given set of commits and extracts the snippets found in them

type SnippetMetaRef

type SnippetMetaRef struct {
	RegionTag   string `yaml:"region_tag"`
	Description string `yaml:"description"`
	Usage       string `yaml:"usage"`
}

SnippetMetaRef stores structured data about a single Snippet

type SnippetVersion

type SnippetVersion struct {
	Name    string
	File    *File
	Lines   []string
	Content string
	Meta    SnippetVersionMeta
}

SnippetVersion represents a snippet at a particular commit in a repository

type SnippetVersionMeta

type SnippetVersionMeta struct {
	Title       string
	Description string
	Usage       string
	APIVersion  string
}

SnippetVersionMeta stores metadata about a particular SnippetVersion

type WatchedRepository

type WatchedRepository interface {
	// Unique Identifier of the Repository
	// TODO(colnnelson): Work out the details of this
	ID() string
	// Allows iterating over a WatchedRepository's snippets
	ForEachSnippet(func(snippet *Snippet) error) error
	// Allows iterating over a WatchedRepository's snippets that match the given filter
	ForEachSnippetF(func(snippet *Snippet) error, func(snippet *Snippet) bool) error
	// Allows iterating over a WatchedRepository's git commits
	ForEachGitCommit(func(commit *GitCommit) error) error
	// Allows iterating over a WatchedRepository's git commits that match the given filter
	ForEachGitCommitF(func(commit *GitCommit) error, func(commit *GitCommit) bool) error
	// The owner of the repository
	Owner() string
	// The name of the repository
	RepositoryName() string
	// Instructs the Repository to Update
	Update(ctx context.Context) error
}

WatchedRepository represents a repository being watched by the Corpus

Directories

Path Synopsis
cmd
cmd/completion
Package completion provides shell completion capabilities for the CLI.
testutil
Package testutil contains helpers for working with commands in tests.
