package github

v0.0.0-...-e86fd7f (Latest)
Warning

This package is not in the latest version of its module.

Published: Apr 23, 2021 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package github implements the crawler.Crawler interface, getting data from the Github search API.

Constants

This section is empty.

Variables

This section is empty.

Functions

func CloseResponseBody

func CloseResponseBody(resp *http.Response)

func Filename

func Filename(f string) queryField

Filename takes a filename and formats it according to the Github API.

func Filesize

func Filesize(r rangeFormatter) queryField

Filesize takes a rangeFormatter and formats it according to the Github API.

func FindFileSize

func FindFileSize(
	cache cachedSearch, targetFileCount, lowerBound, upperBound uint64) (uint64, error)

FindFileSize finds the filesize range [lowerBound, return value] that has the largest file count smaller than or equal to githubMaxResultsPerQuery. Note that the returned value could already be in a previous range if the next file size alone has more than 1000 results. It is left to the caller to handle this case and guarantee forward progression.
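Because the number of files in [lowerBound, x] can only grow as x grows, the search described above can be done by bisection. The sketch below is not the package's code: countFunc is a hypothetical stand-in for cachedSearch, and the target plays the role of githubMaxResultsPerQuery. It also shows the edge case from the note: when even a single size is over the limit, lowerBound itself comes back.

```go
package main

import "fmt"

// countFunc is a hypothetical stand-in for the package's cachedSearch: it
// reports how many files have a size in the inclusive range [lo, hi].
type countFunc func(lo, hi uint64) uint64

// findFileSize returns the largest hi in [lower, upper] such that
// count(lower, hi) <= target. The count never decreases as hi grows, so a
// binary search over hi is enough. If even count(lower, lower) exceeds
// target, lower itself is returned and the caller must ensure progress.
func findFileSize(count countFunc, target, lower, upper uint64) uint64 {
	best := lower
	lo, hi := lower, upper
	for lo <= hi {
		mid := lo + (hi-lo)/2
		if count(lower, mid) <= target {
			best = mid // mid fits; look for a larger upper bound
			if mid == upper {
				break
			}
			lo = mid + 1
		} else {
			if mid == lower {
				break // nothing at or above lower fits (also avoids uint underflow)
			}
			hi = mid - 1
		}
	}
	return best
}

func main() {
	// Synthetic data: every file size has exactly 10 files.
	tenPerSize := func(lo, hi uint64) uint64 { return 10 * (hi - lo + 1) }
	fmt.Println(findFileSize(tenPerSize, 1000, 0, 500)) // prints 99
}
```

Each probe costs one (cached) search, so bisection keeps the number of API calls logarithmic in the width of the size range.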

func FindRangesForRepoSearch

func FindRangesForRepoSearch(cache cachedSearch, lowerBound, upperBound uint64) ([]string, error)

FindRangesForRepoSearch outputs a (possibly incomplete) list of file-size ranges to query in order to find as many search results as the Github search API permits. Github search only allows 1,000 results per query (paginated). Source: https://developer.github.com/v3/search/

This leaves the possibility of file sizes with more than 1000 results, in which case the search as it stands cannot find all files. If queries are sorted by last indexed and retrieved at regular intervals, this should be sufficient to get most, if not all, documents.

func Keyword

func Keyword(k string) queryField

Keyword takes a single word, and formats it according to the Github API.

func NewCrawler

func NewCrawler(accessToken string, retryCount uint64, client *http.Client,
	query Query) githubCrawler

func Path

func Path(p string) queryField

Path takes a filepath and formats it according to the Github API.

func Repo

func Repo(r string) queryField

Repo takes a repository (e.g., kubernetes-sigs/kustomize) and formats it according to the Github API.

func User

func User(u string) queryField

User takes a Github username and formats it according to the Github API.

Types

type GhClient

type GhClient struct {
	RequestConfig
	// contains filtered or unexported fields
}

func (GhClient) Do

func (gcl GhClient) Do(query string) (*http.Response, error)

func (GhClient) ForwardPaginatedQuery

func (gcl GhClient) ForwardPaginatedQuery(ctx context.Context, query string,
	output chan<- GhResponseInfo) error

ForwardPaginatedQuery follows the links to the next pages and performs all of the queries for a given search query, relaying the data from each request back to an output channel.

func (GhClient) GetDefaultBranch

func (gcl GhClient) GetDefaultBranch(url, repo string, m map[string]string) (string, error)

GetDefaultBranch gets the default branch of a github repository. m is a map which maps a github repository to its default branch. If repo is already in m, the default branch for url will be obtained from m; otherwise, a query will be made to github to obtain the default branch.

func (GhClient) GetFileCreationTime

func (gcl GhClient) GetFileCreationTime(
	k GhFileSpec) (time.Time, error)

GetFileCreationTime gets the earliest date of a file.

func (GhClient) GetFileData

func (gcl GhClient) GetFileData(k GhFileSpec) ([]byte, error)

GetFileData gets the bytes from a file.

func (GhClient) GetRawUserContent

func (gcl GhClient) GetRawUserContent(query string) (*http.Response, error)

GetRawUserContent fetches raw user content (file contents). User content is not API rate limited, so there is no use in throttling this call.

func (GhClient) GetReposData

func (gcl GhClient) GetReposData(query string) (*http.Response, error)

GetReposData performs a search query and handles rate limiting for the '/repos' endpoint, as well as timed retries in the case of abuse prevention.

func (GhClient) SearchGithubAPI

func (gcl GhClient) SearchGithubAPI(query string) (*http.Response, error)

SearchGithubAPI performs a search query and handles rate limiting for the 'search/code?' endpoint, as well as timed retries in the case of abuse prevention.

type GhFileSpec

type GhFileSpec struct {
	Path       string        `json:"path,omitempty"`
	Repository GitRepository `json:"repository,omitempty"`
}

type GhResponseInfo

type GhResponseInfo struct {
	*http.Response
	Parsed  *githubResponse
	Error   error
	NextURL string
	LastURL string
}

type GitRepository

type GitRepository struct {
	API      string `json:"url,omitempty"`
	URL      string `json:"html_url,omitempty"`
	FullName string `json:"full_name,omitempty"`
}

type Query

type Query []queryField

Example of formatting a query:

	QueryWith(
		Filename("kustomization.yaml"),
		Filesize(RangeWithin{64, 192}),
		Keyword("copyright"),
		Keyword("2019"),
	).String()

Outputs "q=filename:kustomization.yaml+size:64..192+copyright+2019", which would search for files of 64 to 192 bytes (inclusive range) that contain the keywords 'copyright' and '2019' somewhere in the file.
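A self-contained sketch that reproduces the output above. The lowercase names mirror the exported ones but are hypothetical re-implementations: each helper just prefixes its argument with the matching Github search qualifier (`filename:`, `size:`, `path:`, `repo:`, `user:`), keywords are left bare, and String joins the fields with `+` behind a `q=` prefix. Values are not URL-escaped here, which the real package may do.

```go
package main

import (
	"fmt"
	"strings"
)

// queryField is one already-formatted search qualifier or keyword.
type queryField string

// rangeWithin is an inclusive size range, rendered as "start..end".
type rangeWithin struct{ start, end uint64 }

func (r rangeWithin) rangeString() string { return fmt.Sprintf("%d..%d", r.start, r.end) }

// Each helper prefixes its argument with the matching Github search qualifier.
func filename(f string) queryField      { return queryField("filename:" + f) }
func filesize(r rangeWithin) queryField { return queryField("size:" + r.rangeString()) }
func path(p string) queryField          { return queryField("path:" + p) }
func repo(r string) queryField          { return queryField("repo:" + r) }
func user(u string) queryField          { return queryField("user:" + u) }
func keyword(k string) queryField       { return queryField(k) } // bare search term

type query []queryField

func queryWith(qfs ...queryField) query { return query(qfs) }

// String joins the fields with '+' (an encoded space) after a "q=" prefix.
// Values are not URL-escaped in this sketch.
func (q query) String() string {
	parts := make([]string, len(q))
	for i, f := range q {
		parts[i] = string(f)
	}
	return "q=" + strings.Join(parts, "+")
}

func main() {
	q := queryWith(
		filename("kustomization.yaml"),
		filesize(rangeWithin{64, 192}),
		keyword("copyright"),
		keyword("2019"),
	)
	fmt.Println(q.String())
}
```

Representing each field as a preformatted string keeps Query a plain slice, so queries can be built incrementally and joined only when the URL is needed.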

func QueryWith

func QueryWith(qfs ...queryField) Query

func (Query) String

func (q Query) String() string

type RangeGreaterThan

type RangeGreaterThan struct {
	// contains filtered or unexported fields
}

RangeGreaterThan is a range of values strictly greater than (>) size.

func (RangeGreaterThan) RangeString

func (r RangeGreaterThan) RangeString() string

type RangeLessThan

type RangeLessThan struct {
	// contains filtered or unexported fields
}

RangeLessThan is a range of values strictly less than (<) size.

func (RangeLessThan) RangeString

func (r RangeLessThan) RangeString() string

type RangeQueryResult

type RangeQueryResult struct {
	// contains filtered or unexported fields
}

func (*RangeQueryResult) Add

func (r *RangeQueryResult) Add(other RangeQueryResult)

func (*RangeQueryResult) String

func (r *RangeQueryResult) String() string

type RangeWithin

type RangeWithin struct {
	// contains filtered or unexported fields
}

RangeWithin is an inclusive range from start to end.

func RangeSizes

func RangeSizes(s string) RangeWithin

func (RangeWithin) RangeString

func (r RangeWithin) RangeString() string

func (RangeWithin) Size

func (r RangeWithin) Size() uint64
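The three range types map onto Github's size qualifier syntax: `size:<N`, `size:>N`, and `size:start..end`. The sketch below is a hypothetical re-implementation for illustration. Two points are assumptions because the doc does not specify them: that Size counts the values in the inclusive range (end - start + 1), and that RangeSizes parses a "start..end" string. The parser here also returns an error, unlike the documented RangeSizes signature.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rangeFormatter renders a size constraint in Github search syntax.
type rangeFormatter interface{ RangeString() string }

type rangeLessThan struct{ size uint64 }     // size:<N
type rangeGreaterThan struct{ size uint64 }  // size:>N
type rangeWithin struct{ start, end uint64 } // size:start..end (inclusive)

func (r rangeLessThan) RangeString() string    { return fmt.Sprintf("<%d", r.size) }
func (r rangeGreaterThan) RangeString() string { return fmt.Sprintf(">%d", r.size) }
func (r rangeWithin) RangeString() string      { return fmt.Sprintf("%d..%d", r.start, r.end) }

// Size counts the values in the inclusive range (assumed meaning; requires start <= end).
func (r rangeWithin) Size() uint64 { return r.end - r.start + 1 }

// parseRange reads "start..end"; this input format for RangeSizes is assumed.
func parseRange(s string) (rangeWithin, error) {
	parts := strings.SplitN(s, "..", 2)
	if len(parts) != 2 {
		return rangeWithin{}, fmt.Errorf("malformed range %q", s)
	}
	start, err := strconv.ParseUint(parts[0], 10, 64)
	if err != nil {
		return rangeWithin{}, err
	}
	end, err := strconv.ParseUint(parts[1], 10, 64)
	if err != nil {
		return rangeWithin{}, err
	}
	return rangeWithin{start, end}, nil
}

func main() {
	for _, r := range []rangeFormatter{rangeLessThan{64}, rangeWithin{64, 192}, rangeGreaterThan{192}} {
		fmt.Println("size:" + r.RangeString())
	}
}
```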

type RequestConfig

type RequestConfig struct {
	// contains filtered or unexported fields
}

RequestConfig stores common variables that must be present for the queries:

- CodeSearchRequests: ask Github to check the code indices given a query.
- ContentsRequests: ask Github where to download a resource given a repo and a file path.
- CommitsRequests: ask Github to list the commits made on a file. Useful to determine the date of a file.

func (RequestConfig) CodeSearchRequestWith

func (rc RequestConfig) CodeSearchRequestWith(query Query) request

CodeSearchRequestWith, given a list of query parameters that specify the (partial) query, returns a request object with the (partial) query. Call the URL method to get the string value of the URL. See request.CopyWith to understand why the request object is useful.

func (RequestConfig) CommitsRequest

func (rc RequestConfig) CommitsRequest(fullRepoName, path string) string

CommitsRequest, given the repo name and a file path, returns a formatted query for the Github API to find the commits that affect this file.

func (RequestConfig) ContentsRequest

func (rc RequestConfig) ContentsRequest(fullRepoName, path string) string

ContentsRequest, given the repo name and the file path, returns a formatted query for the Github API to find the download information of this file path.
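The two request builders above correspond to public Github REST endpoints: `GET /repos/{owner}/{repo}/contents/{path}` (whose response includes a `download_url`) and `GET /repos/{owner}/{repo}/commits?path={path}`. A minimal sketch, assuming the public base URL `https://api.github.com`; the package's real base URL, extra parameters such as `per_page`, and authentication handling are not shown in this doc, and the lowercase function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// apiBase is the public Github REST endpoint (assumed; not stated in the doc).
const apiBase = "https://api.github.com"

// contentsRequest builds the "get repository content" URL for one file.
// The path is inserted as-is, so it is assumed to need no escaping.
func contentsRequest(fullRepoName, path string) string {
	return fmt.Sprintf("%s/repos/%s/contents/%s", apiBase, fullRepoName, path)
}

// commitsRequest builds the "list commits" URL restricted to commits that
// touch path; url.Values escapes the path as a query parameter.
func commitsRequest(fullRepoName, path string) string {
	v := url.Values{}
	v.Set("path", path)
	return fmt.Sprintf("%s/repos/%s/commits?%s", apiBase, fullRepoName, v.Encode())
}

func main() {
	fmt.Println(contentsRequest("owner/repo", "path/to/kustomization.yaml"))
	fmt.Println(commitsRequest("owner/repo", "path/to/kustomization.yaml"))
}
```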

func (RequestConfig) ReposRequest

func (rc RequestConfig) ReposRequest(fullRepoName string) string
