github

package
v0.0.0-...-32be1cf
Warning

This package is not in the latest version of its module.

Published: Sep 13, 2019 License: Apache-2.0 Imports: 19 Imported by: 0

Documentation

Overview

Package github implements the crawler.Crawler interface, getting data from the Github search API.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Filename

func Filename(f string) queryField

Filename takes a filename and formats it according to the Github API.

func Filesize

func Filesize(r rangeFormatter) queryField

Filesize takes a rangeFormatter and formats it according to the Github API.

func FindRangesForRepoSearch

func FindRangesForRepoSearch(cache cachedSearch) ([]string, error)

FindRangesForRepoSearch outputs a (possibly incomplete) list of file-size ranges to query so that the searches together return as many results as the Github search API permits. Github search only allows 1,000 results per query (paginated). Source: https://developer.github.com/v3/search/

Some individual file sizes may still have more than 1,000 results, in which case the search as written cannot find every file of that size. If queries are sorted by last indexed and run at regular intervals, this should be sufficient to retrieve most, if not all, documents.
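One way to derive such ranges is to bisect a size interval until each piece fits under the result cap. The sketch below is an illustration under assumptions, not the package's actual algorithm: `countFn` and `splitRanges` are hypothetical names, and the counting function stands in for whatever the cached search reports.

```go
package main

import "fmt"

// maxResults is the per-query cap described above (1,000 results).
const maxResults = 1000

// countFn is a hypothetical stand-in for a cached search that reports how
// many results exist for file sizes in the inclusive range [lo, hi].
type countFn func(lo, hi uint64) uint64

// splitRanges bisects [lo, hi] until every range holds at most maxResults
// results or cannot be split further (a single file size).
func splitRanges(lo, hi uint64, count countFn) []string {
	if lo == hi || count(lo, hi) <= maxResults {
		return []string{fmt.Sprintf("%d..%d", lo, hi)}
	}
	mid := lo + (hi-lo)/2
	return append(splitRanges(lo, mid, count), splitRanges(mid+1, hi, count)...)
}

func main() {
	// Fake index for the demo: 10 results for every file size.
	uniform := func(lo, hi uint64) uint64 { return (hi - lo + 1) * 10 }
	for _, r := range splitRanges(0, 399, uniform) {
		fmt.Println(r)
	}
}
```

A single file size that still exceeds the cap is returned as a one-value range, which is the incomplete case described above.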

func Keyword

func Keyword(k string) queryField

Keyword takes a single word, and formats it according to the Github API.

func NewCrawler

func NewCrawler(accessToken string, retryCount uint64, client *http.Client,
	query Query) githubCrawler

func Path

func Path(p string) queryField

Path takes a filepath and formats it according to the Github API.

Types

type GitHubClient

type GitHubClient struct {
	RequestConfig
	// contains filtered or unexported fields
}

func NewClient

func NewClient(accessToken string, retryCount uint64, client *http.Client) GitHubClient

func (GitHubClient) ForwardPaginatedQuery

func (gcl GitHubClient) ForwardPaginatedQuery(ctx context.Context, query string,
	output chan<- GithubResponseInfo) error

ForwardPaginatedQuery follows the links to the next pages and performs all of the queries for a given search query, relaying the data from each request back to an output channel.
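The pagination loop can be sketched as follows. This is a simplified illustration that assumes the next page is advertised in a `Link` response header with `rel="next"`; the `page` type and the `parseLinkHeader` and `forward` helpers are hypothetical, and context cancellation is left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLinkHeader extracts rel -> URL pairs from a Link header such as
//   <https://example.test/search?page=2>; rel="next", <...>; rel="last"
// It is a simplification: URLs containing commas are not handled.
func parseLinkHeader(header string) map[string]string {
	links := make(map[string]string)
	for _, part := range strings.Split(header, ",") {
		segs := strings.Split(part, ";")
		if len(segs) < 2 {
			continue
		}
		url := strings.Trim(strings.TrimSpace(segs[0]), "<>")
		for _, param := range segs[1:] {
			param = strings.TrimSpace(param)
			if strings.HasPrefix(param, "rel=") {
				rel := strings.Trim(strings.TrimPrefix(param, "rel="), `"`)
				links[rel] = url
			}
		}
	}
	return links
}

// page is a hypothetical, simplified response: a body plus its Link header.
type page struct {
	Body string
	Link string
}

// forward follows "next" links starting at start, sending each page to out.
func forward(start string, fetch func(url string) (page, error), out chan<- page) error {
	for url := start; url != ""; {
		p, err := fetch(url)
		if err != nil {
			return err
		}
		out <- p
		url = parseLinkHeader(p.Link)["next"]
	}
	return nil
}

func main() {
	// In-memory pages stand in for real HTTP responses.
	pages := map[string]page{
		"p1": {Body: "first", Link: `<p2>; rel="next", <p2>; rel="last"`},
		"p2": {Body: "second", Link: ""},
	}
	fetch := func(url string) (page, error) { return pages[url], nil }
	out := make(chan page, len(pages))
	if err := forward("p1", fetch, out); err != nil {
		fmt.Println("error:", err)
	}
	close(out)
	for p := range out {
		fmt.Println(p.Body)
	}
}
```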

func (GitHubClient) GetDefaultBranch

func (gcl GitHubClient) GetDefaultBranch(url string) (string, error)

func (GitHubClient) GetFileCreationTime

func (gcl GitHubClient) GetFileCreationTime(
	k GithubFileSpec) (time.Time, error)

GetFileCreationTime gets the earliest date of a file.

func (GitHubClient) GetFileData

func (gcl GitHubClient) GetFileData(k GithubFileSpec) ([]byte, error)

GetFileData gets the bytes from a file.

func (GitHubClient) GetRawUserContent

func (gcl GitHubClient) GetRawUserContent(query string) (*http.Response, error)

GetRawUserContent fetches raw user content (file contents). These requests are not subject to API rate limits, so there is no use in throttling this call.

func (GitHubClient) GetReposData

func (gcl GitHubClient) GetReposData(query string) (*http.Response, error)

GetReposData performs a search query and handles rate limiting for the '/repos' endpoint as well as timed retries in the case of abuse prevention.

func (GitHubClient) SearchGithubAPI

func (gcl GitHubClient) SearchGithubAPI(query string) (*http.Response, error)

SearchGithubAPI performs a search query and handles rate limiting for the 'search/code?' endpoint as well as timed retries in the case of abuse prevention.

type GithubFileSpec

type GithubFileSpec struct {
	Path       string `json:"path,omitempty"`
	Repository struct {
		API      string `json:"url,omitempty"`
		URL      string `json:"html_url,omitempty"`
		FullName string `json:"full_name,omitempty"`
	} `json:"repository,omitempty"`
}

type GithubResponseInfo

type GithubResponseInfo struct {
	*http.Response
	Parsed  *githubResponse
	Error   error
	NextURL string
	LastURL string
}

type Query

type Query []queryField

Example of formatting a query:

	QueryWith(
		Filename("kustomization.yaml"),
		Filesize(RangeWithin{64, 192}),
		Keyword("copyright"),
		Keyword("2019"),
	).String()

Outputs "q=filename:kustomization.yaml+size:64..192+copyright+2019", which searches for files whose size is in [64, 192] bytes (inclusive range) and that contain the keywords 'copyright' and '2019' somewhere in the file.
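To make the string format concrete, here is a small standalone sketch that joins fields with '+' behind a "q=" prefix. The `field` and `query` types and the lowercase helpers are hypothetical stand-ins for the package's unexported `queryField` and its constructors, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical stand-in for the package's unexported queryField.
type field string

func filename(f string) field { return field("filename:" + f) }

func sizeWithin(lo, hi uint64) field { return field(fmt.Sprintf("size:%d..%d", lo, hi)) }

func keyword(k string) field { return field(k) }

// query joins its fields with '+' and prefixes the result with "q=".
type query []field

func (q query) String() string {
	parts := make([]string, len(q))
	for i, f := range q {
		parts[i] = string(f)
	}
	return "q=" + strings.Join(parts, "+")
}

func main() {
	q := query{
		filename("kustomization.yaml"),
		sizeWithin(64, 192),
		keyword("copyright"),
		keyword("2019"),
	}
	fmt.Println(q) // q=filename:kustomization.yaml+size:64..192+copyright+2019
}
```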

func QueryWith

func QueryWith(qfs ...queryField) Query

func (Query) String

func (q Query) String() string

type RangeGreaterThan

type RangeGreaterThan struct {
	// contains filtered or unexported fields
}

RangeGreaterThan is a range of values strictly greater than (>) size.

func (RangeGreaterThan) RangeString

func (r RangeGreaterThan) RangeString() string

type RangeLessThan

type RangeLessThan struct {
	// contains filtered or unexported fields
}

RangeLessThan is a range of values strictly less than (<) size.

func (RangeLessThan) RangeString

func (r RangeLessThan) RangeString() string

type RangeWithin

type RangeWithin struct {
	// contains filtered or unexported fields
}

RangeWithin is an inclusive range from start to end.

func (RangeWithin) RangeString

func (r RangeWithin) RangeString() string
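The three range types can be pictured as implementations of one small interface. The sketch below assumes Github's search range syntax (`>n`, `<n`, `n..m`); the `ranger` interface, the lowercase types, and `sizeField` are hypothetical, and the strings actually returned by the package's RangeString methods are not shown in this documentation.

```go
package main

import "fmt"

// ranger mirrors the idea of the package's unexported rangeFormatter.
type ranger interface{ RangeString() string }

type greaterThan struct{ size uint64 }
type lessThan struct{ size uint64 }
type within struct{ start, end uint64 }

// Each type renders itself in the assumed search range syntax.
func (r greaterThan) RangeString() string { return fmt.Sprintf(">%d", r.size) }
func (r lessThan) RangeString() string { return fmt.Sprintf("<%d", r.size) }
func (r within) RangeString() string { return fmt.Sprintf("%d..%d", r.start, r.end) }

// sizeField prefixes a range with the "size:" qualifier.
func sizeField(r ranger) string { return "size:" + r.RangeString() }

func main() {
	for _, r := range []ranger{greaterThan{1000}, lessThan{64}, within{64, 192}} {
		fmt.Println(sizeField(r))
	}
}
```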

type RequestConfig

type RequestConfig struct {
	// contains filtered or unexported fields
}

RequestConfig stores common variables that must be present for the queries. It builds the following kinds of requests:

- CodeSearchRequests: ask Github to check the code indices given a query.
- ContentsRequests: ask Github where to download a resource given a repo and a file path.
- CommitsRequests: ask Github to list the commits made on a file. Useful to determine the date of a file.

func NewRequestConfig

func NewRequestConfig(perPage uint64, accessToken string) RequestConfig

func (RequestConfig) CodeSearchRequestWith

func (rc RequestConfig) CodeSearchRequestWith(query Query) request

CodeSearchRequestWith, given a list of query parameters that specify the (partial) query, returns a request object with the (partial) query. Call the URL method to get the string value of the URL. See request.CopyWith to understand why the request object is useful.

func (RequestConfig) CommitsRequest

func (rc RequestConfig) CommitsRequest(fullRepoName, path string) string

CommitsRequest, given the repo name and a filepath, returns a formatted query for the Github API to find the commits that affect this file.

func (RequestConfig) ContentsRequest

func (rc RequestConfig) ContentsRequest(fullRepoName, path string) string

ContentsRequest, given the repo name and the filepath, returns a formatted query for the Github API to find the download information of this filepath.
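For orientation, the sketch below builds URLs of the shape Github's REST API uses for a file's contents and for the commits touching a path. The `contentsURL` and `commitsURL` helpers and the `apiBase` constant are hypothetical; the exact strings produced by ContentsRequest and CommitsRequest (including how the access token and per-page setting are attached) are not shown in this documentation.

```go
package main

import (
	"fmt"
	"net/url"
)

// apiBase is the public REST API host (assumed for this sketch).
const apiBase = "https://api.github.com"

// contentsURL points at the contents endpoint for a file in a repo.
// Simplification: path is used as-is and not escaped.
func contentsURL(fullRepoName, path string) string {
	return fmt.Sprintf("%s/repos/%s/contents/%s", apiBase, fullRepoName, path)
}

// commitsURL lists commits affecting path, perPage entries at a time.
func commitsURL(fullRepoName, path string, perPage uint64) string {
	q := url.Values{}
	q.Set("path", path)
	q.Set("per_page", fmt.Sprint(perPage))
	return fmt.Sprintf("%s/repos/%s/commits?%s", apiBase, fullRepoName, q.Encode())
}

func main() {
	fmt.Println(contentsURL("owner/repo", "base/kustomization.yaml"))
	fmt.Println(commitsURL("owner/repo", "base/kustomization.yaml", 100))
}
```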

func (RequestConfig) ReposRequest

func (rc RequestConfig) ReposRequest(fullRepoName string) string
