Documentation ¶
Overview ¶
Package github implements the crawler.Crawler interface, getting data from the Github search API.
Index ¶
- func Filename(f string) queryField
- func Filesize(r rangeFormatter) queryField
- func FindRangesForRepoSearch(cache cachedSearch) ([]string, error)
- func Keyword(k string) queryField
- func NewCrawler(accessToken string, retryCount uint64, client *http.Client, query Query) githubCrawler
- func Path(p string) queryField
- type GitHubClient
- func (gcl GitHubClient) ForwardPaginatedQuery(ctx context.Context, query string, output chan<- GithubResponseInfo) error
- func (gcl GitHubClient) GetDefaultBranch(url string) (string, error)
- func (gcl GitHubClient) GetFileCreationTime(k GithubFileSpec) (time.Time, error)
- func (gcl GitHubClient) GetFileData(k GithubFileSpec) ([]byte, error)
- func (gcl GitHubClient) GetRawUserContent(query string) (*http.Response, error)
- func (gcl GitHubClient) GetReposData(query string) (*http.Response, error)
- func (gcl GitHubClient) SearchGithubAPI(query string) (*http.Response, error)
- type GithubFileSpec
- type GithubResponseInfo
- type Query
- type RangeGreaterThan
- type RangeLessThan
- type RangeWithin
- type RequestConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Filename ¶
func Filename(f string) queryField
Filename takes a filename and formats it according to the Github API.
func Filesize ¶
func Filesize(r rangeFormatter) queryField
Filesize takes a rangeFormatter and formats it according to the Github API.
func FindRangesForRepoSearch ¶
func FindRangesForRepoSearch(cache cachedSearch) ([]string, error)
FindRangesForRepoSearch outputs a (possibly incomplete) list of file size ranges to query, so that as many search results as possible can be retrieved within the limits of the Github search API. Github search only allows 1,000 results per query (paginated). Source: https://developer.github.com/v3/search/
A single file size may still have more than 1,000 results, in which case the search as it is cannot find all files of that size. If queries are sorted by last indexed and retrieved at regular intervals, this should be sufficient to get most, if not all, documents.
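The idea of splitting the file size space so that each range stays under the 1,000-result cap can be sketched as follows. This is a minimal illustration, not the package's implementation: the `countFn` callback, the `splitRanges` name, and the bisection strategy are all assumptions.

```go
package main

import "fmt"

// maxResults is the per-query cap of the Github search API.
const maxResults = 1000

// countFn is a hypothetical callback reporting how many results a
// query restricted to file sizes in [lo, hi] would return.
type countFn func(lo, hi uint64) uint64

// splitRanges bisects [lo, hi] until every sub-range has at most
// maxResults results or cannot be split further (lo == hi).
// A single size with more than maxResults results is emitted as is,
// which is why the resulting list can be incomplete.
func splitRanges(lo, hi uint64, count countFn) []string {
	if lo == hi || count(lo, hi) <= maxResults {
		return []string{fmt.Sprintf("%d..%d", lo, hi)}
	}
	mid := lo + (hi-lo)/2
	return append(splitRanges(lo, mid, count), splitRanges(mid+1, hi, count)...)
}

func main() {
	// Fake index: 500 files of every size from 0 to 7 bytes.
	count := func(lo, hi uint64) uint64 { return (hi - lo + 1) * 500 }
	fmt.Println(splitRanges(0, 7, count)) // prints [0..1 2..3 4..5 6..7]
}
```

Each returned string uses the same `start..end` form as the inclusive size range shown in the Query example.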
func Keyword ¶
func Keyword(k string) queryField
Keyword takes a single word, and formats it according to the Github API.
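func NewCrawler ¶
func NewCrawler(accessToken string, retryCount uint64, client *http.Client, query Query) githubCrawler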
Types ¶
type GitHubClient ¶
type GitHubClient struct {
	RequestConfig
	// contains filtered or unexported fields
}
func NewClient ¶
func NewClient(accessToken string, retryCount uint64, client *http.Client) GitHubClient
func (GitHubClient) ForwardPaginatedQuery ¶
func (gcl GitHubClient) ForwardPaginatedQuery(ctx context.Context, query string, output chan<- GithubResponseInfo) error
ForwardPaginatedQuery follows the links to the next pages and performs all of the queries for a given search query, relaying the data from each request back to an output channel.
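Github advertises the next page of a paginated result in the `Link` response header. The sketch below shows one way to pull out the `rel="next"` target; the `nextPageURL` helper is hypothetical and is not claimed to be how ForwardPaginatedQuery does it.

```go
package main

import (
	"fmt"
	"strings"
)

// nextPageURL extracts the rel="next" target from a Link header value.
// It returns "" when there is no further page.
// Simplification: it assumes the URLs contain no unescaped commas.
func nextPageURL(link string) string {
	for _, part := range strings.Split(link, ",") {
		segs := strings.Split(part, ";")
		if len(segs) < 2 {
			continue
		}
		target := strings.TrimSpace(segs[0])
		if !strings.HasPrefix(target, "<") || !strings.HasSuffix(target, ">") {
			continue
		}
		for _, param := range segs[1:] {
			if strings.TrimSpace(param) == `rel="next"` {
				// Strip the surrounding angle brackets.
				return target[1 : len(target)-1]
			}
		}
	}
	return ""
}

func main() {
	link := `<https://api.github.com/search/code?q=foo&page=2>; rel="next", ` +
		`<https://api.github.com/search/code?q=foo&page=34>; rel="last"`
	fmt.Println(nextPageURL(link))
}
```

A caller would keep requesting the returned URL and forwarding each response until the helper returns an empty string or the context is cancelled.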
func (GitHubClient) GetDefaultBranch ¶
func (gcl GitHubClient) GetDefaultBranch(url string) (string, error)
func (GitHubClient) GetFileCreationTime ¶
func (gcl GitHubClient) GetFileCreationTime(k GithubFileSpec) (time.Time, error)
GetFileCreationTime gets the earliest date of a file.
func (GitHubClient) GetFileData ¶
func (gcl GitHubClient) GetFileData(k GithubFileSpec) ([]byte, error)
GetFileData gets the bytes from a file.
func (GitHubClient) GetRawUserContent ¶
func (gcl GitHubClient) GetRawUserContent(query string) (*http.Response, error)
User content (file contents) is not API rate limited, so there's no use in throttling this call.
func (GitHubClient) GetReposData ¶
func (gcl GitHubClient) GetReposData(query string) (*http.Response, error)
GetReposData performs a search query and handles rate limiting for the '/repos' endpoint, as well as timed retries in the case of abuse prevention.
func (GitHubClient) SearchGithubAPI ¶
func (gcl GitHubClient) SearchGithubAPI(query string) (*http.Response, error)
SearchGithubAPI performs a search query and handles rate limiting for the 'code/search?' endpoint, as well as timed retries in the case of abuse prevention.
type GithubFileSpec ¶
type GithubResponseInfo ¶
type Query ¶
type Query []queryField
Example of formatting a query:
	QueryWith(
		Filename("kustomization.yaml"),
		Filesize(RangeWithin{64, 192}),
		Keyword("copyright"),
		Keyword("2019"),
	).String()
Outputs "q=filename:kustomization.yaml+size:64..192+copyright+2019", which would search for files that are between 64 and 192 bytes in size (inclusive range) and that contain the keywords 'copyright' and '2019' somewhere in the file.
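To make the output format concrete, here is a minimal, self-contained sketch of how such a query string could be assembled. The `field` and `query` types are hypothetical stand-ins for the unexported `queryField` and for `Query`; the real implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical stand-in for the unexported queryField type.
type field struct{ name, value string }

// String renders "name:value", or just the value for a bare keyword.
func (f field) String() string {
	if f.name == "" {
		return f.value
	}
	return f.name + ":" + f.value
}

// query is a hypothetical stand-in for Query.
type query []field

// String joins all fields with '+' after the "q=" parameter name.
func (q query) String() string {
	parts := make([]string, len(q))
	for i, f := range q {
		parts[i] = f.String()
	}
	return "q=" + strings.Join(parts, "+")
}

func main() {
	q := query{
		{"filename", "kustomization.yaml"},
		{"size", "64..192"},
		{"", "copyright"},
		{"", "2019"},
	}
	// prints q=filename:kustomization.yaml+size:64..192+copyright+2019
	fmt.Println(q.String())
}
```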
type RangeGreaterThan ¶
type RangeGreaterThan struct {
// contains filtered or unexported fields
}
RangeGreaterThan is a range of values strictly greater than (>) size.
func (RangeGreaterThan) RangeString ¶
func (r RangeGreaterThan) RangeString() string
type RangeLessThan ¶
type RangeLessThan struct {
// contains filtered or unexported fields
}
RangeLessThan is a range of values strictly less than (<) size.
func (RangeLessThan) RangeString ¶
func (r RangeLessThan) RangeString() string
type RangeWithin ¶
type RangeWithin struct {
// contains filtered or unexported fields
}
RangeWithin is an inclusive range from start to end.
func (RangeWithin) RangeString ¶
func (r RangeWithin) RangeString() string
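The three range types plausibly map onto the Github search syntax for numeric qualifiers: `>n`, `<n`, and `n..m`. The sketch below uses hypothetical lowercase stand-ins, since the real types have unexported fields; only the `64..192` form is confirmed by the Query example, and URL escaping of `<` and `>` is not handled here.

```go
package main

import "fmt"

// rangeFormatter mirrors the interface implied by the RangeString methods.
type rangeFormatter interface{ RangeString() string }

// Hypothetical stand-ins for RangeGreaterThan, RangeLessThan, RangeWithin.
type greaterThan struct{ size uint64 }
type lessThan struct{ size uint64 }
type within struct{ start, end uint64 }

// RangeString renders a strictly-greater-than range, e.g. ">1024".
func (r greaterThan) RangeString() string { return fmt.Sprintf(">%d", r.size) }

// RangeString renders a strictly-less-than range, e.g. "<64".
func (r lessThan) RangeString() string { return fmt.Sprintf("<%d", r.size) }

// RangeString renders an inclusive range, e.g. "64..192".
func (r within) RangeString() string { return fmt.Sprintf("%d..%d", r.start, r.end) }

// sizeField prefixes the formatted range with the "size:" qualifier.
func sizeField(r rangeFormatter) string { return "size:" + r.RangeString() }

func main() {
	for _, r := range []rangeFormatter{greaterThan{1024}, lessThan{64}, within{64, 192}} {
		fmt.Println(sizeField(r))
	}
}
```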
type RequestConfig ¶
type RequestConfig struct {
// contains filtered or unexported fields
}
RequestConfig stores common variables that must be present for the queries.
- CodeSearchRequests: ask Github to check the code indices given a query.
- ContentsRequests: ask Github where to download a resource given a repo and a file path.
- CommitsRequests: ask Github to list commits made on a file. Useful to determine the date of a file.
func NewRequestConfig ¶
func NewRequestConfig(perPage uint64, accessToken string) RequestConfig
func (RequestConfig) CodeSearchRequestWith ¶
func (rc RequestConfig) CodeSearchRequestWith(query Query) request
CodeSearchRequestWith, given a list of query parameters that specify the (partial) query, returns a request object with the (partial) query. Call the URL method to get the string value of the URL. See request.CopyWith to understand why the request object is useful.
func (RequestConfig) CommitsRequest ¶
func (rc RequestConfig) CommitsRequest(fullRepoName, path string) string
CommitsRequest, given the repo name and a file path, returns a formatted query for the Github API to find the commits that affect this file.
func (RequestConfig) ContentsRequest ¶
func (rc RequestConfig) ContentsRequest(fullRepoName, path string) string
ContentsRequest, given the repo name and the file path, returns a formatted query for the Github API to find the download information of this file path.
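For orientation, the Github REST API exposes file contents under `/repos/{owner}/{repo}/contents/{path}` and per-file commit history under `/repos/{owner}/{repo}/commits?path=...`. The sketch below builds both URLs; the `contentsURL` and `commitsURL` helpers are hypothetical, and the package's own requests may add further parameters (for example a page size) that are not shown here.

```go
package main

import (
	"fmt"
	"net/url"
)

// apiRoot is the base URL of the public Github REST API.
const apiRoot = "https://api.github.com"

// contentsURL builds the URL describing where to download a file.
// fullRepoName has the form "owner/repo".
func contentsURL(fullRepoName, path string) string {
	return fmt.Sprintf("%s/repos/%s/contents/%s", apiRoot, fullRepoName, path)
}

// commitsURL builds the URL listing the commits that touch a file.
// The path is passed as an escaped query parameter.
func commitsURL(fullRepoName, path string) string {
	v := url.Values{}
	v.Set("path", path)
	return fmt.Sprintf("%s/repos/%s/commits?%s", apiRoot, fullRepoName, v.Encode())
}

func main() {
	fmt.Println(contentsURL("kubernetes-sigs/kustomize", "README.md"))
	fmt.Println(commitsURL("kubernetes-sigs/kustomize", "README.md"))
}
```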
func (RequestConfig) ReposRequest ¶
func (rc RequestConfig) ReposRequest(fullRepoName string) string