Version: v1.0.0

This package is not in the latest version of its module.

Published: Jun 9, 2015 License: BSD-3-Clause Imports: 5 Imported by: 2



Package config takes care of the configuration file parsing.





type Config

type Config struct {
	// CloneDir is the path to the folder where all repositories are cloned.
	CloneDir string `json:"clone_dir"`

	// TarRepos tells whether repositories shall be stored as tar archives.
	TarRepos bool `json:"tar_repositories"`

	// TmpDir can be used to specify a temporary working directory. If
	// left unspecified, the default system temporary directory will be used.
	// If you have a ramdisk, you are advised to use it here.
	TmpDir string `json:"tmp_dir"`

	// TmpDirFileSizeLimit can be used to specify the maximum size in GB of an
	// object to be temporarily placed in TmpDir for processing. Files larger
	// than this value will not be processed in TmpDir.
	TmpDirFileSizeLimit float64 `json:"tmp_dir_file_size_limit"`

	// MaxFetcherWorkers defines the maximum number of workers for the
	// repositories fetching task.
	// It defaults to 1, but if your machine has good I/O throughput and a good
	// CPU, you probably want to increase this conservative value for
	// performance reasons. Note that fetching is I/O- and network-bound
	// more than CPU-bound, so you probably do not want to increase this
	// value too much.
	MaxFetcherWorkers uint `json:"max_fetcher_workers"`

	// FetchTimeInterval corresponds to the time to wait between two full
	// repository-fetching periods.
	FetchTimeInterval string `json:"fetch_time_interval"`

	// FetchLanguages is the list of programming languages to fetch.
	// If the list is empty or nil, the fetcher will fetch all repositories,
	// independently of the language.
	FetchLanguages []string `json:"fetch_languages"`

	// ThrottlerWaitTime can be used to specify how much time to wait, in
	// seconds, before resuming normal operations if the error rate is too high
	// (defaults to 1800).
	ThrottlerWaitTime uint `json:"throttler_wait_time"`

	// SlidingWindowSize can be used to specify the sliding window size to
	// consider for error throttling (defaults to 60).
	SlidingWindowSize uint `json:"throttler_sliding_window_size"`

	// LeakInterval corresponds to the time, in milliseconds, the throttler
	// waits before discarding an error (defaults to 1000, i.e. 1 second).
	LeakInterval uint `json:"throttler_leak_interval"`

	// Crawlers is the list of crawler configurations.
	Crawlers []CrawlerConfig `json:"crawlers"`

	// CrawlingTimeInterval corresponds to the time to wait between 2 full
	// crawling periods.
	CrawlingTimeInterval string `json:"crawling_time_interval"`

	// Database is the database configuration.
	Database DatabaseConfig `json:"database"`
}

Config is the main configuration structure.

func ReadConfig

func ReadConfig(path string) (*Config, error)

ReadConfig reads a JSON formatted configuration file, verifies the values of the configuration parameters and fills the Config structure.
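The decode-then-verify flow described for ReadConfig can be approximated with encoding/json plus explicit checks. This is a sketch, not the package's implementation: parseConfig is a hypothetical variant that takes bytes instead of a path and keeps only three fields, and both the "clone_dir is required" rule and the assumption that fetch_time_interval is a Go duration string (parsed with time.ParseDuration) are guesses.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Config is a trimmed copy of the documented Config that keeps the original
// JSON tags, so the same file format decodes into it.
type Config struct {
	CloneDir          string   `json:"clone_dir"`
	FetchTimeInterval string   `json:"fetch_time_interval"`
	FetchLanguages    []string `json:"fetch_languages"`
}

// parseConfig (hypothetical) decodes JSON and then verifies the values.
func parseConfig(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("config: invalid JSON: %w", err)
	}
	if c.CloneDir == "" {
		return nil, errors.New("config: clone_dir must be set")
	}
	// Assumption: intervals use Go duration syntax, e.g. "24h" or "90m".
	if _, err := time.ParseDuration(c.FetchTimeInterval); err != nil {
		return nil, fmt.Errorf("config: fetch_time_interval: %w", err)
	}
	return &c, nil
}

func main() {
	c, err := parseConfig([]byte(`{"clone_dir": "/srv/repos", "fetch_time_interval": "24h"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.CloneDir, c.FetchTimeInterval)
}
```

To read from a file as ReadConfig does, the bytes could come from os.ReadFile(path) before calling parseConfig.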

type CrawlerConfig

type CrawlerConfig struct {
	// Type defines the crawler type (e.g. "github").
	Type string `json:"type"`

	// Languages is the list of programming languages of interest.
	Languages []string `json:"languages"`

	// Limit limits the number of repositories to crawl. Set this value to 0 to
	// disable the limit. Otherwise, crawling will stop when "limit"
	// repositories have been fetched.
	// Note that the behavior differs depending on whether UseSearchAPI is set
	// to true. When using the search API, this limit corresponds to the
	// number of repositories to crawl per language listed in "languages".
	// Otherwise, this is a global limit, regardless of the language.
	Limit int64 `json:"limit"`

	// SinceID corresponds to the repository ID (e.g. GitHub repository ID in
	// the case of the github crawler) from which to start querying repositories.
	// Note that this value is ignored when using the search API.
	SinceID int `json:"since_id"`

	// Fork indicates whether forked repositories should be crawled.
	Fork bool `json:"fork"`

	// OAuthAccessToken is the API token. If it is not provided, crawld still
	// works, but the number of API calls is usually limited to a low number.
	// For instance, in the case of the GitHub crawler, unauthenticated
	// requests are limited to 60 per hour, whereas authenticated requests go
	// up to 5000 per hour.
	OAuthAccessToken string `json:"oauth_access_token"`

	// UseSearchAPI specifies whether to use the search API or not. The number
	// of results returned by a search API is usually limited. For instance,
	// the GitHub search API limits the results to 1000 repositories.
	// In the case of the github crawler, this means that the maximum number of
	// repositories that can be crawled is 1000 per language (the github crawler
	// orders the results by repository popularity with regard to the number of
	// stars). When a lot of data is wanted, this option shall therefore be set
	// to false.
	UseSearchAPI bool `json:"use_search_api"`
}

CrawlerConfig is a configuration for a crawler.
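The Limit rule depends on UseSearchAPI: with the search API the limit applies per language, otherwise a single global counter is used, and 0 disables it. The hypothetical limitTracker type below only illustrates that counting rule; it is not how the crawler itself tracks progress.

```go
package main

import "fmt"

// limitTracker (hypothetical) counts crawled repositories the way the Limit
// field comment describes.
type limitTracker struct {
	limit        int64
	useSearchAPI bool
	total        int64
	perLanguage  map[string]int64
}

func newLimitTracker(limit int64, useSearchAPI bool) *limitTracker {
	return &limitTracker{
		limit:        limit,
		useSearchAPI: useSearchAPI,
		perLanguage:  make(map[string]int64),
	}
}

// record counts one crawled repository written in lang.
func (t *limitTracker) record(lang string) {
	t.total++
	t.perLanguage[lang]++
}

// reached reports whether crawling should stop for lang.
func (t *limitTracker) reached(lang string) bool {
	if t.limit == 0 {
		return false // 0 means "no limit"
	}
	if t.useSearchAPI {
		return t.perLanguage[lang] >= t.limit // per-language limit
	}
	return t.total >= t.limit // global limit, regardless of language
}

func main() {
	search := newLimitTracker(2, true)
	global := newLimitTracker(2, false)
	for _, lang := range []string{"go", "go", "rust"} {
		search.record(lang)
		global.record(lang)
	}
	fmt.Println(search.reached("go"), search.reached("rust")) // true false
	fmt.Println(global.reached("go"), global.reached("rust")) // true true
}
```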

type DatabaseConfig

type DatabaseConfig struct {
	// HostName is the hostname, or IP address, of the database server.
	HostName string `json:"hostname"`

	// Port is the PostgreSQL port.
	Port uint `json:"port"`

	// UserName is the PostgreSQL user that has access to the database.
	UserName string `json:"username"`

	// Password is the password of the database user.
	Password string `json:"password"`

	// DBName is the database name.
	DBName string `json:"dbname"`

	// SSLMode defines the SSL mode for the connection to the database.
	// Refer to sslModes for the possible values and their meaning.
	SSLMode string `json:"ssl_mode"`
}

DatabaseConfig holds the PostgreSQL database connection information.
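The DatabaseConfig fields map directly onto the keywords of a libpq keyword/value connection string. The sketch below assumes that format is what the database driver expects (as it is for github.com/lib/pq, whose driver name is "postgres"); connString and quote are hypothetical helpers, not part of this package. Values are single-quoted with backslashes and quotes escaped, following the libpq quoting rules, so passwords containing spaces or quotes survive.

```go
package main

import (
	"fmt"
	"strings"
)

// DatabaseConfig mirrors the documented struct (JSON tags omitted).
type DatabaseConfig struct {
	HostName string
	Port     uint
	UserName string
	Password string
	DBName   string
	SSLMode  string
}

// quote wraps v in single quotes, escaping backslashes first and then
// single quotes, as libpq keyword/value syntax requires.
func quote(v string) string {
	v = strings.ReplaceAll(v, `\`, `\\`)
	v = strings.ReplaceAll(v, `'`, `\'`)
	return "'" + v + "'"
}

// connString (hypothetical) builds a keyword/value connection string.
func connString(c DatabaseConfig) string {
	return fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=%s sslmode=%s",
		quote(c.HostName), c.Port, quote(c.UserName),
		quote(c.Password), quote(c.DBName), quote(c.SSLMode))
}

func main() {
	c := DatabaseConfig{HostName: "localhost", Port: 5432, UserName: "crawld",
		Password: "secret", DBName: "repos", SSLMode: "disable"}
	fmt.Println(connString(c))
}
```

With such a driver imported, the result could be passed to sql.Open("postgres", connString(c)) from database/sql.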
