Documentation ¶
Index ¶
- func Btoi(b bool) int
- func DetectContributorType(name string, gender int) string
- func DetectGender(name string) int
- func ExpandUrl(url string) string
- func FacebookAccountDetails(territoryName string, account string)
- func FacebookPostsOut(posts []FacebookPost, territoryName string, params FacebookParams) (int, string, time.Time)
- func GetHarvestMd5(text string) string
- func GetKeywords(text string, minSize int, limit int) []string
- func GooglePlusAccountDetails(territoryName string, account string)
- func GooglePlusActivityByAccount(territoryName string, harvestState config.HarvestState, account string, ...) (url.Values, config.HarvestState)
- func GooglePlusActivitySearch(territoryName string, harvestState config.HarvestState, query string, ...) (url.Values, config.HarvestState)
- func InstagramAccountDetails(territoryName string, account string)
- func InstagramFindTags(keyword string) string
- func InstagramSearch(territoryName string, harvestState config.HarvestState, tag string, ...) (url.Values, config.HarvestState)
- func IsQuestion(text string, regexString ...string) bool
- func IsStopKeyword(word string) bool
- func LocaleToLanguageISO(code ...string) string
- func Log(event []byte, channelName string)
- func LogJson(message interface{}, channelName string)
- func New(configuration config.SocialHarvestConf, database *config.SocialHarvestDB)
- func NewFacebook(servicesConfig config.ServicesConfig)
- func NewFacebookTerritoryCredentials(territory string)
- func NewGenderData(femaleFilename string, maleFilename string)
- func NewGooglePlus(servicesConfig config.ServicesConfig)
- func NewGooglePlusTerritoryCredentials(territory string)
- func NewInstagram(servicesConfig config.ServicesConfig)
- func NewInstagramTerritoryCredentials(territory string)
- func NewLoggers(dir string)
- func NewTwitter(servicesConfig config.ServicesConfig)
- func NewTwitterTerritoryCredentials(territory string)
- func NewYouTube(servicesConfig config.ServicesConfig)
- func NewYouTubeTerritoryCredentials(territory string)
- func StoreHarvestedData(message interface{})
- func TwitterAccountDetails(territoryName string, account string)
- func TwitterAccountStream(territoryName string, harvestState config.HarvestState, options url.Values) (url.Values, config.HarvestState)
- func TwitterSearch(territoryName string, harvestState config.HarvestState, query string, ...) (url.Values, config.HarvestState)
- func YouTubeAccountDetails(territoryName string, account string)
- type FacebookAccount
- type FacebookParams
- type FacebookPost
- type MessageTag
- type PagingResult
- type TimeoutTransport
- type UsCensusName
- type Worker
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DetectContributorType ¶
func DetectContributorType(name string, gender int) string
Attempts to determine the contributor type (person, company, etc.) when a service API doesn't provide it. To do this, we need a few values to test. TODO: More work on this...
func DetectGender ¶
func DetectGender(name string) int
Detects gender based on the US Census name database. Returns 0 for unknown, -1 for female, and 1 for male.
func FacebookAccountDetails ¶
func FacebookAccountDetails(territoryName string, account string)
Harvests Facebook account details to track changes in likes, etc. (public pages only).
func FacebookPostsOut ¶
func FacebookPostsOut(posts []FacebookPost, territoryName string, params FacebookParams) (int, string, time.Time)
Takes a slice of FacebookPost structs, converts them to JSON, and logs them to file (to be picked up by Fluentd, Logstash, Ik, etc.).
func GetHarvestMd5 ¶
func GetHarvestMd5(text string) string
Turns the harvest id into an MD5 string. A simple concatenation would work, but some databases (such as MySQL) limit the length of unique key values, so a fixed-length MD5 hash fits without worry.
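A minimal sketch of the hashing step, assuming GetHarvestMd5 returns the hex-encoded MD5 digest (the encoding isn't stated here). harvestMd5 is a hypothetical stand-in, not the package's code. MD5 serves only to produce a short, fixed-length key here, not for security.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
)

// harvestMd5 returns the hex-encoded MD5 digest of text. The digest is always
// 16 bytes (32 hex characters), however long the harvest id is.
func harvestMd5(text string) string {
	sum := md5.Sum([]byte(text)) // sum is a [16]byte array
	return hex.EncodeToString(sum[:])
}
```

Because the output length never varies, it can back a unique key column of fixed width regardless of how many parts went into the harvest id.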
func GooglePlusAccountDetails ¶
func GooglePlusAccountDetails(territoryName string, account string)
Harvests Google+ account details to track changes in followers, etc. (NOTE: Pages can't currently be tracked with the existing API because access to it is invite-only.)
func GooglePlusActivityByAccount ¶
func GooglePlusActivityByAccount(territoryName string, harvestState config.HarvestState, account string, options url.Values) (url.Values, config.HarvestState)
Gets public Google+ activities (posts) by account.
func GooglePlusActivitySearch ¶
func GooglePlusActivitySearch(territoryName string, harvestState config.HarvestState, query string, options url.Values) (url.Values, config.HarvestState)
Gets Google+ activities (posts) by searching for a keyword.
func InstagramAccountDetails ¶
func InstagramAccountDetails(territoryName string, account string)
Harvests Instagram account details to track changes in followers, etc.
func InstagramFindTags ¶
func InstagramFindTags(keyword string) string
Tries to find tags based on a keyword. Only one is returned for now, since that's all we need for our purposes.
func InstagramSearch ¶
func InstagramSearch(territoryName string, harvestState config.HarvestState, tag string, options url.Values) (url.Values, config.HarvestState)
Gets recent Instagram media related to a specific tag.
func IsQuestion ¶
func IsQuestion(text string, regexString ...string) bool
Detects questions in messages.
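The variadic regexString parameter suggests callers can supply their own pattern. A sketch of that shape follows; isQuestion is hypothetical, and both its default pattern and its choice to return false for an invalid pattern are assumptions rather than the package's behavior.

```go
package main

import "regexp"

// defaultQuestionPattern is a hypothetical default: a question mark anywhere,
// or a message that opens with a common interrogative word.
const defaultQuestionPattern = `(?i)\?|^\s*(who|what|when|where|why|how)\b`

// isQuestion reports whether text looks like a question. An optional regular
// expression replaces the default pattern; an invalid pattern yields false.
func isQuestion(text string, regexString ...string) bool {
	pattern := defaultQuestionPattern
	if len(regexString) > 0 && regexString[0] != "" {
		pattern = regexString[0]
	}
	matched, err := regexp.MatchString(pattern, text)
	return err == nil && matched
}
```

The word boundary keeps messages such as "however, it works" from matching "how". For heavy traffic, compiling the pattern once with regexp.MustCompile would avoid recompiling it on every call.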
func IsStopKeyword ¶
func IsStopKeyword(word string) bool
Reports whether a word is in the stop word list used for keyword extraction. The list is also available in data/keyword-stop-list.txt (mostly; more words were added after testing).
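A sketch of a stop-word check using a set (a map with empty struct values) for constant-time lookups. The word list here is a tiny hypothetical sample, not the contents of data/keyword-stop-list.txt, and case-insensitive matching is an assumption.

```go
package main

import "strings"

// stopKeywords is a tiny hypothetical sample of stop words.
var stopKeywords = map[string]struct{}{
	"a": {}, "an": {}, "and": {}, "the": {}, "of": {}, "to": {}, "is": {},
}

// isStopKeyword reports whether word should be skipped during keyword
// extraction. Matching ignores case and surrounding whitespace.
func isStopKeyword(word string) bool {
	_, ok := stopKeywords[strings.ToLower(strings.TrimSpace(word))]
	return ok
}

// filterKeywords drops stop words, keeping the remaining words in order.
func filterKeywords(words []string) []string {
	kept := make([]string, 0, len(words))
	for _, w := range words {
		if !isStopKeyword(w) {
			kept = append(kept, w)
		}
	}
	return kept
}
```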
func LocaleToLanguageISO ¶
func LocaleToLanguageISO(code ...string) string
Simple method for converting locale values like "en_US" (or even "en-US") to ISO 639-1 (which would just be "en").
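A sketch of the conversion, assuming only the first argument matters and that the language code is everything before the first "_" or "-". localeToLanguageISO is hypothetical; it does not validate the result against the ISO 639-1 list. For full BCP 47 parsing, the golang.org/x/text/language package is the more robust route.

```go
package main

import "strings"

// localeToLanguageISO converts a locale such as "en_US" or "en-US" to its
// language code ("en"). It returns "" when no argument is given.
func localeToLanguageISO(code ...string) string {
	if len(code) == 0 {
		return ""
	}
	locale := strings.TrimSpace(code[0])
	// Cut at the first underscore or hyphen, whichever comes first.
	if i := strings.IndexAny(locale, "_-"); i >= 0 {
		locale = locale[:i]
	}
	return strings.ToLower(locale)
}
```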
func LogJson ¶
func LogJson(message interface{}, channelName string)
Converts the given message to JSON before sending the resulting bytes to Log().
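A sketch of the marshal-then-log pattern. logFunc is a hypothetical stand-in for Log() so the example runs on its own. Unlike LogJson, which returns nothing, this version returns the marshalling error so the caller can decide what to do with values that can't be encoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logFunc stands in for Log(event []byte, channelName string). It is a
// variable here so the sketch can run without the real loggers.
var logFunc = func(event []byte, channelName string) {
	fmt.Printf("[%s] %s\n", channelName, event)
}

// logJSON marshals message to JSON and hands the bytes to logFunc. Values
// that encoding/json cannot encode (channels, funcs, ...) are not logged.
func logJSON(message interface{}, channelName string) error {
	b, err := json.Marshal(message)
	if err != nil {
		return fmt.Errorf("logJSON: %w", err)
	}
	logFunc(b, channelName)
	return nil
}
```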
func New ¶
func New(configuration config.SocialHarvestConf, database *config.SocialHarvestDB)
Sets up a new harvester with the given configuration (which is made up of several "services").
func NewFacebook ¶
func NewFacebook(servicesConfig config.ServicesConfig)
Sets the appToken for future use (global).
func NewFacebookTerritoryCredentials ¶
func NewFacebookTerritoryCredentials(territory string)
Sets up credentials when the territory has a different appToken to use.
func NewGenderData ¶
func NewGenderData(femaleFilename string, maleFilename string)
Loads data from CSV files in order to detect gender. If new files are being used, call this again.
func NewGooglePlus ¶
func NewGooglePlus(servicesConfig config.ServicesConfig)
func NewGooglePlusTerritoryCredentials ¶
func NewGooglePlusTerritoryCredentials(territory string)
Sets up credentials when the territory has different keys to use.
func NewInstagram ¶
func NewInstagram(servicesConfig config.ServicesConfig)
Sets the client for future use.
func NewInstagramTerritoryCredentials ¶
func NewInstagramTerritoryCredentials(territory string)
Sets up credentials when the territory has different keys to use.
func NewLoggers ¶
func NewLoggers(dir string)
Creates and configures new workers on each of the logging channels and sets the directory path to store the log files.
func NewTwitter ¶
func NewTwitter(servicesConfig config.ServicesConfig)
func NewTwitterTerritoryCredentials ¶
func NewTwitterTerritoryCredentials(territory string)
Sets up credentials when the territory has different keys to use.
func NewYouTube ¶
func NewYouTube(servicesConfig config.ServicesConfig)
func NewYouTubeTerritoryCredentials ¶
func NewYouTubeTerritoryCredentials(territory string)
Sets up credentials when the territory has different keys to use.
func StoreHarvestedData ¶
func StoreHarvestedData(message interface{})
Rather than using an observer, just call this function directly (the observer was causing memory leaks). TODO: Look back into channels in the future, because I like the idea of pub/sub and it could expand into something useful. What I don't like about channels (and the reason I used the observer) is having to pass all the configuration around.
func TwitterAccountDetails ¶
func TwitterAccountDetails(territoryName string, account string)
Harvests Twitter account details to track changes in followers, etc.
func TwitterAccountStream ¶
func TwitterAccountStream(territoryName string, harvestState config.HarvestState, options url.Values) (url.Values, config.HarvestState)
Harvests from a specific Twitter account stream
func TwitterSearch ¶
func TwitterSearch(territoryName string, harvestState config.HarvestState, query string, options url.Values) (url.Values, config.HarvestState)
Searches for status updates and passes each Tweet along. No special mapping is required (unlike FacebookPost{}), because the Tweet struct is shared across multiple API calls, which isn't the case for Facebook.
All "search" functions (and anything else that gets data from an API) now normalize the data, mapping it to a Social Harvest struct. This means there is no way to get the original data from the service, either back in the main app or from any other Go package that imports the harvester. This is fine: anyone who wants the original data can use packages like anaconda directly. All data pulled from each service's API is sent to a channel (the harvester observer).
However, this function should NOT be called in a goroutine. We don't want to make multiple API calls in parallel (rate limits).
NOTE: The number of items sent to the observer is returned along with the last message's time and id. The main package can record this in the harvest logs/table; the harvester does not keep track of this information itself. Its only job is to gather data, send it to the channel, and report back on how much was sent (and the last id/time). Period. It doesn't care whether the data is stored in a database, logged, or streamed out from an API. It just harvests and sends without looking or caring. Previously it handled the database calls, logging, etc. itself; the observer now takes care of all of that, and those other processes simply subscribe and listen.
The first two arguments are always the territory name and the position in the harvest (HarvestState). The rest vary by API but are typically the query and options. @return options (for pagination), count of items, last id, last time.
func YouTubeAccountDetails ¶
func YouTubeAccountDetails(territoryName string, account string)
Harvests YouTube channel details to track changes in subscribers. (In theory, account could be a comma-separated list of account names.)
Types ¶
type FacebookAccount ¶
type FacebookAccount struct {
	// "id" must exist in the response. Note that encoding/json ignores the "required" option, so this is not enforced.
	Id              string `json:"id,required"`
	About           string `json:"about"`
	Category        string `json:"category"`
	Checkins        int    `json:"checkins"`
	CompanyOverview string `json:"company_overview"`
	Description     string `json:"description"`
	Founded         string `json:"founded"`
	GeneralInfo     string `json:"general_info"`
	Likes           int    `json:"likes"`
	Link            string `json:"link"`
	Location        struct {
		Street    string  `json:"street"`
		City      string  `json:"city"`
		State     string  `json:"state"`
		Zip       string  `json:"zip"`
		Country   string  `json:"country"`
		Longitude float64 `json:"longitude"`
		Latitude  float64 `json:"latitude"`
	} `json:"location"`
	Name              string `json:"name"`
	Phone             string `json:"phone"`
	TalkingAboutCount int    `json:"talking_about_count"`
	WereHereCount     int    `json:"were_here_count"`
	Username          string `json:"username"`
	Website           string `json:"website"`
	Products          string `json:"products"`
	// User specific (the above is a mix of page and user)
	Gender    string `json:"gender"`
	Locale    string `json:"locale"`
	FirstName string `json:"first_name"`
	LastName  string `json:"last_name"`
}
Facebook accounts can be for a user or a page
func FacebookGetUserInfo ¶
func FacebookGetUserInfo(id string, params FacebookParams) FacebookAccount
Gets basic info about an account on Facebook
type FacebookParams ¶
type FacebookParams struct {
	IncludeEntities string `url:"include_entities,omitempty"`
	Limit           string `url:"limit,omitempty"`
	Count           string `url:"count,omitempty"`
	Type            string `url:"type,omitempty"`
	Lang            string `url:"lang,omitempty"`
	Q               string `url:"q,omitempty"`
	AccessToken     string `url:"access_token,omitempty"`
	Until           string `url:"until,omitempty"`
	Since           string `url:"since,omitempty"`
}
func FacebookFeed ¶
func FacebookFeed(territoryName string, harvestState config.HarvestState, account string, params FacebookParams) (FacebookParams, config.HarvestState)
Gets the public posts for a given user or page id (or name, actually).
func FacebookSearch ¶
func FacebookSearch(territoryName string, harvestState config.HarvestState, params FacebookParams) (FacebookParams, config.HarvestState)
Searches public posts on Facebook
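The url struct tags on FacebookParams appear to follow the convention of the github.com/google/go-querystring package. As a self-contained illustration of how such tags become a query string, here is a small reflect-based encoder. It handles only string fields and the omitempty option, and facebookParams is a trimmed, hypothetical copy of the struct.

```go
package main

import (
	"net/url"
	"reflect"
	"strings"
)

// facebookParams is a trimmed, hypothetical copy of FacebookParams.
type facebookParams struct {
	Limit       string `url:"limit,omitempty"`
	Q           string `url:"q,omitempty"`
	AccessToken string `url:"access_token,omitempty"`
	Since       string `url:"since,omitempty"`
}

// encodeParams converts a struct's `url:"name,omitempty"` tagged string
// fields into url.Values. Untagged, "-" tagged, and non-string fields are
// skipped; empty values are skipped when omitempty is set.
func encodeParams(v interface{}) url.Values {
	values := url.Values{}
	rv := reflect.ValueOf(v)
	if rv.Kind() == reflect.Ptr {
		rv = rv.Elem()
	}
	if rv.Kind() != reflect.Struct {
		return values
	}
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		tag := rt.Field(i).Tag.Get("url")
		if tag == "" || tag == "-" {
			continue
		}
		parts := strings.Split(tag, ",")
		omitEmpty := false
		for _, opt := range parts[1:] {
			if opt == "omitempty" {
				omitEmpty = true
			}
		}
		fv := rv.Field(i)
		if fv.Kind() != reflect.String || (omitEmpty && fv.String() == "") {
			continue
		}
		values.Set(parts[0], fv.String())
	}
	return values
}
```

For example, encodeParams(facebookParams{Q: "social harvest", Limit: "25"}).Encode() yields limit=25&q=social+harvest, since url.Values.Encode sorts by key and escapes spaces as "+".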
type FacebookPost ¶
type FacebookPost struct {
	// "id" must exist in the response. Note that encoding/json ignores the "required" option, so this is not enforced.
	Id   string `json:"id,required"`
	From struct {
		Id       string `json:"id"`
		Name     string `json:"name"`
		Category string `json:"category"`
	} `json:"from"`
	To struct {
		Data []struct {
			Id       string `json:"id"`
			Name     string `json:"name"`
			Category string `json:"category"`
		} `json:"data"`
	} `json:"to"`
	CreatedTime string `json:"created_time"`
	UpdatedTime string `json:"updated_time"`
	Message     string `json:"message"`
	Description string `json:"description"`
	Caption     string `json:"caption"`
	Picture     string `json:"picture"`
	Source      string `json:"source"`
	Link        string `json:"link"`
	Shares      struct {
		Count int `json:"count"`
	} `json:"shares"`
	Name string `json:"name"`
	// Should always be "post", right? No: Facebook also includes "status", "link", and "photo" here, even with the type param set to post. Seems like something changed/broke.
	Type string `json:"type"`
	// This can tell us if the user is posting from a mobile device...with some logic. Or just which client apps/SaaS products are most popular to post from (also true for Twitter and could be good data to have).
	Application struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
		Id        string `json:"id"`
	} `json:"application"`
	MessageTags map[string][]*MessageTag `json:"message_tags"`
	StoryTags   map[string][]*MessageTag `json:"story_tags"`
	Story       string                   `json:"story"`
	// Typically accompanies items of type photo.
	ObjectId string `json:"object_id"`
	// This only exists on user/page /feed items...and it'll usually be "shared_story", but sometimes I've seen "mobile_status_update" ... which tells us the user is on a mobile device.
	// Is it important to keep? I don't know. Probably not right now.
	StatusType string `json:"status_type"`
}
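A sketch of decoding a post into nested anonymous structs like those in FacebookPost. post is a trimmed, hypothetical copy of the struct. Because encoding/json ignores tag options it doesn't recognize, such as "required", the sketch checks for a missing id by hand.

```go
package main

import (
	"encoding/json"
	"errors"
)

// post is a trimmed, hypothetical copy of FacebookPost.
type post struct {
	Id   string `json:"id"`
	From struct {
		Id   string `json:"id"`
		Name string `json:"name"`
	} `json:"from"`
	Message string `json:"message"`
	Shares  struct {
		Count int `json:"count"`
	} `json:"shares"`
	Type string `json:"type"`
}

// decodePost unmarshals one post and rejects it if "id" is missing, since
// the json tag alone cannot enforce that.
func decodePost(data []byte) (post, error) {
	var p post
	if err := json.Unmarshal(data, &p); err != nil {
		return p, err
	}
	if p.Id == "" {
		return p, errors.New("facebook post is missing an id")
	}
	return p, nil
}
```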
type MessageTag ¶
type PagingResult ¶
type TimeoutTransport ¶
type UsCensusName ¶
For determining gender, we use the US Census database (https://www.census.gov/genealogy/www/data/1990surnames/names_files.html). Note: we could also statistically guess ethnicity (https://www.census.gov/genealogy/www/data/2000surnames/index.html). Frequency is the value we want. Cumulative frequency is relative to all names in the database, so if there is a tie (for example, "Pat" being both a male and a female name), we can look at the cumulative frequency to see whether the Census saw more Pats who were male or female. This should be extremely rare, and it may not be a great way to break ties, but it works.
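A sketch of the lookup described above, using DetectGender's return convention (-1 female, 1 male, 0 unknown). censusName is a hypothetical stand-in for UsCensusName, whose fields aren't shown here; names are assumed to be stored uppercase, and the tie-break direction (larger cumulative frequency wins) is an assumption, since the text doesn't spell it out.

```go
package main

import "strings"

// censusName is a hypothetical stand-in for UsCensusName.
type censusName struct {
	Frequency           float64 // share of people in the list with this name
	CumulativeFrequency float64 // running total down the ranked list
}

// detectGender returns -1 for female, 1 for male, and 0 for unknown.
func detectGender(name string, female, male map[string]censusName) int {
	key := strings.ToUpper(strings.TrimSpace(name))
	f, inFemale := female[key]
	m, inMale := male[key]
	switch {
	case !inFemale && !inMale:
		return 0
	case inFemale && !inMale:
		return -1
	case inMale && !inFemale:
		return 1
	}
	// The name is in both lists (like "Pat"): compare frequencies first.
	switch {
	case f.Frequency > m.Frequency:
		return -1
	case m.Frequency > f.Frequency:
		return 1
	case f.CumulativeFrequency > m.CumulativeFrequency:
		// Rare exact tie: assumed rule, the larger cumulative frequency wins.
		return -1
	case m.CumulativeFrequency > f.CumulativeFrequency:
		return 1
	}
	return 0
}
```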
type Worker ¶
type Worker struct {
// contains filtered or unexported fields
}
func NewWorker ¶
Each worker gets an id and a series name, which are combined to form a file name and directory within the root directory defined in the Social Harvest configuration.