Documentation ¶
Overview ¶
Package wikipedia fetches Wikipedia articles
Index ¶
- Variables
- func Languages(supported []language.Tag) ([]language.Tag, []language.Tag)
- type Aliases
- type Award
- type City
- type Claims
- type Coordinate
- type Country
- type DateTime
- type Definition
- type Descriptions
- type Education
- type Fetcher
- type File
- type FileType
- type Instrument
- type Interment
- type Item
- type JiveData
- type Labels
- type Member
- type Military
- type Nomination
- type Population
- type PostgreSQL
- type Quantity
- type Spouse
- type Synonym
- type Team
- type Text
- type Wikidata
- type Wikipedia
- type Wikiquote
- type Wiktionary
Constants ¶
This section is empty.
Variables ¶
var Available = map[language.Tag]struct{}{}/* 295 elements not displayed */
Available is a map of all languages that Wikipedia supports: https://en.wikipedia.org/wiki/List_of_Wikipedias. There are also separate entries for Wiktionary and Wikiquote:
https://en.wiktionary.org/wiki/Wiktionary:List_of_languages https://en.wikiquote.org/wiki/Wikiquote:Other_language_Wikiquotes
We sort their table by number of articles, descending.
var CirrusURL, _ = url.Parse("https://dumps.wikimedia.org/other/cirrussearch/current/")
CirrusURL is the URL for the Cirrus Wikipedia dump files.
var WikiDataURL, _ = url.Parse("https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2")
WikiDataURL comes from a different URL (smaller file); the Cirrus link is formatted differently.
Functions ¶
Types ¶
type Award ¶
type Award struct {
    Item []Wikidata `json:"item,omitempty"`
    Date []DateTime `json:"date,omitempty" property:"P585"`
}
Award is an award someone won
type City ¶
type City struct {
    Item  []Wikidata `json:"item,omitempty"`
    Start []DateTime `json:"start,omitempty" property:"P580"`
    End   []DateTime `json:"end,omitempty" property:"P582"`
}
City is a geographical city. NOTE: This data structure is duplicative; it could be folded into a more general label together with the Country, Interment, Military, etc. structs. Perhaps just add all the qualifiers to our Wikidata structure?
type Claims ¶
type Claims struct {
    Country              []Country    `json:"country,omitempty"`
    Image                []string     `json:"image,omitempty"`
    BirthPlace           []Wikidata   `json:"birthplace,omitempty"`
    Sex                  []Wikidata   `json:"sex,omitempty"`
    Father               []Wikidata   `json:"father,omitempty"`
    Mother               []Wikidata   `json:"mother,omitempty"`
    Spouse               []Spouse     `json:"spouse,omitempty"`
    CountryOfCitizenship []Wikidata   `json:"country_of_citizenship,omitempty"` // country of residence
    Instance             []Wikidata   `json:"instance,omitempty"`
    Capital              []City       `json:"capital,omitempty"`
    Currency             []Wikidata   `json:"currency,omitempty"`
    Flag                 []string     `json:"flag,omitempty"`
    Teams                []Team       `json:"teams,omitempty"` // sports teams
    Education            []Education  `json:"education,omitempty"`
    Occupation           []Wikidata   `json:"occupation,omitempty"`
    Signature            []string     `json:"signature,omitempty"`
    Interment            []Interment  `json:"interment,omitempty"` // burial/ashes location
    Genre                []Wikidata   `json:"genre,omitempty"`
    Religion             []Wikidata   `json:"religion,omitempty"`
    Awards               []Award      `json:"awards,omitempty"`
    Ethnicity            []Wikidata   `json:"ethnicity,omitempty"`
    Military             []Military   `json:"military,omitempty"` // military branch
    RecordLabel          []Wikidata   `json:"record_label,omitempty"`
    Position             []Wikidata   `json:"position,omitempty"` // e.g. position on team...forward, center, etc..
    MusicBrainz          []string     `json:"musicbrainz,omitempty"`
    Partner              []Spouse     `json:"partner,omitempty"`
    Origin               []Wikidata   `json:"origin,omitempty"` // country of origin
    DeathCause           []Wikidata   `json:"cause_of_death,omitempty"` // there is also P1196 "manner of death"
    Members              []Member     `json:"members,omitempty"`
    Residence            []Wikidata   `json:"residence,omitempty"`
    Hand                 []Wikidata   `json:"hand,omitempty"` // left or right-handed
    Coordinate           []Coordinate `json:"coordinate,omitempty"`
    Birthday             []DateTime   `json:"birthday,omitempty"`
    Death                []DateTime   `json:"death,omitempty"`
    Start                []DateTime   `json:"start,omitempty"`
    Publication          []DateTime   `json:"publication,omitempty"`
    Sport                []Wikidata   `json:"sport,omitempty"`
    Drafted              []Wikidata   `json:"drafted,omitempty"`
    GivenName            []Wikidata   `json:"given_name,omitempty"`
    Influences           []Wikidata   `json:"influences,omitempty"`
    Location             []Wikidata   `json:"location,omitempty"`
    Website              []string     `json:"website,omitempty"`
    Population           []Population `json:"population,omitempty"`
    Instrument           []Instrument `json:"instrument,omitempty"` // Jimi Hendrix Fender Stratocaster
    Participant          []Wikidata   `json:"participant,omitempty"`
    Nominations          []Nomination `json:"nominations,omitempty"`
    Languages            []Wikidata   `json:"languages,omitempty"` // languages spoken and/or written proficiency
    BirthName            []Text       `json:"birth_name,omitempty"`
    Spotify              []string     `json:"spotify,omitempty"`
    USDA                 []string     `json:"usda,omitempty"`
    Twitter              []string     `json:"twitter,omitempty"`
    Instagram            []string     `json:"instagram,omitempty"`
    Facebook             []string     `json:"facebook,omitempty"`
    YouTube              []string     `json:"youtube,omitempty"`
    WorkStart            []DateTime   `json:"work_start,omitempty"` // better name??? P571 is similar tag
    Height               []Quantity   `json:"height,omitempty"`
    Weight               []Quantity   `json:"weight,omitempty"`
    Siblings             []Wikidata   `json:"siblings,omitempty"`
}
Claims is the formatted and condensed version of the Wikidata claims.
func (*Claims) Scan ¶
Scan unmarshals jsonb data. See http://www.booneputney.com/development/gorm-golang-jsonb-value-copy/
type Coordinate ¶
type Coordinate struct {
    Latitude  []float64  `json:"latitude,omitempty"`
    Longitude []float64  `json:"longitude,omitempty"`
    Altitude  []float64  `json:"altitude,omitempty"`
    Precision []float64  `json:"precision,omitempty"`
    Globe     []Wikidata `json:"globe,omitempty"`
}
Coordinate is a Wikipedia coordinate
type Country ¶
type Country struct {
    Item  []Wikidata `json:"item,omitempty"`
    Start []DateTime `json:"start,omitempty" property:"P580"`
    End   []DateTime `json:"end,omitempty" property:"P582"`
}
Country is a geographical country
type DateTime ¶
type DateTime struct {
    Value    string   `json:"value,omitempty"`
    Calendar Wikidata `json:"calendar,omitempty"`
}
DateTime is the raw, unformatted version of a datetime. Note: Wikidata only uses the Gregorian and Julian calendars.
type Definition ¶
type Definition struct {
    Part     string    `json:"part,omitempty"`
    Meaning  string    `json:"meaning,omitempty"`
    Synonyms []Synonym `json:"synonyms,omitempty"`
}
Definition is a single definition and synonyms
type Descriptions ¶
Descriptions holds the descriptions for an Item
func (*Descriptions) Scan ¶
func (d *Descriptions) Scan(value interface{}) error
Scan unmarshals jsonb data
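A minimal sketch of how a Scan method like this could work, assuming the jsonb column arrives from the driver as a []byte. The descriptions type here is a hypothetical stand-in (a map from language code to text), since the real Descriptions definition is not shown on this page.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// descriptions is a hypothetical stand-in for the package's Descriptions type.
type descriptions map[string]string

// Scan decodes a jsonb value into d. It matches the Scan method of the
// database/sql Scanner interface.
func (d *descriptions) Scan(value interface{}) error {
	b, ok := value.([]byte)
	if !ok {
		return errors.New("scan: expected []byte for jsonb column")
	}
	return json.Unmarshal(b, d)
}

func main() {
	var d descriptions
	if err := d.Scan([]byte(`{"en":"American guitarist"}`)); err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Println(d["en"])
}
```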
type Education ¶
type Education struct {
    Item   []Wikidata `json:"item,omitempty"`
    Start  []DateTime `json:"start,omitempty" property:"P580"`
    End    []DateTime `json:"end,omitempty" property:"P582"`
    Degree []Wikidata `json:"degree,omitempty" property:"P512"`
    Major  []Wikidata `json:"major,omitempty" property:"P812"`
}
Education represents the education of a person
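The property struct tags on types such as Education pair each qualifier field with a Wikidata property ID. The sketch below shows one way such tags could be read with reflection. The education struct is a trimmed local copy that uses string slices, and propertyFields is a hypothetical helper, so this is not necessarily how the package itself consumes the tags.

```go
package main

import (
	"fmt"
	"reflect"
)

// education is a local stand-in that mirrors the tags of Education.
type education struct {
	Item   []string `json:"item,omitempty"`
	Start  []string `json:"start,omitempty" property:"P580"`
	End    []string `json:"end,omitempty" property:"P582"`
	Degree []string `json:"degree,omitempty" property:"P512"`
	Major  []string `json:"major,omitempty" property:"P812"`
}

// propertyFields (hypothetical) maps each property ID found in a struct's
// `property` tags to the name of the field that carries it.
func propertyFields(v interface{}) map[string]string {
	out := map[string]string{}
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return out
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if id, ok := f.Tag.Lookup("property"); ok {
			out[id] = f.Name
		}
	}
	return out
}

func main() {
	for id, name := range propertyFields(education{}) {
		fmt.Println(id, "->", name)
	}
}
```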
type File ¶
type File struct {
    URL  *url.URL
    Base string
    Dir  string
    ABS  string
    Type FileType
    // contains filtered or unexported fields
}
File is a Wikipedia/Wikidata dump file
func CirrusLinks ¶
CirrusLinks finds the latest Cirrus links available from Wikipedia, e.g. enwiki-20171009-cirrussearch-content.json.gz. Note: Cirrus is their set of Elasticsearch-formatted dump files. The cirrussearch URLs for Wikipedia include the wikibase_item and have a layout more similar to their API than the dumps found at https://dumps.wikimedia.org/enwiki/latest/.
type FileType ¶
type FileType string
FileType is a type of Wikipedia file
const (
    // WikidataFT is a Wikidata file type
    WikidataFT FileType = "wikidata"
    // WikipediaFT is a Wikipedia file type
    WikipediaFT FileType = "wikipedia"
    // WikiquoteFT is a Wikiquote file type
    WikiquoteFT FileType = "wikiquote"
    // WiktionaryFT is a Wiktionary file type
    WiktionaryFT FileType = "wiktionary"
)
type Instrument ¶
type Instrument struct {
    Item         []Wikidata `json:"item,omitempty"`
    Manufacturer []Wikidata `json:"manufacturer,omitempty" property:"P176"`
}
Instrument is a musical instrument (guitar, drums, etc)
type Interment ¶
type Interment struct {
    Item  []Wikidata `json:"item,omitempty"`
    Start []DateTime `json:"start,omitempty" property:"P580"`
    End   []DateTime `json:"end,omitempty" property:"P582"`
}
Interment is the place a person was buried
type Item ¶
type Item struct { Wikipedia *Wikidata Wikiquote Wikiquote `json:"wikiquote"` Wiktionary Wiktionary `json:"wiktionary"` }
Item contains the complete wiki info for a person, thing, or word.
type JiveData ¶
JiveData is a Wikipedia data provider
type Member ¶
type Member struct {
    Item  []Wikidata `json:"item,omitempty"`
    Start []DateTime `json:"start,omitempty" property:"P580"`
    End   []DateTime `json:"end,omitempty" property:"P582"`
    Date  []DateTime `json:"date,omitempty" property:"P585"` // some don't have start/end time just a point-in-time.
}
Member is a part of a group (band, etc)
type Military ¶
type Military struct {
    Item  []Wikidata `json:"item,omitempty"`
    Start []DateTime `json:"start,omitempty" property:"P580"`
    End   []DateTime `json:"end,omitempty" property:"P582"`
}
Military is a person's history in the military
type Nomination ¶
type Nomination struct {
    Item []Wikidata `json:"item,omitempty"`
    For  []Wikidata `json:"for,omitempty" property:"P1686"`
    Date []DateTime `json:"date,omitempty" property:"P585"`
}
Nomination is a nomination for an award
type Population ¶
type Population struct {
    Value []Quantity `json:"value,omitempty"`
    Date  []DateTime `json:"date,omitempty" property:"P585"`
}
Population is a point-in-time value of a country's population
type PostgreSQL ¶
PostgreSQL contains our client and database info
func (*PostgreSQL) Dump ¶
func (p *PostgreSQL) Dump(ft FileType, lang language.Tag, rows chan interface{}) error
Dump creates a temporary table and dumps rows via our transaction
func (*PostgreSQL) Fetch ¶
Fetch retrieves an Item from PostgreSQL. See https://www.wikidata.org/w/api.php
type Quantity ¶
type Quantity struct {
    Amount string   `json:"amount,omitempty"`
    Unit   Wikidata `json:"unit,omitempty"`
}
Quantity is a Wikipedia quantity
type Spouse ¶
type Spouse struct {
    Item  []Wikidata `json:"item,omitempty"`
    Start []DateTime `json:"start,omitempty" property:"P580"`
    End   []DateTime `json:"end,omitempty" property:"P582"` // do we also need P585 as we do for Partner?
    Place []Wikidata `json:"location,omitempty" property:"P2842"` // AKA Location P276
}
Spouse represents a person's spouse or partner
type Synonym ¶
type Synonym struct {
    Language string `json:"language,omitempty"`
    Word     string `json:"word,omitempty"`
}
Synonym is a Wiktionary link to another word
type Team ¶
type Team struct {
    Item     []Wikidata `json:"item,omitempty"`
    Start    []DateTime `json:"start,omitempty" property:"P580"`
    End      []DateTime `json:"end,omitempty" property:"P582"`
    Position []Wikidata `json:"position,omitempty" property:"P413"`
    Number   []string   `json:"number,omitempty" property:"P1618"`
}
Team represents a team on which a person played
type Text ¶
type Text struct {
    Text     string `json:"value,omitempty"`
    Language string `json:"language,omitempty"`
}
Text is a language and value
type Wikidata ¶
type Wikidata struct {
    ID           string `json:"id,omitempty"`
    Labels       `json:"labels,omitempty"`
    Aliases      `json:"aliases,omitempty"`
    Descriptions `json:"descriptions,omitempty"`
    *Claims      `json:"claims,omitempty"`
}
Wikidata is a Wikidata item
func (*Wikidata) UnmarshalJSON ¶
UnmarshalJSON formats and extracts only the info we need from claims
type Wikipedia ¶
type Wikipedia struct {
    ID           string   `json:"wikibase_item"`
    Language     string   `json:"language"`
    OutgoingLink []string `json:"outgoing_link,omitempty"`
    Popularity   float64  `json:"popularity_score,omitempty"`
    Title        string   `json:"title"`
    Text         string   `json:"text"`
    // contains filtered or unexported fields
}
Wikipedia holds the summary text of an article
func (*Wikipedia) UnmarshalJSON ¶
UnmarshalJSON truncates the text
type Wikiquote ¶
type Wikiquote struct {
    ID       string   `json:"wikibase_item,omitempty"`
    Language string   `json:"language,omitempty"`
    Source   string   `json:"source_text,omitempty"` // "text" isn't parseable
    Quotes   []string `json:"quotes,omitempty"`
}
Wikiquote holds the summary text of an article. Another option is the XML dump: https://dumps.wikimedia.org/enwikiquote/20180201/enwikiquote-20180201-pages-articles-multistream.xml.bz2
func (*Wikiquote) UnmarshalJSON ¶
UnmarshalJSON extracts the raw quotes from the source_text
type Wiktionary ¶
type Wiktionary struct {
    Title    string `json:"title"`
    Language string `json:"language,omitempty"`
    Source   string `json:"source_text,omitempty"` // "text" isn't parseable
    // Etymology string // origin of the word...not implemented yet
    // Pronunciation string // not implemented yet
    Definitions []*Definition `json:"definitions,omitempty"`
}
Wiktionary holds the structure for a word and its definition(s)
func (*Wiktionary) UnmarshalJSON ¶
func (w *Wiktionary) UnmarshalJSON(data []byte) error
UnmarshalJSON extracts the raw info needed from the source_text