wikipedia

Package version: v0.0.0-...-73de0e8
Warning

This package is not in the latest version of its module.

Published: Feb 2, 2018
License: Apache-2.0
Imports: 20
Imported by: 0

Documentation

Overview

Package wikipedia fetches Wikipedia articles.


Constants

This section is empty.

Variables

var Available = map[language.Tag]struct{}{}/* 295 elements not displayed */

Available is a map of all languages that Wikipedia supports, taken from the table at https://en.wikipedia.org/wiki/List_of_Wikipedias sorted by number of articles, descending.

var CirrusURL, _ = url.Parse("https://dumps.wikimedia.org/other/cirrussearch/current/")

CirrusURL is the URL of the directory that holds the current CirrusSearch dump files for Wikipedia.

var WikiDataURL, _ = url.Parse("https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2")

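WikiDataURL is the URL of the latest full Wikidata JSON dump. It comes from a different location than the cirrus files (this one is the smaller file), and its link is formatted differently from the cirrus links.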

Functions

func Languages

func Languages(supported []language.Tag) ([]language.Tag, []language.Tag)

Languages checks the given languages against the languages Wikipedia supports (see Available). An empty slice of supported languages means that every available language is supported.
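The doc comment does not say what the two returned slices contain. One plausible reading is (languages Wikipedia supports, languages it does not), with an empty input expanding to every available language. The sketch below illustrates that reading only; it uses plain string codes instead of language.Tag so that it needs nothing outside the standard library, and available and splitLanguages are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// available is a hypothetical stand-in for the package's Available map,
// keyed by language code strings instead of language.Tag.
var available = map[string]struct{}{"en": {}, "de": {}, "fr": {}}

// splitLanguages returns the requested codes that are available and those
// that are not. An empty request means "every available language".
func splitLanguages(requested []string) (ok, missing []string) {
	if len(requested) == 0 {
		for code := range available {
			ok = append(ok, code)
		}
		sort.Strings(ok) // map iteration order is random
		return ok, nil
	}
	for _, code := range requested {
		if _, found := available[code]; found {
			ok = append(ok, code)
		} else {
			missing = append(missing, code)
		}
	}
	return ok, missing
}

func main() {
	ok, missing := splitLanguages([]string{"en", "xx"})
	fmt.Println(ok, missing) // [en] [xx]
	all, _ := splitLanguages(nil)
	fmt.Println(all) // [de en fr]
}
```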

Types

type Aliases

type Aliases map[string][]Text

Aliases holds the alternative names for an Item

func (*Aliases) Scan

func (a *Aliases) Scan(value interface{}) error

Scan unmarshals jsonb data (it satisfies the database/sql Scanner interface).
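Scan has the signature of database/sql's Scanner interface, so it is presumably called when a jsonb column is read into an Aliases value. A minimal sketch of such a method follows; the handling of NULL and of string versus []byte input is an assumption, not taken from this package's source. The same pattern would apply to the Scan methods on Claims, Descriptions and Labels.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
)

// Text and Aliases mirror the documented types.
type Text struct {
	Text     string `json:"value,omitempty"`
	Language string `json:"language,omitempty"`
}

type Aliases map[string][]Text

// Compile-time check that *Aliases satisfies database/sql's Scanner.
var _ sql.Scanner = (*Aliases)(nil)

// Scan accepts the []byte or string a driver returns for a jsonb column.
func (a *Aliases) Scan(value interface{}) error {
	var b []byte
	switch v := value.(type) {
	case nil:
		*a = nil // SQL NULL
		return nil
	case []byte:
		b = v
	case string:
		b = []byte(v)
	default:
		return fmt.Errorf("aliases: cannot scan %T", value)
	}
	return json.Unmarshal(b, a)
}

func main() {
	var a Aliases
	raw := []byte(`{"en":[{"language":"en","value":"Douglas Noel Adams"}]}`)
	if err := a.Scan(raw); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(a["en"][0].Text) // Douglas Noel Adams
}
```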

type Award

type Award struct {
	Item []Wikidata `json:"item,omitempty"`
	Date []DateTime `json:"date,omitempty" property:"P585"`
}

Award is an award a person received, with the date it was received (P585, point in time).

type Claims

type Claims struct {
	Image       []string    `json:"image,omitempty"`
	BirthPlace  []Wikidata  `json:"birthplace,omitempty"`
	Sex         []Wikidata  `json:"sex,omitempty"`
	Father      []Wikidata  `json:"father,omitempty"`
	Mother      []Wikidata  `json:"mother,omitempty"`
	Spouse      []Spouse    `json:"spouse,omitempty"`
	Country     []Wikidata  `json:"country,omitempty"` // country of residence
	Instance    []Wikidata  `json:"instance,omitempty"`
	Capital     []Wikidata  `json:"capital,omitempty"`
	Currency    []Wikidata  `json:"currency,omitempty"`
	Flag        []string    `json:"flag,omitempty"`
	Teams       []Team      `json:"teams,omitempty"` // sports teams
	Education   []Education `json:"education,omitempty"`
	Occupation  []Wikidata  `json:"occupation,omitempty"`
	Signature   []string    `json:"signature,omitempty"`
	Interment   []Interment `json:"interment,omitempty"` // burial/ashes location
	Genre       []Wikidata  `json:"genre,omitempty"`
	Religion    []Wikidata  `json:"religion,omitempty"`
	Awards      []Award     `json:"awards,omitempty"`
	Ethnicity   []Wikidata  `json:"ethnicity,omitempty"`
	Military    []Military  `json:"military,omitempty"` // military branch
	RecordLabel []Wikidata  `json:"record_label,omitempty"`
	Discography []Wikidata  `json:"discography,omitempty"`
	Position    []Wikidata  `json:"position,omitempty"` // e.g. position on team...forward, center, etc..
	Partner     []Spouse    `json:"partner,omitempty"`
	Origin      []Wikidata  `json:"origin,omitempty"`         // country of origin
	DeathCause  []Wikidata  `json:"cause_of_death,omitempty"` // there is also P1196 "manner of death"
	Members     []Member    `json:"members,omitempty"`
	Residence   []Wikidata  `json:"residence,omitempty"`
	Hand        []Wikidata  `json:"hand,omitempty"` // left or right-handed
	//Coordinate  []Coordinate `json:"coordinate,omitempty"`
	Birthday    []DateTime   `json:"birthday,omitempty"`
	Death       []DateTime   `json:"death,omitempty"`
	Start       []DateTime   `json:"start,omitempty"`
	Sport       []Wikidata   `json:"sport,omitempty"`
	Drafted     []Wikidata   `json:"drafted,omitempty"`
	GivenName   []Wikidata   `json:"given_name,omitempty"`
	Influences  []Wikidata   `json:"influences,omitempty"`
	Location    []Wikidata   `json:"location,omitempty"`
	Website     []string     `json:"website,omitempty"`
	Population  []Population `json:"population,omitempty"`
	Instrument  []Instrument `json:"instrument,omitempty"` // Jimi Hendrix Fender Stratocaster
	Participant []Wikidata   `json:"participant,omitempty"`
	Nominations []Nomination `json:"nominations,omitempty"`
	Languages   []Wikidata   `json:"languages,omitempty"` // languages spoken and/or written proficiency
	BirthName   []Text       `json:"birth_name,omitempty"`
	Spotify     []string     `json:"spotify,omitempty"`
	Twitter     []string     `json:"twitter,omitempty"`
	Instagram   []string     `json:"instagram,omitempty"`
	Facebook    []string     `json:"facebook,omitempty"`
	YouTube     []string     `json:"youtube,omitempty"`
	WorkStart   []DateTime   `json:"work_start,omitempty"` // better name??? P571 is similar tag
	Height      []Quantity   `json:"height,omitempty"`
	Weight      []Quantity   `json:"weight,omitempty"`
	Siblings    []Wikidata   `json:"siblings,omitempty"`
}

Claims is the formatted, condensed version of an item's Wikidata claims.

func (*Claims) Scan

func (c *Claims) Scan(value interface{}) error

Scan unmarshals jsonb data. Reference: http://www.booneputney.com/development/gorm-golang-jsonb-value-copy/

type Coordinate

type Coordinate struct {
	Latitude  []float64  `json:"latitude,omitempty"`
	Longitude []float64  `json:"longitude,omitempty"`
	Altitude  []float64  `json:"altitude,omitempty"`
	Precision []float64  `json:"precision,omitempty"`
	Globe     []Wikidata `json:"globe,omitempty"`
}

Coordinate is a Wikidata geographic coordinate.

type DateTime

type DateTime struct {
	Value    string   `json:"value,omitempty"`
	Calendar Wikidata `json:"calendar,omitempty"`
}

DateTime is the raw, unformatted version of a date and time. Note: Wikidata uses only the Gregorian and Julian calendars.
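Assuming Value holds Wikidata's raw time string, it looks like "+1952-03-11T00:00:00Z": a leading sign, and a month or day of 00 when the value is only precise to the year or month. The calendar item is Q1985727 (proleptic Gregorian) or Q1985786 (proleptic Julian). time.Parse rejects both the sign and the zero fields, so a hypothetical helper such as splitDate below extracts the parts by hand.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitDate parses a raw Wikidata time such as "+1952-03-11T00:00:00Z" into
// year, month and day. Month and day are 0 for year- or month-precision values.
func splitDate(raw string) (int, int, int, error) {
	sign := 1
	if strings.HasPrefix(raw, "-") {
		sign = -1 // BCE years are negative
	}
	date, _, _ := strings.Cut(strings.TrimLeft(raw, "+-"), "T")
	parts := strings.Split(date, "-")
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("unexpected time value %q", raw)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return 0, 0, 0, fmt.Errorf("unexpected time value %q: %w", raw, err)
		}
		nums[i] = n
	}
	return sign * nums[0], nums[1], nums[2], nil
}

func main() {
	fmt.Println(splitDate("+1952-03-11T00:00:00Z")) // 1952 3 11 <nil>
	fmt.Println(splitDate("-0044-03-15T00:00:00Z")) // -44 3 15 <nil>
	fmt.Println(splitDate("+1879-00-00T00:00:00Z")) // 1879 0 0 <nil>
}
```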

type Descriptions

type Descriptions map[string]Text

Descriptions holds the descriptions for an Item

func (*Descriptions) Scan

func (d *Descriptions) Scan(value interface{}) error

Scan unmarshals jsonb data (it satisfies the database/sql Scanner interface).

type Education

type Education struct {
	Item   []Wikidata `json:"item,omitempty"`
	Start  []DateTime `json:"start,omitempty" property:"P580"`
	End    []DateTime `json:"end,omitempty" property:"P582"`
	Degree []Wikidata `json:"degree,omitempty" property:"P512"`
	Major  []Wikidata `json:"major,omitempty" property:"P812"`
}

Education represents the education of a person
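Several of these qualifier structs tag fields with the Wikidata property they come from (P580 start time, P582 end time, P512 academic degree, P812 academic major, P585 point in time). Whether the package reads these tags with reflection is not documented; the sketch below only shows how such tags can be read with the reflect package, using a local copy of Education and simplified stand-ins for Wikidata and DateTime. qualifierFields is a hypothetical helper.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Simplified stand-ins for the package's Wikidata and DateTime types.
type Wikidata struct{ ID string }
type DateTime struct{ Value string }

// Education mirrors the documented struct, tags included.
type Education struct {
	Item   []Wikidata `json:"item,omitempty"`
	Start  []DateTime `json:"start,omitempty" property:"P580"`
	End    []DateTime `json:"end,omitempty" property:"P582"`
	Degree []Wikidata `json:"degree,omitempty" property:"P512"`
	Major  []Wikidata `json:"major,omitempty" property:"P812"`
}

// qualifierFields maps each property ID found in a `property` tag to the
// JSON name of the field that stores it.
func qualifierFields(v interface{}) map[string]string {
	out := map[string]string{}
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		pid, ok := f.Tag.Lookup("property")
		if !ok {
			continue // e.g. Item, which holds the main value, not a qualifier
		}
		name, _, _ := strings.Cut(f.Tag.Get("json"), ",")
		out[pid] = name
	}
	return out
}

func main() {
	fmt.Println(qualifierFields(Education{}))
	// map[P512:degree P580:start P582:end P812:major]
}
```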

type Fetcher

type Fetcher interface {
	Setup() error
	Fetch(query string, lang language.Tag) (*Item, error)
}

Fetcher defines the methods used to retrieve Wikipedia snippets.

type File

type File struct {
	URL *url.URL

	Base string
	Dir  string
	ABS  string
	// contains filtered or unexported fields
}

File is a Wikipedia or Wikidata dump file.

func CirrusLinks

func CirrusLinks(supported []language.Tag) ([]*File, error)

CirrusLinks finds the latest cirrus dump links available from Wikipedia, e.g. enwiki-20171009-cirrussearch-content.json.gz. Note: the cirrus files are Wikipedia's Elasticsearch-formatted dumps. Unlike the dumps at https://dumps.wikimedia.org/enwiki/latest/, the cirrussearch files include the wikibase_item and have a layout closer to Wikipedia's API.

func NewFile

func NewFile(u *url.URL, l language.Tag) *File

NewFile returns a new file and sets the URL and Base.

func (*File) Download

func (f *File) Download() error

Download downloads a wikipedia/wikidata dump file

func (*File) Parse

func (f *File) Parse(truncate int) error

Parse parses a Wikipedia or Wikidata dump file and sends the parsed records to the file's dumper (see SetDumper).

func (*File) SetABS

func (f *File) SetABS(dir string) *File

SetABS sets the absolute path for a file

func (*File) SetDumper

func (f *File) SetDumper(d dumper) *File

SetDumper sets the Dumper for a file

type Instrument

type Instrument struct {
	Item         []Wikidata `json:"item,omitempty"`
	Manufacturer []Wikidata `json:"manufacturer,omitempty" property:"P176"`
}

Instrument is a musical instrument (guitar, drums, etc.).

type Interment

type Interment struct {
	Item  []Wikidata `json:"item,omitempty"`
	Start []DateTime `json:"start,omitempty" property:"P580"`
	End   []DateTime `json:"end,omitempty" property:"P582"`
}

Interment is the place a person was buried

type Item

type Item struct {
	Wikipedia
	*Wikidata
}

Item combines the text portion of a Wikipedia article with its Wikidata item.

type Labels

type Labels map[string]Text

Labels holds the labels for an Item

func (*Labels) Scan

func (l *Labels) Scan(value interface{}) error

Scan unmarshals jsonb data (it satisfies the database/sql Scanner interface).

type Member

type Member struct {
	Item  []Wikidata `json:"item,omitempty"`
	Start []DateTime `json:"start,omitempty" property:"P580"`
	End   []DateTime `json:"end,omitempty" property:"P582"`
	Date  []DateTime `json:"date,omitempty" property:"P585"` // some don't have start/end time just a point-in-time.
}

Member is a member of a group (a band, etc.).

type Military

type Military struct {
	Item  []Wikidata `json:"item,omitempty"`
	Start []DateTime `json:"start,omitempty" property:"P580"`
	End   []DateTime `json:"end,omitempty" property:"P582"`
}

Military is a person's history in the military

type Nomination

type Nomination struct {
	Item []Wikidata `json:"item,omitempty"`
	For  []Wikidata `json:"for,omitempty" property:"P1686"`
	Date []DateTime `json:"date,omitempty" property:"P585"`
}

Nomination is a nomination for an award

type Population

type Population struct {
	Value []Quantity `json:"value,omitempty"`
	Date  []DateTime `json:"date,omitempty" property:"P585"`
}

Population is a point-in-time value of a country's population

type PostgreSQL

type PostgreSQL struct {
	*sql.DB
}

PostgreSQL contains our client and database info

func (*PostgreSQL) Dump

func (p *PostgreSQL) Dump(wikidata bool, lang language.Tag, rows chan interface{}) error

Dump creates a temporary table and writes the rows received on the rows channel to it within a transaction.

func (*PostgreSQL) Fetch

func (p *PostgreSQL) Fetch(query string, lang language.Tag) (*Item, error)

Fetch retrieves an Item from PostgreSQL. Reference: https://www.wikidata.org/w/api.php

func (*PostgreSQL) Setup

func (p *PostgreSQL) Setup() error

Setup creates our database functions.

type Quantity

type Quantity struct {
	Amount string   `json:"amount,omitempty"`
	Unit   Wikidata `json:"unit,omitempty"`
}

Quantity is a Wikidata quantity: an amount and its unit.

type Spouse

type Spouse struct {
	Item  []Wikidata `json:"item,omitempty"`
	Start []DateTime `json:"start,omitempty" property:"P580"`
	End   []DateTime `json:"end,omitempty" property:"P582"`       // do we also need P585 as we do for Partner?
	Place []Wikidata `json:"location,omitempty" property:"P2842"` // AKA Location P276
}

Spouse represents a person's spouse or partner

type Team

type Team struct {
	Item     []Wikidata `json:"item,omitempty"`
	Start    []DateTime `json:"start,omitempty" property:"P580"`
	End      []DateTime `json:"end,omitempty" property:"P582"`
	Position []Wikidata `json:"position,omitempty" property:"P413"`
	Number   []string   `json:"number,omitempty" property:"P1618"`
}

Team represents a team on which a person played

type Text

type Text struct {
	Text     string `json:"value,omitempty"`
	Language string `json:"language,omitempty"`
}

Text is a string value paired with its language code.

type Wikidata

type Wikidata struct {
	ID           string `json:"id,omitempty"`
	Labels       `json:"labels,omitempty"`
	Aliases      `json:"aliases,omitempty"`
	Descriptions `json:"descriptions,omitempty"`
	*Claims
}

Wikidata is a Wikidata item

func (*Wikidata) UnmarshalJSON

func (w *Wikidata) UnmarshalJSON(b []byte) error

UnmarshalJSON extracts and formats only the information we need from the raw claims.

type Wikipedia

type Wikipedia struct {
	ID       string `json:"wikibase_item,omitempty"`
	Language string `json:"language,omitempty"`
	Title    string `json:"title,omitempty"`
	Text     string `json:"text,omitempty"`
	// contains filtered or unexported fields
}

Wikipedia holds the summary text of an article

func (*Wikipedia) UnmarshalJSON

func (w *Wikipedia) UnmarshalJSON(data []byte) error

UnmarshalJSON truncates the text

Directories

Path          Synopsis
cmd/dumper    Dumper downloads and dumps Wikipedia, Wikidata and Wikiquote data to a PostgreSQL database.
