
Package wikidump is a Go package that provides utility functions for downloading and extracting Wikipedia dumps.


This package can be installed with the go get command:

go get


This package depends on p7zip>=16.02 for extracting 7zip files.






func SQL2CSV

func SQL2CSV(r io.Reader) io.Reader

SQL2CSV transforms a SQL data dump into clean CSV on the fly.
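Because SQL2CSV returns a plain io.Reader, its output can be consumed as a stream. The following is a minimal sketch of doing so, assuming the output can be parsed by the standard encoding/csv package (the documentation above does not confirm the exact CSV dialect). The helper names column and run and the sample rows are hypothetical; in real use the reader passed to column would be the result of SQL2CSV.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// column streams CSV records from r and collects the field at index idx
// from every record. Records are read one at a time, so the whole input
// never has to fit in memory at once.
func column(r io.Reader, idx int) ([]string, error) {
	cr := csv.NewReader(r)
	cr.FieldsPerRecord = -1 // assumption: tolerate rows of different widths
	var out []string
	for {
		rec, err := cr.Read()
		if errors.Is(err, io.EOF) {
			return out, nil // end of input
		}
		if err != nil {
			return nil, err
		}
		if idx < 0 || idx >= len(rec) {
			return nil, fmt.Errorf("record has %d fields, want index %d", len(rec), idx)
		}
		out = append(out, rec[idx])
	}
}

// run feeds made-up CSV rows to column; with the real package the input
// would be the io.Reader returned by SQL2CSV.
func run() ([]string, error) {
	sample := "1,Rome\n2,\"Paris, France\"\n"
	return column(strings.NewReader(sample), 1)
}

func main() {
	names, err := run()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

Reading record by record keeps memory usage flat, which matters for dumps that can be many gigabytes in size.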


type Wikidump

type Wikidump struct {
	// contains filtered or unexported fields
}

Wikidump represents a hub from which to request particular Wikipedia dump files.

func From

func From(tmpDir, lang string, t time.Time) (w Wikidump, err error)

From creates a new wikidump from the specified date.

func Latest

func Latest(tmpDir, lang string, checkFor ...string) (w Wikidump, err error)

Latest creates a new wikidump from the latest valid Wikipedia dump.

func (Wikidump) CheckFor

func (w Wikidump) CheckFor(filenames ...string) error

CheckFor checks that the given files exist in the wikidump.

func (Wikidump) Date

func (w Wikidump) Date() time.Time

Date returns the date of the current dump.

func (Wikidump) Open

func (w Wikidump) Open(filename string) func(context.Context) (io.ReadCloser, error)

Open returns an iterator over the resources associated with the given filename. The download can be stopped through the context. Once the iterator is depleted, it returns io.EOF. Once the iterator returns an error, every subsequent call returns the same error. It is the caller's responsibility to call Close on each io.ReadCloser when done. Open takes care of verifying the SHA1 checksum, retrying downloads, and decompressing files.
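The iterator contract described above can be consumed with a simple loop. The following is a minimal sketch, not the package's own code: drainAll, fakeIterator and run are hypothetical names, and fakeIterator is only a simplified stand-in with the same signature as the function returned by Open, so that the example runs without network access.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
)

// drainAll calls next until it reports io.EOF, copying every resource into
// dst and closing it afterwards. next has the same signature as the
// function returned by Wikidump.Open.
func drainAll(ctx context.Context, next func(context.Context) (io.ReadCloser, error), dst io.Writer) error {
	for {
		rc, err := next(ctx)
		if errors.Is(err, io.EOF) {
			return nil // iterator depleted
		}
		if err != nil {
			return err
		}
		_, copyErr := io.Copy(dst, rc)
		closeErr := rc.Close() // closing is the caller's responsibility
		if copyErr != nil {
			return copyErr
		}
		if closeErr != nil {
			return closeErr
		}
	}
}

// fakeIterator is a hypothetical, simplified stand-in for the function
// returned by Open: it yields one reader per part, then io.EOF.
func fakeIterator(parts ...string) func(context.Context) (io.ReadCloser, error) {
	i := 0
	return func(ctx context.Context) (io.ReadCloser, error) {
		if err := ctx.Err(); err != nil {
			return nil, err // a cancelled context stops the iteration
		}
		if i >= len(parts) {
			return nil, io.EOF
		}
		rc := io.NopCloser(strings.NewReader(parts[i]))
		i++
		return rc, nil
	}
}

// run drains a two-part fake iterator and returns the concatenated content.
func run() (string, error) {
	var sb strings.Builder
	err := drainAll(context.Background(), fakeIterator("part one\n", "part two\n"), &sb)
	return sb.String(), err
}

func main() {
	out, err := run()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

With the real package, the second argument to drainAll would be the value returned by Open for a given filename, and passing a context created with context.WithCancel or context.WithTimeout would allow the download to be interrupted.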

