Package wikiassignment is a Go package that provides utility functions for automatically assigning Wikipedia pages to topics.
API documentation can be found in the associated godoc reference.
Topics data can be found in overpedia.
This package can be installed with the go get command:
go get github.com/negapedia/wikiassignment/...
You will need a machine with an internet connection, 16GB of RAM (for the English version) and the Docker storage base directory properly set.
This package depends on PETSc. The associated Dockerfile provides a complete environment in which to use this package. Otherwise, PETSc can be installed by following the same steps as in the Dockerfile or those on the PETSc installation page.
lang: Wikipedia nationalization to parse or custom JSON, default
date: Wikipedia dump date in the format YYYYMMDD, default
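As a minimal sketch of how a date value in this eight-digit year, month, day form could be checked before launching a run: Go's reference layout written without separators is exactly 20060102, so time.Parse can validate it directly. The helper name parseDumpDate is hypothetical and is not part of this package.

```go
package main

import (
	"fmt"
	"time"
)

// dumpDateLayout is Go's reference time written as year, month, day
// with no separators.
const dumpDateLayout = "20060102"

// parseDumpDate is a hypothetical helper (not part of wikiassignment)
// that validates a dump date string and converts it to a time.Time.
func parseDumpDate(s string) (time.Time, error) {
	return time.Parse(dumpDateLayout, s)
}

func main() {
	d, err := parseDumpDate("20060102")
	if err != nil {
		fmt.Println("invalid date:", err)
		return
	}
	fmt.Println(d.Format("2 January 2006")) // prints "2 January 2006"
}
```

A value with separators, such as 2006-01-02, is rejected by this layout.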
Examples of use
docker run negapedia/wikiassignment export -lang en -date 20060102: basic usage. Run the image on the English nationalization dump dated 2 January 2006 and store the result in the in-container /data folder, which will contain:
   1. semanticgraph.json maps each source page ID to the array of its target page IDs.
   2. partition.json maps each node type (article, category or topic) to the array of page IDs belonging to it.
   3. absorptionprobabilities.csv represents each page as a row with its ID and the assignment weight for each topic.
   4. pages.csv represents pages in the form required by wiki2overpediadb.
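The output files can be consumed from Go. The sketch below decodes a document shaped like semanticgraph.json, assuming it is a single JSON object whose keys are source page IDs and whose values are arrays of target page IDs. That layout is inferred from the description above and has not been checked against real output. The function name decodeSemanticGraph and the sample data are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeSemanticGraph decodes a JSON object mapping source page IDs to
// arrays of target page IDs. The on-disk layout is an assumption.
// encoding/json converts the string object keys to integer map keys.
func decodeSemanticGraph(data []byte) (map[uint32][]uint32, error) {
	graph := make(map[uint32][]uint32)
	if err := json.Unmarshal(data, &graph); err != nil {
		return nil, err
	}
	return graph, nil
}

func main() {
	// Made-up sample; a real file would be read with os.ReadFile.
	sample := []byte(`{"10": [20, 30], "20": [30]}`)

	graph, err := decodeSemanticGraph(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for source, targets := range graph {
		fmt.Printf("page %d links to %d page(s)\n", source, len(targets))
	}
}
```

For the English dump the file is likely to be large, so a streaming json.Decoder may be a better fit than loading everything into memory at once.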
docker run -d -v /path/2/out/dir:/data negapedia/wikiassignment export -lang en:
   1. Run the image as before.
   2. Mount the guest /data folder as a volume on the host folder /path/2/out/dir, the output folder, so that at the end of the operations /path/2/out/dir will contain the result. This folder can be changed to an arbitrary folder of your choice.
   3. Run the image in detached mode.
For further explanations please refer to the docker run reference.
docker pull negapedia/wikiassignment: Update the image to the latest revision.
docker kill --signal=SIGQUIT $(docker ps -ql): Quit the most recently created container and log a trace dump.
docker logs -f $(docker ps -ql): Fetch the logs of the most recently created container.
docker system prune -fa --volumes: Remove all unused images and volumes without asking for confirmation.
const (
	// TopicNamespaceID represents topic namespace ID
	TopicNamespaceID = 6666
	// CategoryNamespaceID represents category namespace ID in Wikipedia dumps
	CategoryNamespaceID = 14
	// ArticleNamespaceID represents article namespace ID in Wikipedia dumps
	ArticleNamespaceID = 0
)
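These three namespace IDs line up with the node types (article, category or topic) used in the output. As an illustration only, the sketch below maps a namespace ID to a node type label. The constants are redeclared locally so the example is self-contained, and the helper nodeKind is hypothetical, not part of the package.

```go
package main

import "fmt"

// Values copied from the package constants so the example compiles
// on its own.
const (
	TopicNamespaceID    = 6666
	CategoryNamespaceID = 14
	ArticleNamespaceID  = 0
)

// nodeKind is a hypothetical helper that labels a page by its
// namespace ID. Any other namespace is reported as "other".
func nodeKind(namespaceID int) string {
	switch namespaceID {
	case TopicNamespaceID:
		return "topic"
	case CategoryNamespaceID:
		return "category"
	case ArticleNamespaceID:
		return "article"
	default:
		return "other"
	}
}

func main() {
	for _, id := range []int{0, 14, 6666, 2} {
		fmt.Printf("namespace %d: %s\n", id, nodeKind(id))
	}
}
```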
Filter represents a filter to be applied to the semantic graph before the transformation into assignment
type SemanticGraphSources
SemanticGraphSources represents the data sources needed to build the wikipedia semantic graph
Build returns the semantic graph, the distance in hops from each node to the closest topic, and a map from namespace IDs to page IDs.
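To make "distance in hops to the closest topic" concrete, here is a minimal sketch of one way such distances could be computed: a breadth-first search started from all topics at once over reversed edges. This is not the package's implementation. The graph representation, the assumption that edges point from a page towards its targets, and the name hopsToClosestTopic are all hypothetical.

```go
package main

import "fmt"

// hopsToClosestTopic returns, for every node that can reach a topic by
// following edges, the minimum number of hops to the nearest topic.
// Nodes that cannot reach any topic are absent from the result.
func hopsToClosestTopic(graph map[uint32][]uint32, topics []uint32) map[uint32]int {
	// Reverse the edges so a single search can start from every topic.
	reversed := make(map[uint32][]uint32)
	for source, targets := range graph {
		for _, target := range targets {
			reversed[target] = append(reversed[target], source)
		}
	}

	dist := make(map[uint32]int)
	queue := make([]uint32, 0, len(topics))
	for _, topic := range topics {
		if _, seen := dist[topic]; !seen {
			dist[topic] = 0
			queue = append(queue, topic)
		}
	}

	// Standard breadth-first search: the first visit gives the
	// shortest hop count.
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		for _, prev := range reversed[node] {
			if _, seen := dist[prev]; !seen {
				dist[prev] = dist[node] + 1
				queue = append(queue, prev)
			}
		}
	}
	return dist
}

func main() {
	// Toy graph: page 1 -> page 2 -> topic 100, page 3 -> topic 200,
	// page 4 has no outgoing edges.
	graph := map[uint32][]uint32{1: {2}, 2: {100}, 3: {200}, 4: {}}
	dist := hopsToClosestTopic(graph, []uint32{100, 200})

	fmt.Println(dist[1], dist[2], dist[3]) // prints "2 1 1"
	_, reachable := dist[4]
	fmt.Println(reachable) // prints "false"
}
```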