io

package
v0.1.0
Published: Dec 14, 2018 License: BSD-3-Clause Imports: 21 Imported by: 6

Documentation

Overview

Package io is GoSPN's input/output package. GoSPN reads from and writes to .data files. To run GoSPN we must first convert a dataset into a data file. For now, GoSPN supports converting PGM and PBM image files into data files.

Converting PGM files is done by io/pgm.go, whilst PBM files are handled by io/pbm.go. Function names are meant to be intuitive: the input format (e.g. PGM), followed by a suffix indicating whether the input is a folder (e.g. F), followed by the output format (e.g. ToData), as in PGMFToData. The Buffered variant is for big datasets: instead of holding every file stream in memory, it processes the streams concurrently, according to the number of CPUs on the user's machine.
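The idea behind the Buffered variant can be illustrated with a bounded worker pool. The following is only a minimal sketch, not GoSPN's implementation: parseAll and countPixels are hypothetical names, and strings stand in for image files.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// parseAll is a hypothetical helper, not part of GoSPN. It parses every input
// with at most runtime.NumCPU() workers and returns results in input order.
func parseAll(inputs []string, parse func(string) int) []int {
	results := make([]int, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each job index is handed to exactly one worker, so every
				// slot of results has a single writer and needs no lock.
				results[i] = parse(inputs[i])
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

// countPixels stands in for parsing one image (assumption): it counts the
// whitespace-separated tokens of its input.
func countPixels(s string) int { return len(strings.Fields(s)) }

func main() {
	imgs := []string{"0 1 1 0", "1 1", "0 0 0"}
	fmt.Println(parseAll(imgs, countPixels))
}
```

Because the jobs channel is unbuffered, only as many inputs are in flight as there are workers, which is what keeps memory use bounded for large datasets.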

We differentiate Data from Evidence. Data contains the classification labels; that is, data is the training set. Evidence omits each instance's label and acts as the test set.
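As a rough illustration of the distinction, the sketch below turns labelled instances into label-free evidence. It assumes instances are maps from variable ID to value and that the label is stored under a known ID; stripLabels and labelID are hypothetical and not part of this package.

```go
package main

import "fmt"

// stripLabels is a hypothetical helper, not part of GoSPN. It assumes each
// instance maps variable IDs to values and that labelID is the ID of the
// class variable. It returns label-free copies plus the removed labels.
func stripLabels(data []map[int]int, labelID int) ([]map[int]int, []int) {
	evidence := make([]map[int]int, len(data))
	labels := make([]int, len(data))
	for i, inst := range data {
		evidence[i] = make(map[int]int, len(inst))
		for id, v := range inst {
			if id == labelID {
				labels[i] = v // keep the label aside for later scoring
				continue
			}
			evidence[i][id] = v
		}
	}
	return evidence, labels
}

func main() {
	data := []map[int]int{{0: 1, 1: 0, 2: 3}, {0: 0, 1: 1, 2: 5}}
	evidence, labels := stripLabels(data, 2)
	fmt.Println(evidence, labels)
}
```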

Output functions follow the same naming convention as input functions. VarSetToPGM, for instance, takes a variable instantiation set and converts it into a PGM image. This is useful for image completion.

Other output functions include DrawGraphTools and DrawGraph. DrawGraphTools writes the given SPN as a graph-tool Python script. This script can be run just like any Python script; running it generates an image of the SPN. Note that this requires the graph-tool library (https://graph-tool.skewed.de/). DrawGraph uses Graphviz to draw the graph. You can then run the resulting dot script with sfdp, neato or any other layout program. This requires the graphviz library (http://www.graphviz.org/).
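To give an idea of what a dot script for a small network might look like, here is a minimal sketch that emits an undirected Graphviz graph by breadth-first traversal. The node type and toDot are hypothetical stand-ins; the actual spn.SPN interface and the exact output of DrawGraph may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for an SPN node; GoSPN's spn.SPN differs.
type node struct {
	label    string
	children []*node
}

// toDot returns an undirected Graphviz graph built by breadth-first
// traversal. Shared children (the graph may be a DAG) get a single ID.
func toDot(root *node) string {
	var b strings.Builder
	b.WriteString("graph {\n")
	ids := map[*node]int{root: 0}
	queue := []*node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		fmt.Fprintf(&b, "\tn%d [label=%q];\n", ids[n], n.label)
		for _, c := range n.children {
			if _, seen := ids[c]; !seen {
				id := len(ids)
				ids[c] = id
				queue = append(queue, c)
			}
			fmt.Fprintf(&b, "\tn%d -- n%d;\n", ids[n], ids[c])
		}
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	x1 := &node{label: "X1"}
	x2 := &node{label: "X2"}
	root := &node{label: "+", children: []*node{x1, x2}}
	fmt.Print(toDot(root))
}
```

Saving the printed text to a file such as spn.dot and running `dot -Tpng spn.dot -o spn.png` (or sfdp, neato) renders it.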

WriteToFile writes the SPN to a file. TODO: ReadFromFile should read the SPN from a .mdl file.

Constants

This section is empty.

Variables

var (
	ErrNonIntegerType  = errors.New("gospn: npy data type is non integer.")
	ErrNonDatasetShape = errors.New("gospn: npy data does not have dimension two.")
)
var Orientations = []CmplType{Top, Bottom, Left, Right}

Orientations contains all CmplType orientations.

var (
	// Quadrants is an array of all CmplTypes.
	Quadrants = [...]CmplType{Top, Right, Left, Bottom}
)

Functions

func ARFFToData

func ARFFToData(dirname, fname, dname string)

ARFFToData. Each class is in a subfolder of dirname. dname is the output file. Arg dirname must be an absolute path. Arg dname must be the filename only.

func BufferedPGMFToData

func BufferedPGMFToData(dirname, dname string) (int, int, int)

BufferedPGMFToData parses large quantities of files concurrently into a data file dname.

func DownloadFromURL

func DownloadFromURL(u, p string, override bool) error

DownloadFromURL takes a URL u and a destination path p, downloading the contents of u to p. If p is not a complete path (not a directory, contains extension), then the name of the file to be downloaded is copied as the new file's name. If override is set to true and p points to an existing file, this function overwrites file p with the new download. Take extreme care when using override!

func DrawGraph

func DrawGraph(filename string, s spn.SPN)

DrawGraph creates a file filename and draws the SPN s in Graphviz dot.

func DrawGraphTools

func DrawGraphTools(filename string, s spn.SPN)

DrawGraphTools creates a file filename and draws the SPN s in graph-tool. The resulting file is a Python script that outputs a PNG image of the graph.

func GetDataPath

func GetDataPath(dataset string) string

func GetPath

func GetPath(relpath string) string

GetPath returns the absolute path corresponding to the relative path relpath.

func ImgCmplToPGM

func ImgCmplToPGM(filename string, orig, cmpl spn.VarSet, typ CmplType, w, h, max int)

ImgCmplToPGM creates a new file distinguishing the original part of the image from the completion done by the SPN and indicated by typ.

func ImgCmplToPPM

func ImgCmplToPPM(filename string, orig, cmpl spn.VarSet, typ CmplType, w, h int)

ImgCmplToPPM creates a new file distinguishing the original part of the image from the completion done by the SPN and indicated by typ.

func LoadSPN

func LoadSPN(filename string) (spn.SPN, error)

LoadSPN reads a binary file that contains a serialized SPN.

func PBMFToData

func PBMFToData(dirname, dname string)

PBMFToData (PBM Folder to Data file). Each class is in a subfolder of dirname. dname is the output file. Arg dirname must be an absolute path. Arg dname must be the filename only.

func PBMFToEvidence

func PBMFToEvidence(dirname, dname string)

PBMFToEvidence (PBM Folder to Evidence file).

func PBMToData

func PBMToData(dirname, dname string, class int)

PBMToData (PBM to Data file). If class is true, it's a classifying problem and will label as class.

func PGMFToData

func PGMFToData(dirname, dname string) (int, int, int)

PGMFToData (PGM Folder to Data file). Each class is in a subfolder of dirname. dname is the output file. Arg dirname must be an absolute path. Arg dname must be the filename only.

func PGMFToEvidence

func PGMFToEvidence(dirname, dname string) (int, int, int)

PGMFToEvidence (PGM Folder to Evidence file).

func ParseArff

func ParseArff(filename string) (name string, sc map[int]*learn.Variable, vals []map[int]int,
	labels map[int]map[string]int)

ParseArff takes an ARFF dataset file and returns a name string along with three structures.

The first is a map VARID -> learn.Variable, containing the internal information necessary for learning. The second is a slice of maps, one per instance of the dataset; each map is a function VARID -> value of the variable identified by VARID. The third is a map containing the names/labels of variables of type class or string. It is a function VAR_CLASSID -> string, where the string is the actual label.

As an example, consider the ARFF dataset below:

	% Example dataset sampling a modified rain/slippery road scenario as seen on Adnan Darwiche's
	% Modeling and Reasoning with Bayesian Networks (Section 4.3).
	% We modified variable Winter, changing it to Season and made it into a numeric (yet
	% categorical) variable just to showcase how we deal with numeric variables.
	@RELATION weather
	% GoSPN doesn't (yet) support continuous variables. It does accept discrete values sent as
	% numeric type. In this case we assume a variable season that is discrete and has 4 possible
	% values: 0, 1, 2, 3 with 0-3 being numeric representations for spring-winter.
	@ATTRIBUTE season NUMERIC
	% We can also use the numeric type as boolean.
	@ATTRIBUTE sprinkler numeric
	% Or just use class. In the case class is used, ParseArff returns the labels describing the
	% valuations in the instances.
	@ATTRIBUTE rain {true,false}
	% We can also use string. Just like class, labels are returned separately.
	@ATTRIBUTE wet_grass string
	@ATTRIBUTE slippery STRING
	@data
	0,0,true,true,false
	0,1,false,false,true
	1,0,false,false,false
	1,1,false,true,false
	1,0,true,false,true
	2,0,true,true,true
	2,0,false,false,true
	3,0,true,false,false
	3,1,false,true,false

For numeric variables, we take the highest value in the dataset and set this value as the categorical upper bound of the variable.
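The rule for numeric variables can be sketched as a single pass over the instances. This is only an illustration under the assumption that instances are maps VARID -> value; upperBounds is a hypothetical helper, not the parser's actual code.

```go
package main

import "fmt"

// upperBounds is a hypothetical helper, not part of GoSPN. For each variable
// ID it returns the highest value observed across all instances.
func upperBounds(vals []map[int]int) map[int]int {
	bounds := make(map[int]int)
	for _, inst := range vals {
		for id, v := range inst {
			if cur, ok := bounds[id]; !ok || v > cur {
				bounds[id] = v
			}
		}
	}
	return bounds
}

func main() {
	// The season and sprinkler columns of the example dataset, keyed by the
	// assumed variable IDs 0 and 1.
	season := []int{0, 0, 1, 1, 1, 2, 2, 3, 3}
	sprinkler := []int{0, 1, 0, 1, 0, 0, 0, 0, 1}
	vals := make([]map[int]int, len(season))
	for i := range season {
		vals[i] = map[int]int{0: season[i], 1: sprinkler[i]}
	}
	fmt.Println(upperBounds(vals))
}
```

For the example dataset this yields 3 for season and 1 for sprinkler. Note that a value missing from the dataset but above the observed maximum would therefore not be representable.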

func ParseData

func ParseData(filename string) (map[int]*learn.Variable, []map[int]int)

ParseData reads from a file named filename and returns the scope and data map of the parsed data file.

func ParseDataNL

func ParseDataNL(filename string) (map[int]*learn.Variable, []map[int]int, []int)

ParseDataNL reads from a file named filename and returns the scope and data map of the parsed data file. This version doesn't add labels as variables, but returns them separately as a slice.

func ParseEvidence

func ParseEvidence(filename string) (map[int]*learn.Variable, []map[int]int, []int)

ParseEvidence takes an evidence file that contains the instantiations of a subset of variables as evidence to be computed during inference. It may contain multiple instantiations.

It returns the scope and a slice of maps, where each key is a variable ID and each associated value is the valuation of that variable.

func ParsePartitionedData

func ParsePartitionedData(filename string, p float64, rseed int64) (map[int]*learn.Variable,
	[]map[int]int, []map[int]int, []int)

ParsePartitionedData reads a data file and randomly sets aside ((1-p)*100)% of it to be used as evidence. For instance, p=0.7 returns a map[int]*learn.Variable, which contains the data variables, and two []map[int]int. The first []map[int]int is the training data, which makes up 70% of the data file. The second is the evidence table, built from the remaining 30% of the data file. The partitioning is determined by the pseudo-random seed rseed. If rseed < 0, the default pseudo-random seed is used. It also returns the labels of each test line.

Note: since this function "breaks" the order of classification, it returns a separate slice of labels containing the actual classification of each instantiation.
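A seeded split of this kind can be sketched with a shuffled index permutation. The code below is an assumption-laden illustration, not the function's actual implementation: partition is a hypothetical helper, it works on indices rather than parsed instances, and it does not model the rseed < 0 case.

```go
package main

import (
	"fmt"
	"math/rand"
)

// partition is a hypothetical helper, not GoSPN's implementation. It shuffles
// the indices 0..n-1 with the given seed and returns the first int(p*n) as
// training indices and the remainder as test indices.
func partition(n int, p float64, seed int64) (train, test []int) {
	perm := rand.New(rand.NewSource(seed)).Perm(n)
	cut := int(p * float64(n))
	return perm[:cut], perm[cut:]
}

func main() {
	train, test := partition(10, 0.7, 42)
	fmt.Println(len(train), len(test))
	fmt.Println(train, test)
}
```

Using the same seed reproduces the same split, which makes experiments repeatable.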

func ReadFromFile

func ReadFromFile(filename string) spn.SPN

ReadFromFile reads an SPN from an SPN .mdl file.

func SaveSPN

func SaveSPN(filename string, S spn.SPN) error

SaveSPN serializes an SPN and writes it to a file. Suggested extension: ".spn".

func SplitHalf

func SplitHalf(O spn.VarSet, t CmplType, w, h int) (spn.VarSet, spn.VarSet)

SplitHalf assumes O is an image with dimensions (w, h). It then splits O in half according to the given CmplType. The return value of SplitHalf is then the two spn.VarSet partitions.
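The splitting can be pictured as follows. This sketch makes several assumptions that the documentation does not confirm: pixels are keyed in row-major order (key = y*w + x), a plain map[int]int stands in for spn.VarSet, strings stand in for CmplType, and the half named by the orientation is returned first. splitHalf here is a hypothetical re-creation, not the package's function.

```go
package main

import "fmt"

// splitHalf is a hypothetical sketch, not GoSPN's SplitHalf. It assumes
// row-major pixel keys (key = y*w + x) and an orientation that is one of
// "TOP", "BOTTOM", "LEFT" or "RIGHT". The first map holds the half named by
// orientation; the second holds the remaining pixels.
func splitHalf(img map[int]int, orientation string, w, h int) (map[int]int, map[int]int) {
	named, rest := make(map[int]int), make(map[int]int)
	for key, v := range img {
		x, y := key%w, key/w
		var in bool
		switch orientation {
		case "TOP":
			in = y < h/2
		case "BOTTOM":
			in = y >= h/2
		case "LEFT":
			in = x < w/2
		case "RIGHT":
			in = x >= w/2
		}
		if in {
			named[key] = v
		} else {
			rest[key] = v
		}
	}
	return named, rest
}

func main() {
	// A 2x2 image: keys 0 and 1 form the top row, keys 0 and 2 the left column.
	img := map[int]int{0: 10, 1: 20, 2: 30, 3: 40}
	left, right := splitHalf(img, "LEFT", 2, 2)
	fmt.Println(left, right)
}
```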

func VarSetToPBM

func VarSetToPBM(filename string, state spn.VarSet, w, h int)

VarSetToPBM takes a state (a variable instantiation generated by an SPN) and draws it as a PBM image of dimensions (w, h) in file filename.

func VarSetToPGM

func VarSetToPGM(filename string, state spn.VarSet, w, h, max int)

VarSetToPGM takes a state (a variable instantiation generated by an SPN) and draws it as a PGM image of dimensions (w, h) and maximum gray value max in file filename.
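For readers unfamiliar with the format, the sketch below renders an instantiation as a plain-text (P2) PGM image. It is a hypothetical illustration, not the package's code: pgmString is an invented name, a map[int]int stands in for spn.VarSet, pixels are assumed to be keyed in row-major order, and whether VarSetToPGM writes the plain or the binary PGM variant is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// pgmString is a hypothetical helper, not part of GoSPN. It renders state as
// a plain PGM (magic number P2) image of width w and height h, assuming the
// pixel at (x, y) is stored under key y*w + x. Missing keys render as 0.
func pgmString(state map[int]int, w, h, max int) string {
	var b strings.Builder
	// Header: magic number, dimensions, maximum gray value.
	fmt.Fprintf(&b, "P2\n%d %d\n%d\n", w, h, max)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			sep := " "
			if x == w-1 {
				sep = "\n"
			}
			fmt.Fprintf(&b, "%d%s", state[y*w+x], sep)
		}
	}
	return b.String()
}

func main() {
	state := map[int]int{0: 0, 1: 255, 2: 128, 3: 64}
	fmt.Print(pgmString(state, 2, 2, 255))
}
```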

func VarSetToPPM

func VarSetToPPM(filename string, state spn.VarSet, w, h, max int)

VarSetToPPM takes a state (a variable instantiation generated by an SPN) and draws it as a PPM image of dimensions (w, h) and maximum color value max in file filename.

Types

type BFSPair

type BFSPair struct {
	Spn    spn.SPN
	Pname  string
	Weight float64
}

BFSPair (Breadth-First Search Pair) bundles an SPN with a string Pname and a float64 Weight.

type CmplType

type CmplType string

CmplType indicates which type of image completion we are referring to.

const (
	// Top image completion.
	Top CmplType = "TOP"
	// Bottom image completion.
	Bottom CmplType = "BOTTOM"
	// Left image completion.
	Left CmplType = "LEFT"
	// Right image completion.
	Right CmplType = "RIGHT"
)

type NpyReader

type NpyReader struct {
	// contains filtered or unexported fields
}

NpyReader is a .npy reader. GoSPN supports only integer data for now.

func NewNpyReader

func NewNpyReader(fname string) (*NpyReader, error)

NewNpyReader creates a new *NpyReader from .npy file fname.

func (*NpyReader) Close

func (r *NpyReader) Close()

Close closes this reader's stream.

func (*NpyReader) Read

func (r *NpyReader) Read(n int) ([]map[int]int, []int, error)

Read reads n instances from file and returns a dataset and label slice.

func (*NpyReader) ReadAll

func (r *NpyReader) ReadAll() ([]map[int]int, []int, error)

ReadAll reads all instances from file and returns a dataset and label slice.

func (*NpyReader) ReadBalanced

func (r *NpyReader) ReadBalanced(n int, c int) ([]map[int]int, []int, error)

ReadBalanced returns a balanced dataset and label slice totalling n instances. As argument, it takes the number of classes c.
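One way to picture balanced reading is to keep the same number of instances per class. The sketch below is a guess at the idea, not the reader's actual algorithm: balanced is a hypothetical helper that works on an in-memory label slice, assumes classes are numbered 0..c-1, and keeps the first n/c instances of each class in order of appearance.

```go
package main

import "fmt"

// balanced is a hypothetical helper, not part of GoSPN. It scans labels in
// order and returns the indices of the first n/c instances of each class
// 0..c-1. Labels outside that range are skipped.
func balanced(labels []int, n, c int) []int {
	perClass := n / c
	taken := make([]int, c)
	var picked []int
	for i, l := range labels {
		if l < 0 || l >= c || taken[l] >= perClass {
			continue
		}
		taken[l]++
		picked = append(picked, i)
	}
	return picked
}

func main() {
	labels := []int{0, 0, 0, 1, 0, 1, 1}
	fmt.Println(balanced(labels, 4, 2))
}
```

If a class has fewer than n/c instances, this sketch simply returns fewer than n indices; how the real reader handles that case is not documented here.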

func (*NpyReader) Reset

func (r *NpyReader) Reset() error

Reset resets the file pointer so it points to the beginning of data.
