Documentation ¶
Overview ¶
Package io is an input/output package. GoSPN reads from and writes to .data files, so to run GoSPN we must first convert a dataset into a data file. For now, GoSPN supports converting PGM and PBM image files, as well as ARFF datasets, into data files.
Converting PGM files is done by io/pgm.go, whilst PBM files are handled by io/pbm.go. Function names are meant to be intuitive: the input format (e.g. PGM), followed by an F suffix if the input is a folder, then To and the output format (e.g. PGMFToData converts a folder of PGM files into a data file). The Buffered variant (e.g. BufferedPGMFToData) is meant for large datasets: instead of keeping every file stream in memory, it processes the streams concurrently, with the number of concurrent streams set by the number of CPUs on the user's machine.
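The Buffered idea can be sketched as a bounded worker pool sized by runtime.NumCPU. This is a minimal illustration, not GoSPN's code: convertFile is a hypothetical stand-in for decoding one image file into a data line, and results are stored by index so the output order matches the input order.

```go
package main

import (
	"runtime"
	"strings"
	"sync"
)

// convertFile is a hypothetical placeholder for parsing one PGM/PBM file
// into a data line; here it just upper-cases the name.
func convertFile(name string) string {
	return strings.ToUpper(name)
}

// convertAll runs at most runtime.NumCPU() workers at once. Each worker
// writes to its own slice index, so no extra locking is needed.
func convertAll(files []string) []string {
	out := make([]string, len(files))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = convertFile(files[i])
			}
		}()
	}
	for i := range files {
		jobs <- i // blocks until a worker is free
	}
	close(jobs)
	wg.Wait()
	return out
}
```

Calling convertAll([]string{"a.pgm", "b.pgm"}) returns the converted lines in input order. A real converter would more likely stream each line to the output file instead of collecting them in a slice.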
We differentiate Data from Evidence. Data contains the classification labels; that is, data is the training set. Evidence has the instances' labels removed and acts as the test set.
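To make the distinction concrete, here is a small sketch that turns one data instance (a map from variable ID to value) into an evidence instance by dropping the label variable. The labelID parameter is hypothetical: which variable holds the class depends on how the data file was produced.

```go
package main

// toEvidence copies a data instance and removes the label variable,
// returning the evidence instance together with the removed label.
func toEvidence(inst map[int]int, labelID int) (map[int]int, int) {
	ev := make(map[int]int, len(inst))
	for id, v := range inst {
		if id != labelID {
			ev[id] = v
		}
	}
	return ev, inst[labelID]
}
```

For example, toEvidence(map[int]int{0: 1, 1: 0, 2: 3}, 2) yields the evidence map[0:1 1:0] and the label 3.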
For output we follow the same naming convention as for input. VarSetToPGM, for instance, takes a variable instantiation set and converts it into a PGM image, which is useful for image completion.
Other output functions include DrawGraphTools and DrawGraph. DrawGraphTools draws the given SPN as a graph-tool Python script. The script can be run like any other Python script, and doing so generates an image of the SPN. Note that this requires the graph-tool library (https://graph-tool.skewed.de/). DrawGraph uses Graphviz instead: it writes a DOT script, which you can then render with sfdp, neato or any other Graphviz layout program. This requires Graphviz (http://www.graphviz.org/).
WriteToFile writes the SPN to a file. TODO: ReadFromFile should read the SPN from a .mdl file.
Index ¶
- Variables
- func ARFFToData(dirname, fname, dname string)
- func BufferedPGMFToData(dirname, dname string) (int, int, int)
- func DownloadFromURL(u, p string, override bool) error
- func DrawGraph(filename string, s spn.SPN)
- func DrawGraphTools(filename string, s spn.SPN)
- func GetDataPath(dataset string) string
- func GetPath(relpath string) string
- func ImgCmplToPGM(filename string, orig, cmpl spn.VarSet, typ CmplType, w, h, max int)
- func ImgCmplToPPM(filename string, orig, cmpl spn.VarSet, typ CmplType, w, h int)
- func LoadSPN(filename string) (spn.SPN, error)
- func PBMFToData(dirname, dname string)
- func PBMFToEvidence(dirname, dname string)
- func PBMToData(dirname, dname string, class int)
- func PGMFToData(dirname, dname string) (int, int, int)
- func PGMFToEvidence(dirname, dname string) (int, int, int)
- func ParseArff(filename string) (name string, sc map[int]*learn.Variable, vals []map[int]int, ...)
- func ParseData(filename string) (map[int]*learn.Variable, []map[int]int)
- func ParseDataNL(filename string) (map[int]*learn.Variable, []map[int]int, []int)
- func ParseEvidence(filename string) (map[int]*learn.Variable, []map[int]int, []int)
- func ParsePartitionedData(filename string, p float64, rseed int64) (map[int]*learn.Variable, []map[int]int, []map[int]int, []int)
- func ReadFromFile(filename string) spn.SPN
- func SaveSPN(filename string, S spn.SPN) error
- func SplitHalf(O spn.VarSet, t CmplType, w, h int) (spn.VarSet, spn.VarSet)
- func VarSetToPBM(filename string, state spn.VarSet, w, h int)
- func VarSetToPGM(filename string, state spn.VarSet, w, h, max int)
- func VarSetToPPM(filename string, state spn.VarSet, w, h, max int)
- type BFSPair
- type CmplType
- type NpyReader
Constants ¶
This section is empty.
Variables ¶
var (
	ErrNonIntegerType  = errors.New("gospn: npy data type is non integer.")
	ErrNonDatasetShape = errors.New("gospn: npy data does not have dimension two.")
)
Orientations contains all CmplType orientations.
Functions ¶
func ARFFToData ¶
func ARFFToData(dirname, fname, dname string)
ARFFToData (ARFF to Data file). Each class is in a subfolder of dirname. dname is the output file. Arg dirname must be an absolute path. Arg dname must be the filename only.
func BufferedPGMFToData ¶
BufferedPGMFToData concurrently parses a large number of files into a data file dname.
func DownloadFromURL ¶
DownloadFromURL takes a URL u and a destination path p, and downloads the contents of u to p. If p is not a complete file path (a complete path is not a directory and contains an extension), the name of the downloaded file is used as the new file's name. If override is true and p points to an existing file, this function overwrites p with the new download, so take extreme care when using override!
func DrawGraphTools ¶
DrawGraphTools creates a file filename and draws the SPN s in it as a graph-tool script. The resulting file is Python source code that outputs a PNG image of the graph.
func GetDataPath ¶
func ImgCmplToPGM ¶
ImgCmplToPGM creates a new PGM file that distinguishes the original part of the image from the part completed by the SPN, as indicated by typ.
func ImgCmplToPPM ¶
ImgCmplToPPM creates a new PPM file that distinguishes the original part of the image from the part completed by the SPN, as indicated by typ.
func PBMFToData ¶
func PBMFToData(dirname, dname string)
PBMFToData (PBM Folder to Data file). Each class is in a subfolder of dirname. dname is the output file. Arg dirname must be an absolute path. Arg dname must be the filename only.
func PBMFToEvidence ¶
func PBMFToEvidence(dirname, dname string)
PBMFToEvidence (PBM Folder to Evidence file).
func PBMToData ¶
PBMToData (PBM to Data file). Arg class is the label assigned to the converted instances when the problem is a classification task.
func PGMFToData ¶
PGMFToData (PGM Folder to Data file). Each class is in a subfolder of dirname. dname is the output file. Arg dirname must be an absolute path. Arg dname must be the filename only.
func PGMFToEvidence ¶
PGMFToEvidence (PGM Folder to Evidence file).
func ParseArff ¶
func ParseArff(filename string) (name string, sc map[int]*learn.Variable, vals []map[int]int, labels map[int]map[string]int)
ParseArff takes an ARFF dataset file and returns the dataset's name followed by three structures.
The first is a map VARID -> learn.Variable containing the internal information necessary for learning. The second is a slice of maps, one per instance of the dataset; each map is a function VARID -> value of the variable represented by VARID. The third is a map containing the labels of variables of type class or string. Its type is map[int]map[string]int: for each such variable ID, it maps each label string to the integer value that represents it in the instances.
As an example, consider the ARFF dataset below:
    % Example dataset sampling a modified rain/slippery road scenario as seen on Adnan Darwiche's
    % Modeling and Reasoning with Bayesian Networks (Section 4.3).
    % We modified variable Winter, changing it to Season and made it into a numeric (yet
    % categorical) variable just to showcase how we deal with numeric variables.
    @RELATION weather

    % GoSPN doesn't (yet) support continuous variables. It does accept discrete values sent as
    % numeric type. In this case we assume a variable season that is discrete and has 4 possible
    % values: 0, 1, 2, 3 with 0-3 being numeric representations for spring-winter.
    @ATTRIBUTE season NUMERIC
    % We can also use the numeric type as boolean.
    @ATTRIBUTE sprinkler numeric
    % Or just use class. In the case class is used, ParseArff returns the labels describing the
    % valuations in the instances.
    @ATTRIBUTE rain {true,false}
    % We can also use string. Just like class, labels are returned separately.
    @ATTRIBUTE wet_grass string
    @ATTRIBUTE slippery STRING

    @data
    0,0,true,true,false
    0,1,false,false,true
    1,0,false,false,false
    1,1,false,true,false
    1,0,true,false,true
    2,0,true,true,true
    2,0,false,false,true
    3,0,true,false,false
    3,1,false,true,false
For numeric variables, we take the highest value in the dataset and set this value as the categorical upper bound of the variable.
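The upper-bound rule amounts to a maximum over one column of the instances returned by ParseArff. The sketch below is illustrative (numericUpperBound is a hypothetical name) and assumes non-negative integer values starting at 0, in which case the variable has upperBound+1 categories.

```go
package main

// numericUpperBound returns the highest value variable id takes across
// all instances; instances missing the variable are skipped.
func numericUpperBound(vals []map[int]int, id int) int {
	hi := 0
	for _, inst := range vals {
		if v, ok := inst[id]; ok && v > hi {
			hi = v
		}
	}
	return hi
}
```

For the season column of the example above (values 0 to 3), this returns 3, i.e. the four categories spring to winter.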
func ParseData ¶
ParseData reads a file named filename and returns the scope and the instances (a slice of maps) of the parsed data file.
func ParseDataNL ¶
ParseDataNL reads a file named filename and returns the scope and the instances of the parsed data file. Unlike ParseData, it doesn't add labels as variables, but returns them separately as a slice.
func ParseEvidence ¶
ParseEvidence takes an evidence file that contains the instantiations of a subset of variables as evidence to be computed during inference. It may contain multiple instantiations.
It returns the scope and a slice of maps, one per instantiation, in which each key is a variable ID and each associated value is that variable's valuation.
func ParsePartitionedData ¶
func ParsePartitionedData(filename string, p float64, rseed int64) (map[int]*learn.Variable, []map[int]int, []map[int]int, []int)
ParsePartitionedData reads a data file and pseudo-randomly partitions it, using a fraction p of the instances as training data and the remaining (1-p) as evidence. For instance, p=0.7 returns a map[int]*learn.Variable, which contains the data variables, and two []map[int]int: the first is the training data, composing 70% of the data file, and the second is the evidence table with the remaining 30%. The partitioning is determined by the pseudo-random seed rseed; if rseed < 0, the default pseudo-random seed is used. It also returns the labels of each test line.
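One way to implement a seeded p/(1-p) split is to shuffle the instance indices and cut the permutation. This is a sketch of that technique, not necessarily how ParsePartitionedData selects lines; partition is a hypothetical name, and a negative seed falls back to math/rand's global generator.

```go
package main

import "math/rand"

// partition returns shuffled instance indices split so that the first
// int(p*n) are for training and the rest are for evidence. A fixed
// non-negative seed makes the split reproducible.
func partition(n int, p float64, rseed int64) (train, test []int) {
	var idx []int
	if rseed < 0 {
		idx = rand.Perm(n) // global, default-seeded generator
	} else {
		idx = rand.New(rand.NewSource(rseed)).Perm(n)
	}
	cut := int(p * float64(n))
	return idx[:cut], idx[cut:]
}
```

With n=10 and p=0.7, train holds 7 indices and test the other 3; calling it again with the same rseed gives the same split.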
Note: since this function "breaks" the order of classification, it returns a separate label slice containing the actual classification of each instantiation.
func ReadFromFile ¶
ReadFromFile reads an SPN from an spn mdl file.
func SplitHalf ¶
SplitHalf assumes O is an image of width w and height h, splits it in half according to the given CmplType t, and returns the two halves as spn.VarSet partitions.
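A sketch of the splitting step, under two stated assumptions: the image is a map from pixel ID to value with pixel (x, y) stored at ID y*w + x, and the orientation strings "top" and "left" are hypothetical placeholders for the actual values in Orientations.

```go
package main

// VarSet is a hypothetical stand-in for spn.VarSet: pixel ID -> value.
type VarSet map[int]int

// splitHalf puts the top rows (or, for "left", the left columns) in the
// first partition and everything else in the second.
func splitHalf(o VarSet, orientation string, w, h int) (VarSet, VarSet) {
	first, second := VarSet{}, VarSet{}
	for id, v := range o {
		x, y := id%w, id/w
		inFirst := y < h/2
		if orientation == "left" {
			inFirst = x < w/2
		}
		if inFirst {
			first[id] = v
		} else {
			second[id] = v
		}
	}
	return first, second
}
```

For a 4x4 image, "top" sends pixels 0 to 7 to the first partition, while "left" sends the pixels with x in {0, 1}.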
func VarSetToPBM ¶
VarSetToPBM takes a state, that is, an instantiation generated by an SPN, and draws it as a PBM image of width w and height h in filename.
func VarSetToPGM ¶
VarSetToPGM takes a state, that is, an instantiation generated by an SPN, and draws it as a PGM image of width w and height h, with maximum gray value max, in filename.
Types ¶
type CmplType ¶
type CmplType string
CmplType indicates which type of image completion we are referring to.
type NpyReader ¶
type NpyReader struct {
// contains filtered or unexported fields
}
NpyReader is a .npy reader. GoSPN supports only integer data for now.
func NewNpyReader ¶
NewNpyReader creates a new *NpyReader from .npy file fname.
func (*NpyReader) ReadAll ¶
ReadAll reads all instances from the file and returns a dataset and a label slice.
func (*NpyReader) ReadBalanced ¶
ReadBalanced returns a balanced dataset and label slice totalling n instances. It takes the number of classes c as an argument.
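One plausible reading of "balanced" is "the same number of instances per class". The sketch below picks up to n/c instances of each class 0..c-1 by scanning labels in order; balancedIndices is hypothetical, and ReadBalanced may sample differently (for example at random, or handling a remainder when c does not divide n).

```go
package main

// balancedIndices returns the indices of at most n/c instances per class,
// keeping the first ones found. Labels outside 0..c-1 are ignored.
func balancedIndices(labels []int, n, c int) []int {
	per := n / c
	count := make([]int, c)
	var picked []int
	for i, l := range labels {
		if l >= 0 && l < c && count[l] < per {
			count[l]++
			picked = append(picked, i)
		}
	}
	return picked
}
```

For labels [0 0 0 1 1 2 2 2 2] with n=6 and c=3, this keeps indices [0 1 3 4 5 6], two per class.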