Documentation ¶
Index ¶
- func CreateMetadataIndex(client *elastic.Client, index string, overwrite bool) error
- func DatasetMatches(m *model.Metadata, variables []string) bool
- func IngestMetadata(client *elastic.Client, index string, datasetPrefix string, ...) error
- func IsMetadataVariable(v *model.Variable) bool
- func LoadDatasetStats(m *model.Metadata, datasetPath string) error
- func LoadImportance(m *model.Metadata, importanceFile string) error
- func LoadMetadataFromClassification(schemaPath string, classificationPath string, normalizeVariableNames bool) (*model.Metadata, error)
- func LoadMetadataFromMergedSchema(schemaPath string) (*model.Metadata, error)
- func LoadMetadataFromOriginalSchema(schemaPath string) (*model.Metadata, error)
- func LoadMetadataFromRawFile(datasetPath string, classificationPath string) (*model.Metadata, error)
- func LoadSummary(m *model.Metadata, summaryFile string, useCache bool) error
- func LoadSummaryFromDescription(m *model.Metadata, summaryFile string) error
- func LoadSummaryMachine(m *model.Metadata, summaryFile string) error
- func SetTypeProbabilityThreshold(threshold float64)
- func VerifyAndUpdate(m *model.Metadata, dataPath string) error
- func WriteMergedSchema(m *model.Metadata, path string, mergedDataResource *model.DataResource) error
- func WriteSchema(m *model.Metadata, path string) error
- type DataResourceParser
- type DatasetSource
- type Media
- type Raw
- type Table
- type Timeseries
Functions ¶
func CreateMetadataIndex ¶
CreateMetadataIndex creates a new Elasticsearch index with our target mappings. An ngram analyzer is defined and applied to the variable names to allow for substring searching.
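To illustrate why n-gram analysis of variable names allows substring searching, here is a minimal sketch in plain Go. It does not call Elasticsearch or this package; the helper names `ngrams` and `matches`, and the gram sizes used, are assumptions for illustration only. The actual analyzer settings are defined by the index mappings and are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// ngrams returns every lowercase substring of s whose length in runes is
// between minLen and maxLen. This approximates the set of tokens an
// n-gram analyzer would index for a variable name (hypothetical helper).
func ngrams(s string, minLen, maxLen int) []string {
	runes := []rune(strings.ToLower(s))
	var grams []string
	for n := minLen; n <= maxLen; n++ {
		for i := 0; i+n <= len(runes); i++ {
			grams = append(grams, string(runes[i:i+n]))
		}
	}
	return grams
}

// matches reports whether query equals one of the n-grams of name, which
// is how a substring query can hit an n-gram-analyzed field.
func matches(name, query string, minLen, maxLen int) bool {
	q := strings.ToLower(query)
	for _, g := range ngrams(name, minLen, maxLen) {
		if g == q {
			return true
		}
	}
	return false
}

func main() {
	// Gram sizes 2..5 are an assumption, not the package's real settings.
	fmt.Println(ngrams("age", 2, 3))
	fmt.Println(matches("patient_age", "age", 2, 5))
}
```

Because every short substring of the name is indexed as its own token, a query for part of a variable name can match by plain token equality, at the cost of a larger index.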
func DatasetMatches ¶
DatasetMatches determines whether the variables in the metadata match the provided list of variable names.
func IngestMetadata ¶
func IngestMetadata(client *elastic.Client, index string, datasetPrefix string, datasetSource DatasetSource, meta *model.Metadata) error
IngestMetadata adds a document consisting of the metadata to the provided index.
func IsMetadataVariable ¶
IsMetadataVariable indicates whether or not a variable is additional metadata added to the source.
func LoadDatasetStats ¶
LoadDatasetStats loads the dataset and computes various stats.
func LoadImportance ¶
LoadImportance will load the importance feature selection metric.
func LoadMetadataFromClassification ¶
func LoadMetadataFromClassification(schemaPath string, classificationPath string, normalizeVariableNames bool) (*model.Metadata, error)
LoadMetadataFromClassification loads metadata from a merged schema and classification file.
func LoadMetadataFromMergedSchema ¶
LoadMetadataFromMergedSchema loads metadata from a merged schema file.
func LoadMetadataFromOriginalSchema ¶
LoadMetadataFromOriginalSchema loads metadata from a schema file.
func LoadMetadataFromRawFile ¶
func LoadMetadataFromRawFile(datasetPath string, classificationPath string) (*model.Metadata, error)
LoadMetadataFromRawFile loads metadata from a raw file and a classification file.
func LoadSummary ¶
LoadSummary loads a description summary.
func LoadSummaryFromDescription ¶
LoadSummaryFromDescription loads a summary from the description.
func LoadSummaryMachine ¶
LoadSummaryMachine loads a machine-learned summary.
func SetTypeProbabilityThreshold ¶
func SetTypeProbabilityThreshold(threshold float64)
SetTypeProbabilityThreshold sets the probability threshold below which a suggested type is not used as the variable type.
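The thresholding behaviour described here can be sketched as follows. This is not the package's implementation: the `suggestedType` struct, the `pickType` helper, the lowercase setter, the default value of 0.8, and the choice to treat a probability equal to the threshold as acceptable are all assumptions made for illustration.

```go
package main

import "fmt"

// suggestedType is a hypothetical stand-in for one type suggestion
// produced by a classifier for a variable.
type suggestedType struct {
	Type        string
	Probability float64
}

// typeProbabilityThreshold is a package-level setting, as the setter-style
// API suggests. The default of 0.8 is an assumption.
var typeProbabilityThreshold = 0.8

// setTypeProbabilityThreshold mirrors the documented setter's signature.
func setTypeProbabilityThreshold(threshold float64) {
	typeProbabilityThreshold = threshold
}

// pickType returns the most probable suggested type whose probability is
// not below the threshold, or fallback when no suggestion qualifies.
func pickType(suggestions []suggestedType, fallback string) string {
	best := fallback
	bestProb := -1.0
	for _, s := range suggestions {
		if s.Probability >= typeProbabilityThreshold && s.Probability > bestProb {
			best, bestProb = s.Type, s.Probability
		}
	}
	return best
}

func main() {
	suggestions := []suggestedType{
		{Type: "integer", Probability: 0.6},
		{Type: "categorical", Probability: 0.3},
	}

	setTypeProbabilityThreshold(0.5)
	fmt.Println(pickType(suggestions, "unknown"))

	setTypeProbabilityThreshold(0.9)
	fmt.Println(pickType(suggestions, "unknown"))
}
```

With the threshold at 0.5 the "integer" suggestion is accepted; raising it to 0.9 rejects every suggestion, so the fallback type is kept.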
func VerifyAndUpdate ¶
VerifyAndUpdate will update the metadata when inconsistencies or errors are found.
func WriteMergedSchema ¶
func WriteMergedSchema(m *model.Metadata, path string, mergedDataResource *model.DataResource) error
WriteMergedSchema exports the current metadata as a merged schema file.
Types ¶
type DataResourceParser ¶
type DataResourceParser interface {
Parse(res *gabs.Container) (*model.DataResource, error)
}
DataResourceParser is a parser for a data resource in the schema document.
type DatasetSource ¶
type DatasetSource string
DatasetSource flags the type of ingest action that created a dataset.
const (
	// ProvenanceSimon identifies the type provenance as Simon
	ProvenanceSimon = "d3m.primitives.distil.simon"
	// ProvenanceSchema identifies the type provenance as schema
	ProvenanceSchema = "schema"
	// Seed flags a dataset as ingested from seed data
	Seed DatasetSource = "seed"
	// Contrib flags a dataset as being ingested from contributed data
	Contrib DatasetSource = "contrib"
	// Augmented flags a dataset as being ingested from augmented data
	Augmented DatasetSource = "augmented"
)
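As a sketch of how the DatasetSource values might be used by calling code, the example below redeclares the type and its three documented values locally so that it runs on its own. The `parseDatasetSource` helper is hypothetical and is not part of this package; it only shows one way to validate an untyped string, such as a command-line argument, against the known sources.

```go
package main

import "fmt"

// DatasetSource and its values mirror the documented declarations. They
// are redeclared here only to keep the example self-contained.
type DatasetSource string

const (
	Seed      DatasetSource = "seed"
	Contrib   DatasetSource = "contrib"
	Augmented DatasetSource = "augmented"
)

// parseDatasetSource is a hypothetical helper that converts a raw string
// into a DatasetSource, rejecting values that are not documented.
func parseDatasetSource(s string) (DatasetSource, error) {
	switch src := DatasetSource(s); src {
	case Seed, Contrib, Augmented:
		return src, nil
	default:
		return "", fmt.Errorf("unknown dataset source %q", s)
	}
}

func main() {
	src, err := parseDatasetSource("contrib")
	fmt.Println(src, err)

	_, err = parseDatasetSource("scraped")
	fmt.Println(err)
}
```

Using a named string type rather than a bare string lets the compiler catch accidental mix-ups with other string arguments, such as the index name or dataset prefix passed to IngestMetadata.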
type Media ¶
type Media struct {
Type string
}
Media is a data resource that is backed by media files.
type Raw ¶
type Raw struct {
// contains filtered or unexported fields
}
Raw is a data resource that is contained within one file which does not have fields specified in the schema.
type Table ¶
type Table struct { }
Table is a data resource that is contained within one or many tabular files.
type Timeseries ¶
type Timeseries struct { }
Timeseries is a data resource that is contained within one or many timeseries files.
func (*Timeseries) Parse ¶
func (r *Timeseries) Parse(res *gabs.Container) (*model.DataResource, error)
Parse extracts the data resource from the data schema document.