Documentation ¶
Overview ¶
Package dataset includes the operations needed for processing collections of JSON documents and their attachments.
Authors R. S. Doiel, <rsdoiel@library.caltech.edu> and Tom Morrel, <tmorrell@library.caltech.edu>
Copyright (c) 2021, Caltech All rights not granted herein are expressly reserved by Caltech.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Package dataset provides a common approach for storing JSON object documents on local disc. It is intended as a single-user system for intermediate processing of JSON content for analysis or batch processing. It is not a database management system (if you need a JSON database system I would suggest looking at CouchDB, MongoDB and Redis as a starting point).
The approach dataset takes is to store JSON documents in a pairtree structure under the collection folder. The keys are the JSON document names. JSON documents (and possibly their attachments) are then stored based on that assignment in the pairtree. Conversely, the collection.json document is used to find and retrieve documents from the collection. The layout of the metadata is as follows:
+ Collection - a directory
- Collection/collection.json - metadata for retrieval
- Collection/[Pairtree] - holds individual JSON docs and attachments
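The pairtree idea can be illustrated with a short sketch. It is a simplification and not dataset's actual implementation: the helper names pairtreeSegments and docPath, the collection name "Friends.ds" and the key "gutenberg" are all hypothetical, and no character escaping is applied to the key.

```go
package main

import (
	"fmt"
	"path"
)

// pairtreeSegments splits a key into two-character segments, the basic
// idea behind a pairtree. Simplified: no character escaping is done.
func pairtreeSegments(key string) []string {
	runes := []rune(key)
	segments := []string{}
	for i := 0; i < len(runes); i += 2 {
		end := i + 2
		if end > len(runes) {
			end = len(runes)
		}
		segments = append(segments, string(runes[i:end]))
	}
	return segments
}

// docPath builds a hypothetical location for a key's JSON document
// under a collection directory.
func docPath(collection, key string) string {
	parts := append([]string{collection, "pairtree"}, pairtreeSegments(key)...)
	parts = append(parts, key+".json")
	return path.Join(parts...)
}

func main() {
	// prints Friends.ds/pairtree/gu/te/nb/er/g/gutenberg.json
	fmt.Println(docPath("Friends.ds", "gutenberg"))
}
```

Because the result is an ordinary directory path, the stored document stays reachable with standard tools such as ls, find and grep.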
A key feature of dataset is to be Posix shell friendly. This has led to storing the JSON documents in a directory structure that standard Posix tooling can traverse. It has also meant that the JSON documents themselves remain on "disc" as plain text. This has facilitated integration with many other applications, programming languages and systems.
Attachments are non-JSON documents explicitly "attached" that share the same pairtree path but are placed in a sub directory called "_". If the document name is "Jane.Doe.json" and the attachment is photo.jpg the JSON document is "pairtree/Ja/ne/.D/e./Jane.Doe.json" and the photo is in "pairtree/Ja/ne/.D/e./_/photo.jpg".
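As a minimal sketch of the "_" convention described above, the helper below derives an attachment's location from the path of its JSON document. The function name attachmentPath is hypothetical, and the sketch ignores any per-version sub directories that the attachment functions may add.

```go
package main

import (
	"fmt"
	"path"
)

// attachmentPath returns where an attachment would sit relative to a
// JSON document, following the "_" sub directory convention.
// Only the base name of the attachment is used.
func attachmentPath(docPath, attachmentName string) string {
	return path.Join(path.Dir(docPath), "_", path.Base(attachmentName))
}

func main() {
	// prints pairtree/Ja/ne/.D/e./_/photo.jpg
	fmt.Println(attachmentPath("pairtree/Ja/ne/.D/e./Jane.Doe.json", "photo.jpg"))
}
```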
Additional operations besides storing and reading JSON documents are also supported. These include creating lists (arrays) of JSON documents from a list of keys, listing keys in the collection, counting documents in the collection, indexing and searching by indexes.
The primary use case driving the development of dataset is harvesting API content for library systems (e.g. EPrints, Invenio, ArchivesSpace, ORCID, CrossRef, OCLC). The harvesting needed to be done in such a way as to leverage existing Posix tooling (e.g. grep, sed, etc) for processing and analysis.
Initial use case:
Caltech Library has many repository, catalog and record management systems (e.g. EPrints, Invenio, ArchivesSpace, Islandora). It is common practice to harvest data from these systems for analysis or processing. Harvested records typically come in XML or JSON format. JSON has proven a flexible way of working with the data, and in our more modern tools it is the common format we use to move data around. We needed a way to standardize how we stored these JSON records for intermediate processing to allow us to use the growing ecosystem of JSON related tooling available under Posix/Unix compatible systems.
Index ¶
- Constants
- func Check(cName string, verbose bool) error
- func Close(cName string) error
- func CloseAll() error
- func Collections() []string
- func CreateJSON(cName string, key string, src []byte) error
- func DecodeJSON(src []byte, obj *map[string]interface{}) error
- func DeleteJSON(cName string, key string) error
- func EncodeJSON(obj map[string]interface{}) ([]byte, error)
- func FrameClear(cName string, fName string) error
- func FrameDelete(cName string, fName string) error
- func FrameExists(cName string, fName string) bool
- func FrameKeys(cName string, fName string) []string
- func FrameObjects(cName string, fName string) ([]map[string]interface{}, error)
- func FrameReframe(cName string, fName string, keys []string, verbose bool) error
- func FrameRefresh(cName string, fName string, verbose bool) error
- func Frames(cName string) []string
- func GetContact(cName string) string
- func GetVersion(cName string) string
- func GetWhat(cName string) string
- func GetWhen(cName string) string
- func GetWhere(cName string) string
- func GetWho(cName string) string
- func IsCollection(p string) bool
- func IsOpen(cName string) bool
- func KeyExists(cName string, key string) bool
- func Keys(cName string) []string
- func Open(cName string) error
- func ReadJSON(cName string, key string) ([]byte, error)
- func Repair(cName string, verbose bool) error
- func SetContact(cName string, contact string) error
- func SetVersion(cName string, version string) error
- func SetWhat(cName string, what string) error
- func SetWhen(cName string, when string) error
- func SetWhere(cName string, where string) error
- func SetWho(cName string, names string) error
- func UpdateJSON(cName string, key string, src []byte) error
- type Attachment
- type CMap
- type Collection
- func (c *Collection) AttachFile(keyName, semver string, fullName string) error
- func (c *Collection) AttachFiles(keyName string, semver string, fileNames ...string) error
- func (c *Collection) AttachStream(keyName, semver, fullName string, buf io.Reader) error
- func (c *Collection) Attachments(keyName string) ([]string, error)
- func (c *Collection) Clone(cloneName string, keys []string, verbose bool) error
- func (c *Collection) CloneSample(trainingCollectionName string, testCollectionName string, keys []string, ...) error
- func (c *Collection) Close() error
- func (c *Collection) Create(name string, data map[string]interface{}) error
- func (c *Collection) CreateJSON(key string, src []byte) error
- func (c *Collection) CreateObjectsJSON(keyList []string, src []byte) error
- func (c *Collection) Delete(name string) error
- func (c *Collection) DocPath(name string) (string, error)
- func (c *Collection) ExportCSV(fp io.Writer, eout io.Writer, f *DataFrame, verboseLog bool) (int, error)
- func (c *Collection) ExportTable(eout io.Writer, f *DataFrame, verboseLog bool) (int, [][]interface{}, error)
- func (c *Collection) FrameClear(name string) error
- func (c *Collection) FrameCreate(name string, keys []string, dotPaths []string, labels []string, verbose bool) (*DataFrame, error)
- func (c *Collection) FrameDelete(name string) error
- func (c *Collection) FrameExists(name string) bool
- func (c *Collection) FrameObjects(fName string) ([]map[string]interface{}, error)
- func (c *Collection) FrameRead(name string) (*DataFrame, error)
- func (c *Collection) FrameReframe(name string, keys []string, verbose bool) error
- func (c *Collection) FrameRefresh(name string, verbose bool) error
- func (c *Collection) Frames() []string
- func (c *Collection) GetAttachedFiles(keyName string, semver string, filterNames ...string) error
- func (c *Collection) ImportCSV(buf io.Reader, idCol int, skipHeaderRow bool, overwrite bool, verboseLog bool) (int, error)
- func (c *Collection) ImportTable(table [][]interface{}, idCol int, useHeaderRow bool, ...) (int, error)
- func (c *Collection) IsKeyNotFound(e error) bool
- func (c *Collection) Join(key string, obj map[string]interface{}, overwrite bool) error
- func (c *Collection) KeyExists(key string) bool
- func (c *Collection) KeySortByExpression(keys []string, expr string) ([]string, error)
- func (c *Collection) Keys() []string
- func (c *Collection) Length() int
- func (c *Collection) MergeFromTable(frameName string, table [][]interface{}, overwrite bool, verbose bool) error
- func (c *Collection) MergeIntoTable(frameName string, table [][]interface{}, overwrite bool, verbose bool) ([][]interface{}, error)
- func (c *Collection) ObjectList(keys []string, dotPaths []string, labels []string, verbose bool) ([]map[string]interface{}, error)
- func (c *Collection) Prune(keyName string, semver string, filterNames ...string) error
- func (c *Collection) Read(name string, data map[string]interface{}, cleanObject bool) error
- func (c *Collection) ReadJSON(name string) ([]byte, error)
- func (c *Collection) SaveFrame(name string, f *DataFrame) error
- func (c *Collection) Update(name string, data map[string]interface{}) error
- func (c *Collection) UpdateJSON(name string, src []byte) error
- type DataFrame
- type Err
- type KeyValue
- type KeyValues
- type Semver
Constants ¶
const (
	// Asc is used to identify ascending sorts
	Asc = iota
	// Desc is used to identify descending sorts
	Desc = iota
)
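The sketch below shows one way a sort direction constant like these could be consumed. The constants are redeclared locally so the example is self-contained, and sortKeys is a hypothetical helper, not a function of this package.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	// Asc identifies ascending sorts (local copy for illustration).
	Asc = iota
	// Desc identifies descending sorts (local copy for illustration).
	Desc = iota
)

// sortKeys returns a sorted copy of keys, leaving the input untouched.
// Any direction other than Desc is treated as ascending.
func sortKeys(keys []string, direction int) []string {
	sorted := append([]string{}, keys...)
	if direction == Desc {
		sort.Sort(sort.Reverse(sort.StringSlice(sorted)))
	} else {
		sort.Strings(sorted)
	}
	return sorted
}

func main() {
	keys := []string{"b", "a", "c"}
	fmt.Println(sortKeys(keys, Asc))  // prints [a b c]
	fmt.Println(sortKeys(keys, Desc)) // prints [c b a]
}
```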
const (
	// License is a formatted form for dataset package based command line tools
	License = `` /* 1530-byte string literal not displayed */
)
const Version = "1.0.1"
Variables ¶
This section is empty.
Functions ¶
func Check ¶ added in v0.1.0
Check checks a dataset collection and reports errors to the console. NOTE: Collection objects are locked during check!
func Close ¶ added in v0.1.0
Close closes a dataset collection previously opened by CMapOpen(). It will also set the internal cMap variable to nil if there are no remaining collections.
func CloseAll ¶ added in v0.1.0
func CloseAll() error
CloseAll goes through the service collection list and closes each one.
func Collections ¶ added in v0.1.0
func Collections() []string
Collections returns a list of collections previously opened with CMapOpen()
func CreateJSON ¶ added in v0.1.0
CreateJSON takes a collection name, key and JSON object document and creates a new JSON object in the collection using the key.
func DecodeJSON ¶ added in v0.1.0
DecodeJSON provides a common method for decoding data for use in Dataset.
func DeleteJSON ¶ added in v0.1.0
DeleteJSON takes a collection name and key and removes a JSON object from the collection.
func EncodeJSON ¶ added in v0.1.0
EncodeJSON provides a common method for encoding data for use in Dataset.
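One reason to route all decoding and encoding through a shared pair of functions is to keep number handling consistent. The sketch below is an illustration using only encoding/json; decodeObject and encodeObject are hypothetical names, and the use of UseNumber() is an assumption about a sensible default, not a statement of what DecodeJSON does.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeObject decodes src into obj, keeping numbers as json.Number so
// large integers are not rounded through float64.
func decodeObject(src []byte, obj *map[string]interface{}) error {
	dec := json.NewDecoder(bytes.NewReader(src))
	dec.UseNumber()
	return dec.Decode(obj)
}

// encodeObject encodes obj back to JSON.
func encodeObject(obj map[string]interface{}) ([]byte, error) {
	return json.Marshal(obj)
}

func main() {
	obj := map[string]interface{}{}
	src := []byte(`{"id": 12345678901234567, "name": "Jane"}`)
	if err := decodeObject(src, &obj); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	out, err := encodeObject(obj)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// prints {"id":12345678901234567,"name":"Jane"}
	fmt.Println(string(out))
}
```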
func FrameClear ¶ added in v0.1.0
FrameClear clears the object and key list from a frame
func FrameDelete ¶ added in v0.1.0
FrameDelete deletes a frame from a service collection
func FrameExists ¶ added in v0.1.0
FrameExists returns true if frame found in service collection, otherwise false
func FrameObjects ¶ added in v0.1.0
FrameObjects returns a JSON document of a copy of the objects in a frame for the service collection. It is analogous to a dataset.ReadJSON but for a frame's object list
func FrameReframe ¶ added in v0.1.0
FrameReframe updates the frame object list. If a list of keys is provided then the object will be replaced with updated objects based on the keys provided.
func FrameRefresh ¶ added in v0.1.0
FrameRefresh updates the frame's object list for the keys provided. Any new keys cause a new object to be appended to the end of the list.
func GetContact ¶ added in v0.1.0
GetContact gets the contact info for the collection.
func GetVersion ¶ added in v0.1.0
GetVersion gets the version info for the collection.
func IsCollection ¶ added in v0.0.45
IsCollection checks to see if a given path contains a collection.json file
func KeyExists ¶ added in v0.1.0
KeyExists returns true if the key exists in the collection or false otherwise
func Open ¶
Open opens a dataset collection for use in a service like context. CMap collections remain "open" until explicitly closed or closed via CloseAll(). Writes to the collections are run through a mutex to prevent collisions. Subsequent CMapOpen() will open additional collections under the service.
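The service-like pattern described here, a set of named open collections with writes serialized by a mutex, can be sketched as follows. The registry type and its methods are hypothetical and keep everything in memory; they only illustrate the locking pattern, not how this package stores data.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// registry is a hypothetical stand-in for a map of open collections.
// Each collection is reduced to an in-memory map of key to JSON source.
type registry struct {
	mu          sync.Mutex
	collections map[string]map[string][]byte
}

func newRegistry() *registry {
	return &registry{collections: map[string]map[string][]byte{}}
}

// open registers a collection name; opening twice is harmless.
func (r *registry) open(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.collections[name]; !ok {
		r.collections[name] = map[string][]byte{}
	}
}

// write stores src under key while holding the mutex, so concurrent
// writers cannot collide.
func (r *registry) write(name, key string, src []byte) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	c, ok := r.collections[name]
	if !ok {
		return fmt.Errorf("%q is not open", name)
	}
	c[key] = src
	return nil
}

// closeAll forgets every open collection.
func (r *registry) closeAll() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.collections = map[string]map[string][]byte{}
}

// names returns the open collection names in sorted order.
func (r *registry) names() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.collections))
	for name := range r.collections {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := newRegistry()
	r.open("Friends.ds")
	r.open("Papers.ds")
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("key-%d", i)
			if err := r.write("Friends.ds", key, []byte(`{}`)); err != nil {
				fmt.Println("write failed:", err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(r.names()) // prints [Friends.ds Papers.ds]
	r.closeAll()
	fmt.Println(len(r.names())) // prints 0
}
```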
func ReadJSON ¶ added in v0.1.0
ReadJSON takes a collection name, key and returns a JSON object document.
func Repair ¶ added in v0.0.3
Repair repairs a collection. NOTE: Collection objects are locked during repair!
func SetContact ¶ added in v0.1.0
SetContact sets the metadata value for the collection's contact info.
func SetVersion ¶ added in v0.1.0
SetVersion sets the metadata value for the collection's version.
Types ¶
type Attachment ¶
type Attachment struct {
	// Name is the filename and path to be used inside the generated tar file
	Name string `json:"name"`
	// Size remains to help us migrate pre v0.0.61 collections.
	// It should reflect the last size added.
	Size int64 `json:"size"`
	// Sizes is the sizes associated with the version being attached
	Sizes map[string]int64 `json:"sizes"`
	// Version holds the semver of the last added version
	Version string `json:"version"`
	// Checksums are currently implemented as MD5 checksums.
	// You should have one checksum per attached version.
	Checksums map[string]string `json:"checksums"`
	// HRef points at last attached version of the attached document, e.g. v0.0.0/photo.png
	// If you moved an object out of the pairtree it should be a URL.
	HRef string `json:"href"`
	// VersionHRefs is a map to all versions of the attached document
	// {
	//    "v0.0.0": "... /photo.png",
	//    "v0.0.1": "... /photo.png",
	//    "v0.0.2": "... /photo.png"
	// }
	VersionHRefs map[string]string `json:"version_hrefs"`
	// Created is a date string in RFC3339 format
	Created string `json:"created"`
	// Modified is a date string in RFC3339 format
	Modified string `json:"modified"`
	// Metadata is a map for application specific metadata about attachments.
	Metadata map[string]interface{} `json:"metadata,omitempty"`
}
Attachment is a structure for holding non-JSON content metadata you wish to store alongside a JSON document in a collection
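To illustrate the per-version bookkeeping the struct comments describe (one size and one MD5 checksum per attached version), here is a small sketch. The attachmentInfo type and its addVersion method are hypothetical and reduced to a few fields; they are not part of this package.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// attachmentInfo is a reduced, hypothetical stand-in for Attachment.
type attachmentInfo struct {
	Name      string
	Version   string
	Sizes     map[string]int64
	Checksums map[string]string
}

// addVersion records the size and MD5 checksum of content under the
// given semver and marks that semver as the latest version.
func (a *attachmentInfo) addVersion(semver string, content []byte) {
	if a.Sizes == nil {
		a.Sizes = map[string]int64{}
	}
	if a.Checksums == nil {
		a.Checksums = map[string]string{}
	}
	sum := md5.Sum(content)
	a.Sizes[semver] = int64(len(content))
	a.Checksums[semver] = hex.EncodeToString(sum[:])
	a.Version = semver
}

func main() {
	a := &attachmentInfo{Name: "notes.txt"}
	a.addVersion("v0.0.1", []byte("hello"))
	// prints v0.0.1 5 5d41402abc4b2a76b9719d911017c592
	fmt.Println(a.Version, a.Sizes["v0.0.1"], a.Checksums["v0.0.1"])
}
```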
type CMap ¶ added in v0.1.0
type CMap struct {
// contains filtered or unexported fields
}
CMap holds a map of collection names to *Collection
type Collection ¶
type Collection struct {
	// DatasetVersion of the collection
	DatasetVersion string `json:"dataset_version"`
	// Name (filename) of collection
	Name string `json:"name"`
	// KeyMap holds the document key to path in the collection
	KeyMap map[string]string `json:"keymap"`
	// FrameMap maps frame names to the relative path of the frame defined in the collection
	FrameMap map[string]string `json:"frames"`
	// Created is the date/time the init command was run in
	// RFC1123 format.
	Created string `json:"created,omitempty"`
	// Version of collection being stored in semver notation
	Version string `json:"version,omitempty"`
	// Contact info
	Contact string `json:"contact,omitempty"`
	// CodeMeta is a relative path or URL to a Code Meta
	// JSON document for the collection. Often it'll be
	// in the collection's root and have the value "codemeta.json"
	// but also may be stored someplace else. It should be
	// an empty string if the codemeta.json file has not been
	// created.
	CodeMeta string `json:"codemeta,omitempty"`
	// Who is the person(s)/organization(s) that created the collection
	Who []string `json:"who,omitempty"`
	// What - description of collection
	What string `json:"what,omitempty"`
	// When - date associated with collection (e.g. 2021,
	// 2021-10, 2021-10-02), should map to an approx date like in
	// archival work.
	When string `json:"when,omitempty"`
	// Where - location (e.g. URL, address) of collection
	Where string `json:"where,omitempty"`
	// contains filtered or unexported fields
}
Collection is the container holding a pairtree containing JSON docs
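Since the struct carries JSON tags, a collection.json file can be inspected with plain encoding/json. The sketch below uses a reduced, hypothetical collectionMeta type with a subset of the tags shown above; the sample document and the format of the keymap value are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collectionMeta mirrors a few of the JSON tags of Collection.
// It is a hypothetical, reduced type for illustration only.
type collectionMeta struct {
	DatasetVersion string            `json:"dataset_version"`
	Name           string            `json:"name"`
	KeyMap         map[string]string `json:"keymap"`
	Who            []string          `json:"who,omitempty"`
}

// parseMeta decodes the source of a collection.json style document.
func parseMeta(src []byte) (*collectionMeta, error) {
	meta := &collectionMeta{}
	if err := json.Unmarshal(src, meta); err != nil {
		return nil, err
	}
	return meta, nil
}

func main() {
	// Invented sample data; a real collection.json may differ.
	src := []byte(`{
		"dataset_version": "1.0.1",
		"name": "Friends.ds",
		"keymap": {"Jane.Doe": "pairtree/Ja/ne/.D/e./"},
		"who": ["Jane Doe"]
	}`)
	meta, err := parseMeta(src)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(meta.Name, len(meta.KeyMap), meta.KeyMap["Jane.Doe"])
}
```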
func GetCollection ¶ added in v0.1.0
func GetCollection(cName string) (*Collection, error)
GetCollection takes a collection name, opens it if necessary and returns a handle to the CMapCollection struct and error value.
func InitCollection ¶ added in v0.0.8
func InitCollection(name string) (*Collection, error)
InitCollection - creates a new collection.
func (*Collection) AttachFile ¶ added in v0.0.33
func (c *Collection) AttachFile(keyName, semver string, fullName string) error
AttachFile is for attaching a single non-JSON document to a dataset record. It will replace ANY existing attached content with the same semver and basename.
func (*Collection) AttachFiles ¶
func (c *Collection) AttachFiles(keyName string, semver string, fileNames ...string) error
AttachFiles attaches non-JSON documents to a JSON document in the collection. Attachments are stored in a tar file; if the tar file exists then the attachment(s) are appended to it.
func (*Collection) AttachStream ¶ added in v0.0.63
func (c *Collection) AttachStream(keyName, semver, fullName string, buf io.Reader) error
AttachStream is for attaching an open non-JSON file buffer (via an io.Reader).
func (*Collection) Attachments ¶
func (c *Collection) Attachments(keyName string) ([]string, error)
Attachments returns a list of files and size attached for a key name in the collection
func (*Collection) Clone ¶ added in v0.0.39
func (c *Collection) Clone(cloneName string, keys []string, verbose bool) error
Clone copies the current collection records into a newly initialized collection given a list of keys and new collection name. Returns an error value if there is a problem. Clone does NOT copy attachments, only the JSON records.
func (*Collection) CloneSample ¶ added in v0.0.39
func (c *Collection) CloneSample(trainingCollectionName string, testCollectionName string, keys []string, sampleSize int, verbose bool) error
CloneSample takes the current collection, a sample size, a training collection name and a test collection name. The training collection will be created and receive a random sample of the records from the current collection based on the sample size provided. Sample size must be greater than zero and less than the total number of records in the current collection.
If the test collection name is not an empty string it will be created and any records not in the training collection will be cloned from the current collection into the test collection.
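The training/test split described above can be sketched on a plain list of keys. The helper splitSample and its seed parameter are hypothetical; the sketch shuffles a copy of the keys and cuts it at the sample size, which is one simple way to draw a random sample and is not necessarily how CloneSample selects records.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// splitSample returns a random sample of sampleSize keys (training)
// and the remaining keys (test). The seed makes the split repeatable.
func splitSample(keys []string, sampleSize int, seed int64) ([]string, []string, error) {
	if sampleSize <= 0 || sampleSize >= len(keys) {
		return nil, nil, errors.New("sample size must be greater than zero and less than the number of keys")
	}
	// Shuffle a copy so the caller's slice is left untouched.
	shuffled := append([]string{}, keys...)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled[:sampleSize], shuffled[sampleSize:], nil
}

func main() {
	keys := []string{"a", "b", "c", "d", "e"}
	training, test, err := splitSample(keys, 2, 42)
	if err != nil {
		fmt.Println("split failed:", err)
		return
	}
	fmt.Println(len(training), len(test)) // prints 2 3
}
```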
func (*Collection) Close ¶
func (c *Collection) Close() error
Close closes a collection, writing the updated keys to disc
func (*Collection) Create ¶
func (c *Collection) Create(name string, data map[string]interface{}) error
Create creates a JSON doc from a map[string]interface{} and adds it to a collection. If there is a problem it returns an error. The name must be unique. The document must be a JSON object (not an array).
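The two constraints stated here, a unique name and a JSON object rather than an array, can be sketched with an in-memory map. The store type and its create method are hypothetical and do not touch disc; they only show the checks.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// store is a hypothetical in-memory stand-in for a collection.
type store map[string]map[string]interface{}

// create adds src under name, rejecting duplicate names and any JSON
// value that is not an object (arrays, strings, numbers, null).
func (s store) create(name string, src []byte) error {
	if _, exists := s[name]; exists {
		return errors.New("name already exists: " + name)
	}
	var obj map[string]interface{}
	if err := json.Unmarshal(src, &obj); err != nil {
		return fmt.Errorf("not a JSON object: %w", err)
	}
	if obj == nil {
		return errors.New("not a JSON object: null")
	}
	s[name] = obj
	return nil
}

func main() {
	s := store{}
	fmt.Println(s.create("jane", []byte(`{"name": "Jane"}`)) == nil) // prints true
	fmt.Println(s.create("jane", []byte(`{"name": "Jane"}`)) == nil) // prints false
	fmt.Println(s.create("list", []byte(`[1, 2, 3]`)) == nil)        // prints false
}
```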
func (*Collection) CreateJSON ¶ added in v0.0.33
func (c *Collection) CreateJSON(key string, src []byte) error
CreateJSON adds a JSON doc to a collection, if a problem occurs it returns an error
func (*Collection) CreateObjectsJSON ¶ added in v0.0.70
func (c *Collection) CreateObjectsJSON(keyList []string, src []byte) error
CreateObjectsJSON takes a list of keys and creates a default object for each key as quickly as possible. NOTE: if an object already exists, creation is skipped without reporting an error.
func (*Collection) Delete ¶
func (c *Collection) Delete(name string) error
Delete removes a JSON doc from a collection
func (*Collection) DocPath ¶
func (c *Collection) DocPath(name string) (string, error)
DocPath returns a full path to a key or an error if not found
func (*Collection) ExportCSV ¶ added in v0.0.3
func (c *Collection) ExportCSV(fp io.Writer, eout io.Writer, f *DataFrame, verboseLog bool) (int, error)
ExportCSV takes a writer, an error writer and a frame, iterates over the frame's objects generating rows, and exports them as a CSV file.
func (*Collection) ExportTable ¶ added in v0.0.47
func (c *Collection) ExportTable(eout io.Writer, f *DataFrame, verboseLog bool) (int, [][]interface{}, error)
ExportTable takes an error writer and a frame, iterates over the frame's objects generating rows, and returns them as a table (i.e. [][]interface{}).
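To make the "objects to rows" step concrete, here is a hedged sketch that turns a list of objects into a [][]interface{} table and then writes it as CSV. The helpers `objectsToTable` and `writeCSV` are hypothetical; including a header row of labels and using empty strings for missing attributes are assumptions of this sketch, not documented behavior of ExportCSV or ExportTable.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
)

// objectsToTable (hypothetical) builds a table whose first row holds the
// labels and whose remaining rows hold each object's values in label order.
// Attributes missing from an object become empty strings (an assumption).
func objectsToTable(labels []string, objects []map[string]interface{}) [][]interface{} {
	table := make([][]interface{}, 0, len(objects)+1)
	header := make([]interface{}, len(labels))
	for i, label := range labels {
		header[i] = label
	}
	table = append(table, header)
	for _, obj := range objects {
		row := make([]interface{}, len(labels))
		for i, label := range labels {
			if val, ok := obj[label]; ok {
				row[i] = val
			} else {
				row[i] = ""
			}
		}
		table = append(table, row)
	}
	return table
}

// writeCSV (hypothetical) renders each cell with fmt.Sprint and writes
// the table to w as CSV.
func writeCSV(w io.Writer, table [][]interface{}) error {
	cw := csv.NewWriter(w)
	for _, row := range table {
		record := make([]string, len(row))
		for i, cell := range row {
			record[i] = fmt.Sprint(cell)
		}
		if err := cw.Write(record); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

func main() {
	objects := []map[string]interface{}{
		{"id": "1", "title": "First"},
		{"id": "2"}, // no title: exported as an empty cell
	}
	table := objectsToTable([]string{"id", "title"}, objects)
	if err := writeCSV(os.Stdout, table); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
}
```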
func (*Collection) FrameClear ¶ added in v0.1.0
func (c *Collection) FrameClear(name string) error
FrameClear empties the frame's object and key lists but leaves the Frame definition in place. Use FrameReframe() to re-populate a frame based on a new key list.
func (*Collection) FrameCreate ¶ added in v0.1.0
func (c *Collection) FrameCreate(name string, keys []string, dotPaths []string, labels []string, verbose bool) (*DataFrame, error)
FrameCreate takes a set of collection keys, dot paths and labels, builds an ObjectList and assembles additional metadata, returning a new Frame associated with the collection as well as an error value. If there is a mismatch in the number of labels and dot paths, an error will be returned. If the frame already exists, an error will be returned.
Conceptually a frame is an ordered list of objects. Frames are associated with a collection and the objects in a frame can easily be refreshed. Frames also serve as the basis for indexing a dataset collection and provide the data paths (expressed as a list of "dot paths"), labels (aka attribute names), and type information needed for indexing and search.
If you need to update a frame's objects use FrameRefresh(). If you need to change a frame's objects or ordering use FrameReframe().
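The relationship between dot paths and labels can be sketched as follows. This is an illustration only: it assumes a simplified dot path syntax (nested object keys such as ".author.family", no array indexes), and `lookup` and `frameObject` are hypothetical helpers rather than functions of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup (hypothetical) resolves a simplified dot path such as
// ".author.family" against a decoded JSON object. Array indexes are
// not handled in this sketch.
func lookup(obj map[string]interface{}, dotPath string) (interface{}, bool) {
	var current interface{} = obj
	for _, part := range strings.Split(strings.TrimPrefix(dotPath, "."), ".") {
		m, ok := current.(map[string]interface{})
		if !ok {
			return nil, false
		}
		current, ok = m[part]
		if !ok {
			return nil, false
		}
	}
	return current, true
}

// frameObject (hypothetical) builds one frame object by copying the value
// found at each dot path into a new object under the matching label.
// A mismatch between the number of dot paths and labels is an error.
func frameObject(obj map[string]interface{}, dotPaths, labels []string) (map[string]interface{}, error) {
	if len(dotPaths) != len(labels) {
		return nil, fmt.Errorf("%d dot paths but %d labels", len(dotPaths), len(labels))
	}
	out := map[string]interface{}{}
	for i, p := range dotPaths {
		if val, ok := lookup(obj, p); ok {
			out[labels[i]] = val
		}
	}
	return out, nil
}

func main() {
	record := map[string]interface{}{
		"title":  "Example",
		"author": map[string]interface{}{"family": "Doiel"},
	}
	obj, err := frameObject(record, []string{".title", ".author.family"}, []string{"title", "family"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(obj["title"], obj["family"]) // prints "Example Doiel"
}
```

Repeating `frameObject` for each key, in key order, would yield an ordered object list of the kind a frame holds.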
func (*Collection) FrameDelete ¶ added in v0.1.0
func (c *Collection) FrameDelete(name string) error
FrameDelete removes a frame from a collection, returns an error if frame can't be deleted.
func (*Collection) FrameExists ¶ added in v0.1.0
func (c *Collection) FrameExists(name string) bool
FrameExists checks to see if a frame is already defined. Returns true if it exists, otherwise false.
func (*Collection) FrameObjects ¶ added in v0.1.0
func (c *Collection) FrameObjects(fName string) ([]map[string]interface{}, error)
FrameObjects returns a copy of a DataFrame's object list given a collection's frame name.
func (*Collection) FrameRead ¶ added in v0.1.0
func (c *Collection) FrameRead(name string) (*DataFrame, error)
FrameRead retrieves a frame from a collection. Returns the DataFrame and an error value
func (*Collection) FrameReframe ¶ added in v0.1.0
func (c *Collection) FrameReframe(name string, keys []string, verbose bool) error
FrameReframe **replaces** a frame's object list based on the keys provided.
func (*Collection) FrameRefresh ¶ added in v0.1.0
func (c *Collection) FrameRefresh(name string, verbose bool) error
FrameRefresh updates a DataFrame's object list based on the existing keys in the frame. It doesn't change the order of objects. NOTE: If an object is missing in the collection it gets pruned from the object list.
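The refresh behavior (same order, missing objects pruned) can be sketched with plain maps and slices. Here the collection is modeled as an in-memory map, which is an assumption made for illustration; `refresh` is a hypothetical helper, not the package's code.

```go
package main

import "fmt"

// refresh (hypothetical) walks the frame's keys in their existing order,
// re-reads each object from the collection and drops keys whose objects
// are no longer present.
func refresh(keys []string, collection map[string]map[string]interface{}) ([]string, []map[string]interface{}) {
	keptKeys := make([]string, 0, len(keys))
	objects := make([]map[string]interface{}, 0, len(keys))
	for _, key := range keys {
		obj, ok := collection[key]
		if !ok {
			continue // object no longer in the collection: prune it
		}
		keptKeys = append(keptKeys, key)
		objects = append(objects, obj)
	}
	return keptKeys, objects
}

func main() {
	collection := map[string]map[string]interface{}{
		"a": {"title": "A (updated)"},
		"c": {"title": "C"},
	}
	// "b" was deleted from the collection after the frame was created.
	keys, objects := refresh([]string{"c", "b", "a"}, collection)
	fmt.Println(keys)         // prints "[c a]"
	fmt.Println(len(objects)) // prints "2"
}
```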
func (*Collection) Frames ¶ added in v0.0.41
func (c *Collection) Frames() []string
Frames retrieves a list of available frames associated with a collection
func (*Collection) GetAttachedFiles ¶
func (c *Collection) GetAttachedFiles(keyName string, semver string, filterNames ...string) error
GetAttachedFiles writes the attached file(s) to the current working directory as a side effect. If no filterNames are provided, all attachments are written out. An error value is always returned.
func (*Collection) ImportCSV ¶ added in v0.0.3
func (c *Collection) ImportCSV(buf io.Reader, idCol int, skipHeaderRow bool, overwrite bool, verboseLog bool) (int, error)
ImportCSV takes a reader, iterates over the rows and imports them as JSON records into dataset. BUG: returns lines processed; should probably return number of rows imported.
func (*Collection) ImportTable ¶ added in v0.0.4
func (c *Collection) ImportTable(table [][]interface{}, idCol int, useHeaderRow bool, overwrite, verboseLog bool) (int, error)
ImportTable takes a [][]interface{}, iterates over the rows and imports them as JSON records into dataset.
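A rough sketch of mapping table rows to JSON-style objects is shown below. Several details are assumptions made for illustration and may not match ImportTable: the id column is treated as zero-based, columns without a header are named "col_N", and the helper `tableToObjects` is hypothetical.

```go
package main

import "fmt"

// tableToObjects (hypothetical) converts table rows into objects keyed by
// the value found in the id column. Assumptions: idCol is zero-based and
// columns without a header name are called "col_N".
func tableToObjects(table [][]interface{}, idCol int, useHeaderRow bool) (map[string]map[string]interface{}, error) {
	objects := map[string]map[string]interface{}{}
	if len(table) == 0 {
		return objects, nil
	}
	var header []string
	rows := table
	if useHeaderRow {
		for _, cell := range table[0] {
			header = append(header, fmt.Sprint(cell))
		}
		rows = table[1:]
	}
	for n, row := range rows {
		if idCol < 0 || idCol >= len(row) {
			return nil, fmt.Errorf("row %d has no column %d", n, idCol)
		}
		obj := map[string]interface{}{}
		for i, cell := range row {
			name := fmt.Sprintf("col_%d", i)
			if i < len(header) {
				name = header[i]
			}
			obj[name] = cell
		}
		objects[fmt.Sprint(row[idCol])] = obj
	}
	return objects, nil
}

func main() {
	table := [][]interface{}{
		{"id", "title"},
		{"k1", "First"},
		{"k2", "Second"},
	}
	objects, err := tableToObjects(table, 0, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(objects), objects["k2"]["title"]) // prints "2 Second"
}
```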
func (*Collection) IsKeyNotFound ¶ added in v0.0.69
func (c *Collection) IsKeyNotFound(e error) bool
IsKeyNotFound checks an error message and returns true if it is a key not found error.
func (*Collection) Join ¶ added in v0.0.47
func (c *Collection) Join(key string, obj map[string]interface{}, overwrite bool) error
Join takes a key, a map[string]interface{} and an overwrite bool, and merges the map with an existing JSON object in the collection. BUG: This is a naive join; it assumes the keys in the object are top-level properties.
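A naive top-level merge of the kind described can be sketched like this. It assumes that with overwrite set to false only properties missing from the existing object are added; that reading of the flag, and the helper name `join`, are assumptions of this sketch rather than confirmed behavior.

```go
package main

import "fmt"

// join (hypothetical) merges update into a copy of existing, looking only
// at top-level properties. When overwrite is false, properties already in
// existing are kept as-is (an assumption about the flag's meaning).
func join(existing, update map[string]interface{}, overwrite bool) map[string]interface{} {
	merged := make(map[string]interface{}, len(existing)+len(update))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range update {
		if _, found := merged[k]; !found || overwrite {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	existing := map[string]interface{}{"title": "Old", "year": 2020}
	update := map[string]interface{}{"title": "New", "doi": "10.0000/example"}

	kept := join(existing, update, false)
	fmt.Println(kept["title"], kept["doi"]) // prints "Old 10.0000/example"

	replaced := join(existing, update, true)
	fmt.Println(replaced["title"]) // prints "New"
}
```

Because only top-level keys are compared, a nested object in the update replaces the whole nested object in the result rather than being merged into it.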
func (*Collection) KeyExists ¶ added in v0.1.0
func (c *Collection) KeyExists(key string) bool
KeyExists returns true if key is in collection's KeyMap, false otherwise
func (*Collection) KeySortByExpression ¶ added in v0.0.33
func (c *Collection) KeySortByExpression(keys []string, expr string) ([]string, error)
KeySortByExpression takes an array of keys and a sort expression and returns a sorted list of keys.
func (*Collection) Keys ¶
func (c *Collection) Keys() []string
Keys returns a list of keys in a collection
func (*Collection) Length ¶ added in v0.0.6
func (c *Collection) Length() int
Length returns the number of keys in a collection
func (*Collection) MergeFromTable ¶ added in v0.0.47
func (c *Collection) MergeFromTable(frameName string, table [][]interface{}, overwrite bool, verbose bool) error
MergeFromTable uses a DataFrame associated with the collection to map columns from a table into JSON object attributes, saving the JSON object in the collection. If overwrite is true then JSON objects for matching keys will be updated; if false, only new objects will be added to the collection. Returns an error value.
func (*Collection) MergeIntoTable ¶ added in v0.0.47
func (c *Collection) MergeIntoTable(frameName string, table [][]interface{}, overwrite bool, verbose bool) ([][]interface{}, error)
MergeIntoTable uses a DataFrame associated with the collection to map attributes into a table, appending new content and optionally overwriting existing content for rows with matching ids. Returns a new table (i.e. [][]interface{}) or an error.
func (*Collection) ObjectList ¶ added in v0.0.61
func (c *Collection) ObjectList(keys []string, dotPaths []string, labels []string, verbose bool) ([]map[string]interface{}, error)
ObjectList (on a collection) takes a set of collection keys and builds an ordered array of objects from the array of keys, dot paths and labels provided.
func (*Collection) Prune ¶ added in v0.0.33
func (c *Collection) Prune(keyName string, semver string, filterNames ...string) error
Prune removes an attached non-JSON document from a JSON document in the collection.
func (*Collection) Read ¶
func (c *Collection) Read(name string, data map[string]interface{}, cleanObject bool) error
Read finds the record in a collection and updates the data interface provided. The name must exist or an error is returned.
func (*Collection) ReadJSON ¶ added in v0.0.33
func (c *Collection) ReadJSON(name string) ([]byte, error)
ReadJSON finds the record in the collection and returns the JSON source.
func (*Collection) SaveFrame ¶ added in v0.0.47
func (c *Collection) SaveFrame(name string, f *DataFrame) error
SaveFrame saves a frame in a collection or returns an error
func (*Collection) Update ¶
func (c *Collection) Update(name string, data map[string]interface{}) error
Update updates a JSON doc in a collection from the provided data interface (note: the JSON doc must exist or an error is returned).
func (*Collection) UpdateJSON ¶ added in v0.0.33
func (c *Collection) UpdateJSON(name string, src []byte) error
UpdateJSON updates a JSON doc in a collection. It returns an error if there is a problem.
type DataFrame ¶ added in v0.0.41
type DataFrame struct {
	// Explicit at creation
	Name string `json:"frame_name"`

	// CollectionName holds the name of the collection the frame was generated from. In theory you could
	// define a frame in one collection and use its results in another. A DataFrame can be rendered as a JSON
	// document.
	CollectionName string `json:"collection_name"`

	// DotPaths is a slice holding the definitions of what each Object attribute's data source is.
	DotPaths []string `json:"dot_paths"`

	// Labels are new attribute names for fields created from the provided
	// DotPaths. Typically this is used to surface a deeper dotpath's
	// value as something more useful in the frame's context (e.g.
	// first_title from an array of titles might be labeled "title")
	Labels []string `json:"labels"`

	// NOTE: Keys is an ordered list of object keys in the frame.
	Keys []string `json:"keys"`

	// NOTE: Object map provides a quick index by key to object index.
	ObjectMap map[string]interface{} `json:"object_map"`

	// Created is the date the frame is originally generated and defined
	Created time.Time `json:"created"`

	// Updated is the date the frame is updated (e.g. reframed)
	Updated time.Time `json:"updated"`
}
DataFrame is the basic structure holding a list of objects as well as the definition of the list (so you can regenerate an updated list from a changed collection). It persists with the collection.
func FrameCreate ¶ added in v0.1.0
func FrameCreate(cName string, fName string, keys []string, dotPaths []string, labels []string, verbose bool) (*DataFrame, error)
FrameCreate creates a frame in a service collection
func (*DataFrame) Grid ¶ added in v0.0.41
Grid returns a Grid representation of a DataFrame's ObjectList
type KeyValue ¶ added in v0.0.7
type KeyValue struct {
	// JSON Record ID in collection
	ID string

	// The value of the field to be sorted from record
	Value interface{}
}
KeyValue holds an ID string and value interface, this lets us work with numeric keys and to sort them.
type KeyValues ¶ added in v0.0.7
type KeyValues []KeyValue
KeyValues is a list of keys (strings) to records. This type exists to allow easy sorting.
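As an illustration of why pairing an ID with a value helps with sorting, the sketch below sorts such pairs by value and returns the IDs in sorted order. The `KeyValue` and `KeyValues` types are redeclared locally so the example is self-contained; `less` and `sortedIDs` are hypothetical helpers, and the comparison rules (numeric for float64, lexical for strings, string form otherwise) are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Local stand-ins mirroring the documented types.
type KeyValue struct {
	ID    string
	Value interface{}
}

type KeyValues []KeyValue

// less (hypothetical) compares two values numerically when both are
// float64, lexically when both are strings, and by their printed form
// otherwise.
func less(a, b interface{}) bool {
	switch x := a.(type) {
	case float64:
		if y, ok := b.(float64); ok {
			return x < y
		}
	case string:
		if y, ok := b.(string); ok {
			return x < y
		}
	}
	return fmt.Sprint(a) < fmt.Sprint(b)
}

// sortedIDs (hypothetical) returns the record IDs ordered by their values
// without modifying the input list.
func sortedIDs(kvs KeyValues) []string {
	sorted := append(KeyValues(nil), kvs...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return less(sorted[i].Value, sorted[j].Value)
	})
	ids := make([]string, len(sorted))
	for i, kv := range sorted {
		ids[i] = kv.ID
	}
	return ids
}

func main() {
	kvs := KeyValues{
		{ID: "a", Value: float64(10)},
		{ID: "b", Value: float64(9)},
		{ID: "c", Value: float64(100)},
	}
	fmt.Println(sortedIDs(kvs)) // prints "[b a c]"
}
```

Sorting the same numbers as strings would give "10", "100", "9", which is why keeping the value as an interface{} rather than a string matters.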
type Semver ¶ added in v0.0.62
type Semver struct {
	// Major version number (required, must be an integer as string)
	Major string `json:"major"`

	// Minor version number (required, must be an integer as string)
	Minor string `json:"minor"`

	// Patch level (optional, must be an integer as string)
	Patch string `json:"patch,omitempty"`

	// Suffix string, (optional, any string)
	Suffix string `json:"suffix,omitempty"`
}
Semver holds the information to generate a semver string
func ParseSemver ¶ added in v0.0.62
ParseSemver takes a byte slice and returns a version struct, and an error value.
func (*Semver) IncMajor ¶ added in v0.0.64
IncMajor increments a major version number, zeros minor and patch values. Returns an error if increment fails.
func (*Semver) IncMinor ¶ added in v0.0.64
IncMinor increments a minor version number and zeros the patch level. Returns an error if increment fails.
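The increment rules above can be sketched with a local stand-in type. The `version` struct and its `incMajor`/`incMinor` methods are hypothetical and only mirror the documented behavior (version parts stored as integer strings, lower parts zeroed on increment); the package's own methods may handle details such as the suffix differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// version is a local stand-in with the numeric parts stored as strings.
type version struct {
	Major, Minor, Patch string
}

// bump parses an integer string and returns it incremented by one.
func bump(s string) (string, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return "", fmt.Errorf("%q is not an integer: %w", s, err)
	}
	return strconv.Itoa(n + 1), nil
}

// incMajor increments the major number and zeros minor and patch.
func (v *version) incMajor() error {
	major, err := bump(v.Major)
	if err != nil {
		return err
	}
	v.Major, v.Minor, v.Patch = major, "0", "0"
	return nil
}

// incMinor increments the minor number and zeros the patch level.
func (v *version) incMinor() error {
	minor, err := bump(v.Minor)
	if err != nil {
		return err
	}
	v.Minor, v.Patch = minor, "0"
	return nil
}

func main() {
	v := version{Major: "1", Minor: "4", Patch: "2"}
	if err := v.incMinor(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s.%s.%s\n", v.Major, v.Minor, v.Patch) // prints "1.5.0"
	if err := v.incMajor(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s.%s.%s\n", v.Major, v.Minor, v.Patch) // prints "2.0.0"
}
```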
Directories ¶
Path | Synopsis |
---|---|
cmd | |
dataset | dataset is a command line tool, Go package, shared library and Python package for working with JSON objects as collections on local disc. |
 | tbl.go provides some utility functions to move string one and two dimensional slices into/out of one and two dimensional slices. |