tilelibrary

package
v0.0.0-...-7a0a068
Warning

This package is not in the latest version of its module.

Published: Jun 7, 2023 License: AGPL-3.0 Imports: 16 Imported by: 0

Documentation

Overview

Package tilelibrary implements tile libraries in Go. It assumes that the tile information provided has already been imputed, or that nocalls are acceptable in SGLF files. The library checks tiles for completeness, but it does not modify them to be complete or imputed before writing them to files. Functions to merge, liftover, import, export, and modify libraries are provided. The package should be used in conjunction with the structures package.

Note: Do not add tiles to libraries made from SGLFv2 files, since it's not clear how the tile information would be included.

Note: Adding tiles to a library at any point requires sorting that library before writing it to a file.

Variables

var ErrBadLiftover = errors.New("not a valid mapping file")

ErrBadLiftover is an error for when a file is not a liftover mapping.

var ErrBadSource = errors.New("source library is not part of the destination library")

ErrBadSource is an error for when the origin of a liftover mapping is not a subset of the destination library.

var ErrCannotAddTile = errors.New("library was built from sglfv2 files--cannot add new tile")

ErrCannotAddTile is an error when trying to add tiles to libraries built from SGLFv2 files, since it's not clear how to add tiles properly to them.

var ErrInconsistentHash = errors.New("bases and hash do not match")

ErrInconsistentHash is an error for when the hash of a set of bases does not match the TileVariant hash.

var ErrIncorrectSourceText = errors.New("library doesn't have the right intermediate file(s) as its Text field")

ErrIncorrectSourceText is an error for when the source file/directory doesn't match the function it's used in.

var ErrInvalidReferenceLibrary = errors.New("reference library field is not a library pointer")

ErrInvalidReferenceLibrary is an error when the ReferenceLibrary field of a TileVariant is not a pointer to a Library.

var ErrTileContradiction = errors.New("a tile that was added is not found in the library")

ErrTileContradiction is an error that occurs when a tile that is known to be in the library was not found. It is usually returned after adding that tile to the library.

Functions

func ReadMapping

func ReadMapping(filepath string) (mapping [][][]int, sourceID, destinationID [md5.Size]byte, err error)

ReadMapping gets the information from a mapping given its filepath. It also returns the hashes for the source and destination libraries, in that order.

func WriteMapping

func WriteMapping(filename string, mapping LiftoverMapping) error

WriteMapping writes a LiftoverMapping to a specified file. The format is path/step/source1,destination1;source2,destination2;...

The current suffix for mappings is .sglfmapping (make sure all filenames end with this suffix).

Returns any error encountered, or nil if there's no error.
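The line format described above can be illustrated with a small round-trip sketch. This is not the package's implementation: formatLine and parseLine are hypothetical helpers, and the use of decimal numbers, the absence of a trailing semicolon, and source variants appearing in increasing order are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatLine renders one path/step entry as
// path/step/source1,destination1;source2,destination2;...
// dests[i] is the destination variant for source variant i.
func formatLine(path, step int, dests []int) string {
	parts := make([]string, len(dests))
	for src, dst := range dests {
		parts[src] = strconv.Itoa(src) + "," + strconv.Itoa(dst)
	}
	return fmt.Sprintf("%d/%d/%s", path, step, strings.Join(parts, ";"))
}

// parseLine is the inverse of formatLine. It rejects lines whose source
// variants are not listed as 0, 1, 2, ... (an assumption of this sketch).
func parseLine(line string) (int, int, []int, error) {
	fields := strings.SplitN(line, "/", 3)
	if len(fields) != 3 {
		return 0, 0, nil, fmt.Errorf("malformed mapping line %q", line)
	}
	path, err := strconv.Atoi(fields[0])
	if err != nil {
		return 0, 0, nil, err
	}
	step, err := strconv.Atoi(fields[1])
	if err != nil {
		return 0, 0, nil, err
	}
	var dests []int
	if fields[2] == "" {
		return path, step, dests, nil
	}
	for _, pair := range strings.Split(fields[2], ";") {
		sd := strings.Split(pair, ",")
		if len(sd) != 2 {
			return 0, 0, nil, fmt.Errorf("malformed pair %q", pair)
		}
		src, err := strconv.Atoi(sd[0])
		if err != nil {
			return 0, 0, nil, err
		}
		dst, err := strconv.Atoi(sd[1])
		if err != nil {
			return 0, 0, nil, err
		}
		if src != len(dests) {
			return 0, 0, nil, fmt.Errorf("source variant %d out of order", src)
		}
		dests = append(dests, dst)
	}
	return path, step, dests, nil
}

func main() {
	line := formatLine(3, 12, []int{2, 0, 1})
	fmt.Println(line)
	path, step, dests, err := parseLine(line)
	fmt.Println(path, step, dests, err)
}
```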

Types

type KnownVariants

type KnownVariants struct {
	List   [](*structures.TileVariant) // List to keep track of relative tile ordering (implicitly assigns tile variant numbers by index after sorting)
	Counts []int                       // Counts of each variant so far
}

KnownVariants is a struct to hold the known variants in a specific step. KnownVariants will also keep track of the count of each tile--the variant at List[i] has been seen Counts[i] times.
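The parallel List/Counts layout can be sketched as follows. This is an illustration only: variant is a stand-in for structures.TileVariant that models nothing but a hash, and observe is a hypothetical helper, not a method of the package.

```go
package main

import "fmt"

// variant is a stand-in for structures.TileVariant; only a hash is modeled.
type variant struct{ hash string }

// knownVariants mirrors the documented layout: the variant at list[i]
// has been seen counts[i] times.
type knownVariants struct {
	list   []*variant
	counts []int
}

// observe records one sighting of v and returns its index in list.
// A variant seen for the first time is appended with a count of 1.
func (k *knownVariants) observe(v *variant) int {
	for i, known := range k.list {
		if known.hash == v.hash {
			k.counts[i]++
			return i
		}
	}
	k.list = append(k.list, v)
	k.counts = append(k.counts, 1)
	return len(k.list) - 1
}

func main() {
	var k knownVariants
	for _, h := range []string{"aa", "bb", "aa"} {
		k.observe(&variant{hash: h})
	}
	for i, v := range k.list {
		fmt.Printf("variant %d (%s) seen %d times\n", i, v.hash, k.counts[i])
	}
}
```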

type Library

type Library struct {
	Paths []concurrentPath // The paths of the library.
	ID    [md5.Size]byte   // The ID of a library.

	Components [][md5.Size]byte // the IDs of the libraries that this library is composed of (empty if this library was not made from others)
	// contains filtered or unexported fields
}

Library is a struct to represent a library of tile variants from a set of genomes.

A Library is separated into paths, in the Paths field, represented as a slice of concurrentPaths. This makes Libraries safe for concurrent use in terms of modification of tiles.

Libraries have IDs for easy discussion and reference. Currently, the IDs are calculated with the MD5 hash algorithm. It hashes all of the location and tile information (except annotations) in order: by path, then step, then by increasing variant number. Upon reaching a new path or step, the path or step number, as a uint32 split into 4 bytes, is added to the hash. Then, for each variant, information is added to the list of bytes to be hashed in the following order:

-Variant number, in uint32 form, separated into bytes
-Total count of this variant
-Tile length, in terms of steps.
-The hash of the tile variant, as bytes.

Hashing everything, including location information, ensures that two libraries with the same tiles and counts but with some tiles in different locations are not considered the same library.

The Components of a library determine which libraries are allowed to liftover to that library, since without being part of the components, there is no easy way to know whether a liftover mapping can be made.

In terms of usage, create a new library using New, which will set up the Paths of the Library for you, and will set the reference text file and any component libraries.

Notes: files and directories of tile libraries will not be automatically deleted. If files or directories must be deleted, you must add this functionality (e.g. by using os.Remove). In addition, the ID is not automatically updated when using AddTile, since AssignID is not quick. The caller must use AssignID on the library to update its ID after adding all tiles.

func CompileDirectoriesToLibrary

func CompileDirectoriesToLibrary(directories []string, libraryTextFile string, gzipped bool) (*Library, error)

CompileDirectoriesToLibrary creates a new Library based on the directories given, sorts it, and gives it its ID (so this library is ready for use). Returns the library pointer and an error, if any (nil if no error was encountered).

func New

func New(textFile string, componentLibraries [][md5.Size]byte) (*Library, error)

New sets up the basic structure for a library and returns a pointer to a new library. For consistency, it's best to use an absolute path for the text file. Relative paths will still work, but they are not recommended.

func SequentialCompileDirectoriesToLibrary

func SequentialCompileDirectoriesToLibrary(directories []string, libraryTextFile string) (*Library, error)

SequentialCompileDirectoriesToLibrary creates a new Library based on the directories given, sorts it, and gives it its ID (so this library is ready for use). It adds each directory sequentially, so each genome is processed one at a time rather than processing one path of all genomes at once. Returns the library pointer and an error, if any (nil if no error was encountered).

func (*Library) AddDirectories

func (l *Library) AddDirectories(directories []string, gzipped bool) error

AddDirectories adds information from a list of directories for genomes into a library, but parses by path. Will return any error encountered.

func (*Library) AddLibraryFastJ

func (l *Library) AddLibraryFastJ(directory string) error

AddLibraryFastJ adds a directory of gzipped FastJ files to a specific library. Will return any error encountered.

func (*Library) AddLibrarySGLFv2

func (l *Library) AddLibrarySGLFv2() error

AddLibrarySGLFv2 adds a directory of SGLFv2 files to a library. Library should be initialized with this directory as the Text field, so that text files of bases and directories aren't mixed together. Returns any error encountered, or nil if there's no error.

func (*Library) AddPathFromDirectories

func (l *Library) AddPathFromDirectories(directories []string, genomePath int, gzipped bool) error

AddPathFromDirectories parses the same path for all genomes, represented by a list of directories, and puts the information in a Library. Will return any error encountered.

func (*Library) AddTile

func (l *Library) AddTile(genomePath, step int, new *structures.TileVariant, bases string) error

AddTile is a function to add a tile (without sorting). Safe to use without checking existence of the tile beforehand (since the function will do that for you). Will return any error encountered. Note: AddTile will write any new tiles to disk in an intermediate file.

func (*Library) Annotate

func (l *Library) Annotate(path, step int, hash structures.VariantHash, annotation string) bool

Annotate is a method to annotate (or re-annotate) a Tile at a specific path and step. If no match is found, the user is notified through the returned boolean.

func (*Library) AssignID

func (l *Library) AssignID()

AssignID assigns a library its ID. Current method takes the path, step, variant number, variant count, variant hash, and variant length into account when making the ID.

func (*Library) Equals

func (l *Library) Equals(l2 *Library) bool

Equals checks for equality between two libraries. It does not check similarity in text or components, and tiles are checked by hash. HashEquals will generally be a faster way of checking equality; Equals is best used when you need to be completely sure about library equality (or inequality).

func (*Library) FindFrequency

func (l *Library) FindFrequency(path, step int, toFind *structures.TileVariant) int

FindFrequency is a function to find the frequency of a specific tile at a specific path and step. A tile that is not found at a specific location has a frequency of 0.

func (*Library) HashEquals

func (l *Library) HashEquals(l2 *Library) bool

HashEquals is a simpler way of checking library equality, since two libraries with the same ID are almost certainly equal. It's faster to use than Equals, given that the IDs have been calculated already.

func (*Library) MergeLibraries

func (l *Library) MergeLibraries(libraryToMerge *Library, textFile string) (*Library, error)

MergeLibraries is a function to merge the library given with the base library. This version creates a new library. Returns the library pointer and an error, if any (nil if no error was encountered).

func (*Library) MergeLibrariesWithoutCreation

func (l *Library) MergeLibrariesWithoutCreation(libraryToMerge *Library) (*Library, error)

MergeLibrariesWithoutCreation merges libraries without creating a new one, using the "mainLibrary" instead. Returns the library pointer and an error, if any (nil if no error was encountered).

func (*Library) SortLibrary

func (l *Library) SortLibrary()

SortLibrary is a function to sort the library once all initial genomes are done being added. This function should only be used once after initial setup of the library, after all tiles have been added, since it sorts everything. The sort function compares tile counts and hashes, so the order in which tiles are added doesn't matter.
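The order-independence claim can be illustrated with a comparator over counts and hashes. This is a sketch under assumptions, not SortLibrary itself: entry and sortEntries are hypothetical, and the sort direction (descending count, ties broken by ascending hash) is a guess, since the documentation only says that counts and hashes are compared.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a stand-in for one variant at a step: its hash and its count.
type entry struct {
	hash  string
	count int
}

// sortEntries orders entries by descending count, breaking ties by
// ascending hash. Because the key depends only on count and hash, the
// result is the same for any insertion order (given distinct hashes).
func sortEntries(es []entry) {
	sort.Slice(es, func(i, j int) bool {
		if es[i].count != es[j].count {
			return es[i].count > es[j].count
		}
		return es[i].hash < es[j].hash
	})
}

func main() {
	a := []entry{{"cc", 1}, {"aa", 5}, {"bb", 1}}
	b := []entry{{"bb", 1}, {"cc", 1}, {"aa", 5}}
	sortEntries(a)
	sortEntries(b)
	fmt.Println(a)
	fmt.Println(b)
}
```

After sorting, an entry's index plays the role of its variant number, which is why both insertion orders must end up with the same sequence.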

func (*Library) TileExists

func (l *Library) TileExists(path, step int, toCheck *structures.TileVariant) (int, bool)

TileExists is a function to check if a specific tile exists at a specific path and step in a library. Returns the index of the variant and the boolean true, if found--otherwise, returns 0 and false, meaning not found. It creates more room for new steps and variants, if necessary.

func (*Library) WriteLibraryToSGLF

func (l *Library) WriteLibraryToSGLF(directoryToWriteTo string) error

WriteLibraryToSGLF writes the contents of a library as SGLF files in a specified directory. Will return any error encountered.

func (*Library) WriteLibraryToSGLFv2

func (l *Library) WriteLibraryToSGLFv2(directoryToWriteTo string) error

WriteLibraryToSGLFv2 writes the contents of a library as SGLFv2 files in a specified directory. Will return any error encountered.

type LiftoverMapping

type LiftoverMapping struct {
	Mapping            [][][]int // The actual mapping between the two libraries
	SourceLibrary      *Library  // The source library to map from.
	DestinationLibrary *Library  // The destination library to map to.
}

LiftoverMapping is a representation of a liftover from one library to another: a translation of variant numbers from the source to the destination. If a = LiftoverMapping.Mapping[b][c][d], then variant d at path b, step c in the source library maps to variant a at the same path and step in the destination library.
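The indexing convention can be shown with a small lookup sketch. liftVariant is a hypothetical helper, not part of the package; it only adds bounds checks around the documented Mapping[path][step][variant] access.

```go
package main

import (
	"errors"
	"fmt"
)

// liftVariant translates a source variant number to the destination
// variant number at the same path and step, using the documented layout
// mapping[path][step][sourceVariant] = destinationVariant.
func liftVariant(mapping [][][]int, path, step, variant int) (int, error) {
	if path < 0 || path >= len(mapping) ||
		step < 0 || step >= len(mapping[path]) ||
		variant < 0 || variant >= len(mapping[path][step]) {
		return 0, errors.New("variant not covered by mapping")
	}
	return mapping[path][step][variant], nil
}

func main() {
	// One path with two steps. At step 0, source variants 0 and 1 swap
	// places in the destination; at step 1, variant 0 stays variant 0.
	mapping := [][][]int{{{1, 0}, {0}}}
	dst, err := liftVariant(mapping, 0, 0, 0)
	fmt.Println(dst, err)
}
```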

func CreateMapping

func CreateMapping(source, destination *Library) (LiftoverMapping, error)

CreateMapping creates a liftover mapping from the source library to the destination library. Returns the mapping and an error, if any (nil if no error was encountered).
