db

package
v1.8.5 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 22, 2025 | License: MIT | Imports: 14 | Imported by: 0

Documentation

Constants

const (
	SecondsInAMonth = 2628000
	SecondsInAYear  = SecondsInAMonth * 12
	ErrInvalidAge   = Error("not a valid age")
)
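These constants treat a year as 365 days split into 12 equal months, so ages measured with them ignore leap days. A quick arithmetic check, redeclaring the two constants locally so the program compiles on its own:

```go
package main

import "fmt"

// Local copies of the package constants, redeclared for this sketch.
const (
	SecondsInAMonth = 2628000
	SecondsInAYear  = SecondsInAMonth * 12
)

const secondsInADay = 24 * 60 * 60 // 86400

func main() {
	// 2628000 / 86400 = 30.41666... days, i.e. 365/12.
	fmt.Printf("month = %.4f days\n", float64(SecondsInAMonth)/secondsInADay)
	// 31536000 / 86400 = 365 days exactly.
	fmt.Printf("year  = %d days\n", SecondsInAYear/secondsInADay)
}
```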
const (
	GUTABucket  = "gut"
	ChildBucket = "children"

	DBOpenMode = 0640
)
const (
	ErrDBExists    = Error("database already exists")
	ErrDBNotExists = Error("database doesn't exist")
	ErrDirNotFound = Error("directory not found")
)
const ErrInvalidType = Error("not a valid file type")

Functions

func NewDBSet

func NewDBSet(dir string) (*dbSet, error)

NewDBSet creates a new dbSet that knows where its database files are located or should be created.

Types

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB is used to create and query a database made from a dguta file, which is the directory,group,user,type,age summary output produced by the summary package's DirGroupUserTypeAge.Output() method.

func NewDB

func NewDB(paths ...string) *DB

NewDB returns a *DB that can be used to create or query a dguta database. Provide the path to the directory that stores (or will store) the database files. When only reading databases with Open(), you can supply multiple directory paths to query all of them simultaneously.

func (*DB) Add

func (d *DB) Add(dguta RecordDGUTA) error

Add is a dgutaParserCallBack that is called during parsing of dguta file data. It batches up the DGUTAs it receives and writes them to the database when a batch is full.
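A minimal sketch of the batching behaviour Add describes, with the value set by SetBatchSize modelled as a size field. The batcher type, its flush callback and the explicit final drain are hypothetical illustrations, not the package's internals, and records are plain strings rather than RecordDGUTA values; how the real DB writes a final partial batch is not stated here.

```go
package main

import "fmt"

// batcher is a hypothetical stand-in for DB's internal batching: it
// collects records and hands them to flush when the batch is full.
type batcher struct {
	size  int
	batch []string
	flush func([]string) error
}

// add mirrors the shape of DB.Add: it is called once per parsed record.
func (b *batcher) add(rec string) error {
	b.batch = append(b.batch, rec)
	if len(b.batch) < b.size {
		return nil
	}
	return b.drain()
}

// drain writes out whatever is buffered, e.g. a final partial batch.
func (b *batcher) drain() error {
	if len(b.batch) == 0 {
		return nil
	}
	err := b.flush(b.batch)
	b.batch = nil // start a fresh slice so flushed data is not overwritten
	return err
}

func main() {
	var writes [][]string
	b := &batcher{size: 2, flush: func(recs []string) error {
		writes = append(writes, recs)
		return nil
	}}
	for _, r := range []string{"/a", "/a/b", "/a/c"} {
		if err := b.add(r); err != nil {
			panic(err)
		}
	}
	if err := b.drain(); err != nil { // flush the leftover partial batch
		panic(err)
	}
	fmt.Println(writes) // [[/a /a/b] [/a/c]]
}
```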

func (*DB) Children

func (d *DB) Children(dir string) []string

Children returns the directory paths that are directly inside the given directory.

Returns an empty slice if dir had no children (because it was a leaf dir, or didn't exist at all).

The same children from multiple databases are de-duplicated.

You must call Open() before calling this.
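With several databases open, Children has to merge per-database results without repeating paths. A set-based sketch, assuming each database's answer is a plain []string; mergeChildren is a hypothetical helper, and sorting the output is a choice made here for determinism, since the documented order of Children is unspecified:

```go
package main

import (
	"fmt"
	"sort"
)

// mergeChildren is a hypothetical helper: it combines the child lists
// returned by several databases, dropping duplicates.
func mergeChildren(perDB ...[]string) []string {
	seen := make(map[string]struct{})
	merged := []string{} // empty, not nil, when there are no children
	for _, children := range perDB {
		for _, c := range children {
			if _, ok := seen[c]; ok {
				continue
			}
			seen[c] = struct{}{}
			merged = append(merged, c)
		}
	}
	sort.Strings(merged) // deterministic order for this sketch
	return merged
}

func main() {
	a := []string{"/data/x", "/data/y"}
	b := []string{"/data/y", "/data/z"}
	fmt.Println(mergeChildren(a, b)) // [/data/x /data/y /data/z]
}
```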

func (*DB) Close

func (d *DB) Close() error

Close closes the database(s) after reading. You should call this once you've finished reading, but it's not necessary; errors are ignored.

func (*DB) CreateDB

func (d *DB) CreateDB() error

CreateDB creates a new database set, but only if it doesn't already exist.

func (*DB) DirInfo

func (d *DB) DirInfo(dir string, filter *Filter) (*DirSummary, error)

DirInfo tells you the total number of files, their total size, oldest atime and newest mtime nested under the given directory, along with the UIDs, GIDs and FTs of those files. See GUTAs.Summary for an explanation of the filter.

Returns an error if dir doesn't exist.

You must call Open() before calling this.

func (*DB) Info

func (d *DB) Info() (*DBInfo, error)

Info opens our constituent databases read-only, gets summary info about their contents, returns that info and closes the databases.

func (*DB) Open

func (d *DB) Open() error

Open opens the database(s) for reading. You need to call this before using query methods such as DirInfo() and Children(). You should call Close() after you've finished.

func (*DB) SetBatchSize

func (d *DB) SetBatchSize(batchSize int)

type DBInfo

type DBInfo struct {
	NumDirs     int
	NumDGUTAs   int
	NumParents  int
	NumChildren int
}

type DCSs

type DCSs []*DirSummary

DCSs is a Size-sortable slice of DirSummary.

func (DCSs) Len

func (d DCSs) Len() int

func (DCSs) Less

func (d DCSs) Less(i, j int) bool

func (DCSs) SortByDirAndAge

func (d DCSs) SortByDirAndAge()

SortByDirAndAge sorts by Dir first then Age instead of Size.

func (DCSs) Swap

func (d DCSs) Swap(i, j int)

type DGUTA

type DGUTA struct {
	Dir   string
	GUTAs GUTAs
}

DGUTA handles all the *GUTA information for a directory.

func DecodeDGUTAbytes

func DecodeDGUTAbytes(dir, encoded []byte) *DGUTA

DecodeDGUTAbytes converts the byte slices returned by RecordDGUTA.EncodeToBytes() back into a *DGUTA.

func (*DGUTA) Append

func (d *DGUTA) Append(other *DGUTA)

Append appends the GUTAs in the given DGUTA to our own. Useful when you have 2 DGUTAs for the same Dir that were calculated on different subdirectories independently, and now you're dealing with DGUTAs for their common parent directories.

func (*DGUTA) Summary

func (d *DGUTA) Summary(filter *Filter) *DirSummary

Summary sums the count and size of all our GUTAs and returns the results, along with the oldest atime and newest mtime (seconds since Unix epoch) and the unique set of UIDs, GIDs and FTs in all our GUTAs.

See GUTAs.Summary for an explanation of the filter.

type DirGUTAFileType

type DirGUTAFileType uint8

DirGUTAFileType is one of the special file types that the directory,group,user,filetype,age summaries group on.

const (
	DGUTAFileTypeOther      DirGUTAFileType = 0
	DGUTAFileTypeTemp       DirGUTAFileType = 1
	DGUTAFileTypeVCF        DirGUTAFileType = 2
	DGUTAFileTypeVCFGz      DirGUTAFileType = 3
	DGUTAFileTypeBCF        DirGUTAFileType = 4
	DGUTAFileTypeSam        DirGUTAFileType = 5
	DGUTAFileTypeBam        DirGUTAFileType = 6
	DGUTAFileTypeCram       DirGUTAFileType = 7
	DGUTAFileTypeFasta      DirGUTAFileType = 8
	DGUTAFileTypeFastq      DirGUTAFileType = 9
	DGUTAFileTypeFastqGz    DirGUTAFileType = 10
	DGUTAFileTypePedBed     DirGUTAFileType = 11
	DGUTAFileTypeCompressed DirGUTAFileType = 12
	DGUTAFileTypeText       DirGUTAFileType = 13
	DGUTAFileTypeLog        DirGUTAFileType = 14
	DGUTAFileTypeDir        DirGUTAFileType = 15
)

func FileTypeStringToDirGUTAFileType

func FileTypeStringToDirGUTAFileType(ft string) (DirGUTAFileType, error)

FileTypeStringToDirGUTAFileType converts the String() representation of a DirGUTAFileType back into a DirGUTAFileType. Returns an error if an invalid string is supplied.

func (DirGUTAFileType) String

func (d DirGUTAFileType) String() string

String lets you convert a DirGUTAFileType to a meaningful string.
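String and FileTypeStringToDirGUTAFileType form a round trip. One common way to implement such a pair is a name table indexed by the constant plus a reverse lookup. In this sketch only three constants are included, the strings ("other", "temp", "vcf") are hypothetical placeholders rather than the package's actual output, and fileTypeStringToDirGUTAFileType is a local stand-in:

```go
package main

import (
	"errors"
	"fmt"
)

type DirGUTAFileType uint8

const (
	DGUTAFileTypeOther DirGUTAFileType = 0
	DGUTAFileTypeTemp  DirGUTAFileType = 1
	DGUTAFileTypeVCF   DirGUTAFileType = 2
)

var errInvalidType = errors.New("not a valid file type")

// fileTypeNames is indexed by DirGUTAFileType; the strings are placeholders.
var fileTypeNames = []string{"other", "temp", "vcf"}

func (d DirGUTAFileType) String() string {
	if int(d) < len(fileTypeNames) {
		return fileTypeNames[d]
	}
	return ""
}

// fileTypeStringToDirGUTAFileType is the reverse lookup.
func fileTypeStringToDirGUTAFileType(ft string) (DirGUTAFileType, error) {
	for i, name := range fileTypeNames {
		if name == ft {
			return DirGUTAFileType(i), nil
		}
	}
	return DGUTAFileTypeOther, errInvalidType
}

func main() {
	ft, err := fileTypeStringToDirGUTAFileType(DGUTAFileTypeVCF.String())
	fmt.Println(ft == DGUTAFileTypeVCF, err) // true <nil>
	_, err = fileTypeStringToDirGUTAFileType("nope")
	fmt.Println(err) // not a valid file type
}
```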

type DirGUTAge

type DirGUTAge uint8

DirGUTAge is one of the age types that the directory,group,user,filetype,age summaries group on. DGUTAgeAll is for files of all ages. The DGUTAgeA* values consider age according to access time, and the DGUTAgeM* values according to modify time. Values ending in a number followed by M give the age in months (e.g. DGUTAgeA6M), and those ending in a number followed by Y give it in years (e.g. DGUTAgeM2Y).

const (
	DGUTAgeAll DirGUTAge = 0
	DGUTAgeA1M DirGUTAge = 1
	DGUTAgeA2M DirGUTAge = 2
	DGUTAgeA6M DirGUTAge = 3
	DGUTAgeA1Y DirGUTAge = 4
	DGUTAgeA2Y DirGUTAge = 5
	DGUTAgeA3Y DirGUTAge = 6
	DGUTAgeA5Y DirGUTAge = 7
	DGUTAgeA7Y DirGUTAge = 8
	DGUTAgeM1M DirGUTAge = 9
	DGUTAgeM2M DirGUTAge = 10
	DGUTAgeM6M DirGUTAge = 11
	DGUTAgeM1Y DirGUTAge = 12
	DGUTAgeM2Y DirGUTAge = 13
	DGUTAgeM3Y DirGUTAge = 14
	DGUTAgeM5Y DirGUTAge = 15
	DGUTAgeM7Y DirGUTAge = 16
)

func AgeStringToDirGUTAge

func AgeStringToDirGUTAge(age string) (DirGUTAge, error)

AgeStringToDirGUTAge converts the String() representation of a DirGUTAge back into a DirGUTAge. Returns an error if an invalid string is supplied.

func (DirGUTAge) FitsAgeInterval

func (d DirGUTAge) FitsAgeInterval(atime, mtime, refTime int64) bool

FitsAgeInterval takes an atime, an mtime and a reference time. Based on the receiver's age value, it returns true if the atime (for DGUTAgeA* values) or the mtime (for DGUTAgeM* values) fits inside the age interval. For example, age 3 corresponds to DGUTAgeA6M, so atime is checked to see whether it is older than 6 months relative to refTime.
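A sketch of the interval check under these assumptions: codes 1 to 8 test atime and 9 to 16 test mtime, in the order of the const block; "older than" means strictly more than the threshold before refTime; and DGUTAgeAll always fits. Thresholds reuse SecondsInAMonth and SecondsInAYear. This illustrates the documented rule and is not the package's code:

```go
package main

import "fmt"

const (
	SecondsInAMonth = 2628000
	SecondsInAYear  = SecondsInAMonth * 12
)

type DirGUTAge uint8

const (
	DGUTAgeAll DirGUTAge = 0
	DGUTAgeA6M DirGUTAge = 3
	DGUTAgeM1Y DirGUTAge = 12
)

// thresholds[i] is the age in seconds for the i-th A*/M* code
// (1M, 2M, 6M, 1Y, 2Y, 3Y, 5Y, 7Y), following the const block's order.
var thresholds = [8]int64{
	SecondsInAMonth, 2 * SecondsInAMonth, 6 * SecondsInAMonth,
	SecondsInAYear, 2 * SecondsInAYear, 3 * SecondsInAYear,
	5 * SecondsInAYear, 7 * SecondsInAYear,
}

func (d DirGUTAge) FitsAgeInterval(atime, mtime, refTime int64) bool {
	switch {
	case d == DGUTAgeAll:
		return true
	case d <= 8: // DGUTAgeA1M..DGUTAgeA7Y: judged by access time
		return refTime-atime > thresholds[d-1]
	case d <= 16: // DGUTAgeM1M..DGUTAgeM7Y: judged by modify time
		return refTime-mtime > thresholds[d-9]
	default:
		return false
	}
}

func main() {
	const now = int64(1_700_000_000)
	sevenMonthsAgo := now - 7*SecondsInAMonth
	fmt.Println(DGUTAgeA6M.FitsAgeInterval(sevenMonthsAgo, now, now))            // true
	fmt.Println(DGUTAgeM1Y.FitsAgeInterval(sevenMonthsAgo, sevenMonthsAgo, now)) // false
}
```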

type DirInfo

type DirInfo struct {
	Current  *DirSummary
	Children []*DirSummary
}

DirInfo holds nested file count, size, UID and GID information on a directory, and also its immediate child directories.

func (*DirInfo) IsSameAsChild

func (d *DirInfo) IsSameAsChild() bool

IsSameAsChild tells you whether this DirInfo has exactly 1 child and the child has the same file count; i.e., our child contains the same files as us.
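The documented condition maps directly onto code. This sketch uses trimmed local copies of DirInfo and DirSummary and reflects the description above, not the package source:

```go
package main

import "fmt"

// Trimmed local types holding only what IsSameAsChild needs.
type DirSummary struct {
	Dir   string
	Count uint64
}

type DirInfo struct {
	Current  *DirSummary
	Children []*DirSummary
}

// IsSameAsChild reports whether there is exactly one child and it holds
// as many files as the directory itself.
func (d *DirInfo) IsSameAsChild() bool {
	return len(d.Children) == 1 && d.Children[0].Count == d.Current.Count
}

func main() {
	info := &DirInfo{
		Current:  &DirSummary{Dir: "/a", Count: 5},
		Children: []*DirSummary{{Dir: "/a/b", Count: 5}},
	}
	fmt.Println(info.IsSameAsChild()) // true
}
```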

type DirSummary

type DirSummary struct {
	Dir     string
	Count   uint64
	Size    uint64
	Atime   time.Time
	Mtime   time.Time
	UIDs    []uint32
	GIDs    []uint32
	FTs     []DirGUTAFileType
	Age     DirGUTAge
	Modtime time.Time
}

DirSummary holds nested file count, size, atime and mtime information on a directory. It also holds which users and groups own files nested under the directory, what the file types are, and the age group.

type Error

type Error string

func (Error) Error

func (e Error) Error() string

type Filter

type Filter struct {
	GIDs []uint32
	UIDs []uint32
	FTs  []DirGUTAFileType
	Age  DirGUTAge
}

Filter can be applied to a GUTA to see whether it has one of the specified GIDs, one of the specified UIDs, one of the specified FTs, and the specified Age, in which case it passes the filter.

If the Filter's GIDs, UIDs or FTs property is nil, that property is not filtered on; if the whole Filter is nil, a GUTA will be considered to pass the filter.

The exception to this is when FTs != []{DGUTAFileTypeTemp} and the GUTA has an FT of DGUTAFileTypeTemp. A GUTA for a temporary file will always fail to pass the filter unless filtering specifically for temporary files, because other GUTA objects will represent the same file on disk but with another file type, and you won't want to double-count.

type GUTA

type GUTA struct {
	GID   uint32
	UID   uint32
	FT    DirGUTAFileType
	Age   DirGUTAge
	Count uint64
	Size  uint64
	Atime int64 // seconds since Unix epoch
	Mtime int64 // seconds since Unix epoch
	// contains filtered or unexported fields
}

GUTA handles group,user,type,age,count,size information.

func (*GUTA) PassesFilter

func (g *GUTA) PassesFilter(filter *Filter) (bool, bool)

PassesFilter checks whether this GUTA has a GID in the filter's GIDs (considered true if GIDs is nil), a UID in the filter's UIDs (considered true if UIDs is nil), an Age the same as the filter's Age, and an FT in the filter's FTs (considered true if FTs is nil). The second bool returned matches the first unless FT is DGUTAFileTypeTemp, in which case it is false unless the filter's FTs == []{DGUTAFileTypeTemp}.
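A sketch of the two-result check as documented, using trimmed local types and a hypothetical inOrNil helper (Go 1.18+ generics). It follows the stated rules for nil slices, a nil *Filter and temp files; it is an illustration, not the package's implementation:

```go
package main

import "fmt"

type DirGUTAFileType uint8
type DirGUTAge uint8

const (
	DGUTAFileTypeTemp DirGUTAFileType = 1
	DGUTAFileTypeBam  DirGUTAFileType = 6
	DGUTAgeAll        DirGUTAge       = 0
)

type Filter struct {
	GIDs []uint32
	UIDs []uint32
	FTs  []DirGUTAFileType
	Age  DirGUTAge
}

// GUTA is trimmed to the fields the filter inspects.
type GUTA struct {
	GID uint32
	UID uint32
	FT  DirGUTAFileType
	Age DirGUTAge
}

// inOrNil reports whether v is in list, treating a nil list as "no filter".
func inOrNil[T comparable](list []T, v T) bool {
	if list == nil {
		return true
	}
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// PassesFilter returns whether g passes, and whether it should be counted:
// temp GUTAs pass but are not counted unless the filter is temp-only.
func (g *GUTA) PassesFilter(filter *Filter) (bool, bool) {
	passes := filter == nil ||
		(inOrNil(filter.GIDs, g.GID) && inOrNil(filter.UIDs, g.UID) &&
			inOrNil(filter.FTs, g.FT) && g.Age == filter.Age)
	if !passes {
		return false, false
	}
	tempOnly := filter != nil && len(filter.FTs) == 1 &&
		filter.FTs[0] == DGUTAFileTypeTemp
	return true, g.FT != DGUTAFileTypeTemp || tempOnly
}

func main() {
	bam := &GUTA{GID: 1, UID: 354, FT: DGUTAFileTypeBam}
	temp := &GUTA{GID: 1, UID: 354, FT: DGUTAFileTypeTemp}
	byUser := &Filter{UIDs: []uint32{354}, Age: DGUTAgeAll}
	tempOnly := &Filter{FTs: []DirGUTAFileType{DGUTAFileTypeTemp}}

	fmt.Println(bam.PassesFilter(byUser))    // true true
	fmt.Println(temp.PassesFilter(byUser))   // true false
	fmt.Println(temp.PassesFilter(tempOnly)) // true true
}
```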

type GUTAs

type GUTAs []*GUTA

GUTAs is a slice of *GUTA, offering ways to filter and summarise the information in our *GUTAs.

func (GUTAs) Summary

func (g GUTAs) Summary(filter *Filter) *DirSummary

Summary sums the count and size of all our GUTA elements and returns the results, along with the oldest atime and newest mtime (in seconds since Unix epoch) and lists of the unique UIDs, GIDs and FTs in our GUTA elements.

Provide a Filter to ignore GUTA elements that do not match one of the specified GIDs, one of the UIDs, one of the FTs, and the specified Age. If one of those properties is nil, does not filter on that property.

Provide nil to do no filtering, although providing a Filter with Age: summary.DGUTAgeAll is recommended instead.

Note that FT 1 (DGUTAFileTypeTemp) is for temporary files, and because a file can be both temporary and another type, if your Filter's FTs slice doesn't contain just DGUTAFileTypeTemp, any GUTA with FT DGUTAFileTypeTemp is always ignored. (The FTs list will still indicate whether you had temp files that passed the other filters.)
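A sketch of the aggregation GUTAs.Summary describes. To stay short it takes the filter as a function returning the same two booleans as GUTA.PassesFilter. Totals and summarise are hypothetical names, Totals stands in for DirSummary with int64 times, and recording only the FT (not the UID and GID) of passing temp GUTAs is an assumption drawn from the note above:

```go
package main

import (
	"fmt"
	"sort"
)

type DirGUTAFileType uint8

const DGUTAFileTypeTemp DirGUTAFileType = 1

// GUTA is trimmed to the fields Summary aggregates.
type GUTA struct {
	GID, UID     uint32
	FT           DirGUTAFileType
	Count, Size  uint64
	Atime, Mtime int64 // seconds since Unix epoch
}

// Totals is a hypothetical stand-in for *DirSummary.
type Totals struct {
	Count, Size  uint64
	Atime, Mtime int64 // oldest atime, newest mtime
	UIDs, GIDs   []uint32
	FTs          []DirGUTAFileType
}

// summarise mimics GUTAs.Summary. passes plays the role of
// GUTA.PassesFilter: its first result admits the GUTA's file type into
// FTs, its second admits the GUTA into the totals.
func summarise(gutas []*GUTA, passes func(*GUTA) (bool, bool)) Totals {
	var t Totals
	uids, gids := map[uint32]bool{}, map[uint32]bool{}
	fts := map[DirGUTAFileType]bool{}
	counted := false
	for _, g := range gutas {
		ok, countable := passes(g)
		if !ok {
			continue
		}
		fts[g.FT] = true
		if !countable {
			continue // e.g. temp files, already counted under another FT
		}
		t.Count += g.Count
		t.Size += g.Size
		if !counted || g.Atime < t.Atime {
			t.Atime = g.Atime
		}
		if !counted || g.Mtime > t.Mtime {
			t.Mtime = g.Mtime
		}
		counted = true
		uids[g.UID], gids[g.GID] = true, true
	}
	for u := range uids {
		t.UIDs = append(t.UIDs, u)
	}
	for id := range gids {
		t.GIDs = append(t.GIDs, id)
	}
	for ft := range fts {
		t.FTs = append(t.FTs, ft)
	}
	sort.Slice(t.UIDs, func(i, j int) bool { return t.UIDs[i] < t.UIDs[j] })
	sort.Slice(t.GIDs, func(i, j int) bool { return t.GIDs[i] < t.GIDs[j] })
	sort.Slice(t.FTs, func(i, j int) bool { return t.FTs[i] < t.FTs[j] })
	return t
}

func main() {
	gutas := []*GUTA{
		{GID: 1, UID: 10, FT: 6, Count: 2, Size: 100, Atime: 50, Mtime: 90},
		{GID: 1, UID: 11, FT: 2, Count: 3, Size: 30, Atime: 20, Mtime: 60},
		{GID: 1, UID: 10, FT: DGUTAFileTypeTemp, Count: 1, Size: 5, Atime: 10, Mtime: 99},
	}
	// No filtering, but temp GUTAs are still not counted.
	all := func(g *GUTA) (bool, bool) { return true, g.FT != DGUTAFileTypeTemp }

	t := summarise(gutas, all)
	fmt.Println(t.Count, t.Size, t.Atime, t.Mtime) // 5 130 20 90
	fmt.Println(t.UIDs, t.GIDs, t.FTs)              // [10 11] [1] [1 2 6]
}
```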

type RecordDGUTA

type RecordDGUTA struct {
	Dir      *summary.DirectoryPath
	GUTAs    GUTAs
	Children []string
}

func (*RecordDGUTA) EncodeToBytes

func (d *RecordDGUTA) EncodeToBytes() ([]byte, []byte)

EncodeToBytes returns our Dir as a []byte and our GUTAs encoded in another []byte suitable for storing on disk.

type Tree

type Tree struct {
	// contains filtered or unexported fields
}

Tree is used to do high-level queries on DB.Store() database files.

func NewTree

func NewTree(paths ...string) (*Tree, error)

NewTree, given the paths to one or more dguta database files (as created by DB.Store()), returns a *Tree that can be used to do high-level queries on the stats of a tree of disk folders. You should Close() the tree after use.

func (*Tree) Close

func (t *Tree) Close()

Close should be called after you've finished querying the tree to release its database locks.

func (*Tree) DirHasChildren

func (t *Tree) DirHasChildren(dir string, filter *Filter) bool

DirHasChildren tells you if the given directory has any child directories with files in them that pass the filter. See GUTAs.Summary for an explanation of the filter.

func (*Tree) DirInfo

func (t *Tree) DirInfo(dir string, filter *Filter) (*DirInfo, error)

DirInfo tells you the total number of files and their total size nested under the given directory, along with the UIDs and GIDs that own those files. See GUTAs.Summary for an explanation of the filter.

It also tells you the same information about the immediate child directories of the given directory (if the children have files in them that pass the filter).

Returns an error if dir doesn't exist.

func (*Tree) FileLocations

func (t *Tree) FileLocations(dir string, filter *Filter) (DCSs, error)

FileLocations, starting from the given dir, finds the first directory that directly contains filter-passing files along every branch from dir.

See GUTAs.Summary for an explanation of the filter.

The results are returned sorted by directory.
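A sketch of that search on a hypothetical in-memory tree, where own counts the filter-passing files directly inside each directory. Descent along a branch stops as soon as a directory holds passing files itself; the tree reuses the user-354 layout from the Where documentation. This is an illustration of the described behaviour, not the package's code:

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical in-memory directory: own is the number of
// filter-passing files directly inside it.
type node struct {
	path     string
	own      int
	children []*node
}

// fileLocations returns, for each branch below n, the first directory
// that directly holds passing files, sorted by path.
func fileLocations(n *node) []string {
	var found []string
	var walk func(*node)
	walk = func(n *node) {
		if n.own > 0 {
			found = append(found, n.path)
			return // stop descending this branch
		}
		for _, c := range n.children {
			walk(c)
		}
	}
	walk(n)
	sort.Strings(found)
	return found
}

func main() {
	d := &node{path: "/a/b/c/d", own: 2, children: []*node{
		{path: "/a/b/c/d/1", own: 1},
		{path: "/a/b/c/d/2", own: 2},
	}}
	g := &node{path: "/a/b/e/f/g", own: 2}
	root := &node{path: "/", children: []*node{
		{path: "/a", children: []*node{
			{path: "/a/b", children: []*node{
				{path: "/a/b/e", children: []*node{{path: "/a/b/e/f", children: []*node{g}}}},
				{path: "/a/b/c", children: []*node{d}},
			}},
		}},
	}}
	fmt.Println(fileLocations(root)) // [/a/b/c/d /a/b/e/f/g]
}
```

/a/b/c/d/1 and /a/b/c/d/2 are not listed because their parent already holds passing files.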

func (*Tree) Where

func (t *Tree) Where(dir string, filter *Filter, recurseCount split.SplitFn) (DCSs, error)

Where tells you where files are nested under dir that pass the filter. With a depth of 0 it only returns the single deepest directory that has all passing files nested under it.

The recurseCount function returns a path-dependent depth value.

With a depth of 1, it also returns the results that calling Where() with a depth of 0 on each of the deepest directory's children would give. And so on recursively for higher depths.

See GUTAs.Summary for an explanation of the filter.

It's recommended to set the Age filter to summary.DGUTAgeAll.

For example, if all of user 354's files are in the directories /a/b/c/d (2 files), /a/b/c/d/1 (1 file), /a/b/c/d/2 (2 files) and /a/b/e/f/g (2 files), then calling Where() on "/" with &Filter{UIDs: []uint32{354}} and a depth of 0 would tell you that "/a/b" has 7 files. With a depth of 1 it would tell you that "/a/b" has 7 files, "/a/b/c/d" has 5 files and "/a/b/e/f/g" has 2 files. With a depth of 2 it would tell you that "/a/b" has 7 files, "/a/b/c/d" has 5 files, "/a/b/c/d/1" has 1 file, "/a/b/c/d/2" has 2 files, and "/a/b/e/f/g" has 2 files.

The returned DirSummary values are sorted by Size, largest first.

Returns an error if dir doesn't exist.
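A sketch reproducing the worked example above on a hypothetical in-memory tree. A fixed integer depth stands in for the split.SplitFn recurseCount, file counts stand in for Size when sorting (ties broken by path), and the "deepest directory" is found by descending while a single child holds all of a directory's passing files. These are assumptions about the algorithm, not the package's code:

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical in-memory directory; own counts the
// filter-passing files directly inside it.
type node struct {
	path     string
	own      int
	children []*node
}

// total is the number of passing files nested under n.
func (n *node) total() int {
	t := n.own
	for _, c := range n.children {
		t += c.total()
	}
	return t
}

type result struct {
	dir   string
	count int
}

// deepest descends while a single child holds all of n's passing files.
func deepest(n *node) *node {
	for {
		var next *node
		for _, c := range n.children {
			if c.total() == n.total() {
				next = c
			}
		}
		if next == nil {
			return n
		}
		n = next
	}
}

// where mimics Tree.Where with a fixed depth instead of a split.SplitFn.
func where(n *node, depth int) []result {
	if n.total() == 0 {
		return nil
	}
	d := deepest(n)
	rs := []result{{d.path, d.total()}}
	if depth > 0 {
		for _, c := range d.children {
			rs = append(rs, where(c, depth-1)...)
		}
	}
	return rs
}

// sortedWhere orders results largest first, breaking ties by path.
func sortedWhere(n *node, depth int) []result {
	rs := where(n, depth)
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].count != rs[j].count {
			return rs[i].count > rs[j].count
		}
		return rs[i].dir < rs[j].dir
	})
	return rs
}

// exampleTree builds the user-354 layout from the documentation.
func exampleTree() *node {
	d := &node{path: "/a/b/c/d", own: 2, children: []*node{
		{path: "/a/b/c/d/1", own: 1}, {path: "/a/b/c/d/2", own: 2},
	}}
	g := &node{path: "/a/b/e/f/g", own: 2}
	e := &node{path: "/a/b/e", children: []*node{{path: "/a/b/e/f", children: []*node{g}}}}
	b := &node{path: "/a/b", children: []*node{{path: "/a/b/c", children: []*node{d}}, e}}
	return &node{path: "/", children: []*node{{path: "/a", children: []*node{b}}}}
}

func main() {
	for depth := 0; depth <= 2; depth++ {
		fmt.Println(depth, sortedWhere(exampleTree(), depth))
	}
}
```

Running it prints one result for depth 0, three for depth 1 and five for depth 2, with the same directories and file counts as the example.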
