Documentation ¶
Overview ¶
Package files contains functionality for dealing with files, including remote files (e.g. files fetched over HTTP). The files.Files type is the central API for interacting with files.
Index ¶
- Variables
- func DefaultCacheDir() (dir string)
- func DefaultTempDir() (dir string)
- func DetectMagicNumber(ctx context.Context, newRdrFn NewReaderFunc) (detected drivertype.Type, score float32, err error)
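- type Files
- func New(ctx context.Context, optReg *options.Registry, cfgLock lockfile.LockFunc, tmpDir, cacheDir string) (*Files, error)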
- func (fs *Files) AddDriverDetectors(detectFns ...TypeDetectFunc)
- func (fs *Files) AddStdin(ctx context.Context, f *os.File) error
- func (fs *Files) CacheClearAll(ctx context.Context) error
- func (fs *Files) CacheClearSource(ctx context.Context, src *source.Source, clearDownloads bool) error
- func (fs *Files) CacheDir() string
- func (fs *Files) CacheDirFor(src *source.Source) (dir string, err error)
- func (fs *Files) CacheLockAcquire(ctx context.Context, src *source.Source) (unlock func(), err error)
- func (fs *Files) CachePaths(src *source.Source) (srcCacheDir, cacheDB, checksums string, err error)
- func (fs *Files) CachedBackingSourceFor(ctx context.Context, src *source.Source) (backingSrc *source.Source, ok bool, err error)
- func (fs *Files) Close() error
- func (fs *Files) CreateTemp(pattern string, clean bool) (*os.File, error)
- func (fs *Files) DetectStdinType(ctx context.Context) (drivertype.Type, error)
- func (fs *Files) DetectType(ctx context.Context, handle, loc string) (drivertype.Type, error)
- func (fs *Files) Filesize(ctx context.Context, src *source.Source) (size int64, err error)
- func (fs *Files) NewBuffer() ioz.Buffer
- func (fs *Files) NewReader(ctx context.Context, src *source.Source, ingesting bool) (io.ReadCloser, error)
- func (fs *Files) Ping(ctx context.Context, src *source.Source) error
- func (fs *Files) TempDir() string
- func (fs *Files) WriteIngestChecksum(ctx context.Context, src, backingSrc *source.Source) (err error)
- type NewReaderFunc
- type TypeDetectFunc
Constants ¶
This section is empty.
Variables ¶
var (
	OptHTTPRequestTimeout = options.NewDuration(
		"http.request.timeout",
		nil,
		time.Second*10,
		"HTTP/S request initial response timeout duration",
		`How long to wait for initial response from a HTTP/S endpoint before timeout occurs. Reading the body of the response, such as a large HTTP file download, is not affected by this option. Example: 500ms or 3s. Contrast with http.response.timeout.`,
		options.TagSource,
	)
	OptHTTPResponseTimeout = options.NewDuration(
		"http.response.timeout",
		nil,
		0,
		"HTTP/S request completion timeout duration",
		`How long to wait for the entire HTTP transaction to complete. This includes reading the body of the response, such as a large HTTP file download. Typically this is set to 0, indicating no timeout. Contrast with http.request.timeout.`,
		options.TagSource,
	)
	OptHTTPSInsecureSkipVerify = options.NewBool(
		"https.insecure-skip-verify",
		nil,
		false,
		"Skip HTTPS TLS verification",
		"Skip HTTPS TLS verification. Useful when downloading against self-signed certs.",
		options.TagSource,
	)
	OptDownloadContinueOnError = downloader.OptContinueOnError
	OptDownloadCache           = downloader.OptCache
)
var OptCacheLockTimeout = options.NewDuration(
	"cache.lock.timeout",
	nil,
	time.Second*5,
	"Wait timeout to acquire cache lock",
	`Wait timeout to acquire cache lock. During this period, retry will occur if the lock is already held by another process. If zero, no retry occurs.`,
)
OptCacheLockTimeout is the time allowed to acquire a cache lock.
See also: driver.OptIngestCache.
Functions ¶
func DefaultCacheDir ¶
func DefaultCacheDir() (dir string)
DefaultCacheDir returns the sq cache dir. This is generally in USER_CACHE_DIR/*/sq, but could also be in TEMP_DIR/*/sq/cache or similar. It is not guaranteed that the returned dir exists or is accessible.
func DefaultTempDir ¶
func DefaultTempDir() (dir string)
DefaultTempDir returns the default sq temp dir. It is not guaranteed that the returned dir exists or is accessible.
func DetectMagicNumber ¶
func DetectMagicNumber(ctx context.Context, newRdrFn NewReaderFunc) (detected drivertype.Type, score float32, err error)
DetectMagicNumber is a TypeDetectFunc that detects the "magic number" from the start of files.
Types ¶
type Files ¶
type Files struct {
// contains filtered or unexported fields
}
Files is the centralized API for interacting with files. It provides a uniform mechanism for reading files, whether from local disk, stdin, or remote HTTP.
func New ¶
func New(ctx context.Context, optReg *options.Registry, cfgLock lockfile.LockFunc, tmpDir, cacheDir string) (*Files, error)
New returns a new Files instance. The caller must invoke Files.Close when done with the instance.
func (*Files) AddDriverDetectors ¶
func (fs *Files) AddDriverDetectors(detectFns ...TypeDetectFunc)
AddDriverDetectors adds driver type detectors.
func (*Files) AddStdin ¶
AddStdin copies f to fs's cache: the stdin data in f is later accessible via Files.NewReader(src) where src.Handle is source.StdinHandle; f's type can be detected via DetectStdinType.
func (*Files) CacheClearAll ¶
CacheClearAll clears the entire cache dir. Note that this operation is distinct from Files.doCacheSweep.
func (*Files) CacheClearSource ¶
func (fs *Files) CacheClearSource(ctx context.Context, src *source.Source, clearDownloads bool) error
CacheClearSource clears the ingest cache for src. If arg clearDownloads is true, the source's download dir is also cleared. The caller should typically first acquire the cache lock for src via Files.CacheLockAcquire.
func (*Files) CacheDir ¶
CacheDir returns the cache dir. It is not guaranteed that the returned dir exists.
func (*Files) CacheDirFor ¶
CacheDirFor gets the cache dir for src. It is not guaranteed that the returned dir exists or is accessible.
func (*Files) CacheLockAcquire ¶
func (fs *Files) CacheLockAcquire(ctx context.Context, src *source.Source) (unlock func(), err error)
CacheLockAcquire acquires the cache lock for src. The caller must invoke the returned unlock func.
func (*Files) CachePaths ¶
CachePaths returns the paths to the cache files for src. There is no guarantee that these files exist, or are accessible. It's just the paths.
func (*Files) CachedBackingSourceFor ¶
func (fs *Files) CachedBackingSourceFor(ctx context.Context, src *source.Source) (backingSrc *source.Source, ok bool, err error)
CachedBackingSourceFor returns the underlying backing source for src, if it exists. If it does not exist, ok returns false.
func (*Files) CreateTemp ¶
CreateTemp creates a new temporary file in fs's temp dir with the given filename pattern, as per the os.CreateTemp docs. If arg clean is true, the file is added to the cleanup sequence invoked by fs.Close. It is the caller's responsibility to close the returned file.
func (*Files) DetectStdinType ¶
DetectStdinType detects the type of stdin as previously added by AddStdin. An error is returned if AddStdin was not first invoked. If the type cannot be detected, TypeNone and nil are returned.
func (*Files) DetectType ¶
DetectType returns the driver type of loc. This may result in loading files into the cache.
func (*Files) Filesize ¶
Filesize returns the file size of src.Location. If the source is being ingested asynchronously, this function may block until loading completes. An error is returned if src is not a document/file source.
func (*Files) NewBuffer ¶ added in v0.48.0
NewBuffer returns a new ioz.Buffer instance which may be in-memory or on-disk, or both, for use as a temporary buffer for potentially large data that may not fit in memory. The caller MUST invoke ioz.Buffer.Close on the returned buffer when done.
func (*Files) NewReader ¶
func (fs *Files) NewReader(ctx context.Context, src *source.Source, ingesting bool) (io.ReadCloser, error)
NewReader returns a new io.ReadCloser for src.Location. Arg ingesting is a performance hint that indicates that the reader is being used to ingest data (as opposed to, say, sampling the data for type detection). It's an error to invoke NewReader for a src after having invoked it for the same src with ingesting=true.
If src.Handle is StdinHandle, AddStdin must first have been invoked.
The caller must close the reader.
func (*Files) TempDir ¶
TempDir returns the temp dir. It is not guaranteed that the returned dir exists.
func (*Files) WriteIngestChecksum ¶
func (fs *Files) WriteIngestChecksum(ctx context.Context, src, backingSrc *source.Source) (err error)
WriteIngestChecksum is invoked (after successful ingestion) to write the checksum of the source document file vs the ingest DB. Thus, if the source document changes, the checksum will no longer match, and the ingest DB will be considered invalid.
type NewReaderFunc ¶
type NewReaderFunc func(ctx context.Context) (io.ReadCloser, error)
NewReaderFunc is a func that returns an io.ReadCloser. The caller is responsible for closing the returned io.ReadCloser.
type TypeDetectFunc ¶
type TypeDetectFunc func(ctx context.Context, newRdrFn NewReaderFunc) (detected drivertype.Type, score float32, err error)
TypeDetectFunc interrogates a byte stream to determine the source driver type. A score is returned indicating the confidence that the driver type has been detected. A score <= 0 is failure, a score >= 1 is success; intermediate values indicate some level of confidence. An error is returned only if an IO problem occurred. The implementation gets access to the byte stream by invoking newRdrFn, and is responsible for closing any reader it opens.
Directories ¶
Path | Synopsis |
---|---|
internal/downloader | Package downloader provides a mechanism for getting files from HTTP/S URLs, making use of a mostly RFC-compliant cache. |