Documentation ¶
Constants ¶
This section is empty.
Variables ¶
var (
	ErrFileTooLarge = errors.New("file too large")
	ErrInvalidHash  = errors.New("invalid hash")
	// DefaultBucket for S3
	DefaultBucket = "sandcrawler"
)
var Version = "0.3.13"
Version of library and cli tools.
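The sentinel errors declared above can be matched with errors.Is even after a caller wraps them. The following is a minimal sketch, not the package's own code: ErrFileTooLarge is redeclared locally so the example compiles on its own, and maxSize, checkSize and isTooLarge are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFileTooLarge mirrors the sentinel documented above; it is redeclared
// here only so that the sketch is self-contained.
var ErrFileTooLarge = errors.New("file too large")

// maxSize is a hypothetical limit chosen for this example.
const maxSize = 1 << 20

// checkSize is a hypothetical helper that wraps the sentinel with context.
func checkSize(n int) error {
	if n > maxSize {
		return fmt.Errorf("blob of %d bytes: %w", n, ErrFileTooLarge)
	}
	return nil
}

// isTooLarge reports whether err is, or wraps, ErrFileTooLarge.
func isTooLarge(err error) bool {
	return errors.Is(err, ErrFileTooLarge)
}

func main() {
	err := checkSize(maxSize + 1)
	fmt.Println(isTooLarge(err), err)
}
```

Wrapping with %w keeps the sentinel reachable for errors.Is while still adding context to the message.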
Functions ¶
This section is empty.
Types ¶
type BlobRequestOptions ¶ added in v0.3.5
type BlobRequestOptions struct {
	Folder  string
	Blob    []byte
	SHA1Hex string
	Ext     string
	Prefix  string
	Bucket  string
}
BlobRequestOptions wraps the blob request options, both for setting and retrieving a blob.
Currently used folder names:
- "pdf" for thumbnails
- "xml_doc" for TEI-XML
- "html_body" for HTML TEI-XML
- "unknown" for generic
Default bucket is "sandcrawler-dev", other buckets via infra:
- "sandcrawler" for sandcrawler_grobid_bucket
- "thumbnail" for sandcrawler_thumbnail_bucket
- "sandcrawler" for sandcrawler_text_bucket
type PutBlobResponse ¶
PutBlobResponse wraps a blob put request response.
type WebSpoolService ¶ added in v0.3.5
WebSpoolService saves web payloads to a configured directory. TODO: add a size limit (e.g. 80% of disk or an absolute value)
func (*WebSpoolService) BlobHandler ¶ added in v0.3.5
func (svc *WebSpoolService) BlobHandler(w http.ResponseWriter, r *http.Request)
BlobHandler receives binary blobs and saves them on disk. This handler returns as soon as the file has been written into the spool directory of the service, using a sharded SHA1 as path.
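One way to picture a sharded SHA1 path is sketched below. The actual layout used by BlobHandler is not documented here, so the two-level scheme (first two hex characters, next two hex characters, then the full digest) and the helper name shardedPath are assumptions made for illustration.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// shardedPath is a hypothetical helper: it hashes the blob with SHA1 and
// spreads files over subdirectories named after the leading hex characters,
// so that no single directory collects every spooled file.
func shardedPath(dir string, blob []byte) string {
	sum := sha1.Sum(blob)
	digest := hex.EncodeToString(sum[:])
	// Assumed layout: <dir>/<digest[0:2]>/<digest[2:4]>/<digest>
	return filepath.Join(dir, digest[0:2], digest[2:4], digest)
}

func main() {
	fmt.Println(shardedPath("spool", []byte("hello")))
}
```

Because the path is derived only from the content, writing the same blob twice resolves to the same file.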
func (*WebSpoolService) SpoolListHandler ¶ added in v0.3.5
func (svc *WebSpoolService) SpoolListHandler(w http.ResponseWriter, r *http.Request)
SpoolListHandler returns a single, long jsonlines response with information about all files in the spool directory.
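A jsonlines response carries one JSON object per line. The sketch below shows that shape under stated assumptions: the spoolEntry fields, the content type and the helper names are hypothetical and are not taken from the package, whose actual response fields are not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// spoolEntry is a hypothetical record describing one spooled file.
type spoolEntry struct {
	Path string `json:"path"`
	Size int64  `json:"size"`
}

// writeJSONLines encodes each entry as one JSON object followed by a
// newline; json.Encoder.Encode appends the newline itself.
func writeJSONLines(w io.Writer, entries []spoolEntry) error {
	enc := json.NewEncoder(w)
	for _, e := range entries {
		if err := enc.Encode(e); err != nil {
			return err
		}
	}
	return nil
}

// renderJSONLines is a convenience wrapper that returns the lines as a string.
func renderJSONLines(entries []spoolEntry) (string, error) {
	var sb strings.Builder
	err := writeJSONLines(&sb, entries)
	return sb.String(), err
}

// listHandler streams the given entries as a jsonlines response body.
func listHandler(entries []spoolEntry) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/x-ndjson")
		// An encoding error after the first write cannot change the status
		// code anymore, so this sketch simply stops writing.
		_ = writeJSONLines(w, entries)
	}
}

func main() {
	out, err := renderJSONLines([]spoolEntry{{Path: "aa/f4/example", Size: 5}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

A client can consume such a response line by line without buffering the whole body.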
func (*WebSpoolService) SpoolStatusHandler ¶ added in v0.3.5
func (svc *WebSpoolService) SpoolStatusHandler(w http.ResponseWriter, r *http.Request)
SpoolStatusHandler returns HTTP 200 if a given file is in the spool directory and HTTP 404 if it is not.
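The existence check behind such a status endpoint can be sketched with os.Stat. How the handler identifies the requested file is not documented here, so the helper spoolStatus and its plain path argument are assumptions; the extra HTTP 500 branch for unexpected errors is also an addition of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"net/http"
	"os"
)

// spoolStatus is a hypothetical helper that maps the presence of a path
// to an HTTP status code.
func spoolStatus(path string) int {
	_, err := os.Stat(path)
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, fs.ErrNotExist):
		return http.StatusNotFound
	default:
		// Permission problems and similar failures are neither found nor
		// missing, so this sketch reports them as a server error.
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(spoolStatus("."))
	fmt.Println(spoolStatus("no-such-spool-file-for-this-sketch"))
}
```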
type WrapS3 ¶ added in v0.3.5
type WrapS3 struct {
Client *minio.Client
}
WrapS3 slightly wraps I/O around our S3 store with convenience methods.
func NewWrapS3 ¶ added in v0.3.5
func NewWrapS3(endpoint string, opts *WrapS3Options) (*WrapS3, error)
NewWrapS3 creates a new, slim wrapper around S3.
func (*WrapS3) PutBlob ¶ added in v0.3.7
func (wrap *WrapS3) PutBlob(ctx context.Context, req *BlobRequestOptions) (*PutBlobResponse, error)
PutBlob takes the data described by req and saves it in S3.