span-tagger

command
v0.1.330
Warning

This package is not in the latest version of its module.

Published: Jul 7, 2021 License: GPL-3.0 Imports: 14 Imported by: 0

Documentation

Overview

WIP: span-tagger will be a replacement for span-tag, with the following improvements:

1. Get rid of the filterconfig JSON format and only use AMSL discovery output (turned into an sqlite3 database via span-amsl-discovery -db ...). That should make siskin/amsl.py, span-tag, span-freeze and the whole span/filter tree obsolete.

2. Allow for either updated file output or just a TSV of attachments (which we could diff for debugging or other purposes).
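A diff-friendly attachment TSV could look roughly like the following. This is a minimal sketch, not span-tagger's actual output format: the `Attachment` type, the two-column layout (record ID, ISIL) and the example IDs are all hypothetical. Sorting by record ID and then ISIL keeps the output stable between runs, which is what makes a plain `diff` useful.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Attachment is a hypothetical pairing of a record ID with the ISIL of an
// institution the record is attached to.
type Attachment struct {
	RecordID string
	ISIL     string
}

// attachmentsTSV renders one "record-id<TAB>ISIL" line per attachment,
// sorted by record ID and then ISIL, so that the output of two runs can be
// compared with diff. The input slice is not modified.
func attachmentsTSV(atts []Attachment) string {
	sorted := make([]Attachment, len(atts))
	copy(sorted, atts)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].RecordID != sorted[j].RecordID {
			return sorted[i].RecordID < sorted[j].RecordID
		}
		return sorted[i].ISIL < sorted[j].ISIL
	})
	var sb strings.Builder
	for _, a := range sorted {
		sb.WriteString(a.RecordID)
		sb.WriteByte('\t')
		sb.WriteString(a.ISIL)
		sb.WriteByte('\n')
	}
	return sb.String()
}

func main() {
	// Example values only; real record IDs and ISILs come from the data.
	atts := []Attachment{
		{RecordID: "ai-49-example-1", ISIL: "DE-15"},
		{RecordID: "ai-28-example-2", ISIL: "DE-15"},
		{RecordID: "ai-49-example-1", ISIL: "DE-14"},
	}
	fmt.Print(attachmentsTSV(atts))
}
```

Two such files from different runs (e.g. hypothetical old.tsv and new.tsv) could then be compared with `diff old.tsv new.tsv`.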

Usage:

$ span-amsl-discovery -db amsl.db -live https://live.server
$ taskcat AIIntermediateSchema | span-tagger -db amsl.db > tagged.ndj

TODO:

* [ ] cover all attachment modes from https://git.io/JvdmC
* [ ] add tests
* [ ] logs
* [ ] make main short

Performance:

Single-threaded: 173759327 records (about 170M) in 210m29.179s, i.e. about 3.5 hours or roughly 13,800 records per second, thanks to caching (but only about 10MB/s); 13G output.
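The caching is not described further here. As a minimal sketch, assuming that many records share the same lookup key (the `source|collection` key format below is made up for illustration), a memoizing wrapper around an expensive lookup such as a database query could look like this. It is not safe for concurrent use, which fits the single-threaded setup, and it never evicts entries.

```go
package main

import "fmt"

// cachedLookup memoizes a hypothetical expensive lookup function; repeated
// keys are answered from memory instead of calling fn again.
type cachedLookup struct {
	fn     func(key string) []string
	m      map[string][]string
	hits   int
	misses int
}

func newCachedLookup(fn func(string) []string) *cachedLookup {
	return &cachedLookup{fn: fn, m: make(map[string][]string)}
}

// Get returns the cached result for key, calling fn only on a miss.
func (c *cachedLookup) Get(key string) []string {
	if v, ok := c.m[key]; ok {
		c.hits++
		return v
	}
	c.misses++
	v := c.fn(key)
	c.m[key] = v
	return v
}

func main() {
	calls := 0
	slow := func(key string) []string {
		calls++ // stands in for an expensive query
		return []string{"DE-15"}
	}
	c := newCachedLookup(slow)
	for _, k := range []string{"49|Wiley", "49|Wiley", "28|DOAJ", "49|Wiley"} {
		c.Get(k)
	}
	fmt.Println(calls, c.hits, c.misses) // prints "2 2 2"
}
```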
