Documentation
Overview
WIP: span-tagger is intended to replace span-tag, with these improvements:
1. Drop the filterconfig JSON format and use only the AMSL discovery output (turned into an sqlite3 db, via span-amsl-discovery -db ...); that should make siskin/amsl.py, span-tag, span-freeze and the whole span/filter tree obsolete.
2. Allow for updated file output or just a TSV of attachments (which we could diff for debugging or other purposes).
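To illustrate what a diffable TSV of attachments could look like, here is a minimal sketch in Go. It is not taken from span-tagger: the Attachment type, the attachmentLine and writeTSV helpers, the column layout (record ID, tab, comma-separated labels) and the sample values are all assumptions made for illustration. Labels are sorted so that two runs over the same data yield identical lines, which keeps a diff meaningful.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// Attachment pairs a record ID with the labels attached to it.
// Hypothetical type for illustration; not part of span-tagger.
type Attachment struct {
	ID     string
	Labels []string
}

// attachmentLine renders one attachment as "ID<TAB>label1,label2,...".
// Labels are sorted on a copy, so the caller's slice stays untouched and
// the output is stable across runs.
func attachmentLine(a Attachment) string {
	labels := append([]string(nil), a.Labels...)
	sort.Strings(labels)
	return a.ID + "\t" + strings.Join(labels, ",")
}

// writeTSV writes one line per attachment to w.
func writeTSV(w io.Writer, atts []Attachment) error {
	bw := bufio.NewWriter(w)
	for _, a := range atts {
		if _, err := fmt.Fprintln(bw, attachmentLine(a)); err != nil {
			return err
		}
	}
	return bw.Flush()
}

func main() {
	// Made-up sample data; real IDs and labels would come from the tagger.
	atts := []Attachment{
		{ID: "record-1", Labels: []string{"label-b", "label-a"}},
		{ID: "record-2", Labels: nil},
	}
	if err := writeTSV(os.Stdout, atts); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Two such files, one per run, could then be compared with a plain diff to see which records gained or lost labels.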
Usage:
    $ span-amsl-discovery -db amsl.db -live https://live.server
    $ taskcat AIIntermediateSchema | span-tagger -db amsl.db > tagged.ndj
TODO:
* [ ] cover all attachment modes from https://git.io/JvdmC
* [ ] add tests
* [ ] logs
* [ ] make main short
Performance:
Single-threaded, 170M records take about 4 hours, thanks to caching (but only about 10MB/s); one run took 210m29.179s for 173759327 records and produced 13G of output.
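The note above credits caching for the throughput. How span-tagger caches is not described here, so the following is only a sketch of the general idea, assuming that many records share the same lookup key and that each lookup would otherwise hit the database. The names lookupFunc, cachedLookup and newCachedLookup are hypothetical, and the backend is a stand-in function rather than a real sqlite3 query.

```go
package main

import (
	"fmt"
	"os"
)

// lookupFunc resolves a key (for example a source or collection identifier)
// to a list of labels. In a real tool this might run an SQL query; here it
// is a stand-in. Hypothetical type for illustration.
type lookupFunc func(key string) ([]string, error)

// cachedLookup memoizes results of a lookupFunc in a map, so the backend is
// asked only once per distinct key. It is not safe for concurrent use,
// which is acceptable for a single-threaded run.
type cachedLookup struct {
	fn    lookupFunc
	cache map[string][]string
	calls int // number of backend invocations (cache misses)
}

func newCachedLookup(fn lookupFunc) *cachedLookup {
	return &cachedLookup{fn: fn, cache: make(map[string][]string)}
}

// Get returns the cached value for key, or calls the backend and stores the
// result. Errors are returned to the caller and not cached.
func (c *cachedLookup) Get(key string) ([]string, error) {
	if v, ok := c.cache[key]; ok {
		return v, nil
	}
	c.calls++
	v, err := c.fn(key)
	if err != nil {
		return nil, err
	}
	c.cache[key] = v
	return v, nil
}

func main() {
	// Stand-in backend; a real one would query the database.
	backend := func(key string) ([]string, error) {
		return []string{"label-for-" + key}, nil
	}
	c := newCachedLookup(backend)
	for _, key := range []string{"49", "49", "55", "49"} {
		labels, err := c.Get(key)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(key, labels)
	}
	fmt.Println("backend calls:", c.calls)
}
```

The cache grows with the number of distinct keys, so this approach only fits when that number is small compared to the number of records.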