Directories
| Path | Synopsis |
|---|---|
| warc | Package warc reads WARC files the way Common Crawl stores them: a stream of gzip members, one record per member. |
| wat | Package wat reads WAT files, the Common Crawl archive of per-page metadata: the response status and content type, the HTML title and meta tags, and the outbound links. |
| wet | Package wet reads WET files, the Common Crawl archive of extracted plain text. |