Xobaque
This is a search engine that does not crawl the web. It only knows
what users upload. It works from the command line and as a web server.
If you have a web site generated from Markdown files, for example, you
can use the upload command to have Xobaque index your site. You can
use an RSS or Atom feed to update the index. You can use an OPML file
linking to multiple feeds to update the index from a large number of
sources.
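To make concrete what a feed contributes to the index, here is a minimal sketch that reads the items of an RSS 2.0 feed with Go's standard encoding/xml package. It is not Xobaque's own code (the Dependencies section lists gofeed for feed parsing); the type names, the parseItems function, and the sample feed are made up for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

// rssDoc mirrors only the parts of an RSS 2.0 document used in this sketch.
type rssDoc struct {
	Items []rssItem `xml:"channel>item"`
}

// rssItem holds the fields a search index would plausibly care about.
type rssItem struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
}

// parseItems (hypothetical helper) returns the items found in an RSS 2.0 feed.
func parseItems(data []byte) ([]rssItem, error) {
	var doc rssDoc
	if err := xml.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	return doc.Items, nil
}

// sampleFeed is invented example data, not output from a real site.
const sampleFeed = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item>
      <title>First post</title>
      <link>https://example.org/first</link>
      <description>Hello world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/second</link>
      <description>More text</description>
    </item>
  </channel>
</rss>`

func main() {
	items, err := parseItems([]byte(sampleFeed))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Each item is one candidate page: a URL, a title and some text.
	for _, it := range items {
		fmt.Printf("%s\t%s\n", it.Link, it.Title)
	}
}
```

Everything in the output comes from the feed itself, which is why the next paragraph's warning about untrusted feeds matters.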
Xobaque doesn't scrape websites and so it cannot verify what's in the
feeds. Accepting updates from strangers would allow them to add fake
pages to the index. Xobaque currently doesn't have any moderation
support, so be careful when accepting submissions.
Xobaque is written on top of the SQLite FTS5
Extension.
If you need logging or access control, put Xobaque behind a regular
web server such as Apache acting as a reverse proxy.
Documentation
This project uses man(1) pages. They are generated from text files
using scdoc. These are the files
available:
xobaque(1):
This man page documents how to run Xobaque on the web.
xobaque-import(1):
This man page documents how to index feeds (RSS and Atom) and feed
collections (OPML).
xobaque-upload(1):
This man page documents how to index text files.
xobaque-search(1):
This man page documents how to search the index from the command line.
xobaque-templates(5):
This man page documents how to write the HTML templates.
xobaque-releases(7):
This man page lists all the releases and their user-visible changes.
Build it using Go
go build
Installation
The Makefile installs the binary and the man pages into ~/.local.
make install
The xobaque.service and xobaque.socket files can help you set up
the system using systemd with Unix domain socket activation, if
that sounds like something you'd like.
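For readers unfamiliar with socket activation, here is a rough sketch of how a Go program can pick up a socket passed by systemd. It follows the general sd_listen_fds convention (LISTEN_PID, LISTEN_FDS, descriptors starting at 3) and is not taken from Xobaque's source; the function names and the TCP fallback address are assumptions for this example.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// listenFDsStart is the first descriptor systemd hands over (SD_LISTEN_FDS_START).
const listenFDsStart = 3

// activatedFDs (hypothetical helper) reports how many sockets systemd passed
// to the process with the given pid, based on LISTEN_PID and LISTEN_FDS.
func activatedFDs(pid int, getenv func(string) string) int {
	p, err := strconv.Atoi(getenv("LISTEN_PID"))
	if err != nil || p != pid {
		return 0 // variables missing or meant for another process
	}
	n, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || n < 0 {
		return 0
	}
	return n
}

// listener returns the first socket passed by systemd, or listens on the
// fallback TCP address when the program was started by hand.
func listener(fallback string) (net.Listener, error) {
	if activatedFDs(os.Getpid(), os.Getenv) > 0 {
		f := os.NewFile(uintptr(listenFDsStart), "systemd-socket")
		defer f.Close() // net.FileListener works on a copy of the descriptor
		return net.FileListener(f)
	}
	return net.Listen("tcp", fallback)
}

func main() {
	l, err := listener("localhost:0") // assumed fallback; port 0 picks a free port
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer l.Close()
	fmt.Println("listening on", l.Addr())
	// A real server would now hand l to http.Serve.
}
```

With a Unix domain socket unit, the address printed would be the socket path from the .socket file rather than a TCP port.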
Dependencies
This section lists the non-standard libraries Xobaque uses and their
respective licenses.
github.com/antchfx/htmlquery is
used to find the title in pages. MIT.
github.com/antchfx/xmlquery is
used to find the next feed in paginated feeds. MIT.
github.com/antchfx/xpath is used
to parse RSS and Atom feeds to find the next feed in paginated feeds.
MIT.
github.com/gilliek/go-opml/opml
is used to parse OPML files. BSD-3-Clause.
github.com/google/subcommands
is used for the parsing and documenting of subcommands. Apache-2.0.
github.com/microcosm-cc/bluemonday
is used to strip feed items of all HTML. BSD-3-Clause.
github.com/mmcdole/gofeed is used
to parse RSS, Atom and JSON feeds. MIT.
github.com/stretchr/testify/assert
is used for testing. MIT.
modernc.org/sqlite is used
for SQLite access. BSD-3-Clause.
github.com/temoto/robotstxt is
used to parse the /robots.txt files. MIT.
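The "next feed in paginated feeds" mentioned above usually refers to a link element with rel="next", as described in RFC 5005. The sketch below finds such a link in an Atom feed using only the standard encoding/xml package; Xobaque itself uses the xmlquery and xpath libraries for this, so the types, the nextPage function, and the sample document here are illustrative assumptions.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

// atomLink captures the two attributes of an Atom link element used here.
type atomLink struct {
	Rel  string `xml:"rel,attr"`
	Href string `xml:"href,attr"`
}

// atomDoc collects only the feed-level links; links inside entries are ignored.
type atomDoc struct {
	Links []atomLink `xml:"link"`
}

// nextPage (hypothetical helper) returns the URL of the following page of a
// paginated Atom feed, or an empty string if there is no rel="next" link.
func nextPage(data []byte) (string, error) {
	var doc atomDoc
	if err := xml.Unmarshal(data, &doc); err != nil {
		return "", err
	}
	for _, l := range doc.Links {
		if l.Rel == "next" {
			return l.Href, nil
		}
	}
	return "", nil
}

// samplePage is invented example data.
const samplePage = `<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="self" href="https://example.org/feed.xml"/>
  <link rel="next" href="https://example.org/feed.xml?page=2"/>
  <entry>
    <title>First post</title>
    <link href="https://example.org/first"/>
  </entry>
</feed>`

func main() {
	next, err := nextPage([]byte(samplePage))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("next page:", next)
}
```

An importer would fetch the returned URL and repeat until no next link is found.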
Bugs
If you spot any, contact me.