# BlogWatcher

> Never miss a post. Track any blog — RSS or not.

A Go CLI tool to track blog articles, detect new posts, and manage read/unread status. Supports RSS/Atom feeds, with HTML scraping as a fallback.

Forked from Hyaxia/blogwatcher.

English | 中文

## Quick Start

```bash
# Install
go install github.com/hanw39/blogwatcher/cmd/blogwatcher@latest

# Track a blog
blogwatcher add "Paul Graham" https://paulgraham.com/articles.html

# Scan for new articles
blogwatcher scan

# List unread articles
blogwatcher articles
```
## What's New

Added on top of Hyaxia/blogwatcher:

**OPML Import** — bulk-import blog subscriptions from feed reader exports (Feedly, Inoreader, etc.):

```bash
blogwatcher import subscriptions.opml
```

**Improved RSS Discovery** — better feed detection via Content-Type headers and `rel="self"` links, fixing feeds that previously weren't auto-detected (e.g. TechCrunch tag pages).

**Category Support** — organize your blogs into named groups and filter by category:

```bash
blogwatcher add "Tech Blog" https://example.com -c engineering
blogwatcher edit "Tech Blog" -c research
blogwatcher blogs -c engineering
blogwatcher articles -c engineering
blogwatcher categories
```
## Features

| Feature | Description |
|---|---|
| 📡 Dual Source Support | Tries RSS feeds first, falls back to HTML scraping |
| 🔍 Auto Feed Discovery | Detects RSS/Atom URLs from blog homepages |
| 📥 OPML Import 🆕 | Bulk-import subscriptions from Feedly / Inoreader OPML exports |
| 🔗 Better Feed Detection 🆕 | Finds feeds via Content-Type headers and `rel="self"` links |
| 🗂️ Category Support | Organize blogs into named groups, filter articles by category |
| ✅ Read/Unread Tracking | Keep track of what you've read |
| 🚫 Duplicate Prevention | Never tracks the same article twice |
| ⚡ Concurrent Scanning | Configurable parallel workers |
## Installation

```bash
# Install the CLI
go install github.com/hanw39/blogwatcher/cmd/blogwatcher@latest

# Or build locally
git clone https://github.com/hanw39/blogwatcher
cd blogwatcher
go build ./cmd/blogwatcher
```

Windows and Linux binaries are available on the Releases page.
## Usage

### Importing from OPML

```bash
# Import all subscriptions from a feed reader export (Feedly, Inoreader, etc.)
blogwatcher import subscriptions.opml
```

Handles OPML 1.0/2.0 and nested categories. Duplicate blogs are reported but not re-added.
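For reference, a minimal OPML file with one nested category might look like this (the blog name and URLs below are placeholders; a nested `outline` group is imported as a category):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head><title>Subscriptions</title></head>
  <body>
    <!-- the outer outline becomes a BlogWatcher category -->
    <outline text="engineering">
      <outline text="Tech Blog" type="rss"
               xmlUrl="https://techblog.com/rss.xml"
               htmlUrl="https://techblog.com"/>
    </outline>
  </body>
</opml>
```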
### Adding Blogs

```bash
# Add a blog (auto-discovers RSS feed)
blogwatcher add "My Favorite Blog" https://example.com/blog

# Add with explicit feed URL
blogwatcher add "Tech Blog" https://techblog.com --feed-url https://techblog.com/rss.xml

# Add with HTML scraping selector (for blogs without feeds)
blogwatcher add "No-RSS Blog" https://norss.com --scrape-selector "article h2 a"

# Add and assign to a category (created automatically if it doesn't exist)
blogwatcher add "Tech Blog" https://techblog.com -c engineering
```
### Managing Blogs

```bash
# List all tracked blogs
blogwatcher blogs

# Filter blogs by category
blogwatcher blogs -c engineering

# Remove a blog (and all its articles)
blogwatcher remove "My Favorite Blog"

# Remove without confirmation
blogwatcher remove "My Favorite Blog" -y
```
### Editing Blogs

```bash
# Assign a blog to a category
blogwatcher edit "Tech Blog" -c engineering

# Remove a blog from its category
blogwatcher edit "Tech Blog" -c ""
```
### Managing Categories

```bash
# List all categories with blog counts
blogwatcher categories
```

```
Categories (3):
  changelog     3 blogs
  engineering   7 blogs
  research      3 blogs
```
### Scanning for New Articles

```bash
# Scan all blogs (8 concurrent workers by default)
blogwatcher scan

# Scan a specific blog
blogwatcher scan "Tech Blog"

# Custom worker count
blogwatcher scan -w 4

# Silent mode (outputs "scan done" when complete — useful for cron)
blogwatcher scan -s
```
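The `-w` flag bounds how many blogs are fetched in parallel. A minimal sketch of that worker-pool pattern in Go (the `scanOne` stub and function names here are illustrative, not BlogWatcher's actual internals):

```go
package main

import (
	"fmt"
	"sync"
)

// scanOne stands in for fetching and parsing a single blog.
func scanOne(blog string) string { return "scanned " + blog }

// scanAll fans blogs out to a fixed number of workers and
// collects the results as they complete.
func scanAll(blogs []string, workers int) []string {
	jobs := make(chan string)
	results := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				results <- scanOne(b)
			}
		}()
	}

	go func() {
		for _, b := range blogs {
			jobs <- b
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	done := scanAll([]string{"Blog A", "Blog B", "Blog C"}, 2)
	fmt.Println(len(done), "blogs scanned")
}
```

With unbuffered channels, at most `workers` fetches are in flight at once, which keeps a large subscription list from hammering the network all at once.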
### Viewing Articles

```bash
# List unread articles
blogwatcher articles

# List all articles (including read)
blogwatcher articles -a

# Filter by blog
blogwatcher articles -b "Tech Blog"

# Filter by category
blogwatcher articles -c engineering

# Combine filters
blogwatcher articles -a -c engineering
```
### Managing Read Status

```bash
# Mark an article as read (use the article ID shown in the articles list)
blogwatcher read 42

# Mark as unread
blogwatcher unread 42

# Mark all unread as read
blogwatcher read-all

# Mark all unread from a specific blog as read
blogwatcher read-all -b "Tech Blog" -y
```
## How It Works

### Scanning Process

1. For each tracked blog, BlogWatcher attempts to parse its RSS/Atom feed
2. If no feed URL is configured, it tries to auto-discover one from the blog homepage
3. If RSS parsing fails and a `scrape_selector` is configured, it falls back to HTML scraping
4. New articles are saved to the database as unread
5. Already-tracked articles are skipped
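The fallback and duplicate-skipping steps above can be sketched roughly like this (the `Blog` fields and function names are assumptions for illustration, not the real implementation; auto-discovery is omitted for brevity):

```go
package main

import (
	"errors"
	"fmt"
)

// Blog mirrors the fields the scan needs (illustrative only).
type Blog struct {
	Name           string
	FeedURL        string
	ScrapeSelector string
}

// Stubs standing in for the real feed parser and HTML scraper.
func fetchFeed(url string) ([]string, error) {
	if url == "" {
		return nil, errors.New("no feed configured")
	}
	return []string{"https://example.com/from-feed"}, nil
}

func scrapeHTML(selector string) ([]string, error) {
	return []string{"https://example.com/from-scrape"}, nil
}

// scanBlog tries the RSS/Atom feed first, then falls back to HTML
// scraping when a selector is configured. Already-seen URLs are skipped.
func scanBlog(b Blog, seen map[string]bool) ([]string, error) {
	urls, err := fetchFeed(b.FeedURL)
	if err != nil {
		if b.ScrapeSelector == "" {
			return nil, fmt.Errorf("%s: %w", b.Name, err)
		}
		urls, err = scrapeHTML(b.ScrapeSelector)
		if err != nil {
			return nil, err
		}
	}
	var fresh []string
	for _, u := range urls {
		if !seen[u] { // duplicate prevention
			seen[u] = true
			fresh = append(fresh, u)
		}
	}
	return fresh, nil
}

func main() {
	seen := map[string]bool{}
	a, _ := scanBlog(Blog{Name: "no-rss", ScrapeSelector: ".post a"}, seen)
	fmt.Println(a) // articles found via the scraping fallback
}
```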
### HTML Scraping

When RSS isn't available, provide a CSS selector that matches article links:

```bash
--scrape-selector "article h2 a"    # Links inside article h2 tags
--scrape-selector ".post-title a"   # Links with the post-title class
--scrape-selector "#blog-posts a"   # Links inside the blog-posts element
```
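As a concrete example, the first selector above would match the post links in markup like this (a hypothetical page structure, not any specific blog):

```html
<!-- "article h2 a" selects the <a> inside each post heading -->
<article>
  <h2><a href="/posts/hello-world">Hello World</a></h2>
</article>
<article>
  <h2><a href="/posts/second-post">Second Post</a></h2>
</article>
```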
### Database

BlogWatcher stores its data in a SQLite database at `~/.blogwatcher/blogwatcher.db`:

| Table | Description |
|---|---|
| `categories` | Blog categories |
| `blogs` | Tracked blogs (name, URL, feed URL, scrape selector, category) |
| `articles` | Discovered articles (title, URL, dates, read status) |
## Development

Requirements: Go 1.24+

```bash
# Run tests
go test ./...

# Build
go build ./cmd/blogwatcher
```

### Publishing a Release

```bash
git tag vX.Y.Z
git push origin vX.Y.Z
```
## License

MIT