blogwatcher v1.0.1 | Published: Apr 23, 2026 | License: MIT

BlogWatcher

Never miss a post. Track any blog — RSS or not.


A Go CLI tool to track blog articles, detect new posts, and manage read/unread status.
Supports RSS/Atom feeds and HTML scraping as fallback.

Forked from Hyaxia/blogwatcher.

English | 中文


Quick Start

# Install
go install github.com/hanw39/blogwatcher/cmd/blogwatcher@latest

# Track a blog
blogwatcher add "Paul Graham" https://paulgraham.com/articles.html

# Scan for new articles
blogwatcher scan

# Read unread articles
blogwatcher articles

What's New

Added on top of Hyaxia/blogwatcher

OPML Import — bulk-import blog subscriptions from feed reader exports (Feedly, Inoreader, etc.):

blogwatcher import subscriptions.opml

Improved RSS Discovery — better feed detection via Content-Type headers and rel="self" links, fixing feeds that previously weren't auto-detected (e.g. TechCrunch tag pages).
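The Content-Type side of this check can be sketched in a few lines; `looksLikeFeed` is an illustrative helper, not the tool's actual code, and the exact media types it accepts are an assumption:

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// looksLikeFeed reports whether an HTTP Content-Type header suggests the
// response body is an RSS or Atom feed, even when the URL itself
// (e.g. a TechCrunch tag page) gives no hint.
func looksLikeFeed(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	switch mediaType {
	case "application/rss+xml", "application/atom+xml", "application/xml", "text/xml":
		return true
	}
	// Some servers use nonstandard subtypes such as "application/x.atom+xml".
	return strings.HasSuffix(mediaType, "+xml")
}

func main() {
	fmt.Println(looksLikeFeed("application/rss+xml; charset=utf-8")) // true
	fmt.Println(looksLikeFeed("text/html; charset=utf-8"))           // false
}
```

Checking the header this way catches feeds served from URLs that don't end in `.xml` or `/rss`, which is what URL-only discovery misses.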

Category support — organize your blogs into named groups and filter by category:

blogwatcher add "Tech Blog" https://example.com -c engineering
blogwatcher edit "Tech Blog" -c research
blogwatcher blogs -c engineering
blogwatcher articles -c engineering
blogwatcher categories

Features

📡 Dual Source Support: tries RSS feeds first, falls back to HTML scraping
🔍 Auto Feed Discovery: detects RSS/Atom feed URLs from blog homepages
📥 OPML Import 🆕: bulk-imports subscriptions from Feedly / Inoreader OPML exports
🔗 Better Feed Detection 🆕: finds feeds via Content-Type headers and rel="self" links
🗂️ Category Support: organize blogs into named groups, filter articles by category
Read/Unread Tracking: keep track of what you've read
🚫 Duplicate Prevention: never tracks the same article twice
Concurrent Scanning: configurable parallel workers

Installation

# Install the CLI
go install github.com/hanw39/blogwatcher/cmd/blogwatcher@latest

# Or build locally
git clone https://github.com/hanw39/blogwatcher
cd blogwatcher
go build ./cmd/blogwatcher

Windows and Linux binaries are available on the Releases page.


Usage

Importing from OPML
# Import all subscriptions from a feed reader export (Feedly, Inoreader, etc.)
blogwatcher import subscriptions.opml

Handles OPML 1.0/2.0 and nested categories. Duplicate blogs are reported but not re-added.
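Under the hood, OPML is plain XML, so Go's encoding/xml can walk it. A minimal, self-contained sketch of parsing nested categories into feed subscriptions; the types and helper names are illustrative, not BlogWatcher's internals:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// opmlOutline mirrors the <outline> element of an OPML file. Nested
// outlines represent categories in exports from Feedly, Inoreader, etc.
type opmlOutline struct {
	Title    string        `xml:"title,attr"`
	XMLURL   string        `xml:"xmlUrl,attr"`
	HTMLURL  string        `xml:"htmlUrl,attr"`
	Children []opmlOutline `xml:"outline"`
}

type opml struct {
	Outlines []opmlOutline `xml:"body>outline"`
}

// collectFeeds walks the outline tree, treating outlines without an
// xmlUrl as category containers and the rest as feed subscriptions.
func collectFeeds(outlines []opmlOutline, category string, out *[]string) {
	for _, o := range outlines {
		if o.XMLURL == "" {
			collectFeeds(o.Children, o.Title, out)
			continue
		}
		*out = append(*out, fmt.Sprintf("%s [%s] %s", o.Title, category, o.XMLURL))
	}
}

func main() {
	data := []byte(`<opml version="2.0"><body>
	  <outline title="engineering">
	    <outline title="Example Blog" xmlUrl="https://example.com/rss.xml"/>
	  </outline>
	</body></opml>`)

	var doc opml
	if err := xml.Unmarshal(data, &doc); err != nil {
		panic(err)
	}
	var feeds []string
	collectFeeds(doc.Outlines, "", &feeds)
	for _, f := range feeds {
		fmt.Println(f) // Example Blog [engineering] https://example.com/rss.xml
	}
}
```

Duplicate detection then reduces to comparing each discovered xmlUrl against feed URLs already in the database.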

Adding Blogs
# Add a blog (auto-discovers RSS feed)
blogwatcher add "My Favorite Blog" https://example.com/blog

# Add with explicit feed URL
blogwatcher add "Tech Blog" https://techblog.com --feed-url https://techblog.com/rss.xml

# Add with HTML scraping selector (for blogs without feeds)
blogwatcher add "No-RSS Blog" https://norss.com --scrape-selector "article h2 a"

# Add and assign to a category (created automatically if it doesn't exist)
blogwatcher add "Tech Blog" https://techblog.com -c engineering

Managing Blogs
# List all tracked blogs
blogwatcher blogs

# Filter blogs by category
blogwatcher blogs -c engineering

# Remove a blog (and all its articles)
blogwatcher remove "My Favorite Blog"

# Remove without confirmation
blogwatcher remove "My Favorite Blog" -y

Editing Blogs
# Assign a blog to a category
blogwatcher edit "Tech Blog" -c engineering

# Remove a blog from its category
blogwatcher edit "Tech Blog" -c ""

Managing Categories
# List all categories with blog counts
blogwatcher categories

Example output:

Categories (3):

  changelog    3 blogs
  engineering  7 blogs
  research     3 blogs

Scanning for New Articles
# Scan all blogs (8 concurrent workers by default)
blogwatcher scan

# Scan a specific blog
blogwatcher scan "Tech Blog"

# Custom workers
blogwatcher scan -w 4

# Silent mode (outputs "scan done" when complete — useful for cron)
blogwatcher scan -s

Viewing Articles
# List unread articles
blogwatcher articles

# List all articles (including read)
blogwatcher articles -a

# Filter by blog
blogwatcher articles -b "Tech Blog"

# Filter by category
blogwatcher articles -c engineering

# Combine filters
blogwatcher articles -a -c engineering

Managing Read Status
# Mark an article as read (use article ID shown in articles list)
blogwatcher read 42

# Mark as unread
blogwatcher unread 42

# Mark all unread as read
blogwatcher read-all

# Mark all unread from a specific blog as read
blogwatcher read-all -b "Tech Blog" -y

How It Works

Scanning Process
  1. For each tracked blog, BlogWatcher attempts to parse its RSS/Atom feed
  2. If no feed URL is configured, it tries to auto-discover one from the blog homepage
  3. If RSS parsing fails and a scrape_selector is configured, it falls back to HTML scraping
  4. New articles are saved to the database as unread
  5. Already-tracked articles are skipped
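The concurrent side of the steps above can be sketched as a fixed-size worker pool fed over channels. Here `scanBlog` is a stand-in for the real fetch-and-parse work, and all names are illustrative rather than BlogWatcher's internals:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scanBlog stands in for the real per-blog work: fetch the feed (or
// scrape HTML) and return article URLs. Here it just fabricates one.
func scanBlog(name string) []string {
	return []string{name + "/post-1"}
}

// scanAll fans blog names out to `workers` goroutines and collects
// article URLs, skipping any URL already in `seen` (duplicate prevention).
func scanAll(blogs []string, workers int, seen map[string]bool) []string {
	jobs := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				for _, url := range scanBlog(b) {
					results <- url
				}
			}
		}()
	}
	go func() {
		for _, b := range blogs {
			jobs <- b
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var fresh []string
	for url := range results {
		if !seen[url] { // step 5: already-tracked articles are skipped
			seen[url] = true
			fresh = append(fresh, url)
		}
	}
	sort.Strings(fresh) // deterministic order for display
	return fresh
}

func main() {
	seen := map[string]bool{"blog-a/post-1": true}
	fresh := scanAll([]string{"blog-a", "blog-b"}, 2, seen)
	fmt.Println(fresh) // only blog-b's article is new
}
```

The `-w` flag in `blogwatcher scan -w 4` corresponds to the `workers` parameter in this sketch.
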

HTML Scraping

When RSS isn't available, provide a CSS selector that matches article links:

--scrape-selector "article h2 a"    # Links inside article h2 tags
--scrape-selector ".post-title a"   # Links with post-title class
--scrape-selector "#blog-posts a"   # Links inside blog-posts ID
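BlogWatcher presumably applies the selector with a proper CSS selector engine. As a rough stdlib-only illustration of the underlying idea, matching `<a>` elements and turning them into article URLs, here is a regexp stand-in (not suitable for production HTML parsing, and not the tool's actual code):

```go
package main

import (
	"fmt"
	"regexp"
)

// extractLinks pulls href values out of an HTML fragment. The real tool
// applies a user-supplied CSS selector (e.g. "article h2 a"); this
// regexp only illustrates collecting matched links as article URLs.
func extractLinks(html string) []string {
	re := regexp.MustCompile(`<a[^>]+href="([^"]+)"`)
	var urls []string
	for _, m := range re.FindAllStringSubmatch(html, -1) {
		urls = append(urls, m[1]) // m[1] is the captured href value
	}
	return urls
}

func main() {
	page := `<article><h2><a href="/posts/hello">Hello</a></h2></article>
	         <article><h2><a href="/posts/world">World</a></h2></article>`
	fmt.Println(extractLinks(page)) // [/posts/hello /posts/world]
}
```

In practice a selector engine is what makes narrow selectors like `.post-title a` possible, so only article links are captured rather than every link on the page.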

Database

SQLite database at ~/.blogwatcher/blogwatcher.db:

Table        Description
categories   Blog categories
blogs        Tracked blogs (name, URL, feed URL, scrape selector, category)
articles     Discovered articles (title, URL, dates, read status)

Development

Requirements: Go 1.24+

# Run tests
go test ./...

# Build
go build ./cmd/blogwatcher

Publishing a Release
git tag vX.Y.Z
git push origin vX.Y.Z

License

MIT
