blogwatcher v1.0.1 | Published: Apr 23, 2026 | License: MIT

BlogWatcher

Never miss a post. Track any blog — RSS or not.


A Go CLI tool to track blog articles, detect new posts, and manage read/unread status.
Supports RSS/Atom feeds and HTML scraping as fallback.

Forked from Hyaxia/blogwatcher.

English | 中文


Quick Start

# Install
go install github.com/hanw39/blogwatcher/cmd/blogwatcher@latest

# Track a blog
blogwatcher add "Paul Graham" https://paulgraham.com/articles.html

# Scan for new articles
blogwatcher scan

# Read unread articles
blogwatcher articles

What's New

Added on top of Hyaxia/blogwatcher

OPML Import — bulk-import blog subscriptions from feed reader exports (Feedly, Inoreader, etc.):

blogwatcher import subscriptions.opml

Improved RSS Discovery — better feed detection via Content-Type headers and rel="self" links, fixing feeds that previously weren't auto-detected (e.g. TechCrunch tag pages).
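The Content-Type side of this check can be sketched in a few lines; `looksLikeFeed` is an illustrative helper, not the tool's actual code, and the exact media types it accepts are an assumption:

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// looksLikeFeed reports whether an HTTP Content-Type header suggests the
// response body is an RSS or Atom feed, even when the URL itself
// (e.g. a TechCrunch tag page) gives no hint.
func looksLikeFeed(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	switch mediaType {
	case "application/rss+xml", "application/atom+xml", "application/xml", "text/xml":
		return true
	}
	// Some servers use nonstandard subtypes such as "application/x.atom+xml".
	return strings.HasSuffix(mediaType, "+xml")
}

func main() {
	fmt.Println(looksLikeFeed("application/rss+xml; charset=utf-8")) // true
	fmt.Println(looksLikeFeed("text/html; charset=utf-8"))           // false
}
```

Checking the header this way catches feeds served from URLs that don't end in `.xml` or `/rss`, which is what URL-only discovery misses.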

Category support — organize your blogs into named groups and filter by category:

blogwatcher add "Tech Blog" https://example.com -c engineering
blogwatcher edit "Tech Blog" -c research
blogwatcher blogs -c engineering
blogwatcher articles -c engineering
blogwatcher categories

Features

📡 Dual Source Support: tries RSS feeds first, falls back to HTML scraping
🔍 Auto Feed Discovery: detects RSS/Atom feed URLs from blog homepages
📥 OPML Import 🆕: bulk-imports subscriptions from Feedly / Inoreader OPML exports
🔗 Better Feed Detection 🆕: finds feeds via Content-Type headers and rel="self" links
🗂️ Category Support: organize blogs into named groups, filter articles by category
Read/Unread Tracking: keep track of what you've read
🚫 Duplicate Prevention: never tracks the same article twice
Concurrent Scanning: configurable parallel workers

Installation

# Install the CLI
go install github.com/hanw39/blogwatcher/cmd/blogwatcher@latest

# Or build locally
git clone https://github.com/hanw39/blogwatcher
cd blogwatcher
go build ./cmd/blogwatcher

Windows and Linux binaries are available on the Releases page.


Usage

Importing from OPML
# Import all subscriptions from a feed reader export (Feedly, Inoreader, etc.)
blogwatcher import subscriptions.opml

Handles OPML 1.0/2.0 and nested categories. Duplicate blogs are reported but not re-added.
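Under the hood, OPML is plain XML, so Go's encoding/xml can walk it. A minimal, self-contained sketch of parsing nested categories into feed subscriptions; the types and helper names are illustrative, not BlogWatcher's internals:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// opmlOutline mirrors the <outline> element of an OPML file. Nested
// outlines represent categories in exports from Feedly, Inoreader, etc.
type opmlOutline struct {
	Title    string        `xml:"title,attr"`
	XMLURL   string        `xml:"xmlUrl,attr"`
	HTMLURL  string        `xml:"htmlUrl,attr"`
	Children []opmlOutline `xml:"outline"`
}

type opml struct {
	Outlines []opmlOutline `xml:"body>outline"`
}

// collectFeeds walks the outline tree, treating outlines without an
// xmlUrl as category containers and the rest as feed subscriptions.
func collectFeeds(outlines []opmlOutline, category string, out *[]string) {
	for _, o := range outlines {
		if o.XMLURL == "" {
			collectFeeds(o.Children, o.Title, out)
			continue
		}
		*out = append(*out, fmt.Sprintf("%s [%s] %s", o.Title, category, o.XMLURL))
	}
}

func main() {
	data := []byte(`<opml version="2.0"><body>
	  <outline title="engineering">
	    <outline title="Example Blog" xmlUrl="https://example.com/rss.xml"/>
	  </outline>
	</body></opml>`)

	var doc opml
	if err := xml.Unmarshal(data, &doc); err != nil {
		panic(err)
	}
	var feeds []string
	collectFeeds(doc.Outlines, "", &feeds)
	for _, f := range feeds {
		fmt.Println(f) // Example Blog [engineering] https://example.com/rss.xml
	}
}
```

Duplicate detection then reduces to comparing each discovered xmlUrl against feed URLs already in the database.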

Adding Blogs
# Add a blog (auto-discovers RSS feed)
blogwatcher add "My Favorite Blog" https://example.com/blog

# Add with explicit feed URL
blogwatcher add "Tech Blog" https://techblog.com --feed-url https://techblog.com/rss.xml

# Add with HTML scraping selector (for blogs without feeds)
blogwatcher add "No-RSS Blog" https://norss.com --scrape-selector "article h2 a"

# Add and assign to a category (created automatically if it doesn't exist)
blogwatcher add "Tech Blog" https://techblog.com -c engineering

Managing Blogs
# List all tracked blogs
blogwatcher blogs

# Filter blogs by category
blogwatcher blogs -c engineering

# Remove a blog (and all its articles)
blogwatcher remove "My Favorite Blog"

# Remove without confirmation
blogwatcher remove "My Favorite Blog" -y

Editing Blogs
# Assign a blog to a category
blogwatcher edit "Tech Blog" -c engineering

# Remove a blog from its category
blogwatcher edit "Tech Blog" -c ""

Managing Categories
# List all categories with blog counts
blogwatcher categories

Example output:

Categories (3):

  changelog    3 blogs
  engineering  7 blogs
  research     3 blogs

Scanning for New Articles
# Scan all blogs (8 concurrent workers by default)
blogwatcher scan

# Scan a specific blog
blogwatcher scan "Tech Blog"

# Custom workers
blogwatcher scan -w 4

# Silent mode (outputs "scan done" when complete — useful for cron)
blogwatcher scan -s

Viewing Articles
# List unread articles
blogwatcher articles

# List all articles (including read)
blogwatcher articles -a

# Filter by blog
blogwatcher articles -b "Tech Blog"

# Filter by category
blogwatcher articles -c engineering

# Combine filters
blogwatcher articles -a -c engineering

Managing Read Status
# Mark an article as read (use article ID shown in articles list)
blogwatcher read 42

# Mark as unread
blogwatcher unread 42

# Mark all unread as read
blogwatcher read-all

# Mark all unread from a specific blog as read
blogwatcher read-all -b "Tech Blog" -y

How It Works

Scanning Process
  1. For each tracked blog, BlogWatcher attempts to parse its RSS/Atom feed
  2. If no feed URL is configured, it tries to auto-discover one from the blog homepage
  3. If RSS parsing fails and a scrape_selector is configured, it falls back to HTML scraping
  4. New articles are saved to the database as unread
  5. Already-tracked articles are skipped
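The concurrent side of the steps above can be sketched as a fixed-size worker pool fed over channels. Here `scanBlog` is a stand-in for the real fetch-and-parse work, and all names are illustrative rather than BlogWatcher's internals:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scanBlog stands in for the real per-blog work: fetch the feed (or
// scrape HTML) and return article URLs. Here it just fabricates one.
func scanBlog(name string) []string {
	return []string{name + "/post-1"}
}

// scanAll fans blog names out to `workers` goroutines and collects
// article URLs, skipping any URL already in `seen` (duplicate prevention).
func scanAll(blogs []string, workers int, seen map[string]bool) []string {
	jobs := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				for _, url := range scanBlog(b) {
					results <- url
				}
			}
		}()
	}
	go func() {
		for _, b := range blogs {
			jobs <- b
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var fresh []string
	for url := range results {
		if !seen[url] { // step 5: already-tracked articles are skipped
			seen[url] = true
			fresh = append(fresh, url)
		}
	}
	sort.Strings(fresh) // deterministic order for display
	return fresh
}

func main() {
	seen := map[string]bool{"blog-a/post-1": true}
	fresh := scanAll([]string{"blog-a", "blog-b"}, 2, seen)
	fmt.Println(fresh) // only blog-b's article is new
}
```

The `-w` flag in `blogwatcher scan -w 4` corresponds to the `workers` parameter in this sketch.
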

HTML Scraping

When RSS isn't available, provide a CSS selector that matches article links:

--scrape-selector "article h2 a"    # Links inside article h2 tags
--scrape-selector ".post-title a"   # Links with post-title class
--scrape-selector "#blog-posts a"   # Links inside blog-posts ID
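BlogWatcher presumably applies the selector with a proper CSS selector engine. As a rough stdlib-only illustration of the underlying idea, matching `<a>` elements and turning them into article URLs, here is a regexp stand-in (not suitable for production HTML parsing, and not the tool's actual code):

```go
package main

import (
	"fmt"
	"regexp"
)

// extractLinks pulls href values out of an HTML fragment. The real tool
// applies a user-supplied CSS selector (e.g. "article h2 a"); this
// regexp only illustrates collecting matched links as article URLs.
func extractLinks(html string) []string {
	re := regexp.MustCompile(`<a[^>]+href="([^"]+)"`)
	var urls []string
	for _, m := range re.FindAllStringSubmatch(html, -1) {
		urls = append(urls, m[1]) // m[1] is the captured href value
	}
	return urls
}

func main() {
	page := `<article><h2><a href="/posts/hello">Hello</a></h2></article>
	         <article><h2><a href="/posts/world">World</a></h2></article>`
	fmt.Println(extractLinks(page)) // [/posts/hello /posts/world]
}
```

In practice a selector engine is what makes narrow selectors like `.post-title a` possible, so only article links are captured rather than every link on the page.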

Database

SQLite database at ~/.blogwatcher/blogwatcher.db:

Table        Description
categories   Blog categories
blogs        Tracked blogs (name, URL, feed URL, scrape selector, category)
articles     Discovered articles (title, URL, dates, read status)

Development

Requirements: Go 1.24+

# Run tests
go test ./...

# Build
go build ./cmd/blogwatcher

Publishing a Release
git tag vX.Y.Z
git push origin vX.Y.Z

License

MIT
