nntpDirectSearch

Version: v0.2.2
Warning

This package is not in the latest version of its module.

Published: Apr 20, 2026 License: MIT Imports: 16 Imported by: 1

README

nntpDirectSearch

A high-performance Go library for scanning and searching NNTP (Network News Transfer Protocol) groups. It provides utilities to locate message boundaries by date and to build NZB files by parsing and matching message subjects, using concurrent workers and connection pooling.

Features

  • Concurrent Message Scanning: Uses multiple concurrent workers to efficiently scan large NNTP groups
  • Connection Pooling: Manages a shared pool of NNTP connections for optimal resource utilization
  • Date-Based Boundary Detection: Find the first and last messages within a specific date range using binary search over overview windows
  • NZB File Generation: Build NZB files by parsing and matching message subjects
  • Progress Tracking: Built-in metrics for monitoring lines read and bytes processed
  • Configurable Timeouts & Retries: Tune overview request timeouts and retry strategies
  • Context Support: Full support for cancellable operations via Go contexts
  • Debug Logging: Non-blocking debug logging via a buffered channel
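Non-blocking debug logging over a buffered channel can be pictured as a send guarded by select with a default case. The sketch below is an assumption about the mechanism, not the library's code. debugLogger, newDebugLogger and logf are hypothetical names. If nobody drains the channel, new messages are dropped instead of stalling the caller.

```go
package main

import "fmt"

// debugLogger is a hypothetical stand-in for DirectSearch.Log: a buffered
// channel that internal code writes to without ever blocking.
type debugLogger struct {
	ch chan string
}

func newDebugLogger(buffer int) *debugLogger {
	return &debugLogger{ch: make(chan string, buffer)}
}

// logf formats a message and tries to enqueue it. If the buffer is full
// (nobody is reading), the message is dropped and logf returns false.
func (l *debugLogger) logf(format string, args ...any) bool {
	select {
	case l.ch <- fmt.Sprintf(format, args...):
		return true
	default:
		return false
	}
}

func main() {
	l := newDebugLogger(2)
	for i := 1; i <= 3; i++ {
		fmt.Printf("message %d queued: %v\n", i, l.logf("worker %d finished", i))
	}
}
```

Under this model, an unread buffer fills up and later messages are lost. The Basic Setup example therefore drains ds.Log in its own goroutine.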

Installation

go get github.com/Tensai75/nntpDirectSearch

Dependencies

This library requires:

  • github.com/Tensai75/nntpPool - NNTP connection pooling
  • github.com/Tensai75/nntp - NNTP protocol implementation
  • github.com/Tensai75/nzbparser - NZB file parsing and generation
  • github.com/Tensai75/subjectparser - Subject line parsing for NZB metadata

Usage

Basic Setup
package main

import (
	"context"
	"log"
	"time"

	"github.com/Tensai75/nntpDirectSearch"
	"github.com/Tensai75/nntpPool"
)

func main() {
	// Create a connection pool
	pool := nntpPool.NewPool("news.server.com:119", 10, time.Minute)

	// Create a DirectSearch instance
	ds, err := nntpDirectSearch.New(pool, context.Background())
	if err != nil {
		log.Fatal(err)
	}

	// Optionally configure scanning behavior
	config := nntpDirectSearch.DirectSearchConfig{
		Connections:                20,    // Number of concurrent connections
		Step:                       20000, // MessageScanner range step size
		OverviewTimeout:            5,     // Timeout in seconds
		OverviewRetries:            3,     // Number of retries
		BoundariesScannerStep:      1000,  // Overview window size used by BoundariesScanner
		BoundariesScannerTolerance: 15,    // Date tolerance in seconds for boundary convergence
	}
	if err := ds.SetConfig(config); err != nil {
		log.Fatal(err)
	}

	// Select a newsgroup
	if err := ds.SwitchToGroup("alt.binaries.example"); err != nil {
		log.Fatal(err)
	}

	// Read debug logs (optional)
	go func() {
		for msg := range ds.Log {
			log.Println("[DEBUG]", msg)
		}
	}()

	// Use the scanners...
}
Finding Message Boundaries by Date

Use BoundariesScanner to locate the first and last messages within a specific date range.

BoundariesScanner does not rely on the timestamps of single messages. Instead, it fetches overview windows and uses the median timestamp of each window as the representative date while it converges on the first and last boundaries.

Using a median over a window is more robust than using a single article date, because occasional outlier timestamps do not significantly skew the boundary decision.

startDate := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
endDate := time.Date(2024, 1, 31, 23, 59, 59, 0, time.UTC)

// Progress callback (optional)
iteration := 0
progressFunc := func() {
	// iteration++ is a statement in Go, so increment before logging
	iteration++
	log.Printf("Progress: %d/%d iterations\n",
		iteration,
		ds.MaxBoundariesScannerIterations)
}

result, err := ds.BoundariesScanner(startDate, endDate, progressFunc)
if err != nil {
	log.Fatal(err)
}

log.Printf("First message: ID=%d, Date=%v\n",
	result.FirstMessage.MessageID,
	result.FirstMessage.Date)
log.Printf("Last message: ID=%d, Date=%v\n",
	result.LastMessage.MessageID,
	result.LastMessage.Date)
Scanning Messages for Content

Use MessageScanner to search for messages containing a specific header and build NZB files:

// Scan messages 100000 through 200000 for the header "Subject:"
var firstMessage, lastMessage uint = 100000, 200000
linesToRead := lastMessage - firstMessage + 1
progressFunc := func() {
	linesRead := ds.GetLinesRead()
	bytesRead := ds.GetBytesRead()
	log.Printf("Scanned: %d/%d lines, %d bytes\n", linesRead, linesToRead, bytesRead)
}

nzbResults, err := ds.MessageScanner("Subject:", firstMessage, lastMessage, progressFunc)
if err != nil {
	log.Fatal(err)
}

for _, nzb := range nzbResults {
	log.Printf("Found NZB: %s\n", nzb.Head)
}
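MessageScanner runs concurrent workers, and the Step setting is documented as the range step size. The sketch below shows one plausible way to split [firstMessage, lastMessage] into Step-sized chunks, feed them to a fixed pool of workers, and keep a shared atomic line counter of the kind GetLinesRead could expose. The scheduling is an assumption. chunk, splitRange and scanConcurrently are hypothetical names, and no NNTP requests are made here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// chunk is one article-number range handed to a worker.
type chunk struct{ first, last uint }

// splitRange cuts [first, last] into consecutive chunks of at most step articles.
func splitRange(first, last, step uint) []chunk {
	var chunks []chunk
	for start := first; start <= last; start += step {
		end := start + step - 1
		if end > last || end < start { // clamp to last; also guards uint overflow
			end = last
		}
		chunks = append(chunks, chunk{start, end})
		if end == last {
			break
		}
	}
	return chunks
}

// scanConcurrently feeds chunks to a fixed number of workers. Each worker
// "reads" its chunk (here it only counts the articles) and adds to a shared
// atomic counter that a progress callback could poll.
func scanConcurrently(first, last, step uint, workers int) uint64 {
	var linesRead atomic.Uint64
	jobs := make(chan chunk)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				// A real worker would request overviews for c.first..c.last here.
				linesRead.Add(uint64(c.last - c.first + 1))
			}
		}()
	}
	for _, c := range splitRange(first, last, step) {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
	return linesRead.Load()
}

func main() {
	fmt.Println(splitRange(100000, 200000, 20000))
	fmt.Println("lines read:", scanConcurrently(100000, 200000, 20000, 4))
}
```

The inclusive range 100000 to 200000 holds 100001 articles, so the last chunk contains a single article. This is also why linesToRead in the example above adds 1.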

Configuration

The DirectSearchConfig struct controls scanning behavior:

type DirectSearchConfig struct {
	Connections                uint // Concurrent connections (must be > 0)
	Step                       uint // MessageScanner range step size (must be > 0)
	OverviewTimeout            uint // Overview request timeout in seconds (must be > 0)
	OverviewRetries            uint // Number of retries (must be > 0)
	BoundariesScannerStep      uint // Overview window size for BoundariesScanner (must be > 0)
	BoundariesScannerTolerance uint // Date tolerance in seconds for BoundariesScanner (must be > 0)
}

All fields are required and must be greater than zero. SetConfig returns a validation error if any field is zero.

Error Handling

Common error scenarios:

  • ErrNoGroupSelected - A scan was attempted without selecting a group first
  • ErrGroupHasNoArticles - The selected group contains no usable articles
  • ErrInvalidMessageRange - Invalid message range (e.g. start/end is 0, or start > end)
  • ErrInvalidDateRange - Start date is after end date
  • ErrNoMessageFoundAfterStartDate - No messages exist on or after the start date
  • ErrNoMessageFoundBeforeEndDate - No messages exist on or before the end date
  • ErrFirstMessageInGroupIsAfterEndDate - The median date at the start of the group is after the search end date
  • ErrLastMessageInGroupIsBeforeStartDate - The median date at the end of the group is before the search start date

Documentation

Overview

Package nntpDirectSearch provides high-level NNTP group scanning utilities to locate message boundaries by date and to build NZB results from message overviews. It coordinates concurrent workers over a shared connection pool and exposes progress metrics for long-running scans.

Index

Constants

This section is empty.

Variables

var (
	// ErrUnknownError indicates an unexpected internal failure.
	ErrUnknownError = fmt.Errorf("unknown error")
	// ErrUnexpectedError indicates an unexpected error that should be reported to the developer
	ErrUnexpectedError = fmt.Errorf("unexpected error - please report this error on https://github.com/Tensai75/nzb-monkey-go/issues")
	// ErrInvalidDateRange indicates the start date is after the end date.
	ErrInvalidDateRange = fmt.Errorf("invalid search date range")
	// ErrFirstMessageInGroupIsAfterEndDate indicates that the date of the first message in the group is after the end date.
	ErrFirstMessageInGroupIsAfterEndDate = fmt.Errorf("the date of the first message in the group is after the specified end date")
	// ErrLastMessageInGroupIsBeforeStartDate indicates that the date of the last message in the group is before the start date.
	ErrLastMessageInGroupIsBeforeStartDate = fmt.Errorf("the date of the last message in the group is before the specified start date")
	// ErrNoMessageFoundAfterStartDate indicates no message exists after the start date.
	ErrNoMessageFoundAfterStartDate = fmt.Errorf("no messages found after the specified start date")
	// ErrNoMessageFoundBeforeEndDate indicates no message exists before the end date.
	ErrNoMessageFoundBeforeEndDate = fmt.Errorf("no messages found before the specified end date")
	// ErrNoMessagesFoundWithinRange indicates no messages were found within the search range.
	ErrNoMessagesFoundWithinRange = fmt.Errorf("no messages found within search range")
	// ErrAllRequestsTimedOut indicates all overview requests exceeded the timeout.
	ErrAllRequestsTimedOut = fmt.Errorf("all requests timed out")
	// ErrAllRequestsFailed wraps the last error seen after all overview requests failed.
	ErrAllRequestsFailed = func(err error) error {
		return fmt.Errorf("all requests failed - last error: %v", err)
	}
	// ErrRequestFailed wraps a failed overview request with retry count and range.
	ErrRequestFailed = func(retries, overviewStart, overviewEnd uint, err error) error {
		return fmt.Errorf("request failed after %d attempts for range %d-%d: %w", retries, overviewStart, overviewEnd, err)
	}
)
var (
	// ErrInvalidMessageRange indicates the provided message range is invalid.
	ErrInvalidMessageRange = fmt.Errorf("invalid message range")
	// ErrMessageScannerCancelled indicates the message scan was cancelled via context.
	ErrMessageScannerCancelled = fmt.Errorf("message scanner cancelled")
	// ErrOverviewReaderFailed indicates the overview reader exceeded retry limits.
	ErrOverviewReaderFailed = func(retries, first, last uint) error {
		return fmt.Errorf("overview reader failed after %d retries for range %d-%d", retries, first, last)
	}
	// ErrRetrievingMessageOverview wraps errors encountered while requesting overviews.
	ErrRetrievingMessageOverview = func(first, last uint, err error) error {
		return fmt.Errorf("retrieving message overview failed for range %d-%d: %v", first, last, err)
	}
)
var (
	// ErrPoolIsNil is returned when a nil connection pool is provided to New.
	ErrPoolIsNil = fmt.Errorf("nntpPool cannot be nil")
	// ErrGroupHasNoArticles indicates the selected group has no usable article range.
	ErrGroupHasNoArticles = fmt.Errorf("selected group has no articles")
	// ErrNoGroupSelected indicates a scan was attempted without selecting a group.
	ErrNoGroupSelected = fmt.Errorf("no group selected")
	// ErrMessageScannerAlreadyRunning indicates a scan is already running on this DirectSearch instance.
	ErrMessageScannerAlreadyRunning = fmt.Errorf("message scanner already running")
	// ErrBoundariesScannerAlreadyRunning indicates a boundaries scan is already running on this DirectSearch instance.
	ErrBoundariesScannerAlreadyRunning = fmt.Errorf("boundaries scanner already running")
	// ErrConnectionsMustBeGreaterThanZero indicates an invalid connection count in config.
	ErrConnectionsMustBeGreaterThanZero = fmt.Errorf("connections must be greater than zero")
	// ErrStepMustBeGreaterThanZero indicates an invalid step size in config.
	ErrStepMustBeGreaterThanZero = fmt.Errorf("step must be greater than zero")
	// ErrOverviewTimeoutMustBeGreaterThanZero indicates an invalid overview timeout in config.
	ErrOverviewTimeoutMustBeGreaterThanZero = fmt.Errorf("overview timeout must be greater than zero")
	// ErrOverviewRetriesMustBeGreaterThanZero indicates an invalid overview retry count in config.
	ErrOverviewRetriesMustBeGreaterThanZero = fmt.Errorf("overview retries must be greater than zero")
	// ErrBoundariesScannerStepMustBeGreaterThanZero indicates an invalid boundaries scanner step size in config.
	ErrBoundariesScannerStepMustBeGreaterThanZero = fmt.Errorf("boundaries scanner step must be greater than zero")
	// ErrBoundariesScannerToleranceMustBeGreaterThanZero indicates an invalid boundaries scanner tolerance in config.
	ErrBoundariesScannerToleranceMustBeGreaterThanZero = fmt.Errorf("boundaries scanner tolerance must be greater than zero")
)

Functions

func FormatNumberWithApostrophe

func FormatNumberWithApostrophe(n uint) string

FormatNumberWithApostrophe formats n using apostrophes as thousand separators. For example, 12000 becomes "12'000".
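The documented behaviour can be reproduced with the standard library. The sketch below is a hypothetical re-implementation (formatWithApostrophe), not the package's source. It inserts an apostrophe wherever the number of remaining digits is a multiple of three.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatWithApostrophe renders n in base 10 with apostrophes between
// groups of three digits, counted from the right.
func formatWithApostrophe(n uint) string {
	s := strconv.FormatUint(uint64(n), 10)
	var b strings.Builder
	for i, r := range s {
		if i > 0 && (len(s)-i)%3 == 0 {
			b.WriteByte('\'')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(formatWithApostrophe(12000)) // prints 12'000
}
```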

func GetMD5Hash

func GetMD5Hash(text string) string

GetMD5Hash returns the lowercase hexadecimal MD5 hash of the input text.
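A helper with this contract is usually a thin wrapper around crypto/md5 and encoding/hex. The sketch below is one such hypothetical version (md5Hex). Whether the package implements it exactly this way is an assumption. hex.EncodeToString already produces lowercase output.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// md5Hex returns the lowercase hexadecimal MD5 digest of text.
func md5Hex(text string) string {
	sum := md5.Sum([]byte(text)) // [16]byte
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(md5Hex("hello")) // prints 5d41402abc4b2a76b9719d911017c592
}
```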

func TimeMedian added in v0.2.0

func TimeMedian(times []time.Time) time.Time

TimeMedian returns the median time from the input slice.

Types

type BoundariesScannerResult

type BoundariesScannerResult struct {
	FirstMessage BoundariesScannerResultMessage
	LastMessage  BoundariesScannerResultMessage
}

BoundariesScannerResult contains the first and last messages found within the specified date range.

type BoundariesScannerResultMessage

type BoundariesScannerResultMessage struct {
	MessageID uint
	Date      time.Time
}

BoundariesScannerResultMessage describes a single boundary message result.

type DirectSearch

type DirectSearch struct {
	Log                            chan string // Buffered channel for logging debug messages from internal operations.
	MaxBoundariesScannerIterations uint        // Calculated maximum iterations for the boundaries scanner based on group size and step.
	// contains filtered or unexported fields
}

DirectSearch manages NNTP group selection, boundary scanning, and message scanning using a shared connection pool and cancellable context.

It holds runtime state for the current group and exposes helpers for progress reporting and metrics (lines and bytes read).

func New

New creates a DirectSearch instance using the provided connection pool and context. If ctx is nil, context.Background is used. The pool is validated by acquiring and returning a test connection. It returns an error if the pool is nil or if there is an issue acquiring a connection.

func (*DirectSearch) BoundariesScanner

func (ds *DirectSearch) BoundariesScanner(startDate, endDate time.Time, iterationFunc func()) (BoundariesScannerResult, error)

BoundariesScanner finds the first and last messages within the given date range by performing parallel scans. The optional iterationFunc is invoked on each scan iteration and can be used to report progress. It returns an error if no group is selected, if the date range is invalid, if no messages are found within the range, or if there is an issue with NNTP requests.

func (*DirectSearch) GetBytesRead

func (ds *DirectSearch) GetBytesRead() uint64

GetBytesRead returns the total number of bytes read by the active message scanner.

func (*DirectSearch) GetConfig

func (ds *DirectSearch) GetConfig() DirectSearchConfig

GetConfig returns the current DirectSearchConfig.

func (*DirectSearch) GetLinesRead

func (ds *DirectSearch) GetLinesRead() uint64

GetLinesRead returns the total number of overview lines processed by the active message scanner.

func (*DirectSearch) MessageScanner

func (ds *DirectSearch) MessageScanner(header string, firstMessage, lastMessage uint, iterationFunc func()) ([]*nzbparser.Nzb, error)

MessageScanner scans message overviews for a header substring and builds NZB results from matching subjects. The optional iterationFunc is invoked as messages are processed and can be used for progress reporting. It returns an error if no group is selected, if the message range is invalid, if no messages are found, or if there is an issue with NNTP requests.

func (*DirectSearch) SetConfig

func (ds *DirectSearch) SetConfig(config DirectSearchConfig) error

SetConfig validates and applies the DirectSearchConfig. It returns an error when any required field is zero.

func (*DirectSearch) SwitchToGroup

func (ds *DirectSearch) SwitchToGroup(group string) error

SwitchToGroup selects a new NNTP group and caches its first and last article numbers for subsequent scans. It also calculates the maximum iterations for the boundaries scanner based on the group size and configured step. It returns an error if the group has no articles or if there is an issue retrieving group information from the server.
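The exact formula behind MaxBoundariesScannerIterations is not documented. One plausible estimate for a binary search over Step-sized windows is ceil(log2(number of windows)) halvings per boundary. The sketch below computes that estimate with math/bits. maxSearchIterations is a hypothetical name, and the library's own value may differ, for example if it also counts both boundaries or extra refinement passes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maxSearchIterations estimates how many halvings a binary search needs to
// narrow articles first..last down to one window of step articles:
// ceil(log2(number of windows)).
func maxSearchIterations(first, last, step uint) uint {
	if step == 0 || last < first {
		return 0
	}
	size := last - first + 1
	windows := (size + step - 1) / step // ceil(size / step), at least 1
	return uint(bits.Len(windows - 1))  // ceil(log2(windows))
}

func main() {
	fmt.Println(maxSearchIterations(1, 1000000, 1000)) // prints 10
}
```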

type DirectSearchConfig

type DirectSearchConfig struct {
	Connections                uint // Number of concurrent connections to use for scanning.
	Step                       uint // Step size for scanning message ranges.
	OverviewTimeout            uint // Timeout in seconds for overview requests.
	OverviewRetries            uint // Number of retries for overview requests.
	BoundariesScannerStep      uint // Step size for boundaries scanner.
	BoundariesScannerTolerance uint // Tolerance in seconds for boundaries scanner to consider a date close enough to target.
}

DirectSearchConfig holds tunable settings for scanning behavior and retry strategies used by DirectSearch.
