feedexport

package

v1.1.7 (latest) | Published: May 15, 2026 | License: MIT | Imports: 18 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/dplcz/scrapy-go

Documentation ¶

Overview ¶

Package feedexport implements the data export (Feed Export) system of the scrapy-go framework.

Summary ¶

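The feedexport package exports scraped Items in several formats (JSON, JSON Lines, CSV, XML) and supports several storage backends (local files, standard output). It corresponds to the scrapy.extensions.feedexport and scrapy.exporters modules in the Python version of Scrapy.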

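Core Types ¶

This package provides the following core types:

ItemExporter: serializer interface that encodes an Item into a byte stream in a given format
FeedStorage: storage backend interface that manages opening, writing, and persisting the export file
FeedSlot: combines an Exporter with a Storage and represents one export task in progress
FeedConfig: feed export configuration (format, path, field whitelist, and so on)
ExporterOptions: common configuration for Exporters
ItemFilterFunc: function type for filtering Items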

Built-in Exporters ¶

This package provides the following built-in serializers:

┌──────────────┬─────────────────────┬──────────────────────────┐
│ Format       │ Type                │ Description              │
├──────────────┼─────────────────────┼──────────────────────────┤
│ JSON         │ JSONExporter        │ JSON array               │
│ JSON Lines   │ JSONLinesExporter   │ One JSON object per line │
│ CSV          │ CSVExporter         │ Comma-separated values   │
│ XML          │ XMLExporter         │ XML document             │
└──────────────┴─────────────────────┴──────────────────────────┘

Built-in Storage ¶

FileStorage: local file storage (supports path template variables)
StdoutStorage: standard output storage (for debugging and piped output)

Usage ¶

Register a feed export through the Crawler:

c := crawler.New()
c.AddFeed(feedexport.FeedConfig{
    URI:    "output/items.json",
    Format: feedexport.FormatJSON,
    Fields: []string{"title", "price", "url"},
})

Export to several formats at once:

c.AddFeed(feedexport.FeedConfig{URI: "items.jsonl", Format: feedexport.FormatJSONLines})
c.AddFeed(feedexport.FeedConfig{URI: "items.csv", Format: feedexport.FormatCSV})

Write to standard output (for debugging):

c.AddFeed(feedexport.FeedConfig{URI: "stdout:", Format: feedexport.FormatJSON})

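Exporter Lifecycle ¶

An ItemExporter is called in this order:

StartExporting: begins the export (writes the format prefix, such as "[" for JSON)
ExportItem: called repeatedly, serializing one Item per call
FinishExporting: ends the export (writes the format suffix, such as "]" for JSON)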

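FeedSlot Workflow ¶

The complete workflow in which a FeedSlot combines an Exporter and a Storage:

Open: obtain an io.WriteCloser through Storage.Open
Create the Exporter and call StartExporting
Pass each Item through ItemFilterFunc, then call ExportItem for accepted Items
Close: call FinishExporting, then persist through Storage.Store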
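
The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: memStorage, nopCloser, and runSlot are hypothetical names, the exporter is reduced to JSON Lines (which needs no prefix or suffix), and the storage keeps everything in memory.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// memStorage is a hypothetical in-memory stand-in for a storage backend.
type memStorage struct {
	buf    bytes.Buffer
	stored bool
}

// nopCloser adds a no-op Close to an io.Writer.
type nopCloser struct{ io.Writer }

func (nopCloser) Close() error { return nil }

// Open returns the writer the exporter will write to.
func (m *memStorage) Open() (io.WriteCloser, error) { return nopCloser{&m.buf}, nil }

// Store closes the writer and marks the data as persisted.
func (m *memStorage) Store(w io.WriteCloser) error {
	m.stored = true
	return w.Close()
}

// runSlot walks through the steps: open, create exporter, filter and export, close.
func runSlot(st *memStorage, accept func(item any) bool, items []any) (int, error) {
	w, err := st.Open() // step 1: open the storage
	if err != nil {
		return 0, err
	}
	enc := json.NewEncoder(w) // step 2: create the exporter
	count := 0
	for _, it := range items { // step 3: filter, then export
		if !accept(it) {
			continue
		}
		if err := enc.Encode(it); err != nil {
			return count, err
		}
		count++
	}
	return count, st.Store(w) // step 4: finish and persist
}

func main() {
	st := &memStorage{}
	items := []any{
		map[string]any{"title": "a", "price": 5},
		map[string]any{"title": "b", "price": 50},
	}
	// cheap accepts only map items whose integer price is below 10.
	cheap := func(item any) bool {
		m, ok := item.(map[string]any)
		if !ok {
			return false
		}
		p, _ := m["price"].(int)
		return p < 10
	}
	n, err := runSlot(st, cheap, items)
	fmt.Println(n, err, st.stored)
	fmt.Print(st.buf.String())
}
```

Only the first item passes the filter here, so a single line reaches the buffer before Store is called.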

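Item Serialization ¶

All Exporters internally use item.Adapt to convert an Item into a uniform field-access interface, then decide which fields to export based on ExporterOptions.FieldsToExport. This lets Exporters handle map-based and struct-based Items uniformly.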

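Differences from Scrapy ¶

Drops remote storage backends such as S3, GCS, and FTP; only local files and standard output remain
Drops PostProcessingManager (gzip/zstd compression); the same effect can be achieved by wrapping an io.Writer
Drops the dynamic loading mechanism of ItemFilter in favor of the ItemFilterFunc function type
Built on io.Writer rather than Python's file-handle abstraction
Uses item.Adapt for uniform Item access, replacing Python's ItemAdapter
Format uses string constants rather than class references
NormalizeFormat accepts common aliases ("jl"/"jsonl"/"jsonlines")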
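
The point about compression through io.Writer wrapping can be illustrated as follows. This is a sketch using only compress/gzip from the standard library; gzipText and gunzipText are hypothetical helpers, and how such a writer would be handed to a FeedStorage or Exporter in this package is not shown in the documentation above.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipText compresses s by wrapping the destination writer in a gzip.Writer.
// Any code that accepts an io.Writer could write to zw in the same way.
func gzipText(s string) ([]byte, error) {
	var dst bytes.Buffer
	zw := gzip.NewWriter(&dst)
	if _, err := io.WriteString(zw, s); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return dst.Bytes(), nil
}

// gunzipText reverses gzipText; it is used here only to check the round trip.
func gunzipText(b []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	packed, err := gzipText("{\"a\":1}\n{\"b\":2}\n")
	if err != nil {
		panic(err)
	}
	plain, err := gunzipText(packed)
	fmt.Printf("%q %v\n", plain, err)
}
```

The gzip.Writer must be closed before the underlying file is closed, otherwise the compressed stream is truncated.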

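Field serializer registry of package feedexport.

Aligned with Scrapy's BaseItemExporter.serialize_field mechanism: before writing each field, the Exporter looks up the "serializer" key in FieldMeta and calls the registered serialization function to convert the raw value into the exported value.

Typical usage: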

type Product struct {
    Price float64 `item:"price,serializer=to_int"`
}

func init() {
    feedexport.RegisterSerializer("to_int", func(v any) any {
        if f, ok := v.(float64); ok {
            return int(f)
        }
        return v
    })
}

Index ¶

func AcceptAll(_ any) bool
func ClearSerializers()
func RegisterExporter(format Format, factory ExporterFactory)
func RegisterSerializer(name string, fn SerializeFunc)
func SerializeField(meta item.FieldMeta, name string, value any) any
type CSVExporter
- func NewCSVExporter(w io.Writer, opts ExporterOptions) *CSVExporter
- func (e *CSVExporter) ExportItem(item any) error
- func (e *CSVExporter) FinishExporting() error
- func (e *CSVExporter) StartExporting() error
type ExporterFactory
- func LookupExporter(format Format) (ExporterFactory, bool)
type ExporterOptions
- func DefaultExporterOptions() ExporterOptions
type FeedConfig
type FeedSlot
- func NewFeedSlot(cfg FeedConfig, logger *slog.Logger) (*FeedSlot, error)
- func (s *FeedSlot) Close(ctx context.Context, sp spider.Spider) error
- func (s *FeedSlot) ExportItem(ctx context.Context, sp spider.Spider, item any) error
- func (s *FeedSlot) ItemCount() int
- func (s *FeedSlot) Start(ctx context.Context, sp spider.Spider) error
- func (s *FeedSlot) URI() string
type FeedStorage
- func NewStorageForURI(uri string, overwrite bool) (FeedStorage, error)
type FileStorage
- func NewFileStorage(uri string, overwrite bool) (*FileStorage, error)
- func (s *FileStorage) Open(ctx context.Context, sp spider.Spider) (io.WriteCloser, error)
- func (s *FileStorage) Path() string
- func (s *FileStorage) Store(ctx context.Context, w io.WriteCloser) error
type Format
- func NormalizeFormat(s string) Format
- func (f Format) String() string
type ItemExporter
- func NewExporter(format Format, w io.Writer, opts ExporterOptions) (ItemExporter, error)
type ItemFilterFunc
type JSONExporter
- func NewJSONExporter(w io.Writer, opts ExporterOptions) *JSONExporter
- func (e *JSONExporter) ExportItem(item any) error
- func (e *JSONExporter) FinishExporting() error
- func (e *JSONExporter) StartExporting() error
type JSONLinesExporter
- func NewJSONLinesExporter(w io.Writer, opts ExporterOptions) *JSONLinesExporter
- func (e *JSONLinesExporter) ExportItem(item any) error
- func (e *JSONLinesExporter) FinishExporting() error
- func (e *JSONLinesExporter) StartExporting() error
type SerializeFunc
- func LookupSerializer(name string) (SerializeFunc, bool)
type StdoutStorage
- func NewStdoutStorage() *StdoutStorage
- func (s *StdoutStorage) Open(ctx context.Context, sp spider.Spider) (io.WriteCloser, error)
- func (s *StdoutStorage) Store(ctx context.Context, w io.WriteCloser) error
type URIParams
- func NewURIParams(spiderName string) URIParams
- func (p URIParams) Render(template string) string
type XMLExporter
- func NewXMLExporter(w io.Writer, opts ExporterOptions) *XMLExporter
- func (e *XMLExporter) ExportItem(item any) error
- func (e *XMLExporter) FinishExporting() error
- func (e *XMLExporter) StartExporting() error

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func AcceptAll ¶

func AcceptAll(_ any) bool

AcceptAll is the default filter; it accepts every Item.

Example ¶

ExampleAcceptAll demonstrates the default Item filter.

package main

import (
	"fmt"

	"github.com/dplcz/scrapy-go/pkg/feedexport"
)

func main() {
	// AcceptAll accepts every Item
	fmt.Println(feedexport.AcceptAll(map[string]any{"title": "test"}))
	fmt.Println(feedexport.AcceptAll(nil))
	fmt.Println(feedexport.AcceptAll("anything"))

}

Output:
true
true
true

func ClearSerializers ¶

func ClearSerializers()

ClearSerializers removes all registered serialization functions (intended for tests only).

func RegisterExporter ¶

func RegisterExporter(format Format, factory ExporterFactory)

RegisterExporter registers a custom Exporter factory function. If format is already registered, the existing entry is overwritten. Safe for concurrent use.

func RegisterSerializer ¶

func RegisterSerializer(name string, fn SerializeFunc)

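RegisterSerializer registers a named field serialization function. If name is already registered, the existing entry is overwritten. Safe for concurrent use.

Aligned with Scrapy's Field(serializer=func) mechanism, but uses an explicit registry instead of Python's dynamic function references.

Usage: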

feedexport.RegisterSerializer("to_int", func(v any) any {
    if f, ok := v.(float64); ok {
        return int(f)
    }
    return v
})

func SerializeField ¶

func SerializeField(meta item.FieldMeta, name string, value any) any

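SerializeField is aligned with Scrapy's BaseItemExporter.serialize_field method.

Logic:

Read the "serializer" key from FieldMeta
If it matches a registered SerializeFunc, call it and return the converted value
Otherwise fall back to returning the original value (identity)

Parameters:

meta: field metadata (may be nil)
name: field name (reserved for logging/debugging, currently unused)
value: raw field value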
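
The lookup-with-fallback logic can be sketched as below. This is an illustration under stated assumptions, not the package's code: the registry type is hypothetical, and field metadata is modeled as a plain map[string]string because the definition of item.FieldMeta is not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// serializeFunc mirrors the shape of SerializeFunc: raw value in, export value out.
type serializeFunc func(value any) any

// registry is a hypothetical name-to-function table guarded by a RWMutex.
type registry struct {
	mu  sync.RWMutex
	fns map[string]serializeFunc
}

func newRegistry() *registry { return &registry{fns: make(map[string]serializeFunc)} }

// register stores fn under name, overwriting any previous entry.
func (r *registry) register(name string, fn serializeFunc) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.fns[name] = fn
}

// serialize reads the "serializer" key from meta and applies the matching
// function; on a miss (or a nil meta) it returns value unchanged.
func (r *registry) serialize(meta map[string]string, value any) any {
	name, ok := meta["serializer"] // reading from a nil map is safe in Go
	if !ok {
		return value
	}
	r.mu.RLock()
	fn, found := r.fns[name]
	r.mu.RUnlock()
	if !found {
		return value
	}
	return fn(value)
}

func main() {
	r := newRegistry()
	r.register("to_int", func(v any) any {
		if f, ok := v.(float64); ok {
			return int(f)
		}
		return v
	})
	fmt.Println(r.serialize(map[string]string{"serializer": "to_int"}, 9.99))  // converted
	fmt.Println(r.serialize(map[string]string{"serializer": "missing"}, 9.99)) // identity
	fmt.Println(r.serialize(nil, 9.99))                                         // identity
}
```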

Types ¶

type CSVExporter ¶

type CSVExporter struct {
	// contains filtered or unexported fields
}

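CSVExporter serializes Items as CSV. It corresponds to Scrapy's CsvItemExporter.

Features:

The first row can optionally be a header row of field names (controlled by IncludeHeadersLine)
Slice fields are joined automatically with JoinMultivalued (default ",")
Field order is taken from the first Item unless FieldsToExport specifies it explicitly
Complex types (maps, nested structs) are formatted with fmt.Sprintf("%v")

Note: the field set of a CSV feed must stay consistent across all Items. The field set of the first Item becomes the header, and any field missing from a later Item is written as an empty string.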
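
The row-building rules above can be sketched with encoding/csv. This is a simplified illustration, not the package's implementation: toCell and exportCSV are hypothetical helpers, the field list is passed explicitly, and only []string is treated as a multi-valued field.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// toCell flattens one field value: string slices are joined with sep, nil
// (a missing field) becomes "", and everything else goes through fmt.Sprintf("%v").
func toCell(v any, sep string) string {
	switch t := v.(type) {
	case nil:
		return ""
	case []string:
		return strings.Join(t, sep)
	default:
		return fmt.Sprintf("%v", t)
	}
}

// exportCSV writes a header row followed by one record per item.
func exportCSV(fields []string, items []map[string]any, sep string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(fields); err != nil {
		return "", err
	}
	for _, it := range items {
		row := make([]string, len(fields))
		for i, f := range fields {
			row[i] = toCell(it[f], sep) // absent keys yield nil, hence ""
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	out, err := exportCSV(
		[]string{"title", "tags"},
		[]map[string]any{
			{"title": "Foo", "tags": []string{"a", "b"}},
			{"title": "Bar"}, // no "tags" field: the cell is left empty
		},
		",",
	)
	fmt.Print(out)
	fmt.Println(err)
}
```

With "," as the join separator the joined cell itself contains a comma, so encoding/csv quotes it; choosing a separator such as "|" avoids the quoting.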

func NewCSVExporter ¶

func NewCSVExporter(w io.Writer, opts ExporterOptions) *CSVExporter

NewCSVExporter creates an Exporter for the CSV format.

func (*CSVExporter) ExportItem ¶

func (e *CSVExporter) ExportItem(item any) error

ExportItem writes one CSV record.

func (*CSVExporter) FinishExporting ¶

func (e *CSVExporter) FinishExporting() error

FinishExporting flushes the buffer and marks the export as finished.

func (*CSVExporter) StartExporting ¶

func (e *CSVExporter) StartExporting() error

StartExporting marks the start of the export. If FieldsToExport is set, the header row is written immediately; otherwise it is deferred until the first Item is written.

type ExporterFactory ¶

type ExporterFactory func(w io.Writer, opts ExporterOptions) ItemExporter

ExporterFactory constructs an ItemExporter from the given Writer and Options. The framework ships factory functions for all core formats, and users can register custom formats through RegisterExporter.

func LookupExporter ¶

func LookupExporter(format Format) (ExporterFactory, bool)

LookupExporter looks up the Exporter factory for a format. It returns nil and false if the format is not registered.

type ExporterOptions ¶

type ExporterOptions struct {
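	// Encoding specifies the text encoding (for example "utf-8").
	// An empty string selects the default encoding (normally utf-8).
	Encoding string

	// Indent specifies the number of spaces used for indentation.
	//   - 0 or negative: compact output (no indentation)
	//   - > 0          : indent by this many spaces
	// Used only by JSON and XML; CSV and JSON Lines ignore this field.
	Indent int

	// FieldsToExport is the whitelist of fields to export.
	// When empty, all fields of the Item are exported.
	// Fields are exported in the order given by FieldsToExport.
	FieldsToExport []string

	// IncludeHeadersLine applies to CSV only: whether to write the field names as the first row.
	// Defaults to true.
	IncludeHeadersLine bool

	// JoinMultivalued applies to CSV only: the separator used to join a slice field into a single string.
	// Defaults to ",".
	JoinMultivalued string

	// ItemElement applies to XML only: the element name for each Item. Defaults to "item".
	ItemElement string

	// RootElement applies to XML only: the name of the XML root element. Defaults to "items".
	RootElement string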
}

ExporterOptions is the common configuration for Exporters. Individual Exporters may ignore some of its fields.

func DefaultExporterOptions ¶

func DefaultExporterOptions() ExporterOptions

DefaultExporterOptions returns the default configuration.

Example ¶

ExampleDefaultExporterOptions demonstrates the default export configuration.

package main

import (
	"fmt"

	"github.com/dplcz/scrapy-go/pkg/feedexport"
)

func main() {
	opts := feedexport.DefaultExporterOptions()

	fmt.Println("Encoding:", opts.Encoding)
	fmt.Println("Indent:", opts.Indent)
	fmt.Println("IncludeHeaders:", opts.IncludeHeadersLine)
	fmt.Println("JoinMultivalued:", opts.JoinMultivalued)
	fmt.Println("ItemElement:", opts.ItemElement)
	fmt.Println("RootElement:", opts.RootElement)

}

Output:
Encoding: utf-8
Indent: 0
IncludeHeaders: true
JoinMultivalued: ,
ItemElement: item
RootElement: items

type FeedConfig ¶

type FeedConfig struct {
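	// URI is the export target URI (for example "output.json", "file:///tmp/x.csv", "stdout:").
	// URI template placeholders are supported; see URIParams.Render.
	URI string

	// Format is the export format.
	Format Format

	// Overwrite, when true, overwrites an existing file (only effective for FileStorage).
	Overwrite bool

	// StoreEmpty, when true, creates the output file even if there are no Items.
	StoreEmpty bool

	// Options is the configuration passed to the Exporter.
	Options ExporterOptions

	// Filter decides whether an Item should be exported to this Feed. When nil, all Items are accepted.
	Filter ItemFilterFunc

	// Storage optionally specifies an already constructed FeedStorage.
	// If nil, FeedExport constructs one automatically via NewStorageForURI.
	Storage FeedStorage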
}

FeedConfig describes the complete configuration of one feed export target. It corresponds to one entry (uri → options) of the FEEDS dictionary in Scrapy.

type FeedSlot ¶

type FeedSlot struct {
	// contains filtered or unexported fields
}

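FeedSlot combines one Storage with one Exporter and represents one feed export task in progress. It corresponds to Scrapy's FeedSlot.

Lifecycle:

NewFeedSlot: construct
Start: open the Storage and start the Exporter
ExportItem: called repeatedly
Close: finish the Exporter and commit the Storage

Thread safety: the FeedExport extension serializes calls by listening to the synchronous item_scraped signal, so FeedSlot itself does not strictly need locking. As a defensive measure, the core methods still take a mutex.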

func NewFeedSlot ¶

func NewFeedSlot(cfg FeedConfig, logger *slog.Logger) (*FeedSlot, error)

NewFeedSlot constructs a FeedSlot. The caller must finish populating the configuration before calling Start.

func (*FeedSlot) Close ¶

func (s *FeedSlot) Close(ctx context.Context, sp spider.Spider) error

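Close finishes the Exporter and commits the Storage. If no Item was ever written and StoreEmpty=false, it is skipped (no empty file is created). Even if an error occurs along the way, Close makes a best effort to release all resources.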

func (*FeedSlot) ExportItem ¶

func (s *FeedSlot) ExportItem(ctx context.Context, sp spider.Spider, item any) error

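ExportItem writes one Item. If Filter rejects the Item, it returns nil immediately. If the FeedSlot has not been started yet, it is started automatically (lazy start, which supports the StoreEmpty=false case).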

func (*FeedSlot) ItemCount ¶

func (s *FeedSlot) ItemCount() int

ItemCount returns the number of Items exported so far.

func (*FeedSlot) Start ¶

func (s *FeedSlot) Start(ctx context.Context, sp spider.Spider) error

Start opens the Storage and initializes the Exporter. It is idempotent: only the first call has any effect.

func (*FeedSlot) URI ¶

func (s *FeedSlot) URI() string

URI returns the rendered target URI.

type FeedStorage ¶

type FeedStorage interface {
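	// Open opens the storage and returns a writable io.WriteCloser.
	// sp is the current Spider; implementations may use its name and other details to customize the path.
	// The caller must hand the returned WriteCloser back through Store so that it is closed correctly.
	Open(ctx context.Context, sp spider.Spider) (io.WriteCloser, error)

	// Store persists the content of the writer returned by Open.
	// Implementations are responsible for closing the writer (if it is not closed yet).
	// For implementations that write the file directly, Store may simply close the handle;
	// for implementations that need a two-phase commit (such as temp file + rename), Store performs the rename.
	Store(ctx context.Context, w io.WriteCloser) error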
}

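FeedStorage defines the export storage backend interface. It corresponds to Scrapy's FeedStorageProtocol / IFeedStorage.

Lifecycle:

Open: called when the Spider opens; returns an io.WriteCloser for the Exporter to write to.
Store: called when the Spider closes; persists the content of the writer returned by Open to its final location.

Typical implementations:

FileStorage: Open opens the target file directly; Store only has to close the file handle.
StdoutStorage: Open returns a wrapper around os.Stdout; Store is a no-op.

Thread safety: a FeedStorage instance is used exclusively by a single FeedSlot, so no concurrency protection is needed.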

func NewStorageForURI ¶

func NewStorageForURI(uri string, overwrite bool) (FeedStorage, error)

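NewStorageForURI selects a storage backend automatically based on the URI scheme. Supported forms:

"stdout:" or "-" → StdoutStorage
"file://...", relative or absolute path → FileStorage

Unsupported schemes (such as s3:// or ftp://) return an error.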
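
The dispatch rules above can be sketched as a small classifier. backendFor is a hypothetical helper written for illustration; it returns a label instead of a FeedStorage, and the real function may parse URIs differently (for example on Windows paths).

```go
package main

import (
	"fmt"
	"strings"
)

// backendFor classifies a feed URI. It returns the backend kind
// ("stdout" or "file") and, for files, the path to write to.
func backendFor(uri string) (kind, path string, err error) {
	switch {
	case uri == "stdout:" || uri == "-":
		return "stdout", "", nil
	case strings.HasPrefix(uri, "file://"):
		return "file", strings.TrimPrefix(uri, "file://"), nil
	case strings.Contains(uri, "://"):
		// Any other scheme (s3://, ftp://, ...) is rejected.
		return "", "", fmt.Errorf("unsupported feed URI scheme: %q", uri)
	default:
		// No scheme at all: treat the URI as a relative or absolute path.
		return "file", uri, nil
	}
}

func main() {
	for _, uri := range []string{"stdout:", "-", "out/items.json", "file:///tmp/x.csv", "s3://bucket/x"} {
		kind, path, err := backendFor(uri)
		fmt.Printf("%-20s kind=%q path=%q err=%v\n", uri, kind, path, err)
	}
}
```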

type FileStorage ¶

type FileStorage struct {
	// contains filtered or unexported fields
}

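FileStorage writes export data to a local file. It corresponds to Scrapy's FileFeedStorage.

Features:

Accepts URIs with a "file://" prefix as well as plain paths
Creates directories automatically (missing parent directories are created recursively)
With overwrite=true an existing file is overwritten; otherwise the file is opened in append mode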
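
The directory creation and the overwrite/append distinction map onto standard os calls. The following sketch shows one plausible way to do it; openTarget and writeTwice are hypothetical helpers, and the file permissions are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// openTarget creates missing parent directories, then opens path for writing.
// overwrite=true truncates an existing file; otherwise writes are appended.
func openTarget(path string, overwrite bool) (*os.File, error) {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return nil, err
	}
	flags := os.O_CREATE | os.O_WRONLY
	if overwrite {
		flags |= os.O_TRUNC
	} else {
		flags |= os.O_APPEND
	}
	return os.OpenFile(path, flags, 0o644)
}

// writeTwice writes "a\n" and then "b\n" in two separate open/close cycles
// inside a fresh temporary directory and returns the final file content.
func writeTwice(overwrite bool) (string, error) {
	dir, err := os.MkdirTemp("", "feed-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "nested", "out", "items.jsonl")
	for _, line := range []string{"a\n", "b\n"} {
		f, err := openTarget(path, overwrite)
		if err != nil {
			return "", err
		}
		if _, err := f.WriteString(line); err != nil {
			f.Close()
			return "", err
		}
		if err := f.Close(); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	appended, _ := writeTwice(false)
	replaced, _ := writeTwice(true)
	fmt.Printf("%q %q\n", appended, replaced)
}
```

Append mode matters most for JSON Lines and CSV; appending a second JSON array or XML document to an existing file would produce an invalid file, which is worth keeping in mind when choosing Overwrite.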

func NewFileStorage ¶

func NewFileStorage(uri string, overwrite bool) (*FileStorage, error)

NewFileStorage creates a local file storage from a URI. The URI can be:

"output.json": a relative path
"/tmp/output.json": an absolute path
"file:///tmp/output.json": a path with the file:// scheme

func (*FileStorage) Open ¶

func (s *FileStorage) Open(ctx context.Context, sp spider.Spider) (io.WriteCloser, error)

Open opens (or creates) the target file and returns an io.WriteCloser.

func (*FileStorage) Path ¶

func (s *FileStorage) Path() string

Path returns the actual file path used by the storage.

func (*FileStorage) Store ¶

func (s *FileStorage) Store(ctx context.Context, w io.WriteCloser) error

Store closes the file handle. Because FileStorage writes directly to the target location, no additional rename is needed.

type Format ¶

type Format string

Format represents an export format.

const (
	// FormatJSON is the JSON array format (all Items are wrapped in a single JSON array).
	FormatJSON Format = "json"

	// FormatJSONLines is the JSON Lines format (one JSON object per line).
	// It is also commonly abbreviated as "jsonl" or "jl".
	FormatJSONLines Format = "jsonlines"

	// FormatCSV is the CSV format.
	FormatCSV Format = "csv"

	// FormatXML is the XML format.
	FormatXML Format = "xml"
)

func NormalizeFormat ¶

func NormalizeFormat(s string) Format

NormalizeFormat normalizes a format name and accepts common aliases. For example, "jl", "jsonl", and "jsonlines" all normalize to FormatJSONLines.

Example ¶

ExampleNormalizeFormat demonstrates format name normalization.

package main

import (
	"fmt"

	"github.com/dplcz/scrapy-go/pkg/feedexport"
)

func main() {
	fmt.Println(feedexport.NormalizeFormat("json"))
	fmt.Println(feedexport.NormalizeFormat("jl"))
	fmt.Println(feedexport.NormalizeFormat("jsonl"))
	fmt.Println(feedexport.NormalizeFormat("jsonlines"))
	fmt.Println(feedexport.NormalizeFormat("csv"))
	fmt.Println(feedexport.NormalizeFormat("xml"))

}

Output:
json
jsonlines
jsonlines
jsonlines
csv
xml

func (Format) String ¶

func (f Format) String() string

String returns the string representation of the format.

type ItemExporter ¶

type ItemExporter interface {
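	// StartExporting begins the export and must be called before ExportItem.
	// Implementations may write a format-specific prefix here (such as "[" for JSON or the opening root tag for XML).
	StartExporting() error

	// ExportItem serializes one Item and writes it to the underlying writer.
	ExportItem(item any) error

	// FinishExporting ends the export.
	// Implementations may write a format-specific suffix here (such as "]" for JSON or the closing root tag for XML) and flush buffers.
	FinishExporting() error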
}

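ItemExporter defines the Item serializer interface. It corresponds to Scrapy's BaseItemExporter.

Lifecycle:

StartExporting: begins the export (for example, writes the "[" of a JSON array)
ExportItem: called repeatedly, serializing one Item per call
FinishExporting: ends the export (for example, writes the "]" of a JSON array)

Implementations are not required to be safe for concurrent use: each Exporter is used exclusively by a single FeedSlot, and the Feed Export extension serializes calls through signals.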

func NewExporter ¶

func NewExporter(format Format, w io.Writer, opts ExporterOptions) (ItemExporter, error)

NewExporter constructs an Exporter for the given format. The format is normalized with NormalizeFormat before the lookup.

type ItemFilterFunc ¶

type ItemFilterFunc func(item any) bool

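ItemFilterFunc decides whether an Item should be exported to a given Feed. It returns true to accept the Item and false to filter it out.

It corresponds to Scrapy's ItemFilter.accepts; a function type is used here instead of an interface to fit Go's functional style and avoid an unnecessary type hierarchy.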

type JSONExporter ¶

type JSONExporter struct {
	// contains filtered or unexported fields
}

JSONExporter serializes Items as a JSON array. It corresponds to Scrapy's JsonItemExporter.

Example output (indent = 0):

[{"a":1},{"b":2}]

Example output (indent > 0):

[
  {"a": 1},
  {"b": 2}
]

Note: because a JSON array needs a "," between consecutive Items, this implementation keeps state (a firstItem flag) and must therefore be used in the order Start → Export → Finish.
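
The separator handling described in the note can be sketched as follows. arrayExporter is a hypothetical, compact-only writer built on encoding/json; it is meant to show the role of the first-item flag, not to reproduce JSONExporter.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// arrayExporter is a hypothetical, compact-only JSON array writer.
type arrayExporter struct {
	w     io.Writer
	first bool // true until the first item has been written
}

// start writes the opening bracket and resets the first-item flag.
func (e *arrayExporter) start() error {
	e.first = true
	_, err := io.WriteString(e.w, "[")
	return err
}

// export marshals one item and writes it, preceded by "," when needed.
func (e *arrayExporter) export(item any) error {
	b, err := json.Marshal(item)
	if err != nil {
		return err
	}
	if !e.first {
		// Every item except the first is preceded by a separator.
		if _, err := io.WriteString(e.w, ","); err != nil {
			return err
		}
	}
	e.first = false
	_, err = e.w.Write(b)
	return err
}

// finish writes the closing bracket.
func (e *arrayExporter) finish() error {
	_, err := io.WriteString(e.w, "]")
	return err
}

// exportAll runs the full start, export, finish sequence.
func exportAll(items []any) (string, error) {
	var buf bytes.Buffer
	e := &arrayExporter{w: &buf}
	if err := e.start(); err != nil {
		return "", err
	}
	for _, it := range items {
		if err := e.export(it); err != nil {
			return "", err
		}
	}
	if err := e.finish(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := exportAll([]any{map[string]int{"a": 1}, map[string]int{"b": 2}})
	fmt.Println(out, err) // [{"a":1},{"b":2}] <nil>
}
```

Skipping start or finish would leave the output without its brackets, which is why the call order matters for this format.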

func NewJSONExporter ¶

func NewJSONExporter(w io.Writer, opts ExporterOptions) *JSONExporter

NewJSONExporter creates an Exporter for the JSON array format.

func (*JSONExporter) ExportItem ¶

func (e *JSONExporter) ExportItem(item any) error

ExportItem writes one Item. If a separator is needed between this Item and the previous one, a "," is written automatically.

func (*JSONExporter) FinishExporting ¶

func (e *JSONExporter) FinishExporting() error

FinishExporting writes "]" to close the JSON array.

func (*JSONExporter) StartExporting ¶

func (e *JSONExporter) StartExporting() error

StartExporting writes "[" to open the JSON array.

type JSONLinesExporter ¶

type JSONLinesExporter struct {
	// contains filtered or unexported fields
}

JSONLinesExporter serializes Items in the JSON Lines format (one JSON object per line). It corresponds to Scrapy's JsonLinesItemExporter.

Example output:

{"a":1}
{"b":2}

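Compared with JSONExporter, the JSON Lines format is naturally stream-friendly:

Each Item is independent and can be read incrementally
No array open/close markers are needed, and Items do not depend on each other
Large datasets are easier to process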
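
Incremental reading, the first point above, can be shown from the consumer side. The sketch below is an assumption-laden illustration: sumPrices is a hypothetical helper, and the "price" field is invented for the example.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// sumPrices scans JSON Lines text one line at a time and adds up the
// "price" field, so only a single item is decoded at any moment.
func sumPrices(data string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(data))
	total := 0.0
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines
		}
		var item struct {
			Price float64 `json:"price"`
		}
		if err := json.Unmarshal([]byte(line), &item); err != nil {
			return 0, err
		}
		total += item.Price
	}
	return total, sc.Err()
}

func main() {
	total, err := sumPrices("{\"price\":10}\n{\"price\":2.5}\n")
	fmt.Println(total, err) // 12.5 <nil>
}
```

When reading from a file, the same loop works with the file passed to bufio.NewScanner; very long lines may need a larger buffer set through Scanner.Buffer.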

func NewJSONLinesExporter ¶

func NewJSONLinesExporter(w io.Writer, opts ExporterOptions) *JSONLinesExporter

NewJSONLinesExporter creates an Exporter for the JSON Lines format.

func (*JSONLinesExporter) ExportItem ¶

func (e *JSONLinesExporter) ExportItem(item any) error

ExportItem serializes one Item and appends "\n".

func (*JSONLinesExporter) FinishExporting ¶

func (e *JSONLinesExporter) FinishExporting() error

FinishExporting marks the end of the export. JSON Lines needs no suffix.

func (*JSONLinesExporter) StartExporting ¶

func (e *JSONLinesExporter) StartExporting() error

StartExporting marks the start of the export. JSON Lines needs no prefix.

type SerializeFunc ¶

type SerializeFunc func(value any) any

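SerializeFunc defines a field serialization function. It receives the raw field value and returns the serialized value.

It corresponds to the serializer argument of Field(serializer=func) in Scrapy.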

func LookupSerializer ¶

func LookupSerializer(name string) (SerializeFunc, bool)

LookupSerializer looks up a registered serialization function by name. It returns nil and false if the name is not registered.

type StdoutStorage ¶

type StdoutStorage struct{}

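StdoutStorage writes export data directly to os.Stdout. It corresponds to Scrapy's StdoutFeedStorage.

Features:

Open returns a wrapper around os.Stdout whose Close is a no-op (so that callers cannot close Stdout)
Store is a no-op
The overwrite option has no meaning here and is kept only for interface consistency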
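
A no-op Close wrapper of the kind described above takes only a few lines. The sketch below uses hypothetical types (keepOpen, closeCounter) and shows that closing the wrapper never reaches the wrapped stream; it is not the package's own wrapper.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// keepOpen wraps a writer so that Close does nothing. Handing this to code
// that closes its writer leaves the underlying stream usable afterwards.
type keepOpen struct{ io.Writer }

func (keepOpen) Close() error { return nil }

// closeCounter records how often Close was called on the real stream.
type closeCounter struct {
	io.Writer
	closed int
}

func (c *closeCounter) Close() error { c.closed++; return nil }

// closesReachingTarget writes through the wrapper, closes it, and reports
// how many Close calls reached the wrapped stream.
func closesReachingTarget() (int, error) {
	target := &closeCounter{Writer: io.Discard}
	var w io.WriteCloser = keepOpen{target}
	if _, err := io.WriteString(w, "item\n"); err != nil {
		return 0, err
	}
	if err := w.Close(); err != nil {
		return 0, err
	}
	return target.closed, nil
}

func main() {
	var out io.WriteCloser = keepOpen{os.Stdout}
	fmt.Fprintln(out, "still writable")
	out.Close() // no-op: os.Stdout stays open
	fmt.Println("stdout is open after Close")
	fmt.Println(closesReachingTarget())
}
```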

func NewStdoutStorage ¶

func NewStdoutStorage() *StdoutStorage

NewStdoutStorage creates a standard output storage.

func (*StdoutStorage) Open ¶

func (s *StdoutStorage) Open(ctx context.Context, sp spider.Spider) (io.WriteCloser, error)

Open returns a safe wrapper around Stdout.

func (*StdoutStorage) Store ¶

func (s *StdoutStorage) Store(ctx context.Context, w io.WriteCloser) error

Store is a no-op (Stdout needs no post-processing).

type URIParams ¶

type URIParams struct {
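	// SpiderName is Spider.Name()
	SpiderName string

	// Time is the UTC time at which the crawl started, formatted as "YYYY-MM-DDTHH-MM-SS"
	Time string

	// BatchTime is the same as Time (kept for compatibility)
	BatchTime string

	// BatchID is the batch ID, starting at 1; the current Go version does not support batching, so it is always 1
	BatchID int

	// Extra holds additional user-defined variables
	Extra map[string]string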
}

URIParams holds the variables available to URI templates. It corresponds to the return value of Scrapy's _get_uri_params function.

func NewURIParams ¶

func NewURIParams(spiderName string) URIParams

NewURIParams creates a default URIParams.

func (URIParams) Render ¶

func (p URIParams) Render(template string) string

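Render renders a URI template, replacing placeholders such as "%(name)s" and "%(batch_id)d". Unrecognized placeholders are left unchanged.

Supported placeholders:

%(name)s: Spider name
%(time)s: crawl start time
%(batch_time)s: same as time
%(batch_id)d: batch ID
%(<key>)s: taken from Extra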
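
Placeholder substitution of this kind can be sketched with a regular expression. The render function below is a hypothetical stand-in that takes a flat map of variables; it illustrates the "replace known keys, keep unknown ones" behavior and makes no claim about how URIParams.Render is implemented.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// placeholder matches Python-style "%(key)s" and "%(key)d" markers.
var placeholder = regexp.MustCompile(`%\((\w+)\)[sd]`)

// render substitutes known keys and leaves unknown placeholders untouched.
func render(template string, vars map[string]string) string {
	return placeholder.ReplaceAllStringFunc(template, func(m string) string {
		key := placeholder.FindStringSubmatch(m)[1]
		if v, ok := vars[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	vars := map[string]string{
		"name":     "books",
		"time":     "2026-05-15T08-30-00",
		"batch_id": strconv.Itoa(1),
	}
	fmt.Println(render("out/%(name)s/%(time)s-%(batch_id)d.json", vars))
	fmt.Println(render("out/%(unknown)s.json", vars))
}
```

The first call yields out/books/2026-05-15T08-30-00-1.json, while the second returns its template unchanged because "unknown" is not among the variables.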

type XMLExporter ¶

type XMLExporter struct {
	// contains filtered or unexported fields
}

XMLExporter serializes Items as XML. It corresponds to Scrapy's XmlItemExporter.

Example output (RootElement=items, ItemElement=item):

<?xml version="1.0" encoding="utf-8"?>
<items>
  <item>
    <name>Foo</name>
    <price>10</price>
  </item>
</items>

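Field values are rendered with the following recursive rules:

Slices/arrays: each element expands into a <value>…</value> child node
Maps: each key/value pair expands into a <key>value</key> child node
Other types: converted to text with fmt.Sprint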
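
The three recursive rules can be sketched as a single function. This is a simplified illustration rather than the exporter itself: writeValue and itemXML are hypothetical helpers, only []any and map[string]any are treated as containers, map keys are sorted purely to make the output deterministic, and element names are not validated.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"sort"
)

// writeValue renders v inside <name>...</name> following the three rules:
// slices become <value> children, maps become one child per key, and
// anything else is written as escaped text via fmt.Sprint.
func writeValue(buf *bytes.Buffer, name string, v any) error {
	buf.WriteString("<" + name + ">")
	switch t := v.(type) {
	case []any:
		for _, elem := range t {
			if err := writeValue(buf, "value", elem); err != nil {
				return err
			}
		}
	case map[string]any:
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys) // sorted here only to make the output deterministic
		for _, k := range keys {
			if err := writeValue(buf, k, t[k]); err != nil {
				return err
			}
		}
	default:
		if err := xml.EscapeText(buf, []byte(fmt.Sprint(t))); err != nil {
			return err
		}
	}
	buf.WriteString("</" + name + ">")
	return nil
}

// itemXML renders a single item element without indentation.
func itemXML(item map[string]any) (string, error) {
	var buf bytes.Buffer
	err := writeValue(&buf, "item", item)
	return buf.String(), err
}

func main() {
	out, err := itemXML(map[string]any{
		"name": "Foo & Bar",
		"tags": []any{"a", "b"},
		"dims": map[string]any{"w": 10},
	})
	fmt.Println(out, err)
}
```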

func NewXMLExporter ¶

func NewXMLExporter(w io.Writer, opts ExporterOptions) *XMLExporter

NewXMLExporter creates an Exporter for the XML format.

func (*XMLExporter) ExportItem ¶

func (e *XMLExporter) ExportItem(item any) error

ExportItem writes one Item element.

func (*XMLExporter) FinishExporting ¶

func (e *XMLExporter) FinishExporting() error

FinishExporting writes the closing root tag and flushes the buffer.

func (*XMLExporter) StartExporting ¶

func (e *XMLExporter) StartExporting() error

StartExporting writes the XML declaration and the opening root tag.
