streamhub

package module · v0.1.5 · Published: Mar 28, 2022 · License: MIT

README

✉ Streamhub


Streamhub is a toolkit crafted for streaming-powered applications written in Go.

Requirements

  • Go version >= 1.17

Overall Architecture

Streamhub is composed of several internal components that collaborate with each other to accomplish basic stream-handling operations (publishing and consuming messages).

Streamhub exposes all of its operational capabilities through a simple, idiomatic API, enabling interaction between the program and the actual live infrastructure via a facade component called Hub.

[Figure: Internal Hub architecture and the flows of basic stream operations. Left: message publishing flow. Right: message consumption flow.]
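As a quick orientation, the following minimal sketch wires a Hub together through its facade API. The import path, the fooMessage type, and the stream name are illustrative assumptions; a concrete Writer / Listener Driver (such as the in-memory driver from the examples directory) would be required for real publishing and listening.

package main

import (
	"context"
	"log"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

// fooMessage is a hypothetical message schema used for illustration.
type fooMessage struct {
	Foo string `json:"foo"`
}

func main() {
	// Allocate a Hub using exported options from the streamhub API.
	hub := streamhub.NewHub(
		streamhub.WithInstanceName("org.example.payments"),
		streamhub.WithMarshaler(streamhub.JSONMarshaler{}),
	)

	// Relate the message type to a stream so the Hub can resolve metadata.
	hub.RegisterStream(fooMessage{}, streamhub.StreamMetadata{
		Stream: "foo-stream",
	})

	// Publish; correlation/causation IDs are injected from the context.
	// Without a Writer driver this returns ErrMissingWriterDriver.
	if err := hub.Write(context.Background(), fooMessage{Foo: "bar"}); err != nil {
		log.Println(err)
	}
}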

Message

The Message type is the unit of information used to interact with multiple systems through live infrastructure.

Streamhub natively implements most of the CNCF CloudEvents specification fields to keep messages passed through a stream consistent.

As the CloudEvents specification states, a message is constructed according to the underlying communication protocol of the event bus or message broker (e.g. MQTT, Apache Kafka, raw JSON, Amazon SNS).

For example, if using Apache Kafka, most of the message fields will be attached as binary headers instead of in the body of the message itself. On the other hand, if using Amazon Simple Notification Service, messages will be encoded into the raw JSON template that AWS specifies in its SNS API definition. These processes are independent of Marshaler operations; hence, the codec of the message's inner data (the actual message content) will not change.
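To illustrate the transport unit itself, a raw Message can be allocated with NewMessage (documented in the reference below); all field values in this sketch are illustrative.

import streamhub "github.com/neutrinocorp/streamhub" // assumed import path

// buildRawMessage crafts a transport Message directly; NewMessage fills the
// CloudEvents-style fields from the given arguments.
func buildRawMessage() streamhub.Message {
	return streamhub.NewMessage(streamhub.NewMessageArgs{
		ID:                   "123e4567-e89b-12d3-a456-426614174000",
		Source:               "org.example.payments",
		Stream:               "foo-stream",
		SchemaDefinitionName: "foo-schema",
		SchemaVersion:        1,
		ContentType:          "application/json",
		Data:                 []byte(`{"foo":"bar"}`),
	})
}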

For more information about CloudEvents, please review the specification repository: https://github.com/cloudevents/spec

Stream Registry

A Stream Registry is an in-memory key-value database used by both the Listener Node(s) and the Publisher; it holds metadata about every stream the program will interact with.

Moreover, stream metadata may contain critical information about the stream, such as the stream name (also called topic), the schema definition version, and/or the schema definition name, so that components such as the Publisher and Listener Node can fetch schema definitions from the Schema Registry and continue their operations normally. The stream name defined here is used by both the Publisher and the Listener Node(s) to interact with live infrastructure.

The Stream Registry accepts reflection-based structs, which are registered under the given struct name as a string (e.g. package_name.struct_name -> main.fooMessage). In addition, the registry also accepts plain strings as keys to increase flexibility (one may use the stream name, e.g. foo-stream).

Note: If using plain strings as keys, remember to populate the GoType metadata field so the Listener Node handler can decode incoming message data. If no GoType is found in the stream metadata while consuming a message, marshaling capabilities are disabled to avoid program panics.

Note: Using reflection-based stream definitions leads to performance degradation when listening to streams. A sketch of both registration styles follows.
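The fragment below, reusing the hypothetical fooMessage type from the earlier sketch, shows both registration styles; the reflect2 import path (github.com/modern-go/reflect2) is assumed from the reflect2.Type field of StreamMetadata.

import (
	"github.com/modern-go/reflect2" // assumed source of reflect2.Type

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

func registerStreams(hub *streamhub.Hub) {
	// Reflection-based registration: the key becomes the struct name,
	// e.g. "main.fooMessage".
	hub.RegisterStream(fooMessage{}, streamhub.StreamMetadata{
		Stream:               "foo-stream",
		SchemaDefinitionName: "foo-schema",
		SchemaVersion:        1,
	})

	// Plain-string registration: populate GoType so the Listener Node
	// handler can decode incoming message data.
	hub.RegisterStreamByString("foo-stream", streamhub.StreamMetadata{
		Stream: "foo-stream",
		GoType: reflect2.TypeOf(fooMessage{}),
	})
}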

Unique Identifier Factory

A Unique Identifier Factory is a component that generates unique identifiers using an underlying concrete implementation of a unique-identifier algorithm (e.g. UUID, NanoID). The Publisher component uses it to construct unique messages.
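A custom factory only needs to satisfy the IDFactoryFunc type documented below; this minimal sketch uses crypto/rand and is illustrative, not one of the package's built-in factories.

import (
	"crypto/rand"
	"encoding/hex"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

// randomHexFactory is a hypothetical IDFactoryFunc producing 16-byte hex IDs.
var randomHexFactory streamhub.IDFactoryFunc = func() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// Wire it in: streamhub.NewHub(streamhub.WithIDFactory(randomHexFactory))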

Schema Registry

A Schema Registry is a database which holds message schema definitions and their versions. It ensures that every message produced and consumed by the program complies with the specified schema definition.

The registry MIGHT be implemented using either external or internal underlying solutions (e.g. a third-party service such as Amazon Glue, the host's disk, or in-memory storage).

Note: For Apache Avro message formats, a Schema Registry is a MUST for the Marshaler component to encode and decode message data.
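For basic or testing scenarios, the built-in InMemorySchemaRegistry can hold definitions; the Avro record below and the schema/stream names are illustrative.

import streamhub "github.com/neutrinocorp/streamhub" // assumed import path

func newAvroHub() *streamhub.Hub {
	// Store an Avro schema definition under a name/version pair.
	registry := streamhub.InMemorySchemaRegistry{}
	registry.RegisterDefinition("foo-schema", `{
		"type": "record",
		"name": "fooMessage",
		"fields": [{"name": "foo", "type": "string"}]
	}`, 1)

	// Apache Avro REQUIRES a schema registry for the marshaler to work.
	return streamhub.NewHub(
		streamhub.WithSchemaRegistry(registry),
		streamhub.WithMarshaler(streamhub.NewAvroMarshaler()),
	)
}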

Marshaler

A Marshaler is a component in charge of encoding and decoding message data.

Currently, Streamhub ships native Apache Avro and JSON implementations. Nevertheless, the Marshaler interface is exported through the Streamhub API to give developers flexibility, as it allows custom Marshaler implementations.

We are currently considering adding Protocol Buffers and FlatBuffers/FlexBuffers codecs for edge cases where greater performance is required.
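Since the interface is exported, a custom codec is a matter of implementing three methods; the XML marshaler below is a hypothetical sketch, not part of the package.

import (
	"encoding/xml"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

// xmlMarshaler is a hypothetical custom Marshaler; schemaDef is ignored
// since this codec carries no schema definition.
type xmlMarshaler struct{}

func (xmlMarshaler) Marshal(_ string, data interface{}) ([]byte, error) {
	return xml.Marshal(data)
}

func (xmlMarshaler) Unmarshal(_ string, data []byte, ref interface{}) error {
	return xml.Unmarshal(data, ref)
}

func (xmlMarshaler) ContentType() string { return "application/xml" }

// Compile-time assertion that xmlMarshaler satisfies streamhub.Marshaler.
var _ streamhub.Marshaler = xmlMarshaler{}

// Wire it in: streamhub.NewHub(streamhub.WithMarshaler(xmlMarshaler{}))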

Message Broker / Event Bus Driver

The Message Broker / Event Bus Driver is an abstract component which enables interactions between the Hub's internal components and the actual stream-messaging infrastructure (e.g. Apache Kafka, Amazon SNS/SQS, in-memory).

The driver component implements both the Publisher and Listener Node interfaces. By separating behaviours through interfaces, technology heterogeneity and autonomy between processes are achieved, giving the program even greater flexibility of interaction.

For example, the program might contain a Hub which publishes messages to Amazon Simple Notification Service (SNS) while one set of Listener Nodes polls messages from Amazon Simple Queue Service (SQS) queues and another set receives messages from Apache Kafka topics.

Publisher

A Publisher is a high-level component that lets the program publish messages to desired streams defined on the message broker / event bus, so external programs may react to published messages in parallel.

Furthermore, the publisher API is designed to allow chain-of-responsibility pattern implementations (middlewares) to add extra behaviours when publishing messages (e.g. logging, tracing, monitoring, retries).

Streamhub offers native implementations through the use of a Driver. Nevertheless, custom Publisher implementations are possible, as the Streamhub API exposes the publisher interface (the Writer interface as of v0.1.5); a middleware sketch follows.
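The decorator below is a hypothetical middleware wrapping any Writer (the publisher abstraction as of v0.1.5) to log outgoing messages before delegating to the underlying driver.

import (
	"context"
	"log"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

// loggingWriter wraps a streamhub.Writer and logs every outgoing message.
type loggingWriter struct {
	next streamhub.Writer
}

func (w loggingWriter) Write(ctx context.Context, msg streamhub.Message) error {
	log.Printf("writing message %s to stream %s", msg.ID, msg.Stream)
	return w.next.Write(ctx, msg)
}

func (w loggingWriter) WriteBatch(ctx context.Context, msgs ...streamhub.Message) error {
	log.Printf("writing batch of %d messages", len(msgs))
	return w.next.WriteBatch(ctx, msgs...)
}

// Wire it in: streamhub.NewHub(streamhub.WithWriter(loggingWriter{next: driverWriter}))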

Listener Registry

A Stream Listener Registry is an in-memory database which holds information about the workers to be scheduled when the Hub starts.

Workers are also called Listener Nodes.

Listener Supervisor

The Listener Supervisor is an internal Hub component which manages Listener Node(s) lifecycles.

It forks new workers into the Listener Registry queue and schedules them on Hub startup.

In addition, when forking new workers, the supervisor crafts a Listener Task template from the listener node configuration, which is later passed to the Driver's listener node interface implementations on Hub startup. Drivers use this template internally to access critical data for interacting with live infrastructure (e.g. the stream/topic name, the consumer groups/queues to be used, and vendor-specific configuration such as Amazon Web Services settings or Shopify's Sarama library for Apache Kafka).

Listener Node

A Listener Node is an internal Listener Supervisor component which schedules the actual stream-listening jobs. These jobs are mostly I/O-blocking, so the node will run them concurrently if a degree of parallelism was configured for the worker.

It uses the Driver listener node interface implementation to interact with live infrastructure.

Note: In order to stop Listener Node inner processes, a context cancellation MUST be issued through the root Context originally passed on Hub startup. Moreover, every node job has an internal timeout context derived from the root context to avoid stream-listening jobs hanging or waiting for considerable times, which would directly affect throughput.

Note: Every Listener Node inner process runs inside a new goroutine and uses a timeout-scoped context to keep process autonomy and increase overall throughput.

Listener / ListenerFunc

Each Listener Node contains a specific configuration, as previously mentioned. Aside from critical data for Driver implementations, this configuration holds a Listener or ListenerFunc (interface/type respectively), which represents the entry point for the message-processing operations defined by the developer (the handler for each message received from a queue/topic).

These types/interfaces let the program return an error if something failed while processing the message. If no error is returned, the Driver implementation acknowledges the message to the live infrastructure to avoid message re-processing issues. As a side note and recommendation, keep message processors idempotent to cope with the nature of distributed systems (duplicated and out-of-order messages).

Moreover, the Listener and ListenerFunc APIs were designed to enable chain-of-responsibility pattern implementations (middlewares), just like the Publisher API, letting developers add layers of extra behaviour when processing a message.

Note that Streamhub adds layers of behaviour by default for every forked Listener/ListenerFunc (see the registration sketch after this list). These behaviours include:

  • Exponential backoff retrying (fully customizable)
  • Correlation and Causation IDs injection into the handler-scoped context
  • Unmarshaling*
  • Logging*
  • Monitoring/Metrics*
  • Tracing*

* Available if properly configured
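The registration sketch below reuses the hypothetical fooMessage type from the earlier quickstart; whether DecodedData holds a value or a pointer depends on how unmarshaling was configured, so the type assertion is illustrative.

import (
	"context"
	"log"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

func registerListeners(hub *streamhub.Hub) error {
	// Returning nil acknowledges the message to the driver; returning an
	// error triggers the exponential backoff retry behaviour.
	return hub.Listen(fooMessage{},
		streamhub.WithGroup("foo-consumer-group"),
		streamhub.WithListenerFunc(func(_ context.Context, msg streamhub.Message) error {
			data, ok := msg.DecodedData.(fooMessage)
			if !ok {
				return nil // keep handlers defensive and idempotent
			}
			log.Printf("received foo = %s", data.Foo)
			return nil
		}),
	)
}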

Supported infrastructure

  • Apache Kafka (on-premise, Confluent cloud or Amazon Managed Streaming for Apache Kafka/MSK)
  • Amazon Simple Notification Service (SNS) and Simple Queue Service (SQS) with the Topic-Queue chaining pattern implementation
  • Apache Pulsar*
  • MQTT-based buses/brokers (e.g. RabbitMQ, Apache ActiveMQ)*
  • Google Cloud PubSub*
  • Microsoft Azure Service Bus*
  • Redis Streams*

* On Streamhub's roadmap, not yet implemented.

Documentation

Overview

Package streamhub is a toolkit crafted for streaming-powered applications written in Go.


Constants

const CloudEventsSpecVersion = "1.0"

CloudEventsSpecVersion the CloudEvents specification version used by streamhub

Variables

var (
	// DefaultConcurrencyLevel default stream-listening jobs to be running concurrently for each ListenerNode.
	DefaultConcurrencyLevel = 1
	// DefaultRetryInitialInterval default initial interval duration between each stream-listening job provisioning on failures.
	DefaultRetryInitialInterval = time.Second * 3
	// DefaultRetryMaxInterval default maximum interval duration between each stream-listening job provisioning on failures.
	DefaultRetryMaxInterval = time.Second * 15
	// DefaultRetryTimeout default duration of each stream-listening job provisioning on failures.
	DefaultRetryTimeout = time.Second * 15
	// DefaultMaxHandlerPoolSize default pool size of goroutines for ListenerNode's Listener(s) / ListenerFunc(s) executions.
	DefaultMaxHandlerPoolSize = 10
)
var DefaultHubInstanceName = "com.streamhub"

DefaultHubInstanceName the default instance name for nameless Hub instances

var (
	// ErrInvalidProtocolBufferFormat the given data is not a valid protocol buffer message
	ErrInvalidProtocolBufferFormat = errors.New("streamhub: Invalid protocol buffer data")
)
var ErrMissingSchemaDefinition = errors.New("streamhub: Missing stream schema definition in schema registry")

ErrMissingSchemaDefinition the requested stream message definition was not found in the SchemaRegistry

var ErrMissingStream = errors.New("streamhub: Missing stream entry in stream registry")

ErrMissingStream the requested stream was not found in the StreamRegistry

var ErrMissingWriterDriver = errors.New("streamhub: Missing writer driver")

ErrMissingWriterDriver no publisher driver was found.

var ListenerBaseBehaviours = []ListenerBehaviour{
	unmarshalListenerBehaviour,
	injectGroupListenerBehaviour,
	injectTxIDsListenerBehaviour,
	retryListenerBehaviour,
}

ListenerBaseBehaviours default ListenerBehaviours

Behaviours will be executed in descending order

var ListenerBaseBehavioursNoUnmarshal = []ListenerBehaviour{
	injectGroupListenerBehaviour,
	injectTxIDsListenerBehaviour,
	retryListenerBehaviour,
}

ListenerBaseBehavioursNoUnmarshal default ListenerBehaviours without unmarshaling

Behaviours will be executed in descending order

Functions

func InjectMessageCausationID

func InjectMessageCausationID(ctx context.Context, messageID string) string

InjectMessageCausationID injects the causation id from the given context if available. If not, it will use the message id as fallback.

func InjectMessageCorrelationID

func InjectMessageCorrelationID(ctx context.Context, messageID string) string

InjectMessageCorrelationID injects the correlation id from the given context if available. If not, it will use the message id as fallback.
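A small sketch of the fallback behaviour these helpers document; it assumes the helpers read a plain string stored under the MessageContextKey constants listed further below.

import (
	"context"
	"fmt"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

func traceIDs() {
	// With an empty context, the helper falls back to the message ID.
	ctx := context.Background()
	fmt.Println(streamhub.InjectMessageCorrelationID(ctx, "msg-1")) // msg-1

	// With a correlation ID present in the context, that value is used
	// instead (assumption: a plain string is stored under the key).
	ctx = context.WithValue(ctx, streamhub.ContextCorrelationID, "corr-42")
	fmt.Println(streamhub.InjectMessageCorrelationID(ctx, "msg-1")) // corr-42
}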

Types

type AvroMarshaler

type AvroMarshaler struct {
	HashingFactory Hashing64AlgorithmFactory
	// contains filtered or unexported fields
}

AvroMarshaler handles data transformation between primitives and Apache Avro format.

Apache Avro REQUIRES a defined SchemaRegistry to decode/encode data.

func NewAvroMarshaler

func NewAvroMarshaler() AvroMarshaler

NewAvroMarshaler allocates a new Apache Avro marshaler with a simple caching system to reduce memory footprint and computational usage when parsing Avro schema definition files.

func (AvroMarshaler) ContentType

func (a AvroMarshaler) ContentType() string

ContentType retrieves the encoding/decoding Apache Avro format using RFC 2046 standard (application/avro).

func (AvroMarshaler) Marshal

func (a AvroMarshaler) Marshal(schemaDef string, data interface{}) (parsedData []byte, err error)

Marshal transforms a complex data type into a primitive binary array for data transportation using Apache Avro format.

func (AvroMarshaler) Unmarshal

func (a AvroMarshaler) Unmarshal(schemaDef string, data []byte, ref interface{}) (err error)

Unmarshal transforms a primitive binary array to a complex data type for data processing using Apache Avro format.

type Event

type Event interface {
	// GetSubject This describes the subject of the event in the context of the event producer (identified by source).
	// In publish-subscribe scenarios, a subscriber will typically subscribe to events emitted by a source, but the
	// source identifier alone might not be sufficient as a qualifier for any specific event if the source
	// context has internal sub-structure.
	//
	// Identifying the subject of the event in context metadata (opposed to only in the data payload) is particularly
	// helpful in generic subscription filtering scenarios where middleware is unable to interpret the data content.
	// In the above example, the subscriber might only be interested in blobs with names ending with '.jpg' or '.jpeg'
	// and the subject attribute allows for constructing a simple and efficient string-suffix filter for that
	// subset of events.
	GetSubject() string
}

Event is an abstract message unit used by streamhub-based systems to publish messages with the `subject` field of a Message populated

type FailingMarshalerNoop

type FailingMarshalerNoop struct{}

FailingMarshalerNoop the no-operation failing Marshaler

For testing purposes only

func (FailingMarshalerNoop) ContentType

func (f FailingMarshalerNoop) ContentType() string

ContentType the failing content type operation

func (FailingMarshalerNoop) Marshal

func (f FailingMarshalerNoop) Marshal(_ string, _ interface{}) ([]byte, error)

Marshal the failing marshal operation

func (FailingMarshalerNoop) Unmarshal

func (f FailingMarshalerNoop) Unmarshal(_ string, _ []byte, _ interface{}) error

Unmarshal the failing unmarshal operation

type Hashing64AlgorithmFactory

type Hashing64AlgorithmFactory func() hash.Hash64

Hashing64AlgorithmFactory factory for hash.Hash64 algorithms (used by Apache Avro schema definition caching system)

var DefaultHashing64AlgorithmFactory Hashing64AlgorithmFactory = func() hash.Hash64 {
	return fnv.New64a()
}

DefaultHashing64AlgorithmFactory the default hashing64 algorithm factory for Marshaler schema definition caching layer

type Hub

type Hub struct {
	InstanceName        string
	StreamRegistry      StreamRegistry
	Writer              Writer
	Marshaler           Marshaler
	IDFactory           IDFactoryFunc
	SchemaRegistry      SchemaRegistry
	ListenerDriver      ListenerDriver
	ListenerBehaviours  []ListenerBehaviour
	ListenerBaseOptions []ListenerNodeOption
	// contains filtered or unexported fields
}

Hub is the main component which enables interactions between several systems through the usage of streams.

func NewHub

func NewHub(opts ...HubOption) *Hub

NewHub allocates a new Hub

func (*Hub) Listen

func (h *Hub) Listen(message interface{}, opts ...ListenerNodeOption) error

Listen registers a new stream-listening background job.

If listening to a Google Protocol Buffers message, DO NOT use a pointer as the message schema, to avoid marshaling problems

func (*Hub) ListenByStreamKey

func (h *Hub) ListenByStreamKey(stream string, opts ...ListenerNodeOption)

ListenByStreamKey registers a new stream-listening background job using the raw stream identifier (e.g. topic name).

func (*Hub) RegisterStream

func (h *Hub) RegisterStream(message interface{}, metadata StreamMetadata)

RegisterStream creates a relation between a stream message type and metadata.

If registering a Google Protocol Buffers message, DO NOT use a pointer as the message schema, to avoid marshaling problems

func (*Hub) RegisterStreamByString

func (h *Hub) RegisterStreamByString(messageType string, metadata StreamMetadata)

RegisterStreamByString creates a relation between a string key and metadata.

func (*Hub) Start

func (h *Hub) Start(ctx context.Context)

Start initiates all daemon processes (e.g. stream-listening jobs)

func (*Hub) Write added in v0.1.5

func (h *Hub) Write(ctx context.Context, message interface{}) error

Write inserts a message into a stream assigned to the message in the StreamRegistry in order to propagate the data to a set of subscribed systems for further processing.

Uses given context to inject correlation and causation IDs.

func (*Hub) WriteBatch added in v0.1.5

func (h *Hub) WriteBatch(ctx context.Context, messages ...interface{}) error

WriteBatch inserts a set of messages into a stream assigned on the StreamRegistry in order to propagate the data to a set of subscribed systems for further processing.

Uses given context to inject correlation and causation IDs.

If an item from the batch fails, the other items will fail too

func (*Hub) WriteByMessageKey added in v0.1.5

func (h *Hub) WriteByMessageKey(ctx context.Context, messageKey string, message interface{}) error

WriteByMessageKey inserts a message into a stream using the custom message key from StreamRegistry in order to propagate the data to a set of subscribed systems for further processing.

Uses given context to inject correlation and causation IDs.

func (*Hub) WriteByMessageKeyBatch added in v0.1.5

func (h *Hub) WriteByMessageKeyBatch(ctx context.Context, items WriteByMessageKeyBatchItems) error

WriteByMessageKeyBatch inserts a set of messages into a stream using the custom message key from StreamRegistry in order to propagate the data to a set of subscribed systems for further processing.

Uses given context to inject correlation and causation IDs.

If an item from the batch fails, the other items will fail too

func (*Hub) WriteRawMessage added in v0.1.5

func (h *Hub) WriteRawMessage(ctx context.Context, message Message) error

WriteRawMessage inserts a raw transport message into a stream in order to propagate the data to a set of subscribed systems for further processing.

Uses given context to inject correlation and causation IDs.

func (*Hub) WriteRawMessageBatch added in v0.1.5

func (h *Hub) WriteRawMessageBatch(ctx context.Context, messages ...Message) error

WriteRawMessageBatch inserts a set of raw transport messages into a stream in order to propagate the data to a set of subscribed systems for further processing.

Uses given context to inject correlation and causation IDs.

The whole batch will be passed to the underlying Writer driver implementation as every driver has its own way to deal with batches

type HubOption

type HubOption interface {
	// contains filtered or unexported methods
}

HubOption enables configuration of a Hub instance.

func WithIDFactory

func WithIDFactory(f IDFactoryFunc) HubOption

WithIDFactory sets the default unique identifier factory of a Hub instance.

func WithInstanceName

func WithInstanceName(n string) HubOption

WithInstanceName sets the name of a Hub instance.

func WithListenerBaseOptions

func WithListenerBaseOptions(opts ...ListenerNodeOption) HubOption

WithListenerBaseOptions sets a list of ListenerNodeOption of a Hub instance used as global options for each listener node

func WithListenerBehaviours

func WithListenerBehaviours(b ...ListenerBehaviour) HubOption

WithListenerBehaviours sets a list of ListenerBehaviour of a Hub instance ready to be executed by every stream-listening job's ListenerFunc or Listener component.

func WithListenerDriver

func WithListenerDriver(d ListenerDriver) HubOption

WithListenerDriver sets the default listener driver of a Hub instance.

func WithMarshaler

func WithMarshaler(m Marshaler) HubOption

WithMarshaler sets the default marshaler of a Hub instance.

func WithSchemaRegistry

func WithSchemaRegistry(r SchemaRegistry) HubOption

WithSchemaRegistry sets the schema registry of a Hub instance for stream message schema definitions.

func WithWriter added in v0.1.5

func WithWriter(p Writer) HubOption

WithWriter sets the writer of a Hub instance.

If both Writer and WriterFunc are defined, Writer will override WriterFunc.

type IDFactoryFunc

type IDFactoryFunc func() (string, error)

IDFactoryFunc creates a unique identifier.

var RandInt64Factory IDFactoryFunc = func() (string, error) {
	i := rand.Int63()
	return strconv.Itoa(int(i)), nil
}

RandInt64Factory creates a unique identifier using the built-in math/rand package, formatted as a 64-bit signed integer

var UuidIdFactory IDFactoryFunc = func() (string, error) {
	id, err := uuid.NewUUID()
	return id.String(), err
}

UuidIdFactory creates a unique identifier using the UUID algorithm (note: uuid.NewUUID generates a version 1, time-based UUID).

type InMemorySchemaRegistry

type InMemorySchemaRegistry map[string]string

InMemorySchemaRegistry is the in-memory schema registry, crafted especially for basic and/or testing scenarios.

func (InMemorySchemaRegistry) GetSchemaDefinition

func (i InMemorySchemaRegistry) GetSchemaDefinition(name string, version int) (string, error)

GetSchemaDefinition retrieves a schema definition (in string format) from the registry

func (InMemorySchemaRegistry) RegisterDefinition

func (i InMemorySchemaRegistry) RegisterDefinition(name, def string, version int)

RegisterDefinition stores the given schema definition into the registry

type JSONMarshaler

type JSONMarshaler struct{}

JSONMarshaler handles data transformation between primitives and JSON format.

func (JSONMarshaler) ContentType

func (m JSONMarshaler) ContentType() string

ContentType retrieves the encoding/decoding JSON format using RFC 2046 standard (application/json).

func (JSONMarshaler) Marshal

func (m JSONMarshaler) Marshal(_ string, data interface{}) ([]byte, error)

Marshal transforms a complex data type into a primitive binary array for data transportation using JSON format.

func (JSONMarshaler) Unmarshal

func (m JSONMarshaler) Unmarshal(_ string, data []byte, ref interface{}) error

Unmarshal transforms a primitive binary array to a complex data type for data processing using JSON format.

type Listener

type Listener interface {
	// Listen starts the execution process triggered when a message is received from a stream.
	//
	// Returns an error to indicate the process has failed so Hub will retry the processing using exponential backoff.
	Listen(context.Context, Message) error
}

Listener is a wrapping structure of the ListenerFunc handler for complex data-processing scenarios.

type ListenerBehaviour

type ListenerBehaviour func(node *ListenerNode, hub *Hub, next ListenerFunc) ListenerFunc

ListenerBehaviour is a middleware function with extra functionality which will be executed prior to a ListenerFunc or Listener component, for every stream-listening job instance registered into a Hub.

The middleware is injected with the contextual ListenerNode (the stream-listening job to be executed), the root Hub instance, and the next middleware function.

Moreover, there are built-in behaviours ready to be used with streamhub:

- Retry backoff

- Correlation and causation ID injection

- Consumer group injection

- Auto-unmarshalling (*only if using reflection-based stream registry or GoType was defined when registering stream)

- Logging*

- Metrics*

- Tracing*

* Requires manual specification in the configuration
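A custom behaviour is just a function matching the signature above; the hypothetical sketch below measures handler latency and shows one way to keep the exported defaults alongside it.

import (
	"context"
	"log"
	"time"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

// elapsedTimeBehaviour logs how long each handler execution took.
var elapsedTimeBehaviour streamhub.ListenerBehaviour = func(
	node *streamhub.ListenerNode, _ *streamhub.Hub, next streamhub.ListenerFunc,
) streamhub.ListenerFunc {
	return func(ctx context.Context, msg streamhub.Message) error {
		start := time.Now()
		err := next(ctx, msg)
		log.Printf("stream %s handled message %s in %s",
			node.Stream, msg.ID, time.Since(start))
		return err
	}
}

// Wire it in while keeping the defaults (WithListenerBehaviours sets the list):
// streamhub.NewHub(streamhub.WithListenerBehaviours(
//	append(streamhub.ListenerBaseBehaviours, elapsedTimeBehaviour)...))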

type ListenerDriver

type ListenerDriver interface {
	// ExecuteTask addresses the stream-listening task
	ExecuteTask(_ context.Context, _ ListenerTask) error
}

ListenerDriver defines the underlying implementation of the stream-listening job, which addresses the usage of custom protocols and/or APIs from providers (Apache Kafka, Amazon SQS, ...).

type ListenerFunc

type ListenerFunc func(context.Context, Message) error

ListenerFunc is the execution process triggered when a message is received from a stream.

Returns an error to indicate the process has failed so Hub will retry the processing using exponential backoff.

type ListenerNode

type ListenerNode struct {
	Stream                string
	HandlerFunc           ListenerFunc
	Group                 string
	ProviderConfiguration interface{}
	ConcurrencyLevel      int
	RetryInitialInterval  time.Duration
	RetryMaxInterval      time.Duration
	RetryTimeout          time.Duration
	ListenerDriver        ListenerDriver
	MaxHandlerPoolSize    int
}

ListenerNode is the worker unit which schedules stream-listening job(s).

Each ListenerNode is independent of other nodes to guarantee resiliency of interleaved processes and avoid cascading failures.

type ListenerNodeOption

type ListenerNodeOption interface {
	// contains filtered or unexported methods
}

ListenerNodeOption enables configuration of a ListenerNode.

func WithConcurrencyLevel

func WithConcurrencyLevel(n int) ListenerNodeOption

WithConcurrencyLevel sets the concurrency level of a ListenerNode; in other words, the number of jobs to be scheduled by the ListenerNode.

Note: If the level is defined as less than or equal to 0, the ListenerNode will schedule 1 job

func WithDriver

func WithDriver(d ListenerDriver) ListenerNodeOption

WithDriver sets the driver of a ListenerNode (e.g. Apache Kafka, Apache Pulsar, Amazon SQS).

func WithGroup

func WithGroup(g string) ListenerNodeOption

WithGroup sets the consumer group or queue name of a ListenerNode.

Note: It may not be available for some providers.

func WithListener

func WithListener(l Listener) ListenerNodeOption

WithListener sets the Listener of a ListenerNode.

func WithListenerFunc

func WithListenerFunc(l ListenerFunc) ListenerNodeOption

WithListenerFunc sets the ListenerFunc of a ListenerNode.

func WithMaxHandlerPoolSize

func WithMaxHandlerPoolSize(n int) ListenerNodeOption

WithMaxHandlerPoolSize sets the maximum number of goroutines executed by a ListenerNode's Listener or ListenerFunc.

Note: If the size is defined as less than or equal to 0, the ListenerNode internal implementations will allocate a semaphore of 10 goroutines per handler.

func WithProviderConfiguration

func WithProviderConfiguration(cfg interface{}) ListenerNodeOption

WithProviderConfiguration sets the custom provider configuration of a ListenerNode (e.g. aws.Config, sarama.Config).

func WithRetryInitialInterval

func WithRetryInitialInterval(d time.Duration) ListenerNodeOption

WithRetryInitialInterval sets the initial duration interval for each retrying task of a ListenerNode.

func WithRetryMaxInterval

func WithRetryMaxInterval(d time.Duration) ListenerNodeOption

WithRetryMaxInterval sets the maximum duration interval for each retrying task of a ListenerNode.

func WithRetryTimeout

func WithRetryTimeout(d time.Duration) ListenerNodeOption

WithRetryTimeout sets the maximum duration for retrying tasks of a ListenerNode.
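Combining several of these options at registration time might look like the following sketch; hub, fooMessage, and the handler are carried over from the earlier hypothetical snippets, and all values are illustrative.

import (
	"time"

	streamhub "github.com/neutrinocorp/streamhub" // assumed import path
)

func listenWithOptions(hub *streamhub.Hub, handleFoo streamhub.ListenerFunc) error {
	return hub.Listen(fooMessage{},
		streamhub.WithGroup("foo-consumer-group"),
		streamhub.WithConcurrencyLevel(3),    // schedule 3 concurrent jobs
		streamhub.WithMaxHandlerPoolSize(25), // up to 25 handler goroutines
		streamhub.WithRetryInitialInterval(time.Second),
		streamhub.WithRetryMaxInterval(10*time.Second),
		streamhub.WithRetryTimeout(30*time.Second),
		streamhub.WithListenerFunc(handleFoo),
	)
}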

type ListenerNoop

type ListenerNoop struct{}

ListenerNoop the no-operation implementation of Listener

func (ListenerNoop) Listen

func (l ListenerNoop) Listen(_ context.Context, _ Message) error

Listen the no-operation implementation of Listener.Listen()

type ListenerTask

type ListenerTask struct {
	Stream             string
	HandlerFunc        ListenerFunc
	Group              string
	Configuration      interface{}
	Timeout            time.Duration
	MaxHandlerPoolSize int
}

ListenerTask holds job metadata to be executed by the ListenerDriver.

type Marshaler

type Marshaler interface {
	// Marshal transforms a complex data type into a primitive binary array for data transportation.
	Marshal(schemaDef string, data interface{}) ([]byte, error)
	// Unmarshal transforms a primitive binary array to a complex data type for data processing.
	Unmarshal(schemaDef string, data []byte, ref interface{}) error
	// ContentType retrieves the encoding/decoding format using RFC 2046 standard (e.g. application/json).
	ContentType() string
}

Marshaler handles data transformation between primitives and specific codecs/formats (e.g. JSON, Apache Avro).

type Message

type Message struct {
	ID          string `json:"id"`
	Stream      string `json:"stream"`
	Source      string `json:"source"`
	SpecVersion string `json:"specversion"`
	Type        string `json:"type"`
	Data        []byte `json:"data"`

	DataContentType   string `json:"datacontenttype,omitempty"`
	DataSchema        string `json:"dataschema,omitempty"`
	DataSchemaVersion int    `json:"dataschemaversion,omitempty"`
	Timestamp         string `json:"time,omitempty"`
	Subject           string `json:"subject,omitempty"`

	// Streamhub fields
	CorrelationID string `json:"correlation_id"`
	CausationID   string `json:"causation_id"`

	// consumer-only fields
	DecodedData interface{} `json:"-"`
	GroupName   string      `json:"-"`
}

Message is a unit of information which holds the primitive message (data) in binary format, along with multiple fields used to preserve a schema definition within a stream pipeline.

The schema is based on the Cloud Native Computing Foundation (CNCF)'s CloudEvents specification.

For more information, please look: https://github.com/cloudevents/spec

func NewMessage

func NewMessage(args NewMessageArgs) Message

NewMessage allocates an immutable Message ready to be transported in a stream.

type MessageContextKey

type MessageContextKey string

MessageContextKey is the streamhub context key to inject data into transport messages.

const (
	// ContextCorrelationID is the main trace of a stream processing. Once generated, it MUST NOT be generated again
	// to keep track of the process from the beginning.
	ContextCorrelationID MessageContextKey = "shub-correlation-id"
	// ContextCausationID is reference of the last message processed. This helps to know a direct relation between
	// a new process and the past one.
	ContextCausationID MessageContextKey = "shub-causation-id"
)

type NewMessageArgs

type NewMessageArgs struct {
	SchemaVersion        int
	Data                 []byte
	ID                   string
	Source               string
	Stream               string
	SchemaDefinitionName string
	ContentType          string
	GroupName            string
	Subject              string
}

NewMessageArgs arguments required by NewMessage function to operate.

type NoopSchemaRegistry

type NoopSchemaRegistry struct{}

NoopSchemaRegistry is the no-operation implementation of SchemaRegistry

func (NoopSchemaRegistry) GetSchemaDefinition

func (n NoopSchemaRegistry) GetSchemaDefinition(_ string, _ int) (string, error)

GetSchemaDefinition retrieves an empty string and a nil error

type ProtocolBuffersMarshaler

type ProtocolBuffersMarshaler struct{}

ProtocolBuffersMarshaler handles data transformation between primitives and Google Protocol Buffers format

func (ProtocolBuffersMarshaler) ContentType

func (p ProtocolBuffersMarshaler) ContentType() string

ContentType retrieves the encoding/decoding Google Protocol Buffers format using the latest conventions.

More information here: https://github.com/google/protorpc/commit/eb03145a6a7c72ae6cc43867d9635a5b8d8c4545

func (ProtocolBuffersMarshaler) Marshal

func (p ProtocolBuffersMarshaler) Marshal(_ string, data interface{}) ([]byte, error)

Marshal transforms a complex data type into a primitive binary array for data transportation using Google Protocol Buffers format

func (ProtocolBuffersMarshaler) Unmarshal

func (p ProtocolBuffersMarshaler) Unmarshal(_ string, data []byte, ref interface{}) error

Unmarshal transforms a primitive binary array to a complex data type for data processing using Google Protocol Buffers format

type SchemaRegistry

type SchemaRegistry interface {
	// GetSchemaDefinition retrieves a schema definition (in string format) from the registry
	GetSchemaDefinition(name string, version int) (string, error)
}

SchemaRegistry is an external storage of stream message schemas definitions with proper versioning.

Examples of such schema registries are the Amazon Glue Schema Registry and the Confluent Schema Registry.

type StreamMetadata

type StreamMetadata struct {
	Stream               string
	SchemaDefinitionName string
	SchemaVersion        int
	GoType               reflect2.Type
}

StreamMetadata contains information of stream messages.

type StreamRegistry

type StreamRegistry map[string]StreamMetadata

StreamRegistry is an in-memory storage of streams metadata used by Hub and any external agent to set and retrieve information about a specific stream.

Uses a custom string (or Go's struct type as string) as key.

Note: A message key differs from the stream name, as the message key COULD be anything the developer sets within the stream registry. Thus, scenarios where multiple data types require publishing messages to the same stream are possible. Moreover, reflection-based registries set the message key with the reflect.TypeOf function, so it will differ from the actual stream name.

func (StreamRegistry) Get

func (r StreamRegistry) Get(message interface{}) (StreamMetadata, error)

Get retrieves a stream message metadata from a stream message type.

func (StreamRegistry) GetByStreamName

func (r StreamRegistry) GetByStreamName(name string) (StreamMetadata, error)

GetByStreamName retrieves a stream message metadata from a stream name.

It contains an optimistic lookup mechanism to keep constant time and space complexity.

If metadata is not found for the given key, it falls back to an O(n) lookup. This adds the GetByString base complexity to the fallback's time and space complexity. Nevertheless, GetByString always runs in constant time, so it is guaranteed to add only a constant term to the overall GetByStreamName complexity (e.g. if GetByString takes 49.75 ns/op, then GetByStreamName = original ns/op + GetByString ns/op).

This optimistic lookup keeps amortized time and space complexity when using non-reflection-based registrations on the root Hub (using only the String methods of this very Stream Registry component). Thus, greater performance is achieved in scenarios where reflection-based stream registration is not required by the program.

func (StreamRegistry) GetByString

func (r StreamRegistry) GetByString(key string) (StreamMetadata, error)

GetByString retrieves a stream message metadata from a string key.

func (StreamRegistry) Set

func (r StreamRegistry) Set(message interface{}, metadata StreamMetadata)

Set creates a relation between a stream message type and metadata.

func (StreamRegistry) SetByString

func (r StreamRegistry) SetByString(key string, metadata StreamMetadata)

SetByString creates a relation between a string key and metadata.

type WriteByMessageKeyBatchItems added in v0.1.5

type WriteByMessageKeyBatchItems map[string]interface{}

WriteByMessageKeyBatchItems items to be written as a batch by the Hub.WriteByMessageKeyBatch() function

type Writer added in v0.1.5

type Writer interface {
	// Write inserts a message into a stream assigned to the message in the StreamRegistry in order to propagate the
	// data to a set of subscribed systems for further processing.
	Write(ctx context.Context, message Message) error
	// WriteBatch inserts a set of messages into a stream assigned to the message in the StreamRegistry in order to propagate the
	// data to a set of subscribed systems for further processing.
	//
	// Depending on the underlying Writer driver implementation, this function MIGHT return an error if a single operation failed,
	// or it MIGHT return an error if the whole operation failed
	WriteBatch(ctx context.Context, messages ...Message) error
}

Writer inserts messages into streams assigned on the StreamRegistry in order to propagate the data to a set of subscribed systems for further processing.

This type should be provided by a streamhub Driver (e.g. Apache Pulsar, Apache Kafka, Amazon SNS)

var NoopWriter Writer = noopWriter{}

NoopWriter is the no-operation implementation of Writer

Directories

Path Synopsis
benchmarks
examples
Package streamhub-memory is the In-Memory implementation for Streamhub-based programs.
