Documentation ¶
Index ¶
- Variables
- func Hash(keys ...string) string
- type DelimiterDeserializer
- type Deserializer
- type DeserializerFunc
- func NewDelimiterDeserializer(delimiter []byte, fn func([]byte) Emitter) DeserializerFunc
- func NewGOBDeserializer(fn func() Emitter) DeserializerFunc
- func NewJSONDeserializer(fn func() Emitter) DeserializerFunc
- func NewMsgPackDeserializer(fn func() Emitter) DeserializerFunc
- func NewTSVDeserializer(constructor func() Emitter) DeserializerFunc
- type Emitter
- type Finalizer
- type GOBDeserializer
- type JSONDeserializer
- type Map
- type MsgPackDeserializer
- type Pipeline
- type Reduce
- type Reducer
- type Reducers
- type Sorter
- type TSVDeserializer
Constants ¶
This section is empty.
Variables ¶
var DefaultKeyFieldDelimiter = []byte("\t")
var NewLine = []byte("\n")
Functions ¶
Types ¶
type DelimiterDeserializer ¶
type DelimiterDeserializer struct {
// contains filtered or unexported fields
}
DelimiterDeserializer is a deserializer that splits input rows based on the configured delimiter. It is suitable for use with Hadoop Streaming.
func (*DelimiterDeserializer) Error ¶
func (g *DelimiterDeserializer) Error() error
Error returns the last error to occur or nil if there are no errors.
func (*DelimiterDeserializer) HasNext ¶
func (g *DelimiterDeserializer) HasNext() bool
HasNext advances the underlying scanner and returns true when data is available. It will return false once no data is available or an error occurs.
func (*DelimiterDeserializer) Next ¶
func (g *DelimiterDeserializer) Next() Emitter
Next retrieves the next available Emitter from the underlying scanner by calling the configured constructor function.
type Deserializer ¶
type Deserializer interface {
	// HasNext should return true if there is data available
	HasNext() bool
	// Next should return the next available Emitter interface
	Next() Emitter
	// Error should return the last error to occur
	Error() error
}
Deserializer is an interface for deserializing the input data into Emitter interfaces which can then be used within other pipeline methods.
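The HasNext/Next/Error contract above suggests a simple consumption loop. The following is a minimal sketch under stated assumptions: the Emitter and Deserializer interfaces are re-declared locally so the example compiles on its own, and `line`, `sliceDeserializer`, `drain`, and `drainToString` are hypothetical names that are not part of this package.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Emitter and Deserializer mirror the interfaces documented on this page
// so that the sketch is self-contained.
type Emitter interface {
	Emit(w io.Writer) error
	Where() bool
}

type Deserializer interface {
	HasNext() bool
	Next() Emitter
	Error() error
}

// line is a hypothetical Emitter that writes itself as one line of text.
type line string

func (l line) Emit(w io.Writer) error {
	_, err := fmt.Fprintln(w, string(l))
	return err
}

// Where filters out empty lines.
func (l line) Where() bool { return l != "" }

// sliceDeserializer is a hypothetical in-memory Deserializer.
type sliceDeserializer struct {
	rows []string
	pos  int
}

// HasNext advances to the next row and reports whether one exists.
func (s *sliceDeserializer) HasNext() bool {
	if s.pos >= len(s.rows) {
		return false
	}
	s.pos++
	return true
}

// Next returns the row that the last HasNext call advanced to.
func (s *sliceDeserializer) Next() Emitter { return line(s.rows[s.pos-1]) }

// Error always returns nil because an in-memory slice cannot fail.
func (s *sliceDeserializer) Error() error { return nil }

// drain emits every Emitter whose Where method returns true, then
// reports the deserializer's last error.
func drain(d Deserializer, w io.Writer) error {
	for d.HasNext() {
		e := d.Next()
		if !e.Where() {
			continue
		}
		if err := e.Emit(w); err != nil {
			return err
		}
	}
	return d.Error()
}

// drainToString collects the drained output in a string.
func drainToString(d Deserializer) (string, error) {
	var sb strings.Builder
	err := drain(d, &sb)
	return sb.String(), err
}

func main() {
	out, err := drainToString(&sliceDeserializer{rows: []string{"a", "", "b"}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // prints "a" and "b" on separate lines
}
```

Checking Error only after the loop ends follows the documented behaviour that HasNext returns false once an error occurs.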
type DeserializerFunc ¶
type DeserializerFunc func(io.Reader) Deserializer
DeserializerFunc is a function that accepts an io.Reader and returns a Deserializer interface.
func NewDelimiterDeserializer ¶
func NewDelimiterDeserializer(delimiter []byte, fn func([]byte) Emitter) DeserializerFunc
NewDelimiterDeserializer returns a DeserializerFunc that uses delimiter as the row delimiter and fn as the constructor function.
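To illustrate how a delimiter-based DeserializerFunc could be built, here is a rough sketch on top of bufio.Scanner. It is not the package's implementation: the interfaces are re-declared locally, and `newDelimiterDeserializer`, `delimiterDeserializer`, `splitOn`, `row`, and `render` are hypothetical names. The sketch assumes a non-empty delimiter.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// Local mirrors of the documented types, so the sketch compiles on its own.
type Emitter interface {
	Emit(w io.Writer) error
	Where() bool
}

type Deserializer interface {
	HasNext() bool
	Next() Emitter
	Error() error
}

type DeserializerFunc func(io.Reader) Deserializer

// splitOn returns a bufio.SplitFunc that yields the chunks between
// occurrences of delim. Assumption: delim is not empty.
func splitOn(delim []byte) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		if atEOF && len(data) == 0 {
			return 0, nil, nil
		}
		if i := bytes.Index(data, delim); i >= 0 {
			return i + len(delim), data[:i], nil
		}
		if atEOF {
			// Final chunk without a trailing delimiter.
			return len(data), data, nil
		}
		return 0, nil, nil // request more data
	}
}

// delimiterDeserializer is a hypothetical scanner-backed Deserializer.
type delimiterDeserializer struct {
	scanner *bufio.Scanner
	fn      func([]byte) Emitter
}

func (d *delimiterDeserializer) HasNext() bool { return d.scanner.Scan() }
func (d *delimiterDeserializer) Next() Emitter { return d.fn(d.scanner.Bytes()) }
func (d *delimiterDeserializer) Error() error  { return d.scanner.Err() }

// newDelimiterDeserializer has the same shape as the documented
// constructor but is only an illustration.
func newDelimiterDeserializer(delimiter []byte, fn func([]byte) Emitter) DeserializerFunc {
	return func(r io.Reader) Deserializer {
		s := bufio.NewScanner(r)
		s.Split(splitOn(delimiter))
		return &delimiterDeserializer{scanner: s, fn: fn}
	}
}

// row is a hypothetical Emitter holding one delimited chunk.
type row string

func (r row) Emit(w io.Writer) error {
	_, err := fmt.Fprintln(w, string(r))
	return err
}

func (r row) Where() bool { return true }

// render feeds input through df and collects everything that is emitted.
func render(df DeserializerFunc, input string) (string, error) {
	d := df(strings.NewReader(input))
	var sb strings.Builder
	for d.HasNext() {
		if err := d.Next().Emit(&sb); err != nil {
			return "", err
		}
	}
	return sb.String(), d.Error()
}

func main() {
	// row(b) copies the bytes, which matters because the scanner may
	// reuse its buffer on the next Scan call.
	df := newDelimiterDeserializer([]byte("\t"), func(b []byte) Emitter { return row(b) })
	out, err := render(df, "a\tb\tc")
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // prints "a", "b" and "c" on separate lines
}
```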
func NewGOBDeserializer ¶
func NewGOBDeserializer(fn func() Emitter) DeserializerFunc
NewGOBDeserializer returns a DeserializerFunc for gob-encoded input that uses fn as the constructor function.
func NewJSONDeserializer ¶
func NewJSONDeserializer(fn func() Emitter) DeserializerFunc
NewJSONDeserializer returns a DeserializerFunc for JSON-encoded input that uses fn as the constructor function.
func NewMsgPackDeserializer ¶
func NewMsgPackDeserializer(fn func() Emitter) DeserializerFunc
func NewTSVDeserializer ¶
func NewTSVDeserializer(constructor func() Emitter) DeserializerFunc
type Emitter ¶
type Emitter interface {
	// Emit is called passing in the underlying writer. It's up to the caller
	// to write data in the required format.
	Emit(w io.Writer) error
	// Where should return true or false. If it returns false then this
	// Emitter will not be used within the Map or Reduce methods.
	Where() bool
}
Emitter defines the methods required within the Map stage.
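As an illustration of the two Emitter methods, the sketch below defines a small type that writes a tab-separated key and value and filters out empty keys. The interface is re-declared locally so the example is self-contained. `wordCount` and `emitString` are hypothetical names, and the tab-separated layout is an assumption chosen to match DefaultKeyFieldDelimiter rather than a format this package requires.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Emitter mirrors the interface documented on this page.
type Emitter interface {
	Emit(w io.Writer) error
	Where() bool
}

// wordCount is a hypothetical Emitter carrying a word and its count.
type wordCount struct {
	Word  string
	Count int
}

// Emit writes "word<TAB>count" followed by a newline.
func (wc wordCount) Emit(w io.Writer) error {
	_, err := fmt.Fprintf(w, "%s\t%d\n", wc.Word, wc.Count)
	return err
}

// Where excludes records that have no word.
func (wc wordCount) Where() bool { return wc.Word != "" }

// emitString returns what e would write, or "" if Where rejects it.
func emitString(e Emitter) (string, error) {
	if !e.Where() {
		return "", nil
	}
	var sb strings.Builder
	err := e.Emit(&sb)
	return sb.String(), err
}

func main() {
	out, err := emitString(wordCount{Word: "go", Count: 3})
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // prints "go", a tab, then "3"
}
```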
type Finalizer ¶
Finalizer defines a Finalize method. During the reduce stage, if a type implements the Finalizer interface, Finalize is called once for each reduced key.
type GOBDeserializer ¶
type GOBDeserializer struct {
// contains filtered or unexported fields
}
func (*GOBDeserializer) Error ¶
func (g *GOBDeserializer) Error() error
func (*GOBDeserializer) HasNext ¶
func (g *GOBDeserializer) HasNext() bool
func (*GOBDeserializer) Next ¶
func (g *GOBDeserializer) Next() Emitter
type JSONDeserializer ¶
type JSONDeserializer struct {
// contains filtered or unexported fields
}
func (*JSONDeserializer) Error ¶
func (j *JSONDeserializer) Error() error
func (*JSONDeserializer) HasNext ¶
func (j *JSONDeserializer) HasNext() bool
func (*JSONDeserializer) IgnoreMalformedJSON ¶
func (j *JSONDeserializer) IgnoreMalformedJSON() *JSONDeserializer
func (*JSONDeserializer) Next ¶
func (j *JSONDeserializer) Next() Emitter
type Map ¶
type Map struct {
	DeserializerFunc DeserializerFunc
	// contains filtered or unexported fields
}
Map implements the Pipeline interface. It deserializes its input using a Deserializer and emits a key-value pair for each resulting Emitter.
func NewMap ¶
func NewMap(d DeserializerFunc) *Map
NewMap creates a new Map with the given DeserializerFunc d.
func (*Map) Out ¶
Out writes the Map output to w. It obtains a channel of Emitters from the deserializer and then emits each Emitter's key and value to the output.
type MsgPackDeserializer ¶
type MsgPackDeserializer struct {
// contains filtered or unexported fields
}
func (*MsgPackDeserializer) Error ¶
func (m *MsgPackDeserializer) Error() error
func (*MsgPackDeserializer) HasNext ¶
func (m *MsgPackDeserializer) HasNext() bool
func (*MsgPackDeserializer) Next ¶
func (m *MsgPackDeserializer) Next() Emitter
type Pipeline ¶
type Pipeline interface {
	io.ReadWriteCloser
	// In takes an io.Reader and returns an instance of the Pipeline
	// interface. It is used for providing an input, such as a file,
	// to a Pipeline
	In(io.Reader) Pipeline
	// Then writes the output of the Pipeline to another Pipeline and returns
	// that Pipeline. It is used for chaining together stages of the pipeline.
	Then(Pipeline) Pipeline
	// Out writes the output of this Pipeline to the given io.Writer
	Out(io.Writer)
}
Pipeline defines the methods required of a pipeline stage.
type Reduce ¶
type Reduce struct {
	DeserializerFunc DeserializerFunc
	// contains filtered or unexported fields
}
Reduce is a reduce process that reads from an io.PipeReader, performs a reduce operation, and writes to an io.PipeWriter.
func NewReduce ¶
func NewReduce(d DeserializerFunc) *Reduce
NewReduce creates and returns a pointer to a new Reduce using the given DeserializerFunc.
type Reducer ¶
type Reducer interface {
	Emitter
	// Provides the key to use for grouping the sum operations
	Key() string
	// Sum implements logic to Sum together two copies of this Emitter.
	// The underlying type of the receiver and the argument will always be
	// identical.
	Sum(emitter ...Emitter)
}
Reducer defines the Key and Sum methods. It is used within the reduce stage to sum the values of sequential matching keys.
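The idea of summing sequential matching keys can be sketched as follows. This is an illustration under stated assumptions, not the package's Reduce implementation: the interfaces are re-declared locally, and `wordCount` and `reduceSequential` are hypothetical names. The input is assumed to be already ordered so that equal keys are adjacent.

```go
package main

import (
	"fmt"
	"io"
)

// Local mirrors of the documented interfaces.
type Emitter interface {
	Emit(w io.Writer) error
	Where() bool
}

type Reducer interface {
	Emitter
	Key() string
	Sum(emitter ...Emitter)
}

// wordCount is a hypothetical Reducer that accumulates a count per word.
type wordCount struct {
	Word  string
	Count int
}

func (wc *wordCount) Emit(w io.Writer) error {
	_, err := fmt.Fprintf(w, "%s\t%d\n", wc.Word, wc.Count)
	return err
}

func (wc *wordCount) Where() bool { return wc.Word != "" }

// Key groups records by word.
func (wc *wordCount) Key() string { return wc.Word }

// Sum adds the counts of the given Emitters to the receiver. The type
// assertion relies on the documented guarantee that the argument has
// the same underlying type as the receiver.
func (wc *wordCount) Sum(emitter ...Emitter) {
	for _, e := range emitter {
		if other, ok := e.(*wordCount); ok {
			wc.Count += other.Count
		}
	}
}

// reduceSequential folds runs of adjacent Reducers that share a key
// into the first Reducer of each run.
func reduceSequential(in []Reducer) []Reducer {
	var out []Reducer
	for _, r := range in {
		if n := len(out); n > 0 && out[n-1].Key() == r.Key() {
			out[n-1].Sum(r)
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	reduced := reduceSequential([]Reducer{
		&wordCount{Word: "a", Count: 1},
		&wordCount{Word: "a", Count: 2},
		&wordCount{Word: "b", Count: 5},
	})
	for _, r := range reduced {
		wc := r.(*wordCount)
		fmt.Println(wc.Word, wc.Count) // prints "a 3" then "b 5"
	}
}
```

Because only adjacent keys are merged, a sort step ahead of the reduce stage is what makes the grouping complete.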
type Reducers ¶
type Reducers []Reducer
Reducers is a slice of Reducer interfaces that implements sort.Interface.
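A slice type can satisfy sort.Interface with three short methods. The sketch below shows one way to do it. Ordering by Key() is an assumption, since this page does not state which ordering the package uses. The interfaces are re-declared locally, and `item` and `sortedKeys` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"sort"
)

// Local mirrors of the documented interfaces.
type Emitter interface {
	Emit(w io.Writer) error
	Where() bool
}

type Reducer interface {
	Emitter
	Key() string
	Sum(emitter ...Emitter)
}

// Reducers mirrors the documented slice type. The ordering implemented
// here (ascending by Key) is an assumption made for illustration.
type Reducers []Reducer

func (r Reducers) Len() int           { return len(r) }
func (r Reducers) Less(i, j int) bool { return r[i].Key() < r[j].Key() }
func (r Reducers) Swap(i, j int)      { r[i], r[j] = r[j], r[i] }

// item is a hypothetical Reducer whose key is its own string value.
type item string

func (i item) Emit(w io.Writer) error {
	_, err := fmt.Fprintln(w, string(i))
	return err
}

func (i item) Where() bool            { return true }
func (i item) Key() string            { return string(i) }
func (i item) Sum(emitter ...Emitter) {} // nothing to accumulate in this sketch

// sortedKeys sorts rs in place and returns its keys in order.
func sortedKeys(rs Reducers) []string {
	sort.Sort(rs)
	keys := make([]string, 0, len(rs))
	for _, r := range rs {
		keys = append(keys, r.Key())
	}
	return keys
}

func main() {
	fmt.Println(sortedKeys(Reducers{item("c"), item("a"), item("b")})) // prints [a b c]
}
```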
type Sorter ¶
type Sorter struct {
	DeserializerFunc DeserializerFunc
	// contains filtered or unexported fields
}
func NewSorter ¶
func NewSorter(d DeserializerFunc) *Sorter
type TSVDeserializer ¶
type TSVDeserializer struct {
// contains filtered or unexported fields
}
func (*TSVDeserializer) Error ¶
func (m *TSVDeserializer) Error() error
func (*TSVDeserializer) HasNext ¶
func (m *TSVDeserializer) HasNext() bool
func (*TSVDeserializer) Next ¶
func (m *TSVDeserializer) Next() Emitter