metrics

package
v0.1.1
Published: Nov 18, 2021 License: BSD-3-Clause Imports: 9 Imported by: 0

Documentation

Overview

Package metrics provides some pre-manufactured metrics on texts.

_________________________________________________________________________

BSD 3-Clause License

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Align

func Align(text cords.Cord, i, j uint64, metric cords.MaterializedMetric) (cords.MetricValue, cords.Cord, error)

Align applies a materialized metric to a text.

func Count

func Count(text cords.Cord, i, j uint64, metric CountingMetric) (int, error)

Count applies a counting metric to a text.

func Find

func Find(text cords.Cord, i, j uint64, metric ScanningMetric) ([][]int, error)

Find applies a scanning metric to a text.

func T

func T() tracing.Trace

T traces to a global core-tracer.

func Words

func Words() cords.MaterializedMetric

Types

type CountingMetric

type CountingMetric interface {
	cords.Metric
	Count(cords.MetricValue) int
}

CountingMetric is a type for metrics that count items in a text. Items may be lines, words, emojis, …

func LineCount

func LineCount() CountingMetric

LineCount creates a CountingMetric to be applied to a cord. It counts the lines of a text, delimited by newline characters. Multiple consecutive newlines are counted as multiple empty lines. Clients who need to interpret consecutive newlines differently may use a ParagraphCount metric first. If the text does not end with a newline, the trailing text fragment is *not* counted as an (incomplete) line.
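The counting rule described above can be illustrated without the cords package. The following is a minimal sketch using only the standard library; `countLines` is a hypothetical helper written for this illustration and is not part of the package API.

```go
package main

import (
	"fmt"
	"strings"
)

// countLines is a hypothetical stand-in for the rule described for LineCount:
// every newline terminates exactly one line, so the line count equals the
// number of newline characters. A trailing fragment without a newline is
// not counted.
func countLines(text string) int {
	return strings.Count(text, "\n")
}

func main() {
	fmt.Println(countLines("a\nb\n"))     // 2: two terminated lines
	fmt.Println(countLines("a\n\n\nb"))   // 3: "a" plus two empty lines, "b" is not counted
	fmt.Println(countLines("no newline")) // 0: only an unterminated fragment
}
```

Under this reading, consecutive newlines each contribute one (empty) line, which matches the description of LineCount.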

type MetricValueBase

type MetricValueBase struct {
	// contains filtered or unexported fields
}

MetricValueBase is a helper type for metric application and for combining metric values on text fragments. Clients who want to use it should embed a MetricValueBase into their type definition for metric types.

MetricValueBase will implement `Len` and `Unprocessed` of interface MetricValue. To implement `Combine` clients should interact with MetricValueBase in a way that lets MetricValueBase handle the tricky parts of fragment boundary bytes.

In Metric.Apply(…):

v := &myCoolMetricValue{ … }    // create a MetricValue which embeds MetricValueBase
v.InitFrom(frag)                // call this helper first
from, to := …                   // do some metric calculations and possibly have boundaries
v.Measured(from, to, frag)      // leave it to MetricValueBase to remember unprocessed bytes
return v                        // v is already a pointer

In Metric.Combine(…):

unproc, ok := leftSibling.ConcatUnprocessed(&rightSibling.MetricValueBase)  // step (b)
if ok {                                                                     //
    // yes, we have to re-apply our metric to `unproc`                      //
    x := metric.Apply(string(unproc)).(*delimiterMetricValue)               //
    // do something with sub-value x                                        //
}
leftSibling.UnifyWith(&rightSibling.MetricValueBase)                        // step (c)

It is up to the client's `Metric` and `MetricValue` to decide which spans of text fragments can be processed and how intermediate metric values are calculated and stored.
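To make the interplay of Apply and Combine more concrete, here is a simplified, standalone sketch of a word-counting metric over text fragments. It does not use the cords package or MetricValueBase; the names `wordValue`, `apply`, `combine`, `join` and `total` are hypothetical and only model the idea of remembering unprocessed boundary bytes, reprocessing them when two siblings meet (step (b)) and unifying the result (step (c)).

```go
package main

import (
	"bytes"
	"fmt"
)

const delims = " \t\n" // fragment positions where a word certainly ends

// wordValue is a hypothetical metric value for one text fragment.
type wordValue struct {
	words  int    // words counted between the first and last delimiter
	prefix []byte // unprocessed bytes before the first delimiter
	suffix []byte // unprocessed bytes after the last delimiter
	open   bool   // no delimiter seen: the whole fragment sits in prefix
}

// join concatenates two byte slices into a fresh slice.
func join(a, b []byte) []byte {
	out := make([]byte, 0, len(a)+len(b))
	out = append(out, a...)
	return append(out, b...)
}

// apply measures a single fragment in isolation.
func apply(frag []byte) wordValue {
	first := bytes.IndexAny(frag, delims)
	if first < 0 {
		// Nothing could be measured; keep the complete fragment for later.
		return wordValue{prefix: frag, open: true}
	}
	last := bytes.LastIndexAny(frag, delims)
	return wordValue{
		words:  len(bytes.Fields(frag[first : last+1])),
		prefix: frag[:first],
		suffix: frag[last+1:],
	}
}

// combine merges the values of two adjacent fragments.
func combine(l, r wordValue) wordValue {
	switch {
	case l.open && r.open:
		return wordValue{prefix: join(l.prefix, r.prefix), open: true}
	case l.open:
		r.prefix = join(l.prefix, r.prefix)
		return r
	case r.open:
		l.suffix = join(l.suffix, r.prefix)
		return l
	}
	// Step (b): reprocess the unprocessed bytes in between the siblings.
	between := join(l.suffix, r.prefix)
	// Step (c): unify both values into one combined intermediate value.
	return wordValue{
		words:  l.words + len(bytes.Fields(between)) + r.words,
		prefix: l.prefix,
		suffix: r.suffix,
	}
}

// total resolves the remaining boundary bytes at the ends of the text.
func total(v wordValue) int {
	return len(bytes.Fields(v.prefix)) + v.words + len(bytes.Fields(v.suffix))
}

func main() {
	frags := []string{"the qu", "ick bro", "wn fox"}
	v := apply([]byte(frags[0]))
	for _, f := range frags[1:] {
		v = combine(v, apply([]byte(f)))
	}
	fmt.Println(total(v)) // 4
}
```

The words "quick" and "brown" are each split across two fragments, so neither fragment can count them alone; they are only recognized when the boundary bytes of both siblings are concatenated and measured again. In the package itself, this bookkeeping is what MetricValueBase is meant to take over.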

func (*MetricValueBase) Chunk

func (mvb *MetricValueBase) Chunk() []byte

func (*MetricValueBase) ConcatUnprocessed

func (mvb *MetricValueBase) ConcatUnprocessed(rightSibling *MetricValueBase) ([]byte, bool)

ConcatUnprocessed is a helper function that provides access to the unprocessed bytes in between two text fragments. As described for MetricValue, this corresponds to step (b), where unprocessed boundary bytes are subject to re-application of the metric.

(b)  |-----========    ------    ==============|     reprocess 6 bytes in between

ConcatUnprocessed will return the 6 bytes in between and a boolean flag to indicate if the metric should reprocess the bytes. It is the responsibility of the client's metric to initiate the reprocessing.
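As a rough illustration of the idea, the sketch below joins the unprocessed tail of a left fragment with the unprocessed head of a right fragment. It is a standalone model, not the package's implementation: `concatUnprocessed` is a hypothetical free function, and the assumption that the flag is true exactly when there are bytes to reprocess is made for this example only.

```go
package main

import "fmt"

// concatUnprocessed is a hypothetical model of the helper described above.
// It returns the bytes lying in between two measured spans and reports
// whether there is anything to reprocess (an assumption of this sketch).
func concatUnprocessed(leftSuffix, rightPrefix []byte) ([]byte, bool) {
	if len(leftSuffix)+len(rightPrefix) == 0 {
		return nil, false
	}
	between := make([]byte, 0, len(leftSuffix)+len(rightPrefix))
	between = append(between, leftSuffix...)
	between = append(between, rightPrefix...)
	return between, true
}

func main() {
	// 3 unprocessed bytes on the left, 3 on the right: 6 bytes in between.
	between, ok := concatUnprocessed([]byte("qui"), []byte("ck!"))
	fmt.Printf("%q %v\n", between, ok) // "quick!" true
}
```

The caller would then apply its metric to the returned bytes and fold the result into the combined value.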

func (*MetricValueBase) HasBoundaries

func (mvb *MetricValueBase) HasBoundaries() bool

HasBoundaries returns true if the metric value has unprocessed boundary bytes. Clients normally will not have to consult this.

func (*MetricValueBase) InitFrom

func (mvb *MetricValueBase) InitFrom(frag []byte)

InitFrom should be called from the enclosing client metric type at the beginning of `Apply`. It will set up information about fragment length and possibly other administrative information.

This will usually be called from Metric.Apply(…).

func (MetricValueBase) Len

func (mvb MetricValueBase) Len() int

Len is part of interface MetricValue.

func (*MetricValueBase) Measured

func (mvb *MetricValueBase) Measured(from, to int, frag []byte)

Measured is a signal to an embedded MetricValueBase that a range of bytes has already been considered for metric calculation. The MetricValueBase will derive information about unprocessed boundary bytes from this.

This will usually be called from Metric.Apply(…).

from and to are allowed to be identical, signalling a split.

func (*MetricValueBase) MeasuredNothing

func (mvb *MetricValueBase) MeasuredNothing(frag []byte)

MeasuredNothing is a signal to an embedded MetricValueBase that no metric value could be calculated for a given text fragment. This will tell the metric calculation driver to reconsider the complete fragment when combining it with a sibling node.

This will usually be called from Metric.Apply(…).

func (*MetricValueBase) Prefix

func (mvb *MetricValueBase) Prefix() []byte

func (*MetricValueBase) Suffix

func (mvb *MetricValueBase) Suffix() []byte

func (*MetricValueBase) UnifyWith

func (mvb *MetricValueBase) UnifyWith(rightSibling *MetricValueBase)

UnifyWith creates a combined metric value from two sibling values. Recalculation of unprocessed bytes must already have been done, i.e. ConcatUnprocessed must already have been called.

Referring to the example for MetricValue, UnifyWith will help with step (c):

(c)  |-----============================|              combined intermediate fragment

The “meat” of the metric has to be calculated by the client metric type. Clients must implement their own data structure to support metric calculation and propagation. MetricValueBase just shields clients from the details of fragment handling.

func (MetricValueBase) Unprocessed

func (mvb MetricValueBase) Unprocessed() ([]byte, []byte)

Unprocessed is part of interface MetricValue.

type ScanningMetric

type ScanningMetric interface {
	cords.Metric
	Locations(cords.MetricValue) [][]int
}

A ScanningMetric searches a text for items (such as lines, words, emojis, …) and returns their location indices.

func FindLines

func FindLines() ScanningMetric

FindLines creates a ScanningMetric to be applied to a cord. It finds the lines of a text, delimited by newline characters. Multiple consecutive newlines will be counted as multiple empty lines. Clients who need to interpret consecutive newlines differently may use a ParagraphCount metric first.

FindLines returns tuples [position, length] for each line of text, not counting the line-terminating newline characters. If the last text fragment does not contain a final newline, it will be reported as a (fractional) line. Clients who have a need for the last non-terminated line will have to use cord.Report, starting at position+length+1 of the final location from FindLines.
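The [position, length] convention can be illustrated with a standalone sketch that uses only the standard library. `findLines` is a hypothetical helper, not the package function. The sketch assumes that only newline-terminated lines are returned and that a trailing unterminated fragment starts at position+length+1 of the final location; the description above leaves some room for interpretation on that point.

```go
package main

import (
	"fmt"
	"strings"
)

// findLines is a hypothetical stand-in that returns one [position, length]
// tuple per newline-terminated line. The newline itself is not part of the
// reported length.
func findLines(text string) [][]int {
	var locs [][]int
	start := 0
	for {
		n := strings.IndexByte(text[start:], '\n')
		if n < 0 {
			return locs // the rest of the text is not newline-terminated
		}
		locs = append(locs, []int{start, n})
		start += n + 1 // skip the line and its terminating newline
	}
}

func main() {
	text := "one\n\nthree\nrest"
	locs := findLines(text)
	fmt.Println(locs) // [[0 3] [4 0] [5 5]]

	// The unterminated tail starts right after the last reported line.
	last := locs[len(locs)-1]
	fmt.Println(text[last[0]+last[1]+1:]) // rest
}
```

The empty line between "one" and "three" shows up as a tuple with length 0, in line with the rule that consecutive newlines yield empty lines.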
