Documentation ¶
Overview ¶
Package metrics provides some ready-made metrics on texts.
_________________________________________________________________________
BSD 3-Clause License ¶
Copyright (c) 2020–21, Norbert Pillmayer ¶
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Index ¶
- func Align(text cords.Cord, i, j uint64, metric cords.MaterializedMetric) (cords.MetricValue, cords.Cord, error)
- func Count(text cords.Cord, i, j uint64, metric CountingMetric) (int, error)
- func Find(text cords.Cord, i, j uint64, metric ScanningMetric) ([][]int, error)
- func T() tracing.Trace
- func Words() cords.MaterializedMetric
- type CountingMetric
- type MetricValueBase
- func (mvb *MetricValueBase) Chunk() []byte
- func (mvb *MetricValueBase) ConcatUnprocessed(rightSibling *MetricValueBase) ([]byte, bool)
- func (mvb *MetricValueBase) HasBoundaries() bool
- func (mvb *MetricValueBase) InitFrom(frag []byte)
- func (mvb MetricValueBase) Len() int
- func (mvb *MetricValueBase) Measured(from, to int, frag []byte)
- func (mvb *MetricValueBase) MeasuredNothing(frag []byte)
- func (mvb *MetricValueBase) Prefix() []byte
- func (mvb *MetricValueBase) Suffix() []byte
- func (mvb *MetricValueBase) UnifyWith(rightSibling *MetricValueBase)
- func (mvb MetricValueBase) Unprocessed() ([]byte, []byte)
- type ScanningMetric
Functions ¶
func Align ¶
func Align(text cords.Cord, i, j uint64, metric cords.MaterializedMetric) (cords.MetricValue, cords.Cord, error)
Align applies a materialized metric to a text.
func Words ¶
func Words() cords.MaterializedMetric
Types ¶
type CountingMetric ¶
type CountingMetric interface {
	cords.Metric
	Count(cords.MetricValue) int
}
CountingMetric is a type for metrics that count items in text. Possible items may be lines, words, emojis, …
func LineCount ¶
func LineCount() CountingMetric
LineCount creates a CountingMetric to be applied to a cord. It counts the lines of a text, delimited by newline characters. Multiple consecutive newlines will be counted as multiple empty lines. Clients who need to interpret consecutive newlines in a different way may use a ParagraphCount metric first. If the text does not end with a newline, the trailing text fragment is *not* counted as an (incomplete) line.
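The counting rules above can be illustrated on a plain Go string. This is only a sketch of the described semantics, not the package's implementation: `countLines` is a hypothetical helper, and the real metric operates on a cord rather than on a string.

```go
package main

import (
	"fmt"
	"strings"
)

// countLines is a hypothetical helper mirroring the documented rules:
// every newline terminates exactly one line, so consecutive newlines yield
// empty lines and an unterminated trailing fragment is not counted.
func countLines(text string) int {
	return strings.Count(text, "\n")
}

func main() {
	fmt.Println(countLines("one\ntwo\n"))     // 2
	fmt.Println(countLines("one\n\n\ntwo\n")) // 4: two of them are empty lines
	fmt.Println(countLines("one\ntwo"))       // 1: trailing "two" is not counted
}
```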
type MetricValueBase ¶
type MetricValueBase struct {
// contains filtered or unexported fields
}
MetricValueBase is a helper type for metric application and for combining metric values on text fragments. Clients who want to use it should embed a MetricValueBase into their type definition for metric types.
MetricValueBase will implement `Len` and `Unprocessed` of interface MetricValue. To implement `Combine` clients should interact with MetricValueBase in a way that lets MetricValueBase handle the tricky parts of fragment boundary bytes.
In Metric.Apply(…):
v := &myCoolMetricValue{ … } // create a MetricValue which embeds MetricValueBase
v.InitFrom(frag)             // call this helper first
from, to := …                // do some metric calculations and possibly have boundaries
v.Measured(from, to, frag)   // leave it to MetricValueBase to remember unprocessed bytes
return v                     // v is already a pointer
In Metric.Combine(…):
unproc, ok := leftSibling.ConcatUnprocessed(&rightSibling.MetricValueBase) // step (b)
if ok {
	// yes, we have to re-apply our metric to `unproc`:
	// x := metric.Apply(string(unproc)).(*delimiterMetricValue)
	// do something with sub-value x
}
leftSibling.UnifyWith(&rightSibling.MetricValueBase) // step (c)
It is up to the client's `Metric` and `MetricValue` to decide which spans of text fragments can be processed and how intermediate metric values are calculated and stored.
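To make the division of labour more concrete, here is a minimal, self-contained sketch of the same idea for a word count. It deliberately does not use this package or `cords`: `wordValue`, `apply`, `combine`, `total` and `countWords` are hypothetical stand-ins, and the sketch assumes that words are separated by ASCII spaces, tabs or newlines only. Each fragment is measured in isolation, bytes at its edges stay unprocessed, and combining two siblings reprocesses the bytes in between (step (b)) before unifying the values (step (c)).

```go
package main

import (
	"bytes"
	"fmt"
)

// wordValue is a hypothetical stand-in for a client metric value. It holds
// the count for the measured middle of a fragment plus the boundary bytes
// that could not be judged in isolation.
type wordValue struct {
	count    int
	measured bool   // false: the whole fragment is unprocessed (kept in prefix)
	prefix   []byte // unprocessed bytes before the first space
	suffix   []byte // unprocessed bytes after the last space
}

func isSpace(r rune) bool { return r == ' ' || r == '\n' || r == '\t' }

// join concatenates into a fresh slice so fragments are never overwritten.
func join(a, b []byte) []byte {
	return append(append([]byte{}, a...), b...)
}

// apply measures one fragment; only words enclosed by spaces are counted.
func apply(frag []byte) wordValue {
	first := bytes.IndexFunc(frag, isSpace)
	if first < 0 {
		return wordValue{prefix: frag} // measured nothing
	}
	last := bytes.LastIndexFunc(frag, isSpace)
	return wordValue{
		count:    len(bytes.FieldsFunc(frag[first:last+1], isSpace)),
		measured: true,
		prefix:   frag[:first],
		suffix:   frag[last+1:],
	}
}

// combine merges two sibling values, left to right.
func combine(l, r wordValue) wordValue {
	switch {
	case !l.measured && !r.measured:
		return wordValue{prefix: join(l.prefix, r.prefix)}
	case !l.measured:
		r.prefix = join(l.prefix, r.prefix)
		return r
	case !r.measured:
		l.suffix = join(l.suffix, r.prefix)
		return l
	}
	// step (b): the bytes in between are enclosed by spaces on both sides
	// and contain none themselves, so they form at most one word.
	n := l.count + r.count
	if len(l.suffix)+len(r.prefix) > 0 {
		n++
	}
	// step (c): unify into one value covering both fragments.
	return wordValue{count: n, measured: true, prefix: l.prefix, suffix: r.suffix}
}

// total resolves the outermost boundaries once the whole text is combined.
func total(v wordValue) int {
	n := v.count
	if len(v.prefix) > 0 {
		n++
	}
	if len(v.suffix) > 0 {
		n++
	}
	return n
}

// countWords folds the fragments of a text from left to right.
func countWords(frags ...string) int {
	var v wordValue
	for _, f := range frags {
		v = combine(v, apply([]byte(f)))
	}
	return total(v)
}

func main() {
	fmt.Println(countWords("the qu", "ick bro", "wn", " fox jum", "ps")) // 5
}
```

The count does not depend on where the text happens to be split, which is the property a metric needs when it is applied to the leaves of a cord and combined upwards.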
func (*MetricValueBase) Chunk ¶
func (mvb *MetricValueBase) Chunk() []byte
func (*MetricValueBase) ConcatUnprocessed ¶
func (mvb *MetricValueBase) ConcatUnprocessed(rightSibling *MetricValueBase) ([]byte, bool)
ConcatUnprocessed is a helper function that provides access to the unprocessed bytes between two adjacent text fragments. It corresponds to step (b) in the description of MetricValues, where unprocessed boundary bytes are subject to re-application of the metric.
(b) |-----======== ------ ==============| reprocess 6 bytes in between
ConcatUnprocessed will return the 6 bytes in between and a boolean flag to indicate if the metric should reprocess the bytes. It is the responsibility of the client's metric to initiate the reprocessing.
func (*MetricValueBase) HasBoundaries ¶
func (mvb *MetricValueBase) HasBoundaries() bool
HasBoundaries returns true if the metric value has unprocessed boundary bytes. Clients normally will not have to consult this.
func (*MetricValueBase) InitFrom ¶
func (mvb *MetricValueBase) InitFrom(frag []byte)
InitFrom should be called from the enclosing client metric type at the beginning of `Apply`, before any measurement takes place. It will set up information about fragment length and possibly other administrative information.
This will usually be called from Metric.Apply(…).
func (MetricValueBase) Len ¶
func (mvb MetricValueBase) Len() int
Len is part of interface MetricValue.
func (*MetricValueBase) Measured ¶
func (mvb *MetricValueBase) Measured(from, to int, frag []byte)
Measured is a signal to an embedded MetricValueBase that a range of bytes has already been considered for metric calculation. The MetricValueBase will derive information about unprocessed boundary bytes from this.
This will usually be called from Metric.Apply(…).
from and to are allowed to be identical, signalling a split.
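One plausible reading of this bookkeeping, stated here as an assumption and not taken from the package source: if `frag[from:to]` has been measured, the bytes before `from` and the bytes from `to` onward are what remains unprocessed. The helper `boundaries` below is hypothetical and treats the range as half-open.

```go
package main

import "fmt"

// boundaries is a hypothetical helper: assuming frag[from:to] has been
// measured, it returns the bytes on either side that remain unprocessed.
func boundaries(from, to int, frag []byte) (prefix, suffix []byte) {
	return frag[:from], frag[to:]
}

func main() {
	frag := []byte("qu ick bro")

	// " ick " (bytes 2..6) has been measured.
	p, s := boundaries(2, 7, frag)
	fmt.Printf("%q %q\n", p, s) // "qu" "bro"

	// from == to: nothing lies in between, the fragment is merely split.
	p, s = boundaries(4, 4, frag)
	fmt.Printf("%q %q\n", p, s) // "qu i" "ck bro"
}
```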
func (*MetricValueBase) MeasuredNothing ¶
func (mvb *MetricValueBase) MeasuredNothing(frag []byte)
MeasuredNothing is a signal to an embedded MetricValueBase that no metric value could be calculated for a given text fragment. This will tell the metric calculation driver to reconsider the complete fragment when combining it with a sibling node.
This will usually be called from Metric.Apply(…).
func (*MetricValueBase) Prefix ¶
func (mvb *MetricValueBase) Prefix() []byte
func (*MetricValueBase) Suffix ¶
func (mvb *MetricValueBase) Suffix() []byte
func (*MetricValueBase) UnifyWith ¶
func (mvb *MetricValueBase) UnifyWith(rightSibling *MetricValueBase)
UnifyWith creates a combined metric value from two sibling values. Recalculation of unprocessed bytes must already have been done, i.e. ConcatUnprocessed must already have been called.
Referring to the example for MetricValue, UnifyWith will help with step (c):
(c) |-----============================| combined intermediate fragment
The “meat” of the metric has to be calculated by the client metric type. Clients must implement their own data structure to support metric calculation and propagation. MetricValueBase just shields clients from the details of fragment handling.
func (MetricValueBase) Unprocessed ¶
func (mvb MetricValueBase) Unprocessed() ([]byte, []byte)
Unprocessed is part of interface MetricValue.
type ScanningMetric ¶
type ScanningMetric interface {
	cords.Metric
	Locations(cords.MetricValue) [][]int
}
A ScanningMetric searches a text for items (such as lines, words, emojis, …) and returns their location indices.
func FindLines ¶
func FindLines() ScanningMetric
FindLines creates a ScanningMetric to be applied to a cord. It finds the lines of a text, delimited by newline characters. Multiple consecutive newlines will be counted as multiple empty lines. Clients who need to interpret consecutive newlines in a different way may use a ParagraphCount metric first.
FindLines returns tuples [position, length] for each line of text, not counting the line-terminating newline characters. If the last text fragment does not contain a final newline, it will not be reported as a (fractional) line. Clients who need the last non-terminated line will have to use cord.Report, starting at position+length+1 of the final location from FindLines.
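The location format can be illustrated on a plain Go string. The sketch below is not the package's implementation: `findLines` and `trailingStart` are hypothetical helpers, and it assumes that an unterminated trailing fragment is left out of the result, as the note on cord.Report suggests.

```go
package main

import "fmt"

// findLines is a hypothetical helper returning [position, length] for every
// newline-terminated line; the newline itself is not part of the length.
func findLines(text string) [][]int {
	var locs [][]int
	start := 0
	for i := 0; i < len(text); i++ {
		if text[i] == '\n' {
			locs = append(locs, []int{start, i - start})
			start = i + 1
		}
	}
	return locs
}

// trailingStart returns where an unterminated trailing fragment would begin:
// position+length+1 of the final location, or 0 if no line was found.
func trailingStart(locs [][]int) int {
	if len(locs) == 0 {
		return 0
	}
	last := locs[len(locs)-1]
	return last[0] + last[1] + 1
}

func main() {
	text := "one\n\ntwo\nrest"
	locs := findLines(text)
	fmt.Println(locs)                       // [[0 3] [4 0] [5 3]]
	fmt.Println(text[trailingStart(locs):]) // rest
}
```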