datautils

package module
v0.0.0-...-e1a3218 Latest
Published: Feb 4, 2021 License: MIT Imports: 9 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EmphasisedRelevancy

func EmphasisedRelevancy(r float64) float64

EmphasisedRelevancy is an alternative formulation of the relevancy function for calculating discounted cumulative gain that more strongly emphasises the degree of relevancy r.
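
The formula itself is not shown here, but in the standard DCG literature the emphasised gain is 2^r - 1, which weights higher relevancy grades exponentially. A minimal sketch of that common formulation (an assumption about the internals, not necessarily this package's exact code):

	package datautils

	import "math"

	// EmphasisedRelevancy weights the relevancy grade exponentially: 2^r - 1.
	// A grade of 0 contributes nothing; each additional grade roughly doubles the gain.
	func EmphasisedRelevancy(r float64) float64 {
		return math.Pow(2, r) - 1
	}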

func PlotHeatmap

func PlotHeatmap(corr mat.Matrix, xlabels []string, ylabels []string) (p *plot.Plot, err error)
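
PlotHeatmap carries no doc comment, but from the signature it renders a matrix (typically a correlation matrix) as a heatmap labelled along both axes, returning a gonum plot. A usage sketch, assuming a hypothetical import path for the package (the real module path is elided above):

	package main

	import (
		"log"

		"gonum.org/v1/gonum/mat"
		"gonum.org/v1/plot/vg"

		"github.com/example/datautils" // hypothetical import path
	)

	func main() {
		// 2x2 correlation matrix for two features.
		corr := mat.NewDense(2, 2, []float64{1.0, 0.8, 0.8, 1.0})
		labels := []string{"height", "weight"}

		p, err := datautils.PlotHeatmap(corr, labels, labels)
		if err != nil {
			log.Fatal(err)
		}
		if err := p.Save(4*vg.Inch, 4*vg.Inch, "heatmap.png"); err != nil {
			log.Fatal(err)
		}
	}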

func TraditionalRelevancy

func TraditionalRelevancy(r float64) float64

TraditionalRelevancy is the traditional formulation of the relevancy function for calculating discounted cumulative gain. It uses the specified degree of relevancy r directly.
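
Given that description, the traditional formulation is simply the identity on r; a sketch consistent with the doc comment:

	// TraditionalRelevancy uses the graded relevancy itself as the gain at each rank.
	func TraditionalRelevancy(r float64) float64 {
		return r
	}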

Types

type ConfusionMatrix

type ConfusionMatrix struct {
	Observations, Pos, Neg, TruePos, TrueNeg, FalsePos, FalseNeg int
}

func NewConfusionMatrix

func NewConfusionMatrix(predictions []float64, labels []float64, threshold float64) ConfusionMatrix

func (ConfusionMatrix) Accuracy

func (c ConfusionMatrix) Accuracy() float64

func (ConfusionMatrix) F1

func (c ConfusionMatrix) F1() float64

func (ConfusionMatrix) Precision

func (c ConfusionMatrix) Precision() float64

func (ConfusionMatrix) Recall

func (c ConfusionMatrix) Recall() float64

func (ConfusionMatrix) String

func (c ConfusionMatrix) String() string
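
A usage sketch tying the type together, assuming a hypothetical import path and the common convention that predictions at or above the threshold count as positive (the exact comparison is an assumption). The derived metrics follow the standard definitions, e.g. Precision() = TruePos/(TruePos+FalsePos) and Recall() = TruePos/(TruePos+FalseNeg):

	package main

	import (
		"fmt"

		"github.com/example/datautils" // hypothetical import path
	)

	func main() {
		predictions := []float64{0.9, 0.4, 0.7, 0.2} // predicted class probabilities
		labels := []float64{1, 0, 1, 1}              // ground truth classes

		cm := datautils.NewConfusionMatrix(predictions, labels, 0.5)

		fmt.Println(cm) // String() renders the matrix for inspection
		fmt.Printf("accuracy=%.2f precision=%.2f recall=%.2f f1=%.2f\n",
			cm.Accuracy(), cm.Precision(), cm.Recall(), cm.F1())
	}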

type PrecisionRecallCurve

type PrecisionRecallCurve struct {
	// Precision is a slice containing the ranked precision values at K for the predictions until all positive/
	// relevant items were found according to the corresponding ground truth labels (recall==1)
	Precision []float64

	// Recall is a slice containing the ranked recall values at K for the predictions until all positive/
	// relevant items were found according to the corresponding ground truth labels (recall==1)
	Recall []float64

	// Thresholds is a slice containing the ranked (sorted) predictions (probability/similarity scores) until
	// all positive/relevant items were found according to the corresponding ground truth labels (recall==1)
	Thresholds []float64
	// contains filtered or unexported fields
}

PrecisionRecallCurve represents a precision recall curve for visualising and measuring the performance of a classification or information retrieval model. It can be used to evaluate how well the model's predictions can be ranked compared to a perfect ranking according to the ground truth labels. This is useful when evaluating relevancy-based ranking for information retrieval, or raw classification performance based on the predicted probability of class membership (e.g. logistic regression predictions) without using a threshold to determine the class from the predicted probability. It is important to note that Precision[0] and Recall[0] indicate the precision and recall @ 0 and so will always be 1 and 0 respectively.

func NewPrecisionRecallCurve

func NewPrecisionRecallCurve(predictions, labels []float64) PrecisionRecallCurve

NewPrecisionRecallCurve creates a new precision recall curve. The precision recall curve visualises how well the model's predictions (or similarity scores for information retrieval) can be ranked compared to a perfect ranking according to the ground truth labels. Both the supplied predictions and labels slices can be in any order provided they are of identical length and their orders correspond, e.g. predictions[5] corresponds to the ground truth labels[5]. As precision recall curves and average precision (summarising the curve as a single metric/area under the curve) represent a binary class/relevance measure, we assume that any label value greater than 0 represents a positive/relevant observation (and 0 label values represent a negative/non-relevant observation).
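
A usage sketch with a hypothetical import path. The predictions and labels are positionally paired, and labels greater than 0 are treated as positive, per the description above:

	package main

	import (
		"fmt"

		"github.com/example/datautils" // hypothetical import path
	)

	func main() {
		// predictions[i] corresponds to labels[i]; the order is otherwise arbitrary.
		predictions := []float64{0.95, 0.15, 0.80, 0.60, 0.30}
		labels := []float64{1, 0, 1, 0, 1}

		curve := datautils.NewPrecisionRecallCurve(predictions, labels)

		fmt.Printf("AP=%.3f  11-point AP=%.3f  R-Precision=%.3f  P@3=%.3f\n",
			curve.AveragePrecision(),
			curve.AverageInterpolatedPrecision(),
			curve.RPrecision(),
			curve.PrecisionAt(3))
	}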

func (PrecisionRecallCurve) AverageInterpolatedPrecision

func (c PrecisionRecallCurve) AverageInterpolatedPrecision() float64

AverageInterpolatedPrecision calculates the average interpolated precision based on the predictions and labels the curve was constructed with. Average Interpolated Precision represents the area under the curve of the precision recall curve using interpolated precision for 11 fixed recall values {0.0, 0.1, 0.2, ... 1.0}.
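
Read literally, the metric is the mean of InterpolatedPrecisionAt over the eleven fixed recall points. A sketch of that computation expressed via the public API (a hypothetical helper, not the package's actual code):

	// elevenPointAP averages interpolated precision over recall = 0.0, 0.1, ..., 1.0.
	func elevenPointAP(c datautils.PrecisionRecallCurve) float64 {
		var sum float64
		for i := 0; i <= 10; i++ {
			sum += c.InterpolatedPrecisionAt(float64(i) / 10)
		}
		return sum / 11
	}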

func (PrecisionRecallCurve) AveragePrecision

func (c PrecisionRecallCurve) AveragePrecision() float64

AveragePrecision calculates the average precision based on the predictions and labels the curve was constructed with. Average Precision represents the area under the curve of the precision recall curve and is a method for summarising the curve in a single metric.

func (PrecisionRecallCurve) InterpolatedPrecisionAt

func (c PrecisionRecallCurve) InterpolatedPrecisionAt(r float64) float64

InterpolatedPrecisionAt calculates an interpolated Precision@r. This can be used to calculate the precision for a specific recall value that does not necessarily occur explicitly in the ranking. It is calculated by taking the maximum precision value over all recalls greater than r.
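
That description maps directly onto a maximum over the stored curve. A sketch using the exported Precision and Recall slices (a hypothetical helper; note the standard IR definition takes recalls greater than or equal to r, which also keeps r=1.0 well defined):

	// interpolatedPrecisionAt returns the maximum precision over all points
	// on the curve whose recall is at least r.
	func interpolatedPrecisionAt(c datautils.PrecisionRecallCurve, r float64) float64 {
		best := 0.0
		for i, recall := range c.Recall {
			if recall >= r && c.Precision[i] > best {
				best = c.Precision[i]
			}
		}
		return best
	}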

func (PrecisionRecallCurve) Plot

func (c PrecisionRecallCurve) Plot() *plot.Plot

Plot renders the entire precision recall curve as a plot for visualisation.

func (PrecisionRecallCurve) PrecisionAt

func (c PrecisionRecallCurve) PrecisionAt(k int) float64

PrecisionAt calculates the Precision@k. This represents the precision at a certain cut-off k, i.e. if a search returns 10 results (k=10), what proportion of those 10 results is relevant; or, if we are only interested in the relevancy of the top ranked item (k=1), is that item relevant or not.

func (PrecisionRecallCurve) RPrecision

func (c PrecisionRecallCurve) RPrecision() float64

RPrecision returns the R-Precision. The total number of relevant documents, R, is used as the cutoff for calculation, and this varies from query to query. It counts the number of results ranked above the cutoff that are relevant, r, and turns that into a relevancy fraction: r/R.

type RankingEvaluation

type RankingEvaluation struct {
	// Ground truth relevancy values in original ordering
	Relevancies []float64

	// ranked indexes of relevancy values, ranked according to predicted relevancy/probability values
	PredictedRankInd []int

	// ranked indexes of relevancy values, ranked according to ground truth relevancy values (a perfect ranking)
	PerfectRankInd []int
}

RankingEvaluation is a type for evaluating rankings for information retrieval and classification, supporting calculation of [normalised] discounted cumulative gain.

func NewRankingEvaluation

func NewRankingEvaluation(predictions, labels []float64) RankingEvaluation

NewRankingEvaluation creates a new RankingEvaluation type from the specified predicted relevancies (predictions) and ground truth relevancy values (labels). The ordering of both slices must correspond and the lengths must match.

func (RankingEvaluation) CumulativeGain

func (r RankingEvaluation) CumulativeGain(k int) float64

CumulativeGain calculates the cumulative gain for the ranking. This is the cumulative gain, or sum of relevancy values, at each rank up to the kth ranked item, where k is the cut-off (specify len(Relevancies) for ALL items/no cut-off).
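
A worked sketch with a hypothetical import path: the predicted scores below rank the graded relevancies as 3, 2, 1, 0, so the cumulative gain at k=3 is 3+2+1=6:

	package main

	import (
		"fmt"

		"github.com/example/datautils" // hypothetical import path
	)

	func main() {
		predictions := []float64{0.9, 0.8, 0.1, 0.6} // predicted relevancies
		labels := []float64{3, 2, 0, 1}              // graded ground truth relevancies

		re := datautils.NewRankingEvaluation(predictions, labels)
		fmt.Println(re.CumulativeGain(3)) // 6
	}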

func (RankingEvaluation) DiscountedCumulativeGain

func (r RankingEvaluation) DiscountedCumulativeGain(k int, rel RelevancyFunction) float64

DiscountedCumulativeGain calculates the discounted cumulative gain for the ranking. This is the cumulative gain or sum of relevancy values at each rank up to the kth ranked item, with each relevancy value being discounted according to rank so that relevancy values at lower ranks are more heavily discounted and therefore contribute less to the sum. Here k is the cut-off (specify len(Relevancies) for ALL items/no cut-off) and rel is the relevancy function to use. See TraditionalRelevancy and EmphasisedRelevancy for two popular formulations of the relevancy function, either of which may be specified for this parameter.
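
The doc comment does not spell out the discount, but the standard formulation divides the gain at rank i by log2(i+1), i.e. DCG@k = sum over i=1..k of rel(relevancy_i)/log2(i+1). A sketch under that assumption (not necessarily this package's exact code):

	// dcg computes a standard discounted cumulative gain over relevancies
	// already sorted into their predicted rank order.
	func dcg(rankedRelevancies []float64, k int, rel datautils.RelevancyFunction) float64 {
		var sum float64
		for i := 0; i < k && i < len(rankedRelevancies); i++ {
			// the rank is i+1, so the discount is log2(rank+1) = log2(i+2)
			sum += rel(rankedRelevancies[i]) / math.Log2(float64(i+2))
		}
		return sum
	}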

func (RankingEvaluation) NormalisedDiscountedCumulativeGain

func (r RankingEvaluation) NormalisedDiscountedCumulativeGain(k int, rel RelevancyFunction) float64

NormalisedDiscountedCumulativeGain calculates the normalised discounted cumulative gain for the ranking. This is the ratio of the discounted cumulative gain for the given ranking to the discounted cumulative gain for a perfect ranking of the same items. Here k is the cut-off (specify len(Relevancies) for ALL items/no cut-off) and rel is the relevancy function to use. See TraditionalRelevancy and EmphasisedRelevancy for two popular formulations of the relevancy function, either of which may be specified for this parameter.
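
Continuing the RankingEvaluation sketch above: a value of 1 means the predicted ranking already matches a perfect ranking, which is the case for that example data:

	ndcg := re.NormalisedDiscountedCumulativeGain(len(labels), datautils.EmphasisedRelevancy)
	fmt.Printf("NDCG = %.3f\n", ndcg) // 1.000 for the example above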

type RelevancyFunction

type RelevancyFunction func(float64) float64

RelevancyFunction supports specification/weighting of relevancy values for calculating discounted cumulative gain.
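
Because RelevancyFunction is a plain function type, callers are not limited to TraditionalRelevancy and EmphasisedRelevancy; any weighting can be supplied. A hypothetical example, continuing the RankingEvaluation sketch above, that collapses graded relevancies to a binary gain:

	// binary treats any positive grade as relevancy 1 (hypothetical weighting).
	binary := datautils.RelevancyFunction(func(r float64) float64 {
		if r > 0 {
			return 1
		}
		return 0
	})
	fmt.Println(re.DiscountedCumulativeGain(3, binary))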
