logprobparser

Version: v0.1.3
Published: Oct 1, 2024 | License: MIT | Imports: 5 | Imported by: 1

README

LogProb JSON Parser

This package helps developers parse JSON output from an LLM (Large Language Model) and retrieve the log probabilities of the tokens that make up each value, either directly per array element or object entry, or as a confidence field populated in a Go struct. It supports three main scenarios:

  1. Parsing LogProb of JSON Array Outputs (Element by Element).
  2. Parsing LogProb of JSON Object Outputs (Key-Value Pairs).
  3. Parsing LogProb of JSON Outputs into Go Structs with Confidence Field Auto-Populated Based on the logprob Tag.

Installation

go get github.com/metaphi-org/logprobparser

Usage

1. Parse JSON Array Output Element by Element

When parsing an array output from the LLM, ParseLogprobsArrElems groups the tokens by array element, so the log probabilities of each element can be inspected individually.

Example JSON Output:
["Element1", "Element2", "Element3"]
Corresponding Logprobs Token Array:
[
  { "Key": "[", "Value": 0.01 },
  { "Key": "Element1", "Value": 0.02 },
  { "Key": ",", "Value": 0.03 },
  { "Key": "Element2", "Value": 0.04 },
  { "Key": ",", "Value": 0.05 },
  { "Key": "Element3", "Value": 0.06 },
  { "Key": "]", "Value": 0.07 }
]
Code Example:
// Token and LogProb correspond to "Key" and "Value" in the token array above.
// The values are illustrative; real log probabilities are <= 0.
tokens := []logprobparser.DefaulTokenObj{
    {Token: "[", LogProb: 0.01},
    {Token: "Element1", LogProb: 0.02},
    {Token: ",", LogProb: 0.03},
    {Token: "Element2", LogProb: 0.04},
    {Token: ",", LogProb: 0.05},
    {Token: "Element3", LogProb: 0.06},
    {Token: "]", LogProb: 0.07},
}

parsed := logprobparser.ParseLogprobsArrElems(tokens)
// parsed is a [][]logprobparser.DefaulTokenObj: each inner slice holds the tokens of one array element with their log probabilities
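
Once the tokens are grouped per element, a single confidence per element can be derived from them. The following is a minimal, self-contained sketch of that post-processing step. The tok type and the text and avgLogprob helpers are local stand-ins (assumptions) for a token type, TokensToText and TokensAvgLogprob; the arithmetic mean used here is an assumption about what "average" means, not a statement about the library's implementation.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tok is a local stand-in for a token with its log probability.
type tok struct {
	Token   string
	LogProb float64
}

// text concatenates the token strings (assumed stand-in for TokensToText).
func text(ts []tok) string {
	var b strings.Builder
	for _, t := range ts {
		b.WriteString(t.Token)
	}
	return b.String()
}

// avgLogprob returns the arithmetic mean of the log probabilities
// (assumed stand-in for TokensAvgLogprob); it returns 0 for no tokens.
func avgLogprob(ts []tok) float64 {
	if len(ts) == 0 {
		return 0
	}
	sum := 0.0
	for _, t := range ts {
		sum += t.LogProb
	}
	return sum / float64(len(ts))
}

func main() {
	// Hypothetical grouped result: one inner slice per array element.
	parsed := [][]tok{
		{{"Element", -0.02}, {"1", -0.10}},
		{{"Element", -0.04}, {"2", -0.30}},
	}
	for _, elem := range parsed {
		// exp(mean logprob) is the geometric mean of the token probabilities.
		fmt.Printf("%s: %.3f\n", text(elem), math.Exp(avgLogprob(elem)))
	}
}
```

Averaging before exponentiating keeps long elements from being penalized merely for having more tokens; summing the log probabilities instead would give the joint probability of the whole element.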
2. Parse JSON Object Output (Key-Value Pairs)

When parsing JSON object output, ParseLogprobsObjEntries splits the tokens into key-value entries, allowing the developer to assess the certainty behind each field of the object.

Example JSON Output:
{
    "SomeKey": "SomeValue"
}
Corresponding Logprobs Token Array:
[
  { "Key": "{", "Value": 0 },
  { "Key": "\"", "Value": 0.01 },
  { "Key": "Some", "Value": 0.02 },
  { "Key": "Key", "Value": 0.03 },
  { "Key": "\":", "Value": 0.04 },
  { "Key": "\"", "Value": 0.05 },
  { "Key": "Some", "Value": 0.06 },
  { "Key": "Value", "Value": 0.07 },
  { "Key": "\"", "Value": 0.08 },
  { "Key": "}", "Value": 0.09 }
]
Code Example:
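// Token and LogProb correspond to "Key" and "Value" in the token array above.
tokens := []logprobparser.DefaulTokenObj{
    {Token: "{", LogProb: 0},
    {Token: "\"", LogProb: 0.01},
    {Token: "Some", LogProb: 0.02},
    {Token: "Key", LogProb: 0.03},
    {Token: "\":", LogProb: 0.04},
    {Token: "\"", LogProb: 0.05},
    {Token: "Some", LogProb: 0.06},
    {Token: "Value", LogProb: 0.07},
    {Token: "\"", LogProb: 0.08},
    {Token: "}", LogProb: 0.09},
}

parsed := logprobparser.ParseLogprobsObjEntries(tokens)
// parsed is an ObjTokenEntries: one entry per key-value pair, each holding the key tokens and the value tokens with their log probabilities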
3. Parse JSON Output into a Go Struct (Auto-Populating Confidence Field)

In more complex scenarios, developers may want to parse the output into a Go struct. The logprob tag allows automatic population of a Confidence field using the log probabilities of sibling fields.

The logprob tag supports the following values:

  • set: Sets the confidence as exp(lp).
  • mult: Multiplies the existing confidence with exp(lp).
  • multsqt: Sets confidence as sqrt(exp(lp) * existing).
  • multsqt100: Sets confidence as sqrt(100 * exp(lp) * existing).

In this context, lp is the log probability taken from the value tokens of the sibling fields, and existing refers to the value already set in the Confidence field before parsing.
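
The four formulas above can be written out as a small function to make the arithmetic concrete. This is only a sketch: applyLogprobOp is a hypothetical helper, not part of the package, and it takes lp as a given number without modelling how the library derives it from the tokens.

```go
package main

import (
	"fmt"
	"math"
)

// applyLogprobOp is a hypothetical helper (not part of logprobparser) that
// mirrors the four documented tag formulas. lp is a log probability and
// existing is the value already stored in the Confidence field.
func applyLogprobOp(op string, existing, lp float64) (float64, error) {
	p := math.Exp(lp) // convert the log probability into a probability
	switch op {
	case "set":
		return p, nil
	case "mult":
		return p * existing, nil
	case "multsqt":
		return math.Sqrt(p * existing), nil
	case "multsqt100":
		return math.Sqrt(100 * p * existing), nil
	default:
		return 0, fmt.Errorf("unknown logprob op %q", op)
	}
}

func main() {
	lp := math.Log(0.5) // a log probability corresponding to p = 0.5
	for _, op := range []string{"set", "mult", "multsqt", "multsqt100"} {
		v, err := applyLogprobOp(op, 0.5, lp)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-10s existing=0.5 -> %.4f\n", op, v)
	}
}
```

Taken literally, the formulas imply that every operation except set depends on the starting value: a Confidence field left at its zero value would stay 0 under mult, multsqt and multsqt100. One reading of multsqt100 is that it suits confidences kept on a 0 to 100 scale, since it is the geometric mean of 100*exp(lp) and the existing value; that interpretation is an assumption, not something the documentation states.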

Example Struct:
type TestStruct struct {
    Key        string  `json:"SomeKey"`
    Confidence float64 `logprob:"mult"`  // Confidence will be auto-populated based on the logprob of sibling fields
}
Example Logprobs Token Array:
[
  { "Key": "{", "Value": 0 },
  { "Key": "\"", "Value": 0.01 },
  { "Key": "Some", "Value": 0.02 },
  { "Key": "Key", "Value": 0.03 },
  { "Key": "\":", "Value": 0.04 },
  { "Key": "\"", "Value": 0.05 },
  { "Key": "Some", "Value": 0.06 },
  { "Key": "Value", "Value": 0.07 },
  { "Key": "\"", "Value": 0.08 },
  { "Key": "}", "Value": 0.09 }
]
Code Example:
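// Token and LogProb correspond to "Key" and "Value" in the token array above.
tokens := []logprobparser.DefaulTokenObj{
    {Token: "{", LogProb: 0},
    {Token: "\"", LogProb: 0.01},
    {Token: "Some", LogProb: 0.02},
    {Token: "Key", LogProb: 0.03},
    {Token: "\":", LogProb: 0.04},
    {Token: "\"", LogProb: 0.05},
    {Token: "Some", LogProb: 0.06},
    {Token: "Value", LogProb: 0.07},
    {Token: "\"", LogProb: 0.08},
    {Token: "}", LogProb: 0.09},
}

// mult multiplies the value already in Confidence, so start from 1 rather than the zero value.
result := TestStruct{Confidence: 1}
logprobparser.ParseLogprobsIntoConfField(&result, tokens) // destination pointer first, then the tokens
// result.Confidence is now populated from the log probabilities of its sibling fields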
Additional Notes
  • The logprob tag selects how the confidence is computed: set, mult, multsqt, or multsqt100.
  • Use ParseLogprobsArrElems for arrays, ParseLogprobsObjEntries for JSON objects, and ParseLogprobsIntoConfField when parsing into Go structs.

Contributing

Contributions are welcome. Please fork the repository and submit a PR with your changes!

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ParseLogprobsArrElems

func ParseLogprobsArrElems[T LogprobsToken[T]](tokens []T) [][]T

func ParseLogprobsIntoConfField

func ParseLogprobsIntoConfField[O any, T LogprobsToken[T]](
	o *O,
	tokens []T,
)

Every float field in the struct that carries a logprob tag is populated automatically.

Supported logprob tag values:

  • set: exp(lp)
  • mult: exp(lp)*existing
  • multsqt: sqrt(exp(lp)*existing)
  • multsqt100: sqrt(100*exp(lp)*existing)

Only the log probabilities of the value tokens are used, taken from the sibling fields of the "Confidence" field and from the "Confidence" field itself.

func ParseLogprobsIntoObj

func ParseLogprobsIntoObj[O any, T LogprobsToken[T]](
	o *O,
	tokens []T,
	logProbSetter func(fieldName string, v reflect.Value, tokens []T),
)
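
A callback with the shape of logProbSetter could look like the sketch below. It is self-contained and does not call the package: tok is a local stand-in for a token type, setExpAvg and confidenceOf are hypothetical names, and the sketch assumes that v is the settable float field that should receive the confidence. How ParseLogprobsIntoObj actually chooses fieldName, v and tokens is not documented here, so treat this as an illustration of the reflect handling only.

```go
package main

import (
	"fmt"
	"math"
	"reflect"
)

// tok is a local stand-in for a token with its log probability.
type tok struct {
	Token   string
	LogProb float64
}

// setExpAvg has the same shape as the logProbSetter parameter with T = tok.
// Assumption: v is the settable float field that should receive the confidence.
func setExpAvg(fieldName string, v reflect.Value, tokens []tok) {
	isFloat := v.Kind() == reflect.Float32 || v.Kind() == reflect.Float64
	if !isFloat || !v.CanSet() || len(tokens) == 0 {
		return // leave the field untouched
	}
	sum := 0.0
	for _, t := range tokens {
		sum += t.LogProb
	}
	// Store exp(mean logprob), the geometric mean of the token probabilities.
	v.SetFloat(math.Exp(sum / float64(len(tokens))))
}

// confidenceOf drives setExpAvg by hand on an anonymous struct.
func confidenceOf(tokens []tok) float64 {
	var out struct{ Confidence float64 }
	field := reflect.ValueOf(&out).Elem().FieldByName("Confidence")
	setExpAvg("Confidence", field, tokens)
	return out.Confidence
}

func main() {
	fmt.Printf("%.4f\n", confidenceOf([]tok{{"Some", -0.1}, {"Value", -0.3}}))
}
```

Checking Kind and CanSet before calling SetFloat avoids the panics that reflect raises for non-float or unaddressable values.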

func TokensAvgLogprob

func TokensAvgLogprob[T LogprobsToken[T]](tokens []T) float64

func TokensToText

func TokensToText[T LogprobsToken[T]](tokens []T) string

Types

type DefaulTokenObj

type DefaulTokenObj struct {
	Token   string
	LogProb float64
}

func (DefaulTokenObj) GetLogProb

func (c DefaulTokenObj) GetLogProb() float64

func (DefaulTokenObj) GetToken

func (c DefaulTokenObj) GetToken() string

func (DefaulTokenObj) WithToken

func (c DefaulTokenObj) WithToken(t string) DefaulTokenObj

type LogprobsToken

type LogprobsToken[O any] interface {
	GetToken() string
	WithToken(t string) O
	GetLogProb() float64
}
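
Because every function in the package is constrained by LogprobsToken[T], a caller's own token type can be used once it has these three methods. The sketch below shows one way to satisfy the interface. The interface is copied locally so the example compiles on its own; apiToken and sumLogprob are hypothetical names, and WithToken is assumed to return a copy with the text replaced and the log probability kept.

```go
package main

import "fmt"

// LogprobsToken is copied from the package documentation so that this
// sketch compiles without importing the package.
type LogprobsToken[O any] interface {
	GetToken() string
	WithToken(t string) O
	GetLogProb() float64
}

// apiToken is a hypothetical caller-defined token type.
type apiToken struct {
	Text string
	LP   float64
}

func (t apiToken) GetToken() string    { return t.Text }
func (t apiToken) GetLogProb() float64 { return t.LP }

// WithToken returns a copy carrying new text and the same log probability.
// The value receiver means the original token is left unchanged.
func (t apiToken) WithToken(s string) apiToken {
	t.Text = s
	return t
}

// Compile-time check that apiToken satisfies the self-referential constraint.
var _ LogprobsToken[apiToken] = apiToken{}

// sumLogprob is a hypothetical generic function using the same constraint
// shape as the package's functions.
func sumLogprob[T LogprobsToken[T]](tokens []T) float64 {
	sum := 0.0
	for _, t := range tokens {
		sum += t.GetLogProb()
	}
	return sum
}

func main() {
	tokens := []apiToken{{"Some", -0.25}, {"Value", -0.5}}
	fmt.Println(sumLogprob(tokens))
	fmt.Println(tokens[0].WithToken("Other").GetToken(), tokens[0].GetToken())
}
```

The type parameter is inferred from the slice, so callers do not need to spell out T when passing a slice of their own token type.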

type ObjTokenEntries

type ObjTokenEntries[T LogprobsToken[T]] []ObjTokenEntry[T]

func ParseLogprobsObjEntries

func ParseLogprobsObjEntries[T LogprobsToken[T]](tokens []T) ObjTokenEntries[T]

func (ObjTokenEntries[T]) ValueAvgLogprob

func (oes ObjTokenEntries[T]) ValueAvgLogprob() float64

func (ObjTokenEntries[T]) ValueTokens

func (oes ObjTokenEntries[T]) ValueTokens() []T

type ObjTokenEntry

type ObjTokenEntry[T LogprobsToken[T]] struct {
	Key   []T
	Value []T
}

func (ObjTokenEntry[T]) KeyString

func (o ObjTokenEntry[T]) KeyString() string
