github.com/hscells/cui2vec

Package cui2vec

v0.0.0-...-d05e622
Published: Feb 14, 2020 | License: MIT | Module: github.com/hscells/cui2vec

Overview

Package cui2vec implements utilities for working with cui2vec embeddings and for mapping CUIs to text.

Index

func CUI2Int

func CUI2Int(cui string) (int, error)

CUI2Int converts a string CUI into an integer.

func Cosine

func Cosine(x, y []float64) (float64, error)

Cosine returns the cosine similarity between two vectors.
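As an illustration of the measure itself, the following is a minimal sketch of cosine similarity, the dot product of two vectors divided by the product of their magnitudes. The helper name cosine is hypothetical, and returning an error for mismatched lengths or zero-magnitude vectors is an assumption; the package's own Cosine may behave differently in those cases.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosine is a hypothetical stand-in with the same shape as Cosine.
// The error conditions below are assumptions, not documented behaviour.
func cosine(x, y []float64) (float64, error) {
	if len(x) != len(y) {
		return 0, errors.New("vectors differ in length")
	}
	var dot, nx, ny float64
	for i := range x {
		dot += x[i] * y[i] // running dot product
		nx += x[i] * x[i]  // squared magnitude of x
		ny += y[i] * y[i]  // squared magnitude of y
	}
	if nx == 0 || ny == 0 {
		return 0, errors.New("zero-magnitude vector")
	}
	return dot / (math.Sqrt(nx) * math.Sqrt(ny)), nil
}

func main() {
	s, err := cosine([]float64{1, 0}, []float64{1, 1})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.4f\n", s)
}
```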

func Int2CUI

func Int2CUI(val int) string

Int2CUI converts an integer value to a CUI.
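The conversion in both directions can be sketched as follows, assuming the usual UMLS form of a CUI: the letter C followed by seven digits (for example C0000005). The helper names cuiToInt and intToCUI are hypothetical, and the package's own CUI2Int and Int2CUI may validate or format their values differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cuiToInt is a hypothetical helper: it strips the leading "C" and parses
// the remaining digits. It assumes CUIs look like "C0000005".
func cuiToInt(cui string) (int, error) {
	if !strings.HasPrefix(cui, "C") {
		return 0, fmt.Errorf("cui %q does not start with C", cui)
	}
	return strconv.Atoi(cui[1:])
}

// intToCUI is the hypothetical inverse: it zero-pads the value to seven
// digits and prepends "C".
func intToCUI(val int) string {
	return fmt.Sprintf("C%07d", val)
}

func main() {
	n, err := cuiToInt("C0000005")
	if err != nil {
		panic(err)
	}
	fmt.Println(n, intToCUI(n))
}
```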

type AliasMapping

type AliasMapping map[string][]string

func LoadCUIAliasMapping

func LoadCUIAliasMapping(path string) (AliasMapping, error)

type Concept

type Concept struct {
	CUI   string
	Value float64
}

Concept is a CUI that has a similarity score in relation to a target CUI.

func Softmax

func Softmax(z []Concept) []Concept

Softmax normalises a slice of concepts.
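For reference, a standard softmax over concept scores can be sketched as below. This is an assumption about what "normalises" means here, not a copy of the package's code: each score is exponentiated and divided by the sum of all exponentiated scores, so the results are positive and sum to 1. The helper name softmax is hypothetical, and the Concept type is redeclared only so the sketch compiles on its own.

```go
package main

import (
	"fmt"
	"math"
)

// Concept mirrors the documented struct so the sketch is self-contained.
type Concept struct {
	CUI   string
	Value float64
}

// softmax is a hypothetical stand-in for Softmax. It returns a new slice
// in the same order as the input and leaves the input untouched.
func softmax(z []Concept) []Concept {
	out := make([]Concept, len(z))
	if len(z) == 0 {
		return out
	}
	// Subtract the largest score before exponentiating for numerical stability.
	hi := z[0].Value
	for _, c := range z[1:] {
		if c.Value > hi {
			hi = c.Value
		}
	}
	var sum float64
	for i, c := range z {
		out[i] = Concept{CUI: c.CUI, Value: math.Exp(c.Value - hi)}
		sum += out[i].Value
	}
	for i := range out {
		out[i].Value /= sum
	}
	return out
}

func main() {
	scores := []Concept{{CUI: "C0000005", Value: 0.9}, {CUI: "C0000039", Value: 0.1}}
	for _, c := range softmax(scores) {
		fmt.Printf("%s %.4f\n", c.CUI, c.Value)
	}
}
```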

type Embeddings

type Embeddings interface {
	LoadModel(r io.Reader) error
	Similar(cui string) ([]Concept, error)
}

Embeddings is a complete cui2vec file loaded into memory.

type Mapping

type Mapping map[string]string

func LoadCUIFrequencyMapping

func LoadCUIFrequencyMapping(path string) (Mapping, error)

func LoadCUIMapping

func LoadCUIMapping(path string) (Mapping, error)

LoadCUIMapping loads a mapping from each CUI to its most common title.

The CUI-to-title mapping is constructed as per: Jimmy, Zuccon G., Koopman B. (2018) Choices in Knowledge-Base Retrieval for Consumer Health Search. In: Pasi G., Piwowarski B., Azzopardi L., Hanbury A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham

The file at path must reflect this construction.

func (Mapping) Invert

func (m Mapping) Invert() Mapping

type PrecomputedEmbeddings

type PrecomputedEmbeddings struct {
	Matrix [][]int
	Cols   int
}

PrecomputedEmbeddings is a cui2vec container in which the distances between CUIs have been pre-computed. It contains a sparse Matrix where each row corresponds to a CUI and the columns hold the distances to other CUIs. Each row has the form [CUI, score, CUI, score, ...]. Each CUI must be converted back to a string, and each score must be re-normalised from an int back to a float; the Similar method takes care of both.

func NewPrecomputedEmbeddings

func NewPrecomputedEmbeddings(r io.Reader) (*PrecomputedEmbeddings, error)

func (*PrecomputedEmbeddings) LoadModel

func (v *PrecomputedEmbeddings) LoadModel(r io.Reader) error

LoadModel reads a model from r (typically a file on disk) into memory. The pre-computed distances file is a single, continuous byte sequence. Its first four bytes encode a Uint32 giving the number of rows in the matrix, which is used to create a fixed-size sparse matrix. The `Cols` attribute of the `PrecomputedEmbeddings` type then determines how many four-byte Uint32 numbers are read at a time to populate the columns of each row.

func (*PrecomputedEmbeddings) Similar

func (v *PrecomputedEmbeddings) Similar(cui string) ([]Concept, error)

Similar matches a given input CUI to the `Cols`-closest CUIs in the cui2vec embedding space. Each row in the matrix is encoded as (CUI, score) pairs, which this method decodes: each int value in the matrix is converted into either a string CUI or a re-normalised softmax score (a float64).
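Decoding one row of (CUI, score) pairs might look like the sketch below. Several details are assumptions made only for illustration: the fixed-point factor scoreScale (the documentation does not state how scores are scaled to ints), the treatment of a (0, 0) pair as padding, and the seven-digit CUI format. The helper name decodeRow is hypothetical.

```go
package main

import "fmt"

// Concept mirrors the documented struct so the sketch is self-contained.
type Concept struct {
	CUI   string
	Value float64
}

// scoreScale is an ASSUMED fixed-point factor; the real factor used by the
// package is not given in this documentation.
const scoreScale = 1e6

// decodeRow is a hypothetical helper that turns [CUI, score, CUI, score, ...]
// into concepts. A trailing unpaired element, if any, is ignored.
func decodeRow(row []int) []Concept {
	concepts := make([]Concept, 0, len(row)/2)
	for i := 0; i+1 < len(row); i += 2 {
		if row[i] == 0 && row[i+1] == 0 {
			continue // assumed to be zero padding at the end of a short row
		}
		concepts = append(concepts, Concept{
			CUI:   fmt.Sprintf("C%07d", row[i]),        // int back to string CUI
			Value: float64(row[i+1]) / scoreScale, // int back to float score
		})
	}
	return concepts
}

func main() {
	fmt.Println(decodeRow([]int{5, 250000, 12, 750000, 0, 0}))
}
```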

func (*PrecomputedEmbeddings) WriteModel

func (v *PrecomputedEmbeddings) WriteModel(w io.Writer) error

WriteModel writes a pre-computed distance matrix to w. The write begins with a four-byte sequence to be parsed as a Uint32 representing the size of the matrix. Each value of the matrix is then written one by one in a continuous byte sequence, where each element in the matrix is encoded as a four-byte sequence to be parsed as a Uint32. Elements of the matrix are written row by row, and each row is exactly `Cols` wide. If there are fewer than `Cols` elements in a row, the row is padded with zeros.
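The layout described for LoadModel and WriteModel can be sketched as a round trip using encoding/binary. The byte order is an assumption (little-endian is chosen here because the documentation does not state it), the leading Uint32 is taken to be the number of rows, and the helper names writeMatrix, readMatrix and roundTrip are hypothetical rather than part of the package.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// byteOrder is an ASSUMPTION; the documented format does not name one.
var byteOrder binary.ByteOrder = binary.LittleEndian

// writeMatrix writes the row count, then every row padded with zeros to cols.
func writeMatrix(w io.Writer, m [][]int, cols int) error {
	if err := binary.Write(w, byteOrder, uint32(len(m))); err != nil {
		return err
	}
	for _, row := range m {
		if len(row) > cols {
			return fmt.Errorf("row has %d elements, want at most %d", len(row), cols)
		}
		padded := make([]uint32, cols) // zero-initialised, so short rows are padded
		for i, v := range row {
			padded[i] = uint32(v)
		}
		if err := binary.Write(w, byteOrder, padded); err != nil {
			return err
		}
	}
	return nil
}

// readMatrix reads the row count, then cols four-byte values per row.
func readMatrix(r io.Reader, cols int) ([][]int, error) {
	var rows uint32
	if err := binary.Read(r, byteOrder, &rows); err != nil {
		return nil, err
	}
	m := make([][]int, rows)
	buf := make([]uint32, cols)
	for i := range m {
		if err := binary.Read(r, byteOrder, buf); err != nil {
			return nil, err
		}
		m[i] = make([]int, cols)
		for j, v := range buf {
			m[i][j] = int(v)
		}
	}
	return m, nil
}

// roundTrip encodes m into memory, reports the encoded size, and decodes it.
func roundTrip(m [][]int, cols int) ([][]int, int, error) {
	var buf bytes.Buffer
	if err := writeMatrix(&buf, m, cols); err != nil {
		return nil, 0, err
	}
	n := buf.Len()
	out, err := readMatrix(&buf, cols)
	return out, n, err
}

func main() {
	out, n, err := roundTrip([][]int{{5, 250000, 12, 750000}, {7, 1000000}}, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, out)
}
```

Under these assumptions the encoded size is 4 bytes for the header plus 4 × `Cols` bytes per row, which makes it possible to seek directly to a given row.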

type SimResponse

type SimResponse struct {
	V []Concept
}

type UncompressedEmbeddings

type UncompressedEmbeddings struct {
	SkipFirst  bool
	Comma      rune
	Embeddings map[string][]float64
}

func NewUncompressedEmbeddings

func NewUncompressedEmbeddings(r io.Reader, skipFirst bool, comma rune) (*UncompressedEmbeddings, error)

func (*UncompressedEmbeddings) LoadModel

func (v *UncompressedEmbeddings) LoadModel(r io.Reader) error

LoadModel loads a cui2vec pre-trained model into memory. The pre-trained file from:

https://arxiv.org/pdf/1804.01486.pdf

which was downloaded from:

https://figshare.com/s/00d69861786cd0156d81

is a CSV file. The skipFirst parameter determines whether the first line of the file is skipped.
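Reading such a file with encoding/csv might look like the sketch below. It assumes each record holds a CUI in the first column followed by the vector components; the helper names loadEmbeddings and loadFromString are hypothetical, and their parameters merely echo skipFirst and comma from NewUncompressedEmbeddings.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// loadEmbeddings is a hypothetical loader. It ASSUMES each record has the
// form: CUI, component, component, ...
func loadEmbeddings(r io.Reader, skipFirst bool, comma rune) (map[string][]float64, error) {
	reader := csv.NewReader(r)
	reader.Comma = comma
	reader.FieldsPerRecord = -1 // tolerate a header with a different field count

	embeddings := make(map[string][]float64)
	first := true
	for {
		record, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		if first && skipFirst {
			first = false
			continue // skip the header line
		}
		first = false
		if len(record) < 2 {
			return nil, fmt.Errorf("record %q has no vector components", record)
		}
		vec := make([]float64, len(record)-1)
		for i, field := range record[1:] {
			vec[i], err = strconv.ParseFloat(field, 64)
			if err != nil {
				return nil, err
			}
		}
		embeddings[record[0]] = vec
	}
	return embeddings, nil
}

// loadFromString is a small convenience wrapper for in-memory data.
func loadFromString(s string, skipFirst bool, comma rune) (map[string][]float64, error) {
	return loadEmbeddings(strings.NewReader(s), skipFirst, comma)
}

func main() {
	data := "cui,v1,v2\nC0000005,0.5,-1.5\n"
	emb, err := loadFromString(data, true, ',')
	if err != nil {
		panic(err)
	}
	fmt.Println(emb)
}
```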

func (*UncompressedEmbeddings) Similar

func (v *UncompressedEmbeddings) Similar(cui string) ([]Concept, error)

Similar computes the CUIs that are similar to an input CUI. The distance function used is cosine similarity. The resulting scores are then run through Softmax and sorted.
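The three steps (cosine similarity, softmax, sort) can be sketched end to end as below. This is an illustration under stated assumptions rather than the package's implementation: the target CUI is excluded from its own results, results are sorted in descending order of score, all vectors are assumed to be non-zero, and the helper names similar and cosine are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Concept mirrors the documented struct so the sketch is self-contained.
type Concept struct {
	CUI   string
	Value float64
}

// cosine assumes x and y have equal length and non-zero magnitude.
func cosine(x, y []float64) float64 {
	var dot, nx, ny float64
	for i := range x {
		dot += x[i] * y[i]
		nx += x[i] * x[i]
		ny += y[i] * y[i]
	}
	return dot / (math.Sqrt(nx) * math.Sqrt(ny))
}

// similar is a hypothetical stand-in for Similar over an in-memory map.
func similar(target string, emb map[string][]float64) ([]Concept, error) {
	tv, ok := emb[target]
	if !ok {
		return nil, fmt.Errorf("unknown cui %q", target)
	}
	concepts := make([]Concept, 0, len(emb))
	var sum float64
	for cui, vec := range emb {
		if cui == target {
			continue // assumption: a CUI is not listed as similar to itself
		}
		if len(vec) != len(tv) {
			return nil, fmt.Errorf("vector for %q has a different length", cui)
		}
		// Cosine lies in [-1, 1], so exponentiating directly is safe.
		e := math.Exp(cosine(tv, vec))
		concepts = append(concepts, Concept{CUI: cui, Value: e})
		sum += e
	}
	for i := range concepts {
		concepts[i].Value /= sum // softmax normalisation
	}
	// Assumption: most similar first.
	sort.Slice(concepts, func(i, j int) bool {
		return concepts[i].Value > concepts[j].Value
	})
	return concepts, nil
}

func main() {
	emb := map[string][]float64{
		"C0000001": {1, 0},
		"C0000002": {1, 0.1},
		"C0000003": {0, 1},
	}
	out, err := similar("C0000001", emb)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```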

type VecClient

type VecClient struct {
	// contains filtered or unexported fields
}

func NewVecClient

func NewVecClient(addr string) (*VecClient, error)

func (*VecClient) Sim

func (c *VecClient) Sim(cui string) ([]Concept, error)

func (*VecClient) Vec

func (c *VecClient) Vec(cui string) ([]float64, error)

type VecResponse

type VecResponse struct {
	V []float64
}

Documentation was rendered with GOOS=linux and GOARCH=amd64.