extractor

package
v0.0.0-...-4db8f08
This package is not in the latest version of its module.

Published: Nov 22, 2021 | License: MIT | Imports: 19 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Init

func Init(l *zap.SugaredLogger, conf *Configuration, isUpdate bool)

Init sets up the extractors and loaders.

func LoadSources

func LoadSources(ctx context.Context, l *zap.SugaredLogger, conf *Configuration) error

Types

type Compound

type Compound struct {
	UCI              int              `json:"uci"`
	Inchi            Inchi            `json:"inchi"`
	Components       []Inchi          `json:"components,omitempty"`
	StandardInchiKey string           `json:"standard_inchi_key"`
	Smiles           string           `json:"smiles"`
	Sources          []CompoundSource `json:"sources,omitempty"`
	CreatedAt        time.Time        `json:"created_at"`
	IsSourceless     bool             `json:"is_sourceless"`
}

Compound is a structure describing the information extracted from the UniChem database that is to be indexed.
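The JSON tags above determine the shape of the indexed document. The following is a minimal sketch of that encoding, assuming a trimmed local stand-in (compoundDoc, a hypothetical name) rather than the real Compound type; Sources is simplified to a string slice so the example runs without the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// compoundDoc is a trimmed, hypothetical stand-in for Compound. It keeps a
// few fields and their JSON tags so the example is self-contained.
type compoundDoc struct {
	UCI              int      `json:"uci"`
	StandardInchiKey string   `json:"standard_inchi_key"`
	Smiles           string   `json:"smiles"`
	Sources          []string `json:"sources,omitempty"` // []CompoundSource in the real type
	IsSourceless     bool     `json:"is_sourceless"`
}

// encode returns the JSON document for c as a string.
func encode(c compoundDoc) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	doc, err := encode(compoundDoc{UCI: 1, Smiles: "CCO", IsSourceless: true})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// The nil Sources slice is dropped because of omitempty; the empty
	// StandardInchiKey string is kept because its tag has no omitempty.
	fmt.Println(doc)
}
```

With a nil Sources slice the "sources" key is absent from the output, which is the case the IsSourceless flag appears intended to mark.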

type CompoundSource

type CompoundSource struct {
	ID                 int       `json:"id"`
	Name               string    `json:"name"`
	LongName           string    `json:"long_name"`
	CompoundID         string    `json:"compound_id"`
	Description        string    `json:"description"`
	BaseURL            string    `json:"base_url"`
	ShortName          string    `json:"short_name"`
	BaseIDURLAvailable bool      `json:"base_id_url_available"`
	AuxForURL          bool      `json:"aux_for_url"`
	CreatedAt          time.Time `json:"created_at"`
	LastUpdate         time.Time `json:"last_updated,omitempty"`
	IsPrivate          bool      `json:"is_private"`
}

CompoundSource describes the source from which the UniChem database extracted the compound.
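One detail of the tags above is worth checking: with encoding/json, omitempty has no effect on struct values such as time.Time, so a zero LastUpdate is still written. A small sketch, assuming a hypothetical two-field stand-in (sourceStamps) for CompoundSource:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sourceStamps is a hypothetical stand-in that copies only the two
// timestamp fields and tags of CompoundSource.
type sourceStamps struct {
	CreatedAt  time.Time `json:"created_at"`
	LastUpdate time.Time `json:"last_updated,omitempty"`
}

// encodeStamps returns the JSON encoding of s as a string.
func encodeStamps(s sourceStamps) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeStamps(sourceStamps{})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Both keys appear, each with the zero time, despite omitempty.
	fmt.Println(out)
}
```

If the key should disappear when no update time is known, a *time.Time field or, from Go 1.24, the omitzero tag option would do that; whether the package wants that behavior is not stated in the source.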

type Configuration

type Configuration struct {
	LogPath         string
	OracleConn      string
	ElasticHost     string
	MongoDB         string
	BulkLimit       int
	Index           string
	Type            string
	MaxBulkCalls    int
	QueryMax        Range
	Query           string
	MaxConcurrent   int
	Interval        int
	MaxAttempts     int
	ElasticAuth     ElasticAuth
	ESIndexSettings string
}

Configuration stores the configuration parameters required by the application.

func LoadConfig

func LoadConfig(c string) (*Configuration, error)

LoadConfig opens a YAML config file (for example, config.yaml) and returns the resulting Configuration.

type ElasticAuth

type ElasticAuth struct {
	Username, Password string
}

ElasticAuth holds the Elasticsearch authentication credentials.

type ElasticManager

type ElasticManager struct {
	Context   context.Context
	Client    *elastic.Client
	IndexName string
	TypeName  string
	Bulklimit int

	Errchan   chan error
	Respchan  chan WorkerResponse
	WaitGroup sync.WaitGroup

	MaxBulkCalls int
	// contains filtered or unexported fields
}

ElasticManager manages the connection to the Elasticsearch server and adds compounds to it.

func (*ElasticManager) AddToBulk

func (em *ElasticManager) AddToBulk(c Compound)

AddToBulk fills a BulkRequest up to the limit set in the em.Bulklimit field.

func (*ElasticManager) Close

func (em *ElasticManager) Close()

Close terminates the Elasticsearch client and the BulkProcessor.

func (*ElasticManager) Init

func (em *ElasticManager) Init(ctx context.Context, conf *Configuration, logger *zap.SugaredLogger) error

Init initializes an Elasticsearch client and pings the server to check that it is up.

func (*ElasticManager) SendCurrentBulk

func (em *ElasticManager) SendCurrentBulk()

SendCurrentBulk sends the current bulk through a worker. It is useful for flushing the requests stored in the BulkService whether or not the BulkLimit has been reached.
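The relationship between AddToBulk and SendCurrentBulk can be illustrated with a generic size-limited batcher. This is a sketch under stated assumptions, not the package's implementation: batcher, add, and flush are hypothetical names, plain integers stand in for Compound documents, and "sending" just records the batch in memory.

```go
package main

import "fmt"

// batcher is a hypothetical stand-in for the bulk-handling part of
// ElasticManager.
type batcher struct {
	limit   int     // plays the role of Bulklimit
	pending []int   // queued items; the real code queues Compound documents
	sent    [][]int // batches that have been flushed
}

// add queues one item and flushes automatically once the limit is reached,
// mirroring the documented behavior of AddToBulk.
func (b *batcher) add(uci int) {
	b.pending = append(b.pending, uci)
	if len(b.pending) >= b.limit {
		b.flush()
	}
}

// flush sends whatever is queued, even if the limit has not been reached,
// mirroring the documented purpose of SendCurrentBulk.
func (b *batcher) flush() {
	if len(b.pending) == 0 {
		return
	}
	b.sent = append(b.sent, b.pending)
	b.pending = nil
}

func main() {
	b := &batcher{limit: 2}
	for uci := 1; uci <= 5; uci++ {
		b.add(uci)
	}
	fmt.Println(len(b.sent), len(b.pending)) // two full batches, one item left over
	b.flush()
	fmt.Println(len(b.sent), len(b.pending)) // the leftover goes out as a third batch
}
```

The explicit flush matters at the end of an extraction run, when the number of remaining documents is usually smaller than the limit.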

type Extractor

type Extractor struct {
	ElasticManager         *ElasticManager
	Oraconn                string
	Query                  string
	QueryLimit, QueryStart int
	Logger                 *zap.SugaredLogger
	LastIDAdded            int

	Attemps int
	//CurrentCompound contains the current compound being added to the loader
	PreviousCompound Compound
	CurrentCompound  Compound
	// contains filtered or unexported fields
}

Extractor connects to Oracle using the given connection string (Oraconn), fetches the UniChem data, and adds it to the index using the provided ElasticManager.

func (*Extractor) Start

func (ex *Extractor) Start(ctx context.Context) error

Start extracts UniChem data by querying UniChem's database and adds it to the index using the provided ElasticManager.

type Inchi

type Inchi struct {
	Version               string `json:"version"`
	Formula               string `json:"formula"`
	Connections           string `json:"connections"`
	HAtoms                string `json:"h_atoms"`
	Charge                string `json:"charge"`
	Protons               string `json:"protons"`
	StereoDbond           string `json:"stereo_dbond"`
	StereoSP3             string `json:"stereo_SP3"`
	StereoSP3inverted     string `json:"stereo_SP3_inverted"`
	StereoType            string `json:"stereo_type"`
	IsotopicAtoms         string `json:"isotopic_atoms"`
	IsotopicExchangeableH string `json:"isotopic_exchangeable_h"`
	Inchi                 string `json:"inchi"`
}

Inchi is an InChI split into its components.

type InchiDivider

type InchiDivider struct {
	Logger *zap.SugaredLogger
}

InchiDivider divides the InChI into its different layers and components.

func (*InchiDivider) ProcessInchi

func (ind *InchiDivider) ProcessInchi(compound Compound) (Compound, error)
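ProcessInchi is undocumented, so the following is only a sketch of what splitting an InChI into layers can look like. It relies on the standard InChI layout (layers separated by "/", each after the formula introduced by a one-letter prefix) and covers just the main layers; the names layers and splitInchi are hypothetical, and storing each layer without its prefix letter is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// layers is a hypothetical, reduced counterpart of the Inchi struct.
type layers struct {
	Version, Formula, Connections, HAtoms, Charge, Protons string
}

// splitInchi splits the main layers of an InChI string. Stereo layers are
// ignored, and parsing stops at the isotopic, fixed-H, or reconnected
// layers because they reuse prefixes such as "h".
func splitInchi(inchi string) (layers, error) {
	if !strings.HasPrefix(inchi, "InChI=") {
		return layers{}, fmt.Errorf("not an InChI string: %q", inchi)
	}
	parts := strings.Split(strings.TrimPrefix(inchi, "InChI="), "/")
	out := layers{Version: parts[0]}
	for _, p := range parts[1:] {
		if p == "" {
			continue
		}
		switch p[0] {
		case 'c':
			out.Connections = p[1:]
		case 'h':
			out.HAtoms = p[1:]
		case 'q':
			out.Charge = p[1:]
		case 'p':
			out.Protons = p[1:]
		case 'i', 'f', 'r':
			return out, nil
		default:
			// The formula layer has no lowercase prefix letter.
			if out.Formula == "" && (p[0] < 'a' || p[0] > 'z') {
				out.Formula = p
			}
		}
	}
	return out, nil
}

func main() {
	// Standard InChI for ethanol.
	l, err := splitInchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", l)
}
```

For ethanol this yields version 1S, formula C2H6O, connections 1-2-3, and hydrogens 3H,2H2,1H3, with the charge and proton layers left empty.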

type Range

type Range struct {
	Start, Finish int
}

Range is the UCI range used for concurrent queries.
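How a UCI range might be divided among concurrent queries is not described in the source. The sketch below shows one common approach, assuming inclusive bounds, a fixed chunk size, and a semaphore that caps the number of goroutines (in the spirit of the MaxConcurrent setting). The names uciRange, chunks, and process are hypothetical, and the per-chunk work is a placeholder count rather than a database query.

```go
package main

import (
	"fmt"
	"sync"
)

// uciRange is a hypothetical local copy of Range with inclusive bounds.
type uciRange struct{ Start, Finish int }

// chunks splits r into consecutive sub-ranges of at most size UCIs.
func chunks(r uciRange, size int) []uciRange {
	if size <= 0 || r.Finish < r.Start {
		return nil
	}
	var out []uciRange
	for lo := r.Start; lo <= r.Finish; lo += size {
		hi := lo + size - 1
		if hi > r.Finish {
			hi = r.Finish
		}
		out = append(out, uciRange{lo, hi})
	}
	return out
}

// process handles each chunk in its own goroutine, with at most
// maxConcurrent running at once, and returns the number of UCIs covered.
func process(r uciRange, size, maxConcurrent int) int {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	sem := make(chan struct{}, maxConcurrent)
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, c := range chunks(r, size) {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent chunks are in flight
		go func(c uciRange) {
			defer wg.Done()
			defer func() { <-sem }()
			n := c.Finish - c.Start + 1 // placeholder for querying and indexing c
			mu.Lock()
			total += n
			mu.Unlock()
		}(c)
	}
	wg.Wait()
	return total
}

func main() {
	r := uciRange{Start: 1, Finish: 10}
	fmt.Println(chunks(r, 4))     // three sub-ranges: 1-4, 5-8, 9-10
	fmt.Println(process(r, 4, 2)) // every UCI is covered exactly once
}
```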

type Source

type Source struct {
	SourceID         int       `bson:"sourceID,omitempty"`
	Name             string    `bson:"name,omitempty"`
	Description      string    `bson:"description,omitempty"`
	SrcReleaseNumber int32     `bson:"srcReleaseNumber,omitempty"`
	SrcReleaseDate   time.Time `bson:"srcReleaseDate,omitempty"`
	Created          time.Time `bson:"created,omitempty"`
	LastUpdated      time.Time `bson:"lastUpdated,omitempty"`
	LongName         string    `bson:"nameLong,omitempty"`
	SrcDetails       string    `bson:"srcDetails,omitempty"`
	SrcUrl           string    `bson:"srcUrl,omitempty"`
	BaseIdUrl        string    `bson:"baseIdUrl,omitempty"`
	Private          bool      `bson:"private,omitempty"`
	NameLabel        string    `bson:"nameLabel,omitempty"`
	UpdateComments   string    `bson:"updateComments,omitempty"`
	UCICount         int       `bson:"UCICount,omitempty"`
}

type UCICount

type UCICount struct {
	TotalUCI int `json:"totalUCI"`
	Source   int `json:"source"`
}

UCICount is the number of UCIs per source in UniChem.

type WorkerResponse

type WorkerResponse struct {
	Succeeded    int
	Indexed      int
	Created      int
	Updated      int
	Deleted      int
	Failed       int
	BulkResponse *elastic.BulkResponse
}

WorkerResponse contains the result of a BulkRequest to the Elasticsearch index.
