Documentation ¶
Overview ¶
Package manticore implements a Client to work with manticoresearch over its internal binary protocol. In many cases it may also be used to work with a sphinxsearch daemon. The Client connector may be used as:
cl := NewClient()
res, err := cl.Query("hello")
...
The set of functions mostly imitates the Manticoresearch API description for PHP, but with a few changes which are specific to Go as the more effective and mainstream approach for that language (as, for example, in error handling).
This SDK helps you to send different manticore API packets and parse the results. These are:
* Search (full-text and full-scan)
* Build snippets
* Build keywords
* Flush attributes
* Perform JSON queries (as via HTTP proto)
* Perform sphinxql queries (as via mysql proto)
* Set user variables
* Ping the server
* Look server status
* Perform percolate queries
The percolate query is used to match documents against queries stored in an index. It is also called “search in reverse” as it works opposite to a regular search where documents are stored in an index and queries are issued against the index.
These queries are stored in a special kind of index, and they can be added, deleted and listed using INSERT/DELETE/SELECT statements, in a way similar to how it's done for a regular index.
Checking whether a document matches any of the predefined criteria (queries) is performed via the CallPQ function, or via the http /json/pq/<index>/_search endpoint. They return a list of matched queries, possibly with additional info such as the matching clause, filters, and tags.
Index ¶
- Constants
- func EscapeString(from string) string
- type Client
- func (cl *Client) BuildExcerpts(docs []string, index, words string, opts ...SnippetOptions) ([]string, error)
- func (cl *Client) BuildKeywords(query, index string, hits bool) ([]Keyword, error)
- func (cl *Client) CallPQ(index string, values []string, opts SearchPqOptions) (*SearchPqResponse, error)
- func (cl *Client) CallPQBson(index string, values []byte, opts SearchPqOptions) (*SearchPqResponse, error)
- func (cl *Client) Close() (bool, error)
- func (cl *Client) FlushAttributes() (int, error)
- func (cl *Client) GetLastWarning() string
- func (cl *Client) IsConnectError() bool
- func (cl *Client) Json(endpoint, request string) (JsonAnswer, error)
- func (cl *Client) Open() (bool, error)
- func (cl *Client) Ping(cookie uint32) (uint32, error)
- func (cl *Client) Query(query string, indexes ...string) (*QueryResult, error)
- func (cl *Client) RunQueries(queries []Search) ([]QueryResult, error)
- func (cl *Client) RunQuery(query Search) (*QueryResult, error)
- func (cl *Client) SetConnectTimeout(timeout time.Duration)
- func (cl *Client) SetMaxAlloc(alloc int)
- func (cl *Client) SetServer(host string, port ...uint16)
- func (cl *Client) Sphinxql(cmd string) ([]Sqlresult, error)
- func (cl *Client) Status(global bool) (map[string]string, error)
- func (cl *Client) UpdateAttributes(index string, attrs []string, values map[DocID][]interface{}, ...) (int, error)
- func (cl *Client) Uvar(name string, values []uint64) error
- type ColumnInfo
- type DocID
- type EAttrType
- type EGroupBy
- type EMatchMode
- type ERankMode
- type ESearchdstatus
- type ESortOrder
- type EUpdateType
- type ExcerptFlags
- type JsonAnswer
- type JsonOrStr
- type Keyword
- type Match
- type PqQuery
- type PqResponseFlags
- type Pqflags
- type Qflags
- type QueryDesc
- type QueryDescFlags
- type QueryResult
- type Search
- func (q *Search) AddFilter(attribute string, values []int64, exclude bool)
- func (q *Search) AddFilterExpression(expression string, exclude bool)
- func (q *Search) AddFilterFloatRange(attribute string, fmin, fmax float32, exclude bool)
- func (q *Search) AddFilterNull(attribute string, isnull bool)
- func (q *Search) AddFilterRange(attribute string, imin, imax int64, exclude bool)
- func (q *Search) AddFilterString(attribute string, value string, exclude bool)
- func (q *Search) AddFilterStringList(attribute string, values []string, exclude bool)
- func (q *Search) AddFilterUservar(attribute string, uservar string, exclude bool)
- func (q *Search) ChangeQueryFlags(flags Qflags, set bool)
- func (q *Search) ResetFilters()
- func (q *Search) ResetGroupBy()
- func (q *Search) ResetOuterSelect()
- func (q *Search) ResetQueryFlags()
- func (q *Search) SetGeoAnchor(attrlat, attrlong string, lat, long float32)
- func (q *Search) SetGroupBy(attribute string, gfunc EGroupBy, groupsort ...string)
- func (q *Search) SetMaxPredictedTime(predtime time.Duration)
- func (q *Search) SetOuterSelect(orderby string, offset, limit int32)
- func (q *Search) SetQueryFlags(flags Qflags)
- func (q *Search) SetRankingExpression(rankexpr string)
- func (q *Search) SetRankingMode(ranker ERankMode)
- func (q *Search) SetSortMode(sort ESortOrder, sortby ...string)
- func (q *Search) SetTokenFilter(library, name string, opts string)
- type SearchPqOptions
- type SearchPqResponse
- type SnippetOptions
- type SqlMsg
- type SqlResultset
- type SqlSchema
- type Sqlresult
- type WordStat
Examples ¶
Constants ¶
const (
	AggrNone eAggrFunc = iota // None
	AggrAvg                   // Avg()
	AggrMin                   // Min()
	AggrMax                   // Max()
	AggrSum                   // Sum()
	AggrCat                   // Cat()
)
const (
	CollationLibcCi        eCollation = iota // Libc CI
	CollationLibcCs                          // Libc CS
	CollationUtf8GeneralCi                   // Utf8 general CI
	CollationBinary                          // Binary

	CollationDefault = CollationLibcCi
)
const (
	FilterValues eFilterType = iota // filter by integer values set
	FilterRange                     // filter by integer range
	FilterFloatrange                // filter by float range
	FilterString                    // filter by string value
	FilterNull                      // filter by NULL
	FilterUservar                   // filter by @uservar
	FilterStringList                // filter by string list
	FilterExpression                // filter by expression
)
const (
	QueryOptDefault eQueryoption = iota // Default
	QueryOptDisabled                    // Disabled
	QueryOptEnabled                     // Enabled
	QueryOptMorphNone                   // None morphology expansion
)
const (
SphinxPort uint16 = 9312 // Default IANA port for Sphinx API
)
Variables ¶
This section is empty.
Functions ¶
func EscapeString ¶
EscapeString escapes characters that are treated as special operators by the query language parser.
`from` is a string to escape. This function might seem redundant because it’s trivial to implement in any calling application. However, as the set of special characters might change over time, it makes sense to have an API call that is guaranteed to escape all such characters at all times. Returns escaped string.
Example ¶
escaped := EscapeString("escaping-sample@query/string")
fmt.Println(escaped)
Output: escaping\-sample\@query\/string
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client represents a connection to the manticore daemon. It provides the set of public API functions.
func NewClient ¶
func NewClient() Client
NewClient creates a default connector, which points to 'localhost:9312', has zero timeout and 8M maxalloc. The defaults may be changed later by invoking `SetServer()`, `SetMaxAlloc()`, etc.
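For example, a typical setup right after construction might look like this (the host name and values here are placeholders, not recommendations):
cl := NewClient()
cl.SetServer("searchd.example.com", 9312) // hypothetical host running searchd
cl.SetMaxAlloc(16 * 1024 * 1024)          // allow the network buffer to grow up to 16M
cl.SetConnectTimeout(time.Second)         // give up connecting after one second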
func (*Client) BuildExcerpts ¶
func (cl *Client) BuildExcerpts(docs []string, index, words string, opts ...SnippetOptions) ([]string, error)
BuildExcerpts generates excerpts (snippets) of given documents for the given query. Returns nil on failure, or a slice of snippets on success. If necessary, it will connect to the server before processing.
`docs` is a plain slice of strings that carry the documents’ contents.
`index` is an index name string. Different settings (such as charset, morphology, wordforms) from given index will be used.
`words` is a string that contains the keywords to highlight. They will be processed with respect to index settings. For instance, if English stemming is enabled in the index, "shoes" will be highlighted even if the keyword is "shoe". Keywords can contain wildcards, which work similarly to the star-syntax available in queries.
`opts` is an optional SnippetOptions struct which may contain additional optional highlighting parameters; it may be created by calling NewSnippetOptions() and then tuned for your needs. If `opts` is omitted, defaults will be used.
The snippets extraction algorithm currently favors better passages (with closer phrase matches), and then passages with keywords not yet in the snippet. Generally, it will try to highlight the best match with the query, and it will also try to highlight all the query keywords, as made possible by the limits. In case the document does not match the query, the beginning of the document trimmed down according to the limits will be returned by default. You can instead return an empty snippet in that case by setting the allow_empty option to true.
Returns nil and an error on failure; returns a plain slice of strings with excerpts (snippets) on success.
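A minimal usage sketch (the index name "lj" is a placeholder for an existing index; default snippet options are used):
cl := NewClient()
snippets, err := cl.BuildExcerpts(
	[]string{"this is my test text to be highlighted", "another document text"},
	"lj", "test text")
if err != nil {
	fmt.Println(err.Error())
} else {
	for _, snippet := range snippets {
		fmt.Println(snippet)
	}
}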
func (*Client) BuildKeywords ¶
BuildKeywords extracts keywords from the query using tokenizer settings for the given index, optionally with per-keyword occurrence statistics. Returns a slice of Keyword structures with per-keyword information. If necessary, it will connect to the server before processing.
`query` is a query to extract keywords from.
`index` is a name of the index to get tokenizing settings and keyword occurrence statistics from.
`hits` is a boolean flag that indicates whether keyword occurrence statistics are required.
Example (WithHits) ¶
cl := NewClient()
keywords, err := cl.BuildKeywords("this.is.my query", "lj", true)
if err != nil {
	fmt.Println(err.Error())
} else {
	fmt.Println(keywords)
}
Output: [{Tok: 'this', Norm: 'this', Qpos: 1; docs/hits 1629922/3905279} {Tok: 'is', Norm: 'is', Qpos: 2; docs/hits 1901345/6052344} {Tok: 'my', Norm: 'my', Qpos: 3; docs/hits 1981048/7549917} {Tok: 'query', Norm: 'query', Qpos: 4; docs/hits 1235/1474} ]
Example (WithoutHits) ¶
cl := NewClient()
keywords, err := cl.BuildKeywords("this.is.my query", "lj", false)
if err != nil {
	fmt.Println(err.Error())
} else {
	fmt.Println(keywords)
}
Output: [{Tok: 'this', Norm: 'this', Qpos: 1; docs/hits 0/0} {Tok: 'is', Norm: 'is', Qpos: 2; docs/hits 0/0} {Tok: 'my', Norm: 'my', Qpos: 3; docs/hits 0/0} {Tok: 'query', Norm: 'query', Qpos: 4; docs/hits 0/0} ]
func (*Client) CallPQ ¶
func (cl *Client) CallPQ(index string, values []string, opts SearchPqOptions) (*SearchPqResponse, error)
CallPQ performs a check whether a document matches any of the predefined criteria (queries). It returns a list of matched queries, possibly with additional info such as the matching clause, filters, and tags.
`index` determines the name of the PQ index you want to call into. It can be either local, or distributed and built from several PQ agents.
`values` is the list of documents. Each value is regarded as a separate document. The ordinal numbers of matched documents may then be returned in the resultset.
`opts` is the packed options. See the description of SearchPqOptions for details. In general you need to make an instance of the options by calling NewSearchPqOptions(), set the desired flags and options, and then invoke CallPQ, providing the desired index, the set of documents and the options.
Since this function expects plain text documents, it will remove all JSON-related flags from the options, and will also not use IdAlias, if any is provided.
For example:
...
po := NewSearchPqOptions()
po.Flags = NeedDocs | Verbose | NeedQuery
resp, err := cl.CallPQ("pq", []string{"angry test", "filter test doc2"}, po)
...
func (*Client) CallPQBson ¶
func (cl *Client) CallPQBson(index string, values []byte, opts SearchPqOptions) (*SearchPqResponse, error)
CallPQBson performs a check whether a document matches any of the predefined criteria (queries). It returns a list of matched queries, possibly with additional info such as the matching clause, filters, and tags.
It works very much like CallPQ, but expects documents in BSON form. With this function it makes sense to use flags such as SkipBadJson, and the param IdAlias, which are not used for plain queries.
This function is not yet implemented in the SDK; it is a stub.
func (*Client) Close ¶
Close closes a previously opened persistent connection. If no connection is active, it fires the error 'not connected', which is just informational and safe to ignore.
func (*Client) FlushAttributes ¶
FlushAttributes forces searchd to flush pending attribute updates to disk, and blocks until completion. Returns a non-negative internal flush tag on success, or -1 and an error.
Attribute values updated using the UpdateAttributes() API call are kept in a memory-mapped file, which means the OS decides when the updates are actually written to disk. The FlushAttributes() call lets you enforce a flush, which writes all the changes to disk. The call will block until searchd finishes writing the data to disk, which might take seconds or even minutes depending on the total data size (.spa file size). All the currently updated indexes will be flushed.
The flush tag should be treated as an ever-growing magic number that does not mean anything. It's guaranteed to be non-negative. It is guaranteed to grow over time, though not necessarily in a sequential fashion; for instance, two calls that return 10 and then 1000 respectively are a valid situation. If two calls to FlushAttributes() return the same tag, it means that there were no actual attribute updates in between them, and therefore the currently flushed state remained the same (for all indexes).
Usage example:
status, err := cl.FlushAttributes()
if err != nil {
	fmt.Println(err.Error())
}
func (*Client) GetLastWarning ¶
GetLastWarning returns the last warning message, as a string, in human-readable format. If there were no warnings during the previous API call, an empty string is returned.
You should call it to verify whether your request (such as Query()) was completed but with warnings. For instance, search query against a distributed index might complete successfully even if several remote agents timed out. In that case, a warning message would be produced.
The warning message is not reset by this call, so you can safely call it several times if needed. If you issued a multi-query by running RunQueries(), individual warnings will not be stored in the client; instead, check the Warning field in each returned result of the slice.
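For example (the distributed index name "dist" here is just a placeholder):
res, err := cl.Query("hello", "dist")
if err == nil {
	if warn := cl.GetLastWarning(); warn != "" {
		fmt.Println("completed with warning:", warn) // e.g. some agents timed out
	}
	fmt.Println(res.TotalFound, "matches found")
}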
func (*Client) IsConnectError ¶
IsConnectError checks whether the last error was a network error on API side, or a remote error reported by searchd. Returns true if the last connection attempt to searchd failed on API side, false otherwise (if the error was remote, or there were no connection attempts at all).
func (*Client) Json ¶
func (cl *Client) Json(endpoint, request string) (JsonAnswer, error)
Json performs a remote call of a JSON query, as if it were fired via an HTTP connection. It is intended to run updates and deletes, however it sometimes works in other cases as well. The general rule: if the endpoint accepts data via POST, it will work via the Json call.
`endpoint` is the endpoint, like "json/search".
`request` - the query. As in REST, expected to be in JSON, like `{"index":"lj","query":{"match":{"title":"luther"}}}`
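A minimal sketch (the index name "lj" is a placeholder):
ans, err := cl.Json("json/search", `{"index":"lj","query":{"match":{"title":"luther"}}}`)
if err != nil {
	fmt.Println(err.Error())
} else {
	fmt.Println(ans.Answer) // raw JSON answer as a string
}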
func (*Client) Ping ¶
Ping just sends a uint32 cookie to the daemon and immediately receives it back. It may be used to measure average network response time, or to check whether the daemon is alive.
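A minimal liveness check might look like this (the cookie value is arbitrary):
cookie, err := cl.Ping(0xDEADBEEF)
if err == nil && cookie == 0xDEADBEEF {
	fmt.Println("daemon is alive")
}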
func (*Client) Query ¶
func (cl *Client) Query(query string, indexes ...string) (*QueryResult, error)
Query connects to the searchd server, runs the given simple search query string through the given indexes, and returns the search result.
This is a simplified function which accepts only one query string parameter and no options. Internally it will run with ranker 'RankProximityBm25', mode 'MatchAll', 'max_matches=1000' and 'limit=20'. It is good for a kind of demo run. If you want more fine-tuned options, consider using the `RunQuery()` and `RunQueries()` functions, which provide the full spectrum of possible tuning options.
`query` is a query string.
`indexes` is an index name (or names) string. The default value for `indexes` is "*", which means to query all local indexes. Characters allowed in index names include Latin letters (a-z), numbers (0-9) and underscore (_); everything else is considered a separator. Note that an index name should not start with an underscore character. Internally 'Query' just invokes 'RunQuery' with a default Search where only the `query` and `indexes` fields are customized.
Therefore, all of the following sample calls are valid and will search the same two indexes:
cl.Query ( "test query", "main delta" ) cl.Query ( "test query", "main;delta" ) cl.Query ( "test query", "main, delta" )
func (*Client) RunQueries ¶
func (cl *Client) RunQueries(queries []Search) ([]QueryResult, error)
RunQueries connects to searchd, runs a batch of queries, obtains and returns the result sets. Returns nil and error message on general error (such as network I/O failure). Returns a slice of result sets on success.
`queries` is a slice of Search structures, each representing one query. You need to prepare this slice yourself before the call.
Each result set in the returned array is exactly the same as the result set returned from RunQuery.
Note that the batch query request itself almost always succeeds - unless there’s a network error, blocking index rotation in progress, or another general failure which prevents the whole request from being processed.
However, individual queries within the batch might very well fail. In this case their respective result sets will contain a non-empty Error message, but no matches or query statistics. In the extreme case all queries within the batch could fail. There will still be no general error reported, because the API was able to successfully connect to searchd, submit the batch, and receive the results - but every result set will have a specific error message.
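A sketch of the batch pattern (it assumes NewSearch(query, index, comment) as the constructor signature, and an index named "lj"):
q1 := NewSearch("hello", "lj", "")
q2 := NewSearch("world", "lj", "")
results, err := cl.RunQueries([]Search{q1, q2})
if err != nil {
	fmt.Println(err.Error()) // general failure: nothing was executed
} else {
	for i, res := range results {
		if res.Error != "" {
			fmt.Printf("query %d failed: %s\n", i, res.Error) // individual failure
		}
	}
}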
func (*Client) RunQuery ¶
func (cl *Client) RunQuery(query Search) (*QueryResult, error)
RunQuery connects to searchd, runs a query, obtains and returns the result set. Returns nil and error message on general error (such as network I/O failure). Returns a result set on success.
`query` is a single Search structure, representing the query. You need to prepare it yourself before the call.
The returned result set is exactly the same as one of the result sets returned from RunQueries.
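For example, a fine-tuned query might be prepared and run like this (a sketch; it assumes NewSearch(query, index, comment) as the constructor signature, and "group_id" is a placeholder attribute):
q := NewSearch("hello world", "lj", "")
q.Limit = 50                               // return up to 50 matches
q.AddFilterRange("group_id", 1, 10, false) // keep only documents with group_id in [1..10]
res, err := cl.RunQuery(q)
if err != nil {
	fmt.Println(err.Error())
} else {
	fmt.Println(res)
}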
func (*Client) SetConnectTimeout ¶
SetConnectTimeout sets the time allowed to spend connecting to the server before giving up.
Under some circumstances, the server can be delayed in responding, either due to network delays, or a query backlog. In either instance, this allows the client application programmer some degree of control over how their program interacts with searchd when it is not available, and can ensure that the client application does not fail due to exceeding execution limits.
In the event of a failure to connect, an appropriate error code should be returned back to the application in order for application-level error handling to advise the user.
func (*Client) SetMaxAlloc ¶
SetMaxAlloc limits the size of the client's network buffer. For sending queries and receiving results the client reuses a byte array, which can grow up to the required size. If the limit is reached, the array will be released and a new one will be created. Usually the API needs just a few kilobytes of memory, but sometimes the required value may grow significantly; for example, if you fetch a big resultset with many attributes. Such a resultset will be properly received and processed; however, at the next query the backend array used for it will be released, and the occupied memory will be returned to the runtime.
`alloc` is the size, in bytes. A reasonable default value is 8M.
func (*Client) SetServer ¶
SetServer sets searchd host name and TCP port. All subsequent requests will use the new host and port settings. Default host and port are ‘localhost’ and 9312, respectively.
`host` is either a URL (hostname or IP address), or a unix socket path (starting with '/').
`port` is optional; it makes sense only for TCP connections and is not used for unix sockets. Default is 9312.
Example (Tcpsocket) ¶
cl := NewClient() cl.SetServer("google.com", 9999) fmt.Println(cl.dialmethod) fmt.Println(cl.host) fmt.Println(cl.port)
Output: tcp google.com 9999
Example (Unixsocket) ¶
cl := NewClient() cl.SetServer("/var/log") fmt.Println(cl.dialmethod) fmt.Println(cl.host)
Output: unix /var/log
func (*Client) Sphinxql ¶
Sphinxql sends a sphinxql request encapsulated into the API. The reply comes over the network in mysql native proto format, which is parsed by the SDK and represented as a usable structure (see the Sqlresult definition). The result also provides the Stringer interface, so it may be printed nicely without any postprocessing. A limitation of the command is that it is done in one session, as if you opened a connection via mysql, executed the command and disconnected. So some information, like 'show meta' after 'call pq', will be lost in such a case (however, you can invoke CallPQ directly from the API); but other things, like 'select...; show meta' in one line, are still supported and work well.
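For example (the index name "lj" is a placeholder):
res, err := cl.Sphinxql("select * from lj limit 2; show meta")
if err != nil {
	fmt.Println(err.Error())
} else {
	for _, r := range res {
		fmt.Println(r) // Sqlresult provides the Stringer interface
	}
}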
func (*Client) Status ¶
Status queries the searchd status, and returns a map of status variable names and values.
`global` determines whether you take the global status, or the meta of the last query:
true: receive global daemon status
false: receive meta of the last executed query
Usage example:
status, err := cl.Status(false)
if err != nil {
	fmt.Println(err.Error())
} else {
	for key, line := range status {
		fmt.Printf("%v:\t%v\n", key, line)
	}
}
example output:
time: 0.000
keyword[0]: query
docs[0]: 1235
hits[0]: 1474
total: 3
total_found: 3
func (*Client) UpdateAttributes ¶
func (cl *Client) UpdateAttributes(index string, attrs []string, values map[DocID][]interface{}, vtype EUpdateType, ignorenonexistent bool) (int, error)
UpdateAttributes instantly updates given attribute values in given documents. Returns the number of actually updated documents (0 or more) on success, or -1 and an error on failure.
`index` is a name of the index (or indexes) to be updated. It can be either a single index name or a list, like in Query(). Unlike Query(), wildcard is not allowed and all the indexes to update must be specified explicitly. The list of indexes can include distributed index names. Updates on distributed indexes will be pushed to all agents.
`attrs` is a slice with string attribute names, listing attributes that are updated.
`values` is a map with document IDs as keys and new attribute values as values, see below.
`vtype` is the type parameter; see the EUpdateType description for values.
`ignorenonexistent` indicates that the update will silently ignore any warnings about trying to update a column which does not exist in the current index schema.
Usage example:
upd, err := cl.UpdateAttributes("test1", []string{"group_id"}, map[DocID][]interface{}{1:{456}}, UpdateInt, false)
Here we update document 1 in index test1, setting group_id to 456.
upd, err := cl.UpdateAttributes("products", []string{"price", "amount_in_stock"}, map[DocID][]interface{}{1001:{123,5}, 1002:{37,11}, 1003:{25,129}}, UpdateInt, false)
Here we update documents 1001, 1002 and 1003 in index products. For document 1001, the new price will be set to 123 and the new amount in stock to 5; for document 1002, the new price will be 37 and the new amount will be 11; etc.
func (*Client) Uvar ¶
Uvar defines a remote user variable which later may be used for filtering. You can push megabytes of values this way and later just refer to the whole set by name.
`name` is the name of the variable; it must start with @, like "@foo".
`values` is an array of the numbers you want to store in the variable. It is considered a 'set', so duplicates will be removed and the order will not be kept. Like: []uint64{7811237,7811235,7811235,7811233,7811236}
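A sketch of the save-once, filter-later pattern (it assumes NewSearch(query, index, comment) as the constructor signature; the index and attribute names are placeholders):
err := cl.Uvar("@foo", []uint64{7811237, 7811235, 7811235, 7811233, 7811236})
if err == nil {
	q := NewSearch("test", "lj", "")
	q.AddFilterUservar("group_id", "@foo", false) // match documents whose group_id is in @foo
	res, err := cl.RunQuery(q)
	if err == nil {
		fmt.Println(res.TotalFound, "documents matched")
	}
}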
type ColumnInfo ¶
type ColumnInfo struct {
	Name string    // name of the attribute
	Type EAttrType // type of the attribute
}
ColumnInfo represents one attribute column in resultset schema
func (ColumnInfo) String ¶
func (res ColumnInfo) String() string
Stringer interface for ColumnInfo type
type EAttrType ¶
type EAttrType uint32
EAttrType represents the known attribute types. See the comments on the constants for their concrete meaning. Values of this type are returned with the resultset schema; you don't need to use them yourself.
const (
	AttrNone       EAttrType = iota // not an attribute at all
	AttrInteger                     // unsigned 32-bit integer
	AttrTimestamp                   // this attr is a timestamp
	AttrBool                        // this attr is a boolean bit field
	AttrFloat                       // floating point number (IEEE 32-bit)
	AttrBigint                      // signed 64-bit integer
	AttrString                      // string (binary; in-memory)
	AttrPoly2d                      // vector of floats, 2D polygon (see POLY2D)
	AttrStringptr                   // string (binary, in-memory, stored as pointer to the zero-terminated string)
	AttrTokencount                  // field token count, 32-bit integer
	AttrJson                        // JSON subset; converted, packed, and stored as string

	AttrUint32set EAttrType = 0x40000001 // MVA, set of unsigned 32-bit integers
	AttrInt64set  EAttrType = 0x40000002 // MVA, set of signed 64-bit integers
)
const (
	AttrMaparg EAttrType = 1000 + iota
	AttrFactors     // packed search factors (binary, in-memory, pooled)
	AttrJsonField   // points to particular field in JSON column subset
	AttrFactorsJson // packed search factors (binary, in-memory, pooled, provided to Client json encoded)
)
These types are runtime-only, used as intermediate types in the expression engine.
type EGroupBy ¶
type EGroupBy uint32
EGroupBy selects search query grouping mode. It is used as a param when calling `SetGroupBy()` function.
GroupbyDay ¶
GroupbyDay extracts year, month and day in YYYYMMDD format from timestamp.
GroupbyWeek ¶
GroupbyWeek extracts year and first day of the week number (counting from year start) in YYYYNNN format from timestamp.
GroupbyMonth ¶
GroupbyMonth extracts month in YYYYMM format from timestamp.
GroupbyYear ¶
GroupbyYear extracts year in YYYY format from timestamp.
GroupbyAttr ¶
GroupbyAttr uses attribute value itself for grouping.
GroupbyMultiple ¶
GroupbyMultiple groups by multiple attribute values. Plain attributes and JSON fields are allowed; MVAs and full JSONs are not allowed.
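For example, grouping matches by the day extracted from a timestamp attribute might look like this (a sketch; it assumes NewSearch(query, index, comment) as the constructor signature, and "published" is a placeholder timestamp attribute):
q := NewSearch("test", "lj", "")
q.SetGroupBy("published", GroupbyDay) // one group per YYYYMMDD value of 'published'
res, err := cl.RunQuery(q)
if err == nil {
	fmt.Println(res)
}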
type EMatchMode ¶
type EMatchMode uint32
EMatchMode selects the search query matching mode. So-called matching modes are a legacy feature that used to provide (very) limited query syntax and ranking support. Currently, they are deprecated in favor of the full-text query language and the built-in rankers. It is thus strongly recommended to use `MatchExtended` and proper query syntax rather than any other legacy mode. All those other modes are actually internally converted to extended syntax anyway. SphinxAPI still defaults to `MatchAll`, but that is for compatibility reasons only.
There are the following matching modes available:
MatchAll ¶
MatchAll matches all query words.
MatchAny ¶
MatchAny matches any of the query words.
MatchPhrase ¶
MatchPhrase matches the query as a phrase, requiring a perfect match.
MatchBoolean ¶
MatchBoolean matches the query as a boolean expression (see Boolean query syntax).
MatchExtended ¶
MatchExtended2 ¶
MatchExtended, MatchExtended2 (alias) match the query as an expression in the Manticore internal query language (see Extended query syntax). This is the default matching mode if nothing else is specified.
MatchFullscan ¶
MatchFullscan matches the query, forcibly using the "full scan" mode as described below. NB: any query terms will be ignored, such that filters, filter-ranges and grouping will still be applied, but no text-matching. The MatchFullscan mode will be automatically activated in place of the specified matching mode when the query string is empty (i.e. its length is zero).
In full scan mode, all the indexed documents will be considered as matching. Such queries will still apply filters, sorting, and group by, but will not perform any full-text searching. This can be useful to unify full-text and non-full-text searching code, or to offload SQL server (there are cases when Manticore scans will perform better than analogous MySQL queries). An example of using the full scan mode might be to find posts in a forum. By selecting the forum’s user ID via SetFilter() but not actually providing any search text, Manticore will match every document (i.e. every post) where SetFilter() would match - in this case providing every post from that user. By default this will be ordered by relevancy, followed by Manticore document ID in ascending order (earliest first).
const (
	MatchAll EMatchMode = iota // match all query words
	MatchAny                   // match any query word
	MatchPhrase                // match this exact phrase
	MatchBoolean               // match this boolean query
	MatchExtended              // match this extended query
	MatchFullscan              // match all document IDs w/o fulltext query, apply filters
	MatchExtended2             // extended engine V2 (TEMPORARY, WILL BE REMOVED IN 0.9.8-RELEASE)

	MatchTotal
)
type ERankMode ¶
type ERankMode uint32
ERankMode selects query relevance ranking mode. It is set via `SetRankingMode()` and `SetRankingExpression()` functions.
Manticore ships with a number of built-in rankers suited for different purposes. A number of them use two factors, phrase proximity (aka LCS) and BM25. Phrase proximity works on the keyword positions, while BM25 works on the keyword frequencies. Basically, the better the degree of the phrase match between the document body and the query, the higher the phrase proximity (it maxes out when the document contains the entire query as a verbatim quote). And BM25 is higher when the document contains more rare words. We'll save the detailed discussion for later.
Currently implemented rankers are:
RankProximityBm25 ¶
RankProximityBm25, the default ranking mode that uses and combines both phrase proximity and BM25 ranking.
RankBm25 ¶
RankBm25, statistical ranking mode which uses BM25 ranking only (similar to most other full-text engines). This mode is faster but may result in worse quality on queries which contain more than 1 keyword.
RankNone ¶
RankNone, no ranking mode. This mode is obviously the fastest. A weight of 1 is assigned to all matches. This is sometimes called boolean searching that just matches the documents but does not rank them.
RankWordcount ¶
RankWordcount, ranking by the keyword occurrences count. This ranker computes the per-field keyword occurrence counts, then multiplies them by field weights, and sums the resulting values.
RankProximity ¶
RankProximity, returns raw phrase proximity value as a result. This mode is internally used to emulate MatchAll queries.
RankMatchany ¶
RankMatchany, returns rank as it was computed in SPH_MATCH_ANY mode earlier, and is internally used to emulate MatchAny queries.
RankFieldmask ¶
RankFieldmask, returns a 32-bit mask with N-th bit corresponding to N-th fulltext field, numbering from 0. The bit will only be set when the respective field has any keyword occurrences satisfying the query.
RankSph04 ¶
RankSph04, is generally based on the default SPH_RANK_PROXIMITY_BM25 ranker, but additionally boosts the matches when they occur in the very beginning or the very end of a text field. Thus, if a field equals the exact query, SPH04 should rank it higher than a field that contains the exact query but is not equal to it. (For instance, when the query is “Hyde Park”, a document entitled “Hyde Park” should be ranked higher than a one entitled “Hyde Park, London” or “The Hyde Park Cafe”.)
RankExpr ¶
RankExpr, lets you specify the ranking formula in run time. It exposes a number of internal text factors and lets you define how the final weight should be computed from those factors.
RankExport ¶
RankExport ranks by BM25, but computes and exports all user expression factors.
RankPlugin ¶
RankPlugin, rank by user-defined ranker provided as UDF function.
const (
	RankProximityBm25 ERankMode = iota // default mode, phrase proximity major factor and BM25 minor one (aka SPH03)
	RankBm25                           // statistical mode, BM25 ranking only (faster but worse quality)
	RankNone                           // no ranking, all matches get a weight of 1
	RankWordcount                      // simple word-count weighting, rank is a weighted sum of per-field keyword occurrence counts
	RankProximity                      // phrase proximity (aka SPH01)
	RankMatchany                       // emulate old match-any weighting (aka SPH02)
	RankFieldmask                      // sets bits where there were matches
	RankSph04                          // codename SPH04, phrase proximity + bm25 + head/exact boost
	RankExpr                           // rank by user expression (eg. "sum(lcs*user_weight)*1000+bm25")
	RankExport                         // rank by BM25, but compute and export all user expression factors
	RankPlugin                         // user-defined ranker

	RankTotal
	RankDefault = RankProximityBm25
)
type ESearchdstatus ¶
type ESearchdstatus uint16
ESearchdstatus describes the known return codes. These are also the status codes for the search command (but there they are 32-bit).
const (
	StatusOk ESearchdstatus = iota // general success, command-specific reply follows
	StatusError                    // general failure, error message follows
	StatusRetry                    // temporary failure, error message follows, Client should retry later
	StatusWarning                  // general success, warning message and command-specific reply follow
)
func (ESearchdstatus) String ¶
func (vl ESearchdstatus) String() string
Stringer interface for ESearchdstatus type
type ESortOrder ¶
type ESortOrder uint32
ESortOrder selects search query sorting orders
There are the following result sorting modes available:
SortRelevance ¶
SortRelevance sorts by relevance in descending order (best matches first).
SortAttrDesc ¶
SortAttrDesc mode sorts by an attribute in descending order (bigger attribute values first).
SortAttrAsc ¶
SortAttrAsc mode sorts by an attribute in ascending order (smaller attribute values first).
SortTimeSegments ¶
SortTimeSegments sorts by time segments (last hour/day/week/month) in descending order, and then by relevance in descending order. Attribute values are split into so-called time segments, and then sorted by time segment first, and by relevance second.
The segments are calculated according to the current timestamp at the time when the search is performed, so the results would change over time. The segments are as follows:
last hour,
last day,
last week,
last month,
last 3 months,
everything else.
These segments are hardcoded, but it is trivial to change them if necessary.
This mode was added to support searching through blogs, news headlines, etc. When using time segments, recent records would be ranked higher because of segment, but within the same segment, more relevant records would be ranked higher - unlike sorting by just the timestamp attribute, which would not take relevance into account at all.
SortExtended ¶
SortExtended sorts by SQL-like combination of columns in ASC/DESC order. You can specify an SQL-like sort expression with up to 5 attributes (including internal attributes), eg:
@relevance DESC, price ASC, @id DESC
Both internal attributes (that are computed by the engine on the fly) and user attributes that were configured for this index are allowed. Internal attribute names must start with magic @-symbol; user attribute names can be used as is. In the example above, @relevance and @id are internal attributes and price is user-specified.
Known internal attributes are:
@id (match ID)
@weight (match weight)
@rank (match weight)
@relevance (match weight)
@random (return results in random order)
@rank and @relevance are just additional aliases to @weight.
SortExpr ¶
SortExpr sorts by an arithmetic expression.
`SortRelevance` ignores any additional parameters and always sorts matches by relevance rank. All other modes require an additional sorting clause, with the syntax depending on specific mode. SortAttrAsc, SortAttrDesc and SortTimeSegments modes require simply an attribute name. SortRelevance is equivalent to sorting by “@weight DESC, @id ASC” in extended sorting mode, SortAttrAsc is equivalent to “attribute ASC, @weight DESC, @id ASC”, and SortAttrDesc to “attribute DESC, @weight DESC, @id ASC” respectively.
const (
	SortRelevance ESortOrder = iota // sort by document relevance desc, then by date
	SortAttrDesc                    // sort by document data desc, then by relevance desc
	SortAttrAsc                     // sort by document data asc, then by relevance desc
	SortTimeSegments                // sort by time segments (hour/day/week/etc) desc, then by relevance desc
	SortExtended                    // sort by SQL-like expression (eg. "@relevance DESC, price ASC, @id DESC")
	SortExpr                        // sort by arithmetic expression in descending order (eg. "@id + max(@weight,1000)*boost + log(price)")

	SortTotal
)
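For example, extended sorting might be set up like this (a sketch; it assumes NewSearch(query, index, comment) as the constructor signature, and "price" is a placeholder attribute):
q := NewSearch("test query", "lj", "")
q.SetSortMode(SortExtended, "@relevance DESC, price ASC, @id DESC")
res, err := cl.RunQuery(q)
if err == nil {
	fmt.Println(res)
}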
type EUpdateType ¶
type EUpdateType uint32
EUpdateType provides the values for the `vtype` param of the UpdateAttributes() call, which determines the meaning of the `values` param of that function.
UpdateInt ¶
UpdateInt is the default value. The `values` map holds document IDs as keys and plain arrays of new attribute values.
UpdateMva ¶
UpdateMva points out that MVA attributes are being updated. In this case `values` must be a map with document IDs as keys and arrays of arrays of int values (the new MVA attribute values).
UpdateString ¶
UpdateString points out that string attributes are being updated. `values` must be a map with document IDs as keys and arrays of strings as values.
UpdateJson ¶
Works the same as `UpdateString`, but for JSON attribute updates.
const (
	UpdateInt EUpdateType = iota
	UpdateMva
	UpdateString
	UpdateJson
)
type ExcerptFlags ¶
type ExcerptFlags uint32
ExcerptFlags is a bitmask for SnippetOptions.Flags. Different values have to be combined with the '+' or '|' operation from the following constants:
ExcerptFlagExactphrase ¶
Whether to highlight exact query phrase matches only instead of individual keywords.
ExcerptFlagUseboundaries ¶
Whether to additionally break passages by phrase boundary characters, as configured in index settings with phrase_boundary directive.
ExcerptFlagWeightorder ¶
Whether to sort the extracted passages in order of relevance (decreasing weight), or in order of appearance in the document (increasing position).
ExcerptFlagQuery ¶
Whether to handle 'words' as a query in extended syntax, or as a bag of words (default behavior). For instance, in query mode "(one two | three four)" will only highlight and include those occurrences one two or three four when the two words from each pair are adjacent to each other. In default mode, any single occurrence of one, two, three, or four would be highlighted.
ExcerptFlagForceAllWords ¶
Ignores the snippet length limit until it includes all the keywords.
ExcerptFlagLoadFiles ¶
Whether to handle 'docs' as data to extract snippets from (default behavior), or to treat it as file names, and load data from the specified files on the server side. Up to dist_threads worker threads per request will be created to parallelize the work when this flag is enabled. To parallelize snippets building between remote agents, configure the "dist_threads" param of searchd to a value greater than 1, and then invoke the snippets generation over a distributed index which contains only one(!) local agent and several remotes. The "snippets_file_prefix" param of the remote daemons also comes into play: the final filename is calculated by concatenating the prefix with the given name.
ExcerptFlagAllowEmpty ¶
Allows empty string to be returned as highlighting result when a snippet could not be generated (no keywords match, or no passages fit the limit). By default, the beginning of original text would be returned instead of an empty string.
ExcerptFlagEmitZones ¶
Emits an HTML tag with an enclosing zone name before each passage.
ExcerptFlagFilesScattered ¶
It works only with distributed snippets generation with remote agents. The source files for snippets could be distributed among different agents, and the main daemon will merge together all non-erroneous results. So, if one agent of the distributed index has ‘file1.txt’, another has ‘file2.txt’ and you call for the snippets with both these files, the daemon will merge results from the agents together, so you will get the snippets from both ‘file1.txt’ and ‘file2.txt’.
If load_files is also set, the request will return an error if any of the files is not available anywhere. Otherwise (if 'load_files' is not set) it will just return empty strings for all absent files. The master instance resets this flag when it distributes the snippets among agents. So, for the agents the absence of a file is not a critical error, but for the master it is. If you want to be sure that all snippets are actually created, set both `load_files_scattered` and `load_files`. If the absence of some snippets caused by some agents is not critical for you, set just `load_files_scattered`, leaving `load_files` not set.
ExcerptFlagForcepassages ¶
Whether to generate passages for snippet even if limits allow to highlight whole text.
const (
	ExcerptFlagExactphrase ExcerptFlags = (1 << iota)
	ExcerptFlagUseboundaries
	ExcerptFlagWeightorder
	ExcerptFlagQuery
	ExcerptFlagForceAllWords
	ExcerptFlagLoadFiles
	ExcerptFlagAllowEmpty
	ExcerptFlagEmitZones
	ExcerptFlagFilesScattered
	ExcerptFlagForcepassages
)
type JsonAnswer ¶
JsonAnswer encapsulates the answer to a Json command.
`Endpoint` - the endpoint to which the request was directed.
`Answer` - a string containing the answer. In contrast to a true HTTP connection, only string messages are given here, no numeric error codes.
type JsonOrStr ¶
type JsonOrStr struct {
	IsJson bool   // true, if Val is JSON document; false if it is just a plain string
	Val    string // value (string or JSON document)
}
JsonOrStr is a typed string with an explicit flag telling whether it is 'just a string' or a JSON document. It may be used, say, to either escape plain strings when appending them to a JSON structure, or add them 'as is', assuming they are already JSON. Such values come from the daemon as attribute values for PQ indexes.
type Keyword ¶
type Keyword struct {
	Tokenized  string // token from the query
	Normalized string // normalized token after all stemming/lemming
	Querypos   int    // position in the query
	Docs       int    // number of docs (from backend index)
	Hits       int    // number of hits (from backend index)
}
Keyword represents a keyword returned from BuildKeywords() call
type Match ¶
type Match struct {
	DocID  DocID         // key Document ID
	Weight int           // weight of the match
	Attrs  []interface{} // optional array of attributes; quantity and types depend on schema
}
Match represents one match (document) in result schema
type PqQuery ¶
type PqQuery = struct {
	Flags   QueryDescFlags
	Query   string
	Tags    string
	Filters string
}
PqQuery describes one separate query info from resultset of CallPQ/CallPQBson
Flags determines the type of the Query, and also whether the other fields of the struct are filled or not.
Query, Tags, Filters are attributes saved with the query; all are optional.
type PqResponseFlags ¶
type PqResponseFlags uint32
PqResponseFlags determines the boolean flags that come in a SearchPqResponse result. These flags are unified into one bitfield used instead of a bunch of separate flags.
There are the following bits available:
HasDocs ¶
HasDocs indicates that each QueryDesc of the Queries result array has an array of documents in the Docs field. Otherwise this field is nil.
DumpQueries ¶
DumpQueries indicates that each query contains additional info, like the query itself, tags and filters. Otherwise it has only the number - the QueryID - and nothing more.
HasDocids ¶
HasDocids, coming in pair with HasDocs, indicates that the array of documents in the Queries[].Docs field is an array of uint64 with the document ids provided in the documents of the original query. Otherwise it is an array of int32 with order numbers, possibly shifted by the Shift param.
const (
	HasDocs PqResponseFlags = (1 << iota)
	DumpQueries
	HasDocids
)
type Pqflags ¶
type Pqflags uint32
Pqflags determines the boolean parameter flags for CallPQ options. These flags are unified into one bitfield used instead of a bunch of separate flags.
There are the following flags for CallPQ modes available:
NeedDocs ¶
NeedDocs requires providing the numbers of matched documents. These are either order numbers from the set of provided documents, or DocIDs, if the documents are JSON and you pointed to the field which contains the DocID. (NOTE: JSON PQ calls are not yet implemented via the API; this will be done later.)
NeedQuery ¶
NeedQuery requires returning not only the QueryID of the matched queries, but also other information about them. It may include the query itself, tags and filters.
Verbose ¶
Verbose requires returning additional meta-information about matching and queries. It causes the daemon to fill the fields TmSetup, TmTotal, QueriesFailed, EarlyOutQueries and QueryDT of the SearchPqResponse structure.
SkipBadJson ¶
SkipBadJson requires not failing on bad (ill-formed) jsons, but warning and continuing processing. This flag works only for bson queries and is useless for plain text (it may even cause a warning if provided there).
type Qflags ¶
type Qflags uint32
Qflags is a bitmask with query flags which is set by calling Search.SetQueryFlags(). Different values have to be combined with the '+' or '|' operation from the following constants:
QflagReverseScan ¶
Controls the order in which a full-scan query processes the rows.
0 direct scan
1 reverse scan
QFlagSortKbuffer ¶
Determines the sort method for resultset sorting. The result set is the same in both cases; picking one option or the other may just improve (or worsen!) performance.
0 priority queue
1 k-buffer (gives faster sorting for already pre-sorted data, e.g. index data sorted by id)
QflagMaxPredictedTime ¶
Determines whether the query has the max_predicted_time option as an extra parameter.
0 no predicted time provided
1 query contains a predicted time metric
QflagSimplify ¶
Switches on query boolean simplification to speed it up. If set to 1, the daemon will simplify complex queries, or queries produced by different algos, to eliminate and optimize different parts of the query.
0 query will be calculated without transformations
1 query will be transformed and simplified
The list of performed transformations is:
common NOT              ((A !N) | (B !N)) -> ((A|B) !N)
common compound NOT     ((A !(N C)) | (B !(N D))) -> (((A|B) !N) | (A !C) | (B !D)) // if cost(N) > cost(A) + cost(B)
common sub-term         ((A (X | C)) | (B (X | D))) -> (((A|B) X) | (A C) | (B D)) // if cost(X) > cost(A) + cost(B)
common keywords         (A | "A B"~N) -> A
                        ("A B" | "A B C") -> "A B"
                        ("A B"~N | "A B C"~N) -> ("A B"~N)
common PHRASE           ("X A B" | "Y A B") -> (("X|Y") "A B")
common AND NOT factor   ((A !X) | (A !Y) | (A !Z)) -> (A !(X Y Z))
common OR NOT           ((A !(N | N1)) | (B !(N | N2))) -> (( (A !N1) | (B !N2) ) !N)
excess brackets         ((A | B) | C) -> ( A | B | C )
                        ((A B) C) -> ( A B C )
excess AND NOT          ((A !N1) !N2) -> (A !(N1 | N2))
QflagPlainIdf ¶
Determines how BM25 IDF will be calculated. Below, "N" is the collection size, and "n" is the number of matched documents.
1 plain IDF = log(N/n), as per Sparck-Jones
0 normalized IDF = log((N-n+1)/n), as per Robertson et al
QflagGlobalIdf ¶
Determines whether to use global statistics (frequencies) from the global_idf file for IDF computations, rather than the local index statistics.
0 use local index statistics
1 use the global_idf file (see https://docs.manticoresearch.com/latest/html/conf_options_reference/index_configuration_options.html#global-idf)
QflagNormalizedTfIdf ¶
Determines whether to additionally divide the IDF value by the query word count, so that TF*IDF fits into the [0..1] range.
0 don't divide IDF by query word count
1 divide IDF by query word count
Notes for QflagPlainIdf and QflagNormalizedTfIdf flags ¶
The historically default IDF (Inverse Document Frequency) in Manticore is equivalent to QflagPlainIdf=0, QflagNormalizedTfIdf=1, and those normalizations may cause several undesired effects.
First, normalized idf (QflagPlainIdf=0) causes keyword penalization. For instance, if you search for [the | something] and [the] occurs in more than 50% of the documents, then documents with both keywords [the] and [something] will get less weight than documents with just one keyword [something]. Using QflagPlainIdf=1 avoids this. Plain IDF varies in [0, log(N)] range, and keywords are never penalized; while the normalized IDF varies in [-log(N), log(N)] range, and too frequent keywords are penalized.
Second, QflagNormalizedTfIdf=1 causes IDF drift over queries. Historically, we additionally divided IDF by query keyword count, so that the entire sum(tf*idf) over all keywords would still fit into [0,1] range. However, that means that queries [word1] and [word1 | nonmatchingword2] would assign different weights to the exactly same result set, because the IDFs for both “word1” and “nonmatchingword2” would be divided by 2. QflagNormalizedTfIdf=0 fixes that. Note that BM25, BM25A, BM25F() ranking factors will be scaled accordingly once you disable this normalization.
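For instance, both normalizations discussed above might be disabled like this (a sketch; it assumes NewSearch(query, index, comment) as the constructor signature):
q := NewSearch("the | something", "lj", "")
q.ChangeQueryFlags(QflagPlainIdf, true)         // plain IDF = log(N/n): frequent keywords are not penalized
q.ChangeQueryFlags(QflagNormalizedTfIdf, false) // don't divide IDF by the query word count
res, err := cl.RunQuery(q)
if err == nil {
	fmt.Println(res)
}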
QflagLocalDf ¶
Determines whether to automatically sum DFs over all the local parts of a distributed index, so that the IDF is consistent (and precise) over a locally sharded index.
0 don't sum local DFs
1 sum local DFs
QflagLowPriority ¶
Determines the priority for executing the query.
0 run the query in usual (normal) priority
1 run the query in idle priority
QflagFacet ¶
Determines the slave role of the query in a multi-query facet.
0 query is not a facet query, or is the main facet query
1 query is a dependent (slave) part of a facet multiquery
QflagFacetHead ¶
Determines the head role of the query in a multi-query facet.
0 query is not a facet query, or is a slave of a facet query
1 query is the main (head) query of a facet multiquery
QflagJsonQuery ¶
Determines whether the query originated from the REST API and so must be parsed as JSON syntax.
0 query is an API query
1 query is a JSON query
Example ¶
fl := QflagJsonQuery
fmt.Println(fl)
Output: 2048
const (
	QflagReverseScan Qflags = 1 << iota // direct or reverse full-scans
	QFlagSortKbuffer                    // pq or kbuffer for sorting
	QflagMaxPredictedTime               // has or not max_predicted_time value
	QflagSimplify                       // apply or not boolean simplification
	QflagPlainIdf                       // plain or normalized idf
	QflagGlobalIdf                      // use or not global idf
	QflagNormalizedTfIdf                // plain or normalized tf-idf
	QflagLocalDf                        // sum or not DFs over a locally sharded (distributed) index
	QflagLowPriority                    // run query in idle priority
	QflagFacet                          // query is part of facet batch query
	QflagFacetHead                      // query is main facet query
	QflagJsonQuery                      // query is JSON query (otherwise - API query)
)
type QueryDesc ¶
QueryDesc represents an element of the Queries array from SearchPqResponse and describes one returned stored query.
QueryID ¶
QueryID is, namely, the Query ID. In the most minimal query it is the only returned field.
Docs ¶
Docs is filled only if the flag HasDocs is set, and contains either an array of DocID values (which are uint64), if the flag HasDocids is set, or an array of doc ordinals (which are int32), if the flag HasDocids is NOT set.
Query ¶
Query is the query meta, in addition to QueryID. It is filled only if it was requested via the NeedQuery bit in the query options, and may contain the query string, tags and filters.
type QueryDescFlags ¶
type QueryDescFlags uint32
QueryDescFlags is a bitfield describing the internals of the PqQuery struct. These flags are unified into one bitfield used instead of a bunch of separate flags.
There are the following bits available:
QueryPresent ¶
QueryPresent indicates that field Query is valid. Otherwise it is not touched ("" by default)
TagsPresent ¶
TagsPresent indicates that field Tags is valid. Otherwise it is not touched ("" by default)
FiltersPresent ¶
FiltersPresent indicates that field Filters is valid. Otherwise it is not touched ("" by default)
QueryIsQl ¶
QueryIsQl indicates that the field Query (if present) is a query in sphinxql syntax. Otherwise it is a query in JSON syntax. A PQ index can store queries in both formats, and this flag in the resultset helps you to distinguish them (both are text, but the syntax may be different).
const (
	QueryPresent QueryDescFlags = (1 << iota)
	TagsPresent
	FiltersPresent
	QueryIsQl
)
type QueryResult ¶
type QueryResult struct {
Error, Warning string // messages (if any)
Status ESearchdstatus // status code for current resultset
Fields []string // fields of the schema
Attrs []ColumnInfo // attributes of the schema
Id64 bool // if DocumentID is 64-bit (always true)
Matches []Match // set of matches according to schema
Total, TotalFound int // num of matches and total num of matches found
QueryTime time.Duration // query duration
WordStats []WordStat // words statistic
}
QueryResult represents resultset from successful Query/RunQuery, or one of resultsets from RunQueries call.
func (QueryResult) String ¶
func (res QueryResult) String() string
Stringer interface for QueryResult type
type Search ¶
type Search struct {
	Offset        int32 // offset into resultset (0)
	Limit         int32 // count of resultset (20)
	MaxMatches    int32
	CutOff        int32
	RetryCount    int32
	MaxQueryTime  time.Duration
	RetryDelay    time.Duration
	MatchMode     EMatchMode       // Matching mode
	FieldWeights  map[string]int32 // bind per-field weights by name
	IndexWeights  map[string]int32 // bind per-index weights by name
	IDMin         DocID            // set IDs range to match (from)
	IDMax         DocID            // set IDs range to match (to)
	Groupfunc     EGroupBy
	GroupBy       string
	GroupSort     string
	GroupDistinct string // count-distinct attribute for group-by queries
	SelectClause  string // select-list (attributes or expressions), SQL-like syntax
	Indexes       string
	Comment       string
	Query         string
	// contains filtered or unexported fields
}
Search represents one search query. Exported fields may be set directly. Unexported ones, which are bound by internal dependencies and constraints, are intended to be set via special methods.
func NewSearch ¶
NewSearch constructs a default search which then may be customized. You may just customize 'Query' and maybe 'Indexes' of the default one, and it will work like a simple 'Query()' call.
func (*Search) AddFilter ¶
AddFilter adds a new integer values set filter.
On this call, an additional new filter is added to the existing list of filters.
`attribute` must be a string with attribute name
`values` must be a plain slice containing integer values.
`exclude` controls whether to accept the matching documents (default mode, when `exclude` is false) or reject them.
Only those documents where the `attribute` column value stored in the index matches any of the values from the `values` slice will be matched (or rejected, if `exclude` is true).
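For example (a sketch; it assumes NewSearch(query, index, comment) as the constructor signature, and "group_id" is a placeholder attribute):
q := NewSearch("test query", "lj", "")
q.AddFilter("group_id", []int64{10, 15, 20}, false) // keep only documents with group_id 10, 15 or 20
res, err := cl.RunQuery(q)
if err == nil {
	fmt.Println(res.TotalFound, "documents matched")
}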
func (*Search) AddFilterExpression ¶
AddFilterExpression adds a new filter by expression.
On this call, an additional new filter is added to the existing list of filters.
The only value `expression` must contain a filtering expression which returns bool.
The expression has SQL-like syntax and may refer to columns (usually JSON fields) by name, and may look like: 'j.price - 1 > 3 OR j.tag IS NOT null'. Documents are filtered by the expression being 'true', or (if `exclude` is set to true) by it being 'false'.
func (*Search) AddFilterFloatRange ¶
AddFilterFloatRange adds a new float range filter.
On this call, an additional new filter is added to the existing list of filters.
`attribute` must be a string with attribute name.
`fmin` and `fmax` must be floats that define the acceptable attribute values range (including the boundaries).
`exclude` controls whether to accept the matching documents (default mode, when `exclude` is false) or reject them.
Only those documents where the `attribute` column value stored in the index is between `fmin` and `fmax` (including values that are exactly equal to `fmin` or `fmax`) will be matched (or rejected, if `exclude` is true).
func (*Search) AddFilterNull ¶
AddFilterNull adds a new IsNull filter.
On this call, an additional new filter is added to the existing list of filters. Documents where `attribute` is null will match (if `isnull` is true) or not match (if `isnull` is false).
func (*Search) AddFilterRange ¶
AddFilterRange adds a new integer range filter.
On this call, an additional new filter is added to the existing list of filters.
`attribute` must be a string with attribute name.
`imin` and `imax` must be integers that define the acceptable attribute values range (including the boundaries).
`exclude` controls whether to accept the matching documents (default mode, when `exclude` is false) or reject them.
Only those documents where the `attribute` column value stored in the index is between `imin` and `imax` (including values that are exactly equal to `imin` or `imax`) will be matched (or rejected, if `exclude` is true).
func (*Search) AddFilterString ¶
AddFilterString adds a new string value filter.
On this call, an additional new filter is added to the existing list of filters.
`attribute` must be a string with attribute name.
`value` must be a string.
`exclude` must be a boolean value; it controls whether to accept the matching documents (default mode, when `exclude` is false) or reject them.
Only those documents where the `attribute` column value stored in the index is equal to the string value from `value` will be matched (or rejected, if `exclude` is true).
func (*Search) AddFilterStringList ¶
AddFilterStringList adds a new string list filter.
On this call, an additional new filter is added to the existing list of filters.
`attribute` must be a string with attribute name.
`values` must be a slice of strings.
`exclude` must be a boolean value; it controls whether to accept the matching documents (default mode, when `exclude` is false) or reject them.
Only those documents where the `attribute` column value stored in the index is equal to one of the string values from `values` will be matched (or rejected, if `exclude` is true).
func (*Search) AddFilterUservar ¶
AddFilterUservar adds a new uservar filter.
On this call, an additional new filter is added to the existing list of filters.
`attribute` must be a string with attribute name.
`uservar` must be the name of a user variable containing the list of filtering values, starting with @, as "@var".
`exclude` must be a boolean value; it controls whether to accept the matching documents (default mode, when `exclude` is false) or reject them.
Only those documents where `attribute` column value stored in the index equal to one of the values stored in `uservar` variable on daemon side (or rejected, if `exclude` is true). Such filter intended to save huge list of variables once on the server, and then refer to it by name. Saving the list might be done by separate call of 'SetUservar()'
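A hedged sketch of the intended flow, assuming `cl` is a connected Client; the variable, index, and attribute names are illustrative:
cl := NewClient()
err := cl.Uvar("@groups", []uint64{1, 5, 7, 11}) // store the list once on the server
q := NewSearch("laptop", "products", "")
q.AddFilterUservar("group_id", "@groups", false) // refer to the stored list by name
...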
func (*Search) ChangeQueryFlags ¶
ChangeQueryFlags changes (sets or resets) the query flags given by the mask `flags`.
func (*Search) ResetFilters ¶
func (q *Search) ResetFilters()
ResetFilters clears all currently set search filters.
This call is normally only required when using multi-queries. You might want to set different filters for different queries in the batch. To do that, you may either create another Search request and fill it from scratch, or copy the existing (last) one and modify it. To change all the filters in the copy, call ResetFilters() and add new filters using the respective calls.
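A sketch of the multi-query pattern described above, assuming Search values can be copied this way; the index and attribute names are illustrative:
cl := NewClient()
q1 := NewSearch("laptop", "products", "")
q1.AddFilterRange("year", 2015, 2020, false)
q2 := q1          // copy the previous request...
q2.ResetFilters() // ...and replace its filters
q2.AddFilterString("brand", "acme", false)
results, err := cl.RunQueries([]Search{q1, q2})
...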
func (*Search) ResetGroupBy ¶
func (q *Search) ResetGroupBy()
ResetGroupBy clears all current group-by settings and disables group-by.
This call is normally only required when using multi-queries. You might want to set different group-by settings in the batch. To do that, you may either create another Search request and fill it from scratch, or copy the existing (last) one and modify it. In the latter case you can change individual group-by settings using SetGroupBy() and SetGroupDistinct() calls, but you cannot disable group-by using those calls. ResetGroupBy() fully resets the previous group-by settings and disables group-by mode in the current Search query.
func (*Search) ResetOuterSelect ¶
func (q *Search) ResetOuterSelect()
ResetOuterSelect clears all outer select settings.
This call is normally only required when using multi-queries. You might want to set different outer select settings in the batch. To do that, you may either create another Search request and fill it from scratch, or copy the existing (last) one and modify it. In the latter case you can change individual outer select settings using SetOuterSelect() calls, but you cannot disable the outer statement with those calls. ResetOuterSelect() fully resets the previous outer select settings.
func (*Search) ResetQueryFlags ¶
func (q *Search) ResetQueryFlags()
ResetQueryFlags resets the query flags of a Select query to the default value, and also resets the value set by a SetMaxPredictedTime() call.
This call is normally only required when using multi-queries. You might want to set different flags for the Select queries in the batch. To do that, you may either create another Search request and fill it from scratch, or copy the existing (last) one and modify it. In the latter case you can change individual or many flags using SetQueryFlags() and ChangeQueryFlags() calls. This call simply sets, in one shot, all the flags to the default value `QflagNormalizedTfIdf`, and also sets the predicted time to 0.
func (*Search) SetGeoAnchor ¶
SetGeoAnchor sets the anchor point for geosphere distance (geodistance) calculations, and enables them.
`attrlat` and `attrlong` contain the names of latitude and longitude attributes, respectively.
`lat` and `long` specify anchor point latitude and longitude, in radians.
Once an anchor point is set, you can use the magic @geodist attribute name in your filters and/or sorting expressions. Manticore will compute the geosphere distance between the given anchor point and the point specified by the latitude and longitude attributes from each full-text match, and attach this value to the resulting match. The latitude and longitude values, both in SetGeoAnchor and in the index attribute data, are expected to be in radians. The result will be returned in meters, so a geodistance value of 1000.0 means 1 km. 1 mile is approximately 1609.344 meters.
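A minimal sketch, assuming the index stores latitude and longitude in radians; the index and attribute names are illustrative:
q := NewSearch("pizza", "places", "")
q.SetGeoAnchor("lat", "lon", 0.93, -2.11)   // anchor point, in radians
q.SetSortMode(SortExtended, "@geodist ASC") // closest matches first, in meters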
func (*Search) SetGroupBy ¶
SetGroupBy sets the grouping attribute, function, and group sorting mode, and enables grouping.
`attribute` is a string that contains the group-by attribute name.
`func` is a constant that chooses the function applied to the attribute value in order to compute the group-by key.
`groupsort` is an optional clause that controls how the groups will be sorted.
The grouping feature is very similar in nature to the GROUP BY clause in SQL. Results produced by this function call are going to be the same as produced by the following pseudo code:
SELECT ... GROUP BY func(attribute) ORDER BY groupsort
Note that it's `groupsort` that affects the order of matches in the final result set. The sorting mode (see `SetSortMode()`) affects the ordering of matches within a group, i.e. which match will be selected as the best one from the group. So you can, for instance, order the groups by match count and select the most relevant match within each group at the same time.
Grouping on string attributes is supported, with respect to the current collation.
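A hedged sketch of grouping by a plain attribute; the `GroupbyAttr` constant name is an assumption based on the EGroupBy type from the index, and the attribute names are illustrative:
q := NewSearch("laptop", "products", "")
q.SetGroupBy("brand_id", GroupbyAttr, "@count DESC") // biggest groups first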
func (*Search) SetMaxPredictedTime ¶
SetMaxPredictedTime sets the max predicted time and the corresponding query flag.
func (*Search) SetOuterSelect ¶
SetOuterSelect determines the outer select conditions for the Search query.
`orderby` specifies a clause with SQL-like syntax, as in "foo ASC, bar DESC, baz", where the names of the items (`foo`, `bar`, `baz` in the example) are the names of columns originating from the internal query.
`offset` and `limit` have the same meaning as the Offset and Limit fields, but applied to the outer select.
Outer select currently has two use cases:
1. We have a query with 2 ranking UDFs, one very fast and the other one slow, and we perform a full-text search with a big match result set. Without an outer select, the query would look like:
q := NewSearch("some common query terms", "index", "")
q.SelectClause = "id, slow_rank() as slow, fast_rank() as fast"
q.SetSortMode(SortExtended, "fast DESC, slow DESC")
// q.Limit = 20 and q.MaxMatches = 1000 are the defaults, so we don't set them explicitly
With subselects the query can be rewritten as:
q := NewSearch("some common query terms", "index", "")
q.SelectClause = "id, slow_rank() as slow, fast_rank() as fast"
q.SetSortMode(SortExtended, "fast DESC")
q.Limit = 100
q.SetOuterSelect("slow desc", 0, 20)
In the initial query the slow_rank() UDF is computed for the entire match result set. With subselects, only fast_rank() is computed for the entire match result set, while slow_rank() is only computed for a limited set.
2. The second case comes in handy for a large result set coming from a distributed index.
For this query:
q := NewSearch("some conditions", "my_dist_index", "")
q.Limit = 50000
If we have 20 nodes, each node can send back to the master up to 50K records, resulting in 20 x 50K = 1M records; however, as the master sends back only 50K (out of 1M), it might be good enough for us for the nodes to send only their top 10K records. With an outer select we can rewrite the query as:
q := NewSearch("some conditions", "my_dist_index", "")
q.Limit = 10000
q.SetOuterSelect("some_attr", 0, 50000)
In this case, the nodes receive only the inner query and execute it. This means the master will receive only 20 x 10K = 200K records. The master will take all the records received, reorder them by the OUTER clause, and return the best 50K records. The outer select helps reduce the traffic between the master and the nodes, and also reduces the master's computation time (as it processes only 200K records instead of 1M).
func (*Search) SetQueryFlags ¶
SetQueryFlags sets query flags. New flags are OR-ed into the existing value; previously set flags are not affected. Note that the default flags have the QflagNormalizedTfIdf bit set, so if you need to reset it, you have to explicitly invoke ChangeQueryFlags(QflagNormalizedTfIdf, false).
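A short sketch of the interplay between the two calls, using only the QflagNormalizedTfIdf constant from this documentation:
q := NewSearch("hello", "index", "")
q.SetQueryFlags(QflagNormalizedTfIdf)           // OR-ed into the current flags
q.ChangeQueryFlags(QflagNormalizedTfIdf, false) // explicitly clear the same bit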
func (*Search) SetRankingExpression ¶
SetRankingExpression assigns a ranking expression, and also sets the ranking mode to RankExpr.
`rankexpr` provides the ranking formula, for example, "sum(lcs*user_weight)*1000+bm25" - this is the same as RankProximityBm25, but written explicitly. Since using a ranking expression assumes the RankExpr ranker, it is also set by this function.
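A one-line sketch using the formula from the text above:
q := NewSearch("hello world", "index", "")
q.SetRankingExpression("sum(lcs*user_weight)*1000+bm25")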
func (*Search) SetRankingMode ¶
SetRankingMode assigns the ranking mode and also adjusts MatchMode to MatchExtended2 (since otherwise rankers are useless).
func (*Search) SetSortMode ¶
func (q *Search) SetSortMode(sort ESortOrder, sortby ...string)
SetSortMode sets the matches sorting mode.
`sort` determines sorting mode.
`sortby` determines attribute or expression used for sorting.
If the `sortby` value in the Search query is empty (it need not be set in this very call; it might have been set earlier!), then `sort` is forcibly set to SortRelevance.
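A short sketch using the SortExtended constant already shown in this documentation; the sorting clause itself is illustrative:
q := NewSearch("hello", "index", "")
q.SetSortMode(SortExtended, "@weight DESC")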
func (*Search) SetTokenFilter ¶
SetTokenFilter sets up a UDF token filter.
`library` is the name of the plugin library, as in "mylib.so".
`name` is the name of the token filtering function in the library, as in "email_process".
`opts` is a string of parameters passed to the UDF filter, like "field=email;split=.io". The format of the options is determined by the UDF plugin.
type SearchPqOptions ¶
SearchPqOptions encapsulates the parameters to be passed to the CallPQ function.
Flags ¶
Flags is an instance of Pqflags; the individual bits are described there.
IdAlias ¶
IdAlias determines the name of the field in the supplied json documents which contains the document ID. If the NeedDocs flag is set, this value will be used in the result set to identify documents instead of their plain ordinal numbers.
Shift ¶
Shift is used if the daemon returns the order numbers of the documents (i.e. when the NeedDocs flag is set, but no IdAlias is provided, or if the documents are just plain texts and can't contain such a field at all). Shift is then added to every document number, moving the whole range. Say, if you provide 2 documents, they may be returned as numbers 1 and 2. But if you also give Shift=100, they will become 101 and 102. This may help if you distribute a big docset over several instances and want to keep the numbering. The daemon itself uses this value for the same purpose.
func NewSearchPqOptions ¶
func NewSearchPqOptions() SearchPqOptions
NewSearchPqOptions creates an empty instance of search options. Prefer this function when you need options, since it may set the necessary defaults.
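A hedged sketch of a percolate call using these options; the index name `pq_index` is an assumption, and NeedDocs is the flag mentioned in the SearchPqOptions description above:
cl := NewClient()
opts := NewSearchPqOptions()
opts.Flags |= NeedDocs // ask the daemon to return matched document numbers
opts.Shift = 100       // shift the returned document numbers by 100
res, err := cl.CallPQ("pq_index", []string{"doc one text", "doc two text"}, opts)
...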
type SearchPqResponse ¶
type SearchPqResponse = struct {
	Flags           PqResponseFlags
	TmTotal         time.Duration // total time spent matching the document(s)
	TmSetup         time.Duration // time spent on the initial setup of the matching process: parsing docs, setting options, etc.
	QueriesMatched  int           // how many stored queries match the document(s)
	QueriesFailed   int           // number of failed queries
	DocsMatched     int           // how many times the documents match the queries stored in the index
	TotalQueries    int           // how many queries are stored in the index at all
	OnlyTerms       int           // how many queries in the index have terms; the rest of the queries have extended query syntax
	EarlyOutQueries int           // number of queries that didn't go through the full routine, but were quickly matched and rejected with filters or other conditions
	QueryDT         []int         // detailed times per each query
	Warnings        string
	Queries         []QueryDesc   // the queries themselves; see the QueryDesc structure for details
}
SearchPqResponse represents the whole response to CallPQ and CallPQBson calls.
type SnippetOptions ¶
type SnippetOptions struct {
	BeforeMatch, AfterMatch, ChunkSeparator, HtmlStripMode, PassageBoundary string
	Limit, LimitPassages, LimitWords, Around, StartPassageId                int32
	Flags                                                                   ExcerptFlags
}
SnippetOptions is used to tune snippet generation. All fields are exported and have the meaning described below.
BeforeMatch ¶
A string to insert before a keyword match. A '%PASSAGE_ID%' macro can be used in this string. The first match of the macro is replaced with an incrementing passage number within the current snippet. Numbering starts at 1 by default but can be overridden with the StartPassageId option. In a multi-document call, '%PASSAGE_ID%' restarts at every given document.
AfterMatch ¶
A string to insert after a keyword match. %PASSAGE_ID% macro can be used in this string.
ChunkSeparator ¶
A string to insert between snippet chunks (passages).
HtmlStripMode ¶
HTML stripping mode setting. Possible values are `index`, which means that the index settings will be used; `none` and `strip`, which forcibly skip or apply stripping regardless of the index settings; and `retain`, which retains HTML markup and protects it from highlighting. The retain mode can only be used when highlighting full documents, and thus requires that no snippet size limits are set.
PassageBoundary ¶
Ensures that passages do not cross a sentence, paragraph, or zone boundary (when used with an index that has the respective indexing settings enabled). Allowed values are `sentence`, `paragraph`, and `zone`.
Limit ¶
Maximum snippet size, in runes (codepoints).
LimitPassages ¶
Limits the maximum number of passages that can be included into the snippet.
LimitWords ¶
Limits the maximum number of words that can be included in the snippet. Note the limit applies to any words, not just the matched keywords to highlight. For example, if we are highlighting "Mary" and the passage "Mary had a little lamb" is selected, it contributes 5 words to this limit, not just 1.
Around ¶
How many words to pick around each matching keyword block.
StartPassageId ¶
Specifies the starting value of the `%PASSAGE_ID%` macro (which gets detected and expanded in the BeforeMatch and AfterMatch strings).
Flags ¶
Bitmask. Individual bits described in `type ExcerptFlags` constants.
func NewSnippetOptions ¶
func NewSnippetOptions() *SnippetOptions
NewSnippetOptions creates a default SnippetOptions with the following defaults:
BeforeMatch:     "<b>"
AfterMatch:      "</b>"
ChunkSeparator:  " ... "
HtmlStripMode:   "index"
PassageBoundary: "none"
Limit:           256
Around:          5
StartPassageId:  1
// Rest of the fields: 0 or "" (depending on the type)
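A short sketch of snippet generation with these defaults; the index name and document text are illustrative, and `cl` is assumed to be a connected Client. Note that NewSnippetOptions returns a pointer while BuildExcerpts takes SnippetOptions values, hence the dereference:
cl := NewClient()
opts := NewSnippetOptions()
opts.Around = 10 // override a single default: widen the context around each match
snippets, err := cl.BuildExcerpts([]string{"Mary had a little lamb"}, "index", "lamb", *opts)
...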
type SqlResultset ¶
type SqlResultset [][]interface{}
SqlResultset is returned from Sphinxql and contains one or more mysql resultsets.