Documentation ¶
Overview ¶
Create a Hugging Face inference endpoint.
Create an inference endpoint to perform an inference task with the `hugging_face` service. Supported tasks include: `text_embedding`, `completion`, `chat_completion`, and `rerank`.
To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.
For Elastic's `text_embedding` task: The selected model must support the `Sentence Embeddings` task. On the new endpoint creation page, select the `Sentence Embeddings` task under the `Advanced Configuration` section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for the `text_embedding` task:
* `all-MiniLM-L6-v2`
* `all-MiniLM-L12-v2`
* `all-mpnet-base-v2`
* `e5-base-v2`
* `e5-small-v2`
* `multilingual-e5-base`
* `multilingual-e5-small`
For Elastic's `chat_completion` and `completion` tasks: The selected model must support the `Text Generation` task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for `Text Generation`. When creating a dedicated endpoint, select the `Text Generation` task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure that it supports the OpenAI API and that its URL includes the `/v1/chat/completions` path. Then copy the full endpoint URL for use. Recommended models for the `chat_completion` and `completion` tasks:
* `Mistral-7B-Instruct-v0.2`
* `QwQ-32B`
* `Phi-3-mini-128k-instruct`
For Elastic's `rerank` task: The selected model must support the `sentence-ranking` task and expose the OpenAI API. Hugging Face currently supports only dedicated (not serverless) endpoints for `Rerank`. After the endpoint is initialized, copy the full endpoint URL for use. Tested models for the `rerank` task:
* `bge-reranker-base`
* `jina-reranker-v1-turbo-en-GGUF`
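With the endpoint URL copied, creating the corresponding Elasticsearch inference endpoint from Go looks roughly like the sketch below. This is a minimal illustration, not a definitive recipe: the cluster address, Hugging Face endpoint URL, and API key are placeholders, and the `huggingfaceservicetype.Huggingface` enum constant and the `HuggingFaceServiceSettings` field names are assumptions based on the generated `types` packages, which may differ between client versions.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/typedapi/inference/puthuggingface"
	"github.com/elastic/go-elasticsearch/v8/typedapi/types"
	"github.com/elastic/go-elasticsearch/v8/typedapi/types/enums/huggingfaceservicetype"
)

func main() {
	// Placeholder cluster address; adjust for your deployment.
	client, err := elasticsearch.NewTypedClient(elasticsearch.Config{
		Addresses: []string{"http://localhost:9200"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Create a `text_embedding` endpoint named "hugging-face-embeddings".
	// The API key and URL are placeholders for the values copied from the
	// Hugging Face Inference Endpoints page.
	res, err := puthuggingface.NewPutHuggingFaceFunc(client.Transport)("text_embedding", "hugging-face-embeddings").
		Request(&puthuggingface.Request{
			Service: huggingfaceservicetype.Huggingface, // enum constant name assumed
			ServiceSettings: types.HuggingFaceServiceSettings{
				ApiKey: "hf_xxx",
				Url:    "https://xxx.endpoints.huggingface.cloud",
			},
		}).
		Do(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("created inference endpoint:", res.InferenceId)
}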
Index ¶
- Variables
- type NewPutHuggingFace
- type PutHuggingFace
- func (r *PutHuggingFace) ChunkingSettings(chunkingsettings *types.InferenceChunkingSettings) *PutHuggingFace
- func (r PutHuggingFace) Do(providedCtx context.Context) (*Response, error)
- func (r *PutHuggingFace) ErrorTrace(errortrace bool) *PutHuggingFace
- func (r *PutHuggingFace) FilterPath(filterpaths ...string) *PutHuggingFace
- func (r *PutHuggingFace) Header(key, value string) *PutHuggingFace
- func (r *PutHuggingFace) HttpRequest(ctx context.Context) (*http.Request, error)
- func (r *PutHuggingFace) Human(human bool) *PutHuggingFace
- func (r PutHuggingFace) Perform(providedCtx context.Context) (*http.Response, error)
- func (r *PutHuggingFace) Pretty(pretty bool) *PutHuggingFace
- func (r *PutHuggingFace) Raw(raw io.Reader) *PutHuggingFace
- func (r *PutHuggingFace) Request(req *Request) *PutHuggingFace
- func (r *PutHuggingFace) Service(service huggingfaceservicetype.HuggingFaceServiceType) *PutHuggingFace
- func (r *PutHuggingFace) ServiceSettings(servicesettings *types.HuggingFaceServiceSettings) *PutHuggingFace
- func (r *PutHuggingFace) TaskSettings(tasksettings *types.HuggingFaceTaskSettings) *PutHuggingFace
- func (r *PutHuggingFace) Timeout(duration string) *PutHuggingFace
- type Request
- type Response
Constants ¶
This section is empty.
Variables ¶
var ErrBuildPath = errors.New("cannot build path, check for missing path parameters")
ErrBuildPath is returned in case of missing parameters within the build of the request.
Functions ¶
This section is empty.
Types ¶
type NewPutHuggingFace ¶
type NewPutHuggingFace func(tasktype, huggingfaceinferenceid string) *PutHuggingFace
NewPutHuggingFace is a constructor type alias used by the library index.
func NewPutHuggingFaceFunc ¶
func NewPutHuggingFaceFunc(tp elastictransport.Interface) NewPutHuggingFace
NewPutHuggingFaceFunc returns a new instance of PutHuggingFace with the provided transport. Used in the index of the library, this allows every API to be retrieved in one place.
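A small sketch of the two-step pattern: bind the constructor to a transport, then call it with the path parameters (the task type and the inference endpoint ID). The transport would normally come from an existing client; the endpoint ID is a placeholder.

import (
	"github.com/elastic/elastic-transport-go/v8/elastictransport"
	"github.com/elastic/go-elasticsearch/v8/typedapi/inference/puthuggingface"
)

// newRequest builds a PutHuggingFace request for a text_embedding endpoint.
func newRequest(tp elastictransport.Interface) *puthuggingface.PutHuggingFace {
	newPut := puthuggingface.NewPutHuggingFaceFunc(tp)
	return newPut("text_embedding", "my-hugging-face-endpoint")
}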
type PutHuggingFace ¶
type PutHuggingFace struct {
// contains filtered or unexported fields
}
func New ¶
func New(tp elastictransport.Interface) *PutHuggingFace
Create a Hugging Face inference endpoint.
Create an inference endpoint to perform an inference task with the `hugging_face` service. Supported tasks include: `text_embedding`, `completion`, `chat_completion`, and `rerank`.
To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.
For Elastic's `text_embedding` task: The selected model must support the `Sentence Embeddings` task. On the new endpoint creation page, select the `Sentence Embeddings` task under the `Advanced Configuration` section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for the `text_embedding` task:
* `all-MiniLM-L6-v2`
* `all-MiniLM-L12-v2`
* `all-mpnet-base-v2`
* `e5-base-v2`
* `e5-small-v2`
* `multilingual-e5-base`
* `multilingual-e5-small`
For Elastic's `chat_completion` and `completion` tasks: The selected model must support the `Text Generation` task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for `Text Generation`. When creating a dedicated endpoint, select the `Text Generation` task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure that it supports the OpenAI API and that its URL includes the `/v1/chat/completions` path. Then copy the full endpoint URL for use. Recommended models for the `chat_completion` and `completion` tasks:
* `Mistral-7B-Instruct-v0.2`
* `QwQ-32B`
* `Phi-3-mini-128k-instruct`
For Elastic's `rerank` task: The selected model must support the `sentence-ranking` task and expose the OpenAI API. Hugging Face currently supports only dedicated (not serverless) endpoints for `Rerank`. After the endpoint is initialized, copy the full endpoint URL for use. Tested models for the `rerank` task:
* `bge-reranker-base`
* `jina-reranker-v1-turbo-en-GGUF`
https://www.elastic.co/guide/en/elasticsearch/reference/current/infer-service-hugging-face.html
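Besides passing a prebuilt Request, the request can be configured with the fluent setters documented below. A hedged sketch, with the same caveats as above: the enum constant name and service-settings field names are assumptions, and the token and URL are placeholders.

import (
	"context"

	"github.com/elastic/elastic-transport-go/v8/elastictransport"
	"github.com/elastic/go-elasticsearch/v8/typedapi/inference/puthuggingface"
	"github.com/elastic/go-elasticsearch/v8/typedapi/types"
	"github.com/elastic/go-elasticsearch/v8/typedapi/types/enums/huggingfaceservicetype"
)

// createWithSetters configures a chat_completion endpoint via the fluent API.
func createWithSetters(tp elastictransport.Interface) (*puthuggingface.Response, error) {
	return puthuggingface.NewPutHuggingFaceFunc(tp)("chat_completion", "hf-chat").
		Service(huggingfaceservicetype.Huggingface). // enum constant name assumed
		ServiceSettings(&types.HuggingFaceServiceSettings{
			ApiKey: "hf_xxx", // placeholder Hugging Face access token
			Url:    "https://xxx.endpoints.huggingface.cloud/v1/chat/completions",
		}).
		Timeout("30s").
		Do(context.Background())
}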
func (*PutHuggingFace) ChunkingSettings ¶
func (r *PutHuggingFace) ChunkingSettings(chunkingsettings *types.InferenceChunkingSettings) *PutHuggingFace
ChunkingSettings The chunking configuration object. API name: chunking_settings
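As an illustration, a hedged sketch of attaching a chunking configuration to a `text_embedding` request. The `InferenceChunkingSettings` field names and the `some` pointer-helper package are assumptions based on the generated `types` code and may differ between client versions.

import (
	"github.com/elastic/go-elasticsearch/v8/typedapi/inference/puthuggingface"
	"github.com/elastic/go-elasticsearch/v8/typedapi/some"
	"github.com/elastic/go-elasticsearch/v8/typedapi/types"
)

// withChunking attaches a sentence-based chunking strategy to the request.
func withChunking(r *puthuggingface.PutHuggingFace) *puthuggingface.PutHuggingFace {
	return r.ChunkingSettings(&types.InferenceChunkingSettings{
		Strategy:        some.String("sentence"), // "sentence" or "word" (assumed values)
		MaxChunkSize:    some.Int(250),           // maximum words per chunk
		SentenceOverlap: some.Int(1),             // sentences shared between adjacent chunks
	})
}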
func (PutHuggingFace) Do ¶
func (r PutHuggingFace) Do(providedCtx context.Context) (*Response, error)
Do runs the request through the transport, handles the response and returns a puthuggingface.Response
func (*PutHuggingFace) ErrorTrace ¶
func (r *PutHuggingFace) ErrorTrace(errortrace bool) *PutHuggingFace
ErrorTrace When set to `true`, Elasticsearch will include the full stack trace of errors when they occur. API name: error_trace
func (*PutHuggingFace) FilterPath ¶
func (r *PutHuggingFace) FilterPath(filterpaths ...string) *PutHuggingFace
FilterPath Comma-separated list of filters in dot notation which reduce the response returned by Elasticsearch. API name: filter_path
func (*PutHuggingFace) Header ¶
func (r *PutHuggingFace) Header(key, value string) *PutHuggingFace
Header set a key, value pair in the PutHuggingFace headers map.
func (*PutHuggingFace) HttpRequest ¶
func (r *PutHuggingFace) HttpRequest(ctx context.Context) (*http.Request, error)
HttpRequest returns the http.Request object built from the given parameters.
func (*PutHuggingFace) Human ¶
func (r *PutHuggingFace) Human(human bool) *PutHuggingFace
Human When set to `true`, statistics will be returned in a format suitable for humans. For example `"exists_time": "1h"` for humans and `"exists_time_in_millis": 3600000` for computers. When disabled, the human-readable values will be omitted. This makes sense for responses being consumed only by machines. API name: human
func (PutHuggingFace) Perform ¶
func (r PutHuggingFace) Perform(providedCtx context.Context) (*http.Response, error)
Perform runs the http.Request through the provided transport and returns an http.Response.
func (*PutHuggingFace) Pretty ¶
func (r *PutHuggingFace) Pretty(pretty bool) *PutHuggingFace
Pretty If set to `true`, the returned JSON will be "pretty-formatted". Use this option for debugging only. API name: pretty
func (*PutHuggingFace) Raw ¶
func (r *PutHuggingFace) Raw(raw io.Reader) *PutHuggingFace
Raw takes a JSON payload as input, which is then passed to the http.Request. If specified, Raw takes precedence over the Request method.
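A short sketch of the Raw path: a hand-written JSON payload instead of a typed Request. The top-level `service` and `service_settings` keys follow the Request struct's JSON tags; the `api_key` and `url` values are placeholders.

import (
	"context"
	"strings"

	"github.com/elastic/go-elasticsearch/v8/typedapi/inference/puthuggingface"
)

// createFromRaw sends a raw JSON body through the prepared request builder.
func createFromRaw(r *puthuggingface.PutHuggingFace) (*puthuggingface.Response, error) {
	body := strings.NewReader(`{
		"service": "hugging_face",
		"service_settings": {
			"api_key": "hf_xxx",
			"url": "https://xxx.endpoints.huggingface.cloud"
		}
	}`)
	return r.Raw(body).Do(context.Background())
}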
func (*PutHuggingFace) Request ¶
func (r *PutHuggingFace) Request(req *Request) *PutHuggingFace
Request allows setting the request property with the appropriate payload.
func (*PutHuggingFace) Service ¶
func (r *PutHuggingFace) Service(service huggingfaceservicetype.HuggingFaceServiceType) *PutHuggingFace
Service The type of service supported for the specified task type. In this case, `hugging_face`. API name: service
func (*PutHuggingFace) ServiceSettings ¶
func (r *PutHuggingFace) ServiceSettings(servicesettings *types.HuggingFaceServiceSettings) *PutHuggingFace
ServiceSettings Settings used to install the inference model. These settings are specific to the `hugging_face` service. API name: service_settings
func (*PutHuggingFace) TaskSettings ¶ added in v8.19.0
func (r *PutHuggingFace) TaskSettings(tasksettings *types.HuggingFaceTaskSettings) *PutHuggingFace
TaskSettings Settings to configure the inference task. These settings are specific to the task type you specified. API name: task_settings
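For example, a rerank endpoint can shape its results through task settings. A hedged sketch: the `ReturnDocuments` and `TopN` field names are assumptions based on the rerank task's `return_documents` and `top_n` settings; check types.HuggingFaceTaskSettings in your client version.

import (
	"github.com/elastic/go-elasticsearch/v8/typedapi/inference/puthuggingface"
	"github.com/elastic/go-elasticsearch/v8/typedapi/some"
	"github.com/elastic/go-elasticsearch/v8/typedapi/types"
)

// withRerankTaskSettings configures result shaping for a rerank endpoint.
func withRerankTaskSettings(r *puthuggingface.PutHuggingFace) *puthuggingface.PutHuggingFace {
	return r.TaskSettings(&types.HuggingFaceTaskSettings{
		ReturnDocuments: some.Bool(true), // include document text in the reranked results
		TopN:            some.Int(5),     // return only the five best-ranked documents
	})
}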
func (*PutHuggingFace) Timeout ¶ added in v8.19.0
func (r *PutHuggingFace) Timeout(duration string) *PutHuggingFace
Timeout Specifies the amount of time to wait for the inference endpoint to be created. API name: timeout
type Request ¶
type Request struct {

	// ChunkingSettings The chunking configuration object.
	ChunkingSettings *types.InferenceChunkingSettings `json:"chunking_settings,omitempty"`

	// Service The type of service supported for the specified task type. In this case,
	// `hugging_face`.
	Service huggingfaceservicetype.HuggingFaceServiceType `json:"service"`

	// ServiceSettings Settings used to install the inference model. These settings are specific to
	// the `hugging_face` service.
	ServiceSettings types.HuggingFaceServiceSettings `json:"service_settings"`

	// TaskSettings Settings to configure the inference task.
	// These settings are specific to the task type you specified.
	TaskSettings *types.HuggingFaceTaskSettings `json:"task_settings,omitempty"`
}
Request holds the request body struct for the package puthuggingface
type Response ¶
type Response struct {

	// ChunkingSettings Chunking configuration object
	ChunkingSettings *types.InferenceChunkingSettings `json:"chunking_settings,omitempty"`

	// InferenceId The inference Id
	InferenceId string `json:"inference_id"`

	// Service The service type
	Service string `json:"service"`

	// ServiceSettings Settings specific to the service
	ServiceSettings json.RawMessage `json:"service_settings"`

	// TaskSettings Task settings specific to the service and task type
	TaskSettings json.RawMessage `json:"task_settings,omitempty"`

	// TaskType The task type
	TaskType tasktypehuggingface.TaskTypeHuggingFace `json:"task_type"`
}
Response holds the response body struct for the package puthuggingface
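Since ServiceSettings and TaskSettings arrive as json.RawMessage, they must be decoded by the caller. A minimal sketch, decoding the service settings into a generic map:

import (
	"encoding/json"
	"fmt"

	"github.com/elastic/go-elasticsearch/v8/typedapi/inference/puthuggingface"
)

// inspect prints the identifying fields and decodes the raw service settings.
func inspect(res *puthuggingface.Response) error {
	fmt.Printf("endpoint %s (service %s, task type %v)\n", res.InferenceId, res.Service, res.TaskType)

	var settings map[string]any
	if err := json.Unmarshal(res.ServiceSettings, &settings); err != nil {
		return err
	}
	fmt.Println("service settings:", settings)
	return nil
}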