databricksreceiver

package v0.95.0
Published: Mar 5, 2024 License: Apache-2.0

README

Databricks Receiver

Status

Stability: development (metrics)

The Databricks Receiver uses the Databricks API to generate metrics about the operation of a Databricks instance.

It also generates metrics from the Spark subsystem running in the Databricks instance.

Supported pipeline types: metrics

🚧 This receiver is in DEVELOPMENT. Behavior, configuration fields, and metric data model are subject to change.

Configuration

The following fields are required:

  • instance_name: A string representing the name of the instance. This value gets set as a databricks.instance.name resource attribute.
  • endpoint: The URL containing a protocol (http or https), hostname, and (optional) port of the Databricks API, without a trailing slash.
  • token: An access token to authenticate to the Databricks API.
  • spark_org_id: The Spark Org ID. See the Spark Subsystem Configuration section below for how to get this value.
  • spark_endpoint: The URL containing a protocol (http or https), hostname, and (optional) port of the Spark API. See the Spark Subsystem Configuration section below for how to get this value.
  • spark_ui_port: A number representing the Spark UI Port (typically 40001). See the Spark Subsystem Configuration section below for how to get this value.

The following fields are optional:

  • collection_interval: How often this receiver fetches information from the Databricks API. Must be a string readable by time.ParseDuration. Defaults to 30s.
  • max_results: The maximum number of items to return per API call. Defaults to 25, which is also the maximum. If set explicitly, the API requires a value greater than 0 and less than or equal to 25.
Example
databricks:
  instance_name: my-instance
  endpoint: https://dbr.example.net
  token: abc123
  spark_org_id: 1234567890
  spark_endpoint: https://spark.example.net
  spark_ui_port: 40001
  collection_interval: 10s
Spark Subsystem Configuration

To get the configuration parameters this receiver will need to get Spark metrics, run the following Scala notebook and copy its output values into your config:

%scala
val sparkOrgId = spark.conf.get("spark.databricks.clusterUsageTags.clusterOwnerOrgId")
val sparkEndpoint = dbutils.notebook.getContext.apiUrl.get
val sparkUiPort = spark.conf.get("spark.ui.port")
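The three values printed by the notebook map one-to-one onto the receiver's Spark config fields. With hypothetical output values, the mapping looks like:

```yaml
databricks:
  spark_org_id: "1234567890"                 # from sparkOrgId
  spark_endpoint: https://spark.example.net  # from sparkEndpoint
  spark_ui_port: 40001                       # from sparkUiPort
```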

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewFactory

func NewFactory() receiver.Factory

Types

type Config

type Config struct {
	InstanceName                            string `mapstructure:"instance_name"`
	Token                                   string `mapstructure:"token"`
	SparkOrgID                              string `mapstructure:"spark_org_id"`
	SparkEndpoint                           string `mapstructure:"spark_endpoint"`
	confighttp.HTTPClientSettings           `mapstructure:",squash"`
	scraperhelper.ScraperControllerSettings `mapstructure:",squash"`
	MaxResults                              int                           `mapstructure:"max_results"`
	SparkUIPort                             int                           `mapstructure:"spark_ui_port"`
	MetricsBuilderConfig                    metadata.MetricsBuilderConfig `mapstructure:",squash"`
}

func (*Config) Validate added in v0.71.0

func (c *Config) Validate() error

Directories

Path Synopsis
internal
