Prometheus Receiver
The Prometheus Receiver receives metric data in Prometheus format.
See the Design for additional information on this receiver.
⚠️ Warning
Note: This component is currently a work in progress. It has several limitations; don't use it if any of the following is a concern:
- The collector cannot auto-scale the scraping yet when multiple replicas of the collector are run.
- When running multiple replicas of the collector with the same config, it will scrape the targets multiple times.
- Users need to configure each replica with a different scraping configuration if they want to manually shard the scraping.
- The Prometheus receiver is a stateful component.
Unsupported features
The Prometheus receiver is meant to be a minimal drop-in replacement for Prometheus. However, some advanced Prometheus features are not supported, and the receiver explicitly returns an error if its configuration YAML contains any of the following:
- alert_config.alertmanagers
- alert_config.relabel_configs
- remote_read
- remote_write
- rule_files
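For example, a configuration carrying a `remote_write` section like the following sketch would be rejected with an error at startup (endpoint and job names are illustrative):

```yaml
receivers:
  prometheus:
    config:
      remote_write:  # unsupported: the receiver returns an error for this section
        - url: https://prometheus:9090/api/v1/write
      scrape_configs:
        - job_name: 'otel-collector'
          static_configs:
            - targets: ['0.0.0.0:8888']
```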
Getting Started
This receiver is a drop-in replacement for getting Prometheus to scrape your services. It supports the full set of Prometheus configuration in `scrape_config`, including service discovery. You can reuse the YAML configuration you would write before starting Prometheus, such as with:

> Note: Since the collector configuration supports environment variable substitution, `$` characters in your Prometheus configuration are interpreted as environment variables. To use a literal `$` character in your Prometheus configuration, you must escape it as `$$`.

```shell
prometheus --config.file=prom.yaml
```
You can copy and paste that same configuration under the config attribute:
```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['0.0.0.0:8888']
        - job_name: k8s
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(request_duration_seconds.*|response_duration_seconds.*)"
              action: keep
```
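As noted above, a literal `$` must be written as `$$`. A minimal sketch, assuming a hypothetical relabel rule that references regex capture group `$1` (where plain Prometheus would write `$1`, the collector configuration needs `$$1`):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          static_configs:
            - targets: ['0.0.0.0:8888']
          relabel_configs:
            - source_labels: [__address__]
              regex: '(.+):\d+'
              # Prometheus would use $1 here; the collector requires $$1
              replacement: '$$1'
              target_label: host
```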
The prometheus receiver also supports additional top-level options:
- `trim_metric_suffixes`: [Experimental] When set to `true`, trims unit and some counter-type suffixes from metric names. For example, `singing_duration_seconds_total` would be trimmed to `singing_duration`. This can be useful when trying to restore the original metric names used in OpenTelemetry instrumentation. Defaults to `false`.
Example configuration:
```yaml
receivers:
  prometheus:
    trim_metric_suffixes: true
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['0.0.0.0:8888']
```
Complete Configuration Example
The following example demonstrates a complete end-to-end configuration showing how to use the Prometheus receiver with processors and exporters in a service pipeline:
```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'my-service'
          scrape_interval: 5s
          static_configs:
            - targets: ['localhost:9090']
          # Filter metrics to keep only those matching the regex pattern
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: 'http_request_duration_seconds.*'
              action: keep

processors:
  resource:
    attributes:
      # Note: service.name is automatically set by the prometheus receiver from job_name
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp_grpc:
    endpoint: otel-collector:4317
    tls:
      insecure: false
      # For local testing only you may set `insecure: true`, but avoid this in production.
    sending_queue:
      batch:
        timeout: 10s
        send_batch_size: 1000
  prometheusremotewrite:
    endpoint: https://prometheus:9090/api/v1/write
    sending_queue:
      batch:
        timeout: 10s
        send_batch_size: 1000

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource]
      exporters: [otlp_grpc, prometheusremotewrite]
```
This configuration:
- Scrapes metrics from a service running on `localhost:9090` every 5 seconds
- Filters metrics to keep only those matching `http_request_duration_seconds.*` using `metric_relabel_configs`
- Adds resource attributes (`deployment.environment`) to all metrics (note: `service.name` is automatically set from the job name)
- Uses exporter-level batching via `sending_queue.batch` to improve efficiency when multiple scrapes occur
- Exports metrics to both an OTLP endpoint and a Prometheus remote write endpoint
Prometheus native histograms
Native histograms are a data type in Prometheus; see the specification for more information.
The Prometheus receiver automatically converts native histograms to OpenTelemetry exponential histograms. To enable scraping and ingestion of native histograms, configure two things in your Prometheus scrape config:
1. Enable native histogram scraping: set `scrape_native_histograms: true` (globally or per-job)
2. Use the protobuf scrape protocol: include `PrometheusProto` in `scrape_protocols` (required until Prometheus supports native histograms over text formats)
```yaml
receivers:
  prometheus:
    config:
      global:
        # Required: Include PrometheusProto to scrape native histograms
        scrape_protocols: [ PrometheusProto, OpenMetricsText1.0.0, OpenMetricsText0.0.1, PrometheusText0.0.4 ]
        # Enable native histogram scraping globally
        scrape_native_histograms: true
      scrape_configs:
        - job_name: 'my-app'
          # Per-job setting takes precedence over global
          # scrape_native_histograms: true
          static_configs:
            - targets: ['localhost:8080']
```
This feature applies to the most common integer counter histograms; gauge histograms are dropped.
If a metric has both conventional (aka classic) buckets and native histogram buckets, only the native histogram buckets are used to create the corresponding exponential histogram. To scrape the classic buckets instead, use the scrape option `always_scrape_classic_histograms`.
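Setting that option per job could look like the following sketch, assuming a hypothetical target that exposes both representations:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'my-app'
          # Scrape the classic buckets rather than the native histogram buckets
          always_scrape_classic_histograms: true
          static_configs:
            - targets: ['localhost:8080']
```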
OpenTelemetry Operator
In addition to static job definitions, this receiver can query a list of jobs from the OpenTelemetry Operator's TargetAllocator or a compatible endpoint.
```yaml
receivers:
  prometheus:
    target_allocator:
      endpoint: http://my-targetallocator-service
      interval: 30s
      collector_id: collector-1
```
The target_allocator section embeds the full confighttp client configuration.
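Because `target_allocator` embeds the confighttp client configuration, standard HTTP client options such as TLS can be set alongside the fields above; a sketch, assuming a hypothetical CA bundle path:

```yaml
receivers:
  prometheus:
    target_allocator:
      endpoint: https://my-targetallocator-service
      interval: 30s
      collector_id: collector-1
      tls:
        ca_file: /etc/ssl/certs/ta-ca.crt
```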
Exemplars
This receiver accepts exemplars in Prometheus format and converts them to OTLP format.
- The value is expected to be received in `float64` format
- The timestamp is expected to be received in `ms`
- Labels with the key `span_id` in Prometheus exemplars are set as the OTLP span ID, and labels with the key `trace_id` are set as the trace ID
- The remaining labels are copied as-is to OTLP format
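For illustration, an exemplar in the OpenMetrics text format attaches labels and a value to a sample; a hypothetical scrape line might look like:

```text
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.5"} 3 # {trace_id="4bf92f3577b34da6a3ce929d0e0e4736",span_id="00f067aa0ba902b7"} 0.42
```

Here the receiver would map `trace_id` and `span_id` to the OTLP trace and span IDs and `0.42` to the exemplar's `float64` value.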
Resource and Scope
This receiver drops the target_info prometheus metric, if present, and uses attributes on
that metric to populate the OpenTelemetry Resource.
It drops otel_scope_name and otel_scope_version labels, if present, from metrics, and uses them to populate
the OpenTelemetry Instrumentation Scope name and version. It drops the otel_scope_info metric,
and uses attributes (other than otel_scope_name and otel_scope_version) to populate Scope
Attributes.
Prometheus API Server
The Prometheus API server can be enabled to host info about the Prometheus targets, config, service discovery, and metrics. The server_config can be specified using the OpenTelemetry confighttp package. An example configuration would be:
```yaml
receivers:
  prometheus:
    api_server:
      enabled: true
      server_config:
        endpoint: "localhost:9090"
```
The API server hosts the same paths as the Prometheus agent-mode API, including `/api/v1/targets` and `/api/v1/status/config`.
More info about querying `/api/v1/` and the data format that is returned can be found in the Prometheus documentation.
Feature gates
See documentation.md for the complete list of feature gates supported by this receiver.
Feature gates can be enabled using the --feature-gates flag:
```shell
"--feature-gates=<feature-gate>"
```
Benchmark Results
Current Prometheus receiver benchmark results are published on the Collector Benchmarks page. The table below links directly to the current Prometheus receiver charts by scenario and metric type.
Troubleshooting and Best Practices
This section provides guidance for common issues, performance optimization, and best practices when using the Prometheus receiver in production environments.
Common Issues and Solutions
Metrics Not Appearing
Symptoms: Metrics are not being scraped or exported despite correct configuration.
Possible Causes and Solutions:
1. **Target Not Reachable**
   - Verify network connectivity between the collector and target endpoints
   - Check firewall rules and security groups
   - Test connectivity using `curl` or `wget` against the target's metrics endpoint
2. **Incorrect Scrape Configuration**
   - Verify `scrape_configs` syntax matches the Prometheus format
   - Check that `targets` are correctly formatted (e.g., `['hostname:port']`)
   - Ensure `job_name` is unique and descriptive
3. **Metric Filtering Too Aggressive**
   - Review `metric_relabel_configs` to ensure desired metrics are not being dropped
   - Temporarily remove filters to verify metrics are being scraped
   - Use the Prometheus API server (if enabled) to inspect active targets
4. **Service Discovery Not Working**
   - For Kubernetes service discovery, verify RBAC permissions for the service account
   - Check that service discovery configurations match your environment
   - Review collector logs for service discovery errors
Debugging Steps:
```yaml
# Enable the Prometheus API server to inspect targets
receivers:
  prometheus:
    api_server:
      enabled: true
      server_config:
        endpoint: "localhost:9090"
```
Then query /api/v1/targets to see target status and any scrape errors.
- Enable debug logs: You can also enable debug-level logging in the collector to see detailed scrape errors:
```yaml
service:
  telemetry:
    logs:
      level: debug # Use with caution in production
```
This will surface detailed scrape errors and help diagnose connectivity or configuration issues.
High CPU Usage
Symptoms: Collector consuming excessive CPU resources, especially with high metric volumes.
Solutions:
1. **Optimize Scrape Intervals**
   - Increase `scrape_interval` for less critical metrics
   - Use different intervals for different jobs based on metric importance
   - Set `scrape_timeout` to prevent long-running scrapes
2. **Reduce Metric Volume**
   - Use `metric_relabel_configs` to drop unnecessary metrics at scrape time
   - Filter metrics before they enter the pipeline to reduce processing overhead
   - Consider using the `filter` processor for more complex filtering logic
3. **Disable Expensive Features**
   - Avoid enabling `receiver.prometheusreceiver.EnableCreatedTimestampZeroIngestion` unless necessary (known CPU impact)
   - Use exporter-level batching to reduce export frequency
   - Consider disabling extra scrape metrics if not needed
Example Configuration:
```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'high-frequency'
          scrape_interval: 30s  # Increased from default
          scrape_timeout: 10s   # Prevent hanging scrapes
          metric_relabel_configs:
            # Drop verbose metrics to reduce volume
            - source_labels: [__name__]
              regex: 'go_.*'
              action: drop
```
Memory Issues
Symptoms: Collector running out of memory, especially with many targets or long scrape intervals.
Solutions:
1. **Limit Target Count**
   - Use service discovery filters to reduce the number of targets
   - Implement manual sharding across multiple collector instances
   - Use TargetAllocator for automatic sharding in Kubernetes
2. **Optimize Batch Processing**
   - Configure exporter-level batching with appropriate `send_batch_size` and `timeout` via `sending_queue.batch`
   - Balance memory usage (smaller batches) against efficiency (larger batches)
3. **Monitor Memory Usage**
   - Enable the `memory_limiter` processor to prevent OOM conditions
   - Set appropriate memory limits based on your metric volume
Example Configuration:
```yaml
processors:
  memory_limiter:
    limit_mib: 512
    check_interval: 1s

exporters:
  otlp:
    endpoint: otel-collector:4317
    sending_queue:
      batch:
        timeout: 10s
        send_batch_size: 1000 # Adjust based on memory constraints
```
Best Practices for Production
Multi-Replica Deployments
When running multiple collector replicas, manually shard scraping to avoid duplicate metrics:
Option 1: Manual Sharding by Job
```yaml
# Collector Replica 1
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'service-a'
          static_configs:
            - targets: ['service-a:9090']
        - job_name: 'service-b'
          static_configs:
            - targets: ['service-b:9090']
```

```yaml
# Collector Replica 2
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'service-c'
          static_configs:
            - targets: ['service-c:9090']
        - job_name: 'service-d'
          static_configs:
            - targets: ['service-d:9090']
```
Option 2: Use TargetAllocator (Recommended for Kubernetes)
```yaml
receivers:
  prometheus:
    target_allocator:
      endpoint: http://targetallocator-service:8080
      interval: 30s
      collector_id: ${HOSTNAME} # Unique per replica
```
1. **Scrape Interval Tuning**
   - Critical metrics: 5-15 seconds
   - Standard metrics: 30-60 seconds
   - Low-priority metrics: 2-5 minutes
2. **Metric Filtering Strategy**
   - Filter at scrape time using `metric_relabel_configs` (most efficient)
   - Use the `filter` processor for complex logic
   - Avoid filtering in exporters when possible
3. **Resource Management**
   - Always use the `memory_limiter` processor in production
   - Configure appropriate resource limits in Kubernetes
   - Monitor collector metrics (CPU, memory, scrape duration)
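The filter-processor approach mentioned above can be sketched as follows, using the processor's OTTL condition syntax (the metric name pattern is illustrative):

```yaml
processors:
  filter/drop_go_runtime:
    error_mode: ignore
    metrics:
      metric:
        # Drop Go runtime metrics that survived scrape-time relabeling
        - 'IsMatch(name, "go_.*")'
```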
Example Production Configuration:
```yaml
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 30s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: 'critical-services'
          scrape_interval: 15s
          static_configs:
            - targets: ['service1:9090', 'service2:9090']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: 'http_request_duration_seconds.*|http_request_total'
              action: keep

processors:
  memory_limiter:
    limit_mib: 1024
    check_interval: 1s
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp_grpc:
    endpoint: otel-collector:4317
    tls:
      insecure: false
      ca_file: /etc/ssl/certs/ca-certificates.crt
    sending_queue:
      batch:
        timeout: 10s
        send_batch_size: 2000
```
Monitoring the Receiver
Monitor the Prometheus receiver itself to ensure it's operating correctly:
1. **Enable Extra Scrape Metrics**
   - In the Prometheus config, set `extra_scrape_metrics` to `true` in the `global` section.
2. **Key Metrics to Monitor**
   - `prometheus_receiver_scrapes_total`: total number of scrapes
   - `prometheus_receiver_scrape_errors_total`: number of failed scrapes
   - `prometheus_receiver_target_scrapes_exceeded_timeout_total`: timeouts
   - The collector's internal metrics (CPU, memory, pipeline metrics)
3. **Set Up Alerts**
   - Alert on high scrape error rates
   - Alert on scrape timeouts
   - Alert on collector memory/CPU usage
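Enabling the extra scrape metrics described in step 1 could look like the following sketch (assuming the receiver accepts this key in the `global` section, as stated above):

```yaml
receivers:
  prometheus:
    config:
      global:
        # Expose additional per-scrape metrics for monitoring the receiver itself
        extra_scrape_metrics: true
```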
Security Considerations
1. **TLS Configuration**
   - Always use TLS for exporter endpoints in production
   - Use proper certificate management
   - Consider using mTLS for enhanced security
2. **Network Security**
   - Restrict network access to metrics endpoints
   - Use service meshes or network policies to limit exposure
   - Consider using authentication for sensitive metrics endpoints
3. **Configuration Security**
   - Avoid hardcoding credentials in configuration files
   - Use environment variable substitution for sensitive values
   - Implement proper secret management (e.g., Kubernetes secrets)
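Environment-variable substitution keeps credentials out of the configuration file; a sketch, assuming a hypothetical bearer token exported as `PROM_BEARER_TOKEN` before the collector starts:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'secured-service'
          # Token is injected from the environment at collector startup
          authorization:
            credentials: ${env:PROM_BEARER_TOKEN}
          static_configs:
            - targets: ['service:9090']
```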
Debugging Tips
1. **Enable Verbose Logging**

   ```yaml
   service:
     telemetry:
       logs:
         level: debug # Use with caution in production
   ```
2. **Use the Prometheus API Server**
   - Enable the API server to inspect targets, config, and scrape pools
   - Query `/api/v1/targets` to see target health
   - Check `/api/v1/status/config` to verify configuration
3. **Test Configuration**
   - Validate YAML syntax before deployment
   - Test with a single job first, then expand
   - Use `otelcol` with the `--dry-run` flag if available
4. **Check Collector Logs**
   - Look for scrape errors, timeouts, or connection issues
   - Monitor for memory or CPU warnings
   - Review service discovery logs for Kubernetes deployments
Additional Resources