Failover Connector
Supported Pipeline Types
Allows for health-based routing between trace, metric, and log pipelines, depending on the health of the target downstream exporters.
Configuration
If you are not already familiar with connectors, you may find it helpful to first visit the Connectors README.
The following settings are available:
priority_levels (required)
: list of pipeline priority levels, ordered from 1 (highest priority) to n (lowest priority). Multiple pipelines can sit at a single priority level.
retry_interval (optional)
: the frequency at which the pipeline levels will attempt to reestablish connection with all higher priority levels. Default value is 10 minutes. (See Example below for further explanation)
retry_gap (optional)
: the amount of time between trying two separate priority levels in a single retry_interval timeframe. Default value is 30 seconds. (See Example below for further explanation)
max_retries (optional)
: the maximum retries per level. Default value is 10. Set to 0 to allow unlimited retries.
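As a minimal sketch of the settings above, the following fragment sets only the required priority_levels and leaves the optional settings out, so the defaults listed above would apply. The pipeline names traces/primary and traces/backup are hypothetical and chosen only for illustration.

```yaml
connectors:
  failover:
    # Only priority_levels is required. The optional settings are omitted,
    # so the documented defaults apply:
    #   retry_interval: 10m
    #   retry_gap: 30s
    #   max_retries: 10
    priority_levels:
      - [traces/primary]   # hypothetical pipeline name, level 1 (highest priority)
      - [traces/backup]    # hypothetical pipeline name, level 2
```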
The connector takes a list of priority_levels, each of which can contain multiple pipelines.
If any pipeline at a stable level fails, the level is considered unhealthy and the connector will move down one priority level and route all data to the new level (assuming it is stable).
The connector will periodically try to reestablish a stable connection with the higher priority levels. retry_interval is the frequency at which the connector iterates through all unhealthy higher priority levels, while retry_gap is how long it waits after a failed retry at one level before retrying the next level. For example, if retry_gap is 2m, after a failed attempt to reestablish level 1 it will wait 2m before trying level 2. It will retry at most one unhealthy level before returning to the current stable level.
The max_retries setting tracks how many retries have occurred at each level; once the maximum is reached, the connector no longer retries that priority level.
Configuration Example:
connectors:
  failover:
    priority_levels:
      - [traces/first, traces/also_first]
      - [traces/second]
      - [traces/third]
    retry_interval: 5m
    retry_gap: 1m
    max_retries: 10
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [failover]
    traces/first:
      receivers: [failover]
      exporters: [otlp/first]
    traces/second:
      receivers: [failover]
      exporters: [otlp/second]
    traces/third:
      receivers: [failover]
      exporters: [otlp/third]
    traces/also_first:
      receivers: [failover]
      exporters: [otlp/fourth]
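Since the connector also routes between log pipelines, the same layout can be sketched for logs. This is an illustrative variation rather than a configuration taken from the component itself: the pipeline names (logs/primary, logs/backup) and exporter names (otlp/primary, otlp/backup) are hypothetical, and the receiver and exporter definitions are omitted. It also sets max_retries to 0, which allows unlimited retries.

```yaml
connectors:
  failover:
    priority_levels:
      - [logs/primary]   # hypothetical pipeline, level 1 (highest priority)
      - [logs/backup]    # hypothetical pipeline, level 2
    retry_interval: 5m
    retry_gap: 1m
    max_retries: 0       # 0 allows unlimited retries

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [failover]       # the connector acts as the exporter here
    logs/primary:
      receivers: [failover]       # and as the receiver for each level
      exporters: [otlp/primary]   # hypothetical exporter name
    logs/backup:
      receivers: [failover]
      exporters: [otlp/backup]    # hypothetical exporter name
```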
Example with Explanation:
connectors:
  failover:
    priority_levels:
      - [traces/first]
      - [traces/second]
      - [traces/third]
      - [traces/fourth]
    retry_interval: 5m
    retry_gap: 1m
    max_retries: 10
Assume the current stable level is level 4 (traces/fourth) in the priority_levels list.
At the start of the retry_interval, the connector will try to reestablish the pipeline on level 1 (traces/first). If that fails, the connector returns to level 4 (traces/fourth) and waits 1m, the retry_gap. When that 1m passes, it retries level 2 (traces/second); if that fails, it first returns to level 4 before waiting another 1m and trying level 3 (traces/third).
If the retry of level 3 also fails, it returns to level 4 and waits for the 5m retry_interval before repeating the process. If a retry is successful, the retried level becomes the stable level, and the connector continues to retry any higher priority levels that haven't exceeded max_retries.
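The walkthrough can be condensed into an annotated copy of the same configuration. The timings in the comments are approximate and assume that every retry fails and that each retry attempt itself takes negligible time; they are an illustration of the description above, not measured behavior.

```yaml
connectors:
  failover:
    priority_levels:
      - [traces/first]    # retried first, at the start of the retry cycle
      - [traces/second]   # retried about 1m (one retry_gap) after level 1 fails
      - [traces/third]    # retried about 1m (one retry_gap) after level 2 fails
      - [traces/fourth]   # current stable level; receives data between retries
    retry_interval: 5m    # how often the retry cycle over levels 1 to 3 repeats
    retry_gap: 1m         # wait between retrying two different levels
    max_retries: 10       # a level is no longer retried once it reaches this count
```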