Monitoring Setup

rabbitmq-backup exposes Prometheus metrics via an HTTP endpoint. This guide covers enabling metrics, scraping with Prometheus, visualizing in Grafana, and setting up alerts.

Enable Prometheus Metrics

Add the metrics section to your configuration file:

backup.yaml
metrics:
  enabled: true
  port: 8080
  bind_address: "0.0.0.0"
  path: /metrics

| Field | Default | Description |
| --- | --- | --- |
| enabled | true | Enable the metrics HTTP server |
| port | 8080 | Port for the metrics endpoint |
| bind_address | 0.0.0.0 | Network interface to bind to |
| path | /metrics | HTTP path for the metrics endpoint |

When the backup runs, the metrics endpoint is available at http://<host>:8080/metrics.
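
The default bind_address of 0.0.0.0 listens on every network interface. If Prometheus runs on the same host as the backup, a more restrictive configuration might look like the sketch below. It uses only the fields from the table above. The loopback address and port 9419 are illustrative values, not recommendations from the project.

```yaml
# backup.yaml (illustrative values; adjust to your environment)
metrics:
  enabled: true
  port: 9419                  # arbitrary example port, not a project default
  bind_address: "127.0.0.1"   # loopback only: reachable from this host, not the network
  path: /metrics
```

With these values the endpoint would be served at http://127.0.0.1:9419/metrics, and the Prometheus scrape target must use the same port.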

Available Metrics

Backup Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| rabbitmq_backup_messages_read | Counter | queue, vhost, queue_type | Total messages read during backup |
| rabbitmq_backup_bytes_read | Counter | queue, vhost, queue_type | Total bytes read during backup |
| rabbitmq_backup_segments_written | Counter | queue, vhost, queue_type | Total segments written |
| rabbitmq_backup_segments_bytes | Counter | queue, vhost, queue_type | Total segment bytes written |
| rabbitmq_backup_checkpoint_syncs | Counter | -- | Total checkpoint sync operations |
| rabbitmq_backup_errors | Counter | queue, vhost, error_type | Total errors by type |

Restore Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| rabbitmq_restore_messages_published | Counter | queue, vhost, queue_type | Total messages published during restore |
| rabbitmq_restore_messages_confirmed | Counter | queue, vhost, queue_type | Total messages confirmed by broker |
| rabbitmq_restore_messages_failed | Counter | queue, vhost, queue_type | Total messages that failed to publish |

Connection Metrics

| Metric | Type | Description |
| --- | --- | --- |
| rabbitmq_backup_amqp_connections_active | Gauge | Active AMQP connections |
| rabbitmq_backup_stream_connections_active | Gauge | Active Stream Protocol connections |

Configure Prometheus

Add a scrape target for rabbitmq-backup:

prometheus.yml
scrape_configs:
  - job_name: 'rabbitmq-backup'
    scrape_interval: 15s
    static_configs:
      - targets: ['rabbitmq-backup-host:8080']
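
Prometheus scrapes /metrics by default, so the job above needs no path setting. If you change path or port in backup.yaml, the scrape job has to follow. The sketch below assumes a hypothetical path of /backup-metrics and port 9419; metrics_path and the per-target labels block are standard Prometheus scrape options, while the env label value is only an example.

```yaml
# prometheus.yml (sketch for a non-default endpoint; path, port, and label are examples)
scrape_configs:
  - job_name: 'rabbitmq-backup'
    scrape_interval: 15s
    metrics_path: /backup-metrics    # must match metrics.path in backup.yaml
    static_configs:
      - targets: ['rabbitmq-backup-host:9419']   # must match metrics.port
        labels:
          env: production            # optional label attached to every scraped series
```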

Kubernetes (PodMonitor)

If you use the Prometheus Operator in Kubernetes:

podmonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: rabbitmq-backup
  namespace: rabbitmq-backup
spec:
  selector:
    matchLabels:
      app: rabbitmq-backup
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
      interval: 15s
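
The PodMonitor selects pods by the app: rabbitmq-backup label and scrapes the container port named metrics, so the pod itself has to declare both. A minimal sketch of the relevant pod template fragment follows; the container name and image reference are placeholders, not values published by the project.

```yaml
# Fragment of a Deployment, Job, or CronJob pod template (placeholder names)
template:
  metadata:
    labels:
      app: rabbitmq-backup          # matched by the PodMonitor selector
  spec:
    containers:
      - name: rabbitmq-backup
        image: example.com/rabbitmq-backup:latest   # hypothetical image reference
        ports:
          - name: metrics           # referenced by podMetricsEndpoints[].port
            containerPort: 8080     # must equal metrics.port in backup.yaml
```

If the port name or the label does not match, the Prometheus Operator will not generate a scrape target for the pod.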

Grafana Dashboard

Use the following PromQL queries as panels in a Grafana dashboard to monitor backup and restore operations.

Backup Progress (Messages Read Over Time)

rate(rabbitmq_backup_messages_read[5m])

Backup Throughput (Bytes/sec)

rate(rabbitmq_backup_bytes_read[5m])

Segments Written by Queue

increase(rabbitmq_backup_segments_written[1h])

Compression Ratio (Fraction of Bytes Saved)

1 - (rate(rabbitmq_backup_segments_bytes[5m]) / rate(rabbitmq_backup_bytes_read[5m]))

Error Rate

rate(rabbitmq_backup_errors[5m])

Restore Progress

rate(rabbitmq_restore_messages_published[5m])

Confirm Rate (Restore)

rate(rabbitmq_restore_messages_confirmed[5m]) / rate(rabbitmq_restore_messages_published[5m])
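
Each query above returns one series per queue, vhost, and queue_type combination. For dashboards that cover many queues, one option is to precompute aggregates with Prometheus recording rules. The sketch below is an assumption about how you might do that; the group name and the recorded series names are hypothetical and follow the common level:metric:operation naming convention.

```yaml
# rabbitmq-backup-recording.yaml (sketch; group and record names are hypothetical)
groups:
  - name: rabbitmq-backup-recording
    rules:
      # Messages read per second, summed across all queues in each vhost
      - record: vhost:rabbitmq_backup_messages_read:rate5m
        expr: sum by (vhost) (rate(rabbitmq_backup_messages_read[5m]))

      # Fraction of published messages confirmed by the broker, per queue
      - record: queue:rabbitmq_restore_confirm_ratio:rate5m
        expr: |
          sum by (queue, vhost) (rate(rabbitmq_restore_messages_confirmed[5m]))
            /
          sum by (queue, vhost) (rate(rabbitmq_restore_messages_published[5m]))
```

A Grafana panel can then query the recorded series directly, which keeps dashboard queries short and moves the aggregation cost to rule evaluation.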

Alerting Rules

Add these Prometheus alerting rules to detect backup failures and anomalies:

rabbitmq-backup-alerts.yaml
groups:
  - name: rabbitmq-backup
    rules:
      # Alert if backup errors exceed threshold
      - alert: RabbitMQBackupErrors
        expr: increase(rabbitmq_backup_errors[1h]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ backup errors detected"
          description: >
            {{ $value }} backup errors in the last hour
            for queue {{ $labels.queue }} in vhost {{ $labels.vhost }}.

      # Alert if no messages were read (backup may be stuck)
      - alert: RabbitMQBackupStalled
        expr: rate(rabbitmq_backup_messages_read[15m]) == 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ backup appears stalled"
          description: >
            No messages read in the last 30 minutes.
            Check the backup process logs.

      # Alert if restore confirms are failing
      - alert: RabbitMQRestoreFailures
        expr: rate(rabbitmq_restore_messages_failed[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "RabbitMQ restore is failing to publish messages"
          description: >
            {{ $value }} messages/sec failing during restore
            for queue {{ $labels.queue }}.

      # Alert if no active connections (process may have crashed)
      - alert: RabbitMQBackupNoConnections
        expr: rabbitmq_backup_amqp_connections_active == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No active AMQP connections for backup"
          description: >
            The backup process has no active AMQP connections.
            It may have crashed or lost connectivity.
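
One caveat about the last rule: if the process exits, its series disappear rather than dropping to 0, so a `== 0` comparison has nothing to match. A complementary rule can watch the `up` series that Prometheus records for every scrape target. The sketch below assumes the static scrape job named rabbitmq-backup from the Configure Prometheus section and a long-running backup process. The alert name is hypothetical, and with a PodMonitor the job label will differ.

```yaml
# Additional rule group (sketch; alert name is hypothetical)
groups:
  - name: rabbitmq-backup-availability
    rules:
      # Fires when Prometheus cannot scrape the metrics endpoint at all
      - alert: RabbitMQBackupTargetDown
        expr: up{job="rabbitmq-backup"} == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "rabbitmq-backup metrics endpoint is unreachable"
          description: >
            Prometheus has failed to scrape {{ $labels.instance }}
            for 10 minutes.
```

For backups that run as short-lived jobs, this rule would fire between runs, so it is only appropriate where the process is expected to stay up.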

Verify Metrics

Manually curl the metrics endpoint to verify it is working:

curl http://localhost:8080/metrics

Example output (queue names and values will differ):

# HELP rabbitmq_backup_messages_read Total messages read during backup
# TYPE rabbitmq_backup_messages_read counter
rabbitmq_backup_messages_read{queue="orders-queue",vhost="/",queue_type="classic"} 1234
# HELP rabbitmq_backup_segments_written Total segments written
# TYPE rabbitmq_backup_segments_written counter
rabbitmq_backup_segments_written{queue="orders-queue",vhost="/",queue_type="classic"} 1
...