Monitoring Setup
rabbitmq-backup exposes Prometheus metrics via an HTTP endpoint. This guide covers enabling metrics, scraping with Prometheus, visualizing in Grafana, and setting up alerts.
Enable Prometheus Metrics
Add the metrics section to your configuration file:
backup.yaml
metrics:
enabled: true
port: 8080
bind_address: "0.0.0.0"
path: /metrics
| Field | Default | Description |
|---|---|---|
enabled | true | Enable the metrics HTTP server |
port | 8080 | Port for the metrics endpoint |
bind_address | 0.0.0.0 | Network interface to bind to |
path | /metrics | HTTP path for the metrics endpoint |
When the backup runs, the metrics endpoint is available at http://<host>:8080/metrics.
Available Metrics
Backup Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
rabbitmq_backup_messages_read | Counter | queue, vhost, queue_type | Total messages read during backup |
rabbitmq_backup_bytes_read | Counter | queue, vhost, queue_type | Total bytes read during backup |
rabbitmq_backup_segments_written | Counter | queue, vhost, queue_type | Total segments written |
rabbitmq_backup_segments_bytes | Counter | queue, vhost, queue_type | Total segment bytes written |
rabbitmq_backup_checkpoint_syncs | Counter | -- | Total checkpoint sync operations |
rabbitmq_backup_errors | Counter | queue, vhost, error_type | Total errors by type |
Restore Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
rabbitmq_restore_messages_published | Counter | queue, vhost, queue_type | Total messages published during restore |
rabbitmq_restore_messages_confirmed | Counter | queue, vhost, queue_type | Total messages confirmed by broker |
rabbitmq_restore_messages_failed | Counter | queue, vhost, queue_type | Total messages that failed to publish |
Connection Metrics
| Metric | Type | Description |
|---|---|---|
rabbitmq_backup_amqp_connections_active | Gauge | Active AMQP connections |
rabbitmq_backup_stream_connections_active | Gauge | Active Stream Protocol connections |
Configure Prometheus
Add a scrape target for rabbitmq-backup:
prometheus.yml
scrape_configs:
- job_name: 'rabbitmq-backup'
scrape_interval: 15s
static_targets:
- targets: ['rabbitmq-backup-host:8080']
Kubernetes (PodMonitor)
If you use the Prometheus Operator in Kubernetes:
podmonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: rabbitmq-backup
namespace: rabbitmq-backup
spec:
selector:
matchLabels:
app: rabbitmq-backup
podMetricsEndpoints:
- port: metrics
path: /metrics
interval: 15s
Grafana Dashboard
Import the following panels into a Grafana dashboard to monitor backup and restore operations.
Backup Progress (Messages Read Over Time)
rate(rabbitmq_backup_messages_read[5m])
Backup Throughput (Bytes/sec)
rate(rabbitmq_backup_bytes_read[5m])
Segments Written by Queue
increase(rabbitmq_backup_segments_written[1h])
Compression Ratio
1 - (rate(rabbitmq_backup_segments_bytes[5m]) / rate(rabbitmq_backup_bytes_read[5m]))
Error Rate
rate(rabbitmq_backup_errors[5m])
Restore Progress
rate(rabbitmq_restore_messages_published[5m])
Confirm Rate (Restore)
rate(rabbitmq_restore_messages_confirmed[5m]) / rate(rabbitmq_restore_messages_published[5m])
Alerting Rules
Add these Prometheus alerting rules to detect backup failures and anomalies:
rabbitmq-backup-alerts.yaml
groups:
- name: rabbitmq-backup
rules:
# Alert if backup errors exceed threshold
- alert: RabbitMQBackupErrors
expr: increase(rabbitmq_backup_errors[1h]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "RabbitMQ backup errors detected"
description: >
{{ $value }} backup errors in the last hour
for queue {{ $labels.queue }} in vhost {{ $labels.vhost }}.
# Alert if no messages were read (backup may be stuck)
- alert: RabbitMQBackupStalled
expr: rate(rabbitmq_backup_messages_read[15m]) == 0
for: 30m
labels:
severity: warning
annotations:
summary: "RabbitMQ backup appears stalled"
description: >
No messages read in the last 30 minutes.
Check the backup process logs.
# Alert if restore confirms are failing
- alert: RabbitMQRestoreFailures
expr: rate(rabbitmq_restore_messages_failed[5m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "RabbitMQ restore is failing to publish messages"
description: >
{{ $value }} messages/sec failing during restore
for queue {{ $labels.queue }}.
# Alert if no active connections (process may have crashed)
- alert: RabbitMQBackupNoConnections
expr: rabbitmq_backup_amqp_connections_active == 0
for: 10m
labels:
severity: critical
annotations:
summary: "No active AMQP connections for backup"
description: >
The backup process has no active AMQP connections.
It may have crashed or lost connectivity.
Verify Metrics
Manually curl the metrics endpoint to verify it is working:
curl http://localhost:8080/metrics
Expected output:
# HELP rabbitmq_backup_messages_read Total messages read during backup
# TYPE rabbitmq_backup_messages_read counter
rabbitmq_backup_messages_read{queue="orders-queue",vhost="/",queue_type="classic"} 1234
# HELP rabbitmq_backup_segments_written Total segments written
# TYPE rabbitmq_backup_segments_written counter
rabbitmq_backup_segments_written{queue="orders-queue",vhost="/",queue_type="classic"} 1
...