Metrics Reference
rabbitmq-backup exposes Prometheus metrics via an HTTP endpoint. Metrics are collected using the prometheus-client crate (version 0.24) and served in Prometheus text exposition format.
Enabling Metrics
Add a metrics section to your configuration file:
metrics:
  enabled: true
  port: 8080
  bind_address: "0.0.0.0"
  path: /metrics
The metrics server starts automatically when the backup or restore command runs. It serves the following endpoints:
| Endpoint | Content-Type | Description |
|---|---|---|
| GET /metrics | text/plain; version=0.0.4 | Prometheus metrics in text exposition format. |
| GET /health | application/json | Health check: {"status":"ok"}. |
| GET /healthz | application/json | Same as /health (Kubernetes convention). |
| GET / | text/html | HTML index page with links to metrics and health endpoints. |
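A liveness probe against these endpoints could look like the following minimal sketch. It assumes the server is reachable at a base URL such as `http://localhost:8080` and that `/health` returns exactly the `{"status":"ok"}` body described in the table; the helper names `is_healthy` and `check` are illustrative and not part of rabbitmq-backup.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True when the body is the documented {"status":"ok"} payload."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check(base_url: str, timeout: float = 2.0) -> bool:
    """GET <base_url>/health and interpret the response."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:
        # Connection refused, timeout, or HTTP error status.
        return False
```

Calling `check("http://localhost:8080")` while a backup is running should return `True` if the metrics server is enabled. The same function works against `/healthz` by changing the path.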
Backup Metrics
rabbitmq_backup_messages_read
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total number of messages read during backup. |
| Labels | queue, vhost, queue_type |
Incremented for each message successfully consumed from a queue (AMQP) or stream.
Example:
# HELP rabbitmq_backup_messages_read Total messages read during backup
# TYPE rabbitmq_backup_messages_read counter
rabbitmq_backup_messages_read{queue="orders",vhost="/",queue_type="classic"} 1542
rabbitmq_backup_messages_read{queue="payments",vhost="/",queue_type="quorum"} 823
rabbitmq_backup_messages_read{queue="events",vhost="/",queue_type="stream"} 10000
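To consume output like this outside Prometheus, for example in a smoke test, a small parser is enough. The sketch below is a simplified reader for sample lines without timestamps. It does not unescape label values, and the names `parse_samples` and `EXAMPLE` are introduced here for illustration only.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a plain sample line (e.g. it carries a timestamp)
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


EXAMPLE = '''# HELP rabbitmq_backup_messages_read Total messages read during backup
# TYPE rabbitmq_backup_messages_read counter
rabbitmq_backup_messages_read{queue="orders",vhost="/",queue_type="classic"} 1542
rabbitmq_backup_messages_read{queue="payments",vhost="/",queue_type="quorum"} 823
rabbitmq_backup_messages_read{queue="events",vhost="/",queue_type="stream"} 10000
'''

total = sum(v for n, _, v in parse_samples(EXAMPLE)
            if n == "rabbitmq_backup_messages_read")
print(total)  # 12365.0
```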
rabbitmq_backup_bytes_read
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total bytes read during backup (uncompressed message payload sizes). |
| Labels | queue, vhost, queue_type |
Example:
rabbitmq_backup_bytes_read{queue="orders",vhost="/",queue_type="classic"} 2097152
rabbitmq_backup_segments_written
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total number of segments finalized and written to storage. |
| Labels | queue, vhost, queue_type |
Incremented each time a segment is finalized: when it reaches the size threshold segment_max_bytes, when the time threshold segment_max_interval_ms elapses, or when the backup completes.
Example:
rabbitmq_backup_segments_written{queue="orders",vhost="/",queue_type="classic"} 2
rabbitmq_backup_segments_written{queue="payments",vhost="/",queue_type="quorum"} 1
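The three finalization triggers can be sketched as a single predicate. This is an illustration of the rule as described, not the tool's actual implementation: the `OpenSegment` type, the `should_finalize` function, and the choice of `>=` for both thresholds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class OpenSegment:
    bytes_buffered: int  # bytes accumulated in the open segment
    opened_at_ms: int    # timestamp (ms) when the segment was opened


def should_finalize(seg: OpenSegment, now_ms: int, segment_max_bytes: int,
                    segment_max_interval_ms: int,
                    backup_complete: bool = False) -> bool:
    """Return True if any of the three documented triggers applies."""
    size_hit = seg.bytes_buffered >= segment_max_bytes
    time_hit = now_ms - seg.opened_at_ms >= segment_max_interval_ms
    return size_hit or time_hit or backup_complete


seg = OpenSegment(bytes_buffered=1_000, opened_at_ms=0)
# Below both thresholds: the segment stays open.
print(should_finalize(seg, now_ms=500, segment_max_bytes=4_096,
                      segment_max_interval_ms=1_000))   # False
# The interval has elapsed: the segment is finalized even though it is small.
print(should_finalize(seg, now_ms=1_500, segment_max_bytes=4_096,
                      segment_max_interval_ms=1_000))   # True
```

Each `True` result corresponds to one increment of rabbitmq_backup_segments_written for that queue.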
rabbitmq_backup_segments_bytes
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total bytes written to storage (compressed segment sizes). |
| Labels | queue, vhost, queue_type |
Example:
rabbitmq_backup_segments_bytes{queue="orders",vhost="/",queue_type="classic"} 524288
rabbitmq_backup_checkpoint_syncs
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total number of checkpoint sync operations (local SQLite to remote storage). |
| Labels | (none) |
Example:
rabbitmq_backup_checkpoint_syncs 15
rabbitmq_backup_errors
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total errors by type and queue. |
| Labels | queue, vhost, error_type |
The error_type label classifies errors into one of:
| error_type | Description |
|---|---|
| amqp | AMQP 0-9-1 protocol errors. |
| stream | Stream protocol errors. |
| storage | Storage backend read/write errors. |
| serialization | JSON/segment serialization errors. |
| connection | TCP connection failures. |
| authentication | AMQP or Management API auth failures. |
Example:
rabbitmq_backup_errors{queue="orders",vhost="/",error_type="amqp"} 1
rabbitmq_backup_errors{queue="payments",vhost="/",error_type="storage"} 0
Restore Metrics
rabbitmq_restore_messages_published
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total messages published to the target cluster during restore. |
| Labels | queue, vhost, queue_type |
Example:
rabbitmq_restore_messages_published{queue="orders",vhost="/",queue_type="classic"} 1542
rabbitmq_restore_messages_confirmed
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total messages confirmed by the target broker (publisher confirms). |
| Labels | queue, vhost, queue_type |
Only meaningful when restore.publisher_confirms is set to true; with confirms disabled, the broker sends no acknowledgements to count.
Example:
rabbitmq_restore_messages_confirmed{queue="orders",vhost="/",queue_type="classic"} 1542
rabbitmq_restore_messages_failed
| Property | Value |
|---|---|
| Type | Counter |
| Description | Total messages that failed to publish during restore. |
| Labels | queue, vhost, queue_type |
Example:
rabbitmq_restore_messages_failed{queue="orders",vhost="/",queue_type="classic"} 0
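One way to read the three restore counters together is to derive how many published messages are still awaiting confirmation. The sketch below assumes publisher confirms are enabled and takes per-queue counter values as plain integers; `restore_summary` is a hypothetical helper, not something rabbitmq-backup provides.

```python
def restore_summary(published: int, confirmed: int, failed: int) -> dict:
    """Summarize restore counters for a single queue."""
    # Messages sent to the broker that have not been acknowledged yet.
    unconfirmed = max(published - confirmed, 0)
    return {
        "unconfirmed": unconfirmed,
        "failed": failed,
        "clean": failed == 0 and unconfirmed == 0,
    }


print(restore_summary(published=1542, confirmed=1542, failed=0))
# {'unconfirmed': 0, 'failed': 0, 'clean': True}
```

A restore that has finished with `clean` set to `False` is worth investigating before the backup is considered verified.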
Connection Metrics
rabbitmq_backup_amqp_connections_active
| Property | Value |
|---|---|
| Type | Gauge |
| Description | Number of currently active AMQP connections. |
| Labels | (none) |
Incremented when a new AMQP connection is established, decremented when closed. During backup, each queue uses its own connection.
Example:
rabbitmq_backup_amqp_connections_active 4
rabbitmq_backup_stream_connections_active
| Property | Value |
|---|---|
| Type | Gauge |
| Description | Number of currently active stream protocol connections. |
| Labels | (none) |
Example:
rabbitmq_backup_stream_connections_active 1
Label Reference
QueueLabels
Used by per-queue backup and restore metrics.
| Label | Description | Example Values |
|---|---|---|
| queue | Queue or stream name | orders, payments, events-stream |
| vhost | RabbitMQ virtual host | /, production, staging |
| queue_type | Queue type | classic, quorum, stream |
ErrorLabels
Used by the error counter.
| Label | Description | Example Values |
|---|---|---|
| queue | Queue name where the error occurred | orders |
| vhost | Virtual host | / |
| error_type | Error classification | amqp, stream, storage, serialization, connection, authentication |
Prometheus Scrape Configuration
Add a scrape job to your prometheus.yml:
scrape_configs:
  - job_name: 'rabbitmq-backup'
    scrape_interval: 15s
    static_configs:
      - targets: ['rabbitmq-backup-host:8080']
        labels:
          environment: 'production'
          cluster: 'rabbitmq-prod'
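For Kubernetes clusters where Prometheus is configured for annotation-based pod discovery (a common relabeling convention, not a built-in Prometheus feature), add pod annotations:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
With Prometheus Operator, which by default does not act on these annotations, use a ServiceMonitor instead: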
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rabbitmq-backup
spec:
  selector:
    matchLabels:
      app: rabbitmq-backup
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
Grafana Dashboard Setup
Importing a Dashboard
Create a Grafana dashboard with the following panels. The examples below use PromQL queries.
Recommended Panels
Backup Progress: Messages Per Queue
Panel type: Time series
rate(rabbitmq_backup_messages_read[5m])
Legend: {{queue}} ({{vhost}})
Shows the backup throughput in messages per second, broken down by queue.
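For intuition about what the rate() query reports, the sketch below computes a per-second rate from raw counter samples, including the counter-reset handling that applies when the backup process restarts. It is a simplification: Prometheus additionally extrapolates to the window boundaries, so real values can differ slightly. The function name `per_second_rate` is illustrative.

```python
def per_second_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first."""
    if len(samples) < 2:
        return None  # not enough data points for a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset; count the new value from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# 600 messages read over a 5-minute window
print(per_second_rate([(0, 1542), (300, 2142)]))  # 2.0
```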
Backup Progress: Bytes Written
Panel type: Time series
rate(rabbitmq_backup_segments_bytes[5m])
Legend: {{queue}}
Total Messages Backed Up
Panel type: Stat
sum(rabbitmq_backup_messages_read)
Compression Ratio
Panel type: Stat
1 - (sum(rabbitmq_backup_segments_bytes) / sum(rabbitmq_backup_bytes_read))
Unit: Percent (0.0-1.0)
Shows the fraction of bytes saved by compression across all queues. A value of 0.75 means stored segments occupy a quarter of the raw payload size.
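As a worked example of the formula, the snippet below plugs in the sample values shown earlier for the orders queue (2097152 bytes read, 524288 bytes written). The function name `compression_savings` is illustrative.

```python
def compression_savings(bytes_read: int, segment_bytes: int):
    """Fraction of bytes saved: 1 - compressed / uncompressed."""
    if bytes_read == 0:
        return None  # nothing read yet; the PromQL expression yields NaN here
    return 1 - segment_bytes / bytes_read


print(compression_savings(2_097_152, 524_288))  # 0.75
```

With the Percent (0.0-1.0) unit, Grafana renders this value as 75%.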
Segments Written
Panel type: Bar gauge
rabbitmq_backup_segments_written
Legend: {{queue}}
Active Connections
Panel type: Gauge
rabbitmq_backup_amqp_connections_active + rabbitmq_backup_stream_connections_active
Error Rate
Panel type: Time series
rate(rabbitmq_backup_errors[5m])
Legend: {{queue}} - {{error_type}}
Use an alert threshold here: any non-zero error rate warrants investigation.
Restore Progress
Panel type: Time series
rate(rabbitmq_restore_messages_published[5m])
Legend: {{queue}}
Restore Failures
Panel type: Stat (red if > 0)
sum(rabbitmq_restore_messages_failed)
Checkpoint Sync Rate
Panel type: Time series
rate(rabbitmq_backup_checkpoint_syncs[5m])
Example Dashboard JSON
You can import this dashboard JSON into Grafana. Save as rabbitmq-backup-dashboard.json:
{
  "dashboard": {
    "title": "RabbitMQ Backup & Restore",
    "tags": ["rabbitmq", "backup"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Backup: Messages Read (rate/sec)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(rabbitmq_backup_messages_read[5m])",
            "legendFormat": "{{queue}} ({{vhost}})"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "title": "Backup: Bytes Written (rate/sec)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(rabbitmq_backup_segments_bytes[5m])",
            "legendFormat": "{{queue}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "title": "Active Connections",
        "type": "gauge",
        "targets": [
          {
            "expr": "rabbitmq_backup_amqp_connections_active",
            "legendFormat": "AMQP"
          },
          {
            "expr": "rabbitmq_backup_stream_connections_active",
            "legendFormat": "Stream"
          }
        ],
        "gridPos": {"h": 8, "w": 6, "x": 0, "y": 8}
      },
      {
        "title": "Errors",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(rabbitmq_backup_errors[5m])",
            "legendFormat": "{{queue}} - {{error_type}}"
          }
        ],
        "gridPos": {"h": 8, "w": 6, "x": 6, "y": 8}
      },
      {
        "title": "Restore: Messages Published (rate/sec)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(rabbitmq_restore_messages_published[5m])",
            "legendFormat": "{{queue}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
      }
    ]
  }
}
Alerting Rules
Example Prometheus alerting rules:
groups:
  - name: rabbitmq-backup
    rules:
      - alert: RabbitMQBackupErrors
        expr: rate(rabbitmq_backup_errors[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ backup errors detected"
          description: "Queue {{ $labels.queue }} in vhost {{ $labels.vhost }} has {{ $labels.error_type }} errors."
      - alert: RabbitMQBackupNoProgress
        # The connection gauge has no queue labels, so aggregate the per-queue
        # rate and match the two sides on job and instance only.
        expr: sum by (job, instance) (rate(rabbitmq_backup_messages_read[10m])) == 0 and on (job, instance) rabbitmq_backup_amqp_connections_active > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ backup not making progress"
          description: "No messages have been read in the last 10 minutes despite active connections."
      - alert: RabbitMQRestoreFailures
        expr: rabbitmq_restore_messages_failed > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "RabbitMQ restore has failed messages"
          description: "{{ $value }} messages failed to publish during restore."