# Point-in-Time Recovery (PITR)
rabbitmq-backup supports Point-in-Time Recovery (PITR), allowing you to restore only the messages that were backed up within a specific time window. This is useful for disaster recovery, data correction, and selective replay scenarios.
## How PITR Works
PITR in rabbitmq-backup is built on three foundations:
- `backed_up_at` timestamp: Every message is stamped with the exact time it was captured during backup.
- Segment-level timestamp range: Each segment header records the first and last `backed_up_at` timestamps, enabling coarse-grained filtering without decompressing.
- Record-level filtering: During restore, each record's `backed_up_at` is compared against the configured time window.
## The `backed_up_at` Timestamp
When a message is read from RabbitMQ during backup, the tool records the current UTC time in epoch milliseconds:
```rust
BackupRecord {
    // ... message fields ...
    backed_up_at: chrono::Utc::now().timestamp_millis(),
    // ...
}
```
This timestamp represents when the message was captured by the backup tool, not when the message was originally published. This distinction is important:
| Timestamp | Source | Meaning |
|---|---|---|
| `backed_up_at` | Set by backup tool | When this message was read from the queue and recorded in a segment. |
| `properties.timestamp` | Set by publisher (optional) | AMQP `timestamp` property set by the original message publisher. May be null. |
### Why `backed_up_at` Instead of `properties.timestamp`?
- `properties.timestamp` is optional -- many publishers do not set it. Using it for PITR would exclude messages without a timestamp.
- `properties.timestamp` is application-controlled -- it could be set to any value (past, future, or epoch 0). It is not a reliable ordering indicator.
- `backed_up_at` is guaranteed -- every record has this timestamp, set at capture time by the backup tool.
- `backed_up_at` reflects backup ordering -- messages within a segment are ordered by capture time, making time-window filtering predictable.
## Segment-Level Timestamps
Each RBAK segment header (32 bytes) includes the first and last `backed_up_at` timestamps:

```
Header bytes 16-23: First Timestamp (i64 LE, epoch ms)
Header bytes 24-31: Last Timestamp  (i64 LE, epoch ms)
```
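As a sketch (hypothetical helper, assuming only the 32-byte layout above), the two header timestamps can be decoded like this:

```rust
// Decode the first/last backed_up_at timestamps from a 32-byte RBAK
// segment header: i64 little-endian at bytes 16-23 and 24-31.
fn header_timestamps(header: &[u8; 32]) -> (i64, i64) {
    let first = i64::from_le_bytes(header[16..24].try_into().unwrap());
    let last = i64::from_le_bytes(header[24..32].try_into().unwrap());
    (first, last)
}

fn main() {
    // Build a dummy header carrying the example manifest values.
    let mut header = [0u8; 32];
    header[16..24].copy_from_slice(&1712700000000i64.to_le_bytes());
    header[24..32].copy_from_slice(&1712720000000i64.to_le_bytes());
    println!("{:?}", header_timestamps(&header));
}
```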
These timestamps are also stored in the manifest's `SegmentMetadata`:

```json
{
  "key": "backup-001/queues/_default/orders/segment-0001.zst",
  "first_timestamp": 1712700000000,
  "last_timestamp": 1712720000000,
  "record_count": 1000
}
```
This enables two levels of optimization during restore:
- Segment-level skip: If a segment's `[first_timestamp, last_timestamp]` range does not overlap with the restore time window, the entire segment can be skipped without downloading or decompressing it.
- Record-level filter: Within a segment that overlaps the time window, individual records are filtered by their `backed_up_at` timestamp.
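The segment-level skip reduces to an interval-overlap test. A minimal sketch (not the actual engine code; `None` stands for an unbounded window edge):

```rust
// A segment overlaps the restore window unless it ends before the window
// starts or begins after the window ends.
fn segment_overlaps(first: i64, last: i64, start: Option<i64>, end: Option<i64>) -> bool {
    let ends_after_start = start.map_or(true, |s| last >= s);
    let begins_before_end = end.map_or(true, |e| first <= e);
    ends_after_start && begins_before_end
}

fn main() {
    // The example manifest segment against a much later window: skip it.
    let overlaps = segment_overlaps(
        1712700000000, 1712720000000,
        Some(1744279200000), Some(1744293600000),
    );
    println!("download segment: {}", overlaps);
}
```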
## Time Window Configuration
PITR is configured in the `restore` section of the YAML configuration:

```yaml
restore:
  time_window_start: 1744279200000 # 2025-04-10T10:00:00Z in epoch ms
  time_window_end: 1744293600000   # 2025-04-10T14:00:00Z in epoch ms
```
### Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `time_window_start` | i64 (epoch ms) | null | Include only records with `backed_up_at >= time_window_start`. If null, no start filter is applied. |
| `time_window_end` | i64 (epoch ms) | null | Include only records with `backed_up_at <= time_window_end`. If null, no end filter is applied. |
### Filter Combinations
| `time_window_start` | `time_window_end` | Behavior |
|---|---|---|
| null | null | No PITR filtering -- restore all messages. |
| Set | null | Restore messages from the start time onward. |
| null | Set | Restore messages up to the end time. |
| Set | Set | Restore messages within the closed interval [start, end]. |
## Filtering Algorithm
The filtering logic is implemented in `restore/engine.rs`:

```rust
fn should_include(record: &BackupRecord, opts: &RestoreOptions) -> bool {
    let after_start = opts
        .time_window_start
        .is_none_or(|s| record.backed_up_at >= s);
    let before_end = opts
        .time_window_end
        .is_none_or(|e| record.backed_up_at <= e);
    after_start && before_end
}
```
The filter is applied after decompressing the segment and before publishing messages to the target broker:
```
Segment downloaded
        ↓
CRC32 verified
        ↓
Payload decompressed
        ↓
Records parsed from length-prefixed JSON
        ↓
┌─────────────────────────────────────┐
│ PITR Filter: should_include()?      │
│   backed_up_at >= start?  AND       │
│   backed_up_at <= end?              │
└─────────────────────────────────────┘
  ↓ included              ↓ excluded
Published to target    Counted as "skipped"
```
## Restore Statistics
The restore engine tracks PITR filtering in its statistics:
```
INFO Queue orders restored: 542 published, 1000 skipped (PITR), 0 failed
INFO Restore complete: 542 restored, 1000 skipped, 0 failed (1 queues)
```
- restored: Messages that passed the PITR filter and were successfully published.
- skipped: Messages that were excluded by the PITR filter.
- failed: Messages that passed the filter but failed to publish.
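As an illustration of how these counters relate to the filter (a sketch; the variable and field names here are assumptions, not the tool's actual structs):

```rust
// Derive published/skipped counts from a list of backed_up_at timestamps
// and an optional [start, end] PITR window.
fn count_published_skipped(
    backed_up_at: &[i64],
    start: Option<i64>,
    end: Option<i64>,
) -> (u64, u64) {
    let (mut published, mut skipped) = (0u64, 0u64);
    for &ts in backed_up_at {
        let include = start.map_or(true, |s| ts >= s)
            && end.map_or(true, |e| ts <= e);
        if include { published += 1 } else { skipped += 1 }
    }
    (published, skipped)
}

fn main() {
    // Two timestamps inside the window, one before, one after.
    let ts = [1744270000000i64, 1744280000000, 1744290000000, 1744300000000];
    let (p, s) = count_published_skipped(&ts, Some(1744279200000), Some(1744293600000));
    println!("{} published, {} skipped (PITR)", p, s);
}
```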
## Practical Usage Patterns
### Scenario 1: Restore After a Bad Deployment
A faulty deployment at 14:30 UTC corrupted messages in the orders queue. You want to restore messages from before the deployment:
```yaml
restore:
  time_window_end: 1744295400000 # 2025-04-10T14:30:00Z
```
This restores all messages backed up before the deployment, discarding any captured after the corruption started.
### Scenario 2: Replay a Specific Time Window
You need to reprocess messages from a 2-hour window for debugging:
```yaml
restore:
  time_window_start: 1744279200000 # 2025-04-10T10:00:00Z
  time_window_end: 1744286400000   # 2025-04-10T12:00:00Z
  queue_mapping:
    orders: orders-replay          # Restore to a separate queue
  publish_mode: direct-to-queue
```
### Scenario 3: Restore Everything Since Last Known Good
Your last known-good state was at 06:00 UTC. Restore everything from that point:
```yaml
restore:
  time_window_start: 1744264800000 # 2025-04-10T06:00:00Z
```
### Scenario 4: Dry Run to Count Messages in a Window
Before performing a real restore, check how many messages fall within your time window:
```yaml
restore:
  time_window_start: 1744279200000
  time_window_end: 1744293600000
  dry_run: true
```
Output:
```
Dry run summary:
  Backup ID: prod-daily-2025-04-10
  Queues: 3
  Messages: 2621
  Segments: 4
  Size: 905216 bytes
  PITR window: 1744279200000 - 1744293600000 (epoch ms)
```
## Converting Human-Readable Dates to Epoch Milliseconds
The PITR configuration uses epoch milliseconds. Here are common ways to convert:
### Using `date` (Linux/macOS)
```bash
# GNU date (Linux): convert to epoch milliseconds
date -d "2025-04-10T14:30:00Z" +%s000
# Output: 1744295400000

# macOS (BSD date): set TZ=UTC so the string is parsed as UTC
TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "2025-04-10T14:30:00Z" +%s000
```
### Using Python
```python
from datetime import datetime, timezone

dt = datetime(2025, 4, 10, 14, 30, 0, tzinfo=timezone.utc)
print(int(dt.timestamp() * 1000))
# Output: 1744295400000
```
### Using JavaScript
```javascript
new Date("2025-04-10T14:30:00Z").getTime()
// Output: 1744295400000
```
## Limitations and Considerations
### Timestamp Granularity
`backed_up_at` has millisecond granularity. Messages captured within the same millisecond have the same timestamp. This is normally not an issue because:
- Backup throughput is typically hundreds to thousands of messages per second, not millions.
- The time window is usually specified in minutes or hours, not milliseconds.
### Backup Duration and Timestamp Spread
The `backed_up_at` timestamp reflects when the message was captured, not when it was published. For a backup that takes 10 minutes:
- Messages at the start of the backup have earlier `backed_up_at` values.
- Messages at the end have later `backed_up_at` values.
- The spread is the duration of the backup operation.
This means PITR granularity is limited by the backup frequency. For sub-minute granularity, consider running backups more frequently or using stream queues with offset-based checkpointing.
### Cross-Queue Consistency
PITR filtering is applied per-queue. If multiple queues have related messages (e.g., orders and payments), filtering by the same time window will include the messages that were captured during that window in each queue. However, because queues are backed up in parallel, there may be slight timing differences between queues.
For strict cross-queue consistency, consider:
- Using a single backup operation for all related queues (they will have similar `backed_up_at` spreads).
- Adding a small buffer to the time window edges (e.g., extend by 1 minute on each side).
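The buffering idea can be sketched as a small helper (hypothetical; not part of the tool) that widens an optional window by a fixed margin on each side:

```rust
// Widen a PITR window by `buffer_ms` on each side to absorb small
// cross-queue capture-time skew. `None` edges stay unbounded.
fn widen(start: Option<i64>, end: Option<i64>, buffer_ms: i64) -> (Option<i64>, Option<i64>) {
    (start.map(|s| s - buffer_ms), end.map(|e| e + buffer_ms))
}

fn main() {
    // Extend a 10:00-14:00 UTC window by one minute on each side.
    let (s, e) = widen(Some(1744279200000), Some(1744293600000), 60_000);
    println!("{:?} {:?}", s, e);
}
```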
### No Random-Access Seek
The current implementation reads and filters all records within a segment sequentially. There is no index for seeking directly to a specific timestamp within a segment. For very large segments, this means the entire segment is decompressed even if only a few records match the time window.
Mitigation: Use smaller `segment_max_bytes` or `segment_max_interval_ms` values to produce more, smaller segments. This improves segment-level skip efficiency at the cost of more storage objects.
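For example, a backup configuration biased toward smaller segments might look like this (a sketch: the `backup:` section name and the chosen values are assumptions; only the two field names come from this document):

```yaml
backup:
  segment_max_bytes: 4194304      # cut a new segment at 4 MiB
  segment_max_interval_ms: 60000  # or at least once per minute
```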