Segment Format Specification

This page provides the complete binary-level specification for the RBAK segment format used by rabbitmq-backup. Each segment file contains a batch of backed-up messages in a compact, integrity-checked format.

Design Goals

  1. Self-describing: Each segment contains enough metadata in the header to identify its contents without external context.
  2. Integrity-checked: CRC32 footer enables corruption detection. SHA-256 in the manifest provides end-to-end verification.
  3. Streamable: The header can be read without decompressing the payload, enabling efficient segment filtering by timestamp range.
  4. Compact: Compressed payload with configurable algorithms (zstd, LZ4) reduces storage costs and transfer time.
  5. Simple: Fixed-size header and footer with a variable-length compressed payload. No complex indexing or B-tree structures.

Overall Structure

Offset 0                                              Offset N
┌───────────────────┬────────────────────────┬───────────────┐
│   Header (32 B)   │   Compressed Payload   │ Footer (8 B)  │
│                   │       (variable)       │               │
└───────────────────┴────────────────────────┴───────────────┘

Total size = 32 + len(compressed_payload) + 8 bytes


Header (32 bytes)

Byte offset:   0    1    2    3    4    5    6    7
             ┌────┬────┬────┬────┬────┬────┬─────────┐
           0 │'R' │'B' │'A' │'K' │Ver │Comp│ Reserved│
             ├────┴────┴────┴────┴────┴────┴─────────┤
           8 │         Record Count (u64 LE)         │
             ├───────────────────────────────────────┤
          16 │       First Timestamp (i64 LE)        │
             ├───────────────────────────────────────┤
          24 │        Last Timestamp (i64 LE)        │
             └───────────────────────────────────────┘

Field Definitions

Offset  Size (bytes)  Type      Name             Description
------  ------------  --------  ---------------  -----------
0       4             [u8; 4]   Magic            File magic: ASCII RBAK = [0x52, 0x42, 0x41, 0x4B]. Identifies this as an RBAK segment file.
4       1             u8        Version          Format version number. Current version: 1. Future versions may change the header layout or payload encoding.
5       1             u8        Compression      Compression algorithm used for the payload. 0 = None, 1 = zstd, 2 = LZ4.
6       2             [u8; 2]   Reserved         Reserved for future use. Must be [0x00, 0x00]. Readers should ignore these bytes.
8       8             u64 (LE)  Record Count     Number of BackupRecord entries in the payload. Little-endian byte order.
16      8             i64 (LE)  First Timestamp  backed_up_at timestamp (epoch milliseconds) of the first record in the segment. 0 if no records.
24      8             i64 (LE)  Last Timestamp   backed_up_at timestamp (epoch milliseconds) of the last record in the segment. 0 if no records.
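
As a rough illustration of the field layout above, a header reader might look like the following sketch. `SegmentHeader` and `parse_header` are hypothetical names chosen for this example, not taken from rabbitmq-backup; only the offsets, sizes, and byte order come from this specification.

```rust
use std::convert::TryInto;

// Hypothetical names for illustration; offsets and sizes follow the table above.
const HEADER_SIZE: usize = 32;

#[derive(Debug, PartialEq)]
struct SegmentHeader {
    version: u8,
    compression: u8,
    record_count: u64,
    first_timestamp: i64,
    last_timestamp: i64,
}

fn parse_header(data: &[u8]) -> Result<SegmentHeader, String> {
    if data.len() < HEADER_SIZE {
        return Err(format!("header needs {} bytes, got {}", HEADER_SIZE, data.len()));
    }
    if &data[0..4] != b"RBAK" {
        return Err("bad start magic".to_string());
    }
    // Bytes 6..8 are reserved and deliberately ignored by this reader.
    let le8 = |at: usize| -> [u8; 8] { data[at..at + 8].try_into().unwrap() };
    Ok(SegmentHeader {
        version: data[4],
        compression: data[5],
        record_count: u64::from_le_bytes(le8(8)),
        first_timestamp: i64::from_le_bytes(le8(16)),
        last_timestamp: i64::from_le_bytes(le8(24)),
    })
}
```

Because the header is uncompressed and fixed-size, a reader like this only needs the first 32 bytes of a segment to decide whether its timestamp range is relevant.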

Compression Type Values

Value  Algorithm  File Extension  Notes
-----  ---------  --------------  -----
0      None       (none)          Payload is stored uncompressed.
1      zstd       .zst            Default. Zstandard compression. Level configurable (1-22, default 3).
2      LZ4        .lz4            LZ4 frame format via lz4_flex. Maximum decompression speed.
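
One way a reader could map the compression byte onto a typed value is sketched below. The `Compression` enum and its methods are assumptions made for this example; the byte values and file extensions are the ones listed above.

```rust
// Hypothetical enum for illustration; values mirror the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Uncompressed, // 0 = None
    Zstd,         // 1
    Lz4,          // 2
}

impl Compression {
    /// Decode header byte 5; unknown values are rejected rather than guessed.
    fn from_byte(b: u8) -> Result<Self, String> {
        match b {
            0 => Ok(Compression::Uncompressed),
            1 => Ok(Compression::Zstd),
            2 => Ok(Compression::Lz4),
            other => Err(format!("unknown compression type {}", other)),
        }
    }

    /// File extension associated with the algorithm, if any.
    fn extension(self) -> Option<&'static str> {
        match self {
            Compression::Uncompressed => None,
            Compression::Zstd => Some(".zst"),
            Compression::Lz4 => Some(".lz4"),
        }
    }
}
```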

Compressed Payload (Variable Length)

The payload begins at byte offset 32 (immediately after the header) and ends just before byte offset N - 8, where the 8-byte footer starts. In slice notation, the payload is data[32..N-8].

Payload length = total_file_size - HEADER_SIZE - FOOTER_SIZE = total_file_size - 40

The payload, when decompressed, contains a sequence of length-prefixed JSON records:

Decompressed payload layout:

┌────────────────────────┬────────────────────┐
│ len₁ (4 bytes, u32 LE) │ JSON₁ (len₁ bytes) │
├────────────────────────┼────────────────────┤
│ len₂ (4 bytes, u32 LE) │ JSON₂ (len₂ bytes) │
├────────────────────────┴────────────────────┤
│ ...                                         │
├────────────────────────┬────────────────────┤
│ lenₙ (4 bytes, u32 LE) │ JSONₙ (lenₙ bytes) │
└────────────────────────┴────────────────────┘

Record Framing

Each record is preceded by a 4-byte little-endian u32 indicating the length of the following JSON payload in bytes. This framing allows the reader to parse records without scanning for delimiters.

┌──────────┬─────────────────────────────┐
│ 4 bytes  │ len bytes                   │
│ u32 LE   │ JSON-encoded BackupRecord   │
│ (length) │                             │
└──────────┴─────────────────────────────┘
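
The framing can be exercised without any JSON library, since the length prefix alone delimits each record. The sketch below uses two hypothetical helpers, `frame_record` and `split_records`, that write and read the `[u32 LE length][JSON bytes]` layout on an already-decompressed payload; JSON decoding of each slice is left to the caller.

```rust
use std::convert::TryInto;

/// Append one record to `out` as [u32 LE length][JSON bytes].
fn frame_record(json: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(json.len() as u32).to_le_bytes());
    out.extend_from_slice(json);
}

/// Split a decompressed payload into the raw JSON slices it contains.
fn split_records(payload: &[u8]) -> Result<Vec<&[u8]>, String> {
    let mut records = Vec::new();
    let mut pos = 0usize;
    while pos < payload.len() {
        // A record needs at least its 4-byte length prefix.
        if payload.len() - pos < 4 {
            return Err(format!("truncated length prefix at offset {}", pos));
        }
        let len = u32::from_le_bytes(payload[pos..pos + 4].try_into().unwrap()) as usize;
        pos += 4;
        // The declared length must fit in what remains of the payload.
        if payload.len() - pos < len {
            return Err(format!("record at offset {} overruns the payload", pos));
        }
        records.push(&payload[pos..pos + len]);
        pos += len;
    }
    Ok(records)
}
```

Checking both the prefix and the declared length before slicing means a damaged payload produces an error instead of a panic.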

Record Schema

Each JSON record is a serialized BackupRecord:

{
  "body": [98, 121, 116, 101, 115],
  "properties": {
    "content_type": "application/json",
    "content_encoding": null,
    "delivery_mode": 2,
    "priority": null,
    "correlation_id": null,
    "reply_to": null,
    "expiration": null,
    "message_id": "msg-001",
    "timestamp": null,
    "type_field": null,
    "user_id": null,
    "app_id": "my-service",
    "cluster_id": null
  },
  "headers": [],
  "exchange": "orders.exchange",
  "routing_key": "order.created",
  "delivery_tag": 42,
  "redelivered": false,
  "backed_up_at": 1712756400123,
  "source_queue": "orders",
  "source_vhost": "/"
}

The body field is a JSON array of unsigned bytes ([u8]), or null for messages with no body. The backed_up_at field is an epoch-millisecond timestamp set at the moment the message was captured by the backup tool -- this is the timestamp used for PITR filtering.


Footer (8 bytes)

Byte offset:  N-8   N-7   N-6   N-5   N-4   N-3   N-2   N-1
            ┌───────────────────────┬─────┬─────┬─────┬─────┐
            │    CRC32 (u32 LE)     │ 'K' │ 'A' │ 'B' │ 'R' │
            └───────────────────────┴─────┴─────┴─────┴─────┘

Field Definitions

Offset from EOF  Size (bytes)  Type      Name       Description
---------------  ------------  --------  ---------  -----------
-8               4             u32 (LE)  CRC32      CRC32 checksum (IEEE polynomial, as computed by crc32fast; not CRC-32C/Castagnoli) of all preceding bytes: data[0..N-8] (header + compressed payload).
-4               4             [u8; 4]   End Magic  ASCII KABR = [0x4B, 0x41, 0x42, 0x52]. Reverse of the start magic. Signals the end of the segment.

Integrity Verification

Segment integrity is verified at two levels:

Level 1: CRC32 (Segment-Level)

The CRC32 checksum in the footer covers all bytes from offset 0 to N - 8:

CRC32 input:  [Header (32 bytes)] [Compressed payload (variable)]
CRC32 output: stored as u32 LE at offset N-8

Verification algorithm:

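const FOOTER_SIZE: usize = 8; // CRC32 (4 bytes) + end magic (4 bytes)

let footer_start = data.len() - FOOTER_SIZE; // N - 8
// from_le_bytes takes a [u8; 4], so convert the 4-byte slice first.
let stored: [u8; 4] = data[footer_start..footer_start + 4].try_into().unwrap();
let expected_crc = u32::from_le_bytes(stored);
let actual_crc = crc32fast::hash(&data[..footer_start]);
assert_eq!(expected_crc, actual_crc);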

This catches:

  • Bit rot in storage
  • Truncated uploads
  • Network corruption during transfer
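
To see the check reject a damaged segment without pulling in a crate, the sketch below swaps crc32fast for a small bitwise implementation of the same IEEE CRC-32 (reflected polynomial 0xEDB88320). The function names are illustrative, and the bitwise loop is far slower than crc32fast; it stands in only to keep the example dependency-free.

```rust
use std::convert::TryInto;

/// Bitwise CRC-32 (IEEE). A dependency-free stand-in for crc32fast::hash.
fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Compare the CRC stored at N-8 with one recomputed over data[0..N-8].
fn crc_matches(segment: &[u8]) -> bool {
    if segment.len() < 8 {
        return false; // too short to even hold a footer
    }
    let footer_start = segment.len() - 8;
    let stored = u32::from_le_bytes(
        segment[footer_start..footer_start + 4].try_into().unwrap(),
    );
    crc32_ieee(&segment[..footer_start]) == stored
}
```

Flipping any single bit in the covered range changes the recomputed CRC, so `crc_matches` returns false for the altered bytes.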

Level 2: SHA-256 (Manifest-Level)

The SHA-256 digest of the entire segment file (header + compressed payload + footer) is stored in the manifest's SegmentMetadata.checksum field:

SHA-256 input:  [Header (32)] [Compressed payload] [Footer (8)]
SHA-256 output: stored as hex string in manifest.json

This provides end-to-end verification between the manifest and the stored segment. The validate --deep command checks both CRC32 and SHA-256.


Reading a Segment: Step by Step

  1. Read raw bytes from storage.
  2. Verify minimum size: len >= HEADER_SIZE + FOOTER_SIZE (40 bytes).
  3. Check start magic: data[0..4] == b"RBAK".
  4. Check end magic: data[N-4..N] == b"KABR".
  5. Verify CRC32: compute crc32fast::hash(data[0..N-8]) and compare with u32::from_le_bytes(data[N-8..N-4]).
  6. Parse header: extract version, compression, record_count, first/last timestamp.
  7. Extract compressed payload: data[32..N-8].
  8. Decompress using the algorithm indicated in the header.
  9. Parse records: iterate over length-prefixed JSON entries.
  10. Verify record count: ensure the number of parsed records matches the header's record count.
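
Steps 2 through 7 can be sketched as a single validation function. Everything named here (`open_segment`, `crc32_ieee`) is hypothetical, and the bitwise CRC is a dependency-free stand-in for crc32fast. Decompression and record parsing (steps 8 to 10) are left to the caller because they depend on the zstd or LZ4 library in use.

```rust
use std::convert::TryInto;

const HEADER_SIZE: usize = 32;
const FOOTER_SIZE: usize = 8;

/// Bitwise CRC-32 (IEEE); stands in for crc32fast::hash in this sketch.
fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Returns (compression byte, record count, compressed payload) on success.
fn open_segment(data: &[u8]) -> Result<(u8, u64, &[u8]), String> {
    let n = data.len();
    // Step 2: minimum size.
    if n < HEADER_SIZE + FOOTER_SIZE {
        return Err(format!("segment is {} bytes, need at least 40", n));
    }
    // Steps 3 and 4: start and end magic.
    if &data[0..4] != b"RBAK" {
        return Err("bad start magic".to_string());
    }
    if &data[n - 4..] != b"KABR" {
        return Err("bad end magic".to_string());
    }
    // Step 5: CRC32 over header + compressed payload.
    let stored = u32::from_le_bytes(data[n - 8..n - 4].try_into().unwrap());
    if crc32_ieee(&data[..n - FOOTER_SIZE]) != stored {
        return Err("CRC32 mismatch".to_string());
    }
    // Step 6: header fields (only version 1 is defined).
    if data[4] != 1 {
        return Err(format!("unsupported format version {}", data[4]));
    }
    let record_count = u64::from_le_bytes(data[8..16].try_into().unwrap());
    // Step 7: the compressed payload sits between header and footer.
    Ok((data[5], record_count, &data[HEADER_SIZE..n - FOOTER_SIZE]))
}
```

Running the cheap checks (size, magic) before the CRC lets obviously wrong files fail fast, and checking the CRC before trusting any header field avoids acting on corrupted metadata.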

Writing a Segment: Step by Step

  1. Accumulate records in a buffer as length-prefixed JSON: [u32 LE length][JSON bytes].
  2. Check rotation: if uncompressed_bytes >= segment_max_bytes or elapsed >= segment_max_interval_ms, proceed to finalize.
  3. Compress the buffer using the configured algorithm and level.
  4. Build the header (32 bytes):
    • Write magic RBAK
    • Write version 1
    • Write compression type byte
    • Write 2 reserved zero bytes
    • Write record count as u64 LE
    • Write first timestamp as i64 LE
    • Write last timestamp as i64 LE
  5. Concatenate: [header][compressed payload]
  6. Compute CRC32 over the concatenated bytes.
  7. Append footer: [CRC32 as u32 LE][KABR]
  8. Compute SHA-256 over the complete segment (for the manifest).
  9. Upload the complete segment to storage.
  10. Record metadata in the manifest: key, sequence, record_count, size_bytes, uncompressed_bytes, timestamps, checksum.
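
Steps 2 and 4 through 7 can be sketched as follows, taking an already-compressed payload as input. `should_rotate`, `build_segment`, and `crc32_ieee` are hypothetical names, and the bitwise CRC is a dependency-free stand-in for crc32fast. Compression, SHA-256, upload, and the manifest update are outside the sketch.

```rust
/// Bitwise CRC-32 (IEEE); stands in for crc32fast::hash in this sketch.
fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Step 2: finalize when either the size or the time threshold is reached.
fn should_rotate(
    uncompressed_bytes: u64,
    segment_max_bytes: u64,
    elapsed_ms: u64,
    segment_max_interval_ms: u64,
) -> bool {
    uncompressed_bytes >= segment_max_bytes || elapsed_ms >= segment_max_interval_ms
}

/// Steps 4-7: header, payload, then CRC32 + end magic.
fn build_segment(
    compression: u8,
    record_count: u64,
    first_ts: i64,
    last_ts: i64,
    compressed_payload: &[u8],
) -> Vec<u8> {
    let mut seg = Vec::with_capacity(32 + compressed_payload.len() + 8);
    seg.extend_from_slice(b"RBAK"); // magic
    seg.push(1); // version
    seg.push(compression); // compression type byte
    seg.extend_from_slice(&[0, 0]); // reserved
    seg.extend_from_slice(&record_count.to_le_bytes());
    seg.extend_from_slice(&first_ts.to_le_bytes());
    seg.extend_from_slice(&last_ts.to_le_bytes());
    seg.extend_from_slice(compressed_payload);
    // The CRC covers everything written so far: header + compressed payload.
    let crc = crc32_ieee(&seg);
    seg.extend_from_slice(&crc.to_le_bytes());
    seg.extend_from_slice(b"KABR"); // end magic
    seg
}
```

With compression type 0, the "compressed payload" argument is simply the buffer of length-prefixed JSON records from step 1.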

Byte-Level Example

A minimal segment containing one record with zstd compression:

Offset  Hex                      ASCII  Field
------  -----------------------  -----  -----
0x00    52 42 41 4B              RBAK   Magic
0x04    01                              Version (1)
0x05    01                              Compression (zstd)
0x06    00 00                           Reserved
0x08    01 00 00 00 00 00 00 00         Record Count (1)
0x10    CB 04 A7 D2 8E 01 00 00         First Timestamp
0x18    CB 04 A7 D2 8E 01 00 00         Last Timestamp
0x20    28 B5 2F FD ...                 Compressed payload
...     (variable length)        ...    (zstd frame)
0xNN    XX XX XX XX                     CRC32
0xNN+4  4B 41 42 52              KABR   End Magic
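
The little-endian fields in this dump can be decoded by hand or with a few lines of Rust. In the sketch below (helper names are illustrative), the timestamp bytes CB 04 A7 D2 8E 01 00 00 decode to 1712931144907 epoch milliseconds, and the payload's leading bytes 28 B5 2F FD are recognized as the standard zstd frame magic.

```rust
/// Magic number that opens every zstd frame, in on-disk byte order.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];

/// Decode an 8-byte little-endian timestamp field (epoch milliseconds).
fn decode_timestamp(bytes: [u8; 8]) -> i64 {
    i64::from_le_bytes(bytes)
}

/// Decode the 8-byte little-endian record count field.
fn decode_record_count(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

/// True when the payload starts with a zstd frame header.
fn looks_like_zstd(payload: &[u8]) -> bool {
    payload.starts_with(&ZSTD_MAGIC)
}
```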

Version History

Version  Changes
-------  -------
1        Initial format. 32-byte header, length-prefixed JSON records, CRC32 + end magic footer.

Future versions may introduce:

  • Additional header fields in the reserved bytes
  • Alternative payload encodings (e.g., Protocol Buffers, MessagePack)
  • Multi-part segments for parallel decompression
  • Index blocks for random-access record lookup

The version field in the header lets readers detect formats they do not understand -- a reader that encounters an unknown version can report a clear error rather than silently misinterpret the data.