# File formats {#page_formats}

There are two file formats used by `librsync` and `rdiff`: the
*signature* file, which summarizes a data file, and the *delta* file,
which describes the edits from one data file to another.

librsync does not know or care about any formats in the data files.

All integers are big-endian.

All librsync files start with a `uint32` magic number identifying them.
These are declared in `librsync.h`:

    /** A delta file. At present, there's only one delta format. **/
    RS_DELTA_MAGIC = 0x72730236, /* r s \2 6 */

    /**
     * A signature file with MD4 signatures. Backward compatible with
     * librsync < 1.0, but strongly deprecated because it creates a security
     * vulnerability on files containing partly untrusted data. See
     * <https://github.com/librsync/librsync/issues/5>.
     **/
    RS_MD4_SIG_MAGIC = 0x72730136, /* r s \1 6 */

    /**
     * A signature file using the BLAKE2 hash. Supported from librsync 1.0.
     **/
    RS_BLAKE2_SIG_MAGIC = 0x72730137 /* r s \1 7 */
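
Because every file starts with one of these magic numbers, the first four bytes are enough to tell the file types apart. The following is a minimal sketch of that check, not librsync code: the `SKETCH_*` constants and the `sketch_*` functions are hypothetical names introduced for illustration, and only the numeric values come from the listing above.

```c
#include <stddef.h>
#include <stdint.h>

/* Magic values copied from the librsync.h listing; the SKETCH_ names are
 * hypothetical and exist only in this example. */
enum {
    SKETCH_DELTA_MAGIC      = 0x72730236,
    SKETCH_MD4_SIG_MAGIC    = 0x72730136,
    SKETCH_BLAKE2_SIG_MAGIC = 0x72730137
};

/* Decode a big-endian u32 from four bytes. */
static uint32_t sketch_read_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return a short description of the file type, or NULL if the buffer is
 * shorter than four bytes or the magic number is not recognized. */
static const char *sketch_describe_magic(const unsigned char *buf, size_t len)
{
    if (len < 4)
        return NULL;
    switch (sketch_read_u32(buf)) {
    case SKETCH_DELTA_MAGIC:      return "delta";
    case SKETCH_MD4_SIG_MAGIC:    return "MD4 signature";
    case SKETCH_BLAKE2_SIG_MAGIC: return "BLAKE2 signature";
    default:                      return NULL;
    }
}
```

Reading the bytes one at a time, rather than casting the buffer to a `uint32_t`, keeps the result independent of the host byte order.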

Signatures consist of a header followed by a number of block
signatures.

Each block signature gives signature hashes for one block of
`block_len` bytes from the input data file. The final data block
may be shorter. The number of blocks in the signature is therefore

    ceil(input_len/block_len)
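
As a worked example, a 10000-byte input with a `block_len` of 2048 yields 5 block signatures: four full blocks and a final block of 10000 - 4 * 2048 = 1808 bytes. The sketch below computes the same ceiling with integer arithmetic only; `sketch_block_count` is a hypothetical helper, not a librsync function.

```c
#include <stdint.h>

/* Number of block signatures for an input of input_len bytes, that is
 * ceil(input_len / block_len), computed without floating point.
 * block_len must be nonzero. */
static uint64_t sketch_block_count(uint64_t input_len, uint32_t block_len)
{
    return input_len / block_len + (input_len % block_len != 0);
}
```

Using division plus a remainder test avoids the overflow that the common `(a + b - 1) / b` idiom can hit for inputs near the top of the integer range.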

The signature header is (see `rs_sig_s_header`):

    u32 magic;          // either RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC
    u32 block_len;      // bytes per block
    u32 strong_sum_len; // bytes per strong sum in each block
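
The header is therefore 12 bytes long. A minimal sketch of decoding it from a memory buffer follows; the struct and function names are hypothetical, and the code only mirrors the three-field layout shown above rather than reproducing `rs_sig_s_header`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the 12-byte signature header. */
struct sketch_sig_header {
    uint32_t magic;
    uint32_t block_len;
    uint32_t strong_sum_len;
};

/* Decode a big-endian u32 from four bytes. */
static uint32_t sketch_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the header at the start of buf. Returns 0 on success, or -1 if
 * fewer than 12 bytes are available or the magic number is not one of the
 * two signature magics. */
static int sketch_parse_sig_header(const unsigned char *buf, size_t len,
                                   struct sketch_sig_header *out)
{
    if (len < 12)
        return -1;
    out->magic = sketch_be32(buf);
    out->block_len = sketch_be32(buf + 4);
    out->strong_sum_len = sketch_be32(buf + 8);
    if (out->magic != 0x72730136u && out->magic != 0x72730137u)
        return -1;
    return 0;
}
```

A real reader would probably also reject a `block_len` of zero and a `strong_sum_len` longer than the chosen hash can produce; those checks are left out here to keep the sketch short.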

Each block signature contains a rolling (weak) checksum used to find
moved data, and a strong hash used to check that the match is correct.
The weak checksum is computed as in `rollsum.c`. The strong hash is
either MD4 or BLAKE2, depending on the magic number.

To make signatures smaller, at the cost of a greater chance of collisions,
the `strong_sum_len` in the header can be set below the full hash length.
The strong sum is then truncated after computation, keeping only its
first `strong_sum_len` bytes.

Each signature block format is (see `rs_sig_do_block`):

    u32 weak_sum;
    u8[strong_sum_len] strong_sum;
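
Since every block signature has the same size, a reader can index straight into the records that follow the header. The sketch below assumes each record is a 4-byte big-endian weak checksum followed by `strong_sum_len` bytes of strong sum; the function name and parameters are hypothetical and not part of the librsync API.

```c
#include <stddef.h>
#include <stdint.h>

/* Read block signature number `index` (counting from 0) from `records`,
 * the bytes that follow the 12-byte signature header.
 * ASSUMPTION: each record is a u32 weak sum followed by strong_sum_len
 * bytes of strong sum.
 * On success, stores the weak sum in *weak_sum, points *strong_sum at the
 * strong sum bytes inside `records`, and returns 0. Returns -1 if the
 * requested record is not completely present. */
static int sketch_read_sig_record(const unsigned char *records,
                                  size_t records_len,
                                  uint32_t strong_sum_len, size_t index,
                                  uint32_t *weak_sum,
                                  const unsigned char **strong_sum)
{
    size_t record_size = 4 + (size_t)strong_sum_len;
    const unsigned char *p;

    /* records_len / record_size is the number of complete records. */
    if (index >= records_len / record_size)
        return -1;
    p = records + index * record_size;
    *weak_sum = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    *strong_sum = p + 4;
    return 0;
}
```

The record for block `i` describes the input bytes starting at offset `i * block_len`, so the index alone is enough to map a match back to a position in the data file.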

Deltas consist of the delta magic constant `RS_DELTA_MAGIC` followed by a
series of commands. Commands tell the patch logic how to construct the result
file (new version) from the basis file (old version).

There are three kinds of commands: the literal command, the copy command, and
the end command. A command consists of a single byte followed by zero or more
arguments. The number and size of the arguments are defined in `prototab.c`.

A literal command describes data not present in the basis file. It has one
argument: `length`. The format is:

    u8 command;          // in the range 0x41 through 0x44 inclusive
    u8[arg1_len] length; // number of bytes of new data that follow
    u8[length] data;     // new data to append

A copy command describes a range of data in the basis file. It has two
arguments: `start` and `length`. The format is:

    u8 command;          // in the range 0x45 through 0x54 inclusive
    u8[arg1_len] start;  // offset in the basis to begin copying data
    u8[arg2_len] length; // number of bytes to copy from the basis

The end command indicates the end of the delta file. It consists of a single
zero byte (`0x00`) and has no arguments.
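
Putting the three commands together, a patch step can be sketched as a loop over the command stream. The code below is an illustration working on in-memory buffers, not the librsync patch implementation, and `sketch_apply_delta` is a hypothetical name. It rests on one assumption that the text above does not state: argument widths are taken to be 1, 2, 4, or 8 bytes, chosen by the command byte in ascending order, with the `start` width varying slowest for copy commands. Check `prototab.c` for the authoritative table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read an n-byte big-endian unsigned integer (n is 1, 2, 4 or 8). */
static uint64_t sketch_be(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    while (n--)
        v = (v << 8) | *p++;
    return v;
}

/* Apply `delta` to `basis`, writing the new file into `out`.
 * Returns the number of bytes written, or -1 if the delta is malformed,
 * refers to data outside the basis, or does not fit in out_cap bytes. */
static long sketch_apply_delta(const unsigned char *delta, size_t delta_len,
                               const unsigned char *basis, size_t basis_len,
                               unsigned char *out, size_t out_cap)
{
    /* ASSUMPTION: argument widths by opcode order; see prototab.c. */
    static const size_t width[4] = {1, 2, 4, 8};
    size_t pos = 4, written = 0;

    if (delta_len < 4 || sketch_be(delta, 4) != 0x72730236u)
        return -1;                               /* not RS_DELTA_MAGIC */
    while (pos < delta_len) {
        unsigned char cmd = delta[pos++];
        const unsigned char *src;
        uint64_t start, length;
        size_t w1, w2 = 0;

        if (cmd == 0x00)
            return (long)written;                /* end command */
        if (cmd >= 0x41 && cmd <= 0x44) {        /* literal command */
            w1 = width[cmd - 0x41];
        } else if (cmd >= 0x45 && cmd <= 0x54) { /* copy command */
            w1 = width[(cmd - 0x45) / 4];
            w2 = width[(cmd - 0x45) % 4];
        } else {
            return -1;                           /* not described here */
        }
        if (delta_len - pos < w1 + w2)
            return -1;                           /* truncated arguments */
        if (w2 == 0) {
            length = sketch_be(delta + pos, w1);
            pos += w1;
            if (length > delta_len - pos)
                return -1;                       /* truncated literal data */
            src = delta + pos;
            pos += (size_t)length;
        } else {
            start = sketch_be(delta + pos, w1);
            length = sketch_be(delta + pos + w1, w2);
            pos += w1 + w2;
            if (start > basis_len || length > basis_len - start)
                return -1;                       /* range outside basis */
            src = basis + (size_t)start;
        }
        if (length > out_cap - written)
            return -1;                           /* output buffer too small */
        memcpy(out + written, src, (size_t)length);
        written += (size_t)length;
    }
    return -1;                                   /* no end command seen */
}
```

For example, with the basis `hello world`, a delta holding the magic number followed by the bytes `45 06 05`, `41 02 2c 20`, `45 00 05`, `00` copies `world`, appends the literal `, `, copies `hello`, and stops, producing the 12 bytes `world, hello`. A streaming implementation would read the basis by seeking in a file rather than holding it in memory.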