Backport regression fixes ready for a 3.4.4 release#980
Conversation
Build the test-helper programs without running the suite, so an external harness (fleettest.py) can invoke runtests.py with its own options.
A delta (--no-whole-file) resume whose basis is an absolute --partial-dir
looped forever on exit code 23 ("failed verification -- update put into
partial-dir"), stranding the correct data in the partial-dir and never
populating the destination.
Cause: an absolute --partial-dir makes the basis path absolute, but the
receiver opened it with secure_relative_open(NULL, fnamecmp, ...), which by
design rejects an absolute relpath (EINVAL). The basis fd was then -1, so
receive_data() mapped no basis and (because the matched-block sum_update() is
guarded by "if (mapbuf)") computed the whole-file verification checksum over
the literal data only -> a spurious mismatch every run. (The data itself was
correct, since the in-place update leaves the matched basis bytes in place.)
Under a non-chroot daemon the in-place write went through the same call and
failed outright.
Fix: add secure_basis_open(), which treats an operator-trusted absolute basis
path as (trusted directory + confined leaf) -- the same way secure_relative_open
already trusts an absolute basedir while keeping O_NOFOLLOW on the leaf -- and
use it for both the basis read and the inplace-partial write. The strict
"reject absolute relpath" contract of secure_relative_open is left intact.
Defense-in-depth: receive_data() now treats a block-match token with no mapped
basis as a protocol inconsistency (it can only arise from a basis that the
generator opened but the receiver could not), failing cleanly instead of
silently dropping those bytes from the verify checksum or the output.
Thanks to @sylvain-ilm for the report (RsyncProject#724, RsyncProject#725).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…Project#910) --delete-missing-args (missing_args==2) sends a missing --files-from arg as a mode-0 entry (IS_MISSING_FILE), the generator's delete signal. The mode-type validation in recv_file_entry() rejected mode 0 as an invalid file type, aborting the transfer with 'invalid file mode 00 ... code 2' before the generator could act (a regression from 3.4.1). Allow mode 0 through only when missing_args==2 (the delete mode -- not --ignore-missing-args, which never sends a mode-0 entry); all other modes are still rejected. Thanks to @mgkeeley for the report (RsyncProject#910).
…ncProject#897) A daemon module with path=/ makes F_PATHNAME absolute, so the secure_path built for the content open starts with '/'. secure_relative_open() rejects an absolute relpath with EINVAL, so a use-chroot=no daemon with path=/ could not send any file ('failed to open ...: Invalid argument (22)') -- a regression from 3.4.2. Strip leading slashes to a module-relative path; resolution stays confined beneath module_dir. Thanks to @moonlitbugs for the report (RsyncProject#897).
RsyncProject#915) The symlink-race hardening routed the receiver's basis open through secure_relative_open(), which rejects any '..' -- so a sibling --link-dest=../01 on a use-chroot=no daemon was silently ignored and every file re-transferred (RsyncProject#915/RsyncProject#928, a regression from 3.4.1). Narrow the confinement to the sanitizing daemon (am_daemon && !am_chrooted) and re-anchor it at the module root, the real trust boundary: secure_relative_open() prefixes the cwd's module-relative path (from rsync's logical curr_dir[], a guaranteed lexical prefix of module_dir) and resolves beneath module_dir, so RESOLVE_BENEATH permits an in-module '..' climb while still rejecting one that escapes the module. secure_basis_open() opens with a bare do_open() in the non-sanitizing cases. t_stub.c gains weak curr_dir[]/curr_dir_len for the helpers (via #pragma weak on non-GNU compilers, where rsync.h erases __attribute__). Two tests: link-dest-relative-basis asserts the in-module '..' is honoured; link-dest-module-escape asserts a --link-dest=../../OUTSIDE climb that leaves the module is refused (not hard-linked to an outside file). See upstream PR RsyncProject#930. Thanks to @fufu65 (RsyncProject#915) and @JetAppsClark (RsyncProject#928) for the reports.
sum_sizes_sqroot() capped the strong-sum length at SUM_LENGTH (16), the legacy MD4/MD5 digest size. Since 0902b52 the sum2 array elements are xfer_sum_len bytes and the sender rejects a sums header whose s2length exceeds xfer_sum_len. When the negotiated transfer checksum is shorter than 16 bytes -- xxh64 (8), used when the build's libxxhash lacks xxh128/xxh3 (e.g. Ubuntu 20.04) -- the generator still emitted s2length up to 16, so --append-verify and other full-checksum (redo) transfers died with "Invalid checksum length 16 [sender]" (protocol incompatibility). Cap s2length at MIN(SUM_LENGTH, xfer_sum_len): unchanged for any checksum >= 16 bytes (md5/xxh128/sha1), corrected for short ones. Also closes a latent over-read of the xfer_sum_len-sized digest buffer.
Commit d046525 made my_alloc() calloc every fresh allocation and made expand_item_list() memset the freshly grown tail, to hand out predictably zeroed memory. But that forces the kernel to back pages callers never touch: each per-directory file_list pre-allocates a FLIST_START-entry (32768) pointer array -- 256KB -- and calloc now zeroes the whole array even for an empty directory. With incremental recursion over many directories the resident set explodes; 80000 empty dirs went from ~336MB to ~10.8GB. Restore the pre-d046525d malloc/calloc split: fresh allocations use malloc (so untouched tails stay lazy) and only explicit do_calloc requests (new_array0) are zeroed. Callers that need zeroed memory already ask for it, and the full test suite passes. Thanks to @guilherme-puida for the report (RsyncProject#959). Fixes: RsyncProject#959
…oject#896) do_mknod_at() (the symlink-race-safe variant used by a non-chrooted daemon receiver) calls mknodat()/mkfifoat(), but the at-variant was gated only on AT_FDCWD. Older Darwin declares AT_FDCWD without mknodat(), so the build failed with "mknodat undeclared". Probe mknodat()/mkfifoat() in configure and require HAVE_MKNODAT for the at-variant; without it do_mknod_at() falls back to do_mknod(), exactly as it already does where AT_FDCWD is missing. Linux keeps the mknodat path since HAVE_MKNODAT is defined there. Thanks to @debohman for the report (RsyncProject#896). Fixes: RsyncProject#896
Without --secluded-args, the client's safe_arg() backslash-escapes shell and wildcard chars in option values before sending them to the server, so --chown's --usermap=*:user is transmitted as --usermap=\*:user. Over ssh a remote shell removes the backslashes before rsync parses the args, but a daemon has no shell and read_args() stored option args verbatim -- so the receiver saw the literal "\*", the usermap/groupmap wildcard never matched, and the module's configured uid/gid won instead. A regression from the secluded-args hardening; rsync 3.2.3 (protocol 31) worked. Un-backslash option args in read_args() on the daemon's first (non-protected) read, mirroring what the ssh-side shell does. File args after the dot are already handled by glob_expand(); the protected (NUL, already-unescaped) re-read and the server's stdin read pass unescape=0 so their raw args are left untouched. Thanks to @elcamlost for the report (RsyncProject#829). Fixes: RsyncProject#829
send_deflated_token() adds a matched block to the compressor history with
deflate(Z_INSERT_ONLY). Our bundled zlib implements Z_INSERT_ONLY (it
produces no output and consumes the input in one call), but a build
against a system zlib lacks it and falls back to Z_SYNC_FLUSH (see the top
of the file), which emits a flush block into obuf. For a large
incompressible matched token that block exceeds AVAIL_OUT_SIZE(CHUNK_SIZE),
so deflate returned with avail_in != 0 and the transfer aborted:
"deflate on token returned 0 (N bytes left)" at token.c
The insert output is never sent -- the receiver rebuilds the matching
history itself in see_deflate_token() -- so loop, resetting the output
buffer, and discard it. Drain with the same condition as the data loop
above: until the input is consumed AND avail_out != 0. Stopping at
avail_in == 0 alone can leave pending output in the deflate stream (a
full output buffer with bytes still buffered), which would then be emitted
by the next real deflate send and corrupt the stream. A bundled-zlib
build still finishes in one iteration.
Thanks to @brabalan for the report (RsyncProject#951).
Fixes: RsyncProject#951
A single-file --mkpath copy whose destination parent does not exist failed under --dry-run: make_path() only *reports* the directories it would create in a dry run, so change_dir#3 then tried to chdir into a parent that isn't there and aborted with "change_dir#3 ... failed". When the parent is genuinely missing in a dry run, skip the chdir and mark the destination as not-yet-present (dry_run++), exactly as the multi-file/dir-creation path already does, so the generator doesn't probe the missing tree. Gating it on the missing-parent case keeps an ordinary file-to-file dry run chdir'ing into and itemizing against an existing destination. Fixes: RsyncProject#880 Thanks to @pkzc for the report (RsyncProject#880). Co-authored-by: Stiliyan Tonev (Bark) <[email protected]>
Otherwise we get errors. Fixes: RsyncProject#927
|
@steadytao regarding the testsuite, what I've been doing is using fleettest.py to test this PR branch against the master test suite. That is why I've been merging new options into fleettest to support running the testsuite from one branch against the code from another across the whole fleet of machines. |
…roject#905/RsyncProject#900, R10 RsyncProject#904) configure now probes for <linux/openat2.h> + SYS_openat2 and defines HAVE_OPENAT2 only when both are present; syscall.c gates the openat2 include and the openat2(RESOLVE_BENEATH) tier on HAVE_OPENAT2, so the build no longer fails on kernels/headers that lack the openat2 header (3.4.3 included it unconditionally on Linux). android.c probes openat2 usability behind a SIGSYS handler so the Android/Termux seccomp sandbox falls back to the portable resolver instead of killing the process. Backport combining c73e006, 83a24c2, the syscall.c guards from 1d5b5ab, and 4634b0a; the --disable-openat2/gcov coverage knobs and test changes are omitted. Thanks to @mmayer (RsyncProject#924), @fda77 (RsyncProject#905), @darkshram (RsyncProject#900) and @ketas (RsyncProject#904) for the reports.
receive_data() crashed a receiver that was merely DISCARDING a file's delta stream. discard_receive_data() calls receive_data() with fname == NULL and fd == -1, so size_r == 0 and mapbuf == NULL. A normal block-MATCH token (against a block the basis and source share) then reaches the !mapbuf branch added in 31fbb17 ("receiver: fix absolute --partial-dir delta resume"), which calls full_fname(fname). full_fname() dereferences its argument unconditionally (util1.c: `if (*fn == '/')`), so fname == NULL faults there -> receiver SIGSEGV. This is a normal-operation crash with a stock cooperating sender, not an adversarial one. The generator hands the sender real block sums whenever the basis is readable and we're in delta mode; the receiver only decides to discard afterwards, when its output cannot be produced -- e.g. the destination directory is not writable (mkstemp fails), the basis turns out to be a directory, or a --partial-dir resume is skipped. A MATCH token arriving during that discard hit the NULL deref. The 31fbb17 branch is correct only for a REAL output transfer (fd != -1, fname valid): there, a block match with no mapped basis is a genuine protocol inconsistency (the generator promised a basis the receiver could not open), and honoring it would silently omit those bytes from the verification checksum or leave a hole, so hard-erroring -- and full_fname(fname) -- is right. It conflated that with the discard path. The discriminator is fd, not mapbuf: on the discard path fd == -1 always; on the real-output inconsistency fd != -1. Scope the "no basis file" protocol error to fd != -1 (where fname is non-NULL and full_fname is safe) and, on the discard path (fd == -1), absorb the matched bytes benignly (offset += len; continue) -- symmetric with the literal-token handling just above, and restoring the pre-31fbb17d behavior. The real-transfer inconsistency check is preserved unchanged.
The workflows triggered only on 'master', so PRs targeting a release branch (e.g. v3.4-stable for 3.4.4) got no CI. Add a '*-stable' branch wildcard to the push and pull_request filters.
…imitation The RsyncProject#915 relative alt-basis fix relies on kernel RESOLVE_BENEATH (openat2 on Linux 5.6+, O_RESOLVE_BENEATH on FreeBSD 13+/macOS 15+); on platforms without it secure_relative_open() rejects any ".." component, so relative alt-basis dirs are unavailable there. Spell this out in the release notes per review on PR RsyncProject#980.
The stable branch keeps the old shell test suite, so the modern Python suite lives on the v34-stable-testsuite branch. Build rsync here and run that suite against the built binary (helpers/config.h as tooldir from this build, test scripts via --srcdir), giving regression coverage for 3.4.x without importing the full master suite. Addresses review on PR RsyncProject#980.
The stable branch keeps the old shell test suite, so the modern Python suite lives on the v34-stable-testsuite branch. Build rsync here and run that suite against the built binary (helpers/config.h as tooldir from this build, test scripts via --srcdir), giving regression coverage for 3.4.x without importing the full master suite. Runs on ubuntu-latest and ubuntu-22.04 (older-LTS coverage for backports). Each does a pipe-transport pass (with the same RSYNC_EXPECT_SKIPPED list the v34-stable-testsuite ubuntu jobs use) and a --use-tcp pass for the daemon tests the pipe run skips. Addresses review on PR RsyncProject#980.
|
@steadytao I've now added CI jobs for running the master (approximately) test suite on the 3.4 builds on both ubuntu-latest and ubuntu-2204. |
Helpers link util2.o but not options.c, so they used the stub's max_alloc = 0, which makes every my_alloc()/my_strdup() in util2.c abort with "exceeded --max-alloc=0". CI didn't catch it because the openat2 path avoids those allocations, but the secure_relative_open() fallback hits my_strdup() and aborts. Set max_alloc = (size_t)-1, matching the v34-stable-testsuite fix. Reported by steadytao on PR RsyncProject#980.
Add the NEWS entry for rsync 3.4.4 (8 June 2026): the backported regression fixes, the PORTABILITY note documenting the RsyncProject#915 alt-basis platform limitation, the openat2 autodetect/mknodat fallback build notes, the stable-testsuite CI addition, and a CREDITS section for the contributors, reporters, and the PR RsyncProject#980 review.
The stable branch keeps the old shell test suite, so the modern Python suite lives on the v34-stable-testsuite branch. Build rsync here and run that suite against the built binary (helpers/config.h as tooldir from this build, test scripts via --srcdir), giving regression coverage for 3.4.x without importing the full master suite. Runs on ubuntu-latest and ubuntu-22.04 (older-LTS coverage for backports). Each does a pipe-transport pass (with the same RSYNC_EXPECT_SKIPPED list the v34-stable-testsuite ubuntu jobs use) and a --use-tcp pass for the daemon tests the pipe run skips. Addresses review on PR RsyncProject#980.
Helpers link util2.o but not options.c, so they used the stub's max_alloc = 0, which makes every my_alloc()/my_strdup() in util2.c abort with "exceeded --max-alloc=0". CI didn't catch it because the openat2 path avoids those allocations, but the secure_relative_open() fallback hits my_strdup() and aborts. Set max_alloc = (size_t)-1, matching the v34-stable-testsuite fix. Reported by steadytao on PR RsyncProject#980.
Add the NEWS entry for rsync 3.4.4 (8 June 2026): the backported regression fixes, the PORTABILITY note documenting the #915 alt-basis platform limitation, the openat2 autodetect/mknodat fallback build notes, the stable-testsuite CI addition, and a CREDITS section for the contributors, reporters, and the PR #980 review.
The stable branch keeps the old shell test suite, so the modern Python suite lives on the v34-stable-testsuite branch. Build rsync here and run that suite against the built binary (helpers/config.h as tooldir from this build, test scripts via --srcdir), giving regression coverage for 3.4.x without importing the full master suite. Runs on ubuntu-latest and ubuntu-22.04 (older-LTS coverage for backports). Each does a pipe-transport pass (with the same RSYNC_EXPECT_SKIPPED list the v34-stable-testsuite ubuntu jobs use) and a --use-tcp pass for the daemon tests the pipe run skips. Addresses review on PR #980.
Helpers link util2.o but not options.c, so they used the stub's max_alloc = 0, which makes every my_alloc()/my_strdup() in util2.c abort with "exceeded --max-alloc=0". CI didn't catch it because the openat2 path avoids those allocations, but the secure_relative_open() fallback hits my_strdup() and aborts. Set max_alloc = (size_t)-1, matching the v34-stable-testsuite fix. Reported by steadytao on PR #980.
This is a set of regression fixes from master on top of 3.4.3. The plan is to make a 3.4.4 release with just regression fixes