Skip to content

Backport regression fixes ready for a 3.4.4 release#980

Merged
tridge merged 22 commits into
RsyncProject:v3.4-stablefrom
tridge:v3.4
Jun 8, 2026
Merged

Backport regression fixes ready for a 3.4.4 release#980
tridge merged 22 commits into
RsyncProject:v3.4-stablefrom
tridge:v3.4

Conversation

@tridge

@tridge tridge commented Jun 6, 2026

Copy link
Copy Markdown
Member

This is a set of regression fixes from master on top of 3.4.3. The plan is to make a 3.4.4 release with just regression fixes

tridge and others added 14 commits June 6, 2026 14:52
Build the test-helper programs without running the suite, so an external
harness (fleettest.py) can invoke runtests.py with its own options.
A delta (--no-whole-file) resume whose basis is an absolute --partial-dir
looped forever on exit code 23 ("failed verification -- update put into
partial-dir"), stranding the correct data in the partial-dir and never
populating the destination.

Cause: an absolute --partial-dir makes the basis path absolute, but the
receiver opened it with secure_relative_open(NULL, fnamecmp, ...), which by
design rejects an absolute relpath (EINVAL). The basis fd was then -1, so
receive_data() mapped no basis and (because the matched-block sum_update() is
guarded by "if (mapbuf)") computed the whole-file verification checksum over
the literal data only -> a spurious mismatch every run. (The data itself was
correct, since the in-place update leaves the matched basis bytes in place.)
Under a non-chroot daemon the in-place write went through the same call and
failed outright.

Fix: add secure_basis_open(), which treats an operator-trusted absolute basis
path as (trusted directory + confined leaf) -- the same way secure_relative_open
already trusts an absolute basedir while keeping O_NOFOLLOW on the leaf -- and
use it for both the basis read and the inplace-partial write. The strict
"reject absolute relpath" contract of secure_relative_open is left intact.

Defense-in-depth: receive_data() now treats a block-match token with no mapped
basis as a protocol inconsistency (it can only arise from a basis that the
generator opened but the receiver could not), failing cleanly instead of
silently dropping those bytes from the verify checksum or the output.

Thanks to @sylvain-ilm for the report (RsyncProject#724, RsyncProject#725).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…Project#910)

--delete-missing-args (missing_args==2) sends a missing --files-from arg as a
mode-0 entry (IS_MISSING_FILE), the generator's delete signal.  The mode-type
validation in recv_file_entry() rejected mode 0 as an invalid file type,
aborting the transfer with 'invalid file mode 00 ... code 2' before the
generator could act (a regression from 3.4.1).  Allow mode 0 through only when
missing_args==2 (the delete mode -- not --ignore-missing-args, which never
sends a mode-0 entry); all other modes are still rejected.

Thanks to @mgkeeley for the report (RsyncProject#910).
…ncProject#897)

A daemon module with path=/ makes F_PATHNAME absolute, so the secure_path built
for the content open starts with '/'.  secure_relative_open() rejects an
absolute relpath with EINVAL, so a use-chroot=no daemon with path=/ could not
send any file ('failed to open ...: Invalid argument (22)') -- a regression
from 3.4.2.  Strip leading slashes to a module-relative path; resolution stays
confined beneath module_dir.

Thanks to @moonlitbugs for the report (RsyncProject#897).
RsyncProject#915)

The symlink-race hardening routed the receiver's basis open through
secure_relative_open(), which rejects any '..' -- so a sibling
--link-dest=../01 on a use-chroot=no daemon was silently ignored and every file
re-transferred (RsyncProject#915/RsyncProject#928, a regression from 3.4.1).

Narrow the confinement to the sanitizing daemon (am_daemon && !am_chrooted) and
re-anchor it at the module root, the real trust boundary: secure_relative_open()
prefixes the cwd's module-relative path (from rsync's logical curr_dir[], a
guaranteed lexical prefix of module_dir) and resolves beneath module_dir, so
RESOLVE_BENEATH permits an in-module '..' climb while still rejecting one that
escapes the module.  secure_basis_open() opens with a bare do_open() in the
non-sanitizing cases.  t_stub.c gains weak curr_dir[]/curr_dir_len for the
helpers (via #pragma weak on non-GNU compilers, where rsync.h erases
__attribute__).

Two tests: link-dest-relative-basis asserts the in-module '..' is honoured;
link-dest-module-escape asserts a --link-dest=../../OUTSIDE climb that leaves
the module is refused (not hard-linked to an outside file).  See upstream
PR RsyncProject#930.

Thanks to @fufu65 (RsyncProject#915) and @JetAppsClark (RsyncProject#928) for the reports.
sum_sizes_sqroot() capped the strong-sum length at SUM_LENGTH (16), the
legacy MD4/MD5 digest size.  Since 0902b52 the sum2 array elements are
xfer_sum_len bytes and the sender rejects a sums header whose s2length
exceeds xfer_sum_len.  When the negotiated transfer checksum is shorter
than 16 bytes -- xxh64 (8), used when the build's libxxhash lacks
xxh128/xxh3 (e.g. Ubuntu 20.04) -- the generator still emitted s2length
up to 16, so --append-verify and other full-checksum (redo) transfers
died with "Invalid checksum length 16 [sender]" (protocol incompatibility).

Cap s2length at MIN(SUM_LENGTH, xfer_sum_len): unchanged for any checksum
>= 16 bytes (md5/xxh128/sha1), corrected for short ones.  Also closes a
latent over-read of the xfer_sum_len-sized digest buffer.
Commit d046525 made my_alloc() calloc every fresh allocation and made
expand_item_list() memset the freshly grown tail, to hand out predictably
zeroed memory.  But that forces the kernel to back pages callers never
touch: each per-directory file_list pre-allocates a FLIST_START-entry
(32768) pointer array -- 256KB -- and calloc now zeroes the whole array
even for an empty directory.  With incremental recursion over many
directories the resident set explodes; 80000 empty dirs went from ~336MB
to ~10.8GB.

Restore the pre-d046525d malloc/calloc split: fresh allocations use
malloc (so untouched tails stay lazy) and only explicit do_calloc
requests (new_array0) are zeroed.  Callers that need zeroed memory
already ask for it, and the full test suite passes.

Thanks to @guilherme-puida for the report (RsyncProject#959).

Fixes: RsyncProject#959
…oject#896)

do_mknod_at() (the symlink-race-safe variant used by a non-chrooted
daemon receiver) calls mknodat()/mkfifoat(), but the at-variant was
gated only on AT_FDCWD.  Older Darwin declares AT_FDCWD without
mknodat(), so the build failed with "mknodat undeclared".

Probe mknodat()/mkfifoat() in configure and require HAVE_MKNODAT for the
at-variant; without it do_mknod_at() falls back to do_mknod(), exactly
as it already does where AT_FDCWD is missing.  Linux keeps the mknodat
path since HAVE_MKNODAT is defined there.

Thanks to @debohman for the report (RsyncProject#896).

Fixes: RsyncProject#896
Without --secluded-args, the client's safe_arg() backslash-escapes shell
and wildcard chars in option values before sending them to the server, so
--chown's --usermap=*:user is transmitted as --usermap=\*:user.  Over ssh a
remote shell removes the backslashes before rsync parses the args, but a
daemon has no shell and read_args() stored option args verbatim -- so the
receiver saw the literal "\*", the usermap/groupmap wildcard never matched,
and the module's configured uid/gid won instead.  A regression from the
secluded-args hardening; rsync 3.2.3 (protocol 31) worked.

Un-backslash option args in read_args() on the daemon's first
(non-protected) read, mirroring what the ssh-side shell does.  File args
after the dot are already handled by glob_expand(); the protected (NUL,
already-unescaped) re-read and the server's stdin read pass unescape=0 so
their raw args are left untouched.

Thanks to @elcamlost for the report (RsyncProject#829).

Fixes: RsyncProject#829
send_deflated_token() adds a matched block to the compressor history with
deflate(Z_INSERT_ONLY).  Our bundled zlib implements Z_INSERT_ONLY (it
produces no output and consumes the input in one call), but a build
against a system zlib lacks it and falls back to Z_SYNC_FLUSH (see the top
of the file), which emits a flush block into obuf.  For a large
incompressible matched token that block exceeds AVAIL_OUT_SIZE(CHUNK_SIZE),
so deflate returned with avail_in != 0 and the transfer aborted:

    "deflate on token returned 0 (N bytes left)"  at token.c

The insert output is never sent -- the receiver rebuilds the matching
history itself in see_deflate_token() -- so loop, resetting the output
buffer, and discard it.  Drain with the same condition as the data loop
above: until the input is consumed AND avail_out != 0.  Stopping at
avail_in == 0 alone can leave pending output in the deflate stream (a
full output buffer with bytes still buffered), which would then be emitted
by the next real deflate send and corrupt the stream.  A bundled-zlib
build still finishes in one iteration.

Thanks to @brabalan for the report (RsyncProject#951).

Fixes: RsyncProject#951
A single-file --mkpath copy whose destination parent does not exist
failed under --dry-run: make_path() only *reports* the directories it
would create in a dry run, so change_dir#3 then tried to chdir into a
parent that isn't there and aborted with "change_dir#3 ... failed".

When the parent is genuinely missing in a dry run, skip the chdir and
mark the destination as not-yet-present (dry_run++), exactly as the
multi-file/dir-creation path already does, so the generator doesn't
probe the missing tree.  Gating it on the missing-parent case keeps an
ordinary file-to-file dry run chdir'ing into and itemizing against an
existing destination.

Fixes: RsyncProject#880

Thanks to @pkzc for the report (RsyncProject#880).

Co-authored-by: Stiliyan Tonev (Bark) <[email protected]>
steadytao

This comment was marked as outdated.

@tridge

tridge commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

@steadytao regarding the testsuite, what I've been doing is using fleettest.py to test this PR branch against the master test suite. That is why I've been merging new options into fleettest to support running the testsuite from one branch against the code from another across the whole fleet of machines.
This is the branch I've created to be the "testsuite branch" for 3.4.x stable:
https://github.com/RsyncProject/rsync/tree/v34-stable-testsuite
I guess I could add a CI test that clones that branch and tests 3.4 properly in CI?
The reason I didn't want to just bring the master testsuite into 3.4 is it is such a large change and I also need to create backport patch sets for 3.2.7 and 3.4.1 for the LTS distro maintainers.
happy to discuss if you have ideas

steadytao and others added 4 commits June 8, 2026 08:09
…roject#905/RsyncProject#900, R10 RsyncProject#904)

configure now probes for <linux/openat2.h> + SYS_openat2 and defines
HAVE_OPENAT2 only when both are present; syscall.c gates the openat2 include
and the openat2(RESOLVE_BENEATH) tier on HAVE_OPENAT2, so the build no longer
fails on kernels/headers that lack the openat2 header (3.4.3 included it
unconditionally on Linux).  android.c probes openat2 usability behind a SIGSYS
handler so the Android/Termux seccomp sandbox falls back to the portable
resolver instead of killing the process.

Backport combining c73e006, 83a24c2, the syscall.c guards from 1d5b5ab, and
4634b0a; the --disable-openat2/gcov coverage knobs and test changes are omitted.

Thanks to @mmayer (RsyncProject#924), @fda77 (RsyncProject#905), @darkshram (RsyncProject#900) and @ketas (RsyncProject#904) for the reports.
receive_data() crashed a receiver that was merely DISCARDING a file's
delta stream. discard_receive_data() calls receive_data() with
fname == NULL and fd == -1, so size_r == 0 and mapbuf == NULL. A normal
block-MATCH token (against a block the basis and source share) then
reaches the !mapbuf branch added in 31fbb17 ("receiver: fix absolute
--partial-dir delta resume"), which calls full_fname(fname). full_fname()
dereferences its argument unconditionally (util1.c: `if (*fn == '/')`),
so fname == NULL faults there -> receiver SIGSEGV.

This is a normal-operation crash with a stock cooperating sender, not an
adversarial one. The generator hands the sender real block sums whenever
the basis is readable and we're in delta mode; the receiver only decides
to discard afterwards, when its output cannot be produced -- e.g. the
destination directory is not writable (mkstemp fails), the basis turns
out to be a directory, or a --partial-dir resume is skipped. A MATCH
token arriving during that discard hit the NULL deref.

The 31fbb17 branch is correct only for a REAL output transfer (fd != -1,
fname valid): there, a block match with no mapped basis is a genuine
protocol inconsistency (the generator promised a basis the receiver could
not open), and honoring it would silently omit those bytes from the
verification checksum or leave a hole, so hard-erroring -- and
full_fname(fname) -- is right. It conflated that with the discard path.

The discriminator is fd, not mapbuf: on the discard path fd == -1 always;
on the real-output inconsistency fd != -1. Scope the "no basis file"
protocol error to fd != -1 (where fname is non-NULL and full_fname is
safe) and, on the discard path (fd == -1), absorb the matched bytes
benignly (offset += len; continue) -- symmetric with the literal-token
handling just above, and restoring the pre-31fbb17d behavior. The
real-transfer inconsistency check is preserved unchanged.
The workflows triggered only on 'master', so PRs targeting a release branch
(e.g. v3.4-stable for 3.4.4) got no CI. Add a '*-stable' branch wildcard to
the push and pull_request filters.
tridge added a commit to tridge/rsync that referenced this pull request Jun 7, 2026
…imitation

The RsyncProject#915 relative alt-basis fix relies on kernel RESOLVE_BENEATH
(openat2 on Linux 5.6+, O_RESOLVE_BENEATH on FreeBSD 13+/macOS 15+);
on platforms without it secure_relative_open() rejects any ".."
component, so relative alt-basis dirs are unavailable there. Spell
this out in the release notes per review on PR RsyncProject#980.
tridge added a commit to tridge/rsync that referenced this pull request Jun 7, 2026
The stable branch keeps the old shell test suite, so the modern Python
suite lives on the v34-stable-testsuite branch. Build rsync here and run
that suite against the built binary (helpers/config.h as tooldir from
this build, test scripts via --srcdir), giving regression coverage for
3.4.x without importing the full master suite. Addresses review on PR RsyncProject#980.
tridge added a commit to tridge/rsync that referenced this pull request Jun 7, 2026
The stable branch keeps the old shell test suite, so the modern Python
suite lives on the v34-stable-testsuite branch. Build rsync here and run
that suite against the built binary (helpers/config.h as tooldir from
this build, test scripts via --srcdir), giving regression coverage for
3.4.x without importing the full master suite.

Runs on ubuntu-latest and ubuntu-22.04 (older-LTS coverage for backports).
Each does a pipe-transport pass (with the same RSYNC_EXPECT_SKIPPED list
the v34-stable-testsuite ubuntu jobs use) and a --use-tcp pass for the
daemon tests the pipe run skips. Addresses review on PR RsyncProject#980.
@tridge tridge requested review from steadytao and thesamesam June 7, 2026 23:36
@tridge

tridge commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

@steadytao I've now added CI jobs for running the master (approximately) test suite on the 3.4 builds on both ubuntu-latest and ubuntu-2204.

steadytao

This comment was marked as outdated.

tridge added a commit to tridge/rsync that referenced this pull request Jun 8, 2026
Helpers link util2.o but not options.c, so they used the stub's
max_alloc = 0, which makes every my_alloc()/my_strdup() in util2.c abort
with "exceeded --max-alloc=0". CI didn't catch it because the openat2
path avoids those allocations, but the secure_relative_open() fallback
hits my_strdup() and aborts. Set max_alloc = (size_t)-1, matching the
v34-stable-testsuite fix. Reported by steadytao on PR RsyncProject#980.

@steadytao steadytao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

tridge added 4 commits June 8, 2026 12:32
Add the NEWS entry for rsync 3.4.4 (8 June 2026): the backported
regression fixes, the PORTABILITY note documenting the RsyncProject#915 alt-basis
platform limitation, the openat2 autodetect/mknodat fallback build
notes, the stable-testsuite CI addition, and a CREDITS section for the
contributors, reporters, and the PR RsyncProject#980 review.
The stable branch keeps the old shell test suite, so the modern Python
suite lives on the v34-stable-testsuite branch. Build rsync here and run
that suite against the built binary (helpers/config.h as tooldir from
this build, test scripts via --srcdir), giving regression coverage for
3.4.x without importing the full master suite.

Runs on ubuntu-latest and ubuntu-22.04 (older-LTS coverage for backports).
Each does a pipe-transport pass (with the same RSYNC_EXPECT_SKIPPED list
the v34-stable-testsuite ubuntu jobs use) and a --use-tcp pass for the
daemon tests the pipe run skips. Addresses review on PR RsyncProject#980.
Helpers link util2.o but not options.c, so they used the stub's
max_alloc = 0, which makes every my_alloc()/my_strdup() in util2.c abort
with "exceeded --max-alloc=0". CI didn't catch it because the openat2
path avoids those allocations, but the secure_relative_open() fallback
hits my_strdup() and aborts. Set max_alloc = (size_t)-1, matching the
v34-stable-testsuite fix. Reported by steadytao on PR RsyncProject#980.
@tridge tridge merged commit ed2950f into RsyncProject:v3.4-stable Jun 8, 2026
11 checks passed
tridge added a commit that referenced this pull request Jun 8, 2026
Add the NEWS entry for rsync 3.4.4 (8 June 2026): the backported
regression fixes, the PORTABILITY note documenting the #915 alt-basis
platform limitation, the openat2 autodetect/mknodat fallback build
notes, the stable-testsuite CI addition, and a CREDITS section for the
contributors, reporters, and the PR #980 review.
tridge added a commit that referenced this pull request Jun 8, 2026
The stable branch keeps the old shell test suite, so the modern Python
suite lives on the v34-stable-testsuite branch. Build rsync here and run
that suite against the built binary (helpers/config.h as tooldir from
this build, test scripts via --srcdir), giving regression coverage for
3.4.x without importing the full master suite.

Runs on ubuntu-latest and ubuntu-22.04 (older-LTS coverage for backports).
Each does a pipe-transport pass (with the same RSYNC_EXPECT_SKIPPED list
the v34-stable-testsuite ubuntu jobs use) and a --use-tcp pass for the
daemon tests the pipe run skips. Addresses review on PR #980.
tridge added a commit that referenced this pull request Jun 8, 2026
Helpers link util2.o but not options.c, so they used the stub's
max_alloc = 0, which makes every my_alloc()/my_strdup() in util2.c abort
with "exceeded --max-alloc=0". CI didn't catch it because the openat2
path avoids those allocations, but the secure_relative_open() fallback
hits my_strdup() and aborts. Set max_alloc = (size_t)-1, matching the
v34-stable-testsuite fix. Reported by steadytao on PR #980.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants