diff mbox series

vfs: add fchmodat2 syscall

Message ID 20200910142335.GG3265@brightrain.aerifal.cx (mailing list archive)
State New, archived
Headers show
Series vfs: add fchmodat2 syscall | expand

Commit Message

dalias@libc.org Sept. 10, 2020, 2:23 p.m. UTC
POSIX defines fchmodat as having a 4th argument, flags, that can be
AT_SYMLINK_NOFOLLOW. Support for changing the access mode of symbolic
links is optional (EOPNOTSUPP allowed if not supported), but this flag
is important even on systems where symlinks do not have access modes,
since it's the only way to safely change the mode of a file which
might be asynchronously replaced with a symbolic link, without a race
condition whereby the link target is changed.

It's possible to emulate AT_SYMLINK_NOFOLLOW in userspace, and both
musl libc and glibc do this, by opening an O_PATH file descriptor and
performing chmod on the corresponding magic symlink in /proc/self/fd.
However, this requires procfs to be mounted and accessible.

It was determined (see glibc issue #14578 and commit a492b1e5ef) that,
on some filesystems, performing chmod on the link itself produces a
change in the inode's access mode, but returns an EOPNOTSUPP error.
This is non-conforming and wrong. Rather than try to fix all the
broken filesystem backends, block attempts to change the symlink
access mode via fchmodat2 at the frontend layer. This matches the
userspace emulation done in libc implementations. No change is made to
the underlying chmod_common(), so it's still possible to attempt
changes via procfs, if desired. If at some point all filesystems have
been fixed, this could be relaxed to allow filesystems to make their
own decision whether changing access mode of links is supported.

Signed-off-by: Rich Felker <dalias@libc.org>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/arm64/include/asm/unistd.h             |  2 +-
 arch/arm64/include/asm/unistd32.h           |  2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 fs/open.c                                   | 29 ++++++++++++++++++---
 include/linux/syscalls.h                    |  2 ++
 include/uapi/asm-generic/unistd.h           |  4 ++-
 21 files changed, 49 insertions(+), 6 deletions(-)

Comments

Al Viro Sept. 10, 2020, 3:18 p.m. UTC | #1
On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:

> It was determined (see glibc issue #14578 and commit a492b1e5ef) that,
> on some filesystems, performing chmod on the link itself produces a
> change in the inode's access mode, but returns an EOPNOTSUPP error.

Which filesystem types are those?
David Laight Sept. 10, 2020, 3:23 p.m. UTC | #2
From: Rich Felker
> Sent: 10 September 2020 15:24
...
> index 9af548fb841b..570a21f4d81e 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -610,15 +610,30 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
>  	return err;
>  }
> 
> -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> +static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int flags)
>  {
>  	struct path path;
>  	int error;
>  	unsigned int lookup_flags = LOOKUP_FOLLOW;
> +
> +	if (flags & AT_SYMLINK_NOFOLLOW)
> +		lookup_flags &= ~LOOKUP_FOLLOW;
> +	if (flags & ~AT_SYMLINK_NOFOLLOW)
> +		return -EINVAL;

I think I'd swap over those two tests.
So unsupported flags are clearly errored.

>  retry:
>  	error = user_path_at(dfd, filename, lookup_flags, &path);
>  	if (!error) {
> -		error = chmod_common(&path, mode);
> +		/* Block chmod from getting to fs layer. Ideally the
> +		 * fs would either allow it or fail with EOPNOTSUPP,
> +		 * but some are buggy and return an error but change
> +		 * the mode, which is non-conforming and wrong.
> +		 * Userspace emulation of AT_SYMLINK_NOFOLLOW in
> +		 * glibc and musl blocked it too, for same reason. */
> +		if (S_ISLNK(path.dentry->d_inode->i_mode)
> +		    && (flags & AT_SYMLINK_NOFOLLOW))
> +			error = -EOPNOTSUPP;

Again swap the order of the tests. I think it reads better as:
		if ((flags & AT_SYMLINK_NOFOLLOW)
		    && S_ISLNK(path.dentry->d_inode->i_mode))
			error = -EOPNOTSUPP;
As well as saving a few clock cycles.

> +		else
> +			error = chmod_common(&path, mode);
>  		path_put(&path);
>  		if (retry_estale(error, lookup_flags)) {
>  			lookup_flags |= LOOKUP_REVAL;
...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
dalias@libc.org Sept. 10, 2020, 3:45 p.m. UTC | #3
On Thu, Sep 10, 2020 at 04:18:28PM +0100, Al Viro wrote:
> On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:
> 
> > It was determined (see glibc issue #14578 and commit a492b1e5ef) that,
> > on some filesystems, performing chmod on the link itself produces a
> > change in the inode's access mode, but returns an EOPNOTSUPP error.
> 
> Which filesystem types are those?

It's been a long time and I don't know if the details were recorded.
It was reported for xfs but I believe we later found it happening for
others. See:

https://sourceware.org/bugzilla/show_bug.cgi?id=14578#c17
https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00467.html

and especially:

https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00518.html

where Christoph seems to have endorsed the approach in my patch. I'm
fine with doing it differently if you'd prefer, though.

Rich
Greg Kroah-Hartman Sept. 10, 2020, 4:16 p.m. UTC | #4
On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:
> POSIX defines fchmodat as having a 4th argument, flags, that can be
> AT_SYMLINK_NOFOLLOW. Support for changing the access mode of symbolic
> links is optional (EOPNOTSUPP allowed if not supported), but this flag
> is important even on systems where symlinks do not have access modes,
> since it's the only way to safely change the mode of a file which
> might be asynchronously replaced with a symbolic link, without a race
> condition whereby the link target is changed.
> 
> It's possible to emulate AT_SYMLINK_NOFOLLOW in userspace, and both
> musl libc and glibc do this, by opening an O_PATH file descriptor and
> performing chmod on the corresponding magic symlink in /proc/self/fd.
> However, this requires procfs to be mounted and accessible.
> 
> It was determined (see glibc issue #14578 and commit a492b1e5ef) that,
> on some filesystems, performing chmod on the link itself produces a
> change in the inode's access mode, but returns an EOPNOTSUPP error.
> This is non-conforming and wrong. Rather than try to fix all the
> broken filesystem backends, block attempts to change the symlink
> access mode via fchmodat2 at the frontend layer. This matches the
> userspace emulation done in libc implementations. No change is made to
> the underlying chmod_common(), so it's still possible to attempt
> changes via procfs, if desired. If at some point all filesystems have
> been fixed, this could be relaxed to allow filesystems to make their
> own decision whether changing access mode of links is supported.

A new syscall just because we have broken filesystems seems really odd,
why not just fix the filesystems instead?

thanks,

greg k-h
Christoph Hellwig Sept. 10, 2020, 4:20 p.m. UTC | #5
On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:
> userspace emulation done in libc implementations. No change is made to
> the underlying chmod_common(), so it's still possible to attempt
> changes via procfs, if desired.

And that is the goddamn problem.  We need to fix that _first_.  After
that we can add sugarcoating using new syscalls if needed.
dalias@libc.org Sept. 10, 2020, 4:35 p.m. UTC | #6
On Thu, Sep 10, 2020 at 06:16:15PM +0200, Greg KH wrote:
> On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:
> > POSIX defines fchmodat as having a 4th argument, flags, that can be
> > AT_SYMLINK_NOFOLLOW. Support for changing the access mode of symbolic
> > links is optional (EOPNOTSUPP allowed if not supported), but this flag
> > is important even on systems where symlinks do not have access modes,
> > since it's the only way to safely change the mode of a file which
> > might be asynchronously replaced with a symbolic link, without a race
> > condition whereby the link target is changed.
> > 
> > It's possible to emulate AT_SYMLINK_NOFOLLOW in userspace, and both
> > musl libc and glibc do this, by opening an O_PATH file descriptor and
> > performing chmod on the corresponding magic symlink in /proc/self/fd.
> > However, this requires procfs to be mounted and accessible.
> > 
> > It was determined (see glibc issue #14578 and commit a492b1e5ef) that,
> > on some filesystems, performing chmod on the link itself produces a
> > change in the inode's access mode, but returns an EOPNOTSUPP error.
> > This is non-conforming and wrong. Rather than try to fix all the
> > broken filesystem backends, block attempts to change the symlink
> > access mode via fchmodat2 at the frontend layer. This matches the
> > userspace emulation done in libc implementations. No change is made to
> > the underlying chmod_common(), so it's still possible to attempt
> > changes via procfs, if desired. If at some point all filesystems have
> > been fixed, this could be relaxed to allow filesystems to make their
> > own decision whether changing access mode of links is supported.
> 
> A new syscall just because we have broken filesystems seems really odd,
> why not just fix the filesystems instead?

The part about broken filesystems is just the justification for doing
the EOPNOTSUPP check at this layer rather than relying on the
filesystem to do it, not the purposse for the syscall.

The purpose of the syscall is fixing the deficiency in the original
one, which lacked the flags argument, making it so you have to do a
complicated emulation dance involving O_PATH and procfs magic symlinks
to implement the standard functionality.

Rich
dalias@libc.org Sept. 10, 2020, 4:39 p.m. UTC | #7
On Thu, Sep 10, 2020 at 05:20:59PM +0100, Christoph Hellwig wrote:
> On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:
> > userspace emulation done in libc implementations. No change is made to
> > the underlying chmod_common(), so it's still possible to attempt
> > changes via procfs, if desired.
> 
> And that is the goddamn problem.  We need to fix that _first_.

Can you clarify exactly what that is? Do you mean fixing the
underlying fs backends, or just ensuring that the chmod for symlinks
doesn't reach them by putting the check in chmod_common? I'm ok with
any of these.

> After that we can add sugarcoating using new syscalls if needed.

The new syscall is _not_ about this problem. It's about the missing
flags argument and inability to implement fchmodat() without access to
procfs. The above problem is just something you encounter and have to
make a decision about in order to fix the missing flags problem and
make a working AT_SYMLINK_NOFOLLOW.

Rich
Christoph Hellwig Sept. 10, 2020, 4:42 p.m. UTC | #8
On Thu, Sep 10, 2020 at 12:39:50PM -0400, Rich Felker wrote:
> On Thu, Sep 10, 2020 at 05:20:59PM +0100, Christoph Hellwig wrote:
> > On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:
> > > userspace emulation done in libc implementations. No change is made to
> > > the underlying chmod_common(), so it's still possible to attempt
> > > changes via procfs, if desired.
> > 
> > And that is the goddamn problem.  We need to fix that _first_.
> 
> Can you clarify exactly what that is? Do you mean fixing the
> underlying fs backends, or just ensuring that the chmod for symlinks
> doesn't reach them by putting the check in chmod_common? I'm ok with
> any of these.

Either - we need to make sure the user can't change the permission
bits.

> > After that we can add sugarcoating using new syscalls if needed.
> 
> The new syscall is _not_ about this problem. It's about the missing
> flags argument and inability to implement fchmodat() without access to
> procfs. The above problem is just something you encounter and have to
> make a decision about in order to fix the missing flags problem and
> make a working AT_SYMLINK_NOFOLLOW.

And I'm generally supportive of that.  But we need to fix the damn
bug first an then do nice to haves.
dalias@libc.org Sept. 10, 2020, 5:02 p.m. UTC | #9
On Thu, Sep 10, 2020 at 05:42:34PM +0100, Christoph Hellwig wrote:
> On Thu, Sep 10, 2020 at 12:39:50PM -0400, Rich Felker wrote:
> > On Thu, Sep 10, 2020 at 05:20:59PM +0100, Christoph Hellwig wrote:
> > > On Thu, Sep 10, 2020 at 10:23:37AM -0400, Rich Felker wrote:
> > > > userspace emulation done in libc implementations. No change is made to
> > > > the underlying chmod_common(), so it's still possible to attempt
> > > > changes via procfs, if desired.
> > > 
> > > And that is the goddamn problem.  We need to fix that _first_.
> > 
> > Can you clarify exactly what that is? Do you mean fixing the
> > underlying fs backends, or just ensuring that the chmod for symlinks
> > doesn't reach them by putting the check in chmod_common? I'm ok with
> > any of these.
> 
> Either - we need to make sure the user can't change the permission
> bits.
> 
> > > After that we can add sugarcoating using new syscalls if needed.
> > 
> > The new syscall is _not_ about this problem. It's about the missing
> > flags argument and inability to implement fchmodat() without access to
> > procfs. The above problem is just something you encounter and have to
> > make a decision about in order to fix the missing flags problem and
> > make a working AT_SYMLINK_NOFOLLOW.
> 
> And I'm generally supportive of that.  But we need to fix the damn
> bug first an then do nice to haves.

Would you be happy with a pair of patches where the first blocks chmod
of symlinks in chmod_common and the second adds the syscall with
flags? I think this is a clearly understandable fix, but it does
eliminate the ability to *fix* link access modes that have been set to
ridiculous values (note: I don't think it really matters since the
modes don't do anything anyway) in the past.

That's why I preferred to *start* with the forced-EOPNOTSUPP just in
the new interface, so that links won't inadvertently get bogus modes
set on them when libc starts using it. As long as some filesystems are
representing access modes in links (and returning them via stat), it
seems like there should be a way to "fix" any that were set in the
past. The patch as I've submitted it now is the least invasive change
in this sense; it does not take away any capability that already
existed.

Rich
Christoph Hellwig Sept. 11, 2020, 6:57 a.m. UTC | #10
On Thu, Sep 10, 2020 at 01:02:56PM -0400, Rich Felker wrote:
> Would you be happy with a pair of patches where the first blocks chmod
> of symlinks in chmod_common and the second adds the syscall with
> flags? I think this is a clearly understandable fix, but it does
> eliminate the ability to *fix* link access modes that have been set to
> ridiculous values (note: I don't think it really matters since the
> modes don't do anything anyway) in the past.

I'd be much happier with that, yes.
diff mbox series

Patch

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index ec8bed9e7b75..5648fa8be7a1 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@ 
 547	common	openat2				sys_openat2
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	faccessat2			sys_faccessat2
+550	common	fchmodat2			sys_fchmodat2
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 171077cbf419..b6b715bb3315 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -453,3 +453,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3b859596840d..b3b2019f8d16 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@ 
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		441
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9..cd0845f3c19f 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -887,6 +887,8 @@  __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_fchmodat2 440
+__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index f52a41f4c340..7c3f8564d0f3 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -360,3 +360,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 81fc799d8392..063d875377bf 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index b4e263916f41..6aea8a435fd0 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -445,3 +445,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a4..a9205843251d 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -378,3 +378,4 @@ 
 437	n32	openat2				sys_openat2
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	faccessat2			sys_faccessat2
+440	n32	fchmodat2			sys_fchmodat2
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 557f9954a2b9..31da28e2d6f3 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -354,3 +354,4 @@ 
 437	n64	openat2				sys_openat2
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	faccessat2			sys_faccessat2
+440	n64	fchmodat2			sys_fchmodat2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c8..af0e38302ed8 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -427,3 +427,4 @@ 
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	faccessat2			sys_faccessat2
+440	o32	fchmodat2			sys_fchmodat2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4..379cdb44ca0b 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -437,3 +437,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7b..ada11db506e6 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -529,3 +529,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 10456bc936fb..a4dae0abb353 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@ 
 437  common	openat2			sys_openat2			sys_openat2
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
+440  common	fchmodat2		sys_fchmodat2			sys_fchmodat2
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index ae0a00beea5f..b59b4408b85f 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4af114e84f20..e817416f81df 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -485,3 +485,4 @@ 
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 9d1102873666..208b06650cef 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -444,3 +444,4 @@ 
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	fchmodat2		sys_fchmodat2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a688..d9a591db72fb 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -361,6 +361,7 @@ 
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	common	fchmodat2		sys_fchmodat2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 6276e3c2d3fc..ff756cb2f5d7 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -410,3 +410,4 @@ 
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	fchmodat2			sys_fchmodat2
diff --git a/fs/open.c b/fs/open.c
index 9af548fb841b..570a21f4d81e 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -610,15 +610,30 @@  SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 	return err;
 }
 
-static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
+static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int flags)
 {
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_FOLLOW;
+
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & ~AT_SYMLINK_NOFOLLOW)
+		return -EINVAL;
 retry:
 	error = user_path_at(dfd, filename, lookup_flags, &path);
 	if (!error) {
-		error = chmod_common(&path, mode);
+		/* Block chmod from getting to fs layer. Ideally the
+		 * fs would either allow it or fail with EOPNOTSUPP,
+		 * but some are buggy and return an error but change
+		 * the mode, which is non-conforming and wrong.
+		 * Userspace emulation of AT_SYMLINK_NOFOLLOW in
+		 * glibc and musl blocked it too, for same reason. */
+		if (S_ISLNK(path.dentry->d_inode->i_mode)
+		    && (flags & AT_SYMLINK_NOFOLLOW))
+			error = -EOPNOTSUPP;
+		else
+			error = chmod_common(&path, mode);
 		path_put(&path);
 		if (retry_estale(error, lookup_flags)) {
 			lookup_flags |= LOOKUP_REVAL;
@@ -628,15 +643,21 @@  static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
 	return error;
 }
 
+SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename,
+		umode_t, mode, int, flags)
+{
+	return do_fchmodat(dfd, filename, mode, flags);
+}
+
 SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename,
 		umode_t, mode)
 {
-	return do_fchmodat(dfd, filename, mode);
+	return do_fchmodat(dfd, filename, mode, 0);
 }
 
 SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
 {
-	return do_fchmodat(AT_FDCWD, filename, mode);
+	return do_fchmodat(AT_FDCWD, filename, mode, 0);
 }
 
 int chown_common(const struct path *path, uid_t user, gid_t group)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 75ac7f8ae93c..ced00c56eba7 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -435,6 +435,8 @@  asmlinkage long sys_chroot(const char __user *filename);
 asmlinkage long sys_fchmod(unsigned int fd, umode_t mode);
 asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
 			     umode_t mode);
+asmlinkage long sys_fchmodat2(int dfd, const char __user * filename,
+			      umode_t mode, int flags);
 asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
 			     gid_t group, int flag);
 asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d..ebf5cdb3f444 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@  __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_fchmodat2 440
+__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different