diff mbox series

[RFC] f*xattr: allow O_PATH descriptors

Message ID 20220607153139.35588-1-cgzones@googlemail.com (mailing list archive)
State Handled Elsewhere
Delegated to: Paul Moore
Headers show
Series [RFC] f*xattr: allow O_PATH descriptors | expand

Commit Message

Christian Göttsche June 7, 2022, 3:31 p.m. UTC
From: Miklos Szeredi <mszeredi@redhat.com>

Support file descriptors obtained via O_PATH for extended attribute
operations.

Extended attributes are for example used by SELinux for the security
context of file objects. To avoid time-of-check-time-of-use issues while
setting those contexts it is advisable to pin the file in question and
operate on a file descriptor instead of the path name. This can be
emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
which might not be mounted e.g. inside of chroots, see[2].

[1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
[2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845

Original patch by Miklos Szeredi <mszeredi@redhat.com>
https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/

> While this carries a minute risk of someone relying on the property of
> xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> introducing another set of syscalls.
>
> Only file->f_path and file->f_inode are accessed in these functions.
>
> Current versions return EBADF, hence easy to detect the presense of
> this feature and fall back in case it's missing.

CC: linux-api@vger.kernel.org
CC: linux-man@vger.kernel.org
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
---
 fs/xattr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Amir Goldstein June 8, 2022, 5:13 a.m. UTC | #1
On Wed, Jun 8, 2022 at 5:23 AM Christian Göttsche
<cgzones@googlemail.com> wrote:
>
> From: Miklos Szeredi <mszeredi@redhat.com>
>
> Support file descriptors obtained via O_PATH for extended attribute
> operations.
>
> Extended attributes are for example used by SELinux for the security
> context of file objects. To avoid time-of-check-time-of-use issues while
> setting those contexts it is advisable to pin the file in question and
> operate on a file descriptor instead of the path name. This can be
> emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> which might not be mounted e.g. inside of chroots, see[2].
>
> [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
>
> Original patch by Miklos Szeredi <mszeredi@redhat.com>
> https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
>
> > While this carries a minute risk of someone relying on the property of
> > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > introducing another set of syscalls.

The bitter irony is that we now want to add another set of syscalls ;-)

https://lore.kernel.org/linux-fsdevel/CAOQ4uxiqG-w8s+zRqk945UtJcE4u0zjPhSs=MSYJ0jMLLjUTFg@mail.gmail.com/

> >
> > Only file->f_path and file->f_inode are accessed in these functions.
> >
> > Current versions return EBADF, hence easy to detect the presense of
> > this feature and fall back in case it's missing.
>
> CC: linux-api@vger.kernel.org
> CC: linux-man@vger.kernel.org
> Signed-off-by: Christian Göttsche <cgzones@googlemail.com>

I think it is important to inspect this with consistency of the UAPI in mind.
What I see is that fchdir(), fcntl(), fstat(), fstatat() already accept O_PATH
so surely they behave the same w.r.t old kernels and EBADF.
Those could all be better documented in their man pages.

w.r.t permission checks, this is no different than what *xattr() variants
already provide.

Therefore, I see no reason to object to this UAPI change.

You may add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

Thanks,
Amir.

> ---
>  fs/xattr.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xattr.c b/fs/xattr.c
> index e8dd03e4561e..16360ac4eb1b 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -656,7 +656,7 @@ SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
>  SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
>                 const void __user *,value, size_t, size, int, flags)
>  {
> -       struct fd f = fdget(fd);
> +       struct fd f = fdget_raw(fd);
>         int error = -EBADF;
>
>         if (!f.file)
> @@ -768,7 +768,7 @@ SYSCALL_DEFINE4(lgetxattr, const char __user *, pathname,
>  SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
>                 void __user *, value, size_t, size)
>  {
> -       struct fd f = fdget(fd);
> +       struct fd f = fdget_raw(fd);
>         ssize_t error = -EBADF;
>
>         if (!f.file)
> @@ -844,7 +844,7 @@ SYSCALL_DEFINE3(llistxattr, const char __user *, pathname, char __user *, list,
>
>  SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
>  {
> -       struct fd f = fdget(fd);
> +       struct fd f = fdget_raw(fd);
>         ssize_t error = -EBADF;
>
>         if (!f.file)
> @@ -910,7 +910,7 @@ SYSCALL_DEFINE2(lremovexattr, const char __user *, pathname,
>
>  SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
>  {
> -       struct fd f = fdget(fd);
> +       struct fd f = fdget_raw(fd);
>         int error = -EBADF;
>
>         if (!f.file)
> --
> 2.36.1
>
Christian Brauner June 8, 2022, 11:27 a.m. UTC | #2
On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> Support file descriptors obtained via O_PATH for extended attribute
> operations.
> 
> Extended attributes are for example used by SELinux for the security
> context of file objects. To avoid time-of-check-time-of-use issues while
> setting those contexts it is advisable to pin the file in question and
> operate on a file descriptor instead of the path name. This can be
> emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> which might not be mounted e.g. inside of chroots, see[2].
> 
> [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> 
> Original patch by Miklos Szeredi <mszeredi@redhat.com>
> https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> 
> > While this carries a minute risk of someone relying on the property of
> > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > introducing another set of syscalls.
> >
> > Only file->f_path and file->f_inode are accessed in these functions.
> >
> > Current versions return EBADF, hence easy to detect the presense of
> > this feature and fall back in case it's missing.
> 
> CC: linux-api@vger.kernel.org
> CC: linux-man@vger.kernel.org
> Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> ---

I'd be somewhat fine with getxattr and listxattr but I'm worried that
setxattr/removexattr waters down O_PATH semantics even more. I don't
want O_PATH fds to be useable for operations which are semantically
equivalent to a write.

In sensitive environments such as service management/container runtimes
we often send O_PATH fds around precisely because it is restricted what
they can be used for. I'd prefer to not to plug at this string.
Amir Goldstein June 8, 2022, 12:28 p.m. UTC | #3
On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > From: Miklos Szeredi <mszeredi@redhat.com>
> >
> > Support file descriptors obtained via O_PATH for extended attribute
> > operations.
> >
> > Extended attributes are for example used by SELinux for the security
> > context of file objects. To avoid time-of-check-time-of-use issues while
> > setting those contexts it is advisable to pin the file in question and
> > operate on a file descriptor instead of the path name. This can be
> > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > which might not be mounted e.g. inside of chroots, see[2].
> >
> > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> >
> > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> >
> > > While this carries a minute risk of someone relying on the property of
> > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > introducing another set of syscalls.
> > >
> > > Only file->f_path and file->f_inode are accessed in these functions.
> > >
> > > Current versions return EBADF, hence easy to detect the presense of
> > > this feature and fall back in case it's missing.
> >
> > CC: linux-api@vger.kernel.org
> > CC: linux-man@vger.kernel.org
> > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > ---
>
> I'd be somewhat fine with getxattr and listxattr but I'm worried that
> setxattr/removexattr waters down O_PATH semantics even more. I don't
> want O_PATH fds to be useable for operations which are semantically
> equivalent to a write.

It is not really semantically equivalent to a write if it works on a
O_RDONLY fd already.

>
> In sensitive environments such as service management/container runtimes
> we often send O_PATH fds around precisely because it is restricted what
> they can be used for. I'd prefer to not to plug at this string.

But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
are almost identical w.r.t permission checks and everything else.

So this change introduces nothing new that a user in said environment
cannot already accomplish with setxattr().

Besides, as the commit message said, doing setxattr() on an O_PATH
fd is already possible with setxattr("/proc/self/$fd"), so whatever security
hole you are trying to prevent is already wide open.

In effect, I think containing setxattr() can only be accomplished with LSM.

Thanks,
Amir.
Christian Brauner June 8, 2022, 12:48 p.m. UTC | #4
On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > From: Miklos Szeredi <mszeredi@redhat.com>
> > >
> > > Support file descriptors obtained via O_PATH for extended attribute
> > > operations.
> > >
> > > Extended attributes are for example used by SELinux for the security
> > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > setting those contexts it is advisable to pin the file in question and
> > > operate on a file descriptor instead of the path name. This can be
> > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > which might not be mounted e.g. inside of chroots, see[2].
> > >
> > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > >
> > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > >
> > > > While this carries a minute risk of someone relying on the property of
> > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > introducing another set of syscalls.
> > > >
> > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > >
> > > > Current versions return EBADF, hence easy to detect the presense of
> > > > this feature and fall back in case it's missing.
> > >
> > > CC: linux-api@vger.kernel.org
> > > CC: linux-man@vger.kernel.org
> > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > ---
> >
> > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > want O_PATH fds to be useable for operations which are semantically
> > equivalent to a write.
> 
> It is not really semantically equivalent to a write if it works on a
> O_RDONLY fd already.

The fact that it works on a O_RDONLY fd has always been weird. And is
probably a bug. If you look at xattr_permission() you can see that it
checks for MAY_WRITE for set operations... setxattr() writes to disk for
real filesystems. I don't know how much closer to a write this can get.

In general, one semantic aberration doesn't justify piling another one
on top.

(And one thing that speaks for O_RDONLY is at least that it actually
opens the file wheres O_PATH doesn't.)

> 
> >
> > In sensitive environments such as service management/container runtimes
> > we often send O_PATH fds around precisely because it is restricted what
> > they can be used for. I'd prefer to not to plug at this string.
> 
> But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> are almost identical w.r.t permission checks and everything else.
> 
> So this change introduces nothing new that a user in said environment
> cannot already accomplish with setxattr().
> 
> Besides, as the commit message said, doing setxattr() on an O_PATH
> fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> hole you are trying to prevent is already wide open.

That is very much a something that we're trying to restrict for this
exact reason and is one of the main motivator for upgrade mask in
openat2(). If I want to send a O_PATH around I want it to not be
upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
and part of the original patchset in [2]. O_PATH semantics don't need to
become weird.

[1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
[2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com
Amir Goldstein June 8, 2022, 3:12 p.m. UTC | #5
On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > >
> > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > operations.
> > > >
> > > > Extended attributes are for example used by SELinux for the security
> > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > setting those contexts it is advisable to pin the file in question and
> > > > operate on a file descriptor instead of the path name. This can be
> > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > which might not be mounted e.g. inside of chroots, see[2].
> > > >
> > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > >
> > > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > > >
> > > > > While this carries a minute risk of someone relying on the property of
> > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > introducing another set of syscalls.
> > > > >
> > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > >
> > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > this feature and fall back in case it's missing.
> > > >
> > > > CC: linux-api@vger.kernel.org
> > > > CC: linux-man@vger.kernel.org
> > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > > ---
> > >
> > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > want O_PATH fds to be useable for operations which are semantically
> > > equivalent to a write.
> >
> > It is not really semantically equivalent to a write if it works on a
> > O_RDONLY fd already.
>
> The fact that it works on a O_RDONLY fd has always been weird. And is
> probably a bug. If you look at xattr_permission() you can see that it

Bug or no bug, this is the UAPI. It is not fixable anymore.

> checks for MAY_WRITE for set operations... setxattr() writes to disk for
> real filesystems. I don't know how much closer to a write this can get.
>
> In general, one semantic aberration doesn't justify piling another one
> on top.
>
> (And one thing that speaks for O_RDONLY is at least that it actually
> opens the file wheres O_PATH doesn't.)

Ok. I care mostly about consistent UAPI, so if you want to set the
rule that modify f*() operations are not allowed to use O_PATH fd,
I can live with that, although fcntl(2) may be breaking that rule, but
fine by me.
It's good to have consistent rules and it's good to add a new UAPI for
new behavior.

However...

>
> >
> > >
> > > In sensitive environments such as service management/container runtimes
> > > we often send O_PATH fds around precisely because it is restricted what
> > > they can be used for. I'd prefer to not to plug at this string.
> >
> > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > are almost identical w.r.t permission checks and everything else.
> >
> > So this change introduces nothing new that a user in said environment
> > cannot already accomplish with setxattr().
> >
> > Besides, as the commit message said, doing setxattr() on an O_PATH
> > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > hole you are trying to prevent is already wide open.
>
> That is very much a something that we're trying to restrict for this
> exact reason and is one of the main motivator for upgrade mask in
> openat2(). If I want to send a O_PATH around I want it to not be
> upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> and part of the original patchset in [2]. O_PATH semantics don't need to
> become weird.
>
> [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com

... thinking forward, if this patch is going to be rejected, the patch that
will follow is *xattrat() syscalls.

What will you be able to argue then?

There are several *at() syscalls that modify metadata.
fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.

Do you intend to try and block setxattrat()?
Just try and block setxattrat(.., AT_EMPTY_PATH)?
those *at() syscalls have real use cases to avoid TOCTOU races.
Do you propose that applications will have to use fsetxattr() on an open
file to avert races?

I completely understand the idea behind upgrade masks
for limiting f_mode, but I don't know if trying to retroactively
change semantics of setxattr() in the move to setxattrat()
is going to be a good idea.

And forgive me if I am failing to see the big picture.
It is certainly a possibility.

Thanks,
Amir.
Andreas Dilger June 8, 2022, 4:53 p.m. UTC | #6
On Jun 7, 2022, at 9:31 AM, Christian Göttsche <cgzones@googlemail.com> wrote:
> 
> From: Miklos Szeredi <mszeredi@redhat.com>
> 
> Support file descriptors obtained via O_PATH for extended attribute
> operations.
> 
> Extended attributes are for example used by SELinux for the security
> context of file objects. To avoid time-of-check-time-of-use issues while
> setting those contexts it is advisable to pin the file in question and
> operate on a file descriptor instead of the path name. This can be
> emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> which might not be mounted e.g. inside of chroots, see[2].

Will this allow get/set xattrs directly on symlinks?  That is one problem
that we have with some of the xattrs that are inherited on symlinks, but
there is no way to change them.  Allowing setxattr directly on a symlink
would be very useful.

Cheers, Andreas

> [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> 
> Original patch by Miklos Szeredi <mszeredi@redhat.com>
> https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> 
>> While this carries a minute risk of someone relying on the property of
>> xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
>> introducing another set of syscalls.
>> 
>> Only file->f_path and file->f_inode are accessed in these functions.
>> 
>> Current versions return EBADF, hence easy to detect the presense of
>> this feature and fall back in case it's missing.
> 
> CC: linux-api@vger.kernel.org
> CC: linux-man@vger.kernel.org
> Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> ---
> fs/xattr.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index e8dd03e4561e..16360ac4eb1b 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -656,7 +656,7 @@ SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
> SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
> 		const void __user *,value, size_t, size, int, flags)
> {
> -	struct fd f = fdget(fd);
> +	struct fd f = fdget_raw(fd);
> 	int error = -EBADF;
> 
> 	if (!f.file)
> @@ -768,7 +768,7 @@ SYSCALL_DEFINE4(lgetxattr, const char __user *, pathname,
> SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
> 		void __user *, value, size_t, size)
> {
> -	struct fd f = fdget(fd);
> +	struct fd f = fdget_raw(fd);
> 	ssize_t error = -EBADF;
> 
> 	if (!f.file)
> @@ -844,7 +844,7 @@ SYSCALL_DEFINE3(llistxattr, const char __user *, pathname, char __user *, list,
> 
> SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
> {
> -	struct fd f = fdget(fd);
> +	struct fd f = fdget_raw(fd);
> 	ssize_t error = -EBADF;
> 
> 	if (!f.file)
> @@ -910,7 +910,7 @@ SYSCALL_DEFINE2(lremovexattr, const char __user *, pathname,
> 
> SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
> {
> -	struct fd f = fdget(fd);
> +	struct fd f = fdget_raw(fd);
> 	int error = -EBADF;
> 
> 	if (!f.file)
> --
> 2.36.1
> 


Cheers, Andreas
Amir Goldstein June 9, 2022, 4:35 a.m. UTC | #7
On Wed, Jun 8, 2022 at 9:01 PM Andreas Dilger <adilger@dilger.ca> wrote:
>
> On Jun 7, 2022, at 9:31 AM, Christian Göttsche <cgzones@googlemail.com> wrote:
> >
> > From: Miklos Szeredi <mszeredi@redhat.com>
> >
> > Support file descriptors obtained via O_PATH for extended attribute
> > operations.
> >
> > Extended attributes are for example used by SELinux for the security
> > context of file objects. To avoid time-of-check-time-of-use issues while
> > setting those contexts it is advisable to pin the file in question and
> > operate on a file descriptor instead of the path name. This can be
> > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > which might not be mounted e.g. inside of chroots, see[2].
>
> Will this allow get/set xattrs directly on symlinks?  That is one problem
> that we have with some of the xattrs that are inherited on symlinks, but
> there is no way to change them.  Allowing setxattr directly on a symlink
> would be very useful.

It is possible.
See: https://github.com/libfuse/libfuse/pull/514

That's why Miklos withdrew this patch:
https://lore.kernel.org/linux-fsdevel/CAOssrKeV7g0wPg4ozspG4R7a+5qARqWdG+GxWtXB-MCfbVM=9A@mail.gmail.com/

Thanks,
Amir.
Christian Brauner June 9, 2022, 8:56 a.m. UTC | #8
On Wed, Jun 08, 2022 at 06:12:29PM +0300, Amir Goldstein wrote:
> On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > >
> > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > operations.
> > > > >
> > > > > Extended attributes are for example used by SELinux for the security
> > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > setting those contexts it is advisable to pin the file in question and
> > > > > operate on a file descriptor instead of the path name. This can be
> > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > which might not be mounted e.g. inside of chroots, see[2].
> > > > >
> > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > >
> > > > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > > > >
> > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > introducing another set of syscalls.
> > > > > >
> > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > >
> > > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > > this feature and fall back in case it's missing.
> > > > >
> > > > > CC: linux-api@vger.kernel.org
> > > > > CC: linux-man@vger.kernel.org
> > > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > > > ---
> > > >
> > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > want O_PATH fds to be useable for operations which are semantically
> > > > equivalent to a write.
> > >
> > > It is not really semantically equivalent to a write if it works on a
> > > O_RDONLY fd already.
> >
> > The fact that it works on a O_RDONLY fd has always been weird. And is
> > probably a bug. If you look at xattr_permission() you can see that it
> 
> Bug or no bug, this is the UAPI. It is not fixable anymore.
> 
> > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > real filesystems. I don't know how much closer to a write this can get.
> >
> > In general, one semantic aberration doesn't justify piling another one
> > on top.
> >
> > (And one thing that speaks for O_RDONLY is at least that it actually
> > opens the file wheres O_PATH doesn't.)
> 
> Ok. I care mostly about consistent UAPI, so if you want to set the
> rule that modify f*() operations are not allowed to use O_PATH fd,
> I can live with that, although fcntl(2) may be breaking that rule, but
> fine by me.
> It's good to have consistent rules and it's good to add a new UAPI for
> new behavior.
> 
> However...
> 
> >
> > >
> > > >
> > > > In sensitive environments such as service management/container runtimes
> > > > we often send O_PATH fds around precisely because it is restricted what
> > > > they can be used for. I'd prefer to not to plug at this string.
> > >
> > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > are almost identical w.r.t permission checks and everything else.
> > >
> > > So this change introduces nothing new that a user in said environment
> > > cannot already accomplish with setxattr().
> > >
> > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > > hole you are trying to prevent is already wide open.
> >
> > That is very much a something that we're trying to restrict for this
> > exact reason and is one of the main motivator for upgrade mask in
> > openat2(). If I want to send a O_PATH around I want it to not be
> > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > and part of the original patchset in [2]. O_PATH semantics don't need to
> > become weird.
> >
> > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com
> 
> ... thinking forward, if this patch is going to be rejected, the patch that
> will follow is *xattrat() syscalls.
> 
> What will you be able to argue then?
> 
> There are several *at() syscalls that modify metadata.
> fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> 
> Do you intend to try and block setxattrat()?
> Just try and block setxattrat(.., AT_EMPTY_PATH)?
> those *at() syscalls have real use cases to avoid TOCTOU races.
> Do you propose that applications will have to use fsetxattr() on an open
> file to avert races?
> 
> I completely understand the idea behind upgrade masks
> for limiting f_mode, but I don't know if trying to retroactively
> change semantics of setxattr() in the move to setxattrat()
> is going to be a good idea.
> 
> And forgive me if I am failing to see the big picture.
> It is certainly a possibility.

The big picture is that we should not water down O_PATH to the point
where it can be used for almost any operation.

Just looking at your point about fchown*(), the open(2) manpage
still notes:

    "Obtain a file descriptor that can be used for two purposes: to indicate
    a location in the filesystem tree and to perform operations that act
    purely at  the  file  descriptor level.  The file itself is not opened,
    and other file operations (e.g., read(2), write(2), fchmod(2),
    fchown(2), fgetxattr(2), ioctl(2), mmap(2)) fail with the error EBADF."

so it explicitly mentions that *fch{own,mod}*() operations
__and fgetxattr()__
should fail and in fact that any non-fd level operations should fail...

Yet we are watering that contract down to the point where the
distinction between an O_PATH fd and a regular fd becomes more and more
meaningless

That's a point completely separate from upgrade masks; which are there
to make O_PATH fds more meaningful.
Christian Göttsche June 9, 2022, 9:14 a.m. UTC | #9
On Thu, 9 Jun 2022 at 06:36, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Jun 8, 2022 at 9:01 PM Andreas Dilger <adilger@dilger.ca> wrote:
> >
> > On Jun 7, 2022, at 9:31 AM, Christian Göttsche <cgzones@googlemail.com> wrote:
> > >
> > > From: Miklos Szeredi <mszeredi@redhat.com>
> > >
> > > Support file descriptors obtained via O_PATH for extended attribute
> > > operations.
> > >
> > > Extended attributes are for example used by SELinux for the security
> > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > setting those contexts it is advisable to pin the file in question and
> > > operate on a file descriptor instead of the path name. This can be
> > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > which might not be mounted e.g. inside of chroots, see[2].
> >
> > Will this allow get/set xattrs directly on symlinks?  That is one problem
> > that we have with some of the xattrs that are inherited on symlinks, but
> > there is no way to change them.  Allowing setxattr directly on a symlink
> > would be very useful.
>
> It is possible.
> See: https://github.com/libfuse/libfuse/pull/514
>

Does it really? (It should not since xtattr(7) mentions some quota
related issues.)
In my tests setting extended attributes via O_PATH on symlinks fails
with ENOTSUP (even as root), except for special ones, like
"security.selinux".

> That's why Miklos withdrew this patch:
> https://lore.kernel.org/linux-fsdevel/CAOssrKeV7g0wPg4ozspG4R7a+5qARqWdG+GxWtXB-MCfbVM=9A@mail.gmail.com/
>
> Thanks,
> Amir.
Aleksa Sarai June 18, 2022, 3:18 a.m. UTC | #10
On 2022-06-08, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > >
> > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > operations.
> > > > >
> > > > > Extended attributes are for example used by SELinux for the security
> > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > setting those contexts it is advisable to pin the file in question and
> > > > > operate on a file descriptor instead of the path name. This can be
> > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > which might not be mounted e.g. inside of chroots, see[2].
> > > > >
> > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > >
> > > > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > > > >
> > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > introducing another set of syscalls.
> > > > > >
> > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > >
> > > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > > this feature and fall back in case it's missing.
> > > > >
> > > > > CC: linux-api@vger.kernel.org
> > > > > CC: linux-man@vger.kernel.org
> > > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > > > ---
> > > >
> > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > want O_PATH fds to be useable for operations which are semantically
> > > > equivalent to a write.
> > >
> > > It is not really semantically equivalent to a write if it works on a
> > > O_RDONLY fd already.
> >
> > The fact that it works on a O_RDONLY fd has always been weird. And is
> > probably a bug. If you look at xattr_permission() you can see that it
> 
> Bug or no bug, this is the UAPI. It is not fixable anymore.
> 
> > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > real filesystems. I don't know how much closer to a write this can get.
> >
> > In general, one semantic aberration doesn't justify piling another one
> > on top.
> >
> > (And one thing that speaks for O_RDONLY is at least that it actually
> > opens the file wheres O_PATH doesn't.)
> 
> Ok. I care mostly about consistent UAPI, so if you want to set the
> rule that modify f*() operations are not allowed to use O_PATH fd,
> I can live with that, although fcntl(2) may be breaking that rule, but
> fine by me.
> It's good to have consistent rules and it's good to add a new UAPI for
> new behavior.
> 
> However...
> 
> >
> > >
> > > >
> > > > In sensitive environments such as service management/container runtimes
> > > > we often send O_PATH fds around precisely because it is restricted what
> > > > they can be used for. I'd prefer to not to plug at this string.
> > >
> > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > are almost identical w.r.t permission checks and everything else.
> > >
> > > So this change introduces nothing new that a user in said environment
> > > cannot already accomplish with setxattr().
> > >
> > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > > hole you are trying to prevent is already wide open.
> >
> > That is very much a something that we're trying to restrict for this
> > exact reason and is one of the main motivator for upgrade mask in
> > openat2(). If I want to send a O_PATH around I want it to not be
> > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > and part of the original patchset in [2]. O_PATH semantics don't need to
> > become weird.
> >
> > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com
> 
> ... thinking forward, if this patch is going to be rejected, the patch that
> will follow is *xattrat() syscalls.
> 
> What will you be able to argue then?
> 
> There are several *at() syscalls that modify metadata.
> fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> 
> Do you intend to try and block setxattrat()?
> Just try and block setxattrat(.., AT_EMPTY_PATH)?
> those *at() syscalls have real use cases to avoid TOCTOU races.
> Do you propose that applications will have to use fsetxattr() on an open
> file to avert races?
> 
> I completely understand the idea behind upgrade masks
> for limiting f_mode, but I don't know if trying to retroactively
> change semantics of setxattr() in the move to setxattrat()
> is going to be a good idea.

The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
foo(/proc/self/fd/<fd>) should always be identical, and the current
semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
assume that keeping them makes sense (the most obvious example is being
able to do tricks to open /proc/$pid/exe as O_RDWR).

I suspect that the long-term solution would be to have more upgrade
masks so that userspace can opt-in to not allowing any kind of
(metadata) write access through a particular file descriptor. You're
quite right that we have several metadata write AT_EMPTY_PATH APIs, and
so we can't retroactively block /everything/ but we should try to come
up with less leaky rules by default if it won't break userspace.
Amir Goldstein June 18, 2022, 9:11 a.m. UTC | #11
On Sat, Jun 18, 2022 at 6:18 AM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> On 2022-06-08, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > >
> > > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > >
> > > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > > operations.
> > > > > >
> > > > > > Extended attributes are for example used by SELinux for the security
> > > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > > setting those contexts it is advisable to pin the file in question and
> > > > > > operate on a file descriptor instead of the path name. This can be
> > > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > > which might not be mounted e.g. inside of chroots, see[2].
> > > > > >
> > > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > > >
> > > > > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > > > > >
> > > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > > introducing another set of syscalls.
> > > > > > >
> > > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > > >
> > > > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > > > this feature and fall back in case it's missing.
> > > > > >
> > > > > > CC: linux-api@vger.kernel.org
> > > > > > CC: linux-man@vger.kernel.org
> > > > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > > > > ---
> > > > >
> > > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > > want O_PATH fds to be useable for operations which are semantically
> > > > > equivalent to a write.
> > > >
> > > > It is not really semantically equivalent to a write if it works on a
> > > > O_RDONLY fd already.
> > >
> > > The fact that it works on a O_RDONLY fd has always been weird. And is
> > > probably a bug. If you look at xattr_permission() you can see that it
> >
> > Bug or no bug, this is the UAPI. It is not fixable anymore.
> >
> > > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > > real filesystems. I don't know how much closer to a write this can get.
> > >
> > > In general, one semantic aberration doesn't justify piling another one
> > > on top.
> > >
> > > (And one thing that speaks for O_RDONLY is at least that it actually
> > > opens the file wheres O_PATH doesn't.)
> >
> > Ok. I care mostly about consistent UAPI, so if you want to set the
> > rule that modify f*() operations are not allowed to use O_PATH fd,
> > I can live with that, although fcntl(2) may be breaking that rule, but
> > fine by me.
> > It's good to have consistent rules and it's good to add a new UAPI for
> > new behavior.
> >
> > However...
> >
> > >
> > > >
> > > > >
> > > > > In sensitive environments such as service management/container runtimes
> > > > > we often send O_PATH fds around precisely because it is restricted what
> > > > > they can be used for. I'd prefer to not to plug at this string.
> > > >
> > > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > > are almost identical w.r.t permission checks and everything else.
> > > >
> > > > So this change introduces nothing new that a user in said environment
> > > > cannot already accomplish with setxattr().
> > > >
> > > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > > > hole you are trying to prevent is already wide open.
> > >
> > > That is very much a something that we're trying to restrict for this
> > > exact reason and is one of the main motivator for upgrade mask in
> > > openat2(). If I want to send a O_PATH around I want it to not be
> > > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > > and part of the original patchset in [2]. O_PATH semantics don't need to
> > > become weird.
> > >
> > > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com
> >
> > ... thinking forward, if this patch is going to be rejected, the patch that
> > will follow is *xattrat() syscalls.
> >
> > What will you be able to argue then?
> >
> > There are several *at() syscalls that modify metadata.
> > fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> >
> > Do you intend to try and block setxattrat()?
> > Just try and block setxattrat(.., AT_EMPTY_PATH)?
> > those *at() syscalls have real use cases to avoid TOCTOU races.
> > Do you propose that applications will have to use fsetxattr() on an open
> > file to avert races?
> >
> > I completely understand the idea behind upgrade masks
> > for limiting f_mode, but I don't know if trying to retroactively
> > change semantics of setxattr() in the move to setxattrat()
> > is going to be a good idea.
>
> The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> foo(/proc/self/fd/<fd>) should always be identical, and the current
> semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> assume that keeping them makes sense (the most obvious example is being
> able to do tricks to open /proc/$pid/exe as O_RDWR).

Please make a note that I have applications relying on current magic symlink
semantics w.r.t setxattr() and other metadata operations, and the libselinux
commit linked from the patch commit message proves that magic symlink
semantics are used in the wild, so it is not likely that those semantics could
be changed, unless userspace breakage could be justified by fixing a serious
security issue (i.e. open /proc/$pid/exe as O_RDWR).

>
> I suspect that the long-term solution would be to have more upgrade
> masks so that userspace can opt-in to not allowing any kind of
> (metadata) write access through a particular file descriptor. You're
> quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> so we can't retroactively block /everything/ but we should try to come
> up with less leaky rules by default if it won't break userspace.
>

Ok, let me try to say this in my own words using an example to see that
we are all on the same page:

- lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
- fsetxattr(fd,...) is not applicable for symbolic links
- setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
  when setting xattr on symbolic links
- setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the
  "new API" for setting xattr on symlinks (and special files)
- The new API is going to be more strict than the old magic symlink API
- *If* it turns out to not break user applications, old API can also become
  more strict to align with new API (unlikely the case for setxattr())
- This will allow sandboxed containers to opt-out of the "old API", by
  restricting access to /proc/self/fd and to implement more fine grained
  control over which metadata operations are allowed on an O_PATH fd

Did I understand the plan correctly?
Do you agree with me that the plan to keep AT_EMPTY_PATH and
magic symlink semantics may not be realistic?

Thanks,
Amir.
Christian Göttsche June 18, 2022, 11:19 a.m. UTC | #12
On Sat, 18 Jun 2022 at 11:11, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sat, Jun 18, 2022 at 6:18 AM Aleksa Sarai <cyphar@cyphar.com> wrote:
> >
> > On 2022-06-08, Amir Goldstein <amir73il@gmail.com> wrote:
> > > On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > > >
> > > > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > > > operations.
> > > > > > >
> > > > > > > Extended attributes are for example used by SELinux for the security
> > > > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > > > setting those contexts it is advisable to pin the file in question and
> > > > > > > operate on a file descriptor instead of the path name. This can be
> > > > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > > > which might not be mounted e.g. inside of chroots, see[2].
> > > > > > >
> > > > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > > > >
> > > > > > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > > > > > >
> > > > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > > > introducing another set of syscalls.
> > > > > > > >
> > > > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > > > >
> > > > > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > > > > this feature and fall back in case it's missing.
> > > > > > >
> > > > > > > CC: linux-api@vger.kernel.org
> > > > > > > CC: linux-man@vger.kernel.org
> > > > > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > > > > > ---
> > > > > >
> > > > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > > > want O_PATH fds to be useable for operations which are semantically
> > > > > > equivalent to a write.
> > > > >
> > > > > It is not really semantically equivalent to a write if it works on a
> > > > > O_RDONLY fd already.
> > > >
> > > > The fact that it works on a O_RDONLY fd has always been weird. And is
> > > > probably a bug. If you look at xattr_permission() you can see that it
> > >
> > > Bug or no bug, this is the UAPI. It is not fixable anymore.
> > >
> > > > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > > > real filesystems. I don't know how much closer to a write this can get.
> > > >
> > > > In general, one semantic aberration doesn't justify piling another one
> > > > on top.
> > > >
> > > > (And one thing that speaks for O_RDONLY is at least that it actually
> > > > opens the file wheres O_PATH doesn't.)
> > >
> > > Ok. I care mostly about consistent UAPI, so if you want to set the
> > > rule that modify f*() operations are not allowed to use O_PATH fd,
> > > I can live with that, although fcntl(2) may be breaking that rule, but
> > > fine by me.
> > > It's good to have consistent rules and it's good to add a new UAPI for
> > > new behavior.
> > >
> > > However...
> > >
> > > >
> > > > >
> > > > > >
> > > > > > In sensitive environments such as service management/container runtimes
> > > > > > we often send O_PATH fds around precisely because it is restricted what
> > > > > > they can be used for. I'd prefer to not to plug at this string.
> > > > >
> > > > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > > > are almost identical w.r.t permission checks and everything else.
> > > > >
> > > > > So this change introduces nothing new that a user in said environment
> > > > > cannot already accomplish with setxattr().
> > > > >
> > > > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > > > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > > > > hole you are trying to prevent is already wide open.
> > > >
> > > > That is very much a something that we're trying to restrict for this
> > > > exact reason and is one of the main motivator for upgrade mask in
> > > > openat2(). If I want to send a O_PATH around I want it to not be
> > > > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > > > and part of the original patchset in [2]. O_PATH semantics don't need to
> > > > become weird.
> > > >
> > > > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > > > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com
> > >
> > > ... thinking forward, if this patch is going to be rejected, the patch that
> > > will follow is *xattrat() syscalls.
> > >
> > > What will you be able to argue then?
> > >
> > > There are several *at() syscalls that modify metadata.
> > > fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> > >
> > > Do you intend to try and block setxattrat()?
> > > Just try and block setxattrat(.., AT_EMPTY_PATH)?
> > > those *at() syscalls have real use cases to avoid TOCTOU races.
> > > Do you propose that applications will have to use fsetxattr() on an open
> > > file to avert races?
> > >
> > > I completely understand the idea behind upgrade masks
> > > for limiting f_mode, but I don't know if trying to retroactively
> > > change semantics of setxattr() in the move to setxattrat()
> > > is going to be a good idea.
> >
> > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > assume that keeping them makes sense (the most obvious example is being
> > able to do tricks to open /proc/$pid/exe as O_RDWR).
>
> Please make a note that I have applications relying on current magic symlink
> semantics w.r.t setxattr() and other metadata operations, and the libselinux
> commit linked from the patch commit message proves that magic symlink
> semantics are used in the wild, so it is not likely that those semantics could
> be changed, unless userspace breakage could be justified by fixing a serious
> security issue (i.e. open /proc/$pid/exe as O_RDWR).
>
> >
> > I suspect that the long-term solution would be to have more upgrade
> > masks so that userspace can opt-in to not allowing any kind of
> > (metadata) write access through a particular file descriptor. You're
> > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > so we can't retroactively block /everything/ but we should try to come
> > up with less leaky rules by default if it won't break userspace.
> >
>
> Ok, let me try to say this in my own words using an example to see that
> we are all on the same page:
>
> - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> - fsetxattr(fd,...) is not applicable for symbolic links

fsetxattr(2) works on symbolic links, e.g. for "security.selinux",
except for the user namespace:

https://github.com/torvalds/linux/blob/4b35035bcf80ddb47c0112c4fbd84a63a2836a18/fs/xattr.c#L124-L136
/*
* In the user.* namespace, only regular files and directories can have
* extended attributes. For sticky directories, only the owner and
* privileged users can write attributes.
*/
if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
    if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
        return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
    if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
       (mask & MAY_WRITE) &&
        !inode_owner_or_capable(mnt_userns, inode))
        return -EPERM;
}

Currently it just does not support O_PATH file descriptors.
And with O_RDONLY setting extended attributes is supported as well
(fsetxattr(2) does not require O_RDWR or O_WRONLY).

> - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
>   when setting xattr on symbolic links
> - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the
>   "new API" for setting xattr on symlinks (and special files)
> - The new API is going to be more strict than the old magic symlink API
> - *If* it turns out to not break user applications, old API can also become
>   more strict to align with new API (unlikely the case for setxattr())
> - This will allow sandboxed containers to opt-out of the "old API", by
>   restricting access to /proc/self/fd and to implement more fine grained
>   control over which metadata operations are allowed on an O_PATH fd
>
> Did I understand the plan correctly?
> Do you agree with me that the plan to keep AT_EMPTY_PATH and
> magic symlink semantics may not be realistic?
>
> Thanks,
> Amir.
Amir Goldstein June 18, 2022, 3:30 p.m. UTC | #13
On Sat, Jun 18, 2022 at 2:19 PM Christian Göttsche
<cgzones@googlemail.com> wrote:
>
> On Sat, 18 Jun 2022 at 11:11, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Sat, Jun 18, 2022 at 6:18 AM Aleksa Sarai <cyphar@cyphar.com> wrote:
> > >
> > > On 2022-06-08, Amir Goldstein <amir73il@gmail.com> wrote:
> > > > On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > >
> > > > > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > > > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > > > >
> > > > > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > > > > operations.
> > > > > > > >
> > > > > > > > Extended attributes are for example used by SELinux for the security
> > > > > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > > > > setting those contexts it is advisable to pin the file in question and
> > > > > > > > operate on a file descriptor instead of the path name. This can be
> > > > > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > > > > which might not be mounted e.g. inside of chroots, see[2].
> > > > > > > >
> > > > > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > > > > >
> > > > > > > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > > > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > > > > > > >
> > > > > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > > > > introducing another set of syscalls.
> > > > > > > > >
> > > > > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > > > > >
> > > > > > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > > > > > this feature and fall back in case it's missing.
> > > > > > > >
> > > > > > > > CC: linux-api@vger.kernel.org
> > > > > > > > CC: linux-man@vger.kernel.org
> > > > > > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > > > > > > ---
> > > > > > >
> > > > > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > > > > want O_PATH fds to be useable for operations which are semantically
> > > > > > > equivalent to a write.
> > > > > >
> > > > > > It is not really semantically equivalent to a write if it works on a
> > > > > > O_RDONLY fd already.
> > > > >
> > > > > The fact that it works on a O_RDONLY fd has always been weird. And is
> > > > > probably a bug. If you look at xattr_permission() you can see that it
> > > >
> > > > Bug or no bug, this is the UAPI. It is not fixable anymore.
> > > >
> > > > > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > > > > real filesystems. I don't know how much closer to a write this can get.
> > > > >
> > > > > In general, one semantic aberration doesn't justify piling another one
> > > > > on top.
> > > > >
> > > > > (And one thing that speaks for O_RDONLY is at least that it actually
> > > > > opens the file wheres O_PATH doesn't.)
> > > >
> > > > Ok. I care mostly about consistent UAPI, so if you want to set the
> > > > rule that modify f*() operations are not allowed to use O_PATH fd,
> > > > I can live with that, although fcntl(2) may be breaking that rule, but
> > > > fine by me.
> > > > It's good to have consistent rules and it's good to add a new UAPI for
> > > > new behavior.
> > > >
> > > > However...
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > In sensitive environments such as service management/container runtimes
> > > > > > > we often send O_PATH fds around precisely because it is restricted what
> > > > > > > they can be used for. I'd prefer to not to plug at this string.
> > > > > >
> > > > > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > > > > are almost identical w.r.t permission checks and everything else.
> > > > > >
> > > > > > So this change introduces nothing new that a user in said environment
> > > > > > cannot already accomplish with setxattr().
> > > > > >
> > > > > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > > > > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > > > > > hole you are trying to prevent is already wide open.
> > > > >
> > > > > That is very much a something that we're trying to restrict for this
> > > > > exact reason and is one of the main motivator for upgrade mask in
> > > > > openat2(). If I want to send a O_PATH around I want it to not be
> > > > > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > > > > and part of the original patchset in [2]. O_PATH semantics don't need to
> > > > > become weird.
> > > > >
> > > > > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > > > > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com
> > > >
> > > > ... thinking forward, if this patch is going to be rejected, the patch that
> > > > will follow is *xattrat() syscalls.
> > > >
> > > > What will you be able to argue then?
> > > >
> > > > There are several *at() syscalls that modify metadata.
> > > > fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> > > >
> > > > Do you intend to try and block setxattrat()?
> > > > Just try and block setxattrat(.., AT_EMPTY_PATH)?
> > > > those *at() syscalls have real use cases to avoid TOCTOU races.
> > > > Do you propose that applications will have to use fsetxattr() on an open
> > > > file to avert races?
> > > >
> > > > I completely understand the idea behind upgrade masks
> > > > for limiting f_mode, but I don't know if trying to retroactively
> > > > change semantics of setxattr() in the move to setxattrat()
> > > > is going to be a good idea.
> > >
> > > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > > assume that keeping them makes sense (the most obvious example is being
> > > able to do tricks to open /proc/$pid/exe as O_RDWR).
> >
> > Please make a note that I have applications relying on current magic symlink
> > semantics w.r.t setxattr() and other metadata operations, and the libselinux
> > commit linked from the patch commit message proves that magic symlink
> > semantics are used in the wild, so it is not likely that those semantics could
> > be changed, unless userspace breakage could be justified by fixing a serious
> > security issue (i.e. open /proc/$pid/exe as O_RDWR).
> >
> > >
> > > I suspect that the long-term solution would be to have more upgrade
> > > masks so that userspace can opt-in to not allowing any kind of
> > > (metadata) write access through a particular file descriptor. You're
> > > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > > so we can't retroactively block /everything/ but we should try to come
> > > up with less leaky rules by default if it won't break userspace.
> > >
> >
> > Ok, let me try to say this in my own words using an example to see that
> > we are all on the same page:
> >
> > - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> > - fsetxattr(fd,...) is not applicable for symbolic links
>
> fsetxattr(2) works on symbolic links, e.g. for "security.selinux",
> except for the user namespace:
>
> https://github.com/torvalds/linux/blob/4b35035bcf80ddb47c0112c4fbd84a63a2836a18/fs/xattr.c#L124-L136
> /*
> * In the user.* namespace, only regular files and directories can have
> * extended attributes. For sticky directories, only the owner and
> * privileged users can write attributes.
> */
> if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
>     if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
>         return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
>     if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
>        (mask & MAY_WRITE) &&
>         !inode_owner_or_capable(mnt_userns, inode))
>         return -EPERM;
> }
>
> Currently it just does not support O_PATH file descriptors.
> And with O_RDONLY setting extended attributes is supported as well
> (fsetxattr(2) does not require O_RDWR or O_WRONLY).

But it is not possible to get a O_RDONLY fd for a symlink object, is it?
That's why I wrote "fsetxattr() is not applicable for symbolic links".
And then libselinux is left to choose between one API that is racy (lsetxattr)
and another API that does not work in containers (magic symlink).
They need to be give a proper API.

>
> > - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
> >   when setting xattr on symbolic links
> > - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the
> >   "new API" for setting xattr on symlinks (and special files)
> > - The new API is going to be more strict than the old magic symlink API
> > - *If* it turns out to not break user applications, old API can also become
> >   more strict to align with new API (unlikely the case for setxattr())
> > - This will allow sandboxed containers to opt-out of the "old API", by
> >   restricting access to /proc/self/fd and to implement more fine grained
> >   control over which metadata operations are allowed on an O_PATH fd
> >
> > Did I understand the plan correctly?
> > Do you agree with me that the plan to keep AT_EMPTY_PATH and
> > magic symlink semantics may not be realistic?

Sorry, this gave out messy.
This was supposed to ask whether this part of the plan:
"semantics of fooat(<fd>, AT_EMPTY_PATH) and foo(/proc/self/fd/<fd>)
should always be identical" is realistic, given that applications
already depend on existing setxattr(MAGIC_SYMLINK) semantics.

IMO, it should be fine to keep the same semantic and allow
setxattrat() on O_PATH fd (subject to xattr_permission() of course),
as long as open masks could be added to further restrict the O_PATH
fd from being used for setxattr() in the future.

IOW, we don't have to think of O_PATH as the "least capable" fd mode.
It is just a mode that that is not capable for data manipulations, but
there could be even less capable fd modes that are also not capable
for metadata manipulations such as setxattrat() or chownat().

Does that make sense?

Thanks,
Amir.
Aleksa Sarai June 20, 2022, 6:07 a.m. UTC | #14
On 2022-06-18, Amir Goldstein <amir73il@gmail.com> wrote:
> On Sat, Jun 18, 2022 at 6:18 AM Aleksa Sarai <cyphar@cyphar.com> wrote:
> >
> > On 2022-06-08, Amir Goldstein <amir73il@gmail.com> wrote:
> > > On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > > > From: Miklos Szeredi <mszeredi@redhat.com>
> > > > > > >
> > > > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > > > operations.
> > > > > > >
> > > > > > > Extended attributes are for example used by SELinux for the security
> > > > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > > > setting those contexts it is advisable to pin the file in question and
> > > > > > > operate on a file descriptor instead of the path name. This can be
> > > > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > > > which might not be mounted e.g. inside of chroots, see[2].
> > > > > > >
> > > > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > > > >
> > > > > > > Original patch by Miklos Szeredi <mszeredi@redhat.com>
> > > > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@redhat.com/
> > > > > > >
> > > > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > > > introducing another set of syscalls.
> > > > > > > >
> > > > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > > > >
> > > > > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > > > > this feature and fall back in case it's missing.
> > > > > > >
> > > > > > > CC: linux-api@vger.kernel.org
> > > > > > > CC: linux-man@vger.kernel.org
> > > > > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> > > > > > > ---
> > > > > >
> > > > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > > > want O_PATH fds to be useable for operations which are semantically
> > > > > > equivalent to a write.
> > > > >
> > > > > It is not really semantically equivalent to a write if it works on a
> > > > > O_RDONLY fd already.
> > > >
> > > > The fact that it works on a O_RDONLY fd has always been weird. And is
> > > > probably a bug. If you look at xattr_permission() you can see that it
> > >
> > > Bug or no bug, this is the UAPI. It is not fixable anymore.
> > >
> > > > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > > > real filesystems. I don't know how much closer to a write this can get.
> > > >
> > > > In general, one semantic aberration doesn't justify piling another one
> > > > on top.
> > > >
> > > > (And one thing that speaks for O_RDONLY is at least that it actually
> > > > opens the file wheres O_PATH doesn't.)
> > >
> > > Ok. I care mostly about consistent UAPI, so if you want to set the
> > > rule that modify f*() operations are not allowed to use O_PATH fd,
> > > I can live with that, although fcntl(2) may be breaking that rule, but
> > > fine by me.
> > > It's good to have consistent rules and it's good to add a new UAPI for
> > > new behavior.
> > >
> > > However...
> > >
> > > >
> > > > >
> > > > > >
> > > > > > In sensitive environments such as service management/container runtimes
> > > > > > we often send O_PATH fds around precisely because it is restricted what
> > > > > > they can be used for. I'd prefer to not to plug at this string.
> > > > >
> > > > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > > > are almost identical w.r.t permission checks and everything else.
> > > > >
> > > > > So this change introduces nothing new that a user in said environment
> > > > > cannot already accomplish with setxattr().
> > > > >
> > > > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > > > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > > > > hole you are trying to prevent is already wide open.
> > > >
> > > > That is very much a something that we're trying to restrict for this
> > > > exact reason and is one of the main motivator for upgrade mask in
> > > > openat2(). If I want to send a O_PATH around I want it to not be
> > > > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > > > and part of the original patchset in [2]. O_PATH semantics don't need to
> > > > become weird.
> > > >
> > > > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > > > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@cyphar.com
> > >
> > > ... thinking forward, if this patch is going to be rejected, the patch that
> > > will follow is *xattrat() syscalls.
> > >
> > > What will you be able to argue then?
> > >
> > > There are several *at() syscalls that modify metadata.
> > > fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> > >
> > > Do you intend to try and block setxattrat()?
> > > Just try and block setxattrat(.., AT_EMPTY_PATH)?
> > > those *at() syscalls have real use cases to avoid TOCTOU races.
> > > Do you propose that applications will have to use fsetxattr() on an open
> > > file to avert races?
> > >
> > > I completely understand the idea behind upgrade masks
> > > for limiting f_mode, but I don't know if trying to retroactively
> > > change semantics of setxattr() in the move to setxattrat()
> > > is going to be a good idea.
> >
> > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > assume that keeping them makes sense (the most obvious example is being
> > able to do tricks to open /proc/$pid/exe as O_RDWR).
> 
> Please make a note that I have applications relying on current magic symlink
> semantics w.r.t setxattr() and other metadata operations, and the libselinux
> commit linked from the patch commit message proves that magic symlink
> semantics are used in the wild, so it is not likely that those semantics could
> be changed, unless userspace breakage could be justified by fixing a serious
> security issue (i.e. open /proc/$pid/exe as O_RDWR).

Agreed. We also use magiclinks for similar TOCTOU-protection purposes in
runc (as does lxc) as well as in libpathrs so I'm aware we need to be
careful about changing existing behaviours. I would prefer to have the
default be as restrictive as possible, but naturally back-compat is
more important.

> > I suspect that the long-term solution would be to have more upgrade
> > masks so that userspace can opt-in to not allowing any kind of
> > (metadata) write access through a particular file descriptor. You're
> > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > so we can't retroactively block /everything/ but we should try to come
> > up with less leaky rules by default if it won't break userspace.
> 
> Ok, let me try to say this in my own words using an example to see that
> we are all on the same page:
> 
> - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> - fsetxattr(fd,...) is not applicable for symbolic links

While I agree with Christian's concerns about making O_PATH descriptors
more leaky, if userspace already relies on this through /proc/self/fd/$x
then there's not much we can do about it other than having an opt-out
available in openat2(2). Having the option to disable this stuff to
avoid making O_PATH descriptors less safe as a mechanism for passing
around "capability-less" file handles should make most people happy
(with the note that ideally we would not be *adding* capabilities to
O_PATH we don't need to).

> - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
>   when setting xattr on symbolic links
> - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the
>   "new API" for setting xattr on symlinks (and special files)

If this is a usecase we need to support then we may as well just re-use
fsetxattr() since it's basically an *at(2) syscall already (and I don't
see why we'd want to split up the capabilities between two similar
*at(2)-like syscalls). Though this does come with the above caveats that
we need to have the opt-outs available if we're going to enshrine this
as intentional part of the ABI.

> - The new API is going to be more strict than the old magic symlink API
> - *If* it turns out to not break user applications, old API can also become
>   more strict to align with new API (unlikely the case for setxattr())
> - This will allow sandboxed containers to opt-out of the "old API", by
>   restricting access to /proc/self/fd and to implement more fine grained
>   control over which metadata operations are allowed on an O_PATH fd
> 
> Did I understand the plan correctly?

Yup, except I don't think we need setxattrat(2).

> Do you agree with me that the plan to keep AT_EMPTY_PATH and magic
> symlink semantics may not be realistic?

To clarify -- my view is that if any current /proc/self/fd/$n semantic
needs to be maintained then I would prefer that the proc-less method of
doing it (such as through AT_EMPTY_PATH et al) would have the same
capability and semantics. There are some cases where the current
/proc/self/fd/$n semantics need to be fixed (such as the /proc/$pid/exe
example) and in that case the proc-less semantics also need to be made
safe.

While I would like us to restrict O_PATH as much as possible, if
userspace already depends on certain behaviour then we may not be able
to do much about it. Having an opt-out would be very important since
enshrining these leaky behaviours (which seem to have been overlooked)
means we need to consider how userspace can opt out of them.

Unfortunately, it should be noted that due to the "magical" nature of
nd_jump_link(), I'm not sure how happy Al Viro will be with the kinds of
restrictions necessary. Even my current (quite limited) upgrade-mask
patchset has to do a fair bit of work to unify the semantics of
magic-links and openat(O_EMPTYPATH) -- expanding this to all *at(2)
syscalls might be quite painful. (There are also several handfuls of
semantic questions which need to be answered about magic-link modes and
whether for other *at(2) operations we may need even more complicated
rules or even a re-thinking of my current approach.)
Amir Goldstein June 20, 2022, 7:45 a.m. UTC | #15
> > > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > > assume that keeping them makes sense (the most obvious example is being
> > > able to do tricks to open /proc/$pid/exe as O_RDWR).
> >
> > Please make a note that I have applications relying on current magic symlink
> > semantics w.r.t setxattr() and other metadata operations, and the libselinux
> > commit linked from the patch commit message proves that magic symlink
> > semantics are used in the wild, so it is not likely that those semantics could
> > be changed, unless userspace breakage could be justified by fixing a serious
> > security issue (i.e. open /proc/$pid/exe as O_RDWR).
>
> Agreed. We also use magiclinks for similar TOCTOU-protection purposes in
> runc (as does lxc) as well as in libpathrs so I'm aware we need to be
> careful about changing existing behaviours. I would prefer to have the
> default be as restrictive as possible, but naturally back-compat is
> more important.
>
> > > I suspect that the long-term solution would be to have more upgrade
> > > masks so that userspace can opt-in to not allowing any kind of
> > > (metadata) write access through a particular file descriptor. You're
> > > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > > so we can't retroactively block /everything/ but we should try to come
> > > up with less leaky rules by default if it won't break userspace.
> >
> > Ok, let me try to say this in my own words using an example to see that
> > we are all on the same page:
> >
> > - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> > - fsetxattr(fd,...) is not applicable for symbolic links
>
> While I agree with Christian's concerns about making O_PATH descriptors
> more leaky, if userspace already relies on this through /proc/self/fd/$x
> then there's not much we can do about it other than having an opt-out
> available in openat2(2). Having the option to disable this stuff to
> avoid making O_PATH descriptors less safe as a mechanism for passing
> around "capability-less" file handles should make most people happy
> (with the note that ideally we would not be *adding* capabilities to
> O_PATH we don't need to).
>
> > - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
> >   when setting xattr on symbolic links
> > - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the
> >   "new API" for setting xattr on symlinks (and special files)
>
> If this is a usecase we need to support then we may as well just re-use
> fsetxattr() since it's basically an *at(2) syscall already (and I don't
> see why we'd want to split up the capabilities between two similar
> *at(2)-like syscalls). Though this does come with the above caveats that
> we need to have the opt-outs available if we're going to enshrine this
> as intentional part of the ABI.


Christian preferred that new functionality be added with a new API
and I agree that this is nicer and more explicit.

The bigger question IMO is, whether fsomething() should stay identical
to somethingat(,,,AT_EMPTY_PATH). I don't think that it should.

To me, open(path,O_PATH)+somethingat(,,,AT_EMPTY_PATH) is identical
to something(path) - it just breaks the path resolution and operation to two
distinguished steps.

fsomething() was traditionally used for "really" open fds, so if we don't need
to, we better not relax it further by allowing O_PATH, but that's just one
opinion.

>
> > - The new API is going to be more strict than the old magic symlink API
> > - *If* it turns out to not break user applications, old API can also become
> >   more strict to align with new API (unlikely the case for setxattr())
> > - This will allow sandboxed containers to opt-out of the "old API", by
> >   restricting access to /proc/self/fd and to implement more fine grained
> >   control over which metadata operations are allowed on an O_PATH fd
> >
> > Did I understand the plan correctly?
>
> Yup, except I don't think we need setxattrat(2).
>
> > Do you agree with me that the plan to keep AT_EMPTY_PATH and magic
> > symlink semantics may not be realistic?
>
> To clarify -- my view is that if any current /proc/self/fd/$n semantic
> needs to be maintained then I would prefer that the proc-less method of
> doing it (such as through AT_EMPTY_PATH et al) would have the same
> capability and semantics. There are some cases where the current
> /proc/self/fd/$n semantics need to be fixed (such as the /proc/$pid/exe
> example) and in that case the proc-less semantics also need to be made
> safe.
>
> While I would like us to restrict O_PATH as much as possible, if
> userspace already depends on certain behaviour then we may not be able
> to do much about it. Having an opt-out would be very important since
> enshrining these leaky behaviours (which seem to have been overlooked)
> means we need to consider how userspace can opt out of them.
>
> Unfortunately, it should be noted that due to the "magical" nature of
> nd_jump_link(), I'm not sure how happy Al Viro will be with the kinds of
> restrictions necessary. Even my current (quite limited) upgrade-mask
> patchset has to do a fair bit of work to unify the semantics of
> magic-links and openat(O_EMPTYPATH) -- expanding this to all *at(2)
> syscalls might be quite painful. (There are also several handfuls of
> semantic questions which need to be answered about magic-link modes and
> whether for other *at(2) operations we may need even more complicated
> rules or even a re-thinking of my current approach.)
>

The question remains, regarding the $SUBJECT patch,
is it fair to block it and deprive libselinux of a non buggy API
until such time that all the details around masking O_PATH fds
will be made clear and the new API implemented?

There is no guarantee this will ever happen, so it does not seem
reasonable to me.

To be a reasonable reaction to the currently broken API is
to either accept the patch as is or request that setxattrat()
will be added to provide the new functionality.

Thanks,
Amir.
Aleksa Sarai June 22, 2022, 2:57 a.m. UTC | #16
On 2022-06-20, Amir Goldstein <amir73il@gmail.com> wrote:
> > > > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > > > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > > > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > > > assume that keeping them makes sense (the most obvious example is being
> > > > able to do tricks to open /proc/$pid/exe as O_RDWR).
> > >
> > > Please make a note that I have applications relying on current magic symlink
> > > semantics w.r.t setxattr() and other metadata operations, and the libselinux
> > > commit linked from the patch commit message proves that magic symlink
> > > semantics are used in the wild, so it is not likely that those semantics could
> > > be changed, unless userspace breakage could be justified by fixing a serious
> > > security issue (i.e. open /proc/$pid/exe as O_RDWR).
> >
> > Agreed. We also use magiclinks for similar TOCTOU-protection purposes in
> > runc (as does lxc) as well as in libpathrs so I'm aware we need to be
> > careful about changing existing behaviours. I would prefer to have the
> > default be as restrictive as possible, but naturally back-compat is
> > more important.
> >
> > > > I suspect that the long-term solution would be to have more upgrade
> > > > masks so that userspace can opt-in to not allowing any kind of
> > > > (metadata) write access through a particular file descriptor. You're
> > > > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > > > so we can't retroactively block /everything/ but we should try to come
> > > > up with less leaky rules by default if it won't break userspace.
> > >
> > > Ok, let me try to say this in my own words using an example to see that
> > > we are all on the same page:
> > >
> > > - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> > > - fsetxattr(fd,...) is not applicable for symbolic links
> >
> > While I agree with Christian's concerns about making O_PATH descriptors
> > more leaky, if userspace already relies on this through /proc/self/fd/$x
> > then there's not much we can do about it other than having an opt-out
> > available in openat2(2). Having the option to disable this stuff to
> > avoid making O_PATH descriptors less safe as a mechanism for passing
> > around "capability-less" file handles should make most people happy
> > (with the note that ideally we would not be *adding* capabilities to
> > O_PATH we don't need to).
> >
> > > - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
> > >   when setting xattr on symbolic links
> > > - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the
> > >   "new API" for setting xattr on symlinks (and special files)
> >
> > If this is a usecase we need to support then we may as well just re-use
> > fsetxattr() since it's basically an *at(2) syscall already (and I don't
> > see why we'd want to split up the capabilities between two similar
> > *at(2)-like syscalls). Though this does come with the above caveats that
> > we need to have the opt-outs available if we're going to enshrine this
> > as intentional part of the ABI.
> 
> 
> Christian preferred that new functionality be added with a new API
> and I agree that this is nicer and more explicit.

Fair enough -- I misread the man page, setxattrat(2) makes more sense.

> The bigger question IMO is, whether fsomething() should stay identical
> to somethingat(,,,AT_EMPTY_PATH). I don't think that it should.
> 
> To me, open(path,O_PATH)+somethingat(,,,AT_EMPTY_PATH) is identical
> to something(path) - it just breaks the path resolution and operation to two
> distinguished steps.
> 
> fsomething() was traditionally used for "really" open fds, so if we don't need
> to, we better not relax it further by allowing O_PATH, but that's just one
> opinion.

Yeah, you're right -- it would be better to not muddle the two (even
though they are conceptually very similar).

> > > - The new API is going to be more strict than the old magic symlink API
> > > - *If* it turns out to not break user applications, old API can also become
> > >   more strict to align with new API (unlikely the case for setxattr())
> > > - This will allow sandboxed containers to opt-out of the "old API", by
> > >   restricting access to /proc/self/fd and to implement more fine grained
> > >   control over which metadata operations are allowed on an O_PATH fd
> > >
> > > Did I understand the plan correctly?
> >
> > Yup, except I don't think we need setxattrat(2).
> >
> > > Do you agree with me that the plan to keep AT_EMPTY_PATH and magic
> > > symlink semantics may not be realistic?
> >
> > To clarify -- my view is that if any current /proc/self/fd/$n semantic
> > needs to be maintained then I would prefer that the proc-less method of
> > doing it (such as through AT_EMPTY_PATH et al) would have the same
> > capability and semantics. There are some cases where the current
> > /proc/self/fd/$n semantics need to be fixed (such as the /proc/$pid/exe
> > example) and in that case the proc-less semantics also need to be made
> > safe.
> >
> > While I would like us to restrict O_PATH as much as possible, if
> > userspace already depends on certain behaviour then we may not be able
> > to do much about it. Having an opt-out would be very important since
> > enshrining these leaky behaviours (which seem to have been overlooked)
> > means we need to consider how userspace can opt out of them.
> >
> > Unfortunately, it should be noted that due to the "magical" nature of
> > nd_jump_link(), I'm not sure how happy Al Viro will be with the kinds of
> > restrictions necessary. Even my current (quite limited) upgrade-mask
> > patchset has to do a fair bit of work to unify the semantics of
> > magic-links and openat(O_EMPTYPATH) -- expanding this to all *at(2)
> > syscalls might be quite painful. (There are also several handfuls of
> > semantic questions which need to be answered about magic-link modes and
> > whether for other *at(2) operations we may need even more complicated
> > rules or even a re-thinking of my current approach.)
> 
> The question remains, regarding the $SUBJECT patch,
> is it fair to block it and deprive libselinux of a non buggy API
> until such time that all the details around masking O_PATH fds
> will be made clear and the new API implemented?
> 
> There is no guarantee this will ever happen, so it does not seem
> reasonable to me.
> 
> To be a reasonable reaction to the currently broken API is
> to either accept the patch as is or request that setxattrat()
> will be added to provide the new functionality.

Since the current functionality cannot be retroactively disabled as it
is being used already through /proc/self/fd/$n, adding
*xattrat(AT_EMPTY_PATH) doesn't really change what is currently possible
by userspace.

I would say we should add *xattrat(2) and then we can add an upgrade
mask blocking it (and other operations) later.
Christian Göttsche Aug. 19, 2022, 6:05 p.m. UTC | #17
On Wed, 22 Jun 2022 at 04:57, Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> On 2022-06-20, Amir Goldstein <amir73il@gmail.com> wrote:
> > To be a reasonable reaction to the currently broken API is
> > to either accept the patch as is or request that setxattrat()
> > will be added to provide the new functionality.
>
> Since the current functionality cannot be retroactively disabled as it
> is being used already through /proc/self/fd/$n, adding
> *xattrat(AT_EMPTY_PATH) doesn't really change what is currently possible
> by userspace.
>
> I would say we should add *xattrat(2) and then we can add an upgrade
> mask blocking it (and other operations) later.
>

It seems setxattrat() is the preferred way to continue.
fsetxattr() would have one advantage though (w.r.t. SELinux):

The steps to label a file are:
  1. get the type of the file (via stat(2) family)
  2. lookup the desired label from the label database via selabel_lookup(3)
  3. assign the retrieved label to the file

The label is sensitive to the file type, e.g.

    $ matchpathcon -m file /etc/shadow
    /etc/shadow     system_u:object_r:shadow_t:s0
    $ matchpathcon -m lnk_file /etc/shadow
    /etc/shadow     system_u:object_r:etc_t:s0

Using the *at() family the file type could change between step 1. and 3.,
which operating on an O_PATH file descriptor would prevent.
Amir Goldstein Aug. 19, 2022, 8:27 p.m. UTC | #18
On Fri, Aug 19, 2022 at 9:05 PM Christian Göttsche
<cgzones@googlemail.com> wrote:
>
> On Wed, 22 Jun 2022 at 04:57, Aleksa Sarai <cyphar@cyphar.com> wrote:
> >
> > On 2022-06-20, Amir Goldstein <amir73il@gmail.com> wrote:
> > > To be a reasonable reaction to the currently broken API is
> > > to either accept the patch as is or request that setxattrat()
> > > will be added to provide the new functionality.
> >
> > Since the current functionality cannot be retroactively disabled as it
> > is being used already through /proc/self/fd/$n, adding
> > *xattrat(AT_EMPTY_PATH) doesn't really change what is currently possible
> > by userspace.
> >
> > I would say we should add *xattrat(2) and then we can add an upgrade
> > mask blocking it (and other operations) later.
> >
>
> It seems setxattrat() is the preferred way to continue.
> fsetxattr() would have one advantage though (w.r.t. SELinux):
>
> The steps to label a file are:
>   1. get the type of the file (via stat(2) family)
>   2. lookup the desired label from the label database via selabel_lookup(3)
>   3. assign the retrieved label to the file
>
> The label is sensitive to the file type, e.g.
>
>     $ matchpathcon -m file /etc/shadow
>     /etc/shadow     system_u:object_r:shadow_t:s0
>     $ matchpathcon -m lnk_file /etc/shadow
>     /etc/shadow     system_u:object_r:etc_t:s0
>
> Using the *at() family the file type could change between step 1. and 3.,
> which operating on an O_PATH file descriptor would prevent.

I don't understand the problem.

If you have an O_PATH fd, the object it represents does not change.
If you use fstatat(fd, ..., AT_EMPTY_PATH) (or fstat) and
setxattrat(fd, ..., AT_EMPTY_PATH), it prevents the race.

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/xattr.c b/fs/xattr.c
index e8dd03e4561e..16360ac4eb1b 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -656,7 +656,7 @@  SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
 SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 		const void __user *,value, size_t, size, int, flags)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	int error = -EBADF;
 
 	if (!f.file)
@@ -768,7 +768,7 @@  SYSCALL_DEFINE4(lgetxattr, const char __user *, pathname,
 SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
 		void __user *, value, size_t, size)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	ssize_t error = -EBADF;
 
 	if (!f.file)
@@ -844,7 +844,7 @@  SYSCALL_DEFINE3(llistxattr, const char __user *, pathname, char __user *, list,
 
 SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	ssize_t error = -EBADF;
 
 	if (!f.file)
@@ -910,7 +910,7 @@  SYSCALL_DEFINE2(lremovexattr, const char __user *, pathname,
 
 SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	int error = -EBADF;
 
 	if (!f.file)