diff mbox series

[01/14] VFS: Add additional RESOLVE_* flags [ver #18]

Message ID 158376245699.344135.7522994074747336376.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived
Headers show
Series VFS: Filesystem information [ver #18] | expand

Commit Message

David Howells March 9, 2020, 2 p.m. UTC
Add additional RESOLVE_* flags to correspond to AT_* flags that aren't
currently implemented:

	RESOLVE_NO_TRAILING_SYMLINKS    for AT_SYMLINK_NOFOLLOW
	RESOLVE_NO_TRAILING_AUTOMOUNTS  for AT_NO_AUTOMOUNT
	RESOLVE_EMPTY_PATH              for AT_EMPTY_PATH

This is necessary for fsinfo() to use RESOLVE_* flags instead of AT_* flags
if the latter are to be considered deprecated for new system calls.

Also make openat2() handle RESOLVE_NO_TRAILING_SYMLINKS.

Automounting is currently forced by doing an open(), so adding support to
openat2() for RESOLVE_NO_TRAILING_AUTOMOUNTS is not trivial.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Aleksa Sarai <cyphar@cyphar.com>
---

 fs/open.c                    |    8 +++++---
 include/linux/fcntl.h        |    3 ++-
 include/uapi/linux/openat2.h |    8 +++++++-
 3 files changed, 14 insertions(+), 5 deletions(-)

Comments

Stefan Metzmacher March 9, 2020, 8:56 p.m. UTC | #1
Hi David,

> Add additional RESOLVE_* flags to correspond to AT_* flags that aren't
> currently implemented:
> 
> 	RESOLVE_NO_TRAILING_SYMLINKS    for AT_SYMLINK_NOFOLLOW
> 	RESOLVE_NO_TRAILING_AUTOMOUNTS  for AT_NO_AUTOMOUNT
> 	RESOLVE_EMPTY_PATH              for AT_EMPTY_PATH

Thanks for changing the names!

> This is necessary for fsinfo() to use RESOLVE_* flags instead of AT_* flags
> if the latter are to be considered deprecated for new system calls.
> 
> Also make openat2() handle RESOLVE_NO_TRAILING_SYMLINKS.
> 
> Automounting is currently forced by doing an open(), so adding support to
> openat2() for RESOLVE_NO_TRAILING_AUTOMOUNTS is not trivial.

lookup_flags &= ~LOOKUP_AUTOMOUNT won't work?

At least it should cause EINVAL (or something similar) instead of being
silently ignored. The same applies to RESOLVE_EMPTY_PATH, it should be
handled with some logic or also cause EINVAL.

vfs_statx()/vfs_stat_set_lookup_flags() seems to have a similar logic
using the AT_* flags.

metze
David Howells March 9, 2020, 9:13 p.m. UTC | #2
Stefan Metzmacher <metze@samba.org> wrote:

> > Automounting is currently forced by doing an open(), so adding support to
> > openat2() for RESOLVE_NO_TRAILING_AUTOMOUNTS is not trivial.
> 
> lookup_flags &= ~LOOKUP_AUTOMOUNT won't work?

No.  LOOKUP_OPEN overrides that.

David
Linus Torvalds March 10, 2020, 1:14 a.m. UTC | #3
On Mon, Mar 9, 2020 at 5:56 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> On 2020-03-09, David Howells <dhowells@redhat.com> wrote:
> > This is necessary for fsinfo() to use RESOLVE_* flags instead of AT_* flags
> > if the latter are to be considered deprecated for new system calls.
> >
> > Also make openat2() handle RESOLVE_NO_TRAILING_SYMLINKS.

No, please let's not do this.

We have O_NOFOLLOW, and we can't get rid of it.

So adding RESOLVE_NO_TRAILING_SYMLINKS isn't a cleanup. It's just
extra complexity for absolutely zero gain.


> After thinking about what Christian said some more, I reckon we
> shouldn't support both O_NOFOLLOW and RESOLVE_NO_TRAILING_SYMLINKS. But
> that means we'll need to cherry-pick this patch and get it into mainline
> before v5.6.

No.

It simply means that we shouldn't have RESOLVE_NO_TRAILING_SYMLINKS at all.

Adding that flag is a mistake. It causes problems like this, where
subtlenly people say "what if both flags are set".

Just don't do it.

There's no way in hell we can ever get rid of O_NOFOLLOW anyway, since
people will continue to use plain open() and openat().

So adding RESOLVE_NO_TRAILING_SYMLINKS is entirely redundant.

Don't deprecate the old flags that are going to always stay around,
don't add stupid new flags that add no value.

It's that easy.

              Linus
David Howells March 10, 2020, 7:25 a.m. UTC | #4
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > > Also make openat2() handle RESOLVE_NO_TRAILING_SYMLINKS.
> 
> No, please let's not do this.
> 
> We have O_NOFOLLOW, and we can't get rid of it.
> 
> So adding RESOLVE_NO_TRAILING_SYMLINKS isn't a cleanup. It's just
> extra complexity for absolutely zero gain.

Okay.  So what's the equivalent of AT_SYMLINK_NOFOLLOW in RESOLVE_* flag
terms?  RESOLVE_NO_SYMLINKS is not equivalent, though O_NOFOLLOW is.  The
reason I ask is that RESOLVE_* flags can't be easily extended to non-open
syscalls that don't take O_* flags without it.  Would you prefer that new
non-open syscalls continue to take AT_* and ignore RESOLVE_* flags?  That
would be fine by me.

David
Linus Torvalds March 11, 2020, 5:59 p.m. UTC | #5
On Tue, Mar 10, 2020 at 12:25 AM David Howells <dhowells@redhat.com> wrote:
?
> Okay.  So what's the equivalent of AT_SYMLINK_NOFOLLOW in RESOLVE_* flag
> terms?

Nothing.

openat2() takes two sets of flags. We'll never get rid of
AT_SYMLINK_NOFOLLOW / O_NOFOLLOW, and we've added RESOLVE_NO_SYMLINKS
to the new set of flags. It's just a separate namespace.

We will _not_ be adding a RESOLVE_XYZ flag for O_NOFOLLOW or
AT_SYMLINK_NOFOLLOW. At least not visible to user space - because as
people already figured out, that just causes problems with consistency
issues.

And yes, the fact that we then have three different user-visible
namespaces (O_xyz flags for open(), AT_xyz flags for linkat(), and now
RESOLVE_xyz flags for openat2()) is sad and messy. But it's an
inherent messiness from just how the world works. We can't get rid of
it.

If we need linkat2() and friends, so be it. Do we?

Could we have a _fourth_ set of flags that are simply for internal use
that is a superset of them all? Sure. But no, it's almost certainly
not worth it. Four is not better than three.

Now, some type-safety in the kernel to make sure that we can't mix
AT_xyz with O_xyz or RESOLVE_xyz - that might be worth it. Although
judging by past experience, not enough people run sparse for it to
really be worth it.

               Linus

PS. Yeah, we also have that LOOKUP_xyz namespace, and the access mode
namespace, so we already have those internal format versions too.
Stefan Metzmacher March 12, 2020, 9:08 a.m. UTC | #6
Hi Linus,

>> Okay.  So what's the equivalent of AT_SYMLINK_NOFOLLOW in RESOLVE_* flag
>> terms?
> 
> Nothing.
> 
> openat2() takes two sets of flags. We'll never get rid of
> AT_SYMLINK_NOFOLLOW / O_NOFOLLOW, and we've added RESOLVE_NO_SYMLINKS
> to the new set of flags. It's just a separate namespace.
> 
> We will _not_ be adding a RESOLVE_XYZ flag for O_NOFOLLOW or
> AT_SYMLINK_NOFOLLOW. At least not visible to user space - because as
> people already figured out, that just causes problems with consistency
> issues.
> 
> And yes, the fact that we then have three different user-visible
> namespaces (O_xyz flags for open(), AT_xyz flags for linkat(), and now
> RESOLVE_xyz flags for openat2()) is sad and messy. But it's an
> inherent messiness from just how the world works. We can't get rid of
> it.

For openat2() and other existing syscalls I agree, that it's good to
have just a single bit to control a feature.

The whole discussion was triggered by the introduction of a completely
new fsinfo()
call:

>> The new system call looks like:
>> 
>> 	int ret = fsinfo(int dfd,
>> 			 const char *pathname,
>> 			 const struct fsinfo_params *params,
>> 			 size_t params_size,
>> 			 void *result_buffer,
>> 			 size_t result_buf_size);
>> 
>> The params parameter optionally points to a block of parameters:
>> 
>> 	struct fsinfo_params {
>> 		__u32	resolve_flags;

If I remember correctly with was named at_flags initially.
And I thought it would be great to also have the new RESOLVE_YXZ feature
available for that new path based syscall.

Would you propose to have 'at_flags' and 'resolve_flags' passed in here?
Or is there something even better you would propose for new syscalls?

>> 		__u32	flags;
>> 		__u32	request;
>> 		__u32	Nth;
>> 		__u32	Mth;
>> 	};

> If we need linkat2() and friends, so be it. Do we?

Yes, I'm going to propose something like this, as it would make the life
much easier for Samba to have the new features available on all path
based syscalls.

In addition I'll propose to have a way to specify the source of
removeat and unlinkat also by fd in addition to the the source parent fd
and relative path, the reason are also to detect races of path
recycling. pidfd_open() solved a similar problem for pid recycling.
> Could we have a _fourth_ set of flags that are simply for internal use
> that is a superset of them all? Sure. But no, it's almost certainly
> not worth it. Four is not better than three.

As you pointed our below the LOOKUP_yxz namespace is already in place...
And the discussion was more about an possible single namespace for
completely new syscalls.

> Now, some type-safety in the kernel to make sure that we can't mix
> AT_xyz with O_xyz or RESOLVE_xyz - that might be worth it. Although
> judging by past experience, not enough people run sparse for it to
> really be worth it.

I'm new to all this and maybe too naive, but would a build bot running
sparse on linux-next be able to catch this early enough?

metze
Linus Torvalds March 12, 2020, 4:24 p.m. UTC | #7
On Thu, Mar 12, 2020 at 2:08 AM Stefan Metzmacher <metze@samba.org> wrote:
>
> The whole discussion was triggered by the introduction of a completely
> new fsinfo() call:
>
> Would you propose to have 'at_flags' and 'resolve_flags' passed in here?

Yes, I think that would be the way to go.

> > If we need linkat2() and friends, so be it. Do we?
>
> Yes, I'm going to propose something like this, as it would make the life
> much easier for Samba to have the new features available on all path
> based syscalls.

Will samba actually use them? I think we've had extensions before that
weren't worth the non-portability pain?

But yes, if we have a major package like samba use it, then by all
means let's add linkat2(). How many things are we talking about? We
have a number of system calls that do *not* take flags, but do do
pathname walking. I'm thinking things like "mkdirat()"?)

> In addition I'll propose to have a way to specify the source of
> removeat and unlinkat also by fd in addition to the the source parent fd
> and relative path, the reason are also to detect races of path
> recycling.

Would that be basically just an AT_EMPTY_PATH kind of thing? IOW,
you'd be able to remove a file by doing

   fd = open(path.., O_PATH);
   unlinkat(fd, "", AT_EMPTY_PATH);

Hmm. We have _not_ allowed filesystem changes without that last
component lookup. Of course, with our dentry model, we *can* do it,
but this smells fairly fundamental to me.

It might avoid some of the extra system calls (ie you could use
openat2() to do the path walking part, and then
unlinkat(AT_EMPTY_PATH) to remove it, and have a "fstat()" etc in
between the verify that it's the right type of file or whatever - and
you'd not need an unlinkat2() with resolve flags).

I think Al needs to ok this kind of change. Maybe you've already
discussed it with him and I just missed it.

            Linus
David Howells March 12, 2020, 4:56 p.m. UTC | #8
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > The whole discussion was triggered by the introduction of a completely
> > new fsinfo() call:
> >
> > Would you propose to have 'at_flags' and 'resolve_flags' passed in here?
> 
> Yes, I think that would be the way to go.

Okay, I can do that.

Any thoughts on which set of flags should override the other?  If we're making
RESOLVE_* flags the new definitive interface, then I feel they should probably
override the AT_* flags where there's a conflict, ie. RESOLVE_NO_SYMLINKS
should override AT_SYMLINK_FOLLOW for example.

David
Stefan Metzmacher March 12, 2020, 5:11 p.m. UTC | #9
Am 12.03.20 um 17:24 schrieb Linus Torvalds:
> On Thu, Mar 12, 2020 at 2:08 AM Stefan Metzmacher <metze@samba.org> wrote:
>>
>> The whole discussion was triggered by the introduction of a completely
>> new fsinfo() call:
>>
>> Would you propose to have 'at_flags' and 'resolve_flags' passed in here?
> 
> Yes, I think that would be the way to go.

Ok, that's also fine for me:-)

>>> If we need linkat2() and friends, so be it. Do we?
>>
>> Yes, I'm going to propose something like this, as it would make the life
>> much easier for Samba to have the new features available on all path
>> based syscalls.
> 
> Will samba actually use them? I think we've had extensions before that
> weren't worth the non-portability pain?

Yes, we're currently moving to the portable *at() calls as a start.
And we already make use of Linux only feature for performance reasons
in other places. Having the new resolve flags will make it possible to
move some of the performance intensive work into non-linux specific
modules as fallback.

I hope that we'll use most of this through io_uring in the end,
that's the reason Jens added the IORING_REGISTER_PERSONALITY feature
used for IORING_OP_OPENAT2.

> But yes, if we have a major package like samba use it, then by all
> means let's add linkat2(). How many things are we talking about? We
> have a number of system calls that do *not* take flags, but do do
> pathname walking. I'm thinking things like "mkdirat()"?)

I haven't looked them up in detail yet.
Jeremy can you provide a list?

Do you think we could route some of them like mkdirat() and mknodat()
via openat2() instead of creating new syscalls?

>> In addition I'll propose to have a way to specify the source of
>> removeat and unlinkat also by fd in addition to the the source parent fd
>> and relative path, the reason are also to detect races of path
>> recycling.
> 
> Would that be basically just an AT_EMPTY_PATH kind of thing? IOW,
> you'd be able to remove a file by doing
> 
>    fd = open(path.., O_PATH);
>    unlinkat(fd, "", AT_EMPTY_PATH);
> 
> Hmm. We have _not_ allowed filesystem changes without that last
> component lookup. Of course, with our dentry model, we *can* do it,
> but this smells fairly fundamental to me.
>
> It might avoid some of the extra system calls (ie you could use
> openat2() to do the path walking part, and then
> unlinkat(AT_EMPTY_PATH) to remove it, and have a "fstat()" etc in
> between the verify that it's the right type of file or whatever - and
> you'd not need an unlinkat2() with resolve flags).

If that works safely for hardlinks and having another process doing a
rename between openat2() and unlinkat(), we could try that.

My initial naive idea was to have one syscall instead of
linkat2/renameat3/unlinkat2.

int xlinkat(int src_dfd, const char *src_path,
            int dst_dfd, const char *dst_path,
            const struct xlinkat_how *how, size_t how_size);

struct xlinkat_how {
       __u64 src_at_flags;
       __u64 src_resolve_flags;
       __u64 dst_at_flags;
       __u64 dst_resolve_flags;
       __u64 rename_flags;
       __s32 src_fd;
};

With src_dfd=-1, src_path=NULL, how.src_fd = -1, this would be like
linkat().
With dst_dfd=-1, dst_path=NULL, it would be like unlinkat().
Otherwise a renameat2().

If how.src_fd is not -1, it would be checked to be the same path as
specified by src_dfd and src_path.

> I think Al needs to ok this kind of change. Maybe you've already
> discussed it with him and I just missed it.

This is the first time I'm discussing this.

Thanks for the useful feedback!
metze
Linus Torvalds March 12, 2020, 6:09 p.m. UTC | #10
On Thu, Mar 12, 2020 at 9:56 AM David Howells <dhowells@redhat.com> wrote:
>
> Any thoughts on which set of flags should override the other?

Do we need to care? I don't think we actually have conflicts, because
the semantics aren't the same, and they are about independent issues.

>                  If we're making
> RESOLVE_* flags the new definitive interface, then I feel they should probably
> override the AT_* flags where there's a conflict, ie. RESOLVE_NO_SYMLINKS
> should override AT_SYMLINK_FOLLOW for example.

That's just for a linkat2() system call? I think the natural semantic
is the one that falls out directly: RESOLVE_NO_SYMLINKS will cause it
to fail with -ELOOP if it is a symlink.

NOTE! This isn't really a "conflict". It's actually two different and
independent things:

 - without AT_SYMLINK_FOLLOW, a linkat() simply won't even try to
follow the symlink, and will link to the symlink itself instead.

 - RESOLVE_NO_SYMLINKS says "never follow symlinks".

Note how one does *NOT* override the other, quite the reverse. They
are about different things. One is about the _behavior_ when the last
component is a symlink, and the other is about permission to follow
any symlinks.

So all combinations make sense:

 - no AT_SYMLINK_FOLLOW, no RESOLVE_NO_SYMLINKS: just link to the
target, whether it's a symlink or not

   This is obviously our historical link() behavior.

 - no AT_SYMLINK_FOLLOW, yes RESOLVE_NO_SYMLINKS: just link to the
target, whether it's a symlink or not, but if there's a symlink in the
middle, return -ELOOP

   Note how this case doesn't follow the last one, so
RESOLVE_NO_SYMLINKS isn't an issue for the last component, but _is_ an
issue for the components in the middle.

 - AT_SYMLINK_FOLLOW, no RESOLVE_NO_SYMLINKS: just link to the target,
following the symlink if it exists

   This is obviously the historical AT_SYMLINK_FOLLOW behavior

 - AT_SYMLINK_FOLLOW | RESOLVE_NO_SYMLINKS: just link to the target,
return -ELOOP if it's a symlink (of if there's a symlink on the way).

   This is the natural behavior for "refuse to follow any symlinks anywhere".

note how they are all completely sane versions, and in no case does
one flag really override the other.

If anything, we actually miss a third flag: the "don't allow linking
to a final symlink at all" (but allow intermediate symlinks). We've
never had that behavior, although I think POSIX makes that case
undefined (ie you're not guaranteed to be able to link to a symlink in
the first place in POSIX).

I guess that third case could be emulated with open(O_PATH) + fstat to
check it's not a symlink + linkat(fd,AT_EMPTY_PATH) if it turns
somebody would want something like that (and we decided that
AT_EMPTY_PATH is ok for linkat()).

I doubt anybody cares.

                 Linus
Al Viro March 12, 2020, 7:25 p.m. UTC | #11
On Thu, Mar 12, 2020 at 09:24:49AM -0700, Linus Torvalds wrote:
> Would that be basically just an AT_EMPTY_PATH kind of thing? IOW,
> you'd be able to remove a file by doing
> 
>    fd = open(path.., O_PATH);
>    unlinkat(fd, "", AT_EMPTY_PATH);
> 
> Hmm. We have _not_ allowed filesystem changes without that last
> component lookup. Of course, with our dentry model, we *can* do it,
> but this smells fairly fundamental to me.

That's a bloody bad idea.  It breeds fuckloads of corner cases, it does not
match the locking model at all and I don't want to even think of e.g.
the interplay with open-by-fhandle ("Parent?  What parent?"), etc.

Fundamentally, there are operations on objects and there are operations
on links to objects.  Mixing those is the recipe for massive headache.

> It might avoid some of the extra system calls (ie you could use
> openat2() to do the path walking part, and then
> unlinkat(AT_EMPTY_PATH) to remove it, and have a "fstat()" etc in
> between the verify that it's the right type of file or whatever - and
> you'd not need an unlinkat2() with resolve flags).
> 
> I think Al needs to ok this kind of change. Maybe you've already
> discussed it with him and I just missed it.

They have not.  And IME samba folks tend to present the set of
primitives they want without bothering to explain what do they
want to factorize that way, let alone why it should be factorized
that way...
Al Viro March 12, 2020, 7:37 p.m. UTC | #12
On Thu, Mar 12, 2020 at 06:11:09PM +0100, Stefan Metzmacher wrote:

> If that works safely for hardlinks and having another process doing a
> rename between openat2() and unlinkat(), we could try that.
> 
> My initial naive idea was to have one syscall instead of
> linkat2/renameat3/unlinkat2.
> 
> int xlinkat(int src_dfd, const char *src_path,
>             int dst_dfd, const char *dst_path,
>             const struct xlinkat_how *how, size_t how_size);
> 
> struct xlinkat_how {
>        __u64 src_at_flags;
>        __u64 src_resolve_flags;
>        __u64 dst_at_flags;
>        __u64 dst_resolve_flags;
>        __u64 rename_flags;
>        __s32 src_fd;
> };
> 
> With src_dfd=-1, src_path=NULL, how.src_fd = -1, this would be like
> linkat().
> With dst_dfd=-1, dst_path=NULL, it would be like unlinkat().
> Otherwise a renameat2().
>
> If how.src_fd is not -1, it would be checked to be the same path as
> specified by src_dfd and src_path.

"Checked" as in...?  And is that the same path or another link to the
same object, or...?

The idea of dumping all 3 into the same syscall looks wrong - compare
the effects of link() and rename() on the opened files, for starters,
and try to come up with documentation for all of that.  Multiplexors
tend to be very bad, in large part because they have so bloody
convoluted semantics...
Jeremy Allison March 12, 2020, 9:48 p.m. UTC | #13
On Thu, Mar 12, 2020 at 06:11:09PM +0100, Stefan Metzmacher wrote:
> Am 12.03.20 um 17:24 schrieb Linus Torvalds:
> > On Thu, Mar 12, 2020 at 2:08 AM Stefan Metzmacher <metze@samba.org> wrote:
> >>
> >> The whole discussion was triggered by the introduction of a completely
> >> new fsinfo() call:
> >>
> >> Would you propose to have 'at_flags' and 'resolve_flags' passed in here?
> > 
> > Yes, I think that would be the way to go.
> 
> Ok, that's also fine for me:-)
> 
> >>> If we need linkat2() and friends, so be it. Do we?
> >>
> >> Yes, I'm going to propose something like this, as it would make the life
> >> much easier for Samba to have the new features available on all path
> >> based syscalls.
> > 
> > Will samba actually use them? I think we've had extensions before that
> > weren't worth the non-portability pain?
> 
> Yes, we're currently moving to the portable *at() calls as a start.
> And we already make use of Linux only feature for performance reasons
> in other places. Having the new resolve flags will make it possible to
> move some of the performance intensive work into non-linux specific
> modules as fallback.
> 
> I hope that we'll use most of this through io_uring in the end,
> that's the reason Jens added the IORING_REGISTER_PERSONALITY feature
> used for IORING_OP_OPENAT2.
> 
> > But yes, if we have a major package like samba use it, then by all
> > means let's add linkat2(). How many things are we talking about? We
> > have a number of system calls that do *not* take flags, but do do
> > pathname walking. I'm thinking things like "mkdirat()"?)
> 
> I haven't looked them up in detail yet.
> Jeremy can you provide a list?

Fixing the flags argument on fchmodat() to actually *implement*
AT_SYMLINK_NOFOLLOW would be a good start :-).

As for the syscalls that don't have
flags I'm thinking of the things like:

getxattr/setxattr/removexattr just off the top of my head.

Jeremy.
Jeremy Allison March 13, 2020, 4:48 p.m. UTC | #14
On Fri, Mar 13, 2020 at 08:59:01PM +1100, Aleksa Sarai wrote:
> 
> I have heard some folks asking for a way to create a directory and get a
> handle to it atomically -- so arguably this is something that could be
> inside openat2()'s feature set (O_MKDIR?). But I'm not sure how popular
> of an idea this is.

This would be very useful to prevent race conditions between making
a directory and EA's on it, as are needed by Samba for
DOS attributes and Windows/NFSv4 ACLS.
Al Viro March 13, 2020, 6:28 p.m. UTC | #15
On Fri, Mar 13, 2020 at 08:59:01PM +1100, Aleksa Sarai wrote:
> On 2020-03-12, Stefan Metzmacher <metze@samba.org> wrote:
> > Am 12.03.20 um 17:24 schrieb Linus Torvalds:
> > > But yes, if we have a major package like samba use it, then by all
> > > means let's add linkat2(). How many things are we talking about? We
> > > have a number of system calls that do *not* take flags, but do do
> > > pathname walking. I'm thinking things like "mkdirat()"?)
> > 
> > I haven't looked them up in detail yet.
> > Jeremy can you provide a list?
> > 
> > Do you think we could route some of them like mkdirat() and mknodat()
> > via openat2() instead of creating new syscalls?
> 
> I have heard some folks asking for a way to create a directory and get a
> handle to it atomically -- so arguably this is something that could be
> inside openat2()'s feature set (O_MKDIR?). But I'm not sure how popular
> of an idea this is.

For fuck sake, *NO*!

We don't need any more multiplexors from hell.  mkdir() and open() have
deeply different interpretation of pathnames (and anyone who asks for
e.g. traversals of dangling symlinks on mkdir() is insane).  Don't try to
mix those; even O_TMPFILE had been a mistake.

Folks, we'd paid very dearly for the atomic_open() merge.  We are _still_
paying for it - and keep finding bugs induced by the convoluted horrors
in that thing (see yesterday pull from vfs.git#fixes for the latest crop).
I hope to get into more or less sane shape (part - this cycle, with
followups in the next one), but the last thing we need is more complexity
in the area.

Keep the semantics simple and regular; corner cases _suck_.  "Infinitely
extensible (without review)" is no virtue.  And having nowhere to hide
very special flags for very special kludges is a bloody good thing.

Every fucking time we had a multiplexed syscall, it had been a massive
source of trouble.  IF it has a uniform semantics - fine; we don't need
arseloads of read_this(2)/read_that(2).  But when you need pages upon
pages to describe the subtle differences in the interpretation of
its arguments, you have already lost.  It will be full of corner
cases, they will get zero testing and they will rot.  Inevitably.  All
the faster for the lack of people who would be able to keep all of that
in head.

We do have a mechanism for multiplexing; on amd64 it lives in do_syscall_64().
We really don't need openat2() turning into another one.  Syscall table
slots are not in a short supply, and the level of review one gets from
"new syscall added" is higher than from "make fubar(2) recognize a new
member in options->union_full_of_crap if it has RESOLVE_TO_WANK_WITH_RIGHT_HAND
set in options->flags, affecting its behaviour in some odd ways".
Which is a good thing, damnit.
Jeremy Allison March 13, 2020, 6:35 p.m. UTC | #16
On Fri, Mar 13, 2020 at 06:28:44PM +0000, Al Viro wrote:
> On Fri, Mar 13, 2020 at 08:59:01PM +1100, Aleksa Sarai wrote:
> > On 2020-03-12, Stefan Metzmacher <metze@samba.org> wrote:
> > > Am 12.03.20 um 17:24 schrieb Linus Torvalds:
> > > > But yes, if we have a major package like samba use it, then by all
> > > > means let's add linkat2(). How many things are we talking about? We
> > > > have a number of system calls that do *not* take flags, but do do
> > > > pathname walking. I'm thinking things like "mkdirat()"?)
> > > 
> > > I haven't looked them up in detail yet.
> > > Jeremy can you provide a list?
> > > 
> > > Do you think we could route some of them like mkdirat() and mknodat()
> > > via openat2() instead of creating new syscalls?
> > 
> > I have heard some folks asking for a way to create a directory and get a
> > handle to it atomically -- so arguably this is something that could be
> > inside openat2()'s feature set (O_MKDIR?). But I'm not sure how popular
> > of an idea this is.
> 
> For fuck sake, *NO*!
> 
> We don't need any more multiplexors from hell.  mkdir() and open() have
> deeply different interpretation of pathnames (and anyone who asks for
> e.g. traversals of dangling symlinks on mkdir() is insane).  Don't try to
> mix those; even O_TMPFILE had been a mistake.
> 
> Folks, we'd paid very dearly for the atomic_open() merge.  We are _still_
> paying for it - and keep finding bugs induced by the convoluted horrors
> in that thing (see yesterday pull from vfs.git#fixes for the latest crop).
> I hope to get into more or less sane shape (part - this cycle, with
> followups in the next one), but the last thing we need is more complexity
> in the area.

Can we disentangle the laudable desire to keep kernel internals
simple (which I completely agree with :-) from the desire to
keep user-space interfaces simple ?

Having some way of doing a mkdir() that returns an open fd
on the new directory *is* a very useful thing for many applications,
but I really don't care how the kernel implements it. We have so much
Linux-specific code already that one more thing won't matter :-).
Aleksa Sarai March 16, 2020, 2:20 p.m. UTC | #17
On 2020-03-13, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Mar 13, 2020 at 08:59:01PM +1100, Aleksa Sarai wrote:
> > On 2020-03-12, Stefan Metzmacher <metze@samba.org> wrote:
> > > Am 12.03.20 um 17:24 schrieb Linus Torvalds:
> > > > But yes, if we have a major package like samba use it, then by all
> > > > means let's add linkat2(). How many things are we talking about? We
> > > > have a number of system calls that do *not* take flags, but do do
> > > > pathname walking. I'm thinking things like "mkdirat()"?)
> > > 
> > > I haven't looked them up in detail yet.
> > > Jeremy can you provide a list?
> > > 
> > > Do you think we could route some of them like mkdirat() and mknodat()
> > > via openat2() instead of creating new syscalls?
> > 
> > I have heard some folks asking for a way to create a directory and get a
> > handle to it atomically -- so arguably this is something that could be
> > inside openat2()'s feature set (O_MKDIR?). But I'm not sure how popular
> > of an idea this is.
> 
> For fuck sake, *NO*!
> 
> We don't need any more multiplexors from hell.  mkdir() and open() have
> deeply different interpretation of pathnames (and anyone who asks for
> e.g. traversals of dangling symlinks on mkdir() is insane).  Don't try to
> mix those; even O_TMPFILE had been a mistake.

I agree that O_TMPFILE is a mess, and you're right that it wouldn't be a
good idea to fold it into open*(). But what is your opinion on a
hypothetical mkdirat2() which would let you get an fd to the directory
that was just created?

> We really don't need openat2() turning into another one.  Syscall table
> slots are not in a short supply, and the level of review one gets from
> "new syscall added" is higher than from "make fubar(2) recognize a new
> member in options->union_full_of_crap if it has RESOLVE_TO_WANK_WITH_RIGHT_HAND
> set in options->flags, affecting its behaviour in some odd ways".
> Which is a good thing, damnit.

You're quite right, and I don't intend openat2() to become another
ioctl-but-with-even-more-fun-semantics.
diff mbox series

Patch

diff --git a/fs/open.c b/fs/open.c
index 0788b3715731..7c38a7605c21 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -977,7 +977,7 @@  inline struct open_how build_open_how(int flags, umode_t mode)
 inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 {
 	int flags = how->flags;
-	int lookup_flags = 0;
+	int lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
 	int acc_mode = ACC_MODE(flags);
 
 	/* Must never be set by userspace */
@@ -1055,8 +1055,8 @@  inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 
 	if (flags & O_DIRECTORY)
 		lookup_flags |= LOOKUP_DIRECTORY;
-	if (!(flags & O_NOFOLLOW))
-		lookup_flags |= LOOKUP_FOLLOW;
+	if (flags & O_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
 
 	if (how->resolve & RESOLVE_NO_XDEV)
 		lookup_flags |= LOOKUP_NO_XDEV;
@@ -1068,6 +1068,8 @@  inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 		lookup_flags |= LOOKUP_BENEATH;
 	if (how->resolve & RESOLVE_IN_ROOT)
 		lookup_flags |= LOOKUP_IN_ROOT;
+	if (how->resolve & RESOLVE_NO_TRAILING_SYMLINKS)
+		lookup_flags &= ~LOOKUP_FOLLOW;
 
 	op->lookup_flags = lookup_flags;
 	return 0;
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index 7bcdcf4f6ab2..eacf17a8ca34 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -19,7 +19,8 @@ 
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
 	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
-	 RESOLVE_BENEATH | RESOLVE_IN_ROOT)
+	 RESOLVE_BENEATH | RESOLVE_IN_ROOT | RESOLVE_NO_TRAILING_SYMLINKS | \
+	 RESOLVE_NO_TRAILING_AUTOMOUNTS | RESOLVE_EMPTY_PATH)
 
 /* List of all open_how "versions". */
 #define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index 58b1eb711360..2647a108f116 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -22,7 +22,10 @@  struct open_how {
 	__u64 resolve;
 };
 
-/* how->resolve flags for openat2(2). */
+/*
+ * Path resolution paths to replace AT_* paths in all new syscalls that would
+ * use them.
+ */
 #define RESOLVE_NO_XDEV		0x01 /* Block mount-point crossings
 					(includes bind-mounts). */
 #define RESOLVE_NO_MAGICLINKS	0x02 /* Block traversal through procfs-style
@@ -35,5 +38,8 @@  struct open_how {
 #define RESOLVE_IN_ROOT		0x10 /* Make all jumps to "/" and ".."
 					be scoped inside the dirfd
 					(similar to chroot(2)). */
+#define RESOLVE_NO_TRAILING_SYMLINKS	0x20 /* Don't follow trailing symlinks in the path */
+#define RESOLVE_NO_TRAILING_AUTOMOUNTS	0x40 /* Don't follow trailing automounts in the path */
+#define RESOLVE_EMPTY_PATH	0x80	/* Permit a path of "" to indicate the dfd exactly */
 
 #endif /* _UAPI_LINUX_OPENAT2_H */