diff mbox series

vfs: Add AT_EMPTY_PATH_NOCHECK as unchecked AT_EMPTY_PATH

Message ID 20240622105621.7922-1-xry111@xry111.site (mailing list archive)
State New
Headers show
Series vfs: Add AT_EMPTY_PATH_NOCHECK as unchecked AT_EMPTY_PATH | expand

Commit Message

Xi Ruoyao June 22, 2024, 10:56 a.m. UTC
It's cheap to check if the path is empty in the userspace, but expensive
to check if a userspace string is empty from the kernel.  So using statx
and AT_EMPTY_PATH to implement fstat is slower than a "native" fstat
call.  But for arch/loongarch fstat does not exist so we have to use
statx, and on all 32-bit architectures we must use statx after 2037.
And seccomp also cannot audit AT_EMPTY_PATH properly because it cannot
check if path is empty.

To resolve these issues, add a relaxed version of AT_EMPTY_PATH: it does
not check if the path is empty, but just assumes the path is empty and
then behaves like AT_EMPTY_PATH.

Link: https://sourceware.org/pipermail/libc-alpha/2023-September/151364.html
Link: https://lore.kernel.org/loongarch/599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name/
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alejandro Colomar <alx@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Icenowy Zheng <uwu@icenowy.me>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: loongarch@lists.linux.dev
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---
 fs/namei.c                 | 8 +++++++-
 fs/stat.c                  | 4 +++-
 include/linux/namei.h      | 4 ++++
 include/trace/misc/fs.h    | 1 +
 include/uapi/linux/fcntl.h | 3 +++
 5 files changed, 18 insertions(+), 2 deletions(-)

Comments

Mateusz Guzik June 22, 2024, 9:25 p.m. UTC | #1
+cc Linus

On Sat, Jun 22, 2024 at 06:56:08PM +0800, Xi Ruoyao wrote:
> It's cheap to check if the path is empty in the userspace, but expensive
> to check if a userspace string is empty from the kernel.  So using statx
> and AT_EMPTY_PATH to implement fstat is slower than a "native" fstat
> call.  But for arch/loongarch fstat does not exist so we have to use
> statx, and on all 32-bit architectures we must use statx after 2037.
> And seccomp also cannot audit AT_EMPTY_PATH properly because it cannot
> check if path is empty.
> 
> To resolve these issues, add a relaxed version of AT_EMPTY_PATH: it does
> not check if the path is empty, but just assumes the path is empty and
> then behaves like AT_EMPTY_PATH.
> 
> Link: https://sourceware.org/pipermail/libc-alpha/2023-September/151364.html
> Link: https://lore.kernel.org/loongarch/599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name/

imo the thing to do is to add fstat for your arch and add fstatx for
everyone.

My argument for fstatx specifically is that Rust uses statx instead of
fstat and is growing in popularity.

To sum up the problem: stat and statx met with "" + AT_EMPTY_PATH have
more work to do than fstat and its hypotethical statx counterpart:
- buf alloc/free for the path
- userspace access (very painful on x86_64 + SMAP)
- lockref acquire/release

(and other things concerning lookup itself which I'm going to ignore
here)

Your patch avoids the peek at userspace, but the other overhead remains.
In particular the lockref cycle, apart from adding work single-threaded,
adds avoidable serialization against other threads issuing stat(x) on
the same file. iow your patch still leaves performance on the table and
I don't think it is necessary.

If the flag is the way to go (I don't see why though), I would suggest
something like AT_NO_PATH and requring NULL as the path argument (or
some other predefined "there is nothing here" constant).

I wanted to type up a proposal for fstatx (+ patch) some time ago, but
some refactoring was needed to make it happen and put it on the back
burner.

Perhaps you would be willing to pick it up, assuming the vfs folk are
oke with it.

Regardless of what happens with statx or this patch you should probably
add fstat anyway.

If there are any other perf-sensitive syscalls which don't have their
fd-only variants they should be plugged to, but I can't think of
anything.

> ---
>  fs/namei.c                 | 8 +++++++-
>  fs/stat.c                  | 4 +++-
>  include/linux/namei.h      | 4 ++++
>  include/trace/misc/fs.h    | 1 +
>  include/uapi/linux/fcntl.h | 3 +++
>  5 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 37fb0a8aa09a..0c44a7ea5961 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -147,7 +147,13 @@ getname_flags(const char __user *filename, int flags, int *empty)
>  	kname = (char *)result->iname;
>  	result->name = kname;
>  
> -	len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
> +	if (!(flags & LOOKUP_EMPTY_NOCHECK))
> +		len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
> +	else {
> +		len = 0;
> +		kname[0] = '\0';
> +	}
> +
>  	if (unlikely(len < 0)) {
>  		__putname(result);
>  		return ERR_PTR(len);
> diff --git a/fs/stat.c b/fs/stat.c
> index 70bd3e888cfa..53944d3287cd 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -210,6 +210,8 @@ int getname_statx_lookup_flags(int flags)
>  		lookup_flags |= LOOKUP_AUTOMOUNT;
>  	if (flags & AT_EMPTY_PATH)
>  		lookup_flags |= LOOKUP_EMPTY;
> +	if (flags & AT_EMPTY_PATH_NOCHECK)
> +		lookup_flags |= LOOKUP_EMPTY | LOOKUP_EMPTY_NOCHECK;
>  
>  	return lookup_flags;
>  }
> @@ -237,7 +239,7 @@ static int vfs_statx(int dfd, struct filename *filename, int flags,
>  	int error;
>  
>  	if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH |
> -		      AT_STATX_SYNC_TYPE))
> +		      AT_STATX_SYNC_TYPE | AT_EMPTY_PATH_NOCHECK))
>  		return -EINVAL;
>  
>  retry:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 967aa9ea9f96..def6a8a1b531 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -45,9 +45,13 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
>  #define LOOKUP_IN_ROOT		0x100000 /* Treat dirfd as fs root. */
>  #define LOOKUP_CACHED		0x200000 /* Only do cached lookup */
>  #define LOOKUP_LINKAT_EMPTY	0x400000 /* Linkat request with empty path. */
> +
>  /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
>  #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
>  
> +/* If this is set, LOOKUP_EMPTY must be set as well. */
> +#define LOOKUP_EMPTY_NOCHECK	0x800000 /* Consider path empty. */
> +
>  extern int path_pts(struct path *path);
>  
>  extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
> diff --git a/include/trace/misc/fs.h b/include/trace/misc/fs.h
> index 738b97f22f36..24aec7ed6b0b 100644
> --- a/include/trace/misc/fs.h
> +++ b/include/trace/misc/fs.h
> @@ -119,4 +119,5 @@
>  		{ LOOKUP_NO_XDEV,	"NO_XDEV" }, \
>  		{ LOOKUP_BENEATH,	"BENEATH" }, \
>  		{ LOOKUP_IN_ROOT,	"IN_ROOT" }, \
> +		{ LOOKUP_EMPTY_NOCHECK,	"EMPTY_NOCHECK" }, \
>  		{ LOOKUP_CACHED,	"CACHED" })
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index c0bcc185fa48..aa2f68d80820 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -113,6 +113,9 @@
>  #define AT_STATX_DONT_SYNC	0x4000	/* - Don't sync attributes with the server */
>  
>  #define AT_RECURSIVE		0x8000	/* Apply to the entire subtree */
> +#define AT_EMPTY_PATH_NOCHECK	0x10000	/* Like AT_EMPTY_PATH, but the path
> +                                           is not checked and it's just
> +                                           assumed to be empty */
>  
>  /* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
>  #define AT_HANDLE_FID		AT_REMOVEDIR	/* file handle is needed to
> -- 
> 2.45.2
>
Linus Torvalds June 22, 2024, 10:41 p.m. UTC | #2
On Sat, 22 Jun 2024 at 14:25, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> +cc Linus

Thanks.

> To sum up the problem: stat and statx met with "" + AT_EMPTY_PATH have
> more work to do than fstat and its hypotethical statx counterpart:
> - buf alloc/free for the path
> - userspace access (very painful on x86_64 + SMAP)
> - lockref acquire/release

Yes. That LOOKUP_EMPTY_NOCHECK is *not* the fix.

I do think that we should make AT_EMPTY_PATH with a NULL path
"JustWork(tm)", because the stupid "look if the pathname is empty" is
horrible.

But moving that check into getname() is *NOT* the right answer,
because by the time you get to getname(), you have already lost.

There's a very real reason why vfs_fstatat() catches this empty case
early, and never goes to filename lookup at all. You don't want to
generate a 'struct path' from the 'int fd', because you want to never
get anywhere close to that path, and instead only ever need a 'struct
fd' that can be looked up much more cheaply (particularly if not in a
threaded environment).

So the short-cut in vfs_fstatat() to never get a pathname is
disgusting - people should have used 'fstat()' - but it's _important_
disgusting.

This thing that tries to short-circuit things at the path level is too late.

              Linus
Xi Ruoyao June 23, 2024, 12:59 a.m. UTC | #3
On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote:

> I do think that we should make AT_EMPTY_PATH with a NULL path
> "JustWork(tm)", because the stupid "look if the pathname is empty" is
> horrible.
> 
> But moving that check into getname() is *NOT* the right answer,
> because by the time you get to getname(), you have already lost.

Oops.  I'll try to get around of getname() too...

> So the short-cut in vfs_fstatat() to never get a pathname is
> disgusting - people should have used 'fstat()' - but it's _important_
> disgusting.

The problem is we don't have fstat() for LoongArch, and it'll be
unusable on all 32-bit arch after 2037.

And Arnd hates the idea adding fstat() for LoongArch because there would
be one more 32-bit arch broken in 2037.

Or should we just add something like "fstat_2037()"?
Mateusz Guzik June 23, 2024, 1:07 a.m. UTC | #4
On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote:
>
> > I do think that we should make AT_EMPTY_PATH with a NULL path
> > "JustWork(tm)", because the stupid "look if the pathname is empty" is
> > horrible.
> >
> > But moving that check into getname() is *NOT* the right answer,
> > because by the time you get to getname(), you have already lost.
>
> Oops.  I'll try to get around of getname() too...
>
> > So the short-cut in vfs_fstatat() to never get a pathname is
> > disgusting - people should have used 'fstat()' - but it's _important_
> > disgusting.
>
> The problem is we don't have fstat() for LoongArch, and it'll be
> unusable on all 32-bit arch after 2037.
>
> And Arnd hates the idea adding fstat() for LoongArch because there would
> be one more 32-bit arch broken in 2037.
>
> Or should we just add something like "fstat_2037()"?
>

In that case fstat is out of the question, but no problem.

It was suggested to make AT_EMPTY_PATH + NULL pathname do the right
thing and have the syscalls short-circuit as needed.

for statx it would look like this (except you are going to have
implement do_statx_by_fd):

diff --git a/fs/stat.c b/fs/stat.c
index 16aa1f5ceec4..0afe72b320cc 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx,
        int ret;
        struct filename *name;

+       if (flags == AT_EMPTY_PATH && filename == NULL)
+               return do_statx_by_fd(...);
+
        name = getname_flags(filename, getname_statx_lookup_flags(flags));
        ret = do_statx(dfd, name, flags, mask, buffer);
        putname(name);

and so on

Personally I would prefer if fstatx was added instead, fwiw most
massaging in the area will be the same regardless.
Xi Ruoyao June 23, 2024, 1:22 a.m. UTC | #5
On Sun, 2024-06-23 at 03:07 +0200, Mateusz Guzik wrote:
> On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site> wrote:
> > 
> > On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote:
> > 
> > > I do think that we should make AT_EMPTY_PATH with a NULL path
> > > "JustWork(tm)", because the stupid "look if the pathname is empty" is
> > > horrible.
> > > 
> > > But moving that check into getname() is *NOT* the right answer,
> > > because by the time you get to getname(), you have already lost.
> > 
> > Oops.  I'll try to get around of getname() too...
> > 
> > > So the short-cut in vfs_fstatat() to never get a pathname is
> > > disgusting - people should have used 'fstat()' - but it's _important_
> > > disgusting.
> > 
> > The problem is we don't have fstat() for LoongArch, and it'll be
> > unusable on all 32-bit arch after 2037.
> > 
> > And Arnd hates the idea adding fstat() for LoongArch because there would
> > be one more 32-bit arch broken in 2037.
> > 
> > Or should we just add something like "fstat_2037()"?
> > 
> 
> In that case fstat is out of the question, but no problem.
> 
> It was suggested to make AT_EMPTY_PATH + NULL pathname do the right
> thing and have the syscalls short-circuit as needed.
> 
> for statx it would look like this (except you are going to have
> implement do_statx_by_fd):
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 16aa1f5ceec4..0afe72b320cc 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx,
>         int ret;
>         struct filename *name;
> 
> +       if (flags == AT_EMPTY_PATH && filename == NULL)
> +               return do_statx_by_fd(...);
> +
>         name = getname_flags(filename, getname_statx_lookup_flags(flags));
>         ret = do_statx(dfd, name, flags, mask, buffer);
>         putname(name);
> 
> and so on
> 
> Personally I would prefer if fstatx was added instead, fwiw most
> massaging in the area will be the same regardless.

I do agree.  But if we do it *now* would it be "breaking the userspace"
if some stupid program relies on fstatx() to return some error when the
path is NULL?  The "stupid programs" may just exist in the wild...

I remember recently we have to pretend pidfd is stupid to please some
stupid programs...

https://lore.kernel.org/all/20240521-girlanden-zehnfach-1bff7eb9218c@brauner/
Mateusz Guzik June 23, 2024, 12:04 p.m. UTC | #6
On Sun, Jun 23, 2024 at 3:22 AM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Sun, 2024-06-23 at 03:07 +0200, Mateusz Guzik wrote:
> > On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site> wrote:
> > >
> > > On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote:
> > >
> > > > I do think that we should make AT_EMPTY_PATH with a NULL path
> > > > "JustWork(tm)", because the stupid "look if the pathname is empty" is
> > > > horrible.
> > > >
> > > > But moving that check into getname() is *NOT* the right answer,
> > > > because by the time you get to getname(), you have already lost.
> > >
> > > Oops.  I'll try to get around of getname() too...
> > >
> > > > So the short-cut in vfs_fstatat() to never get a pathname is
> > > > disgusting - people should have used 'fstat()' - but it's _important_
> > > > disgusting.
> > >
> > > The problem is we don't have fstat() for LoongArch, and it'll be
> > > unusable on all 32-bit arch after 2037.
> > >
> > > And Arnd hates the idea adding fstat() for LoongArch because there would
> > > be one more 32-bit arch broken in 2037.
> > >
> > > Or should we just add something like "fstat_2037()"?
> > >
> >
> > In that case fstat is out of the question, but no problem.
> >
> > It was suggested to make AT_EMPTY_PATH + NULL pathname do the right
> > thing and have the syscalls short-circuit as needed.
> >
> > for statx it would look like this (except you are going to have
> > implement do_statx_by_fd):
> >
> > diff --git a/fs/stat.c b/fs/stat.c
> > index 16aa1f5ceec4..0afe72b320cc 100644
> > --- a/fs/stat.c
> > +++ b/fs/stat.c
> > @@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx,
> >         int ret;
> >         struct filename *name;
> >
> > +       if (flags == AT_EMPTY_PATH && filename == NULL)
> > +               return do_statx_by_fd(...);
> > +
> >         name = getname_flags(filename, getname_statx_lookup_flags(flags));
> >         ret = do_statx(dfd, name, flags, mask, buffer);
> >         putname(name);
> >
> > and so on
> >
> > Personally I would prefer if fstatx was added instead, fwiw most
> > massaging in the area will be the same regardless.
>
> I do agree.  But if we do it *now* would it be "breaking the userspace"
> if some stupid program relies on fstatx() to return some error when the
> path is NULL?  The "stupid programs" may just exist in the wild...
>

You mean statx? fstatx would not accept a path to begin with.

Worry about some code breaking is why I suggested a dedicated flag
(AT_NO_PATH) myself in case fstatx is a no-go.

I am not convinced messing with AT_* flags is justified to begin with.
Any syscall which does not have a fd-only variant and is found to be
routinely used with AT_EMPTY_PATH should get one instead.

As far as I know that's only stat(due to a perf bug in glibc, now
fixed) and increasingly statx.

Suppose AT_EMPTY_PATH + NULL are to land and stat + statx get the
treatment. What about all the other syscalls? Sorting all that out is
quite a big of churn which is probably not worth it. But then there is
a feature gap in that they EFAULT for this pair while the stat* ones
don't and that's bound to raise confusion. Then one could add the
check in the bowels of path lookup in similar way you do did to
maintain the same behavior (but without per-syscall churn) and a big
fat warning that anyone getting there often needs to get patched with
short-circuiting the entire thing. So I think that's either a lot of
churn or nasty additions.

Regardless, as noted above, either making fstatx a thing or
short-circuiting mostly the same patching has to be done for
statx-related stuff.

However, this is not my call to make.
Xi Ruoyao June 23, 2024, 12:26 p.m. UTC | #7
On Sun, 2024-06-23 at 14:04 +0200, Mateusz Guzik wrote:
> On Sun, Jun 23, 2024 at 3:22 AM Xi Ruoyao <xry111@xry111.site> wrote:
> > 
> > On Sun, 2024-06-23 at 03:07 +0200, Mateusz Guzik wrote:
> > > On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site>
> > > wrote:
> > > > 
> > > > On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote:
> > > > 
> > > > > I do think that we should make AT_EMPTY_PATH with a NULL path
> > > > > "JustWork(tm)", because the stupid "look if the pathname is
> > > > > empty" is
> > > > > horrible.
> > > > > 
> > > > > But moving that check into getname() is *NOT* the right
> > > > > answer,
> > > > > because by the time you get to getname(), you have already
> > > > > lost.
> > > > 
> > > > Oops.  I'll try to get around of getname() too...
> > > > 
> > > > > So the short-cut in vfs_fstatat() to never get a pathname is
> > > > > disgusting - people should have used 'fstat()' - but it's
> > > > > _important_
> > > > > disgusting.
> > > > 
> > > > The problem is we don't have fstat() for LoongArch, and it'll be
> > > > unusable on all 32-bit arch after 2037.
> > > > 
> > > > And Arnd hates the idea adding fstat() for LoongArch because
> > > > there would
> > > > be one more 32-bit arch broken in 2037.
> > > > 
> > > > Or should we just add something like "fstat_2037()"?
> > > > 
> > > 
> > > In that case fstat is out of the question, but no problem.
> > > 
> > > It was suggested to make AT_EMPTY_PATH + NULL pathname do the
> > > right
> > > thing and have the syscalls short-circuit as needed.
> > > 
> > > for statx it would look like this (except you are going to have
> > > implement do_statx_by_fd):
> > > 
> > > diff --git a/fs/stat.c b/fs/stat.c
> > > index 16aa1f5ceec4..0afe72b320cc 100644
> > > --- a/fs/stat.c
> > > +++ b/fs/stat.c
> > > @@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx,
> > >         int ret;
> > >         struct filename *name;
> > > 
> > > +       if (flags == AT_EMPTY_PATH && filename == NULL)
> > > +               return do_statx_by_fd(...);
> > > +
> > >         name = getname_flags(filename,
> > > getname_statx_lookup_flags(flags));
> > >         ret = do_statx(dfd, name, flags, mask, buffer);
> > >         putname(name);
> > > 
> > > and so on
> > > 
> > > Personally I would prefer if fstatx was added instead, fwiw most
> > > massaging in the area will be the same regardless.
> > 
> > I do agree.  But if we do it *now* would it be "breaking the
> > userspace"
> > if some stupid program relies on fstatx() to return some error when
> > the
> > path is NULL?  The "stupid programs" may just exist in the wild...
> > 
> 
> You mean statx? fstatx would not accept a path to begin with.

Yes I mean statx.

> Worry about some code breaking is why I suggested a dedicated flag
> (AT_NO_PATH) myself in case fstatx is a no-go.

I agree a dedicated flag will be better but I only came up with nasty
names like it AT_FORCE_EMPTY_PATH or AT_EMPTY_PATH_NOCHECK.  I think
your AT_NO_PATH is a better name.

> I am not convinced messing with AT_* flags is justified to begin with.
> Any syscall which does not have a fd-only variant and is found to be
> routinely used with AT_EMPTY_PATH should get one instead.
> 
> As far as I know that's only stat(due to a perf bug in glibc, now
> fixed) and increasingly statx.

And you mean fstatat instead of stat, I guess.

> Suppose AT_EMPTY_PATH + NULL are to land and stat + statx get the
> treatment. What about all the other syscalls? Sorting all that out is
> quite a big of churn which is probably not worth it. But then there is
> a feature gap in that they EFAULT for this pair while the stat* ones
> don't and that's bound to raise confusion. Then one could add the
> check in the bowels of path lookup in similar way you do did to
> maintain the same behavior (but without per-syscall churn) and a big
> fat warning that anyone getting there often needs to get patched with
> short-circuiting the entire thing. So I think that's either a lot of
> churn or nasty additions.
> 
> Regardless, as noted above, either making fstatx a thing or
> short-circuiting mostly the same patching has to be done for
> statx-related stuff.
> 
> However, this is not my call to make.
Christian Brauner June 25, 2024, 8:05 a.m. UTC | #8
On Sat, Jun 22, 2024 at 03:41:19PM GMT, Linus Torvalds wrote:
> On Sat, 22 Jun 2024 at 14:25, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > +cc Linus
> 
> Thanks.
> 
> > To sum up the problem: stat and statx met with "" + AT_EMPTY_PATH have
> > more work to do than fstat and its hypotethical statx counterpart:
> > - buf alloc/free for the path
> > - userspace access (very painful on x86_64 + SMAP)
> > - lockref acquire/release
> 
> Yes. That LOOKUP_EMPTY_NOCHECK is *not* the fix.

I have explicitly NAKed an approach like this months ago in [1]. So I'm
not sure why we're getting that patch in the first place tbh.

[1]: https://lore.kernel.org/lkml/599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 37fb0a8aa09a..0c44a7ea5961 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -147,7 +147,13 @@  getname_flags(const char __user *filename, int flags, int *empty)
 	kname = (char *)result->iname;
 	result->name = kname;
 
-	len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
+	if (!(flags & LOOKUP_EMPTY_NOCHECK))
+		len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
+	else {
+		len = 0;
+		kname[0] = '\0';
+	}
+
 	if (unlikely(len < 0)) {
 		__putname(result);
 		return ERR_PTR(len);
diff --git a/fs/stat.c b/fs/stat.c
index 70bd3e888cfa..53944d3287cd 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -210,6 +210,8 @@  int getname_statx_lookup_flags(int flags)
 		lookup_flags |= LOOKUP_AUTOMOUNT;
 	if (flags & AT_EMPTY_PATH)
 		lookup_flags |= LOOKUP_EMPTY;
+	if (flags & AT_EMPTY_PATH_NOCHECK)
+		lookup_flags |= LOOKUP_EMPTY | LOOKUP_EMPTY_NOCHECK;
 
 	return lookup_flags;
 }
@@ -237,7 +239,7 @@  static int vfs_statx(int dfd, struct filename *filename, int flags,
 	int error;
 
 	if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH |
-		      AT_STATX_SYNC_TYPE))
+		      AT_STATX_SYNC_TYPE | AT_EMPTY_PATH_NOCHECK))
 		return -EINVAL;
 
 retry:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 967aa9ea9f96..def6a8a1b531 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -45,9 +45,13 @@  enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 #define LOOKUP_IN_ROOT		0x100000 /* Treat dirfd as fs root. */
 #define LOOKUP_CACHED		0x200000 /* Only do cached lookup */
 #define LOOKUP_LINKAT_EMPTY	0x400000 /* Linkat request with empty path. */
+
 /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
 #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
 
+/* If this is set, LOOKUP_EMPTY must be set as well. */
+#define LOOKUP_EMPTY_NOCHECK	0x800000 /* Consider path empty. */
+
 extern int path_pts(struct path *path);
 
 extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
diff --git a/include/trace/misc/fs.h b/include/trace/misc/fs.h
index 738b97f22f36..24aec7ed6b0b 100644
--- a/include/trace/misc/fs.h
+++ b/include/trace/misc/fs.h
@@ -119,4 +119,5 @@ 
 		{ LOOKUP_NO_XDEV,	"NO_XDEV" }, \
 		{ LOOKUP_BENEATH,	"BENEATH" }, \
 		{ LOOKUP_IN_ROOT,	"IN_ROOT" }, \
+		{ LOOKUP_EMPTY_NOCHECK,	"EMPTY_NOCHECK" }, \
 		{ LOOKUP_CACHED,	"CACHED" })
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index c0bcc185fa48..aa2f68d80820 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -113,6 +113,9 @@ 
 #define AT_STATX_DONT_SYNC	0x4000	/* - Don't sync attributes with the server */
 
 #define AT_RECURSIVE		0x8000	/* Apply to the entire subtree */
+#define AT_EMPTY_PATH_NOCHECK	0x10000	/* Like AT_EMPTY_PATH, but the path
+                                           is not checked and it's just
+                                           assumed to be empty */
 
 /* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
 #define AT_HANDLE_FID		AT_REMOVEDIR	/* file handle is needed to