Message ID | 20240622105621.7922-1-xry111@xry111.site (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | vfs: Add AT_EMPTY_PATH_NOCHECK as unchecked AT_EMPTY_PATH | expand |
+cc Linus On Sat, Jun 22, 2024 at 06:56:08PM +0800, Xi Ruoyao wrote: > It's cheap to check if the path is empty in the userspace, but expensive > to check if a userspace string is empty from the kernel. So using statx > and AT_EMPTY_PATH to implement fstat is slower than a "native" fstat > call. But for arch/loongarch fstat does not exist so we have to use > statx, and on all 32-bit architectures we must use statx after 2037. > And seccomp also cannot audit AT_EMPTY_PATH properly because it cannot > check if path is empty. > > To resolve these issues, add a relaxed version of AT_EMPTY_PATH: it does > not check if the path is empty, but just assumes the path is empty and > then behaves like AT_EMPTY_PATH. > > Link: https://sourceware.org/pipermail/libc-alpha/2023-September/151364.html > Link: https://lore.kernel.org/loongarch/599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name/ imo the thing to do is to add fstat for your arch and add fstatx for everyone. My argument for fstatx specifically is that Rust uses statx instead of fstat and is growing in popularity. To sum up the problem: stat and statx met with "" + AT_EMPTY_PATH have more work to do than fstat and its hypotethical statx counterpart: - buf alloc/free for the path - userspace access (very painful on x86_64 + SMAP) - lockref acquire/release (and other things concerning lookup itself which I'm going to ignore here) Your patch avoids the peek at userspace, but the other overhead remains. In particular the lockref cycle, apart from adding work single-threaded, adds avoidable serialization against other threads issuing stat(x) on the same file. iow your patch still leaves performance on the table and I don't think it is necessary. If the flag is the way to go (I don't see why though), I would suggest something like AT_NO_PATH and requring NULL as the path argument (or some other predefined "there is nothing here" constant). I wanted to type up a proposal for fstatx (+ patch) some time ago, but some refactoring was needed to make it happen and put it on the back burner. Perhaps you would be willing to pick it up, assuming the vfs folk are oke with it. Regardless of what happens with statx or this patch you should probably add fstat anyway. If there are any other perf-sensitive syscalls which don't have their fd-only variants they should be plugged to, but I can't think of anything. > --- > fs/namei.c | 8 +++++++- > fs/stat.c | 4 +++- > include/linux/namei.h | 4 ++++ > include/trace/misc/fs.h | 1 + > include/uapi/linux/fcntl.h | 3 +++ > 5 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 37fb0a8aa09a..0c44a7ea5961 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -147,7 +147,13 @@ getname_flags(const char __user *filename, int flags, int *empty) > kname = (char *)result->iname; > result->name = kname; > > - len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX); > + if (!(flags & LOOKUP_EMPTY_NOCHECK)) > + len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX); > + else { > + len = 0; > + kname[0] = '\0'; > + } > + > if (unlikely(len < 0)) { > __putname(result); > return ERR_PTR(len); > diff --git a/fs/stat.c b/fs/stat.c > index 70bd3e888cfa..53944d3287cd 100644 > --- a/fs/stat.c > +++ b/fs/stat.c > @@ -210,6 +210,8 @@ int getname_statx_lookup_flags(int flags) > lookup_flags |= LOOKUP_AUTOMOUNT; > if (flags & AT_EMPTY_PATH) > lookup_flags |= LOOKUP_EMPTY; > + if (flags & AT_EMPTY_PATH_NOCHECK) > + lookup_flags |= LOOKUP_EMPTY | LOOKUP_EMPTY_NOCHECK; > > return lookup_flags; > } > @@ -237,7 +239,7 @@ static int vfs_statx(int dfd, struct filename *filename, int flags, > int error; > > if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH | > - AT_STATX_SYNC_TYPE)) > + AT_STATX_SYNC_TYPE | AT_EMPTY_PATH_NOCHECK)) > return -EINVAL; > > retry: > diff --git a/include/linux/namei.h b/include/linux/namei.h > index 967aa9ea9f96..def6a8a1b531 100644 > --- a/include/linux/namei.h > +++ b/include/linux/namei.h > @@ -45,9 +45,13 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT}; > #define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */ > #define LOOKUP_CACHED 0x200000 /* Only do cached lookup */ > #define LOOKUP_LINKAT_EMPTY 0x400000 /* Linkat request with empty path. */ > + > /* LOOKUP_* flags which do scope-related checks based on the dirfd. */ > #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT) > > +/* If this is set, LOOKUP_EMPTY must be set as well. */ > +#define LOOKUP_EMPTY_NOCHECK 0x800000 /* Consider path empty. */ > + > extern int path_pts(struct path *path); > > extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); > diff --git a/include/trace/misc/fs.h b/include/trace/misc/fs.h > index 738b97f22f36..24aec7ed6b0b 100644 > --- a/include/trace/misc/fs.h > +++ b/include/trace/misc/fs.h > @@ -119,4 +119,5 @@ > { LOOKUP_NO_XDEV, "NO_XDEV" }, \ > { LOOKUP_BENEATH, "BENEATH" }, \ > { LOOKUP_IN_ROOT, "IN_ROOT" }, \ > + { LOOKUP_EMPTY_NOCHECK, "EMPTY_NOCHECK" }, \ > { LOOKUP_CACHED, "CACHED" }) > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > index c0bcc185fa48..aa2f68d80820 100644 > --- a/include/uapi/linux/fcntl.h > +++ b/include/uapi/linux/fcntl.h > @@ -113,6 +113,9 @@ > #define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ > > #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ > +#define AT_EMPTY_PATH_NOCHECK 0x10000 /* Like AT_EMPTY_PATH, but the path > + is not checked and it's just > + assumed to be empty */ > > /* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */ > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > -- > 2.45.2 >
On Sat, 22 Jun 2024 at 14:25, Mateusz Guzik <mjguzik@gmail.com> wrote: > > +cc Linus Thanks. > To sum up the problem: stat and statx met with "" + AT_EMPTY_PATH have > more work to do than fstat and its hypotethical statx counterpart: > - buf alloc/free for the path > - userspace access (very painful on x86_64 + SMAP) > - lockref acquire/release Yes. That LOOKUP_EMPTY_NOCHECK is *not* the fix. I do think that we should make AT_EMPTY_PATH with a NULL path "JustWork(tm)", because the stupid "look if the pathname is empty" is horrible. But moving that check into getname() is *NOT* the right answer, because by the time you get to getname(), you have already lost. There's a very real reason why vfs_fstatat() catches this empty case early, and never goes to filename lookup at all. You don't want to generate a 'struct path' from the 'int fd', because you want to never get anywhere close to that path, and instead only ever need a 'struct fd' that can be looked up much more cheaply (particularly if not in a threaded environment). So the short-cut in vfs_fstatat() to never get a pathname is disgusting - people should have used 'fstat()' - but it's _important_ disgusting. This thing that tries to short-circuit things at the path level is too late. Linus
On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote: > I do think that we should make AT_EMPTY_PATH with a NULL path > "JustWork(tm)", because the stupid "look if the pathname is empty" is > horrible. > > But moving that check into getname() is *NOT* the right answer, > because by the time you get to getname(), you have already lost. Oops. I'll try to get around of getname() too... > So the short-cut in vfs_fstatat() to never get a pathname is > disgusting - people should have used 'fstat()' - but it's _important_ > disgusting. The problem is we don't have fstat() for LoongArch, and it'll be unusable on all 32-bit arch after 2037. And Arnd hates the idea adding fstat() for LoongArch because there would be one more 32-bit arch broken in 2037. Or should we just add something like "fstat_2037()"?
On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site> wrote: > > On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote: > > > I do think that we should make AT_EMPTY_PATH with a NULL path > > "JustWork(tm)", because the stupid "look if the pathname is empty" is > > horrible. > > > > But moving that check into getname() is *NOT* the right answer, > > because by the time you get to getname(), you have already lost. > > Oops. I'll try to get around of getname() too... > > > So the short-cut in vfs_fstatat() to never get a pathname is > > disgusting - people should have used 'fstat()' - but it's _important_ > > disgusting. > > The problem is we don't have fstat() for LoongArch, and it'll be > unusable on all 32-bit arch after 2037. > > And Arnd hates the idea adding fstat() for LoongArch because there would > be one more 32-bit arch broken in 2037. > > Or should we just add something like "fstat_2037()"? > In that case fstat is out of the question, but no problem. It was suggested to make AT_EMPTY_PATH + NULL pathname do the right thing and have the syscalls short-circuit as needed. for statx it would look like this (except you are going to have implement do_statx_by_fd): diff --git a/fs/stat.c b/fs/stat.c index 16aa1f5ceec4..0afe72b320cc 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx, int ret; struct filename *name; + if (flags == AT_EMPTY_PATH && filename == NULL) + return do_statx_by_fd(...); + name = getname_flags(filename, getname_statx_lookup_flags(flags)); ret = do_statx(dfd, name, flags, mask, buffer); putname(name); and so on Personally I would prefer if fstatx was added instead, fwiw most massaging in the area will be the same regardless.
On Sun, 2024-06-23 at 03:07 +0200, Mateusz Guzik wrote: > On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site> wrote: > > > > On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote: > > > > > I do think that we should make AT_EMPTY_PATH with a NULL path > > > "JustWork(tm)", because the stupid "look if the pathname is empty" is > > > horrible. > > > > > > But moving that check into getname() is *NOT* the right answer, > > > because by the time you get to getname(), you have already lost. > > > > Oops. I'll try to get around of getname() too... > > > > > So the short-cut in vfs_fstatat() to never get a pathname is > > > disgusting - people should have used 'fstat()' - but it's _important_ > > > disgusting. > > > > The problem is we don't have fstat() for LoongArch, and it'll be > > unusable on all 32-bit arch after 2037. > > > > And Arnd hates the idea adding fstat() for LoongArch because there would > > be one more 32-bit arch broken in 2037. > > > > Or should we just add something like "fstat_2037()"? > > > > In that case fstat is out of the question, but no problem. > > It was suggested to make AT_EMPTY_PATH + NULL pathname do the right > thing and have the syscalls short-circuit as needed. > > for statx it would look like this (except you are going to have > implement do_statx_by_fd): > > diff --git a/fs/stat.c b/fs/stat.c > index 16aa1f5ceec4..0afe72b320cc 100644 > --- a/fs/stat.c > +++ b/fs/stat.c > @@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx, > int ret; > struct filename *name; > > + if (flags == AT_EMPTY_PATH && filename == NULL) > + return do_statx_by_fd(...); > + > name = getname_flags(filename, getname_statx_lookup_flags(flags)); > ret = do_statx(dfd, name, flags, mask, buffer); > putname(name); > > and so on > > Personally I would prefer if fstatx was added instead, fwiw most > massaging in the area will be the same regardless. I do agree. But if we do it *now* would it be "breaking the userspace" if some stupid program relies on fstatx() to return some error when the path is NULL? The "stupid programs" may just exist in the wild... I remember recently we have to pretend pidfd is stupid to please some stupid programs... https://lore.kernel.org/all/20240521-girlanden-zehnfach-1bff7eb9218c@brauner/
On Sun, Jun 23, 2024 at 3:22 AM Xi Ruoyao <xry111@xry111.site> wrote: > > On Sun, 2024-06-23 at 03:07 +0200, Mateusz Guzik wrote: > > On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site> wrote: > > > > > > On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote: > > > > > > > I do think that we should make AT_EMPTY_PATH with a NULL path > > > > "JustWork(tm)", because the stupid "look if the pathname is empty" is > > > > horrible. > > > > > > > > But moving that check into getname() is *NOT* the right answer, > > > > because by the time you get to getname(), you have already lost. > > > > > > Oops. I'll try to get around of getname() too... > > > > > > > So the short-cut in vfs_fstatat() to never get a pathname is > > > > disgusting - people should have used 'fstat()' - but it's _important_ > > > > disgusting. > > > > > > The problem is we don't have fstat() for LoongArch, and it'll be > > > unusable on all 32-bit arch after 2037. > > > > > > And Arnd hates the idea adding fstat() for LoongArch because there would > > > be one more 32-bit arch broken in 2037. > > > > > > Or should we just add something like "fstat_2037()"? > > > > > > > In that case fstat is out of the question, but no problem. > > > > It was suggested to make AT_EMPTY_PATH + NULL pathname do the right > > thing and have the syscalls short-circuit as needed. > > > > for statx it would look like this (except you are going to have > > implement do_statx_by_fd): > > > > diff --git a/fs/stat.c b/fs/stat.c > > index 16aa1f5ceec4..0afe72b320cc 100644 > > --- a/fs/stat.c > > +++ b/fs/stat.c > > @@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx, > > int ret; > > struct filename *name; > > > > + if (flags == AT_EMPTY_PATH && filename == NULL) > > + return do_statx_by_fd(...); > > + > > name = getname_flags(filename, getname_statx_lookup_flags(flags)); > > ret = do_statx(dfd, name, flags, mask, buffer); > > putname(name); > > > > and so on > > > > Personally I would prefer if fstatx was added instead, fwiw most > > massaging in the area will be the same regardless. > > I do agree. But if we do it *now* would it be "breaking the userspace" > if some stupid program relies on fstatx() to return some error when the > path is NULL? The "stupid programs" may just exist in the wild... > You mean statx? fstatx would not accept a path to begin with. Worry about some code breaking is why I suggested a dedicated flag (AT_NO_PATH) myself in case fstatx is a no-go. I am not convinced messing with AT_* flags is justified to begin with. Any syscall which does not have a fd-only variant and is found to be routinely used with AT_EMPTY_PATH should get one instead. As far as I know that's only stat(due to a perf bug in glibc, now fixed) and increasingly statx. Suppose AT_EMPTY_PATH + NULL are to land and stat + statx get the treatment. What about all the other syscalls? Sorting all that out is quite a big of churn which is probably not worth it. But then there is a feature gap in that they EFAULT for this pair while the stat* ones don't and that's bound to raise confusion. Then one could add the check in the bowels of path lookup in similar way you do did to maintain the same behavior (but without per-syscall churn) and a big fat warning that anyone getting there often needs to get patched with short-circuiting the entire thing. So I think that's either a lot of churn or nasty additions. Regardless, as noted above, either making fstatx a thing or short-circuiting mostly the same patching has to be done for statx-related stuff. However, this is not my call to make.
On Sun, 2024-06-23 at 14:04 +0200, Mateusz Guzik wrote: > On Sun, Jun 23, 2024 at 3:22 AM Xi Ruoyao <xry111@xry111.site> wrote: > > > > On Sun, 2024-06-23 at 03:07 +0200, Mateusz Guzik wrote: > > > On Sun, Jun 23, 2024 at 2:59 AM Xi Ruoyao <xry111@xry111.site> > > > wrote: > > > > > > > > On Sat, 2024-06-22 at 15:41 -0700, Linus Torvalds wrote: > > > > > > > > > I do think that we should make AT_EMPTY_PATH with a NULL path > > > > > "JustWork(tm)", because the stupid "look if the pathname is > > > > > empty" is > > > > > horrible. > > > > > > > > > > But moving that check into getname() is *NOT* the right > > > > > answer, > > > > > because by the time you get to getname(), you have already > > > > > lost. > > > > > > > > Oops. I'll try to get around of getname() too... > > > > > > > > > So the short-cut in vfs_fstatat() to never get a pathname is > > > > > disgusting - people should have used 'fstat()' - but it's > > > > > _important_ > > > > > disgusting. > > > > > > > > The problem is we don't have fstat() for LoongArch, and it'll be > > > > unusable on all 32-bit arch after 2037. > > > > > > > > And Arnd hates the idea adding fstat() for LoongArch because > > > > there would > > > > be one more 32-bit arch broken in 2037. > > > > > > > > Or should we just add something like "fstat_2037()"? > > > > > > > > > > In that case fstat is out of the question, but no problem. > > > > > > It was suggested to make AT_EMPTY_PATH + NULL pathname do the > > > right > > > thing and have the syscalls short-circuit as needed. > > > > > > for statx it would look like this (except you are going to have > > > implement do_statx_by_fd): > > > > > > diff --git a/fs/stat.c b/fs/stat.c > > > index 16aa1f5ceec4..0afe72b320cc 100644 > > > --- a/fs/stat.c > > > +++ b/fs/stat.c > > > @@ -710,6 +710,9 @@ SYSCALL_DEFINE5(statx, > > > int ret; > > > struct filename *name; > > > > > > + if (flags == AT_EMPTY_PATH && filename == NULL) > > > + return do_statx_by_fd(...); > > > + > > > name = getname_flags(filename, > > > getname_statx_lookup_flags(flags)); > > > ret = do_statx(dfd, name, flags, mask, buffer); > > > putname(name); > > > > > > and so on > > > > > > Personally I would prefer if fstatx was added instead, fwiw most > > > massaging in the area will be the same regardless. > > > > I do agree. But if we do it *now* would it be "breaking the > > userspace" > > if some stupid program relies on fstatx() to return some error when > > the > > path is NULL? The "stupid programs" may just exist in the wild... > > > > You mean statx? fstatx would not accept a path to begin with. Yes I mean statx. > Worry about some code breaking is why I suggested a dedicated flag > (AT_NO_PATH) myself in case fstatx is a no-go. I agree a dedicated flag will be better but I only came up with nasty names like it AT_FORCE_EMPTY_PATH or AT_EMPTY_PATH_NOCHECK. I think your AT_NO_PATH is a better name. > I am not convinced messing with AT_* flags is justified to begin with. > Any syscall which does not have a fd-only variant and is found to be > routinely used with AT_EMPTY_PATH should get one instead. > > As far as I know that's only stat(due to a perf bug in glibc, now > fixed) and increasingly statx. And you mean fstatat instead of stat, I guess. > Suppose AT_EMPTY_PATH + NULL are to land and stat + statx get the > treatment. What about all the other syscalls? Sorting all that out is > quite a big of churn which is probably not worth it. But then there is > a feature gap in that they EFAULT for this pair while the stat* ones > don't and that's bound to raise confusion. Then one could add the > check in the bowels of path lookup in similar way you do did to > maintain the same behavior (but without per-syscall churn) and a big > fat warning that anyone getting there often needs to get patched with > short-circuiting the entire thing. So I think that's either a lot of > churn or nasty additions. > > Regardless, as noted above, either making fstatx a thing or > short-circuiting mostly the same patching has to be done for > statx-related stuff. > > However, this is not my call to make.
On Sat, Jun 22, 2024 at 03:41:19PM GMT, Linus Torvalds wrote: > On Sat, 22 Jun 2024 at 14:25, Mateusz Guzik <mjguzik@gmail.com> wrote: > > > > +cc Linus > > Thanks. > > > To sum up the problem: stat and statx met with "" + AT_EMPTY_PATH have > > more work to do than fstat and its hypotethical statx counterpart: > > - buf alloc/free for the path > > - userspace access (very painful on x86_64 + SMAP) > > - lockref acquire/release > > Yes. That LOOKUP_EMPTY_NOCHECK is *not* the fix. I have explicitly NAKed an approach like this months ago in [1]. So I'm not sure why we're getting that patch in the first place tbh. [1]: https://lore.kernel.org/lkml/599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name
diff --git a/fs/namei.c b/fs/namei.c index 37fb0a8aa09a..0c44a7ea5961 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -147,7 +147,13 @@ getname_flags(const char __user *filename, int flags, int *empty) kname = (char *)result->iname; result->name = kname; - len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX); + if (!(flags & LOOKUP_EMPTY_NOCHECK)) + len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX); + else { + len = 0; + kname[0] = '\0'; + } + if (unlikely(len < 0)) { __putname(result); return ERR_PTR(len); diff --git a/fs/stat.c b/fs/stat.c index 70bd3e888cfa..53944d3287cd 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -210,6 +210,8 @@ int getname_statx_lookup_flags(int flags) lookup_flags |= LOOKUP_AUTOMOUNT; if (flags & AT_EMPTY_PATH) lookup_flags |= LOOKUP_EMPTY; + if (flags & AT_EMPTY_PATH_NOCHECK) + lookup_flags |= LOOKUP_EMPTY | LOOKUP_EMPTY_NOCHECK; return lookup_flags; } @@ -237,7 +239,7 @@ static int vfs_statx(int dfd, struct filename *filename, int flags, int error; if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH | - AT_STATX_SYNC_TYPE)) + AT_STATX_SYNC_TYPE | AT_EMPTY_PATH_NOCHECK)) return -EINVAL; retry: diff --git a/include/linux/namei.h b/include/linux/namei.h index 967aa9ea9f96..def6a8a1b531 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -45,9 +45,13 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT}; #define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */ #define LOOKUP_CACHED 0x200000 /* Only do cached lookup */ #define LOOKUP_LINKAT_EMPTY 0x400000 /* Linkat request with empty path. */ + /* LOOKUP_* flags which do scope-related checks based on the dirfd. */ #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT) +/* If this is set, LOOKUP_EMPTY must be set as well. */ +#define LOOKUP_EMPTY_NOCHECK 0x800000 /* Consider path empty. */ + extern int path_pts(struct path *path); extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); diff --git a/include/trace/misc/fs.h b/include/trace/misc/fs.h index 738b97f22f36..24aec7ed6b0b 100644 --- a/include/trace/misc/fs.h +++ b/include/trace/misc/fs.h @@ -119,4 +119,5 @@ { LOOKUP_NO_XDEV, "NO_XDEV" }, \ { LOOKUP_BENEATH, "BENEATH" }, \ { LOOKUP_IN_ROOT, "IN_ROOT" }, \ + { LOOKUP_EMPTY_NOCHECK, "EMPTY_NOCHECK" }, \ { LOOKUP_CACHED, "CACHED" }) diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index c0bcc185fa48..aa2f68d80820 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -113,6 +113,9 @@ #define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ +#define AT_EMPTY_PATH_NOCHECK 0x10000 /* Like AT_EMPTY_PATH, but the path + is not checked and it's just + assumed to be empty */ /* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */ #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
It's cheap to check if the path is empty in the userspace, but expensive to check if a userspace string is empty from the kernel. So using statx and AT_EMPTY_PATH to implement fstat is slower than a "native" fstat call. But for arch/loongarch fstat does not exist so we have to use statx, and on all 32-bit architectures we must use statx after 2037. And seccomp also cannot audit AT_EMPTY_PATH properly because it cannot check if path is empty. To resolve these issues, add a relaxed version of AT_EMPTY_PATH: it does not check if the path is empty, but just assumes the path is empty and then behaves like AT_EMPTY_PATH. Link: https://sourceware.org/pipermail/libc-alpha/2023-September/151364.html Link: https://lore.kernel.org/loongarch/599df4a3-47a4-49be-9c81-8e21ea1f988a@xen0n.name/ Cc: Christian Brauner <brauner@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alejandro Colomar <alx@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Icenowy Zheng <uwu@icenowy.me> Cc: linux-fsdevel@vger.kernel.org Cc: linux-trace-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: loongarch@lists.linux.dev Cc: linux-kernel@vger.kernel.org Signed-off-by: Xi Ruoyao <xry111@xry111.site> --- fs/namei.c | 8 +++++++- fs/stat.c | 4 +++- include/linux/namei.h | 4 ++++ include/trace/misc/fs.h | 1 + include/uapi/linux/fcntl.h | 3 +++ 5 files changed, 18 insertions(+), 2 deletions(-)