Message ID | 150216641255.11652.4204561328197919771.stgit@pluto.themaw.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Aug 8, 2017, at 12:26 AM, Ian Kent wrote: > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, struct kstat *stat) > static inline int vfs_fstatat(int dfd, const char __user *filename, > struct kstat *stat, int flags) > { > - return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT, > - stat, STATX_BASIC_STATS); > + return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS); > } > static inline int vfs_fstat(int fd, struct kstat *stat) > { This is reverting the fstatat() prat of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=deccf497d804a4c5fca2dbfad2f104675a6f9102 Which itself seems weird to me - it looks like we were unconditionally forcing on AT_NO_AUTOMOUNT regardless of what userspace passed? So perhaps a Fixes: deccf497d804a4c5fca2dbfad2f104675a6f9102 is appropriate here? I understand that for stat()/lstat() we didn't expose the option to userspace, so the behavior was...ah, there's this note in man-pages (man-pages-4.09-3.fc26.noarch): > On Linux, lstat() will generally not trigger automounter action, whereas stat() will (but see fstatat(2)). I have no idea of the history here, but maybe it makes sense to drop the AT_NO_AUTOMOUNT from the vfs_stat() too?
On 08/08/17 21:11, Colin Walters wrote: > On Tue, Aug 8, 2017, at 12:26 AM, Ian Kent wrote: > >> --- a/include/linux/fs.h >> +++ b/include/linux/fs.h >> @@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, struct kstat *stat) >> static inline int vfs_fstatat(int dfd, const char __user *filename, >> struct kstat *stat, int flags) >> { >> - return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT, >> - stat, STATX_BASIC_STATS); >> + return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS); >> } >> static inline int vfs_fstat(int fd, struct kstat *stat) >> { > > This is reverting the fstatat() prat of > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=deccf497d804a4c5fca2dbfad2f104675a6f9102 > Which itself seems weird to me - it looks like we were unconditionally > forcing on AT_NO_AUTOMOUNT regardless of what userspace passed? > So perhaps a > Fixes: deccf497d804a4c5fca2dbfad2f104675a6f9102 > is appropriate here? David posted this at my request. I asked him to do it because, when I saw this, I thought restoring the semantics to what they were before they were changed needed to be done as quickly as possible. That was so that I could then work on fixing the AT_NO_AUTOMOUNT not being honored with fstatat(2). > > I understand that for stat()/lstat() we didn't expose the option to userspace, > so the behavior was...ah, there's this note in man-pages (man-pages-4.09-3.fc26.noarch): > >> On Linux, lstat() will generally not trigger automounter action, whereas stat() will (but see fstatat(2)). > > I have no idea of the history here, but maybe it makes sense to drop > the AT_NO_AUTOMOUNT from the vfs_stat() too? > I thought I had talked about the history in the patch description but I guess it's not clear and isn't detailed enough for people that haven't been close to the development over time. Historically stat family calls were not supposed to trigger automounts because that can easily lead to mount storms that are really bad for large autofs mount maps. But the mount storm problem was mostly only evident for autofs maps that used the "browse" option, the non-negative dentry case. The negative dentry case always triggered an automount regardless of the system call. Because of the move in user space to mostly always use proc filesystem mount tables where there can be many more mount entries than were present in the text based mount tables it's critical to not perform mount callbacks that aren't absolutely essentially. At this point that means to me going over the stat(2) system call behavior and making sure it is only calling back where necessary. That's because that is where it's expected automounts won't be triggered and so should have least impact. So it's the negative dentry handling in follow_automount() you should be thinking about in terms of impact rather than the actual fstatat(2) change. If man pages need to change then they need to change. AFAICT, as I said in the patch description, this should not cause regressions but I can't be certain. In any case it is in keeping with the historical "stat family system calls shouldn't trigger automounts" mantra needed from the beginning. Also notice that the negative dentry handling change should only affect autofs as other kernel uses will be triggering automounts for positive dentrys. Hopefully this doesn't sound to aggressive, I don't mean it to sound that way. Ian
Ian Kent <raven@themaw.net> wrote: > In order to handle the AT_NO_AUTOMOUNT for both system calls the > negative dentry case in follow_automount() needs to be changed to > return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other > required flags are clear). Should the be EREMOTE instead of ENOENT? David
On 09/08/17 16:39, David Howells wrote: > Ian Kent <raven@themaw.net> wrote: > >> In order to handle the AT_NO_AUTOMOUNT for both system calls the >> negative dentry case in follow_automount() needs to be changed to >> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other >> required flags are clear). > > Should the be EREMOTE instead of ENOENT? I thought about that and ended up thinking ENOENT was more sensible but I'll look at it again. Thanks Ian
On 09/08/17 17:51, Ian Kent wrote: > On 09/08/17 16:39, David Howells wrote: >> Ian Kent <raven@themaw.net> wrote: >> >>> In order to handle the AT_NO_AUTOMOUNT for both system calls the >>> negative dentry case in follow_automount() needs to be changed to >>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other >>> required flags are clear). >> >> Should the be EREMOTE instead of ENOENT? > > I thought about that and ended up thinking ENOENT was more sensible > but I'll look at it again. I think EREMOTE and ENOENT both are inaccurate. There's no way to know if the negative dentry corresponds to a valid map key, and we've seen increasing lookups from userspace applications for invalid directories, so I'm not sure. I went with ENOENT but I guess we could use EREMOTE, what's your thinking on why EREMOTE might be better than ENOENT? Ian
diff --git a/fs/namei.c b/fs/namei.c index ddb6a7c2b3d4..1180f9c58093 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct nameidata *nd, * of the daemon to instantiate them before they can be used. */ if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY | - LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) && - path->dentry->d_inode) - return -EISDIR; + LOOKUP_OPEN | LOOKUP_CREATE | + LOOKUP_AUTOMOUNT))) { + /* Positive dentry that isn't meant to trigger an + * automount, EISDIR will allow it to be used, + * otherwise there's no mount here "now" so return + * ENOENT. + */ + if (path->dentry->d_inode) + return -EISDIR; + else + return -ENOENT; + } if (path->dentry->d_sb->s_user_ns != &init_user_ns) return -EACCES; diff --git a/include/linux/fs.h b/include/linux/fs.h index 6e1fd5d21248..37c96f52e48e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, struct kstat *stat) static inline int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, int flags) { - return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT, - stat, STATX_BASIC_STATS); + return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS); } static inline int vfs_fstat(int fd, struct kstat *stat) {
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering of an automount by the call. But this flag is unconditionally cleared for all stat family system calls except statx(). stat family system calls have always triggered mount requests for the negative dentry case in follow_automount() which is intended but prevents the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled. In order to handle the AT_NO_AUTOMOUNT for both system calls the negative dentry case in follow_automount() needs to be changed to return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other required flags are clear). AFAICT this change doesn't have any noticable side effects and may, in some use cases (although I didn't see it in testing) prevent unnecessary callbacks to the automount daemon. It's also possible that a stat family call has been made with a path that is in the process of being mounted by some other process. But stat family calls should return the automount state of the path as it is "now" so it shouldn't wait for mount completion. This is the same semantic as the positive dentry case already handled. Signed-off-by: Ian Kent <raven@themaw.net> Cc: David Howells <dhowells@redhat.com> Cc: Colin Walters <walters@redhat.com> Cc: Ondrej Holy <oholy@redhat.com> --- fs/namei.c | 15 ++++++++++++--- include/linux/fs.h | 3 +-- 2 files changed, 13 insertions(+), 5 deletions(-)