Message ID | 1535374564-8257-7-git-send-email-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Overlayfs stacked f_op fixes | expand |
On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote: > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) > on an overlayfs file. > > Suggested-by: Miklos Szeredi <mszeredi@redhat.com> > Fixes: d1d04ef8572b ("ovl: stack file ops") > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/overlayfs/file.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > index a4acd84591d4..42d2d034d85c 100644 > --- a/fs/overlayfs/file.c > +++ b/fs/overlayfs/file.c > @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len > return ret; > } > > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice) > +{ > + struct fd real; > + int ret; > + > + ret = ovl_real_fdget(file, &real); > + if (ret) > + return ret; > + > + /* XXX: do we need mounter credentials? */ Given we are switching creds to mounter for rest of the file operations, so I would think we need to do it here as well to be consistent with this security model. Thanks Vivek > + ret = vfs_fadvise(real.file, offset, len, advice); > + > + fdput(real); > + > + return ret; > +} > + > static long ovl_real_ioctl(struct file *file, unsigned int cmd, > unsigned long arg) > { > @@ -499,6 +516,7 @@ const struct file_operations ovl_file_operations = { > .fsync = ovl_fsync, > .mmap = ovl_mmap, > .fallocate = ovl_fallocate, > + .fadvise = ovl_fadvise, > .unlocked_ioctl = ovl_ioctl, > .compat_ioctl = ovl_compat_ioctl, > > -- > 2.7.4 >
On Mon, Aug 27, 2018 at 9:52 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote: > > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) > > on an overlayfs file. > > > > Suggested-by: Miklos Szeredi <mszeredi@redhat.com> > > Fixes: d1d04ef8572b ("ovl: stack file ops") > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > fs/overlayfs/file.c | 18 ++++++++++++++++++ > > 1 file changed, 18 insertions(+) > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > index a4acd84591d4..42d2d034d85c 100644 > > --- a/fs/overlayfs/file.c > > +++ b/fs/overlayfs/file.c > > @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len > > return ret; > > } > > > > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice) > > +{ > > + struct fd real; > > + int ret; > > + > > + ret = ovl_real_fdget(file, &real); > > + if (ret) > > + return ret; > > + > > + /* XXX: do we need mounter credentials? */ > > Given we are switching creds to mounter for rest of the file operations, > so I would think we need to do it here as well to be consistent with > this security model. > Yeh, I guess so, although I did not see any security checks in fadvise64(2) syscall. readahead(2) at least checks for FMODE_READ on the open file fadvise doesn't even bother with that... Miklos, let me know if you want me to add override_creds or will you add it yourself. Thanks, Amir.
On Mon, Aug 27, 2018 at 9:05 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Aug 27, 2018 at 9:52 PM Vivek Goyal <vgoyal@redhat.com> wrote: >> >> On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote: >> > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) >> > on an overlayfs file. >> > >> > Suggested-by: Miklos Szeredi <mszeredi@redhat.com> >> > Fixes: d1d04ef8572b ("ovl: stack file ops") >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com> >> > --- >> > fs/overlayfs/file.c | 18 ++++++++++++++++++ >> > 1 file changed, 18 insertions(+) >> > >> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c >> > index a4acd84591d4..42d2d034d85c 100644 >> > --- a/fs/overlayfs/file.c >> > +++ b/fs/overlayfs/file.c >> > @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len >> > return ret; >> > } >> > >> > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice) >> > +{ >> > + struct fd real; >> > + int ret; >> > + >> > + ret = ovl_real_fdget(file, &real); >> > + if (ret) >> > + return ret; >> > + >> > + /* XXX: do we need mounter credentials? */ >> >> Given we are switching creds to mounter for rest of the file operations, >> so I would think we need to do it here as well to be consistent with >> this security model. >> > > Yeh, I guess so, although I did not see any security checks in > fadvise64(2) syscall. readahead(2) at least checks for FMODE_READ > on the open file fadvise doesn't even bother with that... > > Miklos, let me know if you want me to add override_creds or will > you add it yourself. I'll add it. Thanks, Miklos
On Aug 27, 2018, at 6:56 AM, Amir Goldstein <amir73il@gmail.com> wrote: > > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) > on an overlayfs file. I was just looking into the existence of the "new" fadvise() method in the VFS being able to communicate application hints directly to the filesystem to see if it could be used to address the word size issue in https://bugzilla.kernel.org/show_bug.cgi?id=205957 without adding a new syscall, and came across this patch and the 4/6 patch that adds the vfs_fadvise() function itself (copied below for clarity). It seems to me that this implementation is broken? Only vfs_fadvise() is called from the fadvise64() syscall, and it will call f_op->fadvise() if the filesystem provides this method. Only overlayfs provides the .fadvise method today. However, it looks that ovl_fadvise() calls back into vfs_fadvise() again, in a seemingly endless loop? It seems like generic_fadvise() should be EXPORT_SYMBOL() so that any filesystem that implements its own .fadvise method can do its own thing, and then call generic_fadvise() to handle the remaining MM-specific work. Thoughts? > +int vfs_fadvise(struct file *file, loff_t offset, loff_t len, int advice) > +{ > + if (file->f_op->fadvise) > + return file->f_op->fadvise(file, offset, len, advice); > + > + return generic_fadvise(file, offset, len, advice); > +} > +EXPORT_SYMBOL(vfs_fadvise); > > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice) > +{ > + struct fd real; > + int ret; > + > + ret = ovl_real_fdget(file, &real); > + if (ret) > + return ret; > + > + /* XXX: do we need mounter credentials? */ > + ret = vfs_fadvise(real.file, offset, len, advice); > + > + fdput(real); > + > + return ret; > +} Cheers, Andreas
On Sat, Dec 28, 2019 at 7:49 AM Andreas Dilger <adilger@dilger.ca> wrote: > > On Aug 27, 2018, at 6:56 AM, Amir Goldstein <amir73il@gmail.com> wrote: > > > > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) > > on an overlayfs file. > > I was just looking into the existence of the "new" fadvise() method in > the VFS being able to communicate application hints directly to the > filesystem to see if it could be used to address the word size issue in > https://bugzilla.kernel.org/show_bug.cgi?id=205957 without adding a new > syscall, and came across this patch and the 4/6 patch that adds the > vfs_fadvise() function itself (copied below for clarity). > > It seems to me that this implementation is broken? Only vfs_fadvise() > is called from the fadvise64() syscall, and it will call f_op->fadvise() > if the filesystem provides this method. Only overlayfs provides the > .fadvise method today. However, it looks that ovl_fadvise() calls back > into vfs_fadvise() again, in a seemingly endless loop? > You are confusing endless loop with recursion that has a stop condition. The entire concept of stacked filesystem is recursion back into vfs. This is essentially what most of the ovl file operations do, but they recurse on the "real.file", which is supposed to be on a filesystem with lower sb->s_stack_depth (FILESYSTEM_MAX_STACK_DEPTH is 2). > It seems like generic_fadvise() should be EXPORT_SYMBOL() so that any > filesystem that implements its own .fadvise method can do its own thing, > and then call generic_fadvise() to handle the remaining MM-specific work. > > Thoughts? Sure makes sense. Overlayfs just doesn't need to call the generic helper. Thanks, Amir.
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index a4acd84591d4..42d2d034d85c 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len return ret; } +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice) +{ + struct fd real; + int ret; + + ret = ovl_real_fdget(file, &real); + if (ret) + return ret; + + /* XXX: do we need mounter credentials? */ + ret = vfs_fadvise(real.file, offset, len, advice); + + fdput(real); + + return ret; +} + static long ovl_real_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -499,6 +516,7 @@ const struct file_operations ovl_file_operations = { .fsync = ovl_fsync, .mmap = ovl_mmap, .fallocate = ovl_fallocate, + .fadvise = ovl_fadvise, .unlocked_ioctl = ovl_ioctl, .compat_ioctl = ovl_compat_ioctl,
Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) on an overlayfs file. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: d1d04ef8572b ("ovl: stack file ops") Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- fs/overlayfs/file.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)