diff mbox series

[v3,6/6] ovl: add ovl_fadvise()

Message ID 1535374564-8257-7-git-send-email-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series Overlayfs stacked f_op fixes | expand

Commit Message

Amir Goldstein Aug. 27, 2018, 12:56 p.m. UTC
Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
on an overlayfs file.

Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: d1d04ef8572b ("ovl: stack file ops")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Vivek Goyal Aug. 27, 2018, 6:52 p.m. UTC | #1
On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote:
> Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> on an overlayfs file.
> 
> Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
> Fixes: d1d04ef8572b ("ovl: stack file ops")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/overlayfs/file.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index a4acd84591d4..42d2d034d85c 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
>  	return ret;
>  }
>  
> +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +	struct fd real;
> +	int ret;
> +
> +	ret = ovl_real_fdget(file, &real);
> +	if (ret)
> +		return ret;
> +
> +	/* XXX: do we need mounter credentials? */

Given we are switching creds to mounter for rest of the file operations,
so I would think we need to do it here as well to be consistent with
this security model.

Thanks
Vivek
> +	ret = vfs_fadvise(real.file, offset, len, advice);
> +
> +	fdput(real);
> +
> +	return ret;
> +}
> +
>  static long ovl_real_ioctl(struct file *file, unsigned int cmd,
>  			   unsigned long arg)
>  {
> @@ -499,6 +516,7 @@ const struct file_operations ovl_file_operations = {
>  	.fsync		= ovl_fsync,
>  	.mmap		= ovl_mmap,
>  	.fallocate	= ovl_fallocate,
> +	.fadvise	= ovl_fadvise,
>  	.unlocked_ioctl	= ovl_ioctl,
>  	.compat_ioctl	= ovl_compat_ioctl,
>  
> -- 
> 2.7.4
>
Amir Goldstein Aug. 27, 2018, 7:05 p.m. UTC | #2
On Mon, Aug 27, 2018 at 9:52 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote:
> > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> > on an overlayfs file.
> >
> > Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
> > Fixes: d1d04ef8572b ("ovl: stack file ops")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/overlayfs/file.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index a4acd84591d4..42d2d034d85c 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
> >       return ret;
> >  }
> >
> > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> > +{
> > +     struct fd real;
> > +     int ret;
> > +
> > +     ret = ovl_real_fdget(file, &real);
> > +     if (ret)
> > +             return ret;
> > +
> > +     /* XXX: do we need mounter credentials? */
>
> Given we are switching creds to mounter for rest of the file operations,
> so I would think we need to do it here as well to be consistent with
> this security model.
>

Yeh, I guess so, although I did not see any security checks in
fadvise64(2) syscall. readahead(2) at least checks for FMODE_READ
on the open file fadvise doesn't even bother with that...

Miklos, let me know if you want me to add override_creds or will
you add it yourself.

Thanks,
Amir.
Miklos Szeredi Aug. 27, 2018, 7:29 p.m. UTC | #3
On Mon, Aug 27, 2018 at 9:05 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Aug 27, 2018 at 9:52 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>> On Mon, Aug 27, 2018 at 03:56:04PM +0300, Amir Goldstein wrote:
>> > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
>> > on an overlayfs file.
>> >
>> > Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
>> > Fixes: d1d04ef8572b ("ovl: stack file ops")
>> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> > ---
>> >  fs/overlayfs/file.c | 18 ++++++++++++++++++
>> >  1 file changed, 18 insertions(+)
>> >
>> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
>> > index a4acd84591d4..42d2d034d85c 100644
>> > --- a/fs/overlayfs/file.c
>> > +++ b/fs/overlayfs/file.c
>> > @@ -331,6 +331,23 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
>> >       return ret;
>> >  }
>> >
>> > +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
>> > +{
>> > +     struct fd real;
>> > +     int ret;
>> > +
>> > +     ret = ovl_real_fdget(file, &real);
>> > +     if (ret)
>> > +             return ret;
>> > +
>> > +     /* XXX: do we need mounter credentials? */
>>
>> Given we are switching creds to mounter for rest of the file operations,
>> so I would think we need to do it here as well to be consistent with
>> this security model.
>>
>
> Yeh, I guess so, although I did not see any security checks in
> fadvise64(2) syscall. readahead(2) at least checks for FMODE_READ
> on the open file fadvise doesn't even bother with that...
>
> Miklos, let me know if you want me to add override_creds or will
> you add it yourself.

I'll add it.

Thanks,
Miklos
Andreas Dilger Dec. 28, 2019, 5:49 a.m. UTC | #4
On Aug 27, 2018, at 6:56 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> on an overlayfs file.

I was just looking into the existence of the "new" fadvise() method in
the VFS being able to communicate application hints directly to the
filesystem to see if it could be used to address the word size issue in
https://bugzilla.kernel.org/show_bug.cgi?id=205957 without adding a new
syscall, and came across this patch and the 4/6 patch that adds the
vfs_fadvise() function itself (copied below for clarity).

It seems to me that this implementation is broken?  Only vfs_fadvise()
is called from the fadvise64() syscall, and it will call f_op->fadvise()
if the filesystem provides this method.  Only overlayfs provides the
.fadvise method today.  However, it looks that ovl_fadvise() calls back
into vfs_fadvise() again, in a seemingly endless loop?

It seems like generic_fadvise() should be EXPORT_SYMBOL() so that any
filesystem that implements its own .fadvise method can do its own thing,
and then call generic_fadvise() to handle the remaining MM-specific work.

Thoughts?

> +int vfs_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +	if (file->f_op->fadvise)
> +		return file->f_op->fadvise(file, offset, len, advice);
> +
> +	return generic_fadvise(file, offset, len, advice);
> +}
> +EXPORT_SYMBOL(vfs_fadvise);
> 
> +int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +	struct fd real;
> +	int ret;
> +
> +	ret = ovl_real_fdget(file, &real);
> +	if (ret)
> +		return ret;
> +
> +	/* XXX: do we need mounter credentials? */
> +	ret = vfs_fadvise(real.file, offset, len, advice);
> +
> +	fdput(real);
> +
> +	return ret;
> +}

Cheers, Andreas
Amir Goldstein Dec. 28, 2019, 10:10 a.m. UTC | #5
On Sat, Dec 28, 2019 at 7:49 AM Andreas Dilger <adilger@dilger.ca> wrote:
>
> On Aug 27, 2018, at 6:56 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2)
> > on an overlayfs file.
>
> I was just looking into the existence of the "new" fadvise() method in
> the VFS being able to communicate application hints directly to the
> filesystem to see if it could be used to address the word size issue in
> https://bugzilla.kernel.org/show_bug.cgi?id=205957 without adding a new
> syscall, and came across this patch and the 4/6 patch that adds the
> vfs_fadvise() function itself (copied below for clarity).
>
> It seems to me that this implementation is broken?  Only vfs_fadvise()
> is called from the fadvise64() syscall, and it will call f_op->fadvise()
> if the filesystem provides this method.  Only overlayfs provides the
> .fadvise method today.  However, it looks that ovl_fadvise() calls back
> into vfs_fadvise() again, in a seemingly endless loop?
>

You are confusing endless loop with recursion that has a stop condition.
The entire concept of stacked filesystem is recursion back into vfs.
This is essentially what most of the ovl file operations do, but they recurse
on the "real.file", which is supposed to be on a filesystem with lower
sb->s_stack_depth (FILESYSTEM_MAX_STACK_DEPTH is 2).

> It seems like generic_fadvise() should be EXPORT_SYMBOL() so that any
> filesystem that implements its own .fadvise method can do its own thing,
> and then call generic_fadvise() to handle the remaining MM-specific work.
>
> Thoughts?

Sure makes sense.
Overlayfs just doesn't need to call the generic helper.

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index a4acd84591d4..42d2d034d85c 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -331,6 +331,23 @@  static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	return ret;
 }
 
+int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
+{
+	struct fd real;
+	int ret;
+
+	ret = ovl_real_fdget(file, &real);
+	if (ret)
+		return ret;
+
+	/* XXX: do we need mounter credentials? */
+	ret = vfs_fadvise(real.file, offset, len, advice);
+
+	fdput(real);
+
+	return ret;
+}
+
 static long ovl_real_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -499,6 +516,7 @@  const struct file_operations ovl_file_operations = {
 	.fsync		= ovl_fsync,
 	.mmap		= ovl_mmap,
 	.fallocate	= ovl_fallocate,
+	.fadvise	= ovl_fadvise,
 	.unlocked_ioctl	= ovl_ioctl,
 	.compat_ioctl	= ovl_compat_ioctl,