diff mbox series

[RESEND4,1/2] fuse: retrieve: cap requested size to negotiated max_write

Message ID 12f7d0d98555ee0d174d04bb47644f65c07f035a.1553680185.git.kirr@nexedi.com (mailing list archive)
State New, archived
Headers show
Series fuse: don't stuck clients on retrieve_notify with size > max_write | expand

Commit Message

Kirill Smelkov March 27, 2019, 10:15 a.m. UTC
FUSE filesystem server and kernel client negotiate during initialization
phase, what should be the maximum write size the client will ever issue.
Correspondingly the filesystem server then queues sys_read calls to read
requests with buffer capacity large enough to carry request header
+ that max_write bytes. A filesystem server is free to set its max_write
in anywhere in the range between [1·page, fc->max_pages·page]. In
particular go-fuse[2] sets max_write by default as 64K, wheres default
fc->max_pages corresponds to 128K. Libfuse also allows users to
configure max_write, but by default presets it to possible maximum.

If max_write is < fc->max_pages·page, and in NOTIFY_RETRIEVE handler we
allow to retrieve more than max_write bytes, corresponding prepared
NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
filesystem server, in full correspondence with server/client contract,
will be only queuing sys_read with ~max_write buffer capacity, and
fuse_dev_do_read throws away requests that cannot fit into server
request buffer. In turn the filesystem server could get stuck waiting
indefinitely for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK
which is understood by clients as that NOTIFY_REPLY was queued and will
be sent back.

-> Cap requested size to negotiate max_write to avoid the problem.
This aligns with the way NOTIFY_RETRIEVE handler works, which already
unconditionally caps requested retrieve size to fuse_conn->max_pages.
This way it should not hurt NOTIFY_RETRIEVE semantic if we return less
data than was originally requested.

Please see [1] for context where the problem of stuck filesystem was hit
for real, how the situation was traced and for more involving patch that
did not make it into the tree.

[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
[2] https://github.com/hanwen/go-fuse

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
Cc: <stable@vger.kernel.org> # v2.6.36+
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Miklos Szeredi April 24, 2019, 10:44 a.m. UTC | #1
On Wed, Mar 27, 2019 at 11:15 AM Kirill Smelkov <kirr@nexedi.com> wrote:
>
> FUSE filesystem server and kernel client negotiate during initialization
> phase, what should be the maximum write size the client will ever issue.
> Correspondingly the filesystem server then queues sys_read calls to read
> requests with buffer capacity large enough to carry request header
> + that max_write bytes. A filesystem server is free to set its max_write
> in anywhere in the range between [1·page, fc->max_pages·page]. In
> particular go-fuse[2] sets max_write by default as 64K, wheres default
> fc->max_pages corresponds to 128K. Libfuse also allows users to
> configure max_write, but by default presets it to possible maximum.
>
> If max_write is < fc->max_pages·page, and in NOTIFY_RETRIEVE handler we
> allow to retrieve more than max_write bytes, corresponding prepared
> NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
> filesystem server, in full correspondence with server/client contract,
> will be only queuing sys_read with ~max_write buffer capacity, and
> fuse_dev_do_read throws away requests that cannot fit into server
> request buffer. In turn the filesystem server could get stuck waiting
> indefinitely for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK
> which is understood by clients as that NOTIFY_REPLY was queued and will
> be sent back.
>
> -> Cap requested size to negotiate max_write to avoid the problem.
> This aligns with the way NOTIFY_RETRIEVE handler works, which already
> unconditionally caps requested retrieve size to fuse_conn->max_pages.
> This way it should not hurt NOTIFY_RETRIEVE semantic if we return less
> data than was originally requested.
>
> Please see [1] for context where the problem of stuck filesystem was hit
> for real, how the situation was traced and for more involving patch that
> did not make it into the tree.
>
> [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
> [2] https://github.com/hanwen/go-fuse
>
> Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
> Cc: Han-Wen Nienhuys <hanwen@google.com>
> Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
> Cc: <stable@vger.kernel.org> # v2.6.36+
> ---
>  fs/fuse/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 8a63e52785e9..38e94bc43053 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1749,7 +1749,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
>         offset = outarg->offset & ~PAGE_MASK;
>         file_size = i_size_read(inode);
>
> -       num = outarg->size;
> +       num = min(outarg->size, fc->max_write);

This is wrong: the max_size limited num is overwritten if constrained
by file size.

Also the patch is whitespace damaged.

Thanks,
Miklos
Kirill Smelkov April 24, 2019, 11:56 a.m. UTC | #2
On Wed, Apr 24, 2019 at 12:44:50PM +0200, Miklos Szeredi wrote:
> On Wed, Mar 27, 2019 at 11:15 AM Kirill Smelkov <kirr@nexedi.com> wrote:
> >
> > FUSE filesystem server and kernel client negotiate during initialization
> > phase, what should be the maximum write size the client will ever issue.
> > Correspondingly the filesystem server then queues sys_read calls to read
> > requests with buffer capacity large enough to carry request header
> > + that max_write bytes. A filesystem server is free to set its max_write
> > in anywhere in the range between [1·page, fc->max_pages·page]. In
> > particular go-fuse[2] sets max_write by default as 64K, wheres default
> > fc->max_pages corresponds to 128K. Libfuse also allows users to
> > configure max_write, but by default presets it to possible maximum.
> >
> > If max_write is < fc->max_pages·page, and in NOTIFY_RETRIEVE handler we
> > allow to retrieve more than max_write bytes, corresponding prepared
> > NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
> > filesystem server, in full correspondence with server/client contract,
> > will be only queuing sys_read with ~max_write buffer capacity, and
> > fuse_dev_do_read throws away requests that cannot fit into server
> > request buffer. In turn the filesystem server could get stuck waiting
> > indefinitely for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK
> > which is understood by clients as that NOTIFY_REPLY was queued and will
> > be sent back.
> >
> > -> Cap requested size to negotiate max_write to avoid the problem.
> > This aligns with the way NOTIFY_RETRIEVE handler works, which already
> > unconditionally caps requested retrieve size to fuse_conn->max_pages.
> > This way it should not hurt NOTIFY_RETRIEVE semantic if we return less
> > data than was originally requested.
> >
> > Please see [1] for context where the problem of stuck filesystem was hit
> > for real, how the situation was traced and for more involving patch that
> > did not make it into the tree.
> >
> > [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
> > [2] https://github.com/hanwen/go-fuse
> >
> > Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
> > Cc: Han-Wen Nienhuys <hanwen@google.com>
> > Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
> > Cc: <stable@vger.kernel.org> # v2.6.36+
> > ---
> >  fs/fuse/dev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 8a63e52785e9..38e94bc43053 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -1749,7 +1749,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
> >         offset = outarg->offset & ~PAGE_MASK;
> >         file_size = i_size_read(inode);
> >
> > -       num = outarg->size;
> > +       num = min(outarg->size, fc->max_write);
> 
> This is wrong: the max_size limited num is overwritten if constrained
> by file size.

I assume you are meaning this:

	--- a/fs/fuse/dev.c
	+++ b/fs/fuse/dev.c
	@@ -1745,15 +1745,15 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
	        unsigned int offset;
	        size_t total_len = 0;
	        unsigned int num_pages;
	 
	        offset = outarg->offset & ~PAGE_MASK;
	        file_size = i_size_read(inode);
	 
	-       num = outarg->size;
	+       num = min(outarg->size, fc->max_write);
	        if (outarg->offset > file_size)
	                num = 0;
	        else if (outarg->offset + num > file_size)
	                num = file_size - outarg->offset;		<-- THIS
	 
	        num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
	        num_pages = min(num_pages, fc->max_pages);

and then in this case (offset + num > file_size) num overwrite

	num = file_size - offset

can make num only smaller, right? And then the patch is not wrong because there
is no other num overwriting in this function except when num is being further
decremented in loop that prepares pages to retrieve.

Or am I missing something?  Would it be more clear to cap num to
max_write after all calculations?  But then - if we are not sure that
file_size check can only lower num - we have a problem: we are no longer
sure that num is <= outarg->size.


> Also the patch is whitespace damaged.

I've tried to do the following in my mutt on "RESEND4, PATCH 1/2"
message:

	|(cd ~/src/linux/linux && git am -)

and the patch applied successfully. So could you please clarify what
"whitespace damaged" means?

Attaching the patch once again just in case.

Kirill

---- 8< ----
From 000a5a6c91992f2a361a846cd050444964920f85 Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <kirr@nexedi.com>
Date: Wed, 27 Mar 2019 13:00:56 +0300
Subject: [PATCH] fuse: retrieve: cap requested size to negotiated max_write
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

FUSE filesystem server and kernel client negotiate during initialization
phase, what should be the maximum write size the client will ever issue.
Correspondingly the filesystem server then queues sys_read calls to read
requests with buffer capacity large enough to carry request header
+ that max_write bytes. A filesystem server is free to set its max_write
in anywhere in the range between [1·page, fc->max_pages·page]. In
particular go-fuse[2] sets max_write by default as 64K, wheres default
fc->max_pages corresponds to 128K. Libfuse also allows users to
configure max_write, but by default presets it to possible maximum.

If max_write is < fc->max_pages·page, and in NOTIFY_RETRIEVE handler we
allow to retrieve more than max_write bytes, corresponding prepared
NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
filesystem server, in full correspondence with server/client contract,
will be only queuing sys_read with ~max_write buffer capacity, and
fuse_dev_do_read throws away requests that cannot fit into server
request buffer. In turn the filesystem server could get stuck waiting
indefinitely for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK
which is understood by clients as that NOTIFY_REPLY was queued and will
be sent back.

-> Cap requested size to negotiate max_write to avoid the problem.
This aligns with the way NOTIFY_RETRIEVE handler works, which already
unconditionally caps requested retrieve size to fuse_conn->max_pages.
This way it should not hurt NOTIFY_RETRIEVE semantic if we return less
data than was originally requested.

Please see [1] for context where the problem of stuck filesystem was hit
for real, how the situation was traced and for more involving patch that
did not make it into the tree.

[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
[2] https://github.com/hanwen/go-fuse

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
Cc: <stable@vger.kernel.org> # v2.6.36+
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9971a35cf1ef..1e2efd873201 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1749,7 +1749,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
 	offset = outarg->offset & ~PAGE_MASK;
 	file_size = i_size_read(inode);
 
-	num = outarg->size;
+	num = min(outarg->size, fc->max_write);
 	if (outarg->offset > file_size)
 		num = 0;
 	else if (outarg->offset + num > file_size)
Miklos Szeredi April 24, 2019, 12:17 p.m. UTC | #3
On Wed, Apr 24, 2019 at 1:56 PM Kirill Smelkov <kirr@nexedi.com> wrote:

> I assume you are meaning this:
>
>         --- a/fs/fuse/dev.c
>         +++ b/fs/fuse/dev.c
>         @@ -1745,15 +1745,15 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
>                 unsigned int offset;
>                 size_t total_len = 0;
>                 unsigned int num_pages;
>
>                 offset = outarg->offset & ~PAGE_MASK;
>                 file_size = i_size_read(inode);
>
>         -       num = outarg->size;
>         +       num = min(outarg->size, fc->max_write);
>                 if (outarg->offset > file_size)
>                         num = 0;
>                 else if (outarg->offset + num > file_size)
>                         num = file_size - outarg->offset;               <-- THIS
>
>                 num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
>                 num_pages = min(num_pages, fc->max_pages);
>
> and then in this case (offset + num > file_size) num overwrite
>
>         num = file_size - offset
>
> can make num only smaller, right? And then the patch is not wrong because there
> is no other num overwriting in this function except when num is being further
> decremented in loop that prepares pages to retrieve.

You're right, of course.


> > Also the patch is whitespace damaged.
>
> I've tried to do the following in my mutt on "RESEND4, PATCH 1/2"
> message:
>
>         |(cd ~/src/linux/linux && git am -)
>
> and the patch applied successfully. So could you please clarify what
> "whitespace damaged" means?

Hmm, apparently this (and only this) message is "quoted-printable"
encoded.  git-am seems to handle it fine, but my script doesn't.
Anyway, I'll do it manually.

Thanks,
Miklos
Kirill Smelkov April 24, 2019, 12:31 p.m. UTC | #4
On Wed, Apr 24, 2019 at 02:17:27PM +0200, Miklos Szeredi wrote:
> On Wed, Apr 24, 2019 at 1:56 PM Kirill Smelkov <kirr@nexedi.com> wrote:
> 
> > I assume you are meaning this:
> >
> >         --- a/fs/fuse/dev.c
> >         +++ b/fs/fuse/dev.c
> >         @@ -1745,15 +1745,15 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
> >                 unsigned int offset;
> >                 size_t total_len = 0;
> >                 unsigned int num_pages;
> >
> >                 offset = outarg->offset & ~PAGE_MASK;
> >                 file_size = i_size_read(inode);
> >
> >         -       num = outarg->size;
> >         +       num = min(outarg->size, fc->max_write);
> >                 if (outarg->offset > file_size)
> >                         num = 0;
> >                 else if (outarg->offset + num > file_size)
> >                         num = file_size - outarg->offset;               <-- THIS
> >
> >                 num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >                 num_pages = min(num_pages, fc->max_pages);
> >
> > and then in this case (offset + num > file_size) num overwrite
> >
> >         num = file_size - offset
> >
> > can make num only smaller, right? And then the patch is not wrong because there
> > is no other num overwriting in this function except when num is being further
> > decremented in loop that prepares pages to retrieve.
> 
> You're right, of course.

Thanks. Does it mean that the patch is ok? Do I need to rework
something?


> > > Also the patch is whitespace damaged.
> >
> > I've tried to do the following in my mutt on "RESEND4, PATCH 1/2"
> > message:
> >
> >         |(cd ~/src/linux/linux && git am -)
> >
> > and the patch applied successfully. So could you please clarify what
> > "whitespace damaged" means?
> 
> Hmm, apparently this (and only this) message is "quoted-printable"
> encoded.  git-am seems to handle it fine, but my script doesn't.
> Anyway, I'll do it manually.

I see. Probably it is not "quoted-printable" as

	Content-Type: text/plain; charset=utf-8
	Content-Transfer-Encoding: 8bit

suggests and it is maybe due to UTF-8 characters (I used "·" several
times in patch description). Anyway if it helps you can pull the patch
from here

	https://lab.nexedi.com/kirr/linux.git	y/fuse-retrieve-cap-max_write
	
and then cherry-pick it (git cherry-pick fd482f96537a) to where needed.

Thanks again for feedback,

Kirill
Miklos Szeredi April 24, 2019, 1:19 p.m. UTC | #5
On Wed, Apr 24, 2019 at 2:31 PM Kirill Smelkov <kirr@nexedi.com> wrote:

> Thanks. Does it mean that the patch is ok? Do I need to rework
> something?

Pushed to #for-next with all the rest.  Made some changes to the
patches, so please verify.

> I see. Probably it is not "quoted-printable" as
>
>         Content-Type: text/plain; charset=utf-8
>         Content-Transfer-Encoding: 8bit

Well, google converts it to quoted-printable then:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

> suggests and it is maybe due to UTF-8 characters (I used "·" several
> times in patch description).

Please refrain from gratuitous use of non-ascii.   That middle-dot is
written as "*" in C (fixed the patch description).

Thanks,
Miklos
Kirill Smelkov April 24, 2019, 2:22 p.m. UTC | #6
On Wed, Apr 24, 2019 at 03:19:11PM +0200, Miklos Szeredi wrote:
> On Wed, Apr 24, 2019 at 2:31 PM Kirill Smelkov <kirr@nexedi.com> wrote:
> 
> > Thanks. Does it mean that the patch is ok? Do I need to rework
> > something?
> 
> Pushed to #for-next with all the rest.  Made some changes to the
> patches, so please verify.

Thanks a lot. I've verified all changes and it is indeed better to amend
something:

- FOPEN_STREAM:

    --- b/include/uapi/linux/fuse.h
    +++ b/include/uapi/linux/fuse.h
    @@ -232,7 +232,7 @@
      * FOPEN_KEEP_CACHE: don't invalidate the data cache on open
      * FOPEN_NONSEEKABLE: the file is not seekable
      * FOPEN_CACHE_DIR: allow caching this directory
    - * FOPEN_STREAM: the file is stream-like
    + * FOPEN_STREAM: the file is stream-like (no file position at all)
      */
     #define FOPEN_DIRECT_IO                (1 << 0)
     #define FOPEN_KEEP_CACHE       (1 << 1)

I agree, it is better this way (no amendment needed here).


- FUSE_PRECISE_INVAL_DATA:

    --- b/include/uapi/linux/fuse.h
    +++ b/include/uapi/linux/fuse.h
    @@ -266,7 +266,7 @@
      * FUSE_MAX_PAGES: init_out.max_pages contains the max number of req pages
      * FUSE_CACHE_SYMLINKS: cache READLINK responses
      * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
    - * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for data cache invalidation
    + * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for invalidation
      */
     #define FUSE_ASYNC_READ                (1 << 0)
     #define FUSE_POSIX_LOCKS       (1 << 1)

the "data cache" in "for data cache invalidation" has particular meaning
and semantic: the filesystem promises to explicitly invalidate data of
the file, but it does not promise to explicitly invalidate attributes. I
understand it is a long line, and I myself tried to remove extra words,
but "data cache" here is semantically needed, so I left it. The
particular behaviour of FUSE_PRECISE_INVAL_DATA also covers only data
cache invalidations. For example file attributes, if not explicitly
invalidated by filesystem, are still invalidated by kernel by its
heuristic and due to negotiated attributes timeout, which is not
"precise".

If it is "precise invalidation for everything: data and attributes" we
should probably rename it to FUSE_PRECISE_INVAL and change the patch to
also cover attributes and not invalidate them automatically. However I
suggest to keep two things separate - data and attributes and not to
change the patch. By the way, description for "auto" invalidation mode is

	FUSE_AUTO_INVAL_DATA: automatically invalidate cached pages

which tells both from mode name and from its description that it is
about data.

uapi/linux/fuse.h is often used by userspace people as a document to
understand FUSE protocol (at least I used it this way). I thus suggest
to restore "data cache" since it makes semantic difference.

Your amendment for FOPEN_STREAM in uapi/linux/fuse.h (see above) also
suggests that it is better to be more explicit in that file.

    --- b/fs/fuse/inode.c
    +++ b/fs/fuse/inode.c
    @@ -913,13 +913,8 @@
                                    fc->dont_mask = 1;
                            if (arg->flags & FUSE_AUTO_INVAL_DATA)
                                    fc->auto_inval_data = 1;
    -                       if (arg->flags & FUSE_PRECISE_INVAL_DATA)
    +                       else if (arg->flags & FUSE_PRECISE_INVAL_DATA)
                                    fc->precise_inval_data = 1;
    -                       if (fc->auto_inval_data && fc->precise_inval_data) {
    -                               pr_warn("filesystem requested both auto and "
    -                                       "precise cache control - using auto\n");
    -                               fc->precise_inval_data = 0;
    -                       }
                            if (arg->flags & FUSE_DO_READDIRPLUS) {
                                    fc->do_readdirplus = 1;
                                    if (arg->flags & FUSE_READDIRPLUS_AUTO)

Even though it is ok for me personally (I could be careful and use only
FUSE_PRECISE_INVAL_DATA) I still think usage of both "auto" and "precise"
invalidation modes deserves a warning. It is only at filesystem init time. What
is the reason not to print it?

- "fuse: retrieve: cap requested size to negotiated max_write"

     Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
     Cc: Han-Wen Nienhuys <hanwen@google.com>
     Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
    -Cc: <stable@vger.kernel.org> # v2.6.36+

what is the reason not to include this patch into stable series?


- "fuse: require /dev/fuse reads to have enough buffer capacity"

    --- b/fs/fuse/dev.c
    +++ b/fs/fuse/dev.c
    @@ -1324,7 +1324,7 @@
             * to indicate that it is not following FUSE server/client contract.
             * Don't dequeue / abort any request.
             */
    -       if (nbytes < (fc->conn_init ? 4096 + fc->max_write : FUSE_MIN_READ_BUFFER))
    +       if (nbytes < max_t(size_t, FUSE_MIN_READ_BUFFER, 4096 + fc->max_write))
                    return -EINVAL;

ok, this seems to be correct, since fc, including fc->max_write, is
initialized as all zeros by fuse_conn_init (no amendment needed here).


> > I see. Probably it is not "quoted-printable" as
> >
> >         Content-Type: text/plain; charset=utf-8
> >         Content-Transfer-Encoding: 8bit
> 
> Well, google converts it to quoted-printable then:
> 
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: quoted-printable

I see.

> > suggests and it is maybe due to UTF-8 characters (I used "·" several
> > times in patch description).
> 
> Please refrain from gratuitous use of non-ascii.   That middle-dot is
> written as "*" in C (fixed the patch description).

Ok, I will try not to use unicode for my kernel fuse bits.

Thanks, again, a lot for merging the patches.

Kirill
Miklos Szeredi April 24, 2019, 3:02 p.m. UTC | #7
On Wed, Apr 24, 2019 at 4:22 PM Kirill Smelkov <kirr@nexedi.com> wrote:
> - FUSE_PRECISE_INVAL_DATA:
>
>     --- b/include/uapi/linux/fuse.h
>     +++ b/include/uapi/linux/fuse.h
>     @@ -266,7 +266,7 @@
>       * FUSE_MAX_PAGES: init_out.max_pages contains the max number of req pages
>       * FUSE_CACHE_SYMLINKS: cache READLINK responses
>       * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
>     - * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for data cache invalidation
>     + * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for invalidation
>       */
>      #define FUSE_ASYNC_READ                (1 << 0)
>      #define FUSE_POSIX_LOCKS       (1 << 1)
>
> the "data cache" in "for data cache invalidation" has particular meaning
> and semantic: the filesystem promises to explicitly invalidate data of

Right; better name: FUSE_EXPLICIT_INVAL_DATA.  Will push fixed version.

> Your amendment for FOPEN_STREAM in uapi/linux/fuse.h (see above) also
> suggests that it is better to be more explicit in that file.
>
>     --- b/fs/fuse/inode.c
>     +++ b/fs/fuse/inode.c
>     @@ -913,13 +913,8 @@
>                                     fc->dont_mask = 1;
>                             if (arg->flags & FUSE_AUTO_INVAL_DATA)
>                                     fc->auto_inval_data = 1;
>     -                       if (arg->flags & FUSE_PRECISE_INVAL_DATA)
>     +                       else if (arg->flags & FUSE_PRECISE_INVAL_DATA)
>                                     fc->precise_inval_data = 1;
>     -                       if (fc->auto_inval_data && fc->precise_inval_data) {
>     -                               pr_warn("filesystem requested both auto and "
>     -                                       "precise cache control - using auto\n");
>     -                               fc->precise_inval_data = 0;
>     -                       }
>                             if (arg->flags & FUSE_DO_READDIRPLUS) {
>                                     fc->do_readdirplus = 1;
>                                     if (arg->flags & FUSE_READDIRPLUS_AUTO)
>
> Even though it is ok for me personally (I could be careful and use only
> FUSE_PRECISE_INVAL_DATA) I still think usage of both "auto" and "precise"
> invalidation modes deserves a warning. It is only at filesystem init time. What
> is the reason not to print it?

The warning makes no sense.  It should either be failure or silent override.

> - "fuse: retrieve: cap requested size to negotiated max_write"
>
>      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
>      Cc: Han-Wen Nienhuys <hanwen@google.com>
>      Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
>     -Cc: <stable@vger.kernel.org> # v2.6.36+
>
> what is the reason not to include this patch into stable series?

This doens't fix any bugs out there, but there is a slight chance of
regression (so it might possibly have to be reverted in the future) so
it absolutely makes no sense to backport it to stable.

Thanks,
Miklos
Kirill Smelkov April 24, 2019, 6:10 p.m. UTC | #8
On Wed, Apr 24, 2019 at 05:02:42PM +0200, Miklos Szeredi wrote:
> On Wed, Apr 24, 2019 at 4:22 PM Kirill Smelkov <kirr@nexedi.com> wrote:
> > - FUSE_PRECISE_INVAL_DATA:
> >
> >     --- b/include/uapi/linux/fuse.h
> >     +++ b/include/uapi/linux/fuse.h
> >     @@ -266,7 +266,7 @@
> >       * FUSE_MAX_PAGES: init_out.max_pages contains the max number of req pages
> >       * FUSE_CACHE_SYMLINKS: cache READLINK responses
> >       * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
> >     - * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for data cache invalidation
> >     + * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for invalidation
> >       */
> >      #define FUSE_ASYNC_READ                (1 << 0)
> >      #define FUSE_POSIX_LOCKS       (1 << 1)
> >
> > the "data cache" in "for data cache invalidation" has particular meaning
> > and semantic: the filesystem promises to explicitly invalidate data of
> 
> Right; better name: FUSE_EXPLICIT_INVAL_DATA.  Will push fixed version.

	- * FUSE_PRECISE_INVAL_DATA: filesystem is fully responsible for invalidation
	+ * FUSE_EXPLICIT_INVAL_DATA: only invalidate cached pages on explicit request

	...

	        /** Filesystem is fully reponsible for page cache invalidation. */
	-       unsigned precise_inval_data:1;
	+       unsigned explicit_inval_data:1;

Ok, let it be this way.


> > Your amendment for FOPEN_STREAM in uapi/linux/fuse.h (see above) also
> > suggests that it is better to be more explicit in that file.
> >
> >     --- b/fs/fuse/inode.c
> >     +++ b/fs/fuse/inode.c
> >     @@ -913,13 +913,8 @@
> >                                     fc->dont_mask = 1;
> >                             if (arg->flags & FUSE_AUTO_INVAL_DATA)
> >                                     fc->auto_inval_data = 1;
> >     -                       if (arg->flags & FUSE_PRECISE_INVAL_DATA)
> >     +                       else if (arg->flags & FUSE_PRECISE_INVAL_DATA)
> >                                     fc->precise_inval_data = 1;
> >     -                       if (fc->auto_inval_data && fc->precise_inval_data) {
> >     -                               pr_warn("filesystem requested both auto and "
> >     -                                       "precise cache control - using auto\n");
> >     -                               fc->precise_inval_data = 0;
> >     -                       }
> >                             if (arg->flags & FUSE_DO_READDIRPLUS) {
> >                                     fc->do_readdirplus = 1;
> >                                     if (arg->flags & FUSE_READDIRPLUS_AUTO)
> >
> > Even though it is ok for me personally (I could be careful and use only
> > FUSE_PRECISE_INVAL_DATA) I still think usage of both "auto" and "precise"
> > invalidation modes deserves a warning. It is only at filesystem init time. What
> > is the reason not to print it?
> 
> The warning makes no sense.  It should either be failure or silent override.

Ok.


> > - "fuse: retrieve: cap requested size to negotiated max_write"
> >
> >      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
> >      Cc: Han-Wen Nienhuys <hanwen@google.com>
> >      Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
> >     -Cc: <stable@vger.kernel.org> # v2.6.36+
> >
> > what is the reason not to include this patch into stable series?
> 
> This doens't fix any bugs out there, but there is a slight chance of
> regression (so it might possibly have to be reverted in the future) so
> it absolutely makes no sense to backport it to stable.

Ok.


Thanks again for tossing the patches,

Kirill
Kirill Smelkov April 24, 2019, 6:59 p.m. UTC | #9
On Wed, Apr 24, 2019 at 09:09:58PM +0300, Kirill Smelkov wrote:
> Thanks again for tossing the patches,

I have to appologize here: I used the word "tossing" here as if it would
mean "processing" or "applying", but a friend of mine just said that it
means something like "throwing away" which completely changes the
message. Sorry, I did not meant that - it's just my english for which I
beg you pardon.

All I wanted to say is that I'm grateful that you reviewed and applied
the patches. So thanks and appologize for the inconveninece.

Kirill
diff mbox series

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8a63e52785e9..38e94bc43053 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1749,7 +1749,7 @@  static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
 	offset = outarg->offset & ~PAGE_MASK;
 	file_size = i_size_read(inode);
 
-	num = outarg->size;
+	num = min(outarg->size, fc->max_write);
 	if (outarg->offset > file_size)
 		num = 0;
 	else if (outarg->offset + num > file_size)