diff mbox series

9p: Fix DIO read through netfs

Message ID 1229195.1723211769@warthog.procyon.org.uk (mailing list archive)
State New
Headers show
Series 9p: Fix DIO read through netfs | expand

Commit Message

David Howells Aug. 9, 2024, 1:56 p.m. UTC
From: Dominique Martinet <asmadeus@codewreck.org>

9p: Fix DIO read through netfs

If a program is watching a file on a 9p mount, it won't see any change in
size if the file being exported by the server is changed directly in the
source filesystem, presumably because 9p doesn't have change notifications,
and because netfs skips the reads if the file is empty.

Fix this by attempting to read the full size specified when a DIO read is
requested (such as when 9p is operating in unbuffered mode) and dealing
with a short read if the EOF was less than the expected read.

To make this work, filesystems using netfslib must not set
NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
I don't want to mandatorily clear this flag in netfslib for DIO because,
say, ceph might make a read from an object that is not completely filled,
but does not reside at the end of file - and so we need to clear the
excess.

This can be tested by watching an empty file over 9p within a VM (such as
in the ktest framework):

        while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo

then writing something into the empty file.  The watcher should immediately
display the file content and break out of the loop.  Without this fix, it
remains in the loop indefinitely.

Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
Written-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/9p/vfs_addr.c     |    3 ++-
 fs/afs/file.c        |    3 ++-
 fs/ceph/addr.c       |    6 ++++--
 fs/netfs/io.c        |   17 +++++++++++------
 fs/nfs/fscache.c     |    3 ++-
 fs/smb/client/file.c |    3 ++-
 6 files changed, 23 insertions(+), 12 deletions(-)

Comments

Dominique Martinet Aug. 10, 2024, 3:36 a.m. UTC | #1
David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100:
> From: Dominique Martinet <asmadeus@codewreck.org>
> 
> 9p: Fix DIO read through netfs

nitpick: now sure how that ended up here but this is duplicated with the
subject (the commit message ends up with this line twice)

> If a program is watching a file on a 9p mount, it won't see any change in
> size if the file being exported by the server is changed directly in the
> source filesystem, presumably because 9p doesn't have change notifications,
> and because netfs skips the reads if the file is empty.
> 
> Fix this by attempting to read the full size specified when a DIO read is
> requested (such as when 9p is operating in unbuffered mode) and dealing
> with a short read if the EOF was less than the expected read.
> 
> To make this work, filesystems using netfslib must not set
> NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
> I don't want to mandatorily clear this flag in netfslib for DIO because,
> say, ceph might make a read from an object that is not completely filled,
> but does not reside at the end of file - and so we need to clear the
> excess.
> 
> This can be tested by watching an empty file over 9p within a VM (such as
> in the ktest framework):
> 
>         while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo

(This is basically the same thing but if one wants to control the read
timing for more precise/verbose debugging:
  exec 3< /host/tmp/foo
  read -u 3 content && echo $content
  (repeat as appropriate)
  exec 3>&-
)

> then writing something into the empty file.  The watcher should immediately
> display the file content and break out of the loop.  Without this fix, it
> remains in the loop indefinitely.
> 
> Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
> Written-by: Dominique Martinet <asmadeus@codewreck.org>

Thanks for adding extra comments & fixing other filesystems.

I've checked this covers all cases of setting NETFS_SREQ_CLEAR_TAIL so
hopefully shouldn't have further side effects, this sounds good to me:

Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Eric Van Hensbergen <ericvh@kernel.org>
> cc: Latchesar Ionkov <lucho@ionkov.net>
> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Ilya Dryomov <idryomov@gmail.com>
> cc: Steve French <sfrench@samba.org>
> cc: Paulo Alcantara <pc@manguebit.com>
> cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: ceph-devel@vger.kernel.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-nfs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/9p/vfs_addr.c     |    3 ++-
>  fs/afs/file.c        |    3 ++-
>  fs/ceph/addr.c       |    6 ++++--
>  fs/netfs/io.c        |   17 +++++++++++------
>  fs/nfs/fscache.c     |    3 ++-
>  fs/smb/client/file.c |    3 ++-
>  6 files changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index a97ceb105cd8..24fdc74caeba 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
>  
>  	/* if we just extended the file size, any portion not in
>  	 * cache won't be on server and is zeroes */
> -	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +	if (subreq->rreq->origin != NETFS_DIO_READ)
> +		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  
>  	netfs_subreq_terminated(subreq, err ?: total, false);
>  }
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index c3f0c45ae9a9..ec1be0091fdb 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op)
>  
>  	req->error = error;
>  	if (subreq) {
> -		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +		if (subreq->rreq->origin != NETFS_DIO_READ)
> +			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  		netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
>  		req->subreq = NULL;
>  	} else if (req->done) {
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index cc0a2240de98..c4744a02db75 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req)
>  	if (err >= 0) {
>  		if (sparse && err > 0)
>  			err = ceph_sparse_ext_map_end(op);
> -		if (err < subreq->len)
> +		if (err < subreq->len &&
> +		    subreq->rreq->origin != NETFS_DIO_READ)
>  			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  		if (IS_ENCRYPTED(inode) && err > 0) {
>  			err = ceph_fscrypt_decrypt_extents(inode,
> @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
>  	size_t len;
>  	int mode;
>  
> -	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +	if (rreq->origin != NETFS_DIO_READ)
> +		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  	__clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
>  
>  	if (subreq->start >= inode->i_size)
> diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> index c179a1c73fa7..5367caf3fa28 100644
> --- a/fs/netfs/io.c
> +++ b/fs/netfs/io.c
> @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
>  
>  	if (transferred_or_error == 0) {
>  		if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
> -			subreq->error = -ENODATA;
> +			if (rreq->origin != NETFS_DIO_READ)
> +				subreq->error = -ENODATA;
>  			goto failed;
>  		}
>  	} else {
> @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
>  			}
>  			if (subreq->len > ictx->zero_point - subreq->start)
>  				subreq->len = ictx->zero_point - subreq->start;
> +
> +			/* We limit buffered reads to the EOF, but let the
> +			 * server deal with larger-than-EOF DIO/unbuffered
> +			 * reads.
> +			 */
> +			if (subreq->len > rreq->i_size - subreq->start)
> +				subreq->len = rreq->i_size - subreq->start;
>  		}
> -		if (subreq->len > rreq->i_size - subreq->start)
> -			subreq->len = rreq->i_size - subreq->start;
>  		if (rreq->rsize && subreq->len > rreq->rsize)
>  			subreq->len = rreq->rsize;
>  
> @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
>  	do {
>  		_debug("submit %llx + %llx >= %llx",
>  		       rreq->start, rreq->submitted, rreq->i_size);
> -		if (rreq->origin == NETFS_DIO_READ &&
> -		    rreq->start + rreq->submitted >= rreq->i_size)
> -			break;
>  		if (!netfs_rreq_submit_slice(rreq, &io_iter))
>  			break;
> +		if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
> +			break;
>  		if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
>  		    test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
>  			break;
> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> index bf29a65c5027..7a558dea75c4 100644
> --- a/fs/nfs/fscache.c
> +++ b/fs/nfs/fscache.c
> @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
>  		return;
>  
>  	sreq = netfs->sreq;
> -	if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
> +	if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
> +	    sreq->rreq->origin != NETFS_DIO_READ)
>  		__set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
>  
>  	if (hdr->error)
> diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
> index b2405dd4d4d4..3f3842e7b44a 100644
> --- a/fs/smb/client/file.c
> +++ b/fs/smb/client/file.c
> @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
>  			goto out;
>  	}
>  
> -	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> +	if (subreq->rreq->origin != NETFS_DIO_READ)
> +		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
>  
>  	rc = rdata->server->ops->async_readv(rdata);
>  out:
>
Steve French Aug. 19, 2024, 10:36 p.m. UTC | #2
I did some additional testing of this patch after seeing regressions
starting with 6.11-rc4 with xfstests generic/125 and generic/210 and
was able to bisect it to this patch which just went in:

commit e3786b29c54cdae3490b07180a54e2461f42144c
Author: Dominique Martinet <asmadeus@codewreck.org>

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..3f3842e7b44a 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct
netfs_io_subrequest *subreq)
                        goto out;
        }

-       __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+       if (subreq->rreq->origin != NETFS_DIO_READ)
+               __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);

        rc = rdata->server->ops->async_readv(rdata);
 out:

It is very simple to repro.
I originally noticed it with tests to shares in Azure, but it also
fails locally to Samba with default cifs.ko mount options.


log message is below:

[ 7884.205037] Workqueue: cifsiod smb2_readv_worker [cifs]
[ 7884.205262] RIP: 0010:netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205299] Code: 01 00 00 e8 02 b4 07 df 4c 8b 4c 24 08 49 89 d8
4c 89 e9 41 8b b4 24 d4 01 00 00 44 89 f2 48 c7 c7 40 10 65 c1 e8 30
a9 b6 de <0f> 0b 48 8b 7c 24 18 4c 8d bd c0 00 00 00 e8 2d b5 07 df 48
8b 7c
[ 7884.205305] RSP: 0018:ff1100010705fce8 EFLAGS: 00010286
[ 7884.205312] RAX: dffffc0000000000 RBX: 0000000000001000 RCX: 0000000000000027
[ 7884.205317] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff110004cb1b1a08
[ 7884.205322] RBP: ff11000119450900 R08: ffffffffa03e346e R09: ffe21c0099636341
[ 7884.205326] R10: ff110004cb1b1a0b R11: 0000000000000001 R12: ff11000137b68a80
[ 7884.205330] R13: 000000000000012c R14: 0000000000000001 R15: ff11000126a96f78
[ 7884.205335] FS:  0000000000000000(0000) GS:ff110004cb180000(0000)
knlGS:0000000000000000
[ 7884.205339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7884.205344] CR2: 00007f0035f0a67c CR3: 000000000f664004 CR4: 0000000000371ef0
[ 7884.205354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7884.205359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7884.205363] Call Trace:
[ 7884.205367]  <TASK>
[ 7884.205373]  ? __warn+0xa4/0x220
[ 7884.205386]  ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205423]  ? report_bug+0x1d4/0x1e0
[ 7884.205436]  ? handle_bug+0x42/0x80
[ 7884.205442]  ? exc_invalid_op+0x18/0x50
[ 7884.205449]  ? asm_exc_invalid_op+0x1a/0x20
[ 7884.205464]  ? irq_work_claim+0x1e/0x40
[ 7884.205475]  ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205512]  ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs]
[ 7884.205554]  process_one_work+0x4cf/0xb80
[ 7884.205573]  ? __pfx_lock_acquire+0x10/0x10
[ 7884.205582]  ? __pfx_process_one_work+0x10/0x10
[ 7884.205599]  ? assign_work+0xd6/0x110
[ 7884.205609]  worker_thread+0x2cd/0x550
[ 7884.205622]  ? __pfx_worker_thread+0x10/0x10
[ 7884.205632]  kthread+0x187/0x1d0
[ 7884.205639]  ? __pfx_kthread+0x10/0x10
[ 7884.205648]  ret_from_fork+0x34/0x60
[ 7884.205655]  ? __pfx_kthread+0x10/0x10
[ 7884.205661]  ret_from_fork_asm+0x1a/0x30
[ 7884.205684]  </TASK>
[ 7884.205688] irq event stamp: 23635
[ 7884.205692] hardirqs last  enabled at (23641): [<ffffffffa022b58b>]
console_unlock+0x15b/0x170
[ 7884.205699] hardirqs last disabled at (23646): [<ffffffffa022b570>]
console_unlock+0x140/0x170
[ 7884.205705] softirqs last  enabled at (23402): [<ffffffffa0131a6e>]
__irq_exit_rcu+0xfe/0x120
[ 7884.205712] softirqs last disabled at (23397): [<ffffffffa0131a6e>]
__irq_exit_rcu+0xfe/0x120
[ 7884.205718] ---[ end trace 0000000000000000 ]---

On Fri, Aug 9, 2024 at 10:43 PM Dominique Martinet
<asmadeus@codewreck.org> wrote:
>
> David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100:
> > From: Dominique Martinet <asmadeus@codewreck.org>
> >
> > 9p: Fix DIO read through netfs
>
> nitpick: now sure how that ended up here but this is duplicated with the
> subject (the commit message ends up with this line twice)
>
> > If a program is watching a file on a 9p mount, it won't see any change in
> > size if the file being exported by the server is changed directly in the
> > source filesystem, presumably because 9p doesn't have change notifications,
> > and because netfs skips the reads if the file is empty.
> >
> > Fix this by attempting to read the full size specified when a DIO read is
> > requested (such as when 9p is operating in unbuffered mode) and dealing
> > with a short read if the EOF was less than the expected read.
> >
> > To make this work, filesystems using netfslib must not set
> > NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
> > I don't want to mandatorily clear this flag in netfslib for DIO because,
> > say, ceph might make a read from an object that is not completely filled,
> > but does not reside at the end of file - and so we need to clear the
> > excess.
> >
> > This can be tested by watching an empty file over 9p within a VM (such as
> > in the ktest framework):
> >
> >         while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo
>
> (This is basically the same thing but if one wants to control the read
> timing for more precise/verbose debugging:
>   exec 3< /host/tmp/foo
>   read -u 3 content && echo $content
>   (repeat as appropriate)
>   exec 3>&-
> )
>
> > then writing something into the empty file.  The watcher should immediately
> > display the file content and break out of the loop.  Without this fix, it
> > remains in the loop indefinitely.
> >
> > Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
> > Written-by: Dominique Martinet <asmadeus@codewreck.org>
>
> Thanks for adding extra comments & fixing other filesystems.
>
> I've checked this covers all cases of setting NETFS_SREQ_CLEAR_TAIL so
> hopefully shouldn't have further side effects, this sounds good to me:
>
> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
>
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > cc: Eric Van Hensbergen <ericvh@kernel.org>
> > cc: Latchesar Ionkov <lucho@ionkov.net>
> > cc: Christian Schoenebeck <linux_oss@crudebyte.com>
> > cc: Marc Dionne <marc.dionne@auristor.com>
> > cc: Ilya Dryomov <idryomov@gmail.com>
> > cc: Steve French <sfrench@samba.org>
> > cc: Paulo Alcantara <pc@manguebit.com>
> > cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> > cc: v9fs@lists.linux.dev
> > cc: linux-afs@lists.infradead.org
> > cc: ceph-devel@vger.kernel.org
> > cc: linux-cifs@vger.kernel.org
> > cc: linux-nfs@vger.kernel.org
> > cc: netfs@lists.linux.dev
> > cc: linux-fsdevel@vger.kernel.org
> > ---
> >  fs/9p/vfs_addr.c     |    3 ++-
> >  fs/afs/file.c        |    3 ++-
> >  fs/ceph/addr.c       |    6 ++++--
> >  fs/netfs/io.c        |   17 +++++++++++------
> >  fs/nfs/fscache.c     |    3 ++-
> >  fs/smb/client/file.c |    3 ++-
> >  6 files changed, 23 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> > index a97ceb105cd8..24fdc74caeba 100644
> > --- a/fs/9p/vfs_addr.c
> > +++ b/fs/9p/vfs_addr.c
> > @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
> >
> >       /* if we just extended the file size, any portion not in
> >        * cache won't be on server and is zeroes */
> > -     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +     if (subreq->rreq->origin != NETFS_DIO_READ)
> > +             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >
> >       netfs_subreq_terminated(subreq, err ?: total, false);
> >  }
> > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > index c3f0c45ae9a9..ec1be0091fdb 100644
> > --- a/fs/afs/file.c
> > +++ b/fs/afs/file.c
> > @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op)
> >
> >       req->error = error;
> >       if (subreq) {
> > -             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +             if (subreq->rreq->origin != NETFS_DIO_READ)
> > +                     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >               netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
> >               req->subreq = NULL;
> >       } else if (req->done) {
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index cc0a2240de98..c4744a02db75 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req)
> >       if (err >= 0) {
> >               if (sparse && err > 0)
> >                       err = ceph_sparse_ext_map_end(op);
> > -             if (err < subreq->len)
> > +             if (err < subreq->len &&
> > +                 subreq->rreq->origin != NETFS_DIO_READ)
> >                       __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >               if (IS_ENCRYPTED(inode) && err > 0) {
> >                       err = ceph_fscrypt_decrypt_extents(inode,
> > @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
> >       size_t len;
> >       int mode;
> >
> > -     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +     if (rreq->origin != NETFS_DIO_READ)
> > +             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >       __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
> >
> >       if (subreq->start >= inode->i_size)
> > diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> > index c179a1c73fa7..5367caf3fa28 100644
> > --- a/fs/netfs/io.c
> > +++ b/fs/netfs/io.c
> > @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
> >
> >       if (transferred_or_error == 0) {
> >               if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
> > -                     subreq->error = -ENODATA;
> > +                     if (rreq->origin != NETFS_DIO_READ)
> > +                             subreq->error = -ENODATA;
> >                       goto failed;
> >               }
> >       } else {
> > @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
> >                       }
> >                       if (subreq->len > ictx->zero_point - subreq->start)
> >                               subreq->len = ictx->zero_point - subreq->start;
> > +
> > +                     /* We limit buffered reads to the EOF, but let the
> > +                      * server deal with larger-than-EOF DIO/unbuffered
> > +                      * reads.
> > +                      */
> > +                     if (subreq->len > rreq->i_size - subreq->start)
> > +                             subreq->len = rreq->i_size - subreq->start;
> >               }
> > -             if (subreq->len > rreq->i_size - subreq->start)
> > -                     subreq->len = rreq->i_size - subreq->start;
> >               if (rreq->rsize && subreq->len > rreq->rsize)
> >                       subreq->len = rreq->rsize;
> >
> > @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
> >       do {
> >               _debug("submit %llx + %llx >= %llx",
> >                      rreq->start, rreq->submitted, rreq->i_size);
> > -             if (rreq->origin == NETFS_DIO_READ &&
> > -                 rreq->start + rreq->submitted >= rreq->i_size)
> > -                     break;
> >               if (!netfs_rreq_submit_slice(rreq, &io_iter))
> >                       break;
> > +             if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
> > +                     break;
> >               if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
> >                   test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
> >                       break;
> > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> > index bf29a65c5027..7a558dea75c4 100644
> > --- a/fs/nfs/fscache.c
> > +++ b/fs/nfs/fscache.c
> > @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
> >               return;
> >
> >       sreq = netfs->sreq;
> > -     if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
> > +     if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
> > +         sreq->rreq->origin != NETFS_DIO_READ)
> >               __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
> >
> >       if (hdr->error)
> > diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
> > index b2405dd4d4d4..3f3842e7b44a 100644
> > --- a/fs/smb/client/file.c
> > +++ b/fs/smb/client/file.c
> > @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
> >                       goto out;
> >       }
> >
> > -     __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> > +     if (subreq->rreq->origin != NETFS_DIO_READ)
> > +             __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> >
> >       rc = rdata->server->ops->async_readv(rdata);
> >  out:
> >
>
> --
> Dominique Martinet | Asmadeus
>
diff mbox series

Patch

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index a97ceb105cd8..24fdc74caeba 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -75,7 +75,8 @@  static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
 
 	/* if we just extended the file size, any portion not in
 	 * cache won't be on server and is zeroes */
-	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	if (subreq->rreq->origin != NETFS_DIO_READ)
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 
 	netfs_subreq_terminated(subreq, err ?: total, false);
 }
diff --git a/fs/afs/file.c b/fs/afs/file.c
index c3f0c45ae9a9..ec1be0091fdb 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -242,7 +242,8 @@  static void afs_fetch_data_notify(struct afs_operation *op)
 
 	req->error = error;
 	if (subreq) {
-		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+		if (subreq->rreq->origin != NETFS_DIO_READ)
+			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 		netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
 		req->subreq = NULL;
 	} else if (req->done) {
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index cc0a2240de98..c4744a02db75 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -246,7 +246,8 @@  static void finish_netfs_read(struct ceph_osd_request *req)
 	if (err >= 0) {
 		if (sparse && err > 0)
 			err = ceph_sparse_ext_map_end(op);
-		if (err < subreq->len)
+		if (err < subreq->len &&
+		    subreq->rreq->origin != NETFS_DIO_READ)
 			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 		if (IS_ENCRYPTED(inode) && err > 0) {
 			err = ceph_fscrypt_decrypt_extents(inode,
@@ -282,7 +283,8 @@  static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq)
 	size_t len;
 	int mode;
 
-	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	if (rreq->origin != NETFS_DIO_READ)
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 	__clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
 
 	if (subreq->start >= inode->i_size)
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index c179a1c73fa7..5367caf3fa28 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -530,7 +530,8 @@  void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
 
 	if (transferred_or_error == 0) {
 		if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
-			subreq->error = -ENODATA;
+			if (rreq->origin != NETFS_DIO_READ)
+				subreq->error = -ENODATA;
 			goto failed;
 		}
 	} else {
@@ -601,9 +602,14 @@  netfs_rreq_prepare_read(struct netfs_io_request *rreq,
 			}
 			if (subreq->len > ictx->zero_point - subreq->start)
 				subreq->len = ictx->zero_point - subreq->start;
+
+			/* We limit buffered reads to the EOF, but let the
+			 * server deal with larger-than-EOF DIO/unbuffered
+			 * reads.
+			 */
+			if (subreq->len > rreq->i_size - subreq->start)
+				subreq->len = rreq->i_size - subreq->start;
 		}
-		if (subreq->len > rreq->i_size - subreq->start)
-			subreq->len = rreq->i_size - subreq->start;
 		if (rreq->rsize && subreq->len > rreq->rsize)
 			subreq->len = rreq->rsize;
 
@@ -739,11 +745,10 @@  int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 	do {
 		_debug("submit %llx + %llx >= %llx",
 		       rreq->start, rreq->submitted, rreq->i_size);
-		if (rreq->origin == NETFS_DIO_READ &&
-		    rreq->start + rreq->submitted >= rreq->i_size)
-			break;
 		if (!netfs_rreq_submit_slice(rreq, &io_iter))
 			break;
+		if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags))
+			break;
 		if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
 		    test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
 			break;
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index bf29a65c5027..7a558dea75c4 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -363,7 +363,8 @@  void nfs_netfs_read_completion(struct nfs_pgio_header *hdr)
 		return;
 
 	sreq = netfs->sreq;
-	if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
+	if (test_bit(NFS_IOHDR_EOF, &hdr->flags) &&
+	    sreq->rreq->origin != NETFS_DIO_READ)
 		__set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
 
 	if (hdr->error)
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..3f3842e7b44a 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -217,7 +217,8 @@  static void cifs_req_issue_read(struct netfs_io_subrequest *subreq)
 			goto out;
 	}
 
-	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	if (subreq->rreq->origin != NETFS_DIO_READ)
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 
 	rc = rdata->server->ops->async_readv(rdata);
 out: