Message ID | 1229195.1723211769@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | 9p: Fix DIO read through netfs | expand |
David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100: > From: Dominique Martinet <asmadeus@codewreck.org> > > 9p: Fix DIO read through netfs nitpick: now sure how that ended up here but this is duplicated with the subject (the commit message ends up with this line twice) > If a program is watching a file on a 9p mount, it won't see any change in > size if the file being exported by the server is changed directly in the > source filesystem, presumably because 9p doesn't have change notifications, > and because netfs skips the reads if the file is empty. > > Fix this by attempting to read the full size specified when a DIO read is > requested (such as when 9p is operating in unbuffered mode) and dealing > with a short read if the EOF was less than the expected read. > > To make this work, filesystems using netfslib must not set > NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF. > I don't want to mandatorily clear this flag in netfslib for DIO because, > say, ceph might make a read from an object that is not completely filled, > but does not reside at the end of file - and so we need to clear the > excess. > > This can be tested by watching an empty file over 9p within a VM (such as > in the ktest framework): > > while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo (This is basically the same thing but if one wants to control the read timing for more precise/verbose debugging: exec 3< /host/tmp/foo read -u 3 content && echo $content (repeat as appropriate) exec 3>&- ) > then writing something into the empty file. The watcher should immediately > display the file content and break out of the loop. Without this fix, it > remains in the loop indefinitely. > > Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter") > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916 > Written-by: Dominique Martinet <asmadeus@codewreck.org> Thanks for adding extra comments & fixing other filesystems. I've checked this covers all cases of setting NETFS_SREQ_CLEAR_TAIL so hopefully shouldn't have further side effects, this sounds good to me: Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Eric Van Hensbergen <ericvh@kernel.org> > cc: Latchesar Ionkov <lucho@ionkov.net> > cc: Christian Schoenebeck <linux_oss@crudebyte.com> > cc: Marc Dionne <marc.dionne@auristor.com> > cc: Ilya Dryomov <idryomov@gmail.com> > cc: Steve French <sfrench@samba.org> > cc: Paulo Alcantara <pc@manguebit.com> > cc: Trond Myklebust <trond.myklebust@hammerspace.com> > cc: v9fs@lists.linux.dev > cc: linux-afs@lists.infradead.org > cc: ceph-devel@vger.kernel.org > cc: linux-cifs@vger.kernel.org > cc: linux-nfs@vger.kernel.org > cc: netfs@lists.linux.dev > cc: linux-fsdevel@vger.kernel.org > --- > fs/9p/vfs_addr.c | 3 ++- > fs/afs/file.c | 3 ++- > fs/ceph/addr.c | 6 ++++-- > fs/netfs/io.c | 17 +++++++++++------ > fs/nfs/fscache.c | 3 ++- > fs/smb/client/file.c | 3 ++- > 6 files changed, 23 insertions(+), 12 deletions(-) > > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > index a97ceb105cd8..24fdc74caeba 100644 > --- a/fs/9p/vfs_addr.c > +++ b/fs/9p/vfs_addr.c > @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) > > /* if we just extended the file size, any portion not in > * cache won't be on server and is zeroes */ > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > + if (subreq->rreq->origin != NETFS_DIO_READ) > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > netfs_subreq_terminated(subreq, err ?: total, false); > } > diff --git a/fs/afs/file.c b/fs/afs/file.c > index c3f0c45ae9a9..ec1be0091fdb 100644 > --- a/fs/afs/file.c > +++ b/fs/afs/file.c > @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op) > > req->error = error; > if (subreq) { > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > + if (subreq->rreq->origin != NETFS_DIO_READ) > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > netfs_subreq_terminated(subreq, error ?: req->actual_len, false); > req->subreq = NULL; > } else if (req->done) { > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > index cc0a2240de98..c4744a02db75 100644 > --- a/fs/ceph/addr.c > +++ b/fs/ceph/addr.c > @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req) > if (err >= 0) { > if (sparse && err > 0) > err = ceph_sparse_ext_map_end(op); > - if (err < subreq->len) > + if (err < subreq->len && > + subreq->rreq->origin != NETFS_DIO_READ) > __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > if (IS_ENCRYPTED(inode) && err > 0) { > err = ceph_fscrypt_decrypt_extents(inode, > @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq) > size_t len; > int mode; > > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > + if (rreq->origin != NETFS_DIO_READ) > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); > > if (subreq->start >= inode->i_size) > diff --git a/fs/netfs/io.c b/fs/netfs/io.c > index c179a1c73fa7..5367caf3fa28 100644 > --- a/fs/netfs/io.c > +++ b/fs/netfs/io.c > @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq, > > if (transferred_or_error == 0) { > if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { > - subreq->error = -ENODATA; > + if (rreq->origin != NETFS_DIO_READ) > + subreq->error = -ENODATA; > goto failed; > } > } else { > @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq, > } > if (subreq->len > ictx->zero_point - subreq->start) > subreq->len = ictx->zero_point - subreq->start; > + > + /* We limit buffered reads to the EOF, but let the > + * server deal with larger-than-EOF DIO/unbuffered > + * reads. > + */ > + if (subreq->len > rreq->i_size - subreq->start) > + subreq->len = rreq->i_size - subreq->start; > } > - if (subreq->len > rreq->i_size - subreq->start) > - subreq->len = rreq->i_size - subreq->start; > if (rreq->rsize && subreq->len > rreq->rsize) > subreq->len = rreq->rsize; > > @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync) > do { > _debug("submit %llx + %llx >= %llx", > rreq->start, rreq->submitted, rreq->i_size); > - if (rreq->origin == NETFS_DIO_READ && > - rreq->start + rreq->submitted >= rreq->i_size) > - break; > if (!netfs_rreq_submit_slice(rreq, &io_iter)) > break; > + if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags)) > + break; > if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) && > test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags)) > break; > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c > index bf29a65c5027..7a558dea75c4 100644 > --- a/fs/nfs/fscache.c > +++ b/fs/nfs/fscache.c > @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr) > return; > > sreq = netfs->sreq; > - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) > + if (test_bit(NFS_IOHDR_EOF, &hdr->flags) && > + sreq->rreq->origin != NETFS_DIO_READ) > __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags); > > if (hdr->error) > diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c > index b2405dd4d4d4..3f3842e7b44a 100644 > --- a/fs/smb/client/file.c > +++ b/fs/smb/client/file.c > @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq) > goto out; > } > > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > + if (subreq->rreq->origin != NETFS_DIO_READ) > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > rc = rdata->server->ops->async_readv(rdata); > out: >
I did some additional testing of this patch after seeing regressions starting with 6.11-rc4 with xfstests generic/125 and generic/210 and was able to bisect it to this patch which just went in: commit e3786b29c54cdae3490b07180a54e2461f42144c Author: Dominique Martinet <asmadeus@codewreck.org> diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index b2405dd4d4d4..3f3842e7b44a 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq) goto out; } - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (subreq->rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); rc = rdata->server->ops->async_readv(rdata); out: It is very simple to repro. I originally noticed it with tests to shares in Azure, but it also fails locally to Samba with default cifs.ko mount options. log message is below: [ 7884.205037] Workqueue: cifsiod smb2_readv_worker [cifs] [ 7884.205262] RIP: 0010:netfs_subreq_terminated+0x3f0/0x4b0 [netfs] [ 7884.205299] Code: 01 00 00 e8 02 b4 07 df 4c 8b 4c 24 08 49 89 d8 4c 89 e9 41 8b b4 24 d4 01 00 00 44 89 f2 48 c7 c7 40 10 65 c1 e8 30 a9 b6 de <0f> 0b 48 8b 7c 24 18 4c 8d bd c0 00 00 00 e8 2d b5 07 df 48 8b 7c [ 7884.205305] RSP: 0018:ff1100010705fce8 EFLAGS: 00010286 [ 7884.205312] RAX: dffffc0000000000 RBX: 0000000000001000 RCX: 0000000000000027 [ 7884.205317] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff110004cb1b1a08 [ 7884.205322] RBP: ff11000119450900 R08: ffffffffa03e346e R09: ffe21c0099636341 [ 7884.205326] R10: ff110004cb1b1a0b R11: 0000000000000001 R12: ff11000137b68a80 [ 7884.205330] R13: 000000000000012c R14: 0000000000000001 R15: ff11000126a96f78 [ 7884.205335] FS: 0000000000000000(0000) GS:ff110004cb180000(0000) knlGS:0000000000000000 [ 7884.205339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7884.205344] CR2: 00007f0035f0a67c CR3: 000000000f664004 CR4: 0000000000371ef0 [ 7884.205354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 7884.205359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 7884.205363] Call Trace: [ 7884.205367] <TASK> [ 7884.205373] ? __warn+0xa4/0x220 [ 7884.205386] ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs] [ 7884.205423] ? report_bug+0x1d4/0x1e0 [ 7884.205436] ? handle_bug+0x42/0x80 [ 7884.205442] ? exc_invalid_op+0x18/0x50 [ 7884.205449] ? asm_exc_invalid_op+0x1a/0x20 [ 7884.205464] ? irq_work_claim+0x1e/0x40 [ 7884.205475] ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs] [ 7884.205512] ? netfs_subreq_terminated+0x3f0/0x4b0 [netfs] [ 7884.205554] process_one_work+0x4cf/0xb80 [ 7884.205573] ? __pfx_lock_acquire+0x10/0x10 [ 7884.205582] ? __pfx_process_one_work+0x10/0x10 [ 7884.205599] ? assign_work+0xd6/0x110 [ 7884.205609] worker_thread+0x2cd/0x550 [ 7884.205622] ? __pfx_worker_thread+0x10/0x10 [ 7884.205632] kthread+0x187/0x1d0 [ 7884.205639] ? __pfx_kthread+0x10/0x10 [ 7884.205648] ret_from_fork+0x34/0x60 [ 7884.205655] ? __pfx_kthread+0x10/0x10 [ 7884.205661] ret_from_fork_asm+0x1a/0x30 [ 7884.205684] </TASK> [ 7884.205688] irq event stamp: 23635 [ 7884.205692] hardirqs last enabled at (23641): [<ffffffffa022b58b>] console_unlock+0x15b/0x170 [ 7884.205699] hardirqs last disabled at (23646): [<ffffffffa022b570>] console_unlock+0x140/0x170 [ 7884.205705] softirqs last enabled at (23402): [<ffffffffa0131a6e>] __irq_exit_rcu+0xfe/0x120 [ 7884.205712] softirqs last disabled at (23397): [<ffffffffa0131a6e>] __irq_exit_rcu+0xfe/0x120 [ 7884.205718] ---[ end trace 0000000000000000 ]--- On Fri, Aug 9, 2024 at 10:43 PM Dominique Martinet <asmadeus@codewreck.org> wrote: > > David Howells wrote on Fri, Aug 09, 2024 at 02:56:09PM +0100: > > From: Dominique Martinet <asmadeus@codewreck.org> > > > > 9p: Fix DIO read through netfs > > nitpick: now sure how that ended up here but this is duplicated with the > subject (the commit message ends up with this line twice) > > > If a program is watching a file on a 9p mount, it won't see any change in > > size if the file being exported by the server is changed directly in the > > source filesystem, presumably because 9p doesn't have change notifications, > > and because netfs skips the reads if the file is empty. > > > > Fix this by attempting to read the full size specified when a DIO read is > > requested (such as when 9p is operating in unbuffered mode) and dealing > > with a short read if the EOF was less than the expected read. > > > > To make this work, filesystems using netfslib must not set > > NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF. > > I don't want to mandatorily clear this flag in netfslib for DIO because, > > say, ceph might make a read from an object that is not completely filled, > > but does not reside at the end of file - and so we need to clear the > > excess. > > > > This can be tested by watching an empty file over 9p within a VM (such as > > in the ktest framework): > > > > while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo > > (This is basically the same thing but if one wants to control the read > timing for more precise/verbose debugging: > exec 3< /host/tmp/foo > read -u 3 content && echo $content > (repeat as appropriate) > exec 3>&- > ) > > > then writing something into the empty file. The watcher should immediately > > display the file content and break out of the loop. Without this fix, it > > remains in the loop indefinitely. > > > > Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter") > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916 > > Written-by: Dominique Martinet <asmadeus@codewreck.org> > > Thanks for adding extra comments & fixing other filesystems. > > I've checked this covers all cases of setting NETFS_SREQ_CLEAR_TAIL so > hopefully shouldn't have further side effects, this sounds good to me: > > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> > > > Signed-off-by: David Howells <dhowells@redhat.com> > > cc: Eric Van Hensbergen <ericvh@kernel.org> > > cc: Latchesar Ionkov <lucho@ionkov.net> > > cc: Christian Schoenebeck <linux_oss@crudebyte.com> > > cc: Marc Dionne <marc.dionne@auristor.com> > > cc: Ilya Dryomov <idryomov@gmail.com> > > cc: Steve French <sfrench@samba.org> > > cc: Paulo Alcantara <pc@manguebit.com> > > cc: Trond Myklebust <trond.myklebust@hammerspace.com> > > cc: v9fs@lists.linux.dev > > cc: linux-afs@lists.infradead.org > > cc: ceph-devel@vger.kernel.org > > cc: linux-cifs@vger.kernel.org > > cc: linux-nfs@vger.kernel.org > > cc: netfs@lists.linux.dev > > cc: linux-fsdevel@vger.kernel.org > > --- > > fs/9p/vfs_addr.c | 3 ++- > > fs/afs/file.c | 3 ++- > > fs/ceph/addr.c | 6 ++++-- > > fs/netfs/io.c | 17 +++++++++++------ > > fs/nfs/fscache.c | 3 ++- > > fs/smb/client/file.c | 3 ++- > > 6 files changed, 23 insertions(+), 12 deletions(-) > > > > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > > index a97ceb105cd8..24fdc74caeba 100644 > > --- a/fs/9p/vfs_addr.c > > +++ b/fs/9p/vfs_addr.c > > @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) > > > > /* if we just extended the file size, any portion not in > > * cache won't be on server and is zeroes */ > > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > + if (subreq->rreq->origin != NETFS_DIO_READ) > > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > > > netfs_subreq_terminated(subreq, err ?: total, false); > > } > > diff --git a/fs/afs/file.c b/fs/afs/file.c > > index c3f0c45ae9a9..ec1be0091fdb 100644 > > --- a/fs/afs/file.c > > +++ b/fs/afs/file.c > > @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op) > > > > req->error = error; > > if (subreq) { > > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > + if (subreq->rreq->origin != NETFS_DIO_READ) > > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > netfs_subreq_terminated(subreq, error ?: req->actual_len, false); > > req->subreq = NULL; > > } else if (req->done) { > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > > index cc0a2240de98..c4744a02db75 100644 > > --- a/fs/ceph/addr.c > > +++ b/fs/ceph/addr.c > > @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req) > > if (err >= 0) { > > if (sparse && err > 0) > > err = ceph_sparse_ext_map_end(op); > > - if (err < subreq->len) > > + if (err < subreq->len && > > + subreq->rreq->origin != NETFS_DIO_READ) > > __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > if (IS_ENCRYPTED(inode) && err > 0) { > > err = ceph_fscrypt_decrypt_extents(inode, > > @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq) > > size_t len; > > int mode; > > > > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > + if (rreq->origin != NETFS_DIO_READ) > > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); > > > > if (subreq->start >= inode->i_size) > > diff --git a/fs/netfs/io.c b/fs/netfs/io.c > > index c179a1c73fa7..5367caf3fa28 100644 > > --- a/fs/netfs/io.c > > +++ b/fs/netfs/io.c > > @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq, > > > > if (transferred_or_error == 0) { > > if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { > > - subreq->error = -ENODATA; > > + if (rreq->origin != NETFS_DIO_READ) > > + subreq->error = -ENODATA; > > goto failed; > > } > > } else { > > @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq, > > } > > if (subreq->len > ictx->zero_point - subreq->start) > > subreq->len = ictx->zero_point - subreq->start; > > + > > + /* We limit buffered reads to the EOF, but let the > > + * server deal with larger-than-EOF DIO/unbuffered > > + * reads. > > + */ > > + if (subreq->len > rreq->i_size - subreq->start) > > + subreq->len = rreq->i_size - subreq->start; > > } > > - if (subreq->len > rreq->i_size - subreq->start) > > - subreq->len = rreq->i_size - subreq->start; > > if (rreq->rsize && subreq->len > rreq->rsize) > > subreq->len = rreq->rsize; > > > > @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync) > > do { > > _debug("submit %llx + %llx >= %llx", > > rreq->start, rreq->submitted, rreq->i_size); > > - if (rreq->origin == NETFS_DIO_READ && > > - rreq->start + rreq->submitted >= rreq->i_size) > > - break; > > if (!netfs_rreq_submit_slice(rreq, &io_iter)) > > break; > > + if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags)) > > + break; > > if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) && > > test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags)) > > break; > > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c > > index bf29a65c5027..7a558dea75c4 100644 > > --- a/fs/nfs/fscache.c > > +++ b/fs/nfs/fscache.c > > @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr) > > return; > > > > sreq = netfs->sreq; > > - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) > > + if (test_bit(NFS_IOHDR_EOF, &hdr->flags) && > > + sreq->rreq->origin != NETFS_DIO_READ) > > __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags); > > > > if (hdr->error) > > diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c > > index b2405dd4d4d4..3f3842e7b44a 100644 > > --- a/fs/smb/client/file.c > > +++ b/fs/smb/client/file.c > > @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq) > > goto out; > > } > > > > - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > + if (subreq->rreq->origin != NETFS_DIO_READ) > > + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); > > > > rc = rdata->server->ops->async_readv(rdata); > > out: > > > > -- > Dominique Martinet | Asmadeus >
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a97ceb105cd8..24fdc74caeba 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) /* if we just extended the file size, any portion not in * cache won't be on server and is zeroes */ - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (subreq->rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); netfs_subreq_terminated(subreq, err ?: total, false); } diff --git a/fs/afs/file.c b/fs/afs/file.c index c3f0c45ae9a9..ec1be0091fdb 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op) req->error = error; if (subreq) { - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (subreq->rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); netfs_subreq_terminated(subreq, error ?: req->actual_len, false); req->subreq = NULL; } else if (req->done) { diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index cc0a2240de98..c4744a02db75 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req) if (err >= 0) { if (sparse && err > 0) err = ceph_sparse_ext_map_end(op); - if (err < subreq->len) + if (err < subreq->len && + subreq->rreq->origin != NETFS_DIO_READ) __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); if (IS_ENCRYPTED(inode) && err > 0) { err = ceph_fscrypt_decrypt_extents(inode, @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq) size_t len; int mode; - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); if (subreq->start >= inode->i_size) diff --git a/fs/netfs/io.c b/fs/netfs/io.c index c179a1c73fa7..5367caf3fa28 100644 --- a/fs/netfs/io.c +++ b/fs/netfs/io.c @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq, if (transferred_or_error == 0) { if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { - subreq->error = -ENODATA; + if (rreq->origin != NETFS_DIO_READ) + subreq->error = -ENODATA; goto failed; } } else { @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq, } if (subreq->len > ictx->zero_point - subreq->start) subreq->len = ictx->zero_point - subreq->start; + + /* We limit buffered reads to the EOF, but let the + * server deal with larger-than-EOF DIO/unbuffered + * reads. + */ + if (subreq->len > rreq->i_size - subreq->start) + subreq->len = rreq->i_size - subreq->start; } - if (subreq->len > rreq->i_size - subreq->start) - subreq->len = rreq->i_size - subreq->start; if (rreq->rsize && subreq->len > rreq->rsize) subreq->len = rreq->rsize; @@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync) do { _debug("submit %llx + %llx >= %llx", rreq->start, rreq->submitted, rreq->i_size); - if (rreq->origin == NETFS_DIO_READ && - rreq->start + rreq->submitted >= rreq->i_size) - break; if (!netfs_rreq_submit_slice(rreq, &io_iter)) break; + if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags)) + break; if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) && test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags)) break; diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c index bf29a65c5027..7a558dea75c4 100644 --- a/fs/nfs/fscache.c +++ b/fs/nfs/fscache.c @@ -363,7 +363,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr) return; sreq = netfs->sreq; - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) + if (test_bit(NFS_IOHDR_EOF, &hdr->flags) && + sreq->rreq->origin != NETFS_DIO_READ) __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags); if (hdr->error) diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index b2405dd4d4d4..3f3842e7b44a 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -217,7 +217,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq) goto out; } - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (subreq->rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); rc = rdata->server->ops->async_readv(rdata); out: