Message ID: 20191001202000.13248-2-trond.myklebust@hammerspace.com
State:      New, archived
Series:     Fix O_DIRECT error handling
Hi Trond,

After this patch we have a new problem: the generic/465 test sometimes
fails.

We captured the packets and found that a "read" will return file hole
data while a "write" is still in progress. The situation looks like
this:

    |xxxxxxxxxxxx|            |xxxxxxxxxxxxxxxxx|
    \------w-----/\---hole---/\--------w--------/
                       ^
                       R

When the "read"/"write" needs more than one RPC call, a read RPC that
lands in the middle will read file hole data. Because the RPC calls
issued by the "write" are asynchronous, once they complete out of order
the "read" may read hole data.

Maybe we need to serialize reads and writes in direct I/O mode. What do
you think?

Thanks

On 2019/10/2 4:19, Trond Myklebust wrote:
> When a series of O_DIRECT reads or writes are truncated, either due to
> eof or due to an error, then we should return the number of contiguous
> bytes that were received/sent starting at the offset specified by the
> application.
>
> Currently, we are failing to correctly check contiguity, and so we're
> failing the generic/465 in xfstests when the race between the read
> and write RPCs causes the file to get extended while the 2 reads are
> outstanding. If the first read RPC call wins the race and returns with
> eof set, we should treat the second read RPC as being truncated.
>
> Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
> Fixes: 1ccbad9f9f9bd ("nfs: fix DIO good bytes calculation")
> Cc: stable@vger.kernel.org # 4.1+
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/direct.c | 82 ++++++++++++++++++++++++++++---------------------
>  1 file changed, 47 insertions(+), 35 deletions(-)
>
> [diff snipped]
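The hole-read behaviour described in the report can be reproduced outside NFS with an ordinary sparse file: a read of a range that no write has covered yet returns zeroes. The sketch below is a minimal userspace illustration, not the xfstests reproducer; the file path and the helper name `middle_reads_as_hole` are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Two written extents with a hole between them, as in the diagram above.
 * A read that lands in the hole sees zeroes, which is what the packet
 * capture shows when the read RPC races with the write RPCs. */
static int middle_reads_as_hole(const char *path)
{
	char buf[4];
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	int ok;

	if (fd < 0)
		return -1;
	if (pwrite(fd, "AAAA", 4, 0) != 4 ||   /* first written extent */
	    pwrite(fd, "BBBB", 4, 12) != 4) {  /* second extent; [4,12) is a hole */
		close(fd);
		return -1;
	}

	/* Reader lands in the middle, where no data has been written. */
	ok = (pread(fd, buf, sizeof(buf), 6) == 4 &&
	      memcmp(buf, "\0\0\0\0", 4) == 0);
	close(fd);
	unlink(path);
	return ok;
}
```

On any POSIX filesystem this returns 1: the hole range reads back as zeroes even though neither write produced them.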
On Tue, 2019-10-08 at 16:11 +0800, Su Yanjun wrote:
> Hi Trond,
>
> After this patch we have new problem. Sometimes generic/465 test
> failed.
>
> We capture the packets and find that "read" will read file hole data
> when "write" is going on.
>
> [...]
>
> Maybe we need serialize the r/w in directio mode, what do you think?

No. Basic O_DIRECT does not guarantee atomicity of requests, which is
why we do not have generic locking at the VFS level when reading and
writing. The only guarantee being offered is that O_DIRECT and buffered
writes do not collide.

IOW: I think the basic premise for this test is just broken (as I
commented in the patch series I sent) because it is assuming a
behaviour that is simply not guaranteed.

BTW: note that buffered writes have the same property. They are ordered
when being written into the page cache, meaning that reads on the same
client will see no holes. However, if you try to read from another
client, then you will see the same behaviour, with temporary holes
magically appearing in the file.
When a series of O_DIRECT reads or writes are truncated, either due to
eof or due to an error, then we should return the number of contiguous
bytes that were received/sent starting at the offset specified by the
application.

Currently, we are failing to correctly check contiguity, and so we're
failing the generic/465 test in xfstests when the race between the read
and write RPCs causes the file to get extended while the 2 reads are
outstanding. If the first read RPC call wins the race and returns with
eof set, we should treat the second read RPC as being truncated.

Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Fixes: 1ccbad9f9f9bd ("nfs: fix DIO good bytes calculation")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/direct.c | 82 ++++++++++++++++++++++++++++---------------------
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 222d7115db71..62cb4a1a87f0 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -123,32 +123,53 @@ static inline int put_dreq(struct nfs_direct_req *dreq)
 }
 
 static void
-nfs_direct_good_bytes(struct nfs_direct_req *dreq, struct nfs_pgio_header *hdr)
+nfs_direct_handle_truncated(struct nfs_direct_req *dreq,
+			    const struct nfs_pgio_header *hdr,
+			    ssize_t dreq_len)
 {
-	int i;
-	ssize_t count;
+	struct nfs_direct_mirror *mirror = &dreq->mirrors[hdr->pgio_mirror_idx];
+	loff_t hdr_arg = hdr->io_start + hdr->args.count;
+	ssize_t expected = 0;
+
+	if (hdr_arg > dreq->io_start)
+		expected = hdr_arg - dreq->io_start;
+
+	if (dreq_len == expected)
+		return;
+	if (dreq->max_count > dreq_len) {
+		dreq->max_count = dreq_len;
+		if (dreq->count > dreq_len)
+			dreq->count = dreq_len;
+
+		if (test_bit(NFS_IOHDR_ERROR, &hdr->flags))
+			dreq->error = hdr->error;
+		else if (hdr->good_bytes > 0)
+			dreq->error = 0;
+	}
+	if (mirror->count > dreq_len)
+		mirror->count = dreq_len;
+}
 
-	WARN_ON_ONCE(dreq->count >= dreq->max_count);
+static void
+nfs_direct_count_bytes(struct nfs_direct_req *dreq,
+		       const struct nfs_pgio_header *hdr)
+{
+	struct nfs_direct_mirror *mirror = &dreq->mirrors[hdr->pgio_mirror_idx];
+	loff_t hdr_end = hdr->io_start + hdr->good_bytes;
+	ssize_t dreq_len = 0;
 
-	if (dreq->mirror_count == 1) {
-		dreq->mirrors[hdr->pgio_mirror_idx].count += hdr->good_bytes;
-		dreq->count += hdr->good_bytes;
-	} else {
-		/* mirrored writes */
-		count = dreq->mirrors[hdr->pgio_mirror_idx].count;
-		if (count + dreq->io_start < hdr->io_start + hdr->good_bytes) {
-			count = hdr->io_start + hdr->good_bytes - dreq->io_start;
-			dreq->mirrors[hdr->pgio_mirror_idx].count = count;
-		}
-		/* update the dreq->count by finding the minimum agreed count from all
-		 * mirrors */
-		count = dreq->mirrors[0].count;
+	if (hdr_end > dreq->io_start)
+		dreq_len = hdr_end - dreq->io_start;
 
-		for (i = 1; i < dreq->mirror_count; i++)
-			count = min(count, dreq->mirrors[i].count);
+	nfs_direct_handle_truncated(dreq, hdr, dreq_len);
 
-		dreq->count = count;
-	}
+	if (dreq_len > dreq->max_count)
+		dreq_len = dreq->max_count;
+
+	if (mirror->count < dreq_len)
+		mirror->count = dreq_len;
+	if (dreq->count < dreq_len)
+		dreq->count = dreq_len;
 }
 
 /*
@@ -402,20 +423,12 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
 	struct nfs_direct_req *dreq = hdr->dreq;
 
 	spin_lock(&dreq->lock);
-	if (test_bit(NFS_IOHDR_ERROR, &hdr->flags))
-		dreq->error = hdr->error;
-
 	if (test_bit(NFS_IOHDR_REDO, &hdr->flags)) {
 		spin_unlock(&dreq->lock);
 		goto out_put;
 	}
 
-	if (hdr->good_bytes != 0)
-		nfs_direct_good_bytes(dreq, hdr);
-
-	if (test_bit(NFS_IOHDR_EOF, &hdr->flags))
-		dreq->error = 0;
-
+	nfs_direct_count_bytes(dreq, hdr);
 	spin_unlock(&dreq->lock);
 
 	while (!list_empty(&hdr->pages)) {
@@ -652,6 +665,9 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 	nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo);
 
 	dreq->count = 0;
+	dreq->max_count = 0;
+	list_for_each_entry(req, &reqs, wb_list)
+		dreq->max_count += req->wb_bytes;
 	dreq->verf.committed = NFS_INVALID_STABLE_HOW;
 	nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo);
 	for (i = 0; i < dreq->mirror_count; i++)
@@ -791,17 +807,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
 	nfs_init_cinfo_from_dreq(&cinfo, dreq);
 
 	spin_lock(&dreq->lock);
-
-	if (test_bit(NFS_IOHDR_ERROR, &hdr->flags))
-		dreq->error = hdr->error;
-
 	if (test_bit(NFS_IOHDR_REDO, &hdr->flags)) {
 		spin_unlock(&dreq->lock);
 		goto out_put;
 	}
 
+	nfs_direct_count_bytes(dreq, hdr);
 	if (hdr->good_bytes != 0) {
-		nfs_direct_good_bytes(dreq, hdr);
 		if (nfs_write_need_commit(hdr)) {
 			if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES)
 				request_commit = true;
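The accounting the patch introduces can be modelled in userspace. The sketch below reimplements only the clamping arithmetic of nfs_direct_count_bytes()/nfs_direct_handle_truncated() for the single-mirror case; the struct and field names echo the kernel code, but `dreq_model` and `count_bytes` are hypothetical names for this illustration (error-flag handling and mirrors are omitted). The idea: a truncated reply shrinks `max_count`, so a later out-of-order reply can no longer inflate the contiguous byte count past the truncation point.

```c
#include <assert.h>
#include <sys/types.h>

/* Userspace model of the patch's byte accounting for a single mirror.
 * Illustration only, not the kernel implementation. */
struct dreq_model {
	off_t   io_start;   /* offset of the whole O_DIRECT request */
	ssize_t count;      /* contiguous bytes completed so far    */
	ssize_t max_count;  /* upper bound, shrunk on truncation    */
};

/* Process one RPC reply that was asked to cover
 * [io_start, io_start + args_count) and succeeded for good_bytes. */
static void count_bytes(struct dreq_model *d, off_t io_start,
			size_t args_count, size_t good_bytes)
{
	off_t hdr_arg = io_start + (off_t)args_count;
	off_t hdr_end = io_start + (off_t)good_bytes;
	ssize_t expected = 0;
	ssize_t dreq_len = 0;

	if (hdr_arg > d->io_start)
		expected = hdr_arg - d->io_start;
	if (hdr_end > d->io_start)
		dreq_len = hdr_end - d->io_start;

	/* Truncated reply: clamp the most this request may ever report,
	 * so a reply that extends past the truncation point is ignored. */
	if (dreq_len != expected && d->max_count > dreq_len) {
		d->max_count = dreq_len;
		if (d->count > dreq_len)
			d->count = dreq_len;
	}

	if (dreq_len > d->max_count)
		dreq_len = d->max_count;
	if (d->count < dreq_len)
		d->count = dreq_len;
}
```

For example, with two 4096-byte reads outstanding: if the second read (at offset 4096) completes fully first, `count` briefly reaches 8192; when the first read then returns short (say 1000 good bytes, eof set), both `count` and `max_count` collapse to 1000, the number of contiguous bytes from the application's offset, which is what the commit message requires.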