Message ID | 1465931115-30784-11-git-send-email-trond.myklebust@primarydata.com (mailing list archive) |
---|---
State | New, archived |
Explanation of why reads are more special than writes here or in
general why they are safe?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/15/16, 03:16, "Christoph Hellwig" <hch@infradead.org> wrote:

>Explanation of why reads are more special than writes here or in
>general why they are safe?
>

With the new locking, we already have exclusion between buffered I/O and
direct I/O, and so the only remaining use case for inode_dio_wait() is to
wait for writes to complete in operations like fsync().

Cheers
  Trond
On Wed, Jun 15, 2016 at 02:36:04PM +0000, Trond Myklebust wrote:
> On 6/15/16, 03:16, "Christoph Hellwig" <hch@infradead.org> wrote:
>
> >Explanation of why reads are more special than writes here or in
> >general why they are safe?
> >
>
> With the new locking, we already have exclusion between buffered I/O and
> direct I/O, and so the only remaining use case for inode_dio_wait() is
> to wait for writes to complete in operations like fsync().

There is no need to wait for pending dio in fsync - fsync is only
guaranteed to flush out I/O that has already been completed.

inode_dio_wait() and friends were introduced to protect aio that doesn't
hold i_mutex against truncate.
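[Editor's illustration: the inode_dio_begin()/inode_dio_end()/inode_dio_wait()
mechanism Christoph refers to is, at heart, a per-inode count of in-flight
direct I/O requests plus a waiter that blocks until the count drains to zero.
The following is a hedged userspace sketch of that pattern, not the kernel
implementation: the kernel uses atomics and a wait queue on i_dio_count,
while this model uses a mutex and condition variable, and all names
(demo_inode, dio_begin, dio_end, dio_wait, run_demo) are hypothetical.]

#include <assert.h>
#include <pthread.h>
#include <stdio.h>

struct demo_inode {
	pthread_mutex_t lock;
	pthread_cond_t  dio_done;
	int             i_dio_count;   /* direct I/O requests in flight */
};

/* Submission path: account for one more in-flight dio request. */
static void dio_begin(struct demo_inode *inode)
{
	pthread_mutex_lock(&inode->lock);
	inode->i_dio_count++;
	pthread_mutex_unlock(&inode->lock);
}

/* Completion path: drop the count and wake waiters when it hits zero. */
static void dio_end(struct demo_inode *inode)
{
	pthread_mutex_lock(&inode->lock);
	if (--inode->i_dio_count == 0)
		pthread_cond_broadcast(&inode->dio_done);
	pthread_mutex_unlock(&inode->lock);
}

/* What truncate would call: block until all in-flight dio has drained. */
static void dio_wait(struct demo_inode *inode)
{
	pthread_mutex_lock(&inode->lock);
	while (inode->i_dio_count > 0)
		pthread_cond_wait(&inode->dio_done, &inode->lock);
	pthread_mutex_unlock(&inode->lock);
}

static void *dio_worker(void *arg)
{
	dio_end(arg);   /* simulate an async dio request completing */
	return NULL;
}

int run_demo(void)
{
	struct demo_inode inode = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
	};
	pthread_t t;

	dio_begin(&inode);                       /* aio submission */
	pthread_create(&t, NULL, dio_worker, &inode);
	dio_wait(&inode);                        /* truncate-side drain */
	pthread_join(&t, NULL);
	return inode.i_dio_count;                /* 0 once drained */
}

int main(void)
{
	assert(run_demo() == 0);
	printf("i_dio_count drained to 0\n");
	return 0;
}

This is why dropping the calls, as the patch below does for reads, only
works if some other mechanism (here, the new NFS locking) already excludes
the operations that used to rely on the drain.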
On 6/15/16, 10:41, "Christoph Hellwig" <hch@infradead.org> wrote:

>On Wed, Jun 15, 2016 at 02:36:04PM +0000, Trond Myklebust wrote:
>> With the new locking, we already have exclusion between buffered I/O
>> and direct I/O, and so the only remaining use case for
>> inode_dio_wait() is to wait for writes to complete in operations like
>> fsync().
>
>There is no need to wait for pending dio in fsync - fsync is only
>guaranteed to flush out I/O that has already been completed.
>
>inode_dio_wait() and friends were introduced to protect aio that doesn't
>hold i_mutex against truncate.

Fair enough. So it sounds as if you are suggesting we can just drop using
inode_dio_*() altogether?
On Wed, Jun 15, 2016 at 02:50:07PM +0000, Trond Myklebust wrote:
> Fair enough. So it sounds as if you are suggesting we can just drop using inode_dio_*() altogether?
Maybe. The big question is how a direct write vs truncate race is
handled. Either way I think a patch like this deserves a detailed
analysis documented in the changelog and/or comments in the code.
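[Editor's illustration: the direct-write vs. truncate race Christoph asks
about can be made concrete. If truncate does not drain in-flight direct
writes, a late-landing dio completion can reinstate data past the new EOF.
A minimal single-threaded sketch of the two interleavings, with all names
(toy_inode, run_race) hypothetical and the "disk" modeled as a byte array:]

#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { DISK_SIZE = 16 };

struct toy_inode {
	char data[DISK_SIZE];
	int  i_size;
};

/* A dio write for bytes [8,12) is in flight while i_size == 12, and a
 * truncate to 4 races with it. Returns 1 if stale data survives past
 * the truncated EOF. */
int run_race(int truncate_waits)
{
	struct toy_inode ino = { {0}, 12 };

	if (truncate_waits) {
		/* inode_dio_wait(): the dio completes first... */
		memset(ino.data + 8, 'D', 4);            /* dio lands */
		ino.i_size = 4;                          /* ...then truncate */
		memset(ino.data + 4, 0, DISK_SIZE - 4);  /* zero past new EOF */
	} else {
		ino.i_size = 4;                          /* truncate first */
		memset(ino.data + 4, 0, DISK_SIZE - 4);
		memset(ino.data + 8, 'D', 4);            /* stale dio lands late */
	}
	return ino.data[8] == 'D';
}

int main(void)
{
	assert(run_race(1) == 0);   /* waiting: nothing past the new EOF */
	assert(run_race(0) == 1);   /* not waiting: stale data leaks back */
	printf("race illustrated\n");
	return 0;
}

Whether NFS's new buffered/direct exclusion also closes this particular
window is exactly the analysis Christoph is asking to see in the changelog.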
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 81b19c0fd3a3..e1376538b473 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -385,11 +385,6 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write)
 		spin_unlock(&inode->i_lock);
 	}
 
-	if (write)
-		nfs_zap_mapping(inode, inode->i_mapping);
-
-	inode_dio_end(inode);
-
 	if (dreq->iocb) {
 		long res = (long) dreq->error;
 		if (!res)
@@ -488,7 +483,6 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			      &nfs_direct_read_completion_ops);
 	get_dreq(dreq);
 	desc.pg_dreq = dreq;
-	inode_dio_begin(inode);
 
 	while (iov_iter_count(iter)) {
 		struct page **pagevec;
@@ -540,7 +534,6 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	 * generic layer handle the completion.
 	 */
 	if (requested_bytes == 0) {
-		inode_dio_end(inode);
 		nfs_direct_req_release(dreq);
 		return result < 0 ? result : -EIO;
 	}
@@ -771,6 +764,7 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 static void nfs_direct_write_schedule_work(struct work_struct *work)
 {
 	struct nfs_direct_req *dreq = container_of(work, struct nfs_direct_req, work);
+	struct inode *inode = dreq->inode;
 	int flags = dreq->flags;
 
 	dreq->flags = 0;
@@ -782,6 +776,9 @@ static void nfs_direct_write_schedule_work(struct work_struct *work)
 		nfs_direct_write_reschedule(dreq);
 		break;
 	default:
+		nfs_zap_mapping(inode, inode->i_mapping);
+		inode_dio_end(inode);
+
 		nfs_direct_complete(dreq, true);
 	}
 }
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/direct.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)