diff mbox

[11/12] NFS: Don't count O_DIRECT reads in the inode->i_dio_count

Message ID 1465931115-30784-11-git-send-email-trond.myklebust@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Trond Myklebust June 14, 2016, 7:05 p.m. UTC
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/direct.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

Comments

Christoph Hellwig June 15, 2016, 7:16 a.m. UTC | #1
Explanation of why reads are more special than writes here or in
general why they are safe?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust June 15, 2016, 2:36 p.m. UTC | #2
On 6/15/16, 03:16, "Christoph Hellwig" <hch@infradead.org> wrote:

>Explanation of why reads are more special than writes here or in

>general why they are safe?

>


With the new locking, we already have exclusion between buffered I/O and direct I/O, and so the only remaining use case for inode_dio_wait() is to wait for writes to complete in operations like fsync().

Cheers
  Trond
Christoph Hellwig June 15, 2016, 2:41 p.m. UTC | #3
On Wed, Jun 15, 2016 at 02:36:04PM +0000, Trond Myklebust wrote:
> On 6/15/16, 03:16, "Christoph Hellwig" <hch@infradead.org> wrote:
> 
> >Explanation of why reads are more special than writes here or in
> >general why they are safe?
> >
> 
> With the new locking, we already have exclusion between buffered I/O and direct I/O, and so the only remaining use case for inode_dio_wait() is to wait for writes to complete in operations like fsync().

There is no need to wait for pending dio in fsync - fsync is only
guarantee to flush out I/O that's alreayd been completed.

inode_dio_wait and friends were introduces to protect aio that doesn't
hold i_mutex against truncate.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust June 15, 2016, 2:50 p.m. UTC | #4
DQoNCk9uIDYvMTUvMTYsIDEwOjQxLCAiQ2hyaXN0b3BoIEhlbGx3aWciIDxoY2hAaW5mcmFkZWFk
Lm9yZz4gd3JvdGU6DQoNCj5PbiBXZWQsIEp1biAxNSwgMjAxNiBhdCAwMjozNjowNFBNICswMDAw
LCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6DQo+PiBPbiA2LzE1LzE2LCAwMzoxNiwgIkNocmlzdG9w
aCBIZWxsd2lnIiA8aGNoQGluZnJhZGVhZC5vcmc+IHdyb3RlOg0KPj4gDQo+PiA+RXhwbGFuYXRp
b24gb2Ygd2h5IHJlYWRzIGFyZSBtb3JlIHNwZWNpYWwgdGhhbiB3cml0ZXMgaGVyZSBvciBpbg0K
Pj4gPmdlbmVyYWwgd2h5IHRoZXkgYXJlIHNhZmU/DQo+PiA+DQo+PiANCj4+IFdpdGggdGhlIG5l
dyBsb2NraW5nLCB3ZSBhbHJlYWR5IGhhdmUgZXhjbHVzaW9uIGJldHdlZW4gYnVmZmVyZWQgSS9P
IGFuZCBkaXJlY3QgSS9PLCBhbmQgc28gdGhlIG9ubHkgcmVtYWluaW5nIHVzZSBjYXNlIGZvciBp
bm9kZV9kaW9fd2FpdCgpIGlzIHRvIHdhaXQgZm9yIHdyaXRlcyB0byBjb21wbGV0ZSBpbiBvcGVy
YXRpb25zIGxpa2UgZnN5bmMoKS4NCj4NCj5UaGVyZSBpcyBubyBuZWVkIHRvIHdhaXQgZm9yIHBl
bmRpbmcgZGlvIGluIGZzeW5jIC0gZnN5bmMgaXMgb25seQ0KPmd1YXJhbnRlZSB0byBmbHVzaCBv
dXQgSS9PIHRoYXQncyBhbHJlYXlkIGJlZW4gY29tcGxldGVkLg0KPg0KPmlub2RlX2Rpb193YWl0
IGFuZCBmcmllbmRzIHdlcmUgaW50cm9kdWNlcyB0byBwcm90ZWN0IGFpbyB0aGF0IGRvZXNuJ3QN
Cj5ob2xkIGlfbXV0ZXggYWdhaW5zdCB0cnVuY2F0ZS4NCg0KRmFpciBlbm91Z2guIFNvIGl0IHNv
dW5kcyBhcyBpZiB5b3UgYXJlIHN1Z2dlc3Rpbmcgd2UgY2FuIGp1c3QgZHJvcCB1c2luZyBpbm9k
ZV9kaW9fKigpIGFsdG9nZXRoZXI/DQoNCg0K

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig June 15, 2016, 2:53 p.m. UTC | #5
On Wed, Jun 15, 2016 at 02:50:07PM +0000, Trond Myklebust wrote:
> Fair enough. So it sounds as if you are suggesting we can just drop using inode_dio_*() altogether?

Maybe.  The big question is how a direct write vs truncate race is
handled.  Either way I think a patch like this deserves a detailed
analysis documented in the changelog and/or comments in the code.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 81b19c0fd3a3..e1376538b473 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -385,11 +385,6 @@  static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write)
 		spin_unlock(&inode->i_lock);
 	}
 
-	if (write)
-		nfs_zap_mapping(inode, inode->i_mapping);
-
-	inode_dio_end(inode);
-
 	if (dreq->iocb) {
 		long res = (long) dreq->error;
 		if (!res)
@@ -488,7 +483,6 @@  static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			     &nfs_direct_read_completion_ops);
 	get_dreq(dreq);
 	desc.pg_dreq = dreq;
-	inode_dio_begin(inode);
 
 	while (iov_iter_count(iter)) {
 		struct page **pagevec;
@@ -540,7 +534,6 @@  static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	 * generic layer handle the completion.
 	 */
 	if (requested_bytes == 0) {
-		inode_dio_end(inode);
 		nfs_direct_req_release(dreq);
 		return result < 0 ? result : -EIO;
 	}
@@ -771,6 +764,7 @@  static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 static void nfs_direct_write_schedule_work(struct work_struct *work)
 {
 	struct nfs_direct_req *dreq = container_of(work, struct nfs_direct_req, work);
+	struct inode *inode = dreq->inode;
 	int flags = dreq->flags;
 
 	dreq->flags = 0;
@@ -782,6 +776,9 @@  static void nfs_direct_write_schedule_work(struct work_struct *work)
 			nfs_direct_write_reschedule(dreq);
 			break;
 		default:
+			nfs_zap_mapping(inode, inode->i_mapping);
+			inode_dio_end(inode);
+
 			nfs_direct_complete(dreq, true);
 	}
 }