From patchwork Fri Feb 17 18:46:20 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Benjamin Coddington <bcodding@redhat.com>
X-Patchwork-Id: 9580565
Return-Path: <linux-nfs-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	03714600C5 for <patchwork-linux-nfs@patchwork.kernel.org>;
	Fri, 17 Feb 2017 18:46:26 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E98FC28755
	for <patchwork-linux-nfs@patchwork.kernel.org>;
	Fri, 17 Feb 2017 18:46:25 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id DE82828758; Fri, 17 Feb 2017 18:46:25 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 76E542875C
	for <patchwork-linux-nfs@patchwork.kernel.org>;
	Fri, 17 Feb 2017 18:46:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934924AbdBQSqX (ORCPT
	<rfc822;patchwork-linux-nfs@patchwork.kernel.org>);
	Fri, 17 Feb 2017 13:46:23 -0500
Received: from mx1.redhat.com ([209.132.183.28]:34020 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934925AbdBQSqV (ORCPT <rfc822;linux-nfs@vger.kernel.org>);
	Fri, 17 Feb 2017 13:46:21 -0500
Received: from int-mx10.intmail.prod.int.phx2.redhat.com
	(int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 0FE766AAC7;
	Fri, 17 Feb 2017 18:46:22 +0000 (UTC)
Received: from bcodding.csb (vpn-61-251.rdu2.redhat.com [10.10.61.251])
	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	ESMTP id v1HIkLcU014387; Fri, 17 Feb 2017 13:46:21 -0500
Received: by bcodding.csb (Postfix, from userid 24008)
	id 4CC6410C2C24; Fri, 17 Feb 2017 13:46:20 -0500 (EST)
From: Benjamin Coddington <bcodding@redhat.com>
To: Trond Myklebust <trond.myklebust@primarydata.com>,
	Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH 4/4] NFS: Always wait for I/O completion before unlock
Date: Fri, 17 Feb 2017 13:46:20 -0500
Message-Id: 
 <d81a853ea1ee4ba1294c85bbed7aa7298852b180.1487356965.git.bcodding@redhat.com>
In-Reply-To: <cover.1487356965.git.bcodding@redhat.com>
References: <cover.1487356965.git.bcodding@redhat.com>
In-Reply-To: <cover.1487356965.git.bcodding@redhat.com>
References: <cover.1487356965.git.bcodding@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.26]);
	Fri, 17 Feb 2017 18:46:22 +0000 (UTC)
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-nfs.vger.kernel.org>
X-Mailing-List: linux-nfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

NFS attempts to wait for read and write completion before unlocking in
order to ensure that the data returned was protected by the lock.  When
this waiting is interrupted by a signal, the unlock may never be sent, and
messages similar to the following are seen in the kernel ring buffer:

[20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
[20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183
[20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185

For NFSv3, the missing unlock will cause the server to refuse conflicting
locks indefinitely.  For NFSv4, the leftover lock will be removed by the
server after the lease timeout.

This patch fixes this for NFSv3 by skipping the wait in order to
immediately send the unlock if the FL_CLOSE flag is set when signaled.  For
NFSv4, this approach may cause the server to see the I/O as arriving with
an old stateid, so, for the v4 case the fix is different: the wait on I/O
completion is moved into nfs4_locku_ops' rpc_call_prepare().  This will
cause the sleep to happen in rpciod context, and a signal to the originally
waiting process will not cause the unlock to be skipped.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/file.c     | 13 -------------
 fs/nfs/nfs3proc.c | 13 +++++++++++++
 fs/nfs/nfs4proc.c |  7 +++++++
 fs/nfs/pagelist.c |  1 +
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a490f45df4db..adc67fe762e3 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -684,7 +684,6 @@ static int
 do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 {
 	struct inode *inode = filp->f_mapping->host;
-	struct nfs_lock_context *l_ctx;
 	int status;
 
 	/*
@@ -693,18 +692,6 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 	 */
 	vfs_fsync(filp, 0);
 
-	l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
-	if (!IS_ERR(l_ctx)) {
-		status = nfs_iocounter_wait(l_ctx);
-		nfs_put_lock_context(l_ctx);
-		if (status < 0)
-			return status;
-	}
-
-	/* NOTE: special case
-	 * 	If we're signalled while cleaning up locks on process exit, we
-	 * 	still need to complete the unlock.
-	 */
 	/*
 	 * Use local locking if mounted with "-onolock" or with appropriate
 	 * "-olocal_lock="
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index dc925b531f32..ec3f12571c82 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -869,6 +869,19 @@ static int
 nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl)
 {
 	struct inode *inode = file_inode(filp);
+	int status;
+	struct nfs_lock_context *l_ctx;
+
+	/* For v3, always send the unlock on FL_CLOSE */
+	if (fl->fl_type == F_UNLCK) {
+		l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
+		if (!IS_ERR(l_ctx)) {
+			status = nfs_iocounter_wait(l_ctx);
+			nfs_put_lock_context(l_ctx);
+			if (status < 0 && !(fl->fl_flags & FL_CLOSE))
+				return status;
+		}
+	}
 
 	return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl);
 }
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 91f88bfbbe79..052b97fd653d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5906,6 +5906,7 @@ struct nfs4_unlockdata {
 	struct nfs_locku_res res;
 	struct nfs4_lock_state *lsp;
 	struct nfs_open_context *ctx;
+	struct nfs_lock_context *l_ctx;
 	struct file_lock fl;
 	struct nfs_server *server;
 	unsigned long timestamp;
@@ -5930,6 +5931,7 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
 	atomic_inc(&lsp->ls_count);
 	/* Ensure we don't close file until we're done freeing locks! */
 	p->ctx = get_nfs_open_context(ctx);
+	p->l_ctx = nfs_get_lock_context(ctx);
 	memcpy(&p->fl, fl, sizeof(p->fl));
 	p->server = NFS_SERVER(inode);
 	return p;
@@ -5988,6 +5990,11 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data)
 		/* Note: exit _without_ running nfs4_locku_done */
 		goto out_no_action;
 	}
+
+	if (!IS_ERR(calldata->l_ctx)) {
+		nfs_iocounter_wait(calldata->l_ctx);
+		nfs_put_lock_context(calldata->l_ctx);
+	}
 	calldata->timestamp = jiffies;
 	if (nfs4_setup_sequence(calldata->server,
 				&calldata->arg.seq_args,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6e629b856a00..8bf3cefdaa96 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -114,6 +114,7 @@ nfs_iocounter_wait(struct nfs_lock_context *l_ctx)
 	return wait_on_atomic_t(&l_ctx->io_count, nfs_wait_atomic_killable,
 			TASK_KILLABLE);
 }
+EXPORT_SYMBOL(nfs_iocounter_wait);
 
 /*
  * nfs_page_group_lock - lock the head of the page group