From patchwork Wed Jan 29 13:39:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13953767 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 721BA1B87EE; Wed, 29 Jan 2025 13:40:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738158017; cv=none; b=PcfldRMyc/gED03VC9fF4fBHLAicO/Zsx/8FD4SAaK/93O/IebW9mTOdvkpKIdcxqhgHvJgjlWviBtZppkv0ulr7Yh/EEZ6toFPkJ0yGmW7XvSo3qpKCXKew0ZpkxUa7grFL9BjbawCh6hhTb0laVnNNqzpQFdunt8xb/6l3WoM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738158017; c=relaxed/simple; bh=NbJxpKAEYs/HAN2fTpo7awYOhoPSyHXKWtM0VOeNu7s=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=f7feIDFs3fh7gFXWJdpASniS58LKOoSugzHSo6fqWoF2bFsumk2Wgc1XezgSvrWSJ1vTeV4ocBFCoxnoToTl06V9E+UGxwSMbQ+C5VC9Tsnvtp12ZJ5U5SBaWwwo7c9NPuHiihBOGV7e+nAQUcMnLL+bMVdDoHesyTYL7C7rr2A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fXKR+3DT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fXKR+3DT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E353CC4CEE6; Wed, 29 Jan 2025 13:40:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1738158016; bh=NbJxpKAEYs/HAN2fTpo7awYOhoPSyHXKWtM0VOeNu7s=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=fXKR+3DTmOg8nuPtE3hhUAvf5c07NkL9NnmfT4RoWl6dEYNuZkYhpKJve+PDAnVJq Iq18AVU+8gbo0hnMIHcph7BHqwucHWEsfVuVgfI5o2m+FPrsiCnJzZHqJg0JuHwtzs y/9tNyKRI0X0Dll9rjfrqibpIpxr5jHAlG6OoWLqjj1MoAjoFteWmRx4DDQpZvIXoL aX2AJeyvg+rL6SF/V/IDoibvLBZcTgTgItBYglYklo/5RZriCZBRk92Ip6ISIEZwxU /zSi9D47tP6IPhOzj1ABsZiWs4sp4KiFn28rLt0rfqOM9T3fB6WL8IiWT1dRf+zxJO Ce8j2ROIgVtDw== From: Jeff Layton Date: Wed, 29 Jan 2025 08:39:57 -0500 Subject: [PATCH v2 4/7] nfsd: overhaul CB_SEQUENCE error handling Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250129-nfsd-6-14-v2-4-2700c92f3e44@kernel.org> References: <20250129-nfsd-6-14-v2-0-2700c92f3e44@kernel.org> In-Reply-To: <20250129-nfsd-6-14-v2-0-2700c92f3e44@kernel.org> To: Chuck Lever , Neil Brown , Olga Kornievskaia , Dai Ngo , Tom Talpey , "J. Bruce Fields" , Kinglong Mee , Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, Jeff Layton X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=4840; i=jlayton@kernel.org; h=from:subject:message-id; bh=NbJxpKAEYs/HAN2fTpo7awYOhoPSyHXKWtM0VOeNu7s=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBnmi+65z4cMIUssnxsrHOSeprTUHhdasqrzUY+n Ad4S0Go8OGJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZ5ovugAKCRAADmhBGVaC FQh7EACXW0Lwz4npWxwgr/66sDBkBQ+zaYBh9tLi1aLJG4MSzpIwX/pI41mcMWqLyhnDVZrDlKh Y40akHp0xL5AwTHtD4kN/pgWwPVB3/bxJaYEdOc5yqnxcukXY+blyQtXqlSSnRx5tVucq25N2jJ dKyFarP1jQKBgWocViqPQ5TlaBnUdr7pZHsfuNB/UKzEdL5MuDAHDjdOirV4UPbixZTLvKA/YNG zECz0Qvv+6Leg/EdSZciqlAEsgAuZCukLY1i8lIij+REPC2t+Dv/J7BcbS6eFa0+3lcMn4xnMIE lFCbbIXcw8lPBVk0881EVL8nTILixpnpo5WKKW6XIfQNF7+5s58TZ0S2Hhq96Ybtd8oJ+GVDgbE J9oxmSBQr1v8LCNRuV8vntzLWrOnA1K5wcNRebIpOi1+bAwdeqzfzn4P1TfMLaE+WVjs24F8dh5 7w0NP4Btc1BdzrrQwTfKInF38MFCuuShDdPBp25tC7kbpK3Xi+OUhuf8cn786PQW+2nqZJtGzZ1 gJYdEcawliJSccJAlU+H+bFLsDJyhl2eP/lf0yhE1iXShvNISjGKjBY7LLiFDKjZceO52HdbfPQ xgfqVWLXNJvRMMt8T56jp/8cNyIMUo1vXXt3MwUWUiYMHvb5B94BnlFnYapnGwTmX4b1D+B2wpC KrMdwhJexQO+uzw== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Some of the existing error handling in nfsd4_cb_sequence_done is incorrect. This patch first does some general cleanup and then reworks some of the error handling cases. There is only one case where we want to proceed with processing the rest of the CB_COMPOUND, and that's when the cb_seq_status is 0. Make the default return value be false, and only set it to true in that case. Now that there is a clear record of the session used to handle the CB_COMPOUND, we can take changes to the cl_cb_session into account when reattempting a call. When restarting the RPC (but not the entire callback), test RPC_SIGNALLED(). If it has been, then fall through to restart the callback instead of just restarting the RPC. Whenever restarting the entire callback, release the slot and session, in case there have been changes in the interim. Signed-off-by: Jeff Layton --- fs/nfsd/nfs4callback.c | 78 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 58 insertions(+), 20 deletions(-) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 9f4838a6d9c668cdf66a77793f63c064586f2b22..e70a7a03933b1f8a48d31b0ef226c3f157d14ed1 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1381,11 +1381,16 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) rpc_call_start(task); } +static bool cb_session_changed(struct nfsd4_callback *cb) +{ + return cb->cb_ses != rcu_access_pointer(cb->cb_clp->cl_cb_session); +} + static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; struct nfsd4_session *session = cb->cb_ses; - bool ret = true; + bool ret = false; if (!clp->cl_minorversion) { /* @@ -1418,11 +1423,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback * (sequence ID, cached reply) MUST NOT change. */ ++session->se_cb_seq_nr[cb->cb_held_slot]; + ret = true; break; case -ESERVERFAULT: + /* + * Call succeeded, but CB_SEQUENCE reply failed sanity checks. + * The client has gone insane. Mark the BC faulty, since there + * isn't much else we can do. + */ ++session->se_cb_seq_nr[cb->cb_held_slot]; nfsd4_mark_cb_fault(cb->cb_clp); - ret = false; break; case 1: /* @@ -1433,39 +1443,67 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback */ fallthrough; case -NFS4ERR_BADSESSION: - nfsd4_mark_cb_fault(cb->cb_clp); - ret = false; + /* + * If the session hasn't changed, mark it faulty. Restart + * the call. + */ + if (!cb_session_changed(cb)) + nfsd4_mark_cb_fault(cb->cb_clp); goto need_restart; case -NFS4ERR_DELAY: + /* + * If the rpc_clnt is being torn down, then we must restart + * the call from scratch. + */ + if (RPC_SIGNALLED(task)) + goto need_restart; cb->cb_seq_status = 1; - if (!rpc_restart_call(task)) - goto out; - - rpc_delay(task, 2 * HZ); + if (rpc_restart_call(task)) + rpc_delay(task, 2 * HZ); return false; - case -NFS4ERR_BADSLOT: - goto retry_nowait; case -NFS4ERR_SEQ_MISORDERED: - if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) { - session->se_cb_seq_nr[cb->cb_held_slot] = 1; + /* + * Reattempt once with seq_nr 1. If that fails, treat this + * like BADSLOT. + */ + if (!cb_session_changed(cb)) { + if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) { + session->se_cb_seq_nr[cb->cb_held_slot] = 1; + goto retry_nowait; + } + } + fallthrough; + case -NFS4ERR_BADSLOT: + /* + * BADSLOT means that the client and server are out of sync + * on the BC parameters. In this case, we want to mark the + * backchannel faulty and then try it again, but _leak_ the + * slot so no one uses it. If the callback session has + * changed since then though, don't mark it. + */ + if (!cb_session_changed(cb)) { + nfsd4_mark_cb_fault(cb->cb_clp); + cb->cb_held_slot = -1; goto retry_nowait; } - break; + goto need_restart; default: nfsd4_mark_cb_fault(cb->cb_clp); } trace_nfsd_cb_free_slot(task, cb); nfsd41_cb_release_slot(cb); - - if (RPC_SIGNALLED(task)) - goto need_restart; -out: return ret; retry_nowait: - if (rpc_restart_call_prepare(task)) - ret = false; - goto out; + /* + * RPC_SIGNALLED() means that the rpc_client is being torn down and + * (possibly) recreated. Restart the call completely in that case. + */ + if (!RPC_SIGNALLED(task)) { + rpc_restart_call_prepare(task); + return false; + } need_restart: + nfsd41_cb_release_slot(cb); if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) { trace_nfsd_cb_restart(clp, cb); task->tk_status = 0;