[23/42] lustre: ptlrpc: don't panic during reconnection

Message ID 1674514855-15399-24-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: sync to OpenSFS tree as of Jan 22 2023

Commit Message

James Simmons Jan. 23, 2023, 11 p.m. UTC
From: Alexander Boyko <alexander.boyko@hpe.com>

ptl_send_rpc() can race with ptlrpc_connect_import_locked() in the
middle of its assertion check, which leads to a spurious panic.
The assertion first checks

(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||

A reconnect can then change the import state and flags before the
second part is evaluated

(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)))

because MSGHDR_AT_SUPPORT is cleared during client reconnection.
Taking a lock in this hot path is undesirable, so the fix replaces
the assertion with a debug report.
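
For illustration only, the assert-to-report change can be sketched in
userspace C. This is not the kernel code: the sketch_* types, constants
and values below are hypothetical stand-ins for the import fields and
flags, and fprintf() stands in for DEBUG_REQ().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel definitions; values are arbitrary. */
enum sketch_imp_state { SKETCH_IMP_CONNECTING = 0, SKETCH_IMP_FULL = 1 };
#define SKETCH_MSGHDR_AT_SUPPORT 0x1u
#define SKETCH_CONNECT_AT        0x2u

struct sketch_import {
	enum sketch_imp_state state;
	unsigned int msghdr_flags;
	unsigned int connect_flags;
};

/*
 * Same predicate shape as the patched condition. It is evaluated without
 * locking, so in the real code a concurrent reconnect may change the
 * fields between the individual reads.
 */
static bool sketch_at_state_ok(bool at_off, const struct sketch_import *imp)
{
	assert(imp != NULL);	/* a true invariant, still safe to assert */
	return at_off || imp->state != SKETCH_IMP_FULL ||
	       (imp->msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT) ||
	       !(imp->connect_flags & SKETCH_CONNECT_AT);
}

/* Report an inconsistent view instead of aborting; returns 1 if reported. */
static int sketch_check_and_report(bool at_off, const struct sketch_import *imp)
{
	if (sketch_at_state_ok(at_off, imp))
		return 0;

	fprintf(stderr,
		"inconsistent import view: at_off=%d, not_full=%d, msghdr=%d, no_conn_at=%d\n",
		at_off, imp->state != SKETCH_IMP_FULL,
		!!(imp->msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT),
		!(imp->connect_flags & SKETCH_CONNECT_AT));
	return 1;
}
```

A caller that observes a FULL state with the AT header flag already
cleared gets a log line and continues, where an assertion would have
stopped the node.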

HPE-bug-id: LUS-10985
WC-bug-id: https://jira.whamcloud.com/browse/LU-16297
Lustre-commit: df31c4c0b39b88459 ("LU-16297 ptlrpc: don't panic during reconnection")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49029
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/niobuf.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

Patch

diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 670bfb0de02f..09f68157b883 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -579,13 +579,20 @@  int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 
 	/**
 	 * For enabled AT all request should have AT_SUPPORT in the
-	 * FULL import state when OBD_CONNECT_AT is set
+	 * FULL import state when OBD_CONNECT_AT is set.
+	 * This check has a race with ptlrpc_connect_import_locked()
+	 * with low chance, don't panic, only report.
 	 */
-	LASSERT(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
-		(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
-		!(imp->imp_connect_data.ocd_connect_flags &
-		OBD_CONNECT_AT));
-
+	if (!(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
+	    (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
+	    !(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT))) {
+		DEBUG_REQ(D_HA, request,
+			  "Wrong state of import detected, AT=%d, imp=%d, msghdr=%d, conn=%d\n",
+			  AT_OFF, imp->imp_state != LUSTRE_IMP_FULL,
+			  (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT),
+			  !(imp->imp_connect_data.ocd_connect_flags &
+			    OBD_CONNECT_AT));
+	}
 	if (request->rq_resend)
 		lustre_msg_add_flags(request->rq_reqmsg, MSG_RESENT);