diff mbox series

[10/50] lustre: fld: repeat rpc in fld_client_rpc after EAGAIN

Message ID 1647783064-20688-11-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: update to OpenSFS tree as of March 20, 2022

Commit Message

James Simmons March 20, 2022, 1:30 p.m. UTC
From: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>

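An RPC sent by fld_client_rpc() that times out may cause the client
operation to fail.

Have fld_client_rpc() retry the RPC when it fails with -EAGAIN, after
an interruptible 2 second sleep.

Also fix a typo in the failure simulation in fld_client_rpc(): the
closing parenthesis of OBD_FAIL_CHECK() was misplaced.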

HPE-bug-id: LUS-8652
WC-bug-id: https://jira.whamcloud.com/browse/LU-13468
Lustre-commit: b1acf734f31c13d29 ("LU-13468 fld: repeat rpc in fld_client_rpc after EAGAIN")
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/38302
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/fld/fld_request.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
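
For readers unfamiliar with the pattern, the retry behaviour described in
the commit message can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the Lustre code: the names
retry_on_eagain(), sleep_interruptible_ms(), flaky_attempt() and the
rpc_attempt_fn callback type are hypothetical, and nanosleep() stands in
for the kernel's msleep_interruptible().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type standing in for one RPC attempt. */
typedef int (*rpc_attempt_fn)(void *ctx);

/*
 * Sleep for ms milliseconds. Returns nonzero if the sleep ended early,
 * e.g. because a signal arrived (userspace stand-in for
 * msleep_interruptible(), which returns the time left).
 */
static int sleep_interruptible_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	return nanosleep(&ts, NULL) != 0;
}

/*
 * Call attempt() until it returns something other than -EAGAIN.
 * Between attempts, sleep delay_ms; give up with -EINTR if the sleep
 * is interrupted. Any other result is returned to the caller as is.
 */
static int retry_on_eagain(rpc_attempt_fn attempt, void *ctx, long delay_ms)
{
	int rc;

	assert(attempt != NULL);
	for (;;) {
		rc = attempt(ctx);
		if (rc != -EAGAIN)
			return rc;
		if (sleep_interruptible_ms(delay_ms))
			return -EINTR;
	}
}

/* Example attempt: returns -EAGAIN until the counter in ctx hits zero. */
static int flaky_attempt(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

With a counter of 2, retry_on_eagain(flaky_attempt, &counter, 1) makes
three attempts and returns 0. As in the patch, the loop has no retry limit,
so only a non-EAGAIN result or an interrupted sleep ends it.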

Patch

diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c
index 7260a14..4180bcf 100644
--- a/fs/lustre/fld/fld_request.c
+++ b/fs/lustre/fld/fld_request.c
@@ -39,7 +39,8 @@ 
 #define DEBUG_SUBSYSTEM S_FLD
 
 #include <linux/module.h>
-#include <asm/div64.h>
+#include <linux/math64.h>
+#include <linux/delay.h>
 
 #include <obd.h>
 #include <obd_class.h>
@@ -314,6 +315,7 @@  int fld_client_rpc(struct obd_export *exp,
 	LASSERT(exp);
 
 	imp = class_exp2cliimp(exp);
+again:
 	switch (fld_op) {
 	case FLD_QUERY:
 		req = ptlrpc_request_alloc_pack(imp, &RQF_FLD_QUERY,
@@ -361,7 +363,7 @@  int fld_client_rpc(struct obd_export *exp,
 	req->rq_reply_portal = MDC_REPLY_PORTAL;
 	ptlrpc_at_set_req_timeout(req);
 
-	if (OBD_FAIL_CHECK(OBD_FAIL_FLD_QUERY_REQ && req->rq_no_delay)) {
+	if (OBD_FAIL_CHECK(OBD_FAIL_FLD_QUERY_REQ) && req->rq_no_delay) {
 		/* the same error returned by ptlrpc_import_delay_req */
 		rc = -EAGAIN;
 		req->rq_status = rc;
@@ -373,12 +375,18 @@  int fld_client_rpc(struct obd_export *exp,
 
 	if (rc != 0) {
 		if (imp->imp_state != LUSTRE_IMP_CLOSED && !imp->imp_deactive) {
-			/*
-			 * Since LWP is not replayable, so notify the caller
-			 * to retry if needed after a while.
-			 */
+			/* LWP is not replayable, retry after a while. */
 			rc = -EAGAIN;
 		}
+		if (rc == -EAGAIN) {
+			ptlrpc_req_finished(req);
+			if (msleep_interruptible(2 * MSEC_PER_SEC)) {
+				rc = -EINTR;
+				goto out_req;
+			}
+			rc = 0;
+			goto again;
+		}
 		goto out_req;
 	}