
[23/49] lustre: ptlrpc: fix ASSERTION on scp_rqbd_posted

Message ID 1618459361-17909-24-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: sync to OpenSFS as of March 30 2021

Commit Message

James Simmons April 15, 2021, 4:02 a.m. UTC
From: Yang Sheng <ys@whamcloud.com>

Requests may still be referenced by another target even after the
service threads have been stopped, because some portals are shared
among different services. As a workaround, just wait for the
requests to be released.
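
For illustration only, the wait-until-released idea can be sketched in
userspace C with pthreads. This is not the kernel code from the patch:
struct svc_part, svc_part_drop(), svc_part_wait_empty() and run_demo()
are hypothetical names, a counter stands in for the scp_rqbd_posted
list, and a mutex/condition variable pair stands in for scp_lock and
scp_waitq.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for a service partition. */
struct svc_part {
	pthread_mutex_t lock;	/* plays the role of scp_lock */
	pthread_cond_t waitq;	/* plays the role of scp_waitq */
	int posted;		/* buffers still referenced elsewhere */
};

static void svc_part_init(struct svc_part *p, int posted)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->waitq, NULL);
	p->posted = posted;
}

/* Another holder releases one buffer; wake the waiter on the last one. */
static void svc_part_drop(struct svc_part *p)
{
	pthread_mutex_lock(&p->lock);
	assert(p->posted > 0);
	if (--p->posted == 0)
		pthread_cond_broadcast(&p->waitq);
	pthread_mutex_unlock(&p->lock);
}

/*
 * Wait until nothing is posted, re-checking at least once per second
 * instead of asserting. Returns how many times the waiter slept.
 */
static int svc_part_wait_empty(struct svc_part *p)
{
	int rounds = 0;

	pthread_mutex_lock(&p->lock);
	while (p->posted > 0) {
		struct timespec deadline;

		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += 1;	/* analogous to the HZ timeout */
		/* Releases the mutex while sleeping, re-takes it on wakeup. */
		pthread_cond_timedwait(&p->waitq, &p->lock, &deadline);
		rounds++;
	}
	pthread_mutex_unlock(&p->lock);
	return rounds;
}

/* Simulates another target that releases its two buffers a bit late. */
static void *late_dropper(void *arg)
{
	struct svc_part *p = arg;
	struct timespec delay = { 0, 50 * 1000 * 1000 };	/* 50 ms */

	nanosleep(&delay, NULL);
	svc_part_drop(p);
	svc_part_drop(p);
	return NULL;
}

/* Returns the number of buffers still posted after the wait (expect 0). */
static int run_demo(void)
{
	struct svc_part part;
	pthread_t tid;

	svc_part_init(&part, 2);
	pthread_create(&tid, NULL, late_dropper, &part);
	svc_part_wait_empty(&part);
	pthread_join(tid, NULL);
	return part.posted;
}
```

Unlike this sketch, the patch drops scp_lock by hand around
wait_event_idle_timeout(), whereas pthread_cond_timedwait() releases
and re-acquires the mutex itself. The timeout in both cases only
bounds how long a missed wakeup can delay the re-check.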

LustreError: (service.c::ptlrpc_service_purge_all())
    ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed:
LustreError: (service.c::ptlrpc_service_purge_all()) LBUG
Pid: 21, comm: umount 3.10.0 #1 SMP
Call Trace:
  [<a01c47dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
  [<a01c488c>] lbug_with_loc+0x4c/0xa0 [libcfs]
  [<a0b534dd>] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
  [<a005e122>] ost_cleanup+0x82/0x1b0 [ost]
  [<a08e0bfa>] class_free_dev+0x1ca/0x630 [obdclass]
  [<a08e1240>] class_export_put+0x1e0/0x2b0 [obdclass]
  [<a08e2cc5>] class_unlink_export+0x135/0x170 [obdclass]
  [<a08f8030>] class_decref+0x80/0x160 [obdclass]
  [<a08f8481>] class_detach+0x1b1/0x2e0 [obdclass]
  [<a08fef21>] class_process_config+0x1a91/0x2820 [obdclass]
  [<a08ffe90>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
  [<a092a115>] server_stop_servers+0xd5/0x160 [obdclass]
  [<a092f6c6>] server_put_super+0x126/0xca0 [obdclass]
  [<8121068a>] generic_shutdown_super+0x6a/0xf0
  [<81210a62>] kill_anon_super+0x12/0x20
  [<a09027e2>] lustre_kill_super+0x32/0x50 [obdclass]
  [<81210e59>] deactivate_locked_super+0x49/0x60
  [<812115a6>] deactivate_super+0x46/0x60
  [<8123019f>] cleanup_mnt+0x3f/0x80
  [<81230232>] __cleanup_mnt+0x12/0x20
  [<810ab085>] task_work_run+0xb5/0xf0
  [<8102ac12>] do_notify_resume+0x92/0xb0
  [<81783c83>] int_signal+0x12/0x17
   Kernel panic - not syncing: LBUG

WC-bug-id: https://jira.whamcloud.com/browse/LU-11289
Lustre-commit: b635a0435d13d843 ("LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41936
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/service.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Patch

diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index f3f94d4..427215c 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -2922,7 +2922,23 @@  static void ptlrpc_wait_replies(struct ptlrpc_service_part *svcpt)
 			ptlrpc_server_finish_active_request(svcpt, req);
 		}
 
-		LASSERT(list_empty(&svcpt->scp_rqbd_posted));
+		/*
+		 * The portal may be shared by several services (e.g. OUT_PORTAL),
+		 * so the request could still be referenced by another target.
+		 * Wait until ptlrpc_server_drop_request() has been invoked.
+		 *
+		 * TODO: make the req_buffer global rather than per-service.
+		 */
+		spin_lock(&svcpt->scp_lock);
+		while (!list_empty(&svcpt->scp_rqbd_posted)) {
+			spin_unlock(&svcpt->scp_lock);
+			wait_event_idle_timeout(svcpt->scp_waitq,
+				list_empty(&svcpt->scp_rqbd_posted),
+				HZ);
+			spin_lock(&svcpt->scp_lock);
+		}
+		spin_unlock(&svcpt->scp_lock);
+
 		LASSERT(svcpt->scp_nreqs_incoming == 0);
 		LASSERT(svcpt->scp_nreqs_active == 0);
 		/*