From patchwork Thu Jul 27 14:03:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guoqing Jiang
X-Patchwork-Id: 13330147
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8CAADC04A6A
	for ; Thu, 27 Jul 2023 14:10:42 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S231215AbjG0OKl (ORCPT );
	Thu, 27 Jul 2023 10:10:41 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48306
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S232815AbjG0OKk (ORCPT );
	Thu, 27 Jul 2023 10:10:40 -0400
Received: from out-86.mta0.migadu.com (out-86.mta0.migadu.com
	[IPv6:2001:41d0:1004:224b::56])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AFAD30C7
	for ; Thu, 27 Jul 2023 07:10:38 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1690466644;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=JOPZ39/5EC5YSntQUhlwxq5R8ADpit4B5fPorVnB/Tw=;
	b=W6WBsKZcis0XhG9u3Hw4q2PfmXWRj2c/4/tEySAh/JljwANbx8MJnEqFRNBd7ZtnoZQyl2
	 YEp4mjpfvDhf5Boo6+PGitd+D/VVUvjkgJUREzPAFMkqyQPtX/MJ7KN5K9Gda/qio1fSVm
	 KAPXOOvuJMtDynxg2CjndBpXyPkqPcE=
From: Guoqing Jiang
To: bmt@zurich.ibm.com, jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: [PATCH 1/5] RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
Date: Thu, 27 Jul 2023 22:03:45 +0800
Message-Id: <20230727140349.25369-2-guoqing.jiang@linux.dev>
In-Reply-To: <20230727140349.25369-1-guoqing.jiang@linux.dev>
References: <20230727140349.25369-1-guoqing.jiang@linux.dev>
MIME-Version: 1.0
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org

If the siw module cannot be inserted successfully and is then removed
from the kernel, both siw_exit_module and the failure path in
siw_init_module call siw_cm_exit, so siw_cm_wq is destroyed twice, which
causes the issue below.

[ 73.561312] BUG: unable to handle page fault for address: 000000040000004c
[ 73.561317] #PF: supervisor read access in kernel mode
[ 73.561319] #PF: error_code(0x0000) - not-present page
[ 73.561320] PGD 0 P4D 0
[ 73.561322] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 73.561324] CPU: 1 PID: 1693 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
[ 73.561326] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 73.561327] RIP: 0010:device_del+0x22/0x3d0
...
[ 73.561347] Call Trace:
[ 73.561348]
[ 73.561350] ? show_regs+0x72/0x90
[ 73.561353] ? __die+0x25/0x80
[ 73.561355] ? page_fault_oops+0x154/0x4d0
[ 73.561357] ? lockdep_unlock+0x63/0xe0
[ 73.561361] ? do_user_addr_fault+0x381/0x8d0
[ 73.561363] ? rcu_is_watching+0x13/0x70
[ 73.561365] ? exc_page_fault+0x87/0x240
[ 73.561369] ? asm_exc_page_fault+0x27/0x30
[ 73.561373] ? device_del+0x22/0x3d0
[ 73.561374] ? __this_cpu_preempt_check+0x13/0x20
[ 73.561377] device_unregister+0x18/0x70
[ 73.561378] destroy_workqueue+0x33/0x2d0
[ 73.561381] siw_cm_exit+0x1a/0x30 [siw]
[ 73.561387] siw_exit_module+0x96/0x5a0 [siw]

So we need to set the workqueue to NULL after it is destroyed.

Signed-off-by: Guoqing Jiang
---
 drivers/infiniband/sw/siw/siw_cm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index da530c0404da..758ac8a22f7a 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1958,6 +1958,8 @@ int siw_cm_init(void)
 
 void siw_cm_exit(void)
 {
-	if (siw_cm_wq)
+	if (siw_cm_wq) {
 		destroy_workqueue(siw_cm_wq);
+		siw_cm_wq = NULL;
+	}
 }

From patchwork Thu Jul 27 14:03:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guoqing Jiang
X-Patchwork-Id: 13330146
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 3FFFDC001E0
	for ; Thu, 27 Jul 2023 14:10:41 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232772AbjG0OKk (ORCPT );
	Thu, 27 Jul 2023 10:10:40 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48282
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S231842AbjG0OKj (ORCPT );
	Thu, 27 Jul 2023 10:10:39 -0400
X-Greylist: delayed 390 seconds by postgrey-1.37 at lindbergh.monkeyblade.net;
	Thu, 27 Jul 2023 07:10:37 PDT
Received: from out-101.mta0.migadu.com (out-101.mta0.migadu.com
	[91.218.175.101])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id
	EA1D31BD6 for ; Thu, 27 Jul 2023 07:10:37 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1690466645;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=gAA4pfJA29nzPxcpA9hPAR83WDofKRCLlu29O9SRpkk=;
	b=PRpVeEVdYXY1BEQWr+zckLp3mxHqIR8rME65914weXKnFXVFVir+JU85szAh5fGABOp5sk
	 aDPH91vdb8bkhbOuMlskE9WptqIer6Cs3EuIP1OUG8dO0W50tXD3Pc73OQztfTYHePiJmx
	 PGBypUxWd0t9ohUecwYWVZpjWfQc8TI=
From: Guoqing Jiang
To: bmt@zurich.ibm.com, jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: [PATCH 2/5] RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
Date: Thu, 27 Jul 2023 22:03:46 +0800
Message-Id: <20230727140349.25369-3-guoqing.jiang@linux.dev>
In-Reply-To: <20230727140349.25369-1-guoqing.jiang@linux.dev>
References: <20230727140349.25369-1-guoqing.jiang@linux.dev>
MIME-Version: 1.0
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org

If the siw module cannot be inserted successfully and is then removed
from the kernel, both siw_exit_module and the failure path in
siw_init_module call siw_destroy_cpulist, which leads to the double free
shown below.

[ 32.197293] general protection fault, probably for non-canonical address 0xb4965e5a58a488: 0000 [#1] PREEMPT SMP NOPTI
[ 32.197300] CPU: 0 PID: 1676 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
[ 32.197304] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 32.197306] RIP: 0010:kfree+0x62/0x150
...
[ 32.197339] Call Trace:
[ 32.197341]
[ 32.197343] ? show_regs+0x72/0x90
[ 32.197348] ? die_addr+0x38/0xb0
[ 32.197351] ? exc_general_protection+0x1bf/0x4a0
[ 32.197357] ? asm_exc_general_protection+0x27/0x30
[ 32.197362] ? kfree+0x62/0x150
[ 32.197366] siw_exit_module+0xb8/0x590 [siw]
[ 32.197376] __do_sys_delete_module.constprop.0+0x18f/0x300

So set tx_valid_cpus to NULL and num_nodes to 0 once they have been
freed, so that a second call to siw_destroy_cpulist is harmless.

Signed-off-by: Guoqing Jiang
---
 drivers/infiniband/sw/siw/siw_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 65b5cda5457b..b3547253c099 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -178,6 +178,8 @@ static void siw_destroy_cpulist(void)
 		kfree(siw_cpu_info.tx_valid_cpus[i++]);
 
 	kfree(siw_cpu_info.tx_valid_cpus);
+	siw_cpu_info.tx_valid_cpus = NULL;
+	siw_cpu_info.num_nodes = 0;
 }
 
 /*

From patchwork Thu Jul 27 14:03:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guoqing Jiang
X-Patchwork-Id: 13330148
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2CBE7C0015E
	for ; Thu, 27 Jul 2023 14:10:42 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S231842AbjG0OKk (ORCPT );
	Thu, 27 Jul 2023 10:10:40 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48302
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S232774AbjG0OKj (ORCPT );
	Thu, 27 Jul 2023 10:10:39 -0400
X-Greylist: delayed 393 seconds by postgrey-1.37 at lindbergh.monkeyblade.net;
	Thu, 27 Jul 2023 07:10:38 PDT
Received: from out-101.mta0.migadu.com (out-101.mta0.migadu.com
	[IPv6:2001:41d0:1004:224b::65])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AECA30C0
	for ; Thu, 27 Jul 2023 07:10:38 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1690466647;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=1aYJKaIIBdZyN0JBze4ryf7XRpEcse3z5//WfNqFzEA=;
	b=skMkIeTeSx8lUz7q/t0JOhXZTy+c/WkxyZglyiWkhCSZz0tw/vealc+O4HiqEqyfveRhjf
	 5eKTkQZb1nCXdNi7kSQjFhgeh59GpvpmVn3VhhjDnEdDNfl+0hjYbO8jhZkEAlIZ2o70q7
	 DmoALl+wijeP5r3orD8qwzFgzpD8uW8=
From: Guoqing Jiang
To: bmt@zurich.ibm.com, jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: [PATCH 3/5] RDMA/siw: Initialize siw_link_ops.list
Date: Thu, 27 Jul 2023 22:03:47 +0800
Message-Id: <20230727140349.25369-4-guoqing.jiang@linux.dev>
In-Reply-To: <20230727140349.25369-1-guoqing.jiang@linux.dev>
References: <20230727140349.25369-1-guoqing.jiang@linux.dev>
MIME-Version: 1.0
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org

If the siw module cannot be inserted successfully and is then removed
from the kernel, siw_exit_module can trigger the trace below because
siw_link_ops.list is still NULL when rdma_link_register has not been
called. So we need to initialize the list earlier, in the definition of
siw_link_ops.

[ 45.306864] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 45.306869] #PF: supervisor write access in kernel mode
[ 45.306871] #PF: error_code(0x0002) - not-present page
[ 45.306872] PGD 0 P4D 0
[ 45.306874] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 45.306876] CPU: 1 PID: 1742 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
[ 45.306879] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 45.306880] RIP: 0010:rdma_link_unregister+0x27/0x60 [ib_core]
...
[ 45.306916] Call Trace:
[ 45.306917]
[ 45.306918] ? show_regs+0x72/0x90
[ 45.306922] ? __die+0x25/0x80
[ 45.306924] ? page_fault_oops+0x154/0x4d0
[ 45.306927] ? __this_cpu_preempt_check+0x13/0x20
[ 45.306929] ? lockdep_unlock+0x63/0xe0
[ 45.306933] ? do_user_addr_fault+0x381/0x8d0
[ 45.306934] ? rcu_is_watching+0x13/0x70
[ 45.306937] ? exc_page_fault+0x87/0x240
[ 45.306940] ? asm_exc_page_fault+0x27/0x30
[ 45.306944] ? rdma_link_unregister+0x27/0x60 [ib_core]
[ 45.306956] ? rdma_link_unregister+0x19/0x60 [ib_core]
[ 45.306967] siw_exit_module+0x87/0x590 [siw]
[ 45.306973] __do_sys_delete_module.constprop.0+0x18f/0x300
[ 45.306975] ? syscall_enter_from_user_mode+0x21/0x70
[ 45.306977] ? __this_cpu_preempt_check+0x13/0x20
[ 45.306978] ? lockdep_hardirqs_on+0x86/0x120
[ 45.306980] __x64_sys_delete_module+0x12/0x20
[ 45.306982] do_syscall_64+0x5c/0x90

Signed-off-by: Guoqing Jiang
---
 drivers/infiniband/sw/siw/siw_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index b3547253c099..6709ed0de3a4 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -526,6 +526,7 @@ static int siw_newlink(const char *basedev_name, struct net_device *netdev)
 }
 
 static struct rdma_link_ops siw_link_ops = {
+	.list = LIST_HEAD_INIT(siw_link_ops.list),
 	.type = "siw",
 	.newlink = siw_newlink,
 };

From patchwork Thu Jul 27 14:03:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guoqing Jiang
X-Patchwork-Id: 13330150
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 51F5BC04A6A
	for ; Thu, 27 Jul 2023 14:15:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S233235AbjG0OPT (ORCPT );
	Thu, 27 Jul 2023 10:15:19 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50212
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by
	vger.kernel.org with ESMTP id S233033AbjG0OPS (ORCPT );
	Thu, 27 Jul 2023 10:15:18 -0400
Received: from out-101.mta0.migadu.com (out-101.mta0.migadu.com
	[IPv6:2001:41d0:1004:224b::65])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C704C2685
	for ; Thu, 27 Jul 2023 07:15:17 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1690466649;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=hNyt4TpfRCu2QkmN2Quw5XUjlAzJWTllAAJ+BYuaXKM=;
	b=J8RPxjMx6vNlCIxfFPuKgyHmsk1SkcIxZ682lU7Z8O9aDYPTiRJ42S5R4kf5M55D5YG2gh
	 I58p9L4ghCBPW2LOPPuDHP7nhyHHKzpNOlzgPsOuhbRuBEbzhignkV67GgUZxX2Dahhuqy
	 nxFRpwjHCKLfcHicNahj1ApJStWN874=
From: Guoqing Jiang
To: bmt@zurich.ibm.com, jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: [PATCH 4/5] RDMA/siw: Set siw_crypto_shash to NULL after it is freed
Date: Thu, 27 Jul 2023 22:03:48 +0800
Message-Id: <20230727140349.25369-5-guoqing.jiang@linux.dev>
In-Reply-To: <20230727140349.25369-1-guoqing.jiang@linux.dev>
References: <20230727140349.25369-1-guoqing.jiang@linux.dev>
MIME-Version: 1.0
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org

If the siw module cannot be inserted successfully and is then removed
from the kernel, both siw_exit_module and the failure path in
siw_init_module call crypto_free_shash, and the call trace below
appears.

[ 72.349344] ------------[ cut here ]------------
[ 72.349348] refcount_t: underflow; use-after-free.
[ 72.349386] WARNING: CPU: 1 PID: 1737 at lib/refcount.c:28 refcount_warn_saturate+0xfb/0x150
...
[ 72.349469] RIP: 0010:refcount_warn_saturate+0xfb/0x150
...
[ 72.349487] Call Trace:
[ 72.349488]
[ 72.349490] ? show_regs+0x72/0x90
[ 72.349493] ? refcount_warn_saturate+0xfb/0x150
[ 72.349495] ? __warn+0x8d/0x1a0
[ 72.349498] ? refcount_warn_saturate+0xfb/0x150
[ 72.349500] ? report_bug+0x1f9/0x250
[ 72.349505] ? handle_bug+0x46/0x90
[ 72.349508] ? exc_invalid_op+0x19/0x80
[ 72.349511] ? asm_exc_invalid_op+0x1b/0x20
[ 72.349517] ? refcount_warn_saturate+0xfb/0x150
[ 72.349519] ? refcount_warn_saturate+0xfb/0x150
[ 72.349521] crypto_destroy_tfm+0x9b/0xe0
[ 72.349525] siw_exit_module+0xf6/0x590 [siw]

So we need to set siw_crypto_shash to NULL in the failure path of
siw_init_module.

Signed-off-by: Guoqing Jiang
---
 drivers/infiniband/sw/siw/siw_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 6709ed0de3a4..f8549d01887f 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -589,8 +589,10 @@ static __init int siw_init_module(void)
 			siw_tx_thread[nr_cpu] = NULL;
 		}
 	}
-	if (siw_crypto_shash)
+	if (siw_crypto_shash) {
 		crypto_free_shash(siw_crypto_shash);
+		siw_crypto_shash = NULL;
+	}
 
 	pr_info("SoftIWARP attach failed. Error: %d\n", rv);
 

From patchwork Thu Jul 27 14:03:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guoqing Jiang
X-Patchwork-Id: 13330151
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id AAF33C001E0
	for ; Thu, 27 Jul 2023 14:15:21 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232402AbjG0OPT (ORCPT );
	Thu, 27 Jul 2023 10:15:19 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50214
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S233234AbjG0OPS (ORCPT );
	Thu, 27 Jul 2023 10:15:18 -0400
Received: from out-100.mta0.migadu.com (out-100.mta0.migadu.com
	[91.218.175.100])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD2E42D5E
	for ; Thu, 27 Jul 2023 07:15:17 -0700 (PDT)
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1690466650;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=bRgBlOvgMtJkKPvZ+rZA/rDDK35v+k0caMBtftQig+8=;
	b=obcU1LVOyFROKIouJPPv/Hfx/g7xRbWF4X3W681fBfOPXvccqvl0AKD8EIPdFIEuHKBlAp
	 dK+hVU1Igrb0Naoe+kTo7khTMI6kJPfbvcmWYYjZzEJQsM9BZQCBk2BIRN+Y+rg9U/3iOB
	 Zy8RTjcVSmRouLzTurdN8EJ17BcixyM=
From: Guoqing Jiang
To: bmt@zurich.ibm.com, jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: [PATCH 5/5] RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
Date: Thu, 27 Jul 2023 22:03:49 +0800
Message-Id: <20230727140349.25369-6-guoqing.jiang@linux.dev>
In-Reply-To: <20230727140349.25369-1-guoqing.jiang@linux.dev>
References: <20230727140349.25369-1-guoqing.jiang@linux.dev>
MIME-Version: 1.0
X-Migadu-Flow: FLOW_OUT
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org

If the siw module cannot be inserted successfully and the kthread
(siw_run_sq) has not run, the wait_queue_head (tx_task->waiting) is not
initialized. When siw_stop_tx_thread is then called from
siw_init_module, the trace below appears.

kernel: BUG: spinlock bad magic on CPU#0, modprobe/2073
kernel: lock: 0xffff88babbd380e8, .magic: 00000000, .owner: /-1, .owner_cpu: 0
kernel: CPU: 0 PID: 2073 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
kernel: Call Trace:
kernel:
kernel: dump_stack_lvl+0x77/0xd0
kernel: dump_stack+0x10/0x20
kernel: spin_bug+0xa5/0xd0
kernel: do_raw_spin_lock+0x90/0xd0
kernel: _raw_spin_lock_irqsave+0x56/0x80
kernel: ? __wake_up_common_lock+0x63/0xd0
kernel: __wake_up_common_lock+0x63/0xd0
kernel: __wake_up+0x13/0x30
kernel: siw_stop_tx_thread+0x49/0x70 [siw]
kernel: siw_init_module+0x15b/0xff0 [siw]
kernel: ? __pfx_siw_init_module+0x10/0x10 [siw]
kernel: do_one_initcall+0x60/0x390
...
kernel:
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

To prevent the issue, add 'running' to tx_task_t, which is set to true
once siw_run_sq has started, and only wake up the waitqueue when it is
true.

Signed-off-by: Guoqing Jiang
---
 drivers/infiniband/sw/siw/siw_qp_tx.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 7c7a51d36d0c..70acc4cd553f 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -1204,14 +1204,18 @@ static void siw_sq_resume(struct siw_qp *qp)
 struct tx_task_t {
 	struct llist_head active;
 	wait_queue_head_t waiting;
+	bool running;
 };
 
 static DEFINE_PER_CPU(struct tx_task_t, siw_tx_task_g);
 
 void siw_stop_tx_thread(int nr_cpu)
 {
+	struct tx_task_t *tx_task = &per_cpu(siw_tx_task_g, nr_cpu);
+
 	kthread_stop(siw_tx_thread[nr_cpu]);
-	wake_up(&per_cpu(siw_tx_task_g, nr_cpu).waiting);
+	if (tx_task->running)
+		wake_up(&per_cpu(siw_tx_task_g, nr_cpu).waiting);
 }
 
 int siw_run_sq(void *data)
@@ -1223,6 +1227,7 @@ int siw_run_sq(void *data)
 
 	init_llist_head(&tx_task->active);
 	init_waitqueue_head(&tx_task->waiting);
+	tx_task->running = true;
 
 	while (1) {
 		struct llist_node *fifo_list = NULL;
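
A note on the series as a whole: patches 1, 2 and 4 apply the same idea,
which is to clear a pointer right after releasing it so that a second
teardown call becomes a no-op. Below is a minimal userspace sketch of
that pattern. It is not siw code: demo_res, demo_init, demo_exit and
demo_fail_then_unload are hypothetical names used only for illustration,
and calloc/free stand in for the kernel allocation and destroy calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a module-global resource such as siw_cm_wq. */
struct demo_res {
	int *buf;		/* owned allocation, NULL when not held */
	int destroy_calls;	/* number of times the buffer was really freed */
};

/* Acquire the resource; returns 0 on success, -1 on allocation failure. */
int demo_init(struct demo_res *r)
{
	r->buf = calloc(4, sizeof(*r->buf));
	return r->buf ? 0 : -1;
}

/* Idempotent teardown: safe to call from more than one path. */
void demo_exit(struct demo_res *r)
{
	if (r->buf) {
		free(r->buf);
		r->destroy_calls++;
		r->buf = NULL;	/* a later call now sees NULL and does nothing */
	}
}

/*
 * Mimics the scenario described in the series: teardown runs once from
 * the init failure path and once more on module removal.
 * Returns the number of real frees, or -1 if the allocation failed.
 */
int demo_fail_then_unload(void)
{
	struct demo_res r = { 0 };

	if (demo_init(&r))
		return -1;

	demo_exit(&r);	/* failure path of init */
	demo_exit(&r);	/* later module removal */

	assert(r.buf == NULL);
	return r.destroy_calls;
}
```

Calling demo_fail_then_unload() runs teardown from both paths while
releasing the buffer only once. Without the NULL assignment, the second
demo_exit() would pass the same stale pointer to free() again, which is
the userspace analogue of the double destroy and double free in the
traces above.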