From patchwork Tue Apr 11 15:11:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Niklas Schnelle X-Patchwork-Id: 13207739 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57408C76196 for ; Tue, 11 Apr 2023 15:12:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230412AbjDKPMM (ORCPT ); Tue, 11 Apr 2023 11:12:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230363AbjDKPLw (ORCPT ); Tue, 11 Apr 2023 11:11:52 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B9E55BB0 for ; Tue, 11 Apr 2023 08:11:35 -0700 (PDT) Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33BEifkQ018691; Tue, 11 Apr 2023 15:11:19 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : subject : from : to : cc : date : content-type : mime-version; s=pp1; bh=MZIuF1ZdwWLML83NcTlwp7ajykIeJakmPFKTVxj72Pc=; b=HafzvLAhO5KHgLBad2/lys6B4u6kN8wO8L/OyAbTRV4/c0QogdQL8Wup5ugxXefzyhPm SwCqMAZ4edTwki/bBia79uK03XKZ/Yb7IAwvNwG108TJYNz3p2NzoexS2V5c0RZLoeou Pfta38V/KyvniHJl2ZIqTTC3U/VdzWoiVYs8U2G9suvyleEr51mYFYxaOP++UVGknxBD 6WdUBqXE4JcM7dD0s9f70PVRVQntvTKwVLlKx20DB0MbHaLJI0JYuI8LJYGqGSkF77dy +QW/pxDKco1Oc5cObvji0moCYpMemGBXix+GiyDRldaxbTSXPJcdYY1uFQc16j95HNrE 2w== Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3pw9juh8yd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 11 Apr 2023 15:11:19 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 33AGIfXI032371; Tue, 11 Apr 2023 15:11:17 GMT Received: from smtprelay06.fra02v.mail.ibm.com ([9.218.2.230]) by ppma04ams.nl.ibm.com (PPS) with ESMTPS id 3pu0m19t1k-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 11 Apr 2023 15:11:17 +0000 Received: from smtpav07.fra02v.mail.ibm.com (smtpav07.fra02v.mail.ibm.com [10.20.54.106]) by smtprelay06.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 33BFBCfP31130106 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 11 Apr 2023 15:11:12 GMT Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A6F572004D; Tue, 11 Apr 2023 15:11:12 +0000 (GMT) Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 20FB72004B; Tue, 11 Apr 2023 15:11:12 +0000 (GMT) Received: from [9.171.53.122] (unknown [9.171.53.122]) by smtpav07.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 11 Apr 2023 15:11:12 +0000 (GMT) Message-ID: <90e1efad457f40c1f9f7b8cb56852072d8ea00fd.camel@linux.ibm.com> Subject: Kernel crash after FLR reset of a ConnectX-5 PF in switchdev mode From: Niklas Schnelle To: Saeed Mahameed , Leon Romanovsky Cc: Gerd Bayer , "alexander.sschmidt" , Alexandra Winter , netdev@vger.kernel.org Date: Tue, 11 Apr 2023 17:11:11 +0200 User-Agent: Evolution 3.46.4 (3.46.4-1.fc37) MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: W2poOy14EOV2-gv1StO9HX1DzIMyx5cj X-Proofpoint-GUID: W2poOy14EOV2-gv1StO9HX1DzIMyx5cj X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-11_10,2023-04-11_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 impostorscore=0 adultscore=0 spamscore=0 clxscore=1015 priorityscore=1501 mlxscore=0 bulkscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304110133 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Hi Saeed, Hi Leon, While testing PCI recovery with a ConnectX-5 card (MT28800, fw 16.35.1012) and vanilla 6.3-rc4/5/6 on s390 I've run into a kernel crash (stacktrace attached) when the card is in switchdev mode. No crash occurs and the recovery succeeds in legacy mode (with VFs). I found that the same crash occurs also with a simple Function Level Reset instead of the s390 specific PCI recovery, see instructions below. From the looks of it I think this could affect non-s390 too but I don't have a proper x86 test system with a ConnectX card to test with. Anyway, I tried to analyze further but got stuck after figuring out that in mlx5e_remove() deep down from mlx5_fw_fatal_reporter_err_work() (see trace) the mlx5e_dev->priv pointer is valid but the pointed to struct only contains zeros as it was previously zeroed by mlx5_mdev_uninit() which then leads to a NULL pointer access. The crash itself can be prevented by the following debug patch though clearly this is not a proper fix: unregister_netdev(priv->netdev); With that I then tried to track down why mlx5_mdev_uninit() is called and this might actually be s390 specific in that this happens during the removal of the VF which on s390 causes extra hot unplug events for the VFs (our virtualized PCI hotplug is per-PCI function) resulting in the following call trace: ... zpci_bus_remove_device() zpci_iov_remove_virtfn() pci_iov_remove_virtfn() pci_stop_and_remove_bus_device() pci_stop_bus_device() device_release_driver_internal() pci_device_remove() remove_one() mlx5_mdev_uninit() Then again I would expect that on other architectures VFs become at leastunresponsive during a FLR of the PF not sure if that also lead to calls to remove_one() though. As another data point I tried the same on the default Ubuntu 22.04 generic 5.15 kernel and there no crash occurs so this might be a newer issue. Also, I did test with and without the patch I sent recently for skipping the wait in mlx5_health_wait_pci_up() but that made no difference. Any hints on how to debug this further and could you try to see if this occurs on other architectures as well? Thanks, Niklas Reproduced with (0004:00:00.0 being the first PF of a ConnectX-5): $ devlink dev eswitch set pci/0004:00:00.0 mode switchdev $ devlink dev eswitch set pci/0004:00:00.1 mode switchdev The next 2 lines are needed on s390 due to an unrelated issue with smsf mode. $ devlink dev param set pci/0004:00:00.0 name flow_steering_mode value dmfs cmode runtime $ devlink dev param set pci/0004:00:00.1 name flow_steering_mode value dmfs cmode runtime $ echo 1 > /sys/bus/pci/devices/0004:00:00.0/sriov_numvfs # Check the reset method is FLR though others might also cause this $ cat /sys/bus/pci/devices/0004:00:00.0/reset_method flr Then to trigger the crash (after a bit of recovery, it may be racy though it hits pretty consistently for me) $ echo 1 > /sys/bus/pci/devices/0004:00:00.0/reset [ 1346.987414] mlx5_core 0003:00:00.0 ens8704f0np0: Link up [ 1347.006845] IPv6: ADDRCONF(NETDEV_CHANGE): ens8704f0np0: link becomes ready [ 1356.518669] mlx5_core 0004:00:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), active vports(0) [ 1359.192049] mlx5_core 0004:00:00.0 ens8832f0np0: Dropping C-tag vlan stripping offload due to S-tag vlan [ 1359.192152] mlx5_core 0004:00:00.0 ens8832f0np0: Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode [ 1359.365370] mlx5_core 0004:00:00.0: E-Switch: Enable: mode(OFFLOADS), nvfs(0), active vports(1) [ 1359.365952] devlink (1586) used greatest stack depth: 7552 bytes left [ 1359.448638] mlx5_core 0004:00:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), active vports(0) [ 1361.256671] debugfs: Directory 'mlx5_dr_chunks' with parent 'slab' already present! [ 1361.257295] debugfs: Directory 'mlx5_dr_htbls' with parent 'slab' already present! [ 1362.063433] mlx5_core 0004:00:00.1 ens8896f1np1: Dropping C-tag vlan stripping offload due to S-tag vlan [ 1362.063560] mlx5_core 0004:00:00.1 ens8896f1np1: Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode [ 1362.240899] mlx5_core 0004:00:00.1: E-Switch: Enable: mode(OFFLOADS), nvfs(0), active vports(1) [ 1362.241527] devlink (1595) used greatest stack depth: 7520 bytes left [ 1362.327301] mlx5_core 0004:00:00.0 ens8832f0np0: Link up [ 1362.345353] IPv6: ADDRCONF(NETDEV_CHANGE): ens8832f0np0: link becomes ready [ 1362.346169] ip (1603) used greatest stack depth: 7368 bytes left [ 1362.437044] pci 0004:00:00.2: [15b3:101a] type 00 class 0x020000 [ 1362.437277] pci 0004:00:00.2: reg 0x10: [mem 0xffffc00002000000-0xffffc000021fffff 64bit pref] [ 1362.437423] pci 0004:00:00.2: enabling Extended Tags [ 1362.438619] pci 0004:00:00.2: Adding to iommu group 30 [ 1362.440640] mlx5_core 0004:00:00.2: enabling device (0000 -> 0002) [ 1362.440810] mlx5_core 0004:00:00.2: firmware version: 16.35.1012 [ 1362.675795] mlx5_core 0004:00:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps [ 1362.676711] debugfs: Directory 'mlx5_fs_fgs' with parent 'slab' already present! [ 1362.677278] debugfs: Directory 'mlx5_fs_ftes' with parent 'slab' already present! [ 1362.696847] mlx5_core 0004:00:00.2: Assigned random MAC address 02:0e:8a:c7:93:9a [ 1362.830432] mlx5_core 0004:00:00.2: MLX5E: StrdRq(1) RqSz(16) StrdSz(4096) RxCqeCmprss(0 basic) [ 1362.835203] mlx5_core 0004:00:00.2 ens8832f0v0: renamed from eth1 [ 1372.768645] mlx5_core 0004:00:00.2: poll_health:818:(pid 0): Fatal error 1 detected [ 1372.768791] mlx5_core 0004:00:00.2: print_health_info:423:(pid 0): PCI slot is unavailable [ 1373.168544] mlx5_core 0004:00:00.0: poll_health:818:(pid 0): Fatal error 3 detected [ 1375.434387] mlx5_core 0004:00:00.0: mlx5_health_try_recover:335:(pid 1505): handling bad device here [ 1375.434502] mlx5_core 0004:00:00.0: mlx5_handle_bad_state:292:(pid 1505): starting teardown [ 1375.434545] mlx5_core 0004:00:00.0: mlx5_error_sw_reset:243:(pid 1505): start [ 1375.434586] mlx5_core 0004:00:00.0: mlx5_error_sw_reset:276:(pid 1505): end [ 1375.771395] mlx5_core 0004:00:00.0 eth0 (unregistering): vport 1 error -67 reading stats [ 1376.151345] mlx5_core 0004:00:00.0: mlx5e_init_nic_tx:5376:(pid 1505): create tises failed, -67 [ 1376.238808] mlx5_core 0004:00:00.0 ens8832f0np0: mlx5e_netdev_change_profile: new profile init failed, -67 [ 1376.243746] mlx5_core 0004:00:00.0: mlx5e_init_rep_tx:1101:(pid 1505): create tises failed, -67 [ 1376.328623] mlx5_core 0004:00:00.0 ens8832f0np0: mlx5e_netdev_change_profile: failed to rollback to orig profile, -67 [ 1376.329159] ------------[ cut here ]------------ [ 1376.329161] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 1376.329171] WARNING: CPU: 6 PID: 1505 at kernel/locking/mutex.c:582 __mutex_lock+0xbc6/0xfa8 [ 1376.329182] Modules linked in: kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink uvdevice s390_trng sunrpc mlx5_ib ib_uverbs ib_core ism eadm_sch vfio_ccw mdev v fio_iommu_type1 vfio sch_fq_codel loop configfs ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme mlx5_core nvme_core pkey zcrypt rng_core autofs4 [ 1376.329230] CPU: 6 PID: 1505 Comm: kworker/u800:3 Not tainted 6.3.0-rc4 #49 [ 1376.329233] Hardware name: IBM 3931 A01 704 (LPAR) [ 1376.329235] Workqueue: mlx5_health0004:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] [ 1376.329345] Krnl PSW : 0704d00180000000 00000000ee827992 (__mutex_lock+0xbca/0xfa8) [ 1376.329351] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [ 1376.329354] Krnl GPRS: c0000000fffeffff 0000000080000001 0000000000000028 00000000eec90198 [ 1376.329357] 0000038004af36b8 0000038004af36b0 000003ff7fbcdfd8 0000000000000080 [ 1376.329359] 00000000f03839a8 0000000000000000 0000000000000000 0000000000059fe8 [ 1376.329361] 000000009fa58100 0000000000000000 00000000ee82798e 0000038004af3848 [ 1376.329368] Krnl Code: 00000000ee827982: c020001ef4a1 larl %r2,00000000eec062c4 [ 1376.329368] 00000000ee827988: c0e5ff90c318 brasl %r14,00000000eda3ffb8 [ 1376.329368] #00000000ee82798e: af000000 mc 0,0 [ 1376.329368] >00000000ee827992: a7f4fa4d brc 15,00000000ee826e2c [ 1376.329368] 00000000ee827996: af000000 mc 0,0 [ 1376.329368] 00000000ee82799a: a7f4faf4 brc 15,00000000ee826f82 [ 1376.329368] 00000000ee82799e: e31003400004 lg %r1,832 [ 1376.329368] 00000000ee8279a4: e32010000004 lg %r2,0(%r1) [ 1376.329405] Call Trace: [ 1376.329406] [<00000000ee827992>] __mutex_lock+0xbca/0xfa8 [ 1376.329409] ([<00000000ee82798e>] __mutex_lock+0xbc6/0xfa8) [ 1376.329412] [<00000000ee827da2>] mutex_lock_nested+0x32/0x40 [ 1376.329416] [<000003ff7fbcdfd8>] mlx5_core_uplink_netdev_set+0x38/0x60 [mlx5_core] [ 1376.329469] [<000003ff7fc1100c>] mlx5e_remove+0x3c/0xa0 [mlx5_core] [ 1376.329521] [<00000000ee42fa7c>] device_release_driver_internal+0x1c4/0x270 [ 1376.329526] [<00000000ee42d788>] bus_remove_device+0x100/0x188 [ 1376.329528] [<00000000ee42724e>] device_del+0x186/0x3b8 [ 1376.329533] [<000003ff7fbf6134>] mlx5_detach_device+0xac/0x110 [mlx5_core] [ 1376.329583] [<000003ff7fbcf812>] mlx5_unload_one_devl_locked+0x52/0xc8 [mlx5_core] [ 1376.329635] [<000003ff7fc777e8>] mlx5_health_try_recover+0x168/0x240 [mlx5_core] [ 1376.329687] [<00000000ee72b9d2>] devlink_health_reporter_recover+0x3a/0x98 [ 1376.329690] [<00000000ee72bb3c>] devlink_health_report+0x10c/0x300 [ 1376.329693] [<000003ff7fbdd0ac>] mlx5_fw_fatal_reporter_err_work+0xc4/0x1d8 [mlx5_core] [ 1376.329744] [<00000000eda6b934>] process_one_work+0x30c/0x688 [ 1376.329749] [<00000000eda6bd12>] worker_thread+0x62/0x438 [ 1376.329752] [<00000000eda77400>] kthread+0x138/0x150 [ 1376.329756] [<00000000ed9eb894>] __ret_from_fork+0x3c/0x58 [ 1376.329759] [<00000000ee8309da>] ret_from_fork+0xa/0x40 [ 1376.329762] INFO: lockdep is turned off. [ 1376.329764] Last Breaking-Event-Address: [ 1376.329765] [<00000000eda400e4>] __warn_printk+0x12c/0x138 [ 1376.329770] irq event stamp: 987413 [ 1376.329772] hardirqs last enabled at (987413): [<00000000ee82fab6>] _raw_spin_unlock_irqrestore+0x76/0xc0 [ 1376.329774] hardirqs last disabled at (987412): [<00000000ee82f6b6>] _raw_spin_lock_irqsave+0x9e/0xd8 [ 1376.329779] softirqs last enabled at (986754): [<00000000ee4ec40e>] netif_set_real_num_tx_queues+0x116/0x270 [ 1376.329784] softirqs last disabled at (986752): [<00000000ee4ec3fa>] netif_set_real_num_tx_queues+0x102/0x270 [ 1376.329787] ---[ end trace 0000000000000000 ]--- [ 1376.329792] Unable to handle kernel pointer dereference in virtual kernel address space [ 1376.329794] Failing address: a7190001eb112000 TEID: a7190001eb112803 [ 1376.329796] Fault in home space mode while using kernel ASCE. [ 1376.329801] AS:00000000ef868007 R3:0000000000000024 [ 1376.329831] Oops: 0038 ilc:2 [#1] PREEMPT SMP [ 1376.329834] Modules linked in: kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink uvdevice s390_trng sunrpc mlx5_ib ib_uverbs ib_core ism eadm_sch vfio_ccw mdev v fio_iommu_type1 vfio sch_fq_codel loop configfs ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme mlx5_core nvme_core pkey zcrypt rng_core autofs4 [ 1376.329876] CPU: 6 PID: 1505 Comm: kworker/u800:3 Tainted: G W 6.3.0-rc4 #49 [ 1376.329878] Hardware name: IBM 3931 A01 704 (LPAR) [ 1376.329880] Workqueue: mlx5_health0004:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] [ 1376.329933] Krnl PSW : 0704d00180000000 00000000ee8277c4 (__mutex_lock+0x9fc/0xfa8) [ 1376.329938] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [ 1376.329941] Krnl GPRS: c0000000fffeffff 000000009fa58100 a7190001eb112000 a7190001eb112000 [ 1376.329944] 0000000000000000 0000000000000000 00000000eee2cd60 a7190001eb112000 [ 1376.329946] 00000000f03839a8 00000000eee2ca48 000000000005a058 0000000000059fe8 [ 1376.329949] 000000009fa58100 000000009fa58100 00000000ee827396 0000038004af3848 [ 1376.329953] Krnl Code: 00000000ee8277b6: c0e5ffffe97d brasl %r14,00000000ee824ab0 [ 1376.329953] 00000000ee8277bc: a7f4fc36 brc 15,00000000ee827028 [ 1376.329953] #00000000ee8277c0: a527fff8 nill %r2,65528 [ 1376.329953] >00000000ee8277c4: 58302034 l %r3,52(%r2) [ 1376.329953] 00000000ee8277c8: ec38fc66007e cij %r3,0,8,00000000ee827094 [ 1376.329953] 00000000ee8277ce: 58202010 l %r2,16(%r2) [ 1376.329953] 00000000ee8277d2: b9140022 lgfr %r2,%r2 [ 1376.329953] 00000000ee8277d6: c0e5ff8ecdfd brasl %r14,00000000eda013d0 [ 1376.329969] Call Trace: [ 1376.329971] [<00000000ee8277c4>] __mutex_lock+0x9fc/0xfa8 [ 1376.329974] ([<00000000ee827396>] __mutex_lock+0x5ce/0xfa8) [ 1376.329977] [<00000000ee827da2>] mutex_lock_nested+0x32/0x40 [ 1376.329980] [<000003ff7fbcdfd8>] mlx5_core_uplink_netdev_set+0x38/0x60 [mlx5_core] [ 1376.330032] [<000003ff7fc1100c>] mlx5e_remove+0x3c/0xa0 [mlx5_core] [ 1376.330085] [<00000000ee42fa7c>] device_release_driver_internal+0x1c4/0x270 [ 1376.330088] [<00000000ee42d788>] bus_remove_device+0x100/0x188 [ 1376.330090] [<00000000ee42724e>] device_del+0x186/0x3b8 [ 1376.330093] [<000003ff7fbf6134>] mlx5_detach_device+0xac/0x110 [mlx5_core] [ 1376.330143] [<000003ff7fbcf812>] mlx5_unload_one_devl_locked+0x52/0xc8 [mlx5_core] [ 1376.330195] [<000003ff7fc777e8>] mlx5_health_try_recover+0x168/0x240 [mlx5_core] [ 1376.330246] [<00000000ee72b9d2>] devlink_health_reporter_recover+0x3a/0x98 [ 1376.330249] [<00000000ee72bb3c>] devlink_health_report+0x10c/0x300 [ 1376.330251] [<000003ff7fbdd0ac>] mlx5_fw_fatal_reporter_err_work+0xc4/0x1d8 [mlx5_core] [ 1376.330303] [<00000000eda6b934>] process_one_work+0x30c/0x688 [ 1376.330305] [<00000000eda6bd12>] worker_thread+0x62/0x438 [ 1376.330308] [<00000000eda77400>] kthread+0x138/0x150 [ 1376.330311] [<00000000ed9eb894>] __ret_from_fork+0x3c/0x58 [ 1376.330314] [<00000000ee8309da>] ret_from_fork+0xa/0x40 [ 1376.330316] INFO: lockdep is turned off. [ 1376.330317] Last Breaking-Event-Address: [ 1376.330319] [<00000000ee826f9c>] __mutex_lock+0x1d4/0xfa8 [ 1376.330323] Kernel panic - not syncing: Fatal exception: panic_on_oops --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -6012,6 +6012,10 @@ static void mlx5e_remove(struct auxiliary_device *adev) struct mlx5e_priv *priv = mlx5e_dev->priv; pm_message_t state = {}; + if (!priv->mdev) { + pr_err("%s with zeroed mlx5e_dev->priv\n", __func__); + return; + } mlx5_core_uplink_netdev_set(priv->mdev, NULL); mlx5e_dcbnl_delete_app(priv);