From patchwork Mon Jun 19 12:50:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ido Schimmel
X-Patchwork-Id: 13284427
X-Patchwork-Delegate: kuba@kernel.org
From: Ido Schimmel
To:
CC: , , , , , , Ido Schimmel
Subject: [RFC PATCH net-next 1/2] devlink: Hold a reference on parent device
Date: Mon, 19 Jun 2023 15:50:14 +0300
Message-ID: <20230619125015.1541143-2-idosch@nvidia.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230619125015.1541143-1-idosch@nvidia.com>
References: <20230619125015.1541143-1-idosch@nvidia.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-State: RFC

Each devlink instance is associated with a parent device and a pointer
to this device is stored in the devlink structure, but devlink does not
hold a reference on this device.

This is going to be a problem in the next patch where - among other
things - devlink will acquire the device lock during netns dismantle,
before the reload operation. Since netns dismantle is performed
asynchronously and since a reference is not held on the parent device,
it will be possible to hit a use-after-free.

Prepare for the upcoming change by holding a reference on the parent
device.

Unfortunately, with this patch and this reproducer [1], the following
crash can be observed [2]. The last reference is released from the
device asynchronously - after an RCU grace period - when the netdevsim
module is no longer present. This causes device_release() to invoke a
release callback that is no longer present: nsim_bus_dev_release().

It's not clear to me if I'm doing something wrong in devlink (I don't
think so), if it's a bug in netdevsim or alternatively a bug in core
driver code that allows the bus module to go away before all the devices
that were connected to it are released.

The problem can be solved by devlink holding a reference on the backing
module (i.e., dev->driver->owner) or by each netdevsim device holding a
reference on the netdevsim module. However, this will prevent the
removal of the module when devices are present, something that is
possible today.
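
For illustration only (this is not part of the patch), the lifetime rule
described above can be modeled in user space. The sketch assumes a plain
integer counter in place of the kobject reference count, and every name
in it (toy_device, toy_devlink and the toy_* helpers) is hypothetical.
It only shows that a holder which takes its own reference at allocation
time keeps the parent alive until the holder's deferred release runs.

```c
/* Userspace sketch only: models get_device()/put_device() with a plain
 * counter. All names below (toy_device, toy_devlink, ...) are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_device {
	int refcount;
	bool released;	/* set instead of freeing, so callers can inspect it */
};

static void toy_get_device(struct toy_device *dev)
{
	dev->refcount++;
}

static void toy_put_device(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = true;	/* stands in for the release callback */
}

struct toy_devlink {
	struct toy_device *dev;
};

static struct toy_devlink *toy_devlink_alloc(struct toy_device *dev)
{
	struct toy_devlink *dl = malloc(sizeof(*dl));

	if (!dl)
		return NULL;
	toy_get_device(dev);	/* pin the parent for the instance lifetime */
	dl->dev = dev;
	return dl;
}

/* Models the deferred release of the instance: drop the parent reference
 * only when the instance itself goes away.
 */
static void toy_devlink_release(struct toy_devlink *dl)
{
	toy_put_device(dl->dev);
	free(dl);
}
```

In this model the parent is marked released only after both the original
owner and the toy devlink instance have dropped their references. The
sketch says nothing about module lifetime, which is the open question
raised above.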

[1]
#!/bin/bash

for i in $(seq 1 1000); do
        echo "$i"
        insmod drivers/net/netdevsim/netdevsim.ko
        echo "10 0" > /sys/bus/netdevsim/new_device
        rmmod netdevsim
done

[2]
BUG: unable to handle page fault for address: ffffffffc0490910
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 12e040067 P4D 12e040067 PUD 12e042067 PMD 100a38067 PTE 0
Oops: 0010 [#1] PREEMPT SMP
CPU: 0 PID: 138 Comm: kworker/0:2 Not tainted 6.4.0-rc5-custom-g42e05937ca59 #299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Workqueue: events devlink_release
RIP: 0010:0xffffffffc0490910
Code: Unable to access opcode bytes at 0xffffffffc04908e6.
RSP: 0018:ffffb487802f3e40 EFLAGS: 00010282
RAX: ffffffffc0490910 RBX: ffff92e6c0057800 RCX: 0001020304050608
RDX: 0000000000000001 RSI: ffffffff92b7d763 RDI: ffff92e6c0057800
RBP: ffff92e6c1ef0a00 R08: ffff92e6c0055158 R09: ffff92e6c2be9134
R10: 0000000000000018 R11: fefefefefefefeff R12: ffffffff934c3e80
R13: ffff92e6c2a1a740 R14: 0000000000000000 R15: ffff92e7f7c30b05
FS: 0000000000000000(0000) GS:ffff92e7f7c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc04908e6 CR3: 0000000101f1a004 CR4: 0000000000170ef0
Call Trace:
 ? __die+0x23/0x70
 ? page_fault_oops+0x181/0x470
 ? exc_page_fault+0xa6/0x140
 ? asm_exc_page_fault+0x26/0x30
 ? device_release+0x23/0x90
 ? device_release+0x34/0x90
 ? kobject_put+0x7d/0x1b0
 ? devlink_release+0x16/0x30
 ? process_one_work+0x1e0/0x3d0
 ? worker_thread+0x4e/0x3b0
 ? rescuer_thread+0x3a0/0x3a0
 ? kthread+0xe5/0x120
 ? kthread_complete_and_exit+0x20/0x20
 ? ret_from_fork+0x1f/0x30
Modules linked in: [last unloaded: netdevsim]

Signed-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
---
 net/devlink/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/devlink/core.c b/net/devlink/core.c
index c23ebabadc52..b191112f57af 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2016 Jiri Pirko
  */
 
+#include
 #include
 
 #include "devl_internal.h"
@@ -91,6 +92,7 @@ static void devlink_release(struct work_struct *work)
 
 	mutex_destroy(&devlink->lock);
 	lockdep_unregister_key(&devlink->lock_key);
+	put_device(devlink->dev);
 	kfree(devlink);
 }
 
@@ -204,6 +206,7 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	if (ret < 0)
 		goto err_xa_alloc;
 
+	get_device(dev);
 	devlink->dev = dev;
 	devlink->ops = ops;
 	xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);

From patchwork Mon Jun 19 12:50:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ido Schimmel
X-Patchwork-Id: 13284426
X-Patchwork-Delegate: kuba@kernel.org
From: Ido Schimmel
To:
CC: , , , , , , Ido Schimmel
Subject: [RFC PATCH net-next 2/2] devlink: Acquire device lock during reload
Date: Mon, 19 Jun 2023 15:50:15 +0300
Message-ID: <20230619125015.1541143-3-idosch@nvidia.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230619125015.1541143-1-idosch@nvidia.com>
References: <20230619125015.1541143-1-idosch@nvidia.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-State: RFC

Device drivers register with devlink from their probe routines (under
the device lock) by acquiring the devlink instance lock and calling
devl_register().

Drivers that support a devlink reload usually implement the
reload_{down, up}() operations in a similar fashion to their remove and
probe routines, respectively. However, while the remove and probe
routines are invoked with the device lock held, the reload operations
are only invoked with the devlink instance lock held. It is therefore
impossible for drivers to acquire the device lock from their reload
operations, as this would result in lock inversion.

The motivating use case for invoking the reload operations with the
device lock held is in mlxsw which needs to trigger a PCI reset as part
of the reload. The driver cannot call pci_reset_function() as this
function acquires the device lock. Instead, it needs to call
__pci_reset_function_locked which expects the device lock to be held.
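
For illustration only (this is not part of the patch), the ordering
constraint described above can be modeled in user space with two flags
standing in for the two locks. All names in the sketch are hypothetical.
It assumes a single thread and simply refuses to take the modeled device
lock once the modeled instance lock is held, which is the inversion a
reload operation would otherwise create.

```c
/* Userspace sketch only; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_locks {
	bool dev_held;		/* models the device lock */
	bool devl_held;		/* models the devlink instance lock */
};

/* Ordering rule: the device lock is always taken before the instance lock. */
static bool toy_device_lock(struct toy_locks *l)
{
	if (l->devl_held)
		return false;	/* would invert the order used by probe/remove */
	l->dev_held = true;
	return true;
}

static void toy_devl_lock(struct toy_locks *l)
{
	assert(!l->devl_held);	/* the modeled lock is not recursive */
	l->devl_held = true;
}

static void toy_unlock_all(struct toy_locks *l)
{
	l->devl_held = false;
	l->dev_held = false;
}

/* Models a helper that, like __pci_reset_function_locked(), requires the
 * caller to already hold the device lock.
 */
static bool toy_reset_locked(const struct toy_locks *l)
{
	return l->dev_held;
}
```

With only the instance lock held, toy_device_lock() fails and
toy_reset_locked() reports that its precondition is not met. Taking the
modeled device lock first satisfies both.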

To that end, adjust devlink to always acquire the device lock before the
devlink instance lock when performing a reload. Do that both when reload
is triggered explicitly by user space and when it is triggered as part
of netns dismantle.

Tested the following flows with netdevsim and mlxsw while lockdep is
enabled:

netdevsim:

 # echo "10 1" > /sys/bus/netdevsim/new_device
 # devlink dev reload netdevsim/netdevsim10
 # ip netns add bla
 # devlink dev reload netdevsim/netdevsim10 netns bla
 # ip netns del bla
 # echo 10 > /sys/bus/netdevsim/del_device

mlxsw:

 # devlink dev reload pci/0000:01:00.0
 # ip netns add bla
 # devlink dev reload pci/0000:01:00.0 netns bla
 # ip netns del bla
 # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
 # echo 1 > /sys/bus/pci/rescan

Reviewed-by: Jiri Pirko
Signed-off-by: Ido Schimmel
---
 net/devlink/core.c          |  4 ++--
 net/devlink/dev.c           |  8 ++++++++
 net/devlink/devl_internal.h | 19 ++++++++++++++++++-
 net/devlink/health.c        |  3 ++-
 net/devlink/leftover.c      |  4 +++-
 net/devlink/netlink.c       | 18 ++++++++++++------
 6 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/net/devlink/core.c b/net/devlink/core.c
index b191112f57af..a4b6d548e50c 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -279,14 +279,14 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net)
 	 * all devlink instances from this namespace into init_net.
 	 */
 	devlinks_xa_for_each_registered_get(net, index, devlink) {
-		devl_lock(devlink);
+		devl_dev_lock(devlink, true);
 		err = 0;
 		if (devl_is_registered(devlink))
 			err = devlink_reload(devlink, &init_net,
 					     DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
 					     DEVLINK_RELOAD_LIMIT_UNSPEC,
 					     &actions_performed, NULL);
-		devl_unlock(devlink);
+		devl_dev_unlock(devlink, true);
 		devlink_put(devlink);
 		if (err && err != -EOPNOTSUPP)
 			pr_warn("Failed to reload devlink instance into init_net\n");
diff --git a/net/devlink/dev.c b/net/devlink/dev.c
index bf1d6f1bcfc7..daee2039fb58 100644
--- a/net/devlink/dev.c
+++ b/net/devlink/dev.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2016 Jiri Pirko
  */
 
+#include
 #include
 #include
 #include "devl_internal.h"
@@ -356,6 +357,13 @@ int devlink_reload(struct devlink *devlink, struct net *dest_net,
 	struct net *curr_net;
 	int err;
 
+	/* Make sure the reload operations are invoked with the device lock
+	 * held to allow drivers to trigger functionality that expects it
+	 * (e.g., PCI reset) and to close possible races between these
+	 * operations and probe/remove.
+	 */
+	device_lock_assert(devlink->dev);
+
 	memcpy(remote_reload_stats, devlink->stats.remote_reload_stats,
 	       sizeof(remote_reload_stats));
 
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index 62921b2eb0d3..99c3efbae718 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -3,6 +3,7 @@
  * Copyright (c) 2016 Jiri Pirko
  */
 
+#include
 #include
 #include
 #include
@@ -87,12 +88,27 @@ static inline bool devl_is_registered(struct devlink *devlink)
 	return xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
 }
 
+static inline void devl_dev_lock(struct devlink *devlink, bool dev_lock)
+{
+	if (dev_lock)
+		device_lock(devlink->dev);
+	devl_lock(devlink);
+}
+
+static inline void devl_dev_unlock(struct devlink *devlink, bool dev_lock)
+{
+	devl_unlock(devlink);
+	if (dev_lock)
+		device_unlock(devlink->dev);
+}
+
 /* Netlink */
 #define DEVLINK_NL_FLAG_NEED_PORT		BIT(0)
 #define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT	BIT(1)
 #define DEVLINK_NL_FLAG_NEED_RATE		BIT(2)
 #define DEVLINK_NL_FLAG_NEED_RATE_NODE		BIT(3)
 #define DEVLINK_NL_FLAG_NEED_LINECARD		BIT(4)
+#define DEVLINK_NL_FLAG_NEED_DEV_LOCK		BIT(5)
 
 enum devlink_multicast_groups {
 	DEVLINK_MCGRP_CONFIG,
@@ -122,7 +138,8 @@ struct devlink_cmd {
 extern const struct genl_small_ops devlink_nl_ops[56];
 
 struct devlink *
-devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs);
+devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs,
+			    bool dev_lock);
 
 void devlink_notify_unregister(struct devlink *devlink);
 void devlink_notify_register(struct devlink *devlink);
diff --git a/net/devlink/health.c b/net/devlink/health.c
index 194340a8bb86..fa8ccdcffb7a 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -1253,7 +1253,8 @@ devlink_health_reporter_get_from_cb(struct netlink_callback *cb)
 	struct nlattr **attrs = info->attrs;
 	struct devlink *devlink;
 
-	devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs);
+	devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs,
+					      false);
 	if (IS_ERR(devlink))
 		return NULL;
 	devl_unlock(devlink);
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index 1f00f874471f..f4e6030e3b56 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -5185,7 +5185,8 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 
 	start_offset = state->start_offset;
 
-	devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs);
+	devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs,
+					      false);
 	if (IS_ERR(devlink))
 		return PTR_ERR(devlink);
 
@@ -6478,6 +6479,7 @@ const struct genl_small_ops devlink_nl_ops[56] = {
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
 		.doit = devlink_nl_cmd_reload,
 		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEV_LOCK,
 	},
 	{
 		.cmd = DEVLINK_CMD_PARAM_GET,
diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c
index 7a332eb70f70..95fd8e3befea 100644
--- a/net/devlink/netlink.c
+++ b/net/devlink/netlink.c
@@ -83,7 +83,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 };
 
 struct devlink *
-devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs)
+devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs,
+			    bool dev_lock)
 {
 	struct devlink *devlink;
 	unsigned long index;
@@ -97,12 +98,12 @@ devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs)
 	devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]);
 
 	devlinks_xa_for_each_registered_get(net, index, devlink) {
-		devl_lock(devlink);
+		devl_dev_lock(devlink, dev_lock);
 		if (devl_is_registered(devlink) &&
 		    strcmp(devlink->dev->bus->name, busname) == 0 &&
 		    strcmp(dev_name(devlink->dev), devname) == 0)
 			return devlink;
-		devl_unlock(devlink);
+		devl_dev_unlock(devlink, dev_lock);
 		devlink_put(devlink);
 	}
 
@@ -115,9 +116,12 @@ static int devlink_nl_pre_doit(const struct genl_split_ops *ops,
 	struct devlink_linecard *linecard;
 	struct devlink_port *devlink_port;
 	struct devlink *devlink;
+	bool dev_lock;
 	int err;
 
-	devlink = devlink_get_from_attrs_lock(genl_info_net(info), info->attrs);
+	dev_lock = !!(ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEV_LOCK);
+	devlink = devlink_get_from_attrs_lock(genl_info_net(info), info->attrs,
+					      dev_lock);
 	if (IS_ERR(devlink))
 		return PTR_ERR(devlink);
 
@@ -162,7 +166,7 @@ static int devlink_nl_pre_doit(const struct genl_split_ops *ops,
 	return 0;
 
 unlock:
-	devl_unlock(devlink);
+	devl_dev_unlock(devlink, dev_lock);
 	devlink_put(devlink);
 	return err;
 }
@@ -171,9 +175,11 @@ static void devlink_nl_post_doit(const struct genl_split_ops *ops,
 				 struct sk_buff *skb, struct genl_info *info)
 {
 	struct devlink *devlink;
+	bool dev_lock;
 
+	dev_lock = !!(ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEV_LOCK);
 	devlink = info->user_ptr[0];
-	devl_unlock(devlink);
+	devl_dev_unlock(devlink, dev_lock);
 	devlink_put(devlink);
 }