From patchwork Tue Dec 5 17:25:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlad Buslov X-Patchwork-Id: 13480512 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="eYAVSaWt" Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2041.outbound.protection.outlook.com [40.107.236.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5116B1BE for ; Tue, 5 Dec 2023 09:26:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NxTl6hKRf1bcvFqzkc+w4sCnn5z1uyCR+OfzqYVlKCRIg5i7+BWTEwY4uSqyvmN9JGvabEvzxrXjkJabPRhhfMVztIw8N1z2IqpEjTzs2g+9iRJvC3X+DSVTBraOJ3wJDa0UIH6PYyU2BiAF8dKx7oR9M4V6zN8MWP03tJ6A9yqxWPrtYBqareIwMtl2UxKN5EAz3nAn8m/kYJtOp7b3XDYt8o+UH53VjNCKXfUh42TlgGbfNVqQf5pBlYAklbj6cJd8Nh+Spz4t13LwhwrU8QHtb/AEJwuRo9XAweiSfmieHJpMxyyTVDQ8gdvauqQ32CnvUtB8DKJPAS43vRZv1Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=2fXl6KXzn5mMCBs8r1tyeCtD5+KUbGb6TsllZUkVitQ=; b=gwr99OQT2jYx0pbBt3kACnjYstnF4aVWQuJBr1KJYWSnC5YX8wqcwQzvjR3FTEZgyyXFvFsk4wCujJhJLeJZilK/i/Y6TdRkGZUIuIlpgbNCtcssGNMEDhDc1gJbx7Iw1DPvH+gyd1yVzAXQMQTb9F2txCO1Xra1eN0gFXSHSd89Yo/ahLCX0CeU8JV5FaogdfmlmQ6LOUgGo6gwqGl2jEAUWHL8zE+OPTxTuPTcIwi98mNxEx5L0AOICN/ZPBChDQW98CwKDf4mMblglzUDEDoMTGDESvfjqevqYQDqhaLFLL3DNyX9FDj2fSYOAPb11Vsjh+eq1t3W4nyMb4oPCw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.160) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2fXl6KXzn5mMCBs8r1tyeCtD5+KUbGb6TsllZUkVitQ=; b=eYAVSaWtlXFLXOkrjLxsNm5k9LOB8hdLz2tihmTk4+jUHXUKMCiVTb10LxZaHoMW9CE+cwPRnFo+NRLx99hvzjcbhq0NPvSWjvGaUZ1JLWaNT2ukHzOlbXL20UJ6xRymha689PFnPCJJgYjtlFOEJju3OkaKJG3zRtSoOF9WwnkgAd5fzFZ9vbt9qAFJXa2PYU0UKSrXdSAQcrn/ucZtJwi7IKAmMjAajIg1jc93vCobaO3Q6m3j9gjCavjD7/5DT2E/jfnZgIs7xR1p/7oTak8vYnwSd+816iwWn1KApIurqIqIJtPY5xgnSt+GX/VlsO3sq8qT4n3i0VSmlZOdYg== Received: from SJ0PR03CA0225.namprd03.prod.outlook.com (2603:10b6:a03:39f::20) by LV2PR12MB5798.namprd12.prod.outlook.com (2603:10b6:408:17a::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7046.33; Tue, 5 Dec 2023 17:26:35 +0000 Received: from MWH0EPF000971E6.namprd02.prod.outlook.com (2603:10b6:a03:39f:cafe::93) by SJ0PR03CA0225.outlook.office365.com (2603:10b6:a03:39f::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7046.34 via Frontend Transport; Tue, 5 Dec 2023 17:26:34 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.160) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.160) by MWH0EPF000971E6.mail.protection.outlook.com (10.167.243.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7068.20 via Frontend Transport; Tue, 5 Dec 2023 17:26:34 +0000 Received: from rnnvmail203.nvidia.com (10.129.68.9) by mail.nvidia.com (10.129.200.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.41; Tue, 5 Dec 2023 09:26:16 -0800 Received: from rnnvmail205.nvidia.com (10.129.68.10) by rnnvmail203.nvidia.com (10.129.68.9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.41; Tue, 5 Dec 2023 09:26:16 -0800 Received: from vdi.nvidia.com (10.127.8.14) by mail.nvidia.com (10.129.68.10) with Microsoft SMTP Server id 15.2.986.41 via Frontend Transport; Tue, 5 Dec 2023 09:26:12 -0800 From: Vlad Buslov To: , , , CC: , , , , , , , , "Vlad Buslov" , Paul Blakey Subject: [PATCH net] net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table Date: Tue, 5 Dec 2023 18:25:54 +0100 Message-ID: <20231205172554.3570602-1-vladbu@nvidia.com> X-Mailer: git-send-email 2.39.2 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-NV-OnPremToCloud: ExternallySecured X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: MWH0EPF000971E6:EE_|LV2PR12MB5798:EE_ X-MS-Office365-Filtering-Correlation-Id: fded20ca-1036-4797-9a6c-08dbf5b75423 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: SQCVFqaSBs7SVHCXn4+MLhVg7tyhqP4xfDFovhvJX581wQ0lLOtR7G3Kx55lLYdVpW1JwPQrxmNmmvgMm4U0CO5hWNnxPfkxQY8NLOSSsce0HNpclQW1cQNoiC64HKB+PU8pJdMN8fgzo1JwzbynsbStC+KobD3HgGnmDrYNazj46hfVnGYg/2ES9FqpuY2PUircZq9lZY8LTI50zTB9+wY5kxU58PGgSoCeWdSZV2T2a94qY3YEMwX3T3N8WuR4DjOdBuwexYkrVPrp+thEDJSKwEaZIYymozNC4iCNlPxIbzAcTbyWJXUzopuFLTNfCj2oLpPicoGi4lJTMCrh2xnB8G7z1ornU4IpVQVBwyRh657+E2iBQxrhLuQZJ8rdA/W8pSusY8eIRzF0ON1EwxranFCunaS2HQUwe89eeGDtpW7e/go/tulrDB7fWDdtNgfCjfYpIH52FfCyLcAnXEN7VcLPUS3GnXJH4sB5tqicGpV9yYaoS/bJyvM2EgSpXaSJU6COTQFn4QCFliS9ylmb6k7t3UqGi/jPNbt3Sajl+YzW0wIik/L/ip9ZiIX7gFEMO3jE7fV2xf6qNJzF4rRK2gnSeRyd9x0W7HVQxbIaE6l52uknPj5417giQ0nQqyOg6fvBMNLruPd4LQJN8zn2q3wSFN7oCfjHuRyB8asH+UuRTgDz2+Iaki7X8d+gobys0zwpPSri+3qL/XgKZoCCEz/IVeXFpXdYwfpKnYsQlSS7Rp1DYgoOPoS3q+zoRIB6+cP5t+LY1LfotU4TEVJIc/B4+4QU6socYBBlsJI= X-Forefront-Antispam-Report: CIP:216.228.117.160;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge1.nvidia.com;CAT:NONE;SFS:(13230031)(4636009)(376002)(346002)(39860400002)(396003)(136003)(230273577357003)(230173577357003)(230922051799003)(82310400011)(1800799012)(64100799003)(451199024)(186009)(46966006)(40470700004)(36840700001)(26005)(336012)(478600001)(83380400001)(7636003)(47076005)(45080400002)(7696005)(36756003)(40480700001)(1076003)(426003)(356005)(6666004)(82740400003)(2616005)(107886003)(316002)(110136005)(70586007)(54906003)(70206006)(36860700001)(5660300002)(40460700003)(86362001)(4326008)(8936002)(8676002)(7416002)(2906002)(41300700001);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 05 Dec 2023 17:26:34.0934 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: fded20ca-1036-4797-9a6c-08dbf5b75423 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.160];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: MWH0EPF000971E6.namprd02.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: LV2PR12MB5798 X-Patchwork-Delegate: kuba@kernel.org The referenced change added custom cleanup code to act_ct to delete any callbacks registered on the parent block when deleting the tcf_ct_flow_table instance. However, the underlying issue is that the drivers don't obtain the reference to the tcf_ct_flow_table instance when registering callbacks which means that not only driver callbacks may still be on the table when deleting it but also that the driver can still have pointers to its internal nf_flowtable and can use it concurrently which results either warning in netfilter[0] or use-after-free. Fix the issue by taking a reference to the underlying struct tcf_ct_flow_table instance when registering the callback and release the reference when unregistering. Expose new API required for such reference counting by adding two new callbacks to nf_flowtable_type and implementing them for act_ct flowtable_ct type. This fixes the issue by extending the lifetime of nf_flowtable until all users have unregistered. [0]: [106170.938634] ------------[ cut here ]------------ [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis try overlay mlx5_core [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1 [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00 [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202 [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48 [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000 [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700 [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58 [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000 [106170.951616] FS: 0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000 [106170.952329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0 [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [106170.954766] Call Trace: [106170.955057] [106170.955315] ? __warn+0x79/0x120 [106170.955648] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.956172] ? report_bug+0x17c/0x190 [106170.956537] ? handle_bug+0x3c/0x60 [106170.956891] ? exc_invalid_op+0x14/0x70 [106170.957264] ? asm_exc_invalid_op+0x16/0x20 [106170.957666] ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core] [106170.958172] ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core] [106170.958788] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.959339] ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core] [106170.959854] ? mapping_remove+0x154/0x1d0 [mlx5_core] [106170.960342] ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core] [106170.960927] mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core] [106170.961441] mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core] [106170.962001] mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core] [106170.962524] mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core] [106170.963034] mlx5e_flow_put+0x73/0xe0 [mlx5_core] [106170.963506] mlx5e_put_flow_list+0x38/0x70 [mlx5_core] [106170.964002] mlx5e_rep_update_flows+0xec/0x290 [mlx5_core] [106170.964525] mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core] [106170.965056] process_one_work+0x13a/0x2c0 [106170.965443] worker_thread+0x2e5/0x3f0 [106170.965808] ? rescuer_thread+0x410/0x410 [106170.966192] kthread+0xc6/0xf0 [106170.966515] ? kthread_complete_and_exit+0x20/0x20 [106170.966970] ret_from_fork+0x2d/0x50 [106170.967332] ? kthread_complete_and_exit+0x20/0x20 [106170.967774] ret_from_fork_asm+0x11/0x20 [106170.970466] [106170.970726] ---[ end trace 0000000000000000 ]--- Fixes: 77ac5e40c44e ("net/sched: act_ct: remove and free nf_table callbacks") Signed-off-by: Vlad Buslov Reviewed-by: Paul Blakey Acked-by: Pablo Neira Ayuso --- include/net/netfilter/nf_flow_table.h | 10 ++++++++ net/sched/act_ct.c | 34 ++++++++++++++++++++++----- 2 files changed, 38 insertions(+), 6 deletions(-) diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index fe1507c1db82..692d5955911c 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -62,6 +62,8 @@ struct nf_flowtable_type { enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule); void (*free)(struct nf_flowtable *ft); + void (*get)(struct nf_flowtable *ft); + void (*put)(struct nf_flowtable *ft); nf_hookfn *hook; struct module *owner; }; @@ -240,6 +242,11 @@ nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table, } list_add_tail(&block_cb->list, &block->cb_list); + up_write(&flow_table->flow_block_lock); + + if (flow_table->type->get) + flow_table->type->get(flow_table); + return 0; unlock: up_write(&flow_table->flow_block_lock); @@ -262,6 +269,9 @@ nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table, WARN_ON(true); } up_write(&flow_table->flow_block_lock); + + if (flow_table->type->put) + flow_table->type->put(flow_table); } void flow_offload_route_init(struct flow_offload *flow, diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b3f4a503ee2b..f69c47945175 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -286,9 +286,31 @@ static bool tcf_ct_flow_is_outdated(const struct flow_offload *flow) !test_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags); } +static void tcf_ct_flow_table_get_ref(struct tcf_ct_flow_table *ct_ft); + +static void tcf_ct_nf_get(struct nf_flowtable *ft) +{ + struct tcf_ct_flow_table *ct_ft = + container_of(ft, struct tcf_ct_flow_table, nf_ft); + + tcf_ct_flow_table_get_ref(ct_ft); +} + +static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft); + +static void tcf_ct_nf_put(struct nf_flowtable *ft) +{ + struct tcf_ct_flow_table *ct_ft = + container_of(ft, struct tcf_ct_flow_table, nf_ft); + + tcf_ct_flow_table_put(ct_ft); +} + static struct nf_flowtable_type flowtable_ct = { .gc = tcf_ct_flow_is_outdated, .action = tcf_ct_flow_table_fill_actions, + .get = tcf_ct_nf_get, + .put = tcf_ct_nf_put, .owner = THIS_MODULE, }; @@ -337,9 +359,13 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) return err; } +static void tcf_ct_flow_table_get_ref(struct tcf_ct_flow_table *ct_ft) +{ + refcount_inc(&ct_ft->ref); +} + static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) { - struct flow_block_cb *block_cb, *tmp_cb; struct tcf_ct_flow_table *ct_ft; struct flow_block *block; @@ -347,13 +373,9 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) rwork); nf_flow_table_free(&ct_ft->nf_ft); - /* Remove any remaining callbacks before cleanup */ block = &ct_ft->nf_ft.flow_block; down_write(&ct_ft->nf_ft.flow_block_lock); - list_for_each_entry_safe(block_cb, tmp_cb, &block->cb_list, list) { - list_del(&block_cb->list); - flow_block_cb_free(block_cb); - } + WARN_ON(!list_empty(&block->cb_list)); up_write(&ct_ft->nf_ft.flow_block_lock); kfree(ct_ft);