From patchwork Mon Jun 5 10:33:17 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267181
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Maher Sanalla, linux-rdma@vger.kernel.org, Maor Gottlieb
Subject: [PATCH rdma-rc 01/10] RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
Date: Mon, 5 Jun 2023 13:33:17 +0300
Message-Id: <2e9d386785043d48c38711826eb910315c1de141.1685960567.git.leon@kernel.org>

From: Maher Sanalla

Delay drop data is initialized for PFs that have the rq_delay_drop
capability and use the RoCE profile. However, PFs with the RAW Ethernet
profile do not initialize delay drop data on function load, causing a
kernel panic if delay drop struct members are accessed later on, when a
dropless RQ is created.

Thus, stage the delay drop initialization as part of the RAW Ethernet
PF loading process.
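For context, a dropless RQ is a receive queue created with the delay-drop
flag: instead of dropping packets when no receive WQEs are posted, the HW
delays the drop and raises a timeout event that the driver services
through dev->delay_drop, the state that the delay-drop stage initializes.
A rough sketch of the pre-fix crash path on a RAW Ethernet PF
(illustrative only; the flag and member names here are assumptions, not
copied from the driver):

        /* RAW Ethernet profile: MLX5_IB_STAGE_DELAY_DROP never ran, so
         * dev->delay_drop (its lock, work item and counters) was never
         * initialized.
         */
        create_rq(dev, ..., WQ_FLAG_DELAY_DROP);   /* dropless RQ */

        /* later, when FW raises a delay-drop timeout event: */
        schedule_work(&dev->delay_drop.delay_drop_work);
        /* touches the never-initialized work item -> kernel panic */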
Fixes: b5ca15ad7e61 ("IB/mlx5: Add proper representors support")
Signed-off-by: Maher Sanalla
Reviewed-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5d45de223c43..f0b394ed7452 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4275,6 +4275,9 @@ const struct mlx5_ib_profile raw_eth_profile = {
        STAGE_CREATE(MLX5_IB_STAGE_POST_IB_REG_UMR,
                     mlx5_ib_stage_post_ib_reg_umr_init,
                     NULL),
+       STAGE_CREATE(MLX5_IB_STAGE_DELAY_DROP,
+                    mlx5_ib_stage_delay_drop_init,
+                    mlx5_ib_stage_delay_drop_cleanup),
        STAGE_CREATE(MLX5_IB_STAGE_RESTRACK,
                     mlx5_ib_restrack_init,
                     NULL),

From patchwork Mon Jun 5 10:33:18 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267182
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Mark Bloch, linux-rdma@vger.kernel.org, Maor Gottlieb
Subject: [PATCH rdma-rc 02/10] RDMA/mlx5: Create an indirect flow table for steering anchor
Date: Mon, 5 Jun 2023 13:33:18 +0300

From: Mark Bloch

A misbehaved user can create a steering anchor that points to a kernel
flow table and then destroy the anchor without freeing the associated
STC. This creates a problem: the kernel cannot destroy the flow table,
since there is still a reference to it. As a result, this can exhaust
all available flow table resources, preventing other users from using
the RDMA device.
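Concretely, the misuse looks roughly like this (illustrative pseudocode;
the helper names below are hypothetical stand-ins for the real
uverbs/DEVX commands):

        /* user creates an anchor; the kernel pins one of its own flow
         * tables and returns its id */
        anchor = create_steering_anchor(dev, priority); /* yields ft_id */

        /* user creates a HW steering context that jumps to that table */
        stc = devx_create_stc(dev, anchor->ft_id);

        /* user destroys the anchor but "forgets" the STC */
        destroy_steering_anchor(anchor);

        /* the HW still references ft_id through the STC, so the kernel
         * cannot destroy the flow table; repeating this sequence
         * exhausts the flow table resources for everyone */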
To prevent this, create a special flow table with two steering rules
when a user creates a steering anchor for the first time: one rule that
drops all traffic, and one that points to the kernel flow table. When a
steering anchor is destroyed, only the rule pointing to the kernel flow
table is removed; any traffic reaching the special flow table after that
is dropped.

Since the special flow table is not destroyed together with the steering
anchor, a dangling STC causes no harm. The remaining resources are
destroyed only when the RDMA device is destroyed, which happens after
all DEVX objects, including the STCs, are freed, thus mitigating the
issue.

Fixes: 0c6ab0ca9a66 ("RDMA/mlx5: Expose steering anchor to userspace")
Signed-off-by: Mark Bloch
Reviewed-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/fs.c      | 276 ++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/fs.h      |  16 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  11 ++
 3 files changed, 296 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c
index 3008632a6c20..1e419e080b53 100644
--- a/drivers/infiniband/hw/mlx5/fs.c
+++ b/drivers/infiniband/hw/mlx5/fs.c
@@ -695,8 +695,6 @@ static struct mlx5_ib_flow_prio *_get_prio(struct mlx5_ib_dev *dev,
        struct mlx5_flow_table_attr ft_attr = {};
        struct mlx5_flow_table *ft;

-       if (mlx5_ib_shared_ft_allowed(&dev->ib_dev))
-               ft_attr.uid = MLX5_SHARED_RESOURCE_UID;
        ft_attr.prio = priority;
        ft_attr.max_fte = num_entries;
        ft_attr.flags = flags;
@@ -2025,6 +2023,237 @@ static int flow_matcher_cleanup(struct ib_uobject *uobject,
        return 0;
 }

+static int steering_anchor_create_ft(struct mlx5_ib_dev *dev,
+                                    struct mlx5_ib_flow_prio *ft_prio,
+                                    enum mlx5_flow_namespace_type ns_type)
+{
+       struct mlx5_flow_table_attr ft_attr = {};
+       struct mlx5_flow_namespace *ns;
+       struct mlx5_flow_table *ft;
+
+       if (ft_prio->anchor.ft)
+               return 0;
+
+       ns = mlx5_get_flow_namespace(dev->mdev, ns_type);
+       if (!ns)
+               return -EOPNOTSUPP;
+
+       ft_attr.flags = MLX5_FLOW_TABLE_UNMANAGED;
+       ft_attr.uid = MLX5_SHARED_RESOURCE_UID;
+       ft_attr.prio = 0;
+       ft_attr.max_fte = 2;
+       ft_attr.level = 1;
+
+       ft = mlx5_create_flow_table(ns, &ft_attr);
+       if (IS_ERR(ft))
+               return PTR_ERR(ft);
+
+       ft_prio->anchor.ft = ft;
+
+       return 0;
+}
+
+static void steering_anchor_destroy_ft(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.ft) {
+               mlx5_destroy_flow_table(ft_prio->anchor.ft);
+               ft_prio->anchor.ft = NULL;
+       }
+}
+
+static int
+steering_anchor_create_fg_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+       struct mlx5_flow_group *fg;
+       void *flow_group_in;
+       int err = 0;
+
+       if (ft_prio->anchor.fg_drop)
+               return 0;
+
+       flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+       if (!flow_group_in)
+               return -ENOMEM;
+
+       MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 1);
+       MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, 1);
+
+       fg = mlx5_create_flow_group(ft_prio->anchor.ft, flow_group_in);
+       if (IS_ERR(fg)) {
+               err = PTR_ERR(fg);
+               goto out;
+       }
+
+       ft_prio->anchor.fg_drop = fg;
+
+out:
+       kvfree(flow_group_in);
+
+       return err;
+}
+
+static void
+steering_anchor_destroy_fg_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.fg_drop) {
+               mlx5_destroy_flow_group(ft_prio->anchor.fg_drop);
+               ft_prio->anchor.fg_drop = NULL;
+       }
+}
+
+static int
+steering_anchor_create_fg_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+       struct mlx5_flow_group *fg;
+       void *flow_group_in;
+       int err = 0;
+
+       if (ft_prio->anchor.fg_goto_table)
+               return 0;
+
+       flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+       if (!flow_group_in)
+               return -ENOMEM;
+
+       fg = mlx5_create_flow_group(ft_prio->anchor.ft, flow_group_in);
+       if (IS_ERR(fg)) {
+               err = PTR_ERR(fg);
+               goto out;
+       }
+       ft_prio->anchor.fg_goto_table = fg;
+
+out:
+       kvfree(flow_group_in);
+
+       return err;
+}
+
+static void
+steering_anchor_destroy_fg_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.fg_goto_table) {
+               mlx5_destroy_flow_group(ft_prio->anchor.fg_goto_table);
+               ft_prio->anchor.fg_goto_table = NULL;
+       }
+}
+
+static int
+steering_anchor_create_rule_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       struct mlx5_flow_act flow_act = {};
+       struct mlx5_flow_handle *handle;
+
+       if (ft_prio->anchor.rule_drop)
+               return 0;
+
+       flow_act.fg = ft_prio->anchor.fg_drop;
+       flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP;
+
+       handle = mlx5_add_flow_rules(ft_prio->anchor.ft, NULL, &flow_act,
+                                    NULL, 0);
+       if (IS_ERR(handle))
+               return PTR_ERR(handle);
+
+       ft_prio->anchor.rule_drop = handle;
+
+       return 0;
+}
+
+static void steering_anchor_destroy_rule_drop(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.rule_drop) {
+               mlx5_del_flow_rules(ft_prio->anchor.rule_drop);
+               ft_prio->anchor.rule_drop = NULL;
+       }
+}
+
+static int
+steering_anchor_create_rule_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       struct mlx5_flow_destination dest = {};
+       struct mlx5_flow_act flow_act = {};
+       struct mlx5_flow_handle *handle;
+
+       if (ft_prio->anchor.rule_goto_table)
+               return 0;
+
+       flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+       flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+       flow_act.fg = ft_prio->anchor.fg_goto_table;
+
+       dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+       dest.ft = ft_prio->flow_table;
+
+       handle = mlx5_add_flow_rules(ft_prio->anchor.ft, NULL, &flow_act,
+                                    &dest, 1);
+       if (IS_ERR(handle))
+               return PTR_ERR(handle);
+
+       ft_prio->anchor.rule_goto_table = handle;
+
+       return 0;
+}
+
+static void
+steering_anchor_destroy_rule_goto_table(struct mlx5_ib_flow_prio *ft_prio)
+{
+       if (ft_prio->anchor.rule_goto_table) {
+               mlx5_del_flow_rules(ft_prio->anchor.rule_goto_table);
+               ft_prio->anchor.rule_goto_table = NULL;
+       }
+}
+
+static int steering_anchor_create_res(struct mlx5_ib_dev *dev,
+                                     struct mlx5_ib_flow_prio *ft_prio,
+                                     enum mlx5_flow_namespace_type ns_type)
+{
+       int err;
+
+       err = steering_anchor_create_ft(dev, ft_prio, ns_type);
+       if (err)
+               return err;
+
+       err = steering_anchor_create_fg_drop(ft_prio);
+       if (err)
+               goto destroy_ft;
+
+       err = steering_anchor_create_fg_goto_table(ft_prio);
+       if (err)
+               goto destroy_fg_drop;
+
+       err = steering_anchor_create_rule_drop(ft_prio);
+       if (err)
+               goto destroy_fg_goto_table;
+
+       err = steering_anchor_create_rule_goto_table(ft_prio);
+       if (err)
+               goto destroy_rule_drop;
+
+       return 0;
+
+destroy_rule_drop:
+       steering_anchor_destroy_rule_drop(ft_prio);
+destroy_fg_goto_table:
+       steering_anchor_destroy_fg_goto_table(ft_prio);
+destroy_fg_drop:
+       steering_anchor_destroy_fg_drop(ft_prio);
+destroy_ft:
+       steering_anchor_destroy_ft(ft_prio);
+
+       return err;
+}
+
+static void mlx5_steering_anchor_destroy_res(struct mlx5_ib_flow_prio *ft_prio)
+{
+       steering_anchor_destroy_rule_goto_table(ft_prio);
+       steering_anchor_destroy_rule_drop(ft_prio);
+       steering_anchor_destroy_fg_goto_table(ft_prio);
+       steering_anchor_destroy_fg_drop(ft_prio);
+       steering_anchor_destroy_ft(ft_prio);
+}
+
 static int steering_anchor_cleanup(struct ib_uobject *uobject,
                                   enum rdma_remove_reason why,
                                   struct uverbs_attr_bundle *attrs)
@@ -2035,6 +2264,9 @@ static int steering_anchor_cleanup(struct ib_uobject *uobject,
                return -EBUSY;

        mutex_lock(&obj->dev->flow_db->lock);
+       if (!--obj->ft_prio->anchor.rule_goto_table_ref)
+               steering_anchor_destroy_rule_goto_table(obj->ft_prio);
+
        put_flow_table(obj->dev, obj->ft_prio, true);
        mutex_unlock(&obj->dev->flow_db->lock);
@@ -2042,6 +2274,24 @@ static int steering_anchor_cleanup(struct ib_uobject *uobject,
        return 0;
 }

+static void fs_cleanup_anchor(struct mlx5_ib_flow_prio *prio,
+                             int count)
+{
+       while (count--)
+               mlx5_steering_anchor_destroy_res(&prio[count]);
+}
+
+void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev)
+{
+       fs_cleanup_anchor(dev->flow_db->prios, MLX5_IB_NUM_FLOW_FT);
+       fs_cleanup_anchor(dev->flow_db->egress_prios, MLX5_IB_NUM_FLOW_FT);
+       fs_cleanup_anchor(dev->flow_db->sniffer, MLX5_IB_NUM_SNIFFER_FTS);
+       fs_cleanup_anchor(dev->flow_db->egress, MLX5_IB_NUM_EGRESS_FTS);
+       fs_cleanup_anchor(dev->flow_db->fdb, MLX5_IB_NUM_FDB_FTS);
+       fs_cleanup_anchor(dev->flow_db->rdma_rx, MLX5_IB_NUM_FLOW_FT);
+       fs_cleanup_anchor(dev->flow_db->rdma_tx, MLX5_IB_NUM_FLOW_FT);
+}
+
 static int mlx5_ib_matcher_ns(struct uverbs_attr_bundle *attrs,
                              struct mlx5_ib_flow_matcher *obj)
 {
@@ -2182,21 +2432,31 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_STEERING_ANCHOR_CREATE)(
                return -ENOMEM;

        mutex_lock(&dev->flow_db->lock);
+
        ft_prio = _get_flow_table(dev, priority, ns_type, 0);
        if (IS_ERR(ft_prio)) {
-               mutex_unlock(&dev->flow_db->lock);
                err = PTR_ERR(ft_prio);
                goto free_obj;
        }

        ft_prio->refcount++;
-       ft_id = mlx5_flow_table_id(ft_prio->flow_table);
-       mutex_unlock(&dev->flow_db->lock);
+
+       if (!ft_prio->anchor.rule_goto_table_ref) {
+               err = steering_anchor_create_res(dev, ft_prio, ns_type);
+               if (err)
+                       goto put_flow_table;
+       }
+
+       ft_prio->anchor.rule_goto_table_ref++;
+
+       ft_id = mlx5_flow_table_id(ft_prio->anchor.ft);

        err = uverbs_copy_to(attrs, MLX5_IB_ATTR_STEERING_ANCHOR_FT_ID,
                             &ft_id, sizeof(ft_id));
        if (err)
-               goto put_flow_table;
+               goto destroy_res;
+
+       mutex_unlock(&dev->flow_db->lock);

        uobj->object = obj;
        obj->dev = dev;
@@ -2205,8 +2465,10 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_STEERING_ANCHOR_CREATE)(

        return 0;

+destroy_res:
+       --ft_prio->anchor.rule_goto_table_ref;
+       mlx5_steering_anchor_destroy_res(ft_prio);
 put_flow_table:
-       mutex_lock(&dev->flow_db->lock);
        put_flow_table(dev, ft_prio, true);
        mutex_unlock(&dev->flow_db->lock);
 free_obj:
diff --git a/drivers/infiniband/hw/mlx5/fs.h b/drivers/infiniband/hw/mlx5/fs.h
index ad320adaf321..b9734904f5f0 100644
--- a/drivers/infiniband/hw/mlx5/fs.h
+++ b/drivers/infiniband/hw/mlx5/fs.h
@@ -10,6 +10,7 @@

 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int mlx5_ib_fs_init(struct mlx5_ib_dev *dev);
+void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev);
 #else
 static inline int mlx5_ib_fs_init(struct mlx5_ib_dev *dev)
 {
@@ -21,9 +22,24 @@ static inline int mlx5_ib_fs_init(struct mlx5_ib_dev *dev)
        mutex_init(&dev->flow_db->lock);
        return 0;
 }
+
+inline void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev) {}
 #endif
+
 static inline void mlx5_ib_fs_cleanup(struct mlx5_ib_dev *dev)
 {
+       /* When a steering anchor is created, a special flow table is also
+        * created for the user to reference. Since the user can reference it,
+        * the kernel cannot trust that when the user destroys the steering
+        * anchor, they no longer reference the flow table.
+        *
+        * To address this issue, when a user destroys a steering anchor, only
+        * the flow steering rule in the table is destroyed, but the table
+        * itself is kept to deal with the above scenario. The remaining
+        * resources are only removed when the RDMA device is destroyed, which
+        * is a safe assumption that all references are gone.
+        */
+       mlx5_ib_fs_cleanup_anchor(dev);
        kfree(dev->flow_db);
 }
 #endif /* _MLX5_IB_FS_H */
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2c148ef60088..2a2d2a356c41 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -238,8 +238,19 @@ enum {
 #define MLX5_IB_NUM_SNIFFER_FTS                2
 #define MLX5_IB_NUM_EGRESS_FTS         1
 #define MLX5_IB_NUM_FDB_FTS            MLX5_BY_PASS_NUM_REGULAR_PRIOS
+
+struct mlx5_ib_anchor {
+       struct mlx5_flow_table *ft;
+       struct mlx5_flow_group *fg_goto_table;
+       struct mlx5_flow_group *fg_drop;
+       struct mlx5_flow_handle *rule_goto_table;
+       struct mlx5_flow_handle *rule_drop;
+       unsigned int rule_goto_table_ref;
+};
+
 struct mlx5_ib_flow_prio {
        struct mlx5_flow_table          *flow_table;
+       struct mlx5_ib_anchor           anchor;
        unsigned int                    refcount;
 };

From patchwork Mon Jun 5 10:33:19 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267184
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Zhang
Subject: [PATCH rdma-rc 03/10] RDMA/mlx5: Fix Q-counters per vport allocation
Date: Mon, 5 Jun 2023 13:33:19 +0300

From: Patrisious Haddad
Previously, Q-counters data was allocated on the PF for all of the
available vports, but that is not necessary: each VF or SF has a
Q-counter allocated for itself.

Therefore, only two counter data structures need to be allocated: one
for the device's own counters, and one shared by all the other vports
to expose the representors. The representors only read from it, mainly
to determine counter numbers and names, so they can all share it.

This in turn also fixes a bug where the device could not be switched to
switchdev mode when more than 128 SFs/VFs were configured, since 128 is
the maximum number of Q-counters available for a single port.

Fixes: d22467a71ebe ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics")
Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Zhang
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/counters.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index 1c06920505d2..3d7ef81a50b8 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -209,7 +209,8 @@ static const struct mlx5_ib_counters *get_counters(struct mlx5_ib_dev *dev,
             !vport_qcounters_supported(dev)) || !port_num)
                return &dev->port[0].cnts;

-       return &dev->port[port_num - 1].cnts;
+       return is_mdev_switchdev_mode(dev->mdev) ?
+              &dev->port[1].cnts : &dev->port[port_num - 1].cnts;
 }

 /**
@@ -262,7 +263,7 @@ static struct rdma_hw_stats *
 mlx5_ib_alloc_hw_port_stats(struct ib_device *ibdev, u32 port_num)
 {
        struct mlx5_ib_dev *dev = to_mdev(ibdev);
-       const struct mlx5_ib_counters *cnts = &dev->port[port_num - 1].cnts;
+       const struct mlx5_ib_counters *cnts = get_counters(dev, port_num);

        return do_alloc_stats(cnts);
 }
@@ -725,11 +726,11 @@ static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
 static void mlx5_ib_dealloc_counters(struct mlx5_ib_dev *dev)
 {
        u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)] = {};
-       int num_cnt_ports;
+       int num_cnt_ports = dev->num_ports;
        int i, j;

-       num_cnt_ports = (!is_mdev_switchdev_mode(dev->mdev) ||
-                        vport_qcounters_supported(dev)) ? dev->num_ports : 1;
+       if (is_mdev_switchdev_mode(dev->mdev))
+               num_cnt_ports = min(2, num_cnt_ports);

        MLX5_SET(dealloc_q_counter_in, in, opcode,
                 MLX5_CMD_OP_DEALLOC_Q_COUNTER);
@@ -761,15 +762,22 @@ static int mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev)
 {
        u32 out[MLX5_ST_SZ_DW(alloc_q_counter_out)] = {};
        u32 in[MLX5_ST_SZ_DW(alloc_q_counter_in)] = {};
-       int num_cnt_ports;
+       int num_cnt_ports = dev->num_ports;
        int err = 0;
        int i;
        bool is_shared;

        MLX5_SET(alloc_q_counter_in, in, opcode, MLX5_CMD_OP_ALLOC_Q_COUNTER);
        is_shared = MLX5_CAP_GEN(dev->mdev, log_max_uctx) != 0;
-       num_cnt_ports = (!is_mdev_switchdev_mode(dev->mdev) ||
-                        vport_qcounters_supported(dev)) ? dev->num_ports : 1;
+
+       /*
+        * In switchdev we need to allocate two ports, one that is used for
+        * the device Q_counters and it is essentially the real Q_counters of
+        * this device, while the other is used as a helper for PF to be able to
+        * query all other vports.
+        */
+       if (is_mdev_switchdev_mode(dev->mdev))
+               num_cnt_ports = min(2, num_cnt_ports);

        for (i = 0; i < num_cnt_ports; i++) {
                err = __mlx5_ib_alloc_counters(dev, &dev->port[i].cnts, i);

From patchwork Mon Jun 5 10:33:20 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267187
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Zhang
Subject: [PATCH rdma-rc 04/10] RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
Date: Mon, 5 Jun 2023 13:33:20 +0300
Message-Id: <016777b7f16eb6bb178999ff59097d0c0f91f68a.1685960567.git.leon@kernel.org>

From: Patrisious Haddad

Previously, the Q-counters initialization assumed that the vport
Q-counters structures and the normal Q-counters structures are
identical in size; hence, when a Q-counter was added to the normal
Q-counters structure but not to the vport Q-counters structure, that
counter's name would be NULL in switchdev mode, which could cause the
kernel crash below.

Break the dependency between those two structures and always use the
appropriate structure size, removing the assumption that both
structures have the same size.
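The stale size assumption is easiest to see in the old fill loop, which
iterated with the size of the normal array while reading names from the
(potentially shorter) vport array (simplified from the pre-fix code
visible in the diff below):

        names = is_vport ? vport_basic_q_cnts : basic_q_cnts;
        for (i = 0; i < ARRAY_SIZE(basic_q_cnts); i++, j++) {
                descs[j].name = names[i].name;  /* reads past the end of
                                                 * the shorter vport
                                                 * array -> NULL name */
                offsets[j] = basic_q_cnts[i].offset;
        }

A NULL name then reaches the sysfs registration path, where
kernfs_name_hash() calls strlen(NULL) -- the crash below.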
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 20c64a067 P4D 20c64a067 PUD 20152b067 PMD 0
Oops: 0000 [#1] SMP
CPU: 19 PID: 11717 Comm: devlink Tainted: G OE 6.2.0_mlnx #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:strlen+0x0/0x20
Code: 66 2e 0f 1f 84 00 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 48 89 f8 74 10 48 83 c7 01 80 3f 00 75 f7 48 29 c7 48 89
RSP: 0018:ffffc9000318b618 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000002c00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff888211918110 R09: ffff888211918000
R10: 000000000000001e R11: ffff888211918000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881038ec250
FS:  00007fa53342fe80(0000) GS:ffff88885fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000002042b2003 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 kernfs_name_hash+0x12/0x80
 kernfs_find_ns+0x35/0xb0
 kernfs_remove_by_name_ns+0x46/0xc0
 remove_files.isra.1+0x30/0x70
 internal_create_group+0x253/0x380
 internal_create_groups.part.4+0x3e/0xa0
 setup_port+0x27a/0x8c0 [ib_core]
 ib_setup_port_attrs+0x9d/0x300 [ib_core]
 ib_register_device+0x48e/0x550 [ib_core]
 __mlx5_ib_add+0x2b/0x80 [mlx5_ib]
 mlx5_ib_vport_rep_load+0x141/0x360 [mlx5_ib]
 mlx5_esw_offloads_rep_load+0x48/0xa0 [mlx5_core]
 esw_offloads_enable+0x41e/0xd10 [mlx5_core]
 mlx5_eswitch_enable_locked+0x1e3/0x340 [mlx5_core]
 ? __cond_resched+0x15/0x30
 mlx5_devlink_eswitch_mode_set+0x204/0x3c0 [mlx5_core]
 devlink_nl_cmd_eswitch_set_doit+0x8d/0x100
 genl_family_rcv_msg_doit.isra.19+0xea/0x110
 genl_rcv_msg+0x19b/0x290
 ? devlink_nl_cmd_region_read_dumpit+0x760/0x760
 ? devlink_nl_cmd_port_param_get_doit+0x30/0x30
 ? devlink_put+0x50/0x50
 ? genl_get_cmd_both+0x60/0x60
 netlink_rcv_skb+0x54/0x100
 genl_rcv+0x24/0x40
 netlink_unicast+0x1be/0x2a0
 netlink_sendmsg+0x361/0x4d0
 sock_sendmsg+0x30/0x40
 __sys_sendto+0x11a/0x150
 ? handle_mm_fault+0x101/0x2b0
 ? do_user_addr_fault+0x21d/0x720
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x34/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fa533611cba
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
RSP: 002b:00007ffdb6a898a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000daab00 RCX: 00007fa533611cba
RDX: 0000000000000038 RSI: 0000000000daab00 RDI: 0000000000000003
RBP: 0000000000daa910 R08: 00007fa533822000 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) mlx_compat(OE) mlxfw(OE) memtrack(OE) pci_hyperv_intf nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat dns_resolver nf_nat br_netfilter nfs bridge stp llc lockd grace fscache netfs rfkill overlay iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel i2c_i801 sunrpc lpc_ich sha512_ssse3 pcspkr i2c_smbus mfd_core drm sch_fq_codel i2c_core ip_tables fuse crc32c_intel serio_raw virtio_net net_failover failover [last unloaded: mlxfw]
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:strlen+0x0/0x20
Code: 66 2e 0f 1f 84 00 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 48 89 f8 74 10 48 83 c7 01 80 3f 00 75 f7 48 29 c7 48 89
RSP: 0018:ffffc9000318b618 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000002c00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff888211918110 R09: ffff888211918000
R10: 000000000000001e R11: ffff888211918000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881038ec250
FS:  00007fa53342fe80(0000) GS:ffff88885fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000002042b2003 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: d22467a71ebe ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics")
Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Zhang
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/counters.c | 58 ++++++++++++++++++---------
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index 3d7ef81a50b8..f40d9c61e30b 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -576,43 +576,53 @@ static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
        bool is_vport = is_mdev_switchdev_mode(dev->mdev) &&
                        port_num != MLX5_VPORT_PF;
        const struct mlx5_ib_counter *names;
-       int j = 0, i;
+       int j = 0, i, size;

        names = is_vport ? vport_basic_q_cnts : basic_q_cnts;
-       for (i = 0; i < ARRAY_SIZE(basic_q_cnts); i++, j++) {
+       size = is_vport ? ARRAY_SIZE(vport_basic_q_cnts) :
+                         ARRAY_SIZE(basic_q_cnts);
+       for (i = 0; i < size; i++, j++) {
                descs[j].name = names[i].name;
-               offsets[j] = basic_q_cnts[i].offset;
+               offsets[j] = names[i].offset;
        }

        names = is_vport ? vport_out_of_seq_q_cnts : out_of_seq_q_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_out_of_seq_q_cnts) :
+                         ARRAY_SIZE(out_of_seq_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, out_of_seq_cnt)) {
-               for (i = 0; i < ARRAY_SIZE(out_of_seq_q_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = out_of_seq_q_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }

        names = is_vport ? vport_retrans_q_cnts : retrans_q_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_retrans_q_cnts) :
+                         ARRAY_SIZE(retrans_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, retransmission_q_counters)) {
-               for (i = 0; i < ARRAY_SIZE(retrans_q_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = retrans_q_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }

        names = is_vport ? vport_extended_err_cnts : extended_err_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_extended_err_cnts) :
+                         ARRAY_SIZE(extended_err_cnts);
        if (MLX5_CAP_GEN(dev->mdev, enhanced_error_q_counters)) {
-               for (i = 0; i < ARRAY_SIZE(extended_err_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = extended_err_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }

        names = is_vport ? vport_roce_accl_cnts : roce_accl_cnts;
+       size = is_vport ? ARRAY_SIZE(vport_roce_accl_cnts) :
+                         ARRAY_SIZE(roce_accl_cnts);
        if (MLX5_CAP_GEN(dev->mdev, roce_accl)) {
-               for (i = 0; i < ARRAY_SIZE(roce_accl_cnts); i++, j++) {
+               for (i = 0; i < size; i++, j++) {
                        descs[j].name = names[i].name;
-                       offsets[j] = roce_accl_cnts[i].offset;
+                       offsets[j] = names[i].offset;
                }
        }

@@ -662,25 +672,37 @@ static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
 static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
                                    struct mlx5_ib_counters *cnts, u32 port_num)
 {
-       u32 num_counters, num_op_counters = 0;
+       bool is_vport = is_mdev_switchdev_mode(dev->mdev) &&
+                       port_num != MLX5_VPORT_PF;
+       u32 num_counters, num_op_counters = 0, size;

-       num_counters = ARRAY_SIZE(basic_q_cnts);
+       size = is_vport ? ARRAY_SIZE(vport_basic_q_cnts) :
+                         ARRAY_SIZE(basic_q_cnts);
+       num_counters = size;

+       size = is_vport ? ARRAY_SIZE(vport_out_of_seq_q_cnts) :
+                         ARRAY_SIZE(out_of_seq_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, out_of_seq_cnt))
-               num_counters += ARRAY_SIZE(out_of_seq_q_cnts);
+               num_counters += size;

+       size = is_vport ? ARRAY_SIZE(vport_retrans_q_cnts) :
+                         ARRAY_SIZE(retrans_q_cnts);
        if (MLX5_CAP_GEN(dev->mdev, retransmission_q_counters))
-               num_counters += ARRAY_SIZE(retrans_q_cnts);
+               num_counters += size;

+       size = is_vport ? ARRAY_SIZE(vport_extended_err_cnts) :
+                         ARRAY_SIZE(extended_err_cnts);
        if (MLX5_CAP_GEN(dev->mdev, enhanced_error_q_counters))
-               num_counters += ARRAY_SIZE(extended_err_cnts);
+               num_counters += size;

+       size = is_vport ? ARRAY_SIZE(vport_roce_accl_cnts) :
+                         ARRAY_SIZE(roce_accl_cnts);
        if (MLX5_CAP_GEN(dev->mdev, roce_accl))
-               num_counters += ARRAY_SIZE(roce_accl_cnts);
+               num_counters += size;

        cnts->num_q_counters = num_counters;

-       if (is_mdev_switchdev_mode(dev->mdev) && port_num != MLX5_VPORT_PF)
+       if (is_vport)
                goto skip_non_qcounters;

        if (MLX5_CAP_GEN(dev->mdev, cc_query_allowed)) {

From patchwork Mon Jun 5 10:33:21 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267183
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Zhang
Subject: [PATCH rdma-rc 05/10] RDMA/mlx5: Fix Q-counters query in LAG mode
Date: Mon, 5 Jun 2023 13:33:21 +0300
Message-Id: <778d7d7a24892348d0bdef17d2e5f9e044717e86.1685960567.git.leon@kernel.org>

From: Patrisious Haddad

Previously, we used the core device associated with the IB device to
issue the Q-counters query to the FW. But in LAG mode, it is possible
that the core device is not the one that created this VF. Hence,
instead of using the core device to query the Q-counters, use the ESW
core device, which is guaranteed to be that of the VF.
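In other words, under LAG the IB device may be bound to the bond's core
device while the VF's Q-counter lives on the eswitch that created the
VF. The fix resolves the core device through the vport representor's
eswitch before issuing the query (a sketch mirroring the change below):

        mdev = mlx5_eswitch_get_core_dev(dev->port[port_num].rep->esw);
        if (!mdev)      /* eswitch (and thus the counter) unavailable */
                return -EOPNOTSUPP;
        /* run QUERY_Q_COUNTER on the device that actually owns the VF */
        ret = mlx5_cmd_exec_inout(mdev, query_q_counter, in, out);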
Fixes: d22467a71ebe ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics")
Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Zhang
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/counters.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index f40d9c61e30b..93257fa5aae8 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -330,6 +330,7 @@ static int mlx5_ib_query_q_counters_vport(struct mlx5_ib_dev *dev,
 {
        u32 out[MLX5_ST_SZ_DW(query_q_counter_out)] = {};
        u32 in[MLX5_ST_SZ_DW(query_q_counter_in)] = {};
+       struct mlx5_core_dev *mdev;
        __be32 val;
        int ret, i;

@@ -337,12 +338,16 @@ static int mlx5_ib_query_q_counters_vport(struct mlx5_ib_dev *dev,
            dev->port[port_num].rep->vport == MLX5_VPORT_UPLINK)
                return 0;

+       mdev = mlx5_eswitch_get_core_dev(dev->port[port_num].rep->esw);
+       if (!mdev)
+               return -EOPNOTSUPP;
+
        MLX5_SET(query_q_counter_in, in, opcode, MLX5_CMD_OP_QUERY_Q_COUNTER);
        MLX5_SET(query_q_counter_in, in, other_vport, 1);
        MLX5_SET(query_q_counter_in, in, vport_number,
                 dev->port[port_num].rep->vport);
        MLX5_SET(query_q_counter_in, in, aggregate, 1);
-       ret = mlx5_cmd_exec_inout(dev->mdev, query_q_counter, in, out);
+       ret = mlx5_cmd_exec_inout(mdev, query_q_counter, in, out);
        if (ret)
                return ret;

From patchwork Mon Jun 5 10:33:22 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267185
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Michael Guralnik, linux-rdma@vger.kernel.org, Maor Gottlieb
Subject: [PATCH rdma-rc 06/10] RDMA/mlx5: Fix mkey cache possible deadlock on cleanup
Date: Mon, 5 Jun 2023 13:33:22 +0300

From: Michael Guralnik

Move the cancellation of the delayed cache work that adds or removes
mkeys to a separate iteration in the mkey cleanup, so that we do not
call someone_adding() while holding the rb_lock.

Lockdep:

WARNING: possible circular locking dependency detected
6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted
------------------------------------------------------
devlink/53872 is trying to acquire lock:
ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900

but task is already holding lock:
ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&dev->cache.rb_lock){+.+.}-{3:3}:
       __mutex_lock+0x14c/0x15c0
       delayed_cache_work_func+0x2d1/0x610 [mlx5_ib]
       process_one_work+0x7c2/0x1310
       worker_thread+0x59d/0xec0
       kthread+0x28f/0x330
       ret_from_fork+0x1f/0x30

-> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}:
       __lock_acquire+0x2d8a/0x5fe0
       lock_acquire+0x1c1/0x540
       __flush_work+0xe8/0x900
       __cancel_work_timer+0x2c7/0x3f0
       mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib]
       mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib]
       __mlx5_ib_remove+0x68/0x120 [mlx5_ib]
       mlx5r_remove+0x63/0x80 [mlx5_ib]
       auxiliary_bus_remove+0x52/0x70
       device_release_driver_internal+0x3c1/0x600
       bus_remove_device+0x2a5/0x560
       device_del+0x492/0xb80
       mlx5_detach_device+0x1a9/0x360 [mlx5_core]
       mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core]
       mlx5_devlink_reload_down+0x292/0x580 [mlx5_core]
       devlink_reload+0x439/0x590
       devlink_nl_cmd_reload+0xaef/0xff0
       genl_family_rcv_msg_doit.isra.0+0x1bd/0x290
       genl_rcv_msg+0x3ca/0x6c0
       netlink_rcv_skb+0x12c/0x360
       genl_rcv+0x24/0x40
       netlink_unicast+0x438/0x710
       netlink_sendmsg+0x7a1/0xca0
       sock_sendmsg+0xc5/0x190
       __sys_sendto+0x1bc/0x290
       __x64_sys_sendto+0xdc/0x1b0
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->cache.rb_lock);
                               lock((work_completion)(&(&ent->dwork)->work));
                               lock(&dev->cache.rb_lock);
  lock((work_completion)(&(&ent->dwork)->work));

 *** DEADLOCK ***

6 locks held by devlink/53872:
 #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
 #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0
 #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core]
 #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core]
 #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
 #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]

Signed-off-by: Michael Guralnik
Reviewed-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/mr.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 1ce48e485c5b..f113656e4027 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1033,7 +1033,15 @@ void mlx5_mkey_cache_cleanup(struct mlx5_ib_dev *dev)
                xa_lock_irq(&ent->mkeys);
                ent->disabled = true;
                xa_unlock_irq(&ent->mkeys);
-               cancel_delayed_work_sync(&ent->dwork);
+       }
+
+       /* Run the canceling of delayed works on the cache in a separate loop after
+        * disabling all entries to ensure someone_adding() will not try taking the
+        * rb_lock while flushing the workqueue.
+        */
+       for (node = rb_first(root); node; node = rb_next(node)) {
+               ent = rb_entry(node, struct mlx5_cache_ent, node);
+               cancel_delayed_work(&ent->dwork);
        }

        mlx5_mkey_cache_debugfs_cleanup(dev);

From patchwork Mon Jun 5 10:33:23 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267186
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Mark Zhang, linux-rdma@vger.kernel.org
Subject: [PATCH rdma-rc 07/10] RDMA/cma: Always set static rate to 0 for RoCE
Date: Mon, 5 Jun 2023 13:33:23 +0300

From: Mark Zhang

Set the static rate to 0, as it should be discovered by path query and
has no meaning for RoCE.

This also avoids using the rtnl lock and the ethtool API, which is a
bottleneck when trying to set up many rdma-cm connections at the same
time, especially with multiple processes.
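The bottleneck comes from the helper this patch removes: every route
resolution took the global rtnl lock just to ask ethtool for the link
speed (abridged from the removed code in the diff below):

        rtnl_lock();                    /* one system-wide lock ... */
        err = __ethtool_get_link_ksettings(dev, &cmd);
        rtnl_unlock();
        /* ... taken once per rdma-cm connection, so many concurrent
         * connection setups all serialize here */

Using IB_RATE_PORT_CURRENT (value 0) instead simply means "the port's
current rate", which is the right answer for RoCE, where the static
rate has no meaning anyway.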
Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Mark Zhang
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/cma.c |  4 ++--
 include/rdma/ib_addr.h        | 23 -----------------------
 2 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 93a1c48d0c32..6b3f4384e46a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3295,7 +3295,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
        route->path_rec->traffic_class = tos;
        route->path_rec->mtu = iboe_get_mtu(ndev->mtu);
        route->path_rec->rate_selector = IB_SA_EQ;
-       route->path_rec->rate = iboe_get_rate(ndev);
+       route->path_rec->rate = IB_RATE_PORT_CURRENT;
        dev_put(ndev);
        route->path_rec->packet_life_time_selector = IB_SA_EQ;
        /* In case ACK timeout is set, use this value to calculate
@@ -4964,7 +4964,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
        if (!ndev)
                return -ENODEV;

-       ib.rec.rate = iboe_get_rate(ndev);
+       ib.rec.rate = IB_RATE_PORT_CURRENT;
        ib.rec.hop_limit = 1;
        ib.rec.mtu = iboe_get_mtu(ndev->mtu);
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index d808dc3d239e..811a0f11d0db 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -194,29 +194,6 @@ static inline enum ib_mtu iboe_get_mtu(int mtu)
        return 0;
 }

-static inline int iboe_get_rate(struct net_device *dev)
-{
-       struct ethtool_link_ksettings cmd;
-       int err;
-
-       rtnl_lock();
-       err = __ethtool_get_link_ksettings(dev, &cmd);
-       rtnl_unlock();
-       if (err)
-               return IB_RATE_PORT_CURRENT;
-
-       if (cmd.base.speed >= 40000)
-               return IB_RATE_40_GBPS;
-       else if (cmd.base.speed >= 30000)
-               return IB_RATE_30_GBPS;
-       else if (cmd.base.speed >= 20000)
-               return IB_RATE_20_GBPS;
-       else if (cmd.base.speed >= 10000)
-               return IB_RATE_10_GBPS;
-       else
-               return IB_RATE_PORT_CURRENT;
-}
-
 static inline int rdma_link_local_addr(struct in6_addr *addr)
 {
        if (addr->s6_addr32[0] == htonl(0xfe800000) &&

From patchwork Mon Jun 5 10:33:24 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267190
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Edward Srouji, linux-rdma@vger.kernel.org
Subject: [PATCH rdma-rc 08/10] RDMA/uverbs: Restrict usage of privileged QKEYs
Date: Mon, 5 Jun 2023 13:33:24 +0300

From: Edward Srouji

According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."

Thus, block non-privileged users from setting such a QKEY.

Fixes: bc38a6abdd5a ("[PATCH] IB uverbs: core implementation")
Signed-off-by: Edward Srouji
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/uverbs_cmd.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 4796f6a8828c..e836c9c477f6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1850,8 +1850,13 @@ static int modify_qp(struct uverbs_attr_bundle *attrs,
                attr->path_mtu = cmd->base.path_mtu;
        if (cmd->base.attr_mask & IB_QP_PATH_MIG_STATE)
                attr->path_mig_state = cmd->base.path_mig_state;
-       if (cmd->base.attr_mask & IB_QP_QKEY)
+       if (cmd->base.attr_mask & IB_QP_QKEY) {
+               if (cmd->base.qkey & IB_QP_SET_QKEY && !capable(CAP_NET_RAW)) {
+                       ret = -EPERM;
+                       goto release_qp;
+               }
                attr->qkey = cmd->base.qkey;
+       }
        if (cmd->base.attr_mask & IB_QP_RQ_PSN)
                attr->rq_psn = cmd->base.rq_psn;
        if (cmd->base.attr_mask & IB_QP_SQ_PSN)

From patchwork Mon Jun 5 10:33:25 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267188
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Yishai Hadas, linux-rdma@vger.kernel.org, Maor Gottlieb
Subject: [PATCH rdma-rc 09/10] IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
Date: Mon, 5 Jun 2023 13:33:25 +0300
Message-Id: <97b00116a1e1e13f8dc4ec38a5ea81cf8c030210.1685960567.git.leon@kernel.org>

From: Yishai Hadas

Fix ib_uverbs_event_read() to consider the event queue closing also in
non-blocking mode.

Once the queue is closed (e.g. in the hot-plug flow), all existing
events are cleaned up as part of ib_uverbs_free_event_queue(). An
application that uses the non-blocking FD mode should get -EIO in that
case, to let it know that the device was already removed. Otherwise, it
can lose the indication that the device was removed and won't recover.

As part of that, refactor the code to have a single flow with regard to
'is_closed' for both blocking and non-blocking modes.

Fixes: 14e23bd6d221 ("RDMA/core: Fix locking in ib_uverbs_event_read")
Reviewed-by: Maor Gottlieb
Signed-off-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/uverbs_main.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index fbace69672ca..7c9c79c13941 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -222,8 +222,12 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
        spin_lock_irq(&ev_queue->lock);

        while (list_empty(&ev_queue->event_list)) {
-               spin_unlock_irq(&ev_queue->lock);
+               if (ev_queue->is_closed) {
+                       spin_unlock_irq(&ev_queue->lock);
+                       return -EIO;
+               }

+               spin_unlock_irq(&ev_queue->lock);
                if (filp->f_flags & O_NONBLOCK)
                        return -EAGAIN;

@@ -233,12 +237,6 @@ static ssize_t ib_uverbs_event_read(struct ib_uverbs_event_queue *ev_queue,
                        return -ERESTARTSYS;

                spin_lock_irq(&ev_queue->lock);
-
-               /* If device was disassociated and no event exists set an error */
-               if (list_empty(&ev_queue->event_list) && ev_queue->is_closed) {
-                       spin_unlock_irq(&ev_queue->lock);
-                       return -EIO;
-               }
        }

        event = list_entry(ev_queue->event_list.next, struct ib_uverbs_event, list);

From patchwork Mon Jun 5 10:33:26 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267189
From patchwork Mon Jun 5 10:33:26 2023
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13267189
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Mark Bloch, linux-rdma@vger.kernel.org, Maor Gottlieb
Subject: [PATCH rdma-rc 10/10] RDMA/mlx5: Fix affinity assignment
Date: Mon, 5 Jun 2023 13:33:26 +0300
Message-Id: <425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org>

From: Mark Bloch

The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when the LAG
master creates a hardware LAG. If the affinity was set while the
hardware was not in LAG, the firmware would ignore the affinity value.

However, this commit unintentionally assigned an affinity to QPs on the
LAG master's VPORT even if the RDMA device was not marked as
LAG-enabled. In most cases this was not an issue, because when the
hardware entered a hardware LAG configuration, the RDMA device of the
LAG master would be destroyed and a new one created and marked as
LAG-enabled.

The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based
on the queue affinity, which is intended for use by VPORTs other than
the E-Switch manager. ECMP mode is supported only if both E-Switch
managers are in switchdev mode and the appropriate route is configured
via IP. In this configuration the RDMA device is not destroyed, and we
retain an RDMA device that is not marked as LAG-enabled.

To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means
they will only be able to communicate through the native physical port
associated with the E-Switch manager. Strict affinity prevents the
firmware from assigning affinity and keeps the SQs from being remapped
in case of failover.
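The decision this amounts to can be modeled as below; this is a
standalone sketch in which the booleans and the helper names
should_assign_tx_affinity() and tis_needs_strict_affinity() are
simplifications for illustration — the authoritative checks are in
mlx5_ib_lag_should_assign_affinity() and create_raw_packet_qp_tis()
in the diff that follows:

#include <stdbool.h>

/* lacp_owner stands in for mlx5_lag_is_lacp_owner(), lag_active for
 * dev->lag_active, and lag_tx_affinity_capable for the num_lag_ports
 * and lag_tx_port_affinity capability checks.
 */
static bool should_assign_tx_affinity(bool lacp_owner, bool lag_active,
				      bool lag_tx_affinity_capable)
{
	/* E-Switch manager whose RDMA device is not LAG-enabled (the
	 * ECMP case): do not hand out a tx affinity the FW may honor.
	 */
	if (lacp_owner && !lag_active)
		return false;

	return lag_active || lag_tx_affinity_capable;
}

/* SQs opened by such an E-Switch manager get strict port affinity
 * instead, pinning them to the native physical port.
 */
static bool tis_needs_strict_affinity(bool lacp_owner, bool lag_active,
				      bool lag_tx_affinity_capable)
{
	return !should_assign_tx_affinity(lacp_owner, lag_active,
					  lag_tx_affinity_capable) &&
	       lacp_owner;
}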
Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb
Signed-off-by: Mark Bloch
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h                 |  3 +++
 drivers/infiniband/hw/mlx5/qp.c                      |  3 +++
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h  | 12 ------------
 include/linux/mlx5/driver.h                          | 12 ++++++++++++
 4 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2a2d2a356c41..e0ad75f7026f 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1599,6 +1599,9 @@ static inline bool mlx5_ib_lag_should_assign_affinity(struct mlx5_ib_dev *dev)
 	    MLX5_CAP_PORT_SELECTION(dev->mdev, port_select_flow_table_bypass))
 		return 0;
 
+	if (mlx5_lag_is_lacp_owner(dev->mdev) && !dev->lag_active)
+		return 0;
+
 	return dev->lag_active ||
 		(MLX5_CAP_GEN(dev->mdev, num_lag_ports) > 1 &&
 		 MLX5_CAP_GEN(dev->mdev, lag_tx_port_affinity));
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index c4d04c05546a..f99e49499529 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1239,6 +1239,9 @@ static int create_raw_packet_qp_tis(struct mlx5_ib_dev *dev,
 	MLX5_SET(create_tis_in, in, uid, to_mpd(pd)->uid);
 	MLX5_SET(tisc, tisc, transport_domain, tdn);
+	if (!mlx5_ib_lag_should_assign_affinity(dev) &&
+	    mlx5_lag_is_lacp_owner(dev->mdev))
+		MLX5_SET(tisc, tisc, strict_lag_tx_port_affinity, 1);
 	if (qp->flags & IB_QP_CREATE_SOURCE_QPN)
 		MLX5_SET(tisc, tisc, underlay_qpn, qp->underlay_qpn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 1d879374acaa..229520405d4a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -276,18 +276,6 @@ static inline bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev)
 	return pci_num_vf(dev->pdev) ? true : false;
 }
 
-static inline int mlx5_lag_is_lacp_owner(struct mlx5_core_dev *dev)
-{
-	/* LACP owner conditions:
-	 * 1) Function is physical.
-	 * 2) LAG is supported by FW.
-	 * 3) LAG is managed by driver (currently the only option).
-	 */
-	return MLX5_CAP_GEN(dev, vport_group_manager) &&
-	       (MLX5_CAP_GEN(dev, num_lag_ports) > 1) &&
-	       MLX5_CAP_GEN(dev, lag_master);
-}
-
 int mlx5_rescan_drivers_locked(struct mlx5_core_dev *dev);
 static inline int mlx5_rescan_drivers(struct mlx5_core_dev *dev)
 {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index e67c603d507b..1e9f5bb4882b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1227,6 +1227,18 @@ static inline u16 mlx5_core_max_vfs(const struct mlx5_core_dev *dev)
 	return dev->priv.sriov.max_vfs;
 }
 
+static inline int mlx5_lag_is_lacp_owner(struct mlx5_core_dev *dev)
+{
+	/* LACP owner conditions:
+	 * 1) Function is physical.
+	 * 2) LAG is supported by FW.
+	 * 3) LAG is managed by driver (currently the only option).
+	 */
+	return MLX5_CAP_GEN(dev, vport_group_manager) &&
+	       (MLX5_CAP_GEN(dev, num_lag_ports) > 1) &&
+	       MLX5_CAP_GEN(dev, lag_master);
+}
+
 static inline int mlx5_get_gid_table_len(u16 param)
 {
 	if (param > 4) {