From patchwork Wed Dec 4 22:09:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tariq Toukan
X-Patchwork-Id: 13894394
X-Patchwork-Delegate: kuba@kernel.org
From: Tariq Toukan
To: "David S. Miller", Jakub Kicinski, Paolo Abeni, Eric Dumazet,
 "Andrew Lunn", Leon Romanovsky
CC: Saeed Mahameed, Gal Pressman, Carolina Jubran, Cosmin Ratiu,
 Tariq Toukan
Subject: [PATCH net-next V5 10/11] net/mlx5: Add traffic class scheduling support for vport QoS
Date: Thu, 5 Dec 2024 00:09:30 +0200
Message-ID: <20241204220931.254964-11-tariqt@nvidia.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20241204220931.254964-1-tariqt@nvidia.com>
References: <20241204220931.254964-1-tariqt@nvidia.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Carolina Jubran

Introduce support for traffic class (TC) scheduling on vports by
allowing the vport to own
multiple TC scheduling nodes. This patch enables more granular
control of QoS by defining three distinct QoS states for vports, each
providing unique scheduling behavior:

1. Regular QoS: The `sched_node` represents the vport directly,
   handling QoS as a single scheduling entity.
2. TC QoS on the vport: The `sched_node` acts as a TC arbiter,
   enabling TC scheduling directly on the vport.
3. TC QoS on the parent node: The `sched_node` functions as a rate
   limiter, with TC arbitration enabled at the parent level,
   associating multiple scheduling nodes with each vport.

Key changes include:

- Added support for new scheduling elements, vport traffic class and
  rate limiter.

- New helper functions for creating, destroying, and restoring vport
  TC scheduling nodes, handling transitions between regular QoS and
  TC arbitration states.

- Updated `esw_qos_vport_enable()` and `esw_qos_vport_disable()` to
  support both regular QoS and TC arbitration states, ensuring
  consistent transitions between scheduling modes.

- Introduced a `sched_nodes` array under `vport->qos` to store
  multiple TC scheduling nodes per vport, enabling finer control over
  per-TC QoS.

- Enhanced `esw_qos_vport_update_parent()` to handle transitions
  between the three QoS states based on the current and new parent
  node types.

This patch lays the groundwork for future support for configuring
tc-bw on vports. Although the infrastructure is in place, full
support for tc-bw is not yet implemented; attempts to set tc-bw on
vports will return `-EOPNOTSUPP`.

No functional changes are introduced at this stage.

Signed-off-by: Carolina Jubran
Reviewed-by: Cosmin Ratiu
Signed-off-by: Tariq Toukan
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 360 +++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  13 +-
 2 files changed, 352 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index b17c3a82d175..afb00deaae16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -65,12 +65,16 @@ enum sched_node_type {
 	SCHED_NODE_TYPE_VPORTS_TSAR,
 	SCHED_NODE_TYPE_VPORT,
 	SCHED_NODE_TYPE_TC_ARBITER_TSAR,
+	SCHED_NODE_TYPE_RATE_LIMITER,
+	SCHED_NODE_TYPE_VPORT_TC,
 };
 
 static const char * const sched_node_type_str[] = {
 	[SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR",
 	[SCHED_NODE_TYPE_VPORT] = "vport",
 	[SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR",
+	[SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter",
+	[SCHED_NODE_TYPE_VPORT_TC] = "vport TC",
 };
 
 struct mlx5_esw_sched_node {
@@ -92,6 +96,8 @@ struct mlx5_esw_sched_node {
 	struct list_head children;
 	/* Valid only if this node is associated with a vport. */
 	struct mlx5_vport *vport;
+	/* Valid only when this node represents a traffic class. */
+	u8 tc;
 };
 
 static int esw_qos_num_tcs(struct mlx5_core_dev *dev)
@@ -121,6 +127,14 @@ esw_qos_nodes_set_parent(struct list_head *nodes, struct mlx5_esw_sched_node *pa
 
 void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport)
 {
+	if (vport->qos.sched_nodes) {
+		int i, num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev);
+
+		for (i = 0; i < num_tcs; i++)
+			kfree(vport->qos.sched_nodes[i]);
+		kfree(vport->qos.sched_nodes);
+	}
+
 	kfree(vport->qos.sched_node);
 	memset(&vport->qos, 0, sizeof(vport->qos));
 }
@@ -145,11 +159,17 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport)
 static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op)
 {
 	switch (node->type) {
+	case SCHED_NODE_TYPE_VPORT_TC:
+		esw_warn(node->esw->dev,
+			 "E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n",
+			 op, sched_node_type_str[node->type], node->vport->vport, node->tc, err);
+		break;
 	case SCHED_NODE_TYPE_VPORT:
 		esw_warn(node->esw->dev,
 			 "E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n",
 			 op, sched_node_type_str[node->type], node->vport->vport, err);
 		break;
+	case SCHED_NODE_TYPE_RATE_LIMITER:
 	case SCHED_NODE_TYPE_TC_ARBITER_TSAR:
 	case SCHED_NODE_TYPE_VPORTS_TSAR:
 		esw_warn(node->esw->dev,
@@ -243,6 +263,23 @@ static int esw_qos_sched_elem_config(struct mlx5_esw_sched_node *node, u32 max_r
 	return 0;
 }
 
+static int esw_qos_create_rate_limit_element(struct mlx5_esw_sched_node *node,
+					     struct netlink_ext_ack *extack)
+{
+	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+
+	if (!mlx5_qos_element_type_supported(node->esw->dev,
+					     SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT,
+					     SCHEDULING_HIERARCHY_E_SWITCH))
+		return -EOPNOTSUPP;
+
+	MLX5_SET(scheduling_context, sched_ctx, max_average_bw, node->max_rate);
+	MLX5_SET(scheduling_context, sched_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT);
+
+	return esw_qos_node_create_sched_element(node, sched_ctx, extack);
+}
+
 static u32 esw_qos_calculate_min_rate_divider(struct mlx5_eswitch *esw,
 					      struct mlx5_esw_sched_node *parent)
 {
@@ -379,6 +416,31 @@ static int esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_
 	return esw_qos_node_create_sched_element(vport_node, sched_ctx, extack);
 }
 
+static int esw_qos_vport_tc_create_sched_element(struct mlx5_esw_sched_node *vport_tc_node,
+						 u32 rate_limit_elem_ix,
+						 struct netlink_ext_ack *extack)
+{
+	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+	struct mlx5_core_dev *dev = vport_tc_node->esw->dev;
+	void *attr;
+
+	if (!mlx5_qos_element_type_supported(dev,
+					     SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC,
+					     SCHEDULING_HIERARCHY_E_SWITCH))
+		return -EOPNOTSUPP;
+
+	MLX5_SET(scheduling_context, sched_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC);
+	attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes);
+	MLX5_SET(vport_tc_element, attr, vport_number, vport_tc_node->vport->vport);
+	MLX5_SET(vport_tc_element, attr, traffic_class, vport_tc_node->tc);
+	MLX5_SET(scheduling_context, sched_ctx, max_bw_obj_id, rate_limit_elem_ix);
+	MLX5_SET(scheduling_context, sched_ctx, parent_element_id, vport_tc_node->parent->ix);
+	MLX5_SET(scheduling_context, sched_ctx, bw_share, vport_tc_node->bw_share);
+
+	return esw_qos_node_create_sched_element(vport_tc_node, sched_ctx, extack);
+}
+
 static struct mlx5_esw_sched_node *
 __esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type type,
 		     struct mlx5_esw_sched_node *parent)
@@ -575,12 +637,169 @@ static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node,
 	return -EOPNOTSUPP;
 }
 
+static int esw_qos_create_vport_tc_sched_node(struct mlx5_vport *vport,
+					      u32 rate_limit_elem_ix,
+					      struct mlx5_esw_sched_node *vports_tc_node,
+					      struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	struct mlx5_esw_sched_node *vport_tc_node;
+	u8 tc = vports_tc_node->tc;
+	int err;
+
+	vport_tc_node = __esw_qos_alloc_node(vport_node->esw, 0, SCHED_NODE_TYPE_VPORT_TC,
+					     vports_tc_node);
+	if (!vport_tc_node)
+		return -ENOMEM;
+
+	vport_tc_node->min_rate = vport_node->min_rate;
+	vport_tc_node->tc = tc;
+	vport_tc_node->vport = vport;
+	err = esw_qos_vport_tc_create_sched_element(vport_tc_node, rate_limit_elem_ix, extack);
+	if (err)
+		goto err_out;
+
+	vport->qos.sched_nodes[tc] = vport_tc_node;
+
+	return 0;
+err_out:
+	__esw_qos_free_node(vport_tc_node);
+	return err;
+}
+
+static void esw_qos_destroy_vport_tc_sched_elements(struct mlx5_vport *vport,
+						    struct netlink_ext_ack *extack)
+{
+	int i, num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev);
+
+	for (i = 0; i < num_tcs; i++) {
+		if (vport->qos.sched_nodes[i])
+			__esw_qos_destroy_node(vport->qos.sched_nodes[i], extack);
+	}
+
+	kfree(vport->qos.sched_nodes);
+	vport->qos.sched_nodes = NULL;
+}
+
+static int esw_qos_create_vport_tc_sched_elements(struct mlx5_vport *vport,
+						  enum sched_node_type type,
+						  struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	struct mlx5_esw_sched_node *tc_arbiter_node, *vports_tc_node;
+	int err, num_tcs = esw_qos_num_tcs(vport_node->esw->dev);
+	u32 rate_limit_elem_ix;
+
+	vport->qos.sched_nodes = kcalloc(num_tcs, sizeof(struct mlx5_esw_sched_node *), GFP_KERNEL);
+	if (!vport->qos.sched_nodes) {
+		NL_SET_ERR_MSG_MOD(extack, "Allocating the vport TC scheduling elements failed.");
+		return -ENOMEM;
+	}
+
+	rate_limit_elem_ix = type == SCHED_NODE_TYPE_RATE_LIMITER ? vport_node->ix : 0;
+	tc_arbiter_node = type == SCHED_NODE_TYPE_RATE_LIMITER ? vport_node->parent : vport_node;
+	list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) {
+		err = esw_qos_create_vport_tc_sched_node(vport, rate_limit_elem_ix, vports_tc_node,
+							 extack);
+		if (err)
+			goto err_create_vport_tc;
+	}
+
+	return 0;
+
+err_create_vport_tc:
+	esw_qos_destroy_vport_tc_sched_elements(vport, NULL);
+
+	return err;
+}
+
+static int esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type,
+				   struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	int err;
+
+	if (type == SCHED_NODE_TYPE_TC_ARBITER_TSAR &&
+	    MLX5_CAP_QOS(vport_node->esw->dev, log_esw_max_sched_depth) < 2) {
+		NL_SET_ERR_MSG_MOD(extack, "Setting up TC Arbiter for a vport is not supported.");
+		return -EOPNOTSUPP;
+	}
+
+	esw_assert_qos_lock_held(vport->dev->priv.eswitch);
+
+	if (type == SCHED_NODE_TYPE_RATE_LIMITER)
+		err = esw_qos_create_rate_limit_element(vport_node, extack);
+	else
+		err = esw_qos_tc_arbiter_scheduling_setup(vport_node, extack);
+	if (err)
+		return err;
+
+	/* Rate limiters impact multiple nodes not directly connected to them
+	 * and are not direct members of the QoS hierarchy.
+	 * Unlink it from the parent to reflect that.
+	 */
+	if (type == SCHED_NODE_TYPE_RATE_LIMITER)
+		list_del_init(&vport_node->entry);
+
+	err = esw_qos_create_vport_tc_sched_elements(vport, type, extack);
+	if (err)
+		goto err_sched_nodes;
+
+	return 0;
+
+err_sched_nodes:
+	if (type == SCHED_NODE_TYPE_RATE_LIMITER) {
+		esw_qos_node_destroy_sched_element(vport_node, NULL);
+		list_add_tail(&vport_node->entry, &vport_node->parent->children);
+	} else {
+		esw_qos_tc_arbiter_scheduling_teardown(vport_node, NULL);
+	}
+	return err;
+}
+
+static void esw_qos_vport_tc_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	enum sched_node_type curr_type = vport_node->type;
+
+	esw_qos_destroy_vport_tc_sched_elements(vport, extack);
+
+	if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER)
+		esw_qos_node_destroy_sched_element(vport_node, extack);
+	else
+		esw_qos_tc_arbiter_scheduling_teardown(vport_node, extack);
+}
+
+static int esw_qos_set_vport_tcs_min_rate(struct mlx5_vport *vport, u32 min_rate,
+					  struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	int err, i, num_tcs = esw_qos_num_tcs(vport_node->esw->dev);
+
+	for (i = 0; i < num_tcs; i++) {
+		err = esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], min_rate, extack);
+		if (err)
+			goto err_out;
+	}
+	vport_node->min_rate = min_rate;
+
+	return 0;
+err_out:
+	for (--i; i >= 0; i--)
+		esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], vport_node->min_rate, extack);
+	return err;
+}
+
 static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack)
 {
 	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
 	struct mlx5_esw_sched_node *parent = vport_node->parent;
+	enum sched_node_type curr_type = vport_node->type;
 
-	esw_qos_node_destroy_sched_element(vport_node, extack);
+	if (curr_type == SCHED_NODE_TYPE_VPORT)
+		esw_qos_node_destroy_sched_element(vport_node, extack);
+	else
+		esw_qos_vport_tc_disable(vport, extack);
 
 	vport_node->bw_share = 0;
 	list_del_init(&vport_node->entry);
@@ -589,7 +808,8 @@ static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_a
 	trace_mlx5_esw_vport_qos_destroy(vport_node->esw->dev, vport);
 }
 
-static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
+static int esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_type type,
+				struct mlx5_esw_sched_node *parent,
 				struct netlink_ext_ack *extack)
 {
 	int err;
@@ -597,10 +817,14 @@ static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_
 	esw_assert_qos_lock_held(vport->dev->priv.eswitch);
 	esw_qos_node_set_parent(vport->qos.sched_node, parent);
 
-	err = esw_qos_vport_create_sched_element(vport->qos.sched_node, extack);
+	if (type == SCHED_NODE_TYPE_VPORT)
+		err = esw_qos_vport_create_sched_element(vport->qos.sched_node, extack);
+	else
+		err = esw_qos_vport_tc_enable(vport, type, extack);
 	if (err)
 		return err;
 
+	vport->qos.sched_node->type = type;
 	esw_qos_normalize_min_rate(parent->esw, parent, extack);
 
 	return 0;
@@ -628,7 +852,7 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t
 	sched_node->min_rate = min_rate;
 	sched_node->vport = vport;
 	vport->qos.sched_node = sched_node;
-	err = esw_qos_vport_enable(vport, parent, extack);
+	err = esw_qos_vport_enable(vport, type, parent, extack);
 	if (err)
 		esw_qos_put(esw);
 
@@ -680,6 +904,8 @@ static int mlx5_esw_qos_set_vport_min_rate(struct mlx5_vport *vport, u32 min_rat
 	if (!vport_node)
 		return mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, NULL, 0, min_rate,
 						 extack);
+	else if (vport_node->type == SCHED_NODE_TYPE_RATE_LIMITER)
+		return esw_qos_set_vport_tcs_min_rate(vport, min_rate, extack);
 	else
 		return esw_qos_set_node_min_rate(vport_node, min_rate, extack);
 }
@@ -712,12 +938,59 @@ bool mlx5_esw_qos_get_vport_rate(struct mlx5_vport *vport, u32 *max_rate, u32 *m
 	return enabled;
 }
 
+static int esw_qos_vport_tc_check_type(enum sched_node_type curr_type,
+				       enum sched_node_type new_type,
+				       struct netlink_ext_ack *extack)
+{
+	if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR &&
+	    new_type == SCHED_NODE_TYPE_RATE_LIMITER) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Cannot switch from vport-level TC arbitration to node-level TC arbitration");
+		return -EOPNOTSUPP;
+	}
+
+	if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER &&
+	    new_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Cannot switch from node-level TC arbitration to vport-level TC arbitration");
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int esw_qos_vport_update(struct mlx5_vport *vport, enum sched_node_type type,
+				struct mlx5_esw_sched_node *parent,
+				struct netlink_ext_ack *extack)
+{
+	struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent;
+	enum sched_node_type curr_type = vport->qos.sched_node->type;
+	int err;
+
+	esw_assert_qos_lock_held(vport->dev->priv.eswitch);
+	parent = parent ?: curr_parent;
+	if (curr_type == type && curr_parent == parent)
+		return 0;
+
+	err = esw_qos_vport_tc_check_type(curr_type, type, extack);
+	if (err)
+		return err;
+
+	esw_qos_vport_disable(vport, extack);
+
+	err = esw_qos_vport_enable(vport, type, parent, extack);
+	if (err)
+		esw_qos_vport_enable(vport, curr_type, curr_parent, NULL);
+
+	return err;
+}
+
 static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
 				       struct netlink_ext_ack *extack)
 {
 	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
 	struct mlx5_esw_sched_node *curr_parent;
-	int err;
+	enum sched_node_type type;
 
 	esw_assert_qos_lock_held(esw);
 	curr_parent = vport->qos.sched_node->parent;
@@ -725,16 +998,17 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw
 	if (curr_parent == parent)
 		return 0;
 
-	esw_qos_vport_disable(vport, extack);
-
-	err = esw_qos_vport_enable(vport, parent, extack);
-	if (err) {
-		if (esw_qos_vport_enable(vport, curr_parent, NULL))
-			esw_warn(parent->esw->dev, "vport restore QoS failed (vport=%d)\n",
-				 vport->vport);
-	}
+	/* Set vport QoS type based on parent node type if different from default QoS;
+	 * otherwise, use the vport's current QoS type.
+	 */
+	if (parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+		type = SCHED_NODE_TYPE_RATE_LIMITER;
+	else if (curr_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+		type = SCHED_NODE_TYPE_VPORT;
+	else
+		type = vport->qos.sched_node->type;
 
-	return err;
+	return esw_qos_vport_update(vport, type, parent, extack);
 }
 
 static void esw_qos_switch_vport_tcs_to_vport(struct mlx5_esw_sched_node *tc_arbiter_node,
@@ -1025,6 +1299,14 @@ static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw, u32 *tc
 	return true;
 }
 
+static bool esw_qos_vport_validate_unsupported_tc_bw(struct mlx5_vport *vport, u32 *tc_bw)
+{
+	struct mlx5_eswitch *esw = vport->qos.sched_node ?
+				   vport->qos.sched_node->parent->esw : vport->dev->priv.eswitch;
+
+	return esw_qos_validate_unsupported_tc_bw(esw, tc_bw);
+}
+
 static bool esw_qos_tc_bw_disabled(u32 *tc_bw)
 {
 	int i;
@@ -1098,8 +1380,44 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *
 int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, void *priv,
 					 u32 *tc_bw, struct netlink_ext_ack *extack)
 {
-	NL_SET_ERR_MSG_MOD(extack, "TC bandwidth shares are not supported on leafs");
-	return -EOPNOTSUPP;
+	struct mlx5_esw_sched_node *vport_node;
+	struct mlx5_vport *vport = priv;
+	struct mlx5_eswitch *esw;
+	bool disable;
+	int err = 0;
+
+	esw = vport->dev->priv.eswitch;
+	if (!mlx5_esw_allowed(esw))
+		return -EPERM;
+
+	disable = esw_qos_tc_bw_disabled(tc_bw);
+	esw_qos_lock(esw);
+
+	if (!esw_qos_vport_validate_unsupported_tc_bw(vport, tc_bw)) {
+		NL_SET_ERR_MSG_MOD(extack, "E-Switch traffic classes number is not supported");
+		err = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	vport_node = vport->qos.sched_node;
+	if (disable && !vport_node)
+		goto unlock;
+
+	if (disable && vport_node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+		err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_VPORT, NULL, extack);
+		goto unlock;
+	}
+
+	if (!vport_node) {
+		err = mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_TC_ARBITER_TSAR, NULL, 0, 0,
+						extack);
+		vport_node = vport->qos.sched_node;
+	} else {
+		err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_TC_ARBITER_TSAR, NULL, extack);
+	}
+unlock:
+	esw_qos_unlock(esw);
+	return err;
 }
 
 int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, void *priv,
@@ -1218,10 +1536,14 @@ int mlx5_esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_s
 	}
 
 	esw_qos_lock(esw);
-	if (!vport->qos.sched_node && parent)
-		err = mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, parent, 0, 0, extack);
-	else if (vport->qos.sched_node)
+	if (!vport->qos.sched_node && parent) {
+		enum sched_node_type type = parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR ?
+					    SCHED_NODE_TYPE_RATE_LIMITER : SCHED_NODE_TYPE_VPORT;
+
+		err = mlx5_esw_qos_vport_enable(vport, type, parent, 0, 0, extack);
+	} else if (vport->qos.sched_node) {
 		err = esw_qos_vport_update_parent(vport, parent, extack);
+	}
 	esw_qos_unlock(esw);
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index a83d41121db6..9b0be25631df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -212,10 +212,19 @@ struct mlx5_vport {
 	struct mlx5_vport_info info;
 
-	/* Protected with the E-Switch qos domain lock. */
+	/* Protected with the E-Switch qos domain lock. The Vport QoS can
+	 * either be disabled (sched_node is NULL) or in one of three states:
+	 * 1. Regular QoS (sched_node is a vport node).
+	 * 2. TC QoS enabled on the vport (sched_node is a TC arbiter).
+	 * 3. TC QoS enabled on the vport's parent node
+	 *    (sched_node is a rate limit node).
+	 * When TC is enabled in either mode, the vport owns vport TC scheduling nodes.
+	 */
 	struct {
-		/* Vport scheduling element node. */
+		/* Vport scheduling node. */
 		struct mlx5_esw_sched_node *sched_node;
+		/* Array of vport traffic class scheduling nodes. */
+		struct mlx5_esw_sched_node **sched_nodes;
 	} qos;
 
 	u16 vport;
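
For readers tracking the state machine, the parent-type rule in
esw_qos_vport_update_parent() above is easy to mis-read in diff form.
The standalone userspace sketch below (not part of the patch; the
helper name pick_vport_node_type() is hypothetical and exists only
for illustration) mirrors that rule and the three QoS states from the
commit message:

/* Illustrative sketch: models the decision in
 * esw_qos_vport_update_parent() that maps a vport's scheduling-node
 * type to one of the three QoS states. Enum values copied from the
 * patch; everything else here is a simplified stand-in.
 */
#include <stdio.h>

enum sched_node_type {
	SCHED_NODE_TYPE_VPORTS_TSAR,
	SCHED_NODE_TYPE_VPORT,           /* state 1: regular QoS */
	SCHED_NODE_TYPE_TC_ARBITER_TSAR, /* state 2: TC QoS on the vport */
	SCHED_NODE_TYPE_RATE_LIMITER,    /* state 3: TC QoS on the parent */
	SCHED_NODE_TYPE_VPORT_TC,
};

/* Mirror of the parent-type rule in esw_qos_vport_update_parent(). */
static enum sched_node_type
pick_vport_node_type(enum sched_node_type curr_type,
		     enum sched_node_type new_parent_type,
		     enum sched_node_type curr_parent_type)
{
	/* A TC-arbiter parent turns the vport node into a rate limiter. */
	if (new_parent_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
		return SCHED_NODE_TYPE_RATE_LIMITER;
	/* Leaving a TC-arbiter parent restores regular vport QoS. */
	if (curr_parent_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
		return SCHED_NODE_TYPE_VPORT;
	/* Otherwise the vport keeps its current QoS type. */
	return curr_type;
}

int main(void)
{
	/* Plain vport moved under a TC arbiter -> rate limiter (state 3). */
	printf("%d\n", pick_vport_node_type(SCHED_NODE_TYPE_VPORT,
					    SCHED_NODE_TYPE_TC_ARBITER_TSAR,
					    SCHED_NODE_TYPE_VPORTS_TSAR));
	/* Moved back under a plain vports TSAR -> regular QoS (state 1). */
	printf("%d\n", pick_vport_node_type(SCHED_NODE_TYPE_RATE_LIMITER,
					    SCHED_NODE_TYPE_VPORTS_TSAR,
					    SCHED_NODE_TYPE_TC_ARBITER_TSAR));
	return 0;
}

In short: reparenting under a TC-arbiter node demotes the vport's
sched_node to a rate limiter, reparenting away from one restores a
regular vport node, and any other move preserves the current type,
including a vport-level TC arbiter.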