diff mbox series

[net,1/8] net/mlx5: Lag, do bond only if slaves agree on roce state

Message ID 20240522192659.840796-2-tariqt@nvidia.com (mailing list archive)
State Accepted
Commit 51ef9305b8f40946d65c40368ffb4c14636d369a
Delegated to: Netdev Maintainers
Headers show
Series mlx5 fixes 24-05-22 | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 905 this patch: 905
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 3 maintainers not CCed: mbloch@nvidia.com shayd@nvidia.com linux-rdma@vger.kernel.org
netdev/build_clang success Errors and warnings before: 909 this patch: 909
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 909 this patch: 909
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-23--12-00 (tests: 1038)

Commit Message

Tariq Toukan May 22, 2024, 7:26 p.m. UTC
From: Maher Sanalla <msanalla@nvidia.com>

Currently, the driver does not enforce that lag bond slaves must have
matching roce capabilities. Yet, in mlx5_do_bond(), the driver attempts
to enable roce on all vports of the bond slaves, causing the following
syndrome when one slave has no roce fw support:

mlx5_cmd_out_err:809:(pid 25427): MODIFY_NIC_VPORT_CONTEXT(0×755) op_mod(0×0)
failed, status bad parameter(0×3), syndrome (0xc1f678), err(-22)

Thus, create HW lag only if bond's slaves agree on roce state,
either all slaves have roce support resulting in a roce lag bond,
or none do, resulting in a raw eth bond.

Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Simon Horman May 23, 2024, 9:45 a.m. UTC | #1
On Wed, May 22, 2024 at 10:26:52PM +0300, Tariq Toukan wrote:
> From: Maher Sanalla <msanalla@nvidia.com>
> 
> Currently, the driver does not enforce that lag bond slaves must have
> matching roce capabilities. Yet, in mlx5_do_bond(), the driver attempts
> to enable roce on all vports of the bond slaves, causing the following
> syndrome when one slave has no roce fw support:
> 
> mlx5_cmd_out_err:809:(pid 25427): MODIFY_NIC_VPORT_CONTEXT(0×755) op_mod(0×0)
> failed, status bad parameter(0×3), syndrome (0xc1f678), err(-22)
> 
> Thus, create HW lag only if bond's slaves agree on roce state,
> either all slaves have roce support resulting in a roce lag bond,
> or none do, resulting in a raw eth bond.
> 
> Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature")
> Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index f7f0476a4a58..d0871c46b8c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -719,6 +719,7 @@  bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
 	struct mlx5_core_dev *dev;
 	u8 mode;
 #endif
+	bool roce_support;
 	int i;
 
 	for (i = 0; i < ldev->ports; i++)
@@ -743,6 +744,11 @@  bool mlx5_lag_check_prereq(struct mlx5_lag *ldev)
 		if (mlx5_sriov_is_enabled(ldev->pf[i].dev))
 			return false;
 #endif
+	roce_support = mlx5_get_roce_state(ldev->pf[MLX5_LAG_P1].dev);
+	for (i = 1; i < ldev->ports; i++)
+		if (mlx5_get_roce_state(ldev->pf[i].dev) != roce_support)
+			return false;
+
 	return true;
 }
 
@@ -910,8 +916,10 @@  static void mlx5_do_bond(struct mlx5_lag *ldev)
 		} else if (roce_lag) {
 			dev0->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
 			mlx5_rescan_drivers_locked(dev0);
-			for (i = 1; i < ldev->ports; i++)
-				mlx5_nic_vport_enable_roce(ldev->pf[i].dev);
+			for (i = 1; i < ldev->ports; i++) {
+				if (mlx5_get_roce_state(ldev->pf[i].dev))
+					mlx5_nic_vport_enable_roce(ldev->pf[i].dev);
+			}
 		} else if (shared_fdb) {
 			int i;