diff mbox series

[net,4/4] net/mlx5e: Keep netdev when leave switchdev for devlink set legacy only

Message ID 20241220081505.1286093-5-tariqt@nvidia.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series mlx5 misc fixes 2024-12-20 | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/build_tools success Errors and warnings before: 0 (+0) this patch: 0 (+0)
netdev/cc_maintainers fail 1 blamed authors not CCed: roid@nvidia.com; 2 maintainers not CCed: linux-rdma@vger.kernel.org roid@nvidia.com
netdev/build_clang success Errors and warnings before: 258 this patch: 258
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 61 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-12-20--09-00 (tests: 879)

Commit Message

Tariq Toukan Dec. 20, 2024, 8:15 a.m. UTC
From: Jianbo Liu <jianbol@nvidia.com>

In the cited commit, when changing from switchdev to legacy mode,
uplink representor's netdev is kept, and its profile is replaced with
nic profile, so netdev is detached from old profile, then attach to
new profile.

During profile change, the hardware resources allocated by the old
profile will be cleaned up. However, the cleanup is relying on the
related kernel modules. And they may need to flush themselves first,
which is triggered by netdev events, for example, NETDEV_UNREGISTER.
However, netdev is kept, or netdev_register is called after the
cleanup, which may cause troubles because the resources are still
referred by kernel modules.

The same process applies to all the caes when uplink is leaving
switchdev mode, including devlink eswitch mode set legacy, driver
unload and devlink reload. For the first one, it can be blocked and
returns failure to users, whenever possible. But it's hard for the
others. Besides, the attachment to nic profile is unnecessary as the
netdev will be unregistered anyway for such cases.

So in this patch, the original behavior is kept only for devlink
eswitch set mode legacy. For the others, moves netdev unregistration
before the profile change.

Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 19 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  | 15 +++++++++++++++
 .../mellanox/mlx5/core/eswitch_offloads.c     |  2 ++
 include/linux/mlx5/driver.h                   |  1 +
 4 files changed, 35 insertions(+), 2 deletions(-)

Comments

Przemek Kitszel Dec. 20, 2024, 8:48 a.m. UTC | #1
On 12/20/24 09:15, Tariq Toukan wrote:
> From: Jianbo Liu <jianbol@nvidia.com>
> 
> In the cited commit, when changing from switchdev to legacy mode,
> uplink representor's netdev is kept, and its profile is replaced with
> nic profile, so netdev is detached from old profile, then attach to
> new profile.
> 
> During profile change, the hardware resources allocated by the old
> profile will be cleaned up. However, the cleanup is relying on the
> related kernel modules. And they may need to flush themselves first,
> which is triggered by netdev events, for example, NETDEV_UNREGISTER.
> However, netdev is kept, or netdev_register is called after the
> cleanup, which may cause troubles because the resources are still
> referred by kernel modules.
> 
> The same process applies to all the caes when uplink is leaving

case

> switchdev mode, including devlink eswitch mode set legacy, driver
> unload and devlink reload. For the first one, it can be blocked and
> returns failure to users, whenever possible. But it's hard for the
> others. Besides, the attachment to nic profile is unnecessary as the
> netdev will be unregistered anyway for such cases.
> 
> So in this patch, the original behavior is kept only for devlink
> eswitch set mode legacy. For the others, moves netdev unregistration
> before the profile change.
> 
> Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
> Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>   .../net/ethernet/mellanox/mlx5/core/en_main.c | 19 +++++++++++++++++--
>   .../net/ethernet/mellanox/mlx5/core/en_rep.c  | 15 +++++++++++++++
>   .../mellanox/mlx5/core/eswitch_offloads.c     |  2 ++
>   include/linux/mlx5/driver.h                   |  1 +
>   4 files changed, 35 insertions(+), 2 deletions(-)
> 

sorry for nitpick-only review
I didn't spotted anything bad in the logic through the series OTOH

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index dd16d73000c3..0ec17c276bdd 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -6542,8 +6542,23 @@ static void _mlx5e_remove(struct auxiliary_device *adev)
>   
>   	mlx5_core_uplink_netdev_set(mdev, NULL);
>   	mlx5e_dcbnl_delete_app(priv);
> -	unregister_netdev(priv->netdev);
> -	_mlx5e_suspend(adev, false);
> +	/* When unload driver, the netdev is in registered state

/*
  * Netdev dropped the special comment allowance rule,
  * now you have to put one line almost blank at the front.
  */

> +	 * if it's from legacy mode. If from switchdev mode, it
> +	 * is already unregistered before changing to NIC profile.
> +	 */
> +	if (priv->netdev->reg_state == NETREG_REGISTERED) {
> +		unregister_netdev(priv->netdev);
> +		_mlx5e_suspend(adev, false);
> +	} else {
> +		struct mlx5_core_dev *pos;
> +		int i;
> +
> +		if (test_bit(MLX5E_STATE_DESTROYING, &priv->state))

you have more than one statement/expression inside the if,
so you must wrap with braces

> +			mlx5_sd_for_each_dev(i, mdev, pos)
> +				mlx5e_destroy_mdev_resources(pos);
> +		else
> +			_mlx5e_suspend(adev, true);
> +	}
>   	/* Avoid cleanup if profile rollback failed. */
>   	if (priv->profile)
>   		priv->profile->cleanup(priv);
Jakub Kicinski Dec. 20, 2024, 2:46 p.m. UTC | #2
On Fri, 20 Dec 2024 09:48:11 +0100 Przemek Kitszel wrote:
> >   	mlx5_core_uplink_netdev_set(mdev, NULL);
> >   	mlx5e_dcbnl_delete_app(priv);
> > -	unregister_netdev(priv->netdev);
> > -	_mlx5e_suspend(adev, false);
> > +	/* When unload driver, the netdev is in registered state  
> 
> /*
>   * Netdev dropped the special comment allowance rule,
>   * now you have to put one line almost blank at the front.
>   */

Incorrect, we still prefer the old comment style, we just give a pass 
now to people who have a strong preference the opposite way.

> > +	 * if it's from legacy mode. If from switchdev mode, it
> > +	 * is already unregistered before changing to NIC profile.
> > +	 */
> > +	if (priv->netdev->reg_state == NETREG_REGISTERED) {
> > +		unregister_netdev(priv->netdev);
> > +		_mlx5e_suspend(adev, false);
> > +	} else {
> > +		struct mlx5_core_dev *pos;
> > +		int i;
> > +
> > +		if (test_bit(MLX5E_STATE_DESTROYING, &priv->state))  
> 
> you have more than one statement/expression inside the if,
> so you must wrap with braces

I'm not aware of that as a hard rule either.
Przemek Kitszel Dec. 20, 2024, 3:10 p.m. UTC | #3
On 12/20/24 15:46, Jakub Kicinski wrote:
> On Fri, 20 Dec 2024 09:48:11 +0100 Przemek Kitszel wrote:
>>>    	mlx5_core_uplink_netdev_set(mdev, NULL);
>>>    	mlx5e_dcbnl_delete_app(priv);
>>> -	unregister_netdev(priv->netdev);
>>> -	_mlx5e_suspend(adev, false);
>>> +	/* When unload driver, the netdev is in registered state
>>
>> /*
>>    * Netdev dropped the special comment allowance rule,
>>    * now you have to put one line almost blank at the front.
>>    */
> 
> Incorrect, we still prefer the old comment style, we just give a pass
> now to people who have a strong preference the opposite way.

good to know, I will pass it down to my folks

> 
>>> +	 * if it's from legacy mode. If from switchdev mode, it
>>> +	 * is already unregistered before changing to NIC profile.
>>> +	 */
>>> +	if (priv->netdev->reg_state == NETREG_REGISTERED) {
>>> +		unregister_netdev(priv->netdev);
>>> +		_mlx5e_suspend(adev, false);
>>> +	} else {
>>> +		struct mlx5_core_dev *pos;
>>> +		int i;
>>> +
>>> +		if (test_bit(MLX5E_STATE_DESTROYING, &priv->state))
>>
>> you have more than one statement/expression inside the if,
>> so you must wrap with braces
> 
> I'm not aware of that as a hard rule either.

[1] says
"Also, use braces when a loop contains more than a single simple statement:"

And I see that here we have and if() instead of a loop, but I believe
that the intent of "the rule" (not sure how hard) was to pet the brace
in such cases.
To be clear, I don't want to stop an otherwise good PR just for this!

[1] 
https://www.kernel.org/doc/html/latest/process/coding-style.html#placing-braces-and-spaces
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index dd16d73000c3..0ec17c276bdd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -6542,8 +6542,23 @@  static void _mlx5e_remove(struct auxiliary_device *adev)
 
 	mlx5_core_uplink_netdev_set(mdev, NULL);
 	mlx5e_dcbnl_delete_app(priv);
-	unregister_netdev(priv->netdev);
-	_mlx5e_suspend(adev, false);
+	/* When unload driver, the netdev is in registered state
+	 * if it's from legacy mode. If from switchdev mode, it
+	 * is already unregistered before changing to NIC profile.
+	 */
+	if (priv->netdev->reg_state == NETREG_REGISTERED) {
+		unregister_netdev(priv->netdev);
+		_mlx5e_suspend(adev, false);
+	} else {
+		struct mlx5_core_dev *pos;
+		int i;
+
+		if (test_bit(MLX5E_STATE_DESTROYING, &priv->state))
+			mlx5_sd_for_each_dev(i, mdev, pos)
+				mlx5e_destroy_mdev_resources(pos);
+		else
+			_mlx5e_suspend(adev, true);
+	}
 	/* Avoid cleanup if profile rollback failed. */
 	if (priv->profile)
 		priv->profile->cleanup(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 554f9cb5b53f..fdff9fd8a89e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1509,6 +1509,21 @@  mlx5e_vport_uplink_rep_unload(struct mlx5e_rep_priv *rpriv)
 
 	priv = netdev_priv(netdev);
 
+	/* This bit is set when using devlink to change eswitch mode from
+	 * switchdev to legacy. As need to keep uplink netdev ifindex, we
+	 * detach uplink representor profile and attach NIC profile only.
+	 * The netdev will be unregistered later when unload NIC auxiliary
+	 * driver for this case.
+	 * We explicitly block devlink eswitch mode change if any IPSec rules
+	 * offloaded, but can't block other cases, such as driver unload
+	 * and devlink reload. We have to unregister netdev before profile
+	 * change for those cases. This is to avoid resource leak because
+	 * the offloaded rules don't have the chance to be unoffloaded before
+	 * cleanup which is triggered by detach uplink representor profile.
+	 */
+	if (!(priv->mdev->priv.flags & MLX5_PRIV_FLAGS_SWITCH_LEGACY))
+		unregister_netdev(netdev);
+
 	mlx5e_netdev_attach_nic_profile(priv);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 40359f320724..06076dd9ec64 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3777,6 +3777,8 @@  int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	esw->eswitch_operation_in_progress = true;
 	up_write(&esw->mode_lock);
 
+	if (mode == DEVLINK_ESWITCH_MODE_LEGACY)
+		esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY;
 	mlx5_eswitch_disable_locked(esw);
 	if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
 		if (mlx5_devlink_trap_get_num_active(esw->dev)) {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index fc7e6153b73d..8f5991168ccd 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -524,6 +524,7 @@  enum {
 	 * creation/deletion on drivers rescan. Unset during device attach.
 	 */
 	MLX5_PRIV_FLAGS_DETACH = 1 << 2,
+	MLX5_PRIV_FLAGS_SWITCH_LEGACY = 1 << 3,
 };
 
 struct mlx5_adev {