[net-next,v2,5/5] devlink: Delete reload enable/disable interface

Message ID 06ebba9e115d421118b16ac4efda61c2e08f4d50.1633284302.git.leonro@nvidia.com (mailing list archive)
State Superseded
Series devlink reload simplification

Commit Message

Leon Romanovsky Oct. 3, 2021, 6:12 p.m. UTC
From: Leon Romanovsky <leonro@nvidia.com>

After changes to allow dynamically set the reload_up/_down callbacks,
we ensure that properly supported devlink ops are not accessible before
devlink_register, which is the last command in the initialization sequence.

This makes devlink_reload_enable/_disable irrelevant, so they can be
safely deleted.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../hisilicon/hns3/hns3pf/hclge_devlink.c     |  3 --
 .../hisilicon/hns3/hns3vf/hclgevf_devlink.c   |  3 --
 drivers/net/ethernet/mellanox/mlx4/main.c     |  2 -
 .../net/ethernet/mellanox/mlx5/core/main.c    |  3 --
 .../mellanox/mlx5/core/sf/dev/driver.c        |  5 +--
 drivers/net/ethernet/mellanox/mlxsw/core.c    | 10 ++---
 drivers/net/netdevsim/dev.c                   |  3 --
 include/net/devlink.h                         |  2 -
 net/core/devlink.c                            | 43 +------------------
 9 files changed, 5 insertions(+), 69 deletions(-)

Comments

Ido Schimmel Oct. 4, 2021, 2:19 p.m. UTC | #1
On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> After changes to allow dynamically set the reload_up/_down callbacks,
> we ensure that properly supported devlink ops are not accessible before
> devlink_register, which is last command in the initialization sequence.
> 
> It makes devlink_reload_enable/_disable not relevant anymore and can be
> safely deleted.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

[...]

> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index cb6645012a30..09e48fb232a9 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
>  
>  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
>  	devlink_register(devlink);
> -	devlink_reload_enable(devlink);
>  	return 0;
>  
>  err_psample_exit:
> @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
>  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
>  	struct devlink *devlink = priv_to_devlink(nsim_dev);
>  
> -	devlink_reload_disable(devlink);
>  	devlink_unregister(devlink);
> -
>  	nsim_dev_reload_destroy(nsim_dev);
>  
>  	nsim_bpf_dev_exit(nsim_dev);

I didn't remember why devlink_reload_{enable,disable}() were added in
the first place so it was not clear to me from the commit message why
they can be removed. It is described in commit a0c76345e3d3 ("devlink:
disallow reload operation during device cleanup") with a reproducer.

Tried the reproducer with this series and I cannot reproduce the issue.
Wasn't quite sure why, but it does not seem to be related to "changes to
allow dynamically set the reload_up/_down callbacks", as this seems to
be specific to mlx5.

IIUC, the reason that the race described in the above-mentioned commit can
no longer happen is related to the fact that devlink_unregister() is
called first in the device dismantle path, after your previous patches.
Since both the reload operation and devlink_unregister() hold
'devlink_mutex', it is not possible for the reload operation to race
with device dismantle.

Agree? If so, I think it would be good to explain this in the commit
message unless it's clear to everyone else.

Thanks
Leon Romanovsky Oct. 4, 2021, 3:45 p.m. UTC | #2
On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > After changes to allow dynamically set the reload_up/_down callbacks,
> > we ensure that properly supported devlink ops are not accessible before
> > devlink_register, which is last command in the initialization sequence.
> > 
> > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > safely deleted.
> > 
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> 
> [...]
> 
> > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > index cb6645012a30..09e48fb232a9 100644
> > --- a/drivers/net/netdevsim/dev.c
> > +++ b/drivers/net/netdevsim/dev.c
> > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> >  
> >  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> >  	devlink_register(devlink);
> > -	devlink_reload_enable(devlink);
> >  	return 0;
> >  
> >  err_psample_exit:
> > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> >  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> >  	struct devlink *devlink = priv_to_devlink(nsim_dev);
> >  
> > -	devlink_reload_disable(devlink);
> >  	devlink_unregister(devlink);
> > -
> >  	nsim_dev_reload_destroy(nsim_dev);
> >  
> >  	nsim_bpf_dev_exit(nsim_dev);
> 
> I didn't remember why devlink_reload_{enable,disable}() were added in
> the first place so it was not clear to me from the commit message why
> they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> disallow reload operation during device cleanup") with a reproducer.

They were added because devlink ops were accessible from user space very
early in the driver lifetime. All my latest devlink patches are an
attempt to fix this arch/design/implementation issue.

> 
> Tried the reproducer with this series and I cannot reproduce the issue.
> Wasn't quite sure why, but it does not seem to be related to "changes to
> allow dynamically set the reload_up/_down callbacks", as this seems to
> be specific to mlx5.

You could not reproduce it because of my series that moved
devlink_register()/devlink_unregister() to be the last/first commands in
the .probe()/.remove() flows.

The patch to allow dynamically setting ops was needed because mlx5 had
logic like this:
 if (something)
    devlink_reload_enable()

And I needed a way to keep this if ... condition.
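
A minimal sketch of how such a condition might be kept without an explicit enable call, assuming the driver installs the reload callbacks only when reload is supported and publishes the instance as the last probe step. Every name here (demo_ops, demo_devlink, demo_probe, demo_reload) and the choice of error codes is hypothetical; this is a userspace model, not the kernel API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops table; not the kernel structure. */
struct demo_ops {
	int (*reload_down)(void *priv);
	int (*reload_up)(void *priv);
};

struct demo_devlink {
	struct demo_ops ops;
	bool registered;	/* set as the very last probe step */
};

static int demo_reload_down(void *priv) { (void)priv; return 0; }
static int demo_reload_up(void *priv) { (void)priv; return 0; }

/* Probe: decide on reload support first, publish the instance last. */
static void demo_probe(struct demo_devlink *dl, bool reload_supported)
{
	assert(dl != NULL);

	dl->ops.reload_down = NULL;
	dl->ops.reload_up = NULL;
	if (reload_supported) {
		/* The "if (something)" condition now guards the callbacks. */
		dl->ops.reload_down = demo_reload_down;
		dl->ops.reload_up = demo_reload_up;
	}
	dl->registered = true;
}

/* A reload request is refused unless both callbacks were installed. */
static int demo_reload(struct demo_devlink *dl)
{
	int err;

	if (!dl->registered)
		return -ENODEV;
	if (!dl->ops.reload_down || !dl->ops.reload_up)
		return -EOPNOTSUPP;

	err = dl->ops.reload_down(dl);
	if (err)
		return err;
	return dl->ops.reload_up(dl);
}
```

In this model, demo_probe(&dl, false) followed by demo_reload(&dl) returns -EOPNOTSUPP, which has the same effect as never enabling reload for that instance.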

> 
> IIUC, the reason that the race described in above mentioned commit can
> no longer happen is related to the fact that devlink_unregister() is
> called first in the device dismantle path, after your previous patches.
> Since both the reload operation and devlink_unregister() hold
> 'devlink_mutex', it is not possible for the reload operation to race
> with device dismantle.
> 
> Agree? If so, I think it would be good to explain this in the commit
> message unless it's clear to everyone else.

I don't agree, for the very simple reason that devlink_mutex is going to be
removed very soon, and it is really not the reason why devlink reload is
safer now than before.

The reload can't race due to:
1. devlink_unregister(), which works as a barrier to stop accesses
from the user space.
2. reference counting that ensures that all in-flight commands are counted.
3. wait_for_completion that blocks till all commands are done.
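
The three points above can be sketched in userspace as follows. This is only an illustrative model under stated assumptions: a pthread mutex and condition variable stand in for the kernel's reference count and completion, and all names (demo_instance, demo_try_get, demo_put, demo_unregister) are hypothetical rather than taken from net/core/devlink.c:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of an instance guarded by a refcount and a completion. */
struct demo_instance {
	pthread_mutex_t lock;
	pthread_cond_t released;	/* plays the role of the completion */
	unsigned int refs;		/* 1 for registration + 1 per in-flight command */
	bool registered;
};

static void demo_register(struct demo_instance *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->released, NULL);
	d->refs = 1;
	d->registered = true;
}

/* Command entry: takes a reference, or fails once unregistration began. */
static bool demo_try_get(struct demo_instance *d)
{
	bool ok;

	pthread_mutex_lock(&d->lock);
	ok = d->registered;
	if (ok)
		d->refs++;
	pthread_mutex_unlock(&d->lock);
	return ok;
}

/* Command exit: the last reference dropped wakes the waiter. */
static void demo_put(struct demo_instance *d)
{
	pthread_mutex_lock(&d->lock);
	if (--d->refs == 0)
		pthread_cond_broadcast(&d->released);
	pthread_mutex_unlock(&d->lock);
}

static void demo_unregister(struct demo_instance *d)
{
	pthread_mutex_lock(&d->lock);
	d->registered = false;		/* 1. barrier: no new commands get in */
	d->refs--;			/* 2. drop the registration reference */
	while (d->refs != 0)		/* 3. block until in-flight commands finish */
		pthread_cond_wait(&d->released, &d->lock);
	pthread_mutex_unlock(&d->lock);
}
```

A command handler would bracket its work with demo_try_get() and demo_put(). If a reload is in flight when demo_unregister() runs, the unregister call sleeps until the reload calls demo_put(), so teardown code placed after it cannot overlap with the reload.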

Thanks

> 
> Thanks
Ido Schimmel Oct. 4, 2021, 4:54 p.m. UTC | #3
On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > we ensure that properly supported devlink ops are not accessible before
> > > devlink_register, which is last command in the initialization sequence.
> > > 
> > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > safely deleted.
> > > 
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > 
> > [...]
> > 
> > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > index cb6645012a30..09e48fb232a9 100644
> > > --- a/drivers/net/netdevsim/dev.c
> > > +++ b/drivers/net/netdevsim/dev.c
> > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > >  
> > >  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > >  	devlink_register(devlink);
> > > -	devlink_reload_enable(devlink);
> > >  	return 0;
> > >  
> > >  err_psample_exit:
> > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > >  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > >  	struct devlink *devlink = priv_to_devlink(nsim_dev);
> > >  
> > > -	devlink_reload_disable(devlink);
> > >  	devlink_unregister(devlink);
> > > -
> > >  	nsim_dev_reload_destroy(nsim_dev);
> > >  
> > >  	nsim_bpf_dev_exit(nsim_dev);
> > 
> > I didn't remember why devlink_reload_{enable,disable}() were added in
> > the first place so it was not clear to me from the commit message why
> > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > disallow reload operation during device cleanup") with a reproducer.
> 
> It was added because devlink ops were accessible by the user space very
> early in the driver lifetime. All my latest devlink patches are the
> attempt to fix this arch/design/implementation issue.

The reproducer in the commit message executed the reload after the
device was fully initialized. IIRC, the problem there was that nothing
prevented these two tasks from racing:

devlink dev reload netdevsim/netdevsim10
echo 10 > /sys/bus/netdevsim/del_device

The title also talks about forbidding reload during device cleanup.

> 
> > 
> > Tried the reproducer with this series and I cannot reproduce the issue.
> > Wasn't quite sure why, but it does not seem to be related to "changes to
> > allow dynamically set the reload_up/_down callbacks", as this seems to
> > be specific to mlx5.
> 
> You didn't reproduce because of my series that moved
> devlink_register()/devlink_unregister() to be last/first commands in
> .probe()/.remove() flows.

Agree, that is what I wrote in the next paragraph of my reply.

> 
> Patch to allow dynamically set ops was needed because mlx5 had logic
> like this:
>  if(something)
>     devlink_reload_enable()
> 
> And I needed a way to keep this if ... condition.
> 
> > 
> > IIUC, the reason that the race described in above mentioned commit can
> > no longer happen is related to the fact that devlink_unregister() is
> > called first in the device dismantle path, after your previous patches.
> > Since both the reload operation and devlink_unregister() hold
> > 'devlink_mutex', it is not possible for the reload operation to race
> > with device dismantle.
> > 
> > Agree? If so, I think it would be good to explain this in the commit
> > message unless it's clear to everyone else.
> 
> I don't agree for very simple reason that devlink_mutex is going to be
> removed very soon and it is really not a reason why devlink reload is
> safer now when before.
> 
> The reload can't race due to:
> 1. devlink_unregister(), which works as a barrier to stop accesses
> from the user space.
> 2. reference counting that ensures that all in-flight commands are counted.
> 3. wait_for_completion that blocks till all commands are done.

So the wait_for_completion() is what prevents the race, not
'devlink_mutex' that is taken later. This needs to be explained in the
commit message to make it clear why the removal is safe.

Thanks
Leon Romanovsky Oct. 4, 2021, 7:02 p.m. UTC | #4
On Mon, Oct 04, 2021 at 07:54:02PM +0300, Ido Schimmel wrote:
> On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > 
> > > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > > we ensure that properly supported devlink ops are not accessible before
> > > > devlink_register, which is last command in the initialization sequence.
> > > > 
> > > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > > safely deleted.
> > > > 
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > [...]
> > > 
> > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > > index cb6645012a30..09e48fb232a9 100644
> > > > --- a/drivers/net/netdevsim/dev.c
> > > > +++ b/drivers/net/netdevsim/dev.c
> > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > > >  
> > > >  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > >  	devlink_register(devlink);
> > > > -	devlink_reload_enable(devlink);
> > > >  	return 0;
> > > >  
> > > >  err_psample_exit:
> > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > >  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > >  	struct devlink *devlink = priv_to_devlink(nsim_dev);
> > > >  
> > > > -	devlink_reload_disable(devlink);
> > > >  	devlink_unregister(devlink);
> > > > -
> > > >  	nsim_dev_reload_destroy(nsim_dev);
> > > >  
> > > >  	nsim_bpf_dev_exit(nsim_dev);
> > > 
> > > I didn't remember why devlink_reload_{enable,disable}() were added in
> > > the first place so it was not clear to me from the commit message why
> > > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > > disallow reload operation during device cleanup") with a reproducer.
> > 
> > It was added because devlink ops were accessible by the user space very
> > early in the driver lifetime. All my latest devlink patches are the
> > attempt to fix this arch/design/implementation issue.
> 
> The reproducer in the commit message executed the reload after the
> device was fully initialized. IIRC, the problem there was that nothing
> prevented these two tasks from racing:
> 
> devlink dev reload netdevsim/netdevsim10
> echo 10 > /sys/bus/netdevsim/del_device
> 
> The title also talks about forbidding reload during device cleanup.

The title and the reproducer are incomplete. In our verification, we observed
more than 40 bugs related to devlink reload flows and races around them.

> 
> > 
> > > 
> > > Tried the reproducer with this series and I cannot reproduce the issue.
> > > Wasn't quite sure why, but it does not seem to be related to "changes to
> > > allow dynamically set the reload_up/_down callbacks", as this seems to
> > > be specific to mlx5.
> > 
> > You didn't reproduce because of my series that moved
> > devlink_register()/devlink_unregister() to be last/first commands in
> > .probe()/.remove() flows.
> 
> Agree, that is what I wrote in the next paragraph of my reply.
> 
> > 
> > Patch to allow dynamically set ops was needed because mlx5 had logic
> > like this:
> >  if(something)
> >     devlink_reload_enable()
> > 
> > And I needed a way to keep this if ... condition.
> > 
> > > 
> > > IIUC, the reason that the race described in above mentioned commit can
> > > no longer happen is related to the fact that devlink_unregister() is
> > > called first in the device dismantle path, after your previous patches.
> > > Since both the reload operation and devlink_unregister() hold
> > > 'devlink_mutex', it is not possible for the reload operation to race
> > > with device dismantle.
> > > 
> > > Agree? If so, I think it would be good to explain this in the commit
> > > message unless it's clear to everyone else.
> > 
> > I don't agree for very simple reason that devlink_mutex is going to be
> > removed very soon and it is really not a reason why devlink reload is
> > safer now when before.
> > 
> > The reload can't race due to:
> > 1. devlink_unregister(), which works as a barrier to stop accesses
> > from the user space.
> > 2. reference counting that ensures that all in-flight commands are counted.
> > 3. wait_for_completion that blocks till all commands are done.
> 
> So the wait_for_completion() is what prevents the race, not
> 'devlink_mutex' that is taken later. This needs to be explained in the
> commit message to make it clear why the removal is safe.

Can you please suggest what exactly I should write in the commit message
to make it clear?

I'm too deep into this devlink stuff already, and for me this patch is
trivial. IMHO, that change doesn't need an explanation at all because
the coding pattern of refcount + wait_for_completion is pretty common in
the kernel. So I think that I explained it well enough: the move of
devlink_register/devlink_unregister obsoletes the devlink_reload_* APIs.

I have no problem updating the commit message, just help me with the
message.

Thanks

> 
> Thanks
Ido Schimmel Oct. 5, 2021, 6:10 a.m. UTC | #5
On Mon, Oct 04, 2021 at 10:02:06PM +0300, Leon Romanovsky wrote:
> On Mon, Oct 04, 2021 at 07:54:02PM +0300, Ido Schimmel wrote:
> > On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> > > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > 
> > > > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > > > we ensure that properly supported devlink ops are not accessible before
> > > > > devlink_register, which is last command in the initialization sequence.
> > > > > 
> > > > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > > > safely deleted.
> > > > > 
> > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > > > index cb6645012a30..09e48fb232a9 100644
> > > > > --- a/drivers/net/netdevsim/dev.c
> > > > > +++ b/drivers/net/netdevsim/dev.c
> > > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > > > >  
> > > > >  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > > >  	devlink_register(devlink);
> > > > > -	devlink_reload_enable(devlink);
> > > > >  	return 0;
> > > > >  
> > > > >  err_psample_exit:
> > > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > > >  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > > >  	struct devlink *devlink = priv_to_devlink(nsim_dev);
> > > > >  
> > > > > -	devlink_reload_disable(devlink);
> > > > >  	devlink_unregister(devlink);
> > > > > -
> > > > >  	nsim_dev_reload_destroy(nsim_dev);
> > > > >  
> > > > >  	nsim_bpf_dev_exit(nsim_dev);
> > > > 
> > > > I didn't remember why devlink_reload_{enable,disable}() were added in
> > > > the first place so it was not clear to me from the commit message why
> > > > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > > > disallow reload operation during device cleanup") with a reproducer.
> > > 
> > > It was added because devlink ops were accessible by the user space very
> > > early in the driver lifetime. All my latest devlink patches are the
> > > attempt to fix this arch/design/implementation issue.
> > 
> > The reproducer in the commit message executed the reload after the
> > device was fully initialized. IIRC, the problem there was that nothing
> > prevented these two tasks from racing:
> > 
> > devlink dev reload netdevsim/netdevsim10
> > echo 10 > /sys/bus/netdevsim/del_device
> > 
> > The title also talks about forbidding reload during device cleanup.
> 
> It is incomplete title and reproducer.

How can the reproducer be incomplete when it reproduced the issue 100%
of the time?

> In our verification, we observed more than 40 bugs related to devlink
> reload flows and races around it.

I assume these bugs are related to mlx5. syzkaller is familiar with the
devlink messages [1] and we are using it to fuzz over both mlxsw and
netdevsim. syzbot is also fuzzing over netdevsim and I'm not aware of
any open bugs.

[1] https://github.com/google/syzkaller/blob/master/sys/linux/socket_netlink_generic_devlink.txt

> 
> > 
> > > 
> > > > 
> > > > Tried the reproducer with this series and I cannot reproduce the issue.
> > > > Wasn't quite sure why, but it does not seem to be related to "changes to
> > > > allow dynamically set the reload_up/_down callbacks", as this seems to
> > > > be specific to mlx5.
> > > 
> > > You didn't reproduce because of my series that moved
> > > devlink_register()/devlink_unregister() to be last/first commands in
> > > .probe()/.remove() flows.
> > 
> > Agree, that is what I wrote in the next paragraph of my reply.
> > 
> > > 
> > > Patch to allow dynamically set ops was needed because mlx5 had logic
> > > like this:
> > >  if(something)
> > >     devlink_reload_enable()
> > > 
> > > And I needed a way to keep this if ... condition.
> > > 
> > > > 
> > > > IIUC, the reason that the race described in above mentioned commit can
> > > > no longer happen is related to the fact that devlink_unregister() is
> > > > called first in the device dismantle path, after your previous patches.
> > > > Since both the reload operation and devlink_unregister() hold
> > > > 'devlink_mutex', it is not possible for the reload operation to race
> > > > with device dismantle.
> > > > 
> > > > Agree? If so, I think it would be good to explain this in the commit
> > > > message unless it's clear to everyone else.
> > > 
> > > I don't agree for very simple reason that devlink_mutex is going to be
> > > removed very soon and it is really not a reason why devlink reload is
> > > safer now when before.
> > > 
> > > The reload can't race due to:
> > > 1. devlink_unregister(), which works as a barrier to stop accesses
> > > from the user space.
> > > 2. reference counting that ensures that all in-flight commands are counted.
> > > 3. wait_for_completion that blocks till all commands are done.
> > 
> > So the wait_for_completion() is what prevents the race, not
> > 'devlink_mutex' that is taken later. This needs to be explained in the
> > commit message to make it clear why the removal is safe.
> 
> Can you please suggest what exactly should I write in the commit message
> to make it clear?
> 
> I'm too much into this delvink stuff already and for me this patch is
> trivial. IMHO, that change doesn't need an explanation at all because
> coding pattern of refcount + wait_for_completion is pretty common in the
> kernel. So I think that I explained good enough: move of
> devlink_register/devlink_unregister obsoletes the devlink_reload_* APIs.
> 
> I have no problem to update the commit message, just help me with the
> message.

I suggest something like:

"
Commit a0c76345e3d3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe / dismantle.

After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.

Therefore, remove these devlink_reload_{enable,disable}() APIs. Tested
with the reproducer mentioned in cited commit.
"
Leon Romanovsky Oct. 5, 2021, 7:40 a.m. UTC | #6
On Tue, Oct 05, 2021 at 09:10:15AM +0300, Ido Schimmel wrote:
> On Mon, Oct 04, 2021 at 10:02:06PM +0300, Leon Romanovsky wrote:
> > On Mon, Oct 04, 2021 at 07:54:02PM +0300, Ido Schimmel wrote:
> > > On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> > > > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > > > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > > 
> > > > > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > > > > we ensure that properly supported devlink ops are not accessible before
> > > > > > devlink_register, which is last command in the initialization sequence.
> > > > > > 
> > > > > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > > > > safely deleted.
> > > > > > 
> > > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > > > > index cb6645012a30..09e48fb232a9 100644
> > > > > > --- a/drivers/net/netdevsim/dev.c
> > > > > > +++ b/drivers/net/netdevsim/dev.c
> > > > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > > > > >  
> > > > > >  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > > > >  	devlink_register(devlink);
> > > > > > -	devlink_reload_enable(devlink);
> > > > > >  	return 0;
> > > > > >  
> > > > > >  err_psample_exit:
> > > > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > > > >  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > > > >  	struct devlink *devlink = priv_to_devlink(nsim_dev);
> > > > > >  
> > > > > > -	devlink_reload_disable(devlink);
> > > > > >  	devlink_unregister(devlink);
> > > > > > -
> > > > > >  	nsim_dev_reload_destroy(nsim_dev);
> > > > > >  
> > > > > >  	nsim_bpf_dev_exit(nsim_dev);
> > > > > 
> > > > > I didn't remember why devlink_reload_{enable,disable}() were added in
> > > > > the first place so it was not clear to me from the commit message why
> > > > > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > > > > disallow reload operation during device cleanup") with a reproducer.
> > > > 
> > > > It was added because devlink ops were accessible by the user space very
> > > > early in the driver lifetime. All my latest devlink patches are the
> > > > attempt to fix this arch/design/implementation issue.
> > > 
> > > The reproducer in the commit message executed the reload after the
> > > device was fully initialized. IIRC, the problem there was that nothing
> > > prevented these two tasks from racing:
> > > 
> > > devlink dev reload netdevsim/netdevsim10
> > > echo 10 > /sys/bus/netdevsim/del_device
> > > 
> > > The title also talks about forbidding reload during device cleanup.
> > 
> > It is incomplete title and reproducer.
> 
> How can the reproducer be incomplete when it reproduced the issue 100%
> of the time?

Incomplete in the sense that other reproducers exist.
Our internally famous one is module load/reload together with devlink
reload. More complex ones include PCI errors, health recovery, etc.

> 
> > In our verification, we observed more than 40 bugs related to devlink
> > reload flows and races around it.
> 
> I assume these bugs are related to mlx5. syzkaller is familiar with the
> devlink messages [1] and we are using it to fuzz over both mlxsw and
> netdevsim. syzbot is also fuzzing over netdevsim and I'm not aware of
> any open bugs.
> 
> [1] https://github.com/google/syzkaller/blob/master/sys/linux/socket_netlink_generic_devlink.txt

We don't know what we don't know.

> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > Tried the reproducer with this series and I cannot reproduce the issue.
> > > > > Wasn't quite sure why, but it does not seem to be related to "changes to
> > > > > allow dynamically set the reload_up/_down callbacks", as this seems to
> > > > > be specific to mlx5.
> > > > 
> > > > You didn't reproduce because of my series that moved
> > > > devlink_register()/devlink_unregister() to be last/first commands in
> > > > .probe()/.remove() flows.
> > > 
> > > Agree, that is what I wrote in the next paragraph of my reply.
> > > 
> > > > 
> > > > Patch to allow dynamically set ops was needed because mlx5 had logic
> > > > like this:
> > > >  if(something)
> > > >     devlink_reload_enable()
> > > > 
> > > > And I needed a way to keep this if ... condition.
> > > > 
> > > > > 
> > > > > IIUC, the reason that the race described in above mentioned commit can
> > > > > no longer happen is related to the fact that devlink_unregister() is
> > > > > called first in the device dismantle path, after your previous patches.
> > > > > Since both the reload operation and devlink_unregister() hold
> > > > > 'devlink_mutex', it is not possible for the reload operation to race
> > > > > with device dismantle.
> > > > > 
> > > > > Agree? If so, I think it would be good to explain this in the commit
> > > > > message unless it's clear to everyone else.
> > > > 
> > > > I don't agree for very simple reason that devlink_mutex is going to be
> > > > removed very soon and it is really not a reason why devlink reload is
> > > > safer now when before.
> > > > 
> > > > The reload can't race due to:
> > > > 1. devlink_unregister(), which works as a barrier to stop accesses
> > > > from the user space.
> > > > 2. reference counting that ensures that all in-flight commands are counted.
> > > > 3. wait_for_completion that blocks till all commands are done.
> > > 
> > > So the wait_for_completion() is what prevents the race, not
> > > 'devlink_mutex' that is taken later. This needs to be explained in the
> > > commit message to make it clear why the removal is safe.
> > 
> > Can you please suggest what exactly should I write in the commit message
> > to make it clear?
> > 
> > I'm too much into this delvink stuff already and for me this patch is
> > trivial. IMHO, that change doesn't need an explanation at all because
> > coding pattern of refcount + wait_for_completion is pretty common in the
> > kernel. So I think that I explained good enough: move of
> > devlink_register/devlink_unregister obsoletes the devlink_reload_* APIs.
> > 
> > I have no problem to update the commit message, just help me with the
> > message.
> 
> I suggest something like:
> 
> "
> Commit a0c76345e3d3 ("devlink: disallow reload operation during device
> cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
> operation from racing with device probe / dismantle.
> 
> After recent changes to move devlink_register() to the end of device
> probe and devlink_unregister() to the beginning of device dismantle,
> these races can no longer happen. Reload operations will be denied if
> the devlink instance is unregistered and devlink_unregister() will block
> until all in-flight operations are done.
> 
> Therefore, remove these devlink_reload_{enable,disable}() APIs. Tested
> with the reproducer mentioned in cited commit.
> "

Sure, thanks.
Can I add your TOB to the patch?
Ido Schimmel Oct. 5, 2021, 8:18 a.m. UTC | #7
On Tue, Oct 05, 2021 at 10:40:58AM +0300, Leon Romanovsky wrote:
> Can I added your TOB to the patch?

Yes

Tested-by: Ido Schimmel <idosch@nvidia.com>

Patch

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
index 59b0ae7d59e0..c394c393421e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
@@ -120,7 +120,6 @@  int hclge_devlink_init(struct hclge_dev *hdev)
 	hdev->devlink = devlink;
 
 	devlink_register(devlink);
-	devlink_reload_enable(devlink);
 	return 0;
 }
 
@@ -128,8 +127,6 @@  void hclge_devlink_uninit(struct hclge_dev *hdev)
 {
 	struct devlink *devlink = hdev->devlink;
 
-	devlink_reload_disable(devlink);
-
 	devlink_unregister(devlink);
 
 	devlink_free(devlink);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
index d60cc9426f70..d67c151e024b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
@@ -122,7 +122,6 @@  int hclgevf_devlink_init(struct hclgevf_dev *hdev)
 	hdev->devlink = devlink;
 
 	devlink_register(devlink);
-	devlink_reload_enable(devlink);
 	return 0;
 }
 
@@ -130,8 +129,6 @@  void hclgevf_devlink_uninit(struct hclgevf_dev *hdev)
 {
 	struct devlink *devlink = hdev->devlink;
 
-	devlink_reload_disable(devlink);
-
 	devlink_unregister(devlink);
 
 	devlink_free(devlink);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 9541f3a920c8..8b410800f049 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -4026,7 +4026,6 @@  static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	pci_save_state(pdev);
 	devlink_register(devlink);
-	devlink_reload_enable(devlink);
 	return 0;
 
 err_params_unregister:
@@ -4135,7 +4134,6 @@  static void mlx4_remove_one(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(priv);
 	int active_vfs = 0;
 
-	devlink_reload_disable(devlink);
 	devlink_unregister(devlink);
 
 	if (mlx4_is_slave(dev))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 5893fdd5aedb..65313448a47c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1538,8 +1538,6 @@  static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	pci_save_state(pdev);
 	devlink_register(devlink);
-	if (!mlx5_core_is_mp_slave(dev))
-		devlink_reload_enable(devlink);
 	return 0;
 
 err_init_one:
@@ -1559,7 +1557,6 @@  static void remove_one(struct pci_dev *pdev)
 	struct mlx5_core_dev *dev  = pci_get_drvdata(pdev);
 	struct devlink *devlink = priv_to_devlink(dev);
 
-	devlink_reload_disable(devlink);
 	devlink_unregister(devlink);
 	mlx5_crdump_disable(dev);
 	mlx5_drain_health_wq(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index 3cf272fa2164..7b4783ce213e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -47,7 +47,6 @@  static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
 		goto init_one_err;
 	}
 	devlink_register(devlink);
-	devlink_reload_enable(devlink);
 	return 0;
 
 init_one_err:
@@ -62,10 +61,8 @@  static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
 static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
 {
 	struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
-	struct devlink *devlink;
+	struct devlink *devlink = priv_to_devlink(sf_dev->mdev);
 
-	devlink = priv_to_devlink(sf_dev->mdev);
-	devlink_reload_disable(devlink);
 	devlink_unregister(devlink);
 	mlx5_uninit_one(sf_dev->mdev);
 	iounmap(sf_dev->mdev->iseg);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 9e831e8b607a..895b3ba88e45 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2007,11 +2007,8 @@  __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 			goto err_driver_init;
 	}
 
-	if (!reload) {
+	if (!reload)
 		devlink_register(devlink);
-		devlink_reload_enable(devlink);
-	}
-
 	return 0;
 
 err_driver_init:
@@ -2075,10 +2072,9 @@  void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
 {
 	struct devlink *devlink = priv_to_devlink(mlxsw_core);
 
-	if (!reload) {
-		devlink_reload_disable(devlink);
+	if (!reload)
 		devlink_unregister(devlink);
-	}
+
 	if (devlink_is_reload_failed(devlink)) {
 		if (!reload)
 			/* Only the parts that were not de-initialized in the
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index cb6645012a30..09e48fb232a9 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1512,7 +1512,6 @@  int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
 
 	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
 	devlink_register(devlink);
-	devlink_reload_enable(devlink);
 	return 0;
 
 err_psample_exit:
@@ -1566,9 +1565,7 @@  void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
 	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
 	struct devlink *devlink = priv_to_devlink(nsim_dev);
 
-	devlink_reload_disable(devlink);
 	devlink_unregister(devlink);
-
 	nsim_dev_reload_destroy(nsim_dev);
 
 	nsim_bpf_dev_exit(nsim_dev);
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 320146d95fb8..23355fd92553 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1523,8 +1523,6 @@  static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
 void devlink_set_ops(struct devlink *devlink, const struct devlink_ops *ops);
 void devlink_register(struct devlink *devlink);
 void devlink_unregister(struct devlink *devlink);
-void devlink_reload_enable(struct devlink *devlink);
-void devlink_reload_disable(struct devlink *devlink);
 void devlink_free(struct devlink *devlink);
 int devlink_port_register(struct devlink *devlink,
 			  struct devlink_port *devlink_port,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 25c2aa2b35cd..b45bdba9775d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -62,8 +62,7 @@  struct devlink {
 	 * port, sb, dpipe, resource, params, region, traps and more.
 	 */
 	struct mutex lock;
-	u8 reload_failed:1,
-	   reload_enabled:1;
+	u8 reload_failed:1;
 	refcount_t refcount;
 	struct completion comp;
 	char priv[0] __aligned(NETDEV_ALIGN);
@@ -4033,9 +4032,6 @@  static int devlink_reload(struct devlink *devlink, struct net *dest_net,
 	struct net *curr_net;
 	int err;
 
-	if (!devlink->reload_enabled)
-		return -EOPNOTSUPP;
-
 	memcpy(remote_reload_stats, devlink->stats.remote_reload_stats,
 	       sizeof(remote_reload_stats));
 
@@ -9245,49 +9241,12 @@  void devlink_unregister(struct devlink *devlink)
 	wait_for_completion(&devlink->comp);
 
 	mutex_lock(&devlink_mutex);
-	WARN_ON(devlink_reload_supported(&devlink->ops) &&
-		devlink->reload_enabled);
 	devlink_notify_unregister(devlink);
 	xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
 	mutex_unlock(&devlink_mutex);
 }
 EXPORT_SYMBOL_GPL(devlink_unregister);
 
-/**
- *	devlink_reload_enable - Enable reload of devlink instance
- *
- *	@devlink: devlink
- *
- *	Should be called at end of device initialization
- *	process when reload operation is supported.
- */
-void devlink_reload_enable(struct devlink *devlink)
-{
-	mutex_lock(&devlink_mutex);
-	devlink->reload_enabled = true;
-	mutex_unlock(&devlink_mutex);
-}
-EXPORT_SYMBOL_GPL(devlink_reload_enable);
-
-/**
- *	devlink_reload_disable - Disable reload of devlink instance
- *
- *	@devlink: devlink
- *
- *	Should be called at the beginning of device cleanup
- *	process when reload operation is supported.
- */
-void devlink_reload_disable(struct devlink *devlink)
-{
-	mutex_lock(&devlink_mutex);
-	/* Mutex is taken which ensures that no reload operation is in
-	 * progress while setting up forbidded flag.
-	 */
-	devlink->reload_enabled = false;
-	mutex_unlock(&devlink_mutex);
-}
-EXPORT_SYMBOL_GPL(devlink_reload_disable);
-
 /**
  *	devlink_free - Free devlink instance resources
  *