diff mbox series

[net-next] netdevsim: Forbid devlink reload when adding or deleting ports

Message ID 53cd1a28dd34ced9fb4c39885c6e13523e97d62c.1628161323.git.leonro@nvidia.com (mailing list archive)
State Accepted
Commit 23809a726c0d004b9d2474333181f8da07360469
Delegated to: Netdev Maintainers
Headers show
Series [net-next] netdevsim: Forbid devlink reload when adding or deleting ports | expand

Checks

Context Check Description
netdev/apply success Patch already applied to net-next
netdev/tree_selection success Clearly marked for net-next

Commit Message

Leon Romanovsky Aug. 5, 2021, 11:02 a.m. UTC
From: Leon Romanovsky <leonro@nvidia.com>

In order to remove complexity in devlink core related to
devlink_reload_enable/disable, let's rewrite new_port/del_port
logic to rely on internal to netdevsim lcok.

We should protect only reload_down flow because it destroys nsim_dev,
which is needed for nsim_dev_port_add/nsim_dev_port_del to hold
port_list_lock.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/net/netdevsim/bus.c | 16 ++++------------
 drivers/net/netdevsim/dev.c |  7 +++++++
 2 files changed, 11 insertions(+), 12 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Aug. 5, 2021, 12:40 p.m. UTC | #1
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Thu,  5 Aug 2021 14:02:45 +0300 you wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> In order to remove complexity in devlink core related to
> devlink_reload_enable/disable, let's rewrite new_port/del_port
> logic to rely on internal to netdevsim lcok.
> 
> We should protect only reload_down flow because it destroys nsim_dev,
> which is needed for nsim_dev_port_add/nsim_dev_port_del to hold
> port_list_lock.
> 
> [...]

Here is the summary with links:
  - [net-next] netdevsim: Forbid devlink reload when adding or deleting ports
    https://git.kernel.org/netdev/net-next/c/23809a726c0d

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Jakub Kicinski Aug. 5, 2021, 1:15 p.m. UTC | #2
On Thu,  5 Aug 2021 14:05:41 +0300 Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> In order to remove complexity in devlink core related to
> devlink_reload_enable/disable, let's rewrite new_port/del_port
> logic to rely on internal to netdevsim lock.
> 
> We should protect only reload_down flow because it destroys nsim_dev,
> which is needed for nsim_dev_port_add/nsim_dev_port_del to hold
> port_list_lock.

I don't understand why we only have to protect reload_down.

What protects us from adding a port right after down? That'd hit a
destroyed mutex, up wipes the port list etc...

> +	nsim_bus_dev = nsim_dev->nsim_bus_dev;
> +	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
> +		return -EOPNOTSUPP;

Why not -EBUSY?
Leon Romanovsky Aug. 5, 2021, 1:51 p.m. UTC | #3
On Thu, Aug 05, 2021 at 06:15:47AM -0700, Jakub Kicinski wrote:
> On Thu,  5 Aug 2021 14:05:41 +0300 Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > In order to remove complexity in devlink core related to
> > devlink_reload_enable/disable, let's rewrite new_port/del_port
> > logic to rely on internal to netdevsim lock.
> > 
> > We should protect only reload_down flow because it destroys nsim_dev,
> > which is needed for nsim_dev_port_add/nsim_dev_port_del to hold
> > port_list_lock.
> 
> I don't understand why we only have to protect reload_down.

I assumed that if we succeeded to pass reload_down and we are in
reload_up stage, everything was already bailed out.

> 
> What protects us from adding a port right after down? That'd hit a
> destroyed mutex, up wipes the port list etc...

You will have very similar crash to already existing one:
* parallel call to del_device and add_port will hit same issue.

The idea is not make netdevsim universally correct, but to ensure that
it doesn't crash immediately.

> 
> > +	nsim_bus_dev = nsim_dev->nsim_bus_dev;
> > +	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
> > +		return -EOPNOTSUPP;
> 
> Why not -EBUSY?

This is what devlink_reload_disable() returns, so I kept same error.
It is not important at all.

What about the following change on top of this patch?

diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
index a29ec264119d..62d033a1a557 100644
--- a/drivers/net/netdevsim/bus.c
+++ b/drivers/net/netdevsim/bus.c
@@ -196,6 +196,11 @@ new_port_store(struct device *dev, struct device_attribute *attr,
 	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
 		return -EBUSY;
 
+	if (nsim_bus_dev->in_reload) {
+		mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
+		return -EBUSY;
+	}
+
 	ret = nsim_dev_port_add(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
 	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 	return ret ? ret : count;
@@ -221,6 +226,11 @@ del_port_store(struct device *dev, struct device_attribute *attr,
 	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
 		return -EBUSY;
 
+	if (nsim_bus_dev->in_reload) {
+		mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
+		return -EBUSY;
+	}
+
 	ret = nsim_dev_port_del(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
 	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 	return ret ? ret : count;
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index ff5714209b86..53068e184c90 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -878,6 +878,7 @@ static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change,
 		mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 		return -EOPNOTSUPP;
 	}
+	nsim_bus_dev->in_reload = true;
 
 	nsim_dev_reload_destroy(nsim_dev);
 	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
@@ -889,17 +890,26 @@ static int nsim_dev_reload_up(struct devlink *devlink, enum devlink_reload_actio
 			      struct netlink_ext_ack *extack)
 {
 	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_bus_dev *nsim_bus_dev;
+	int ret;
+
+	nsim_bus_dev = nsim_dev->nsim_bus_dev;
+	mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
+	nsim_bus_dev->in_reload = false;
 
 	if (nsim_dev->fail_reload) {
 		/* For testing purposes, user set debugfs fail_reload
 		 * value to true. Fail right away.
 		 */
 		NL_SET_ERR_MSG_MOD(extack, "User setup the reload to fail for testing purposes");
+		mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 		return -EINVAL;
 	}
 
 	*actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
-	return nsim_dev_reload_create(nsim_dev, extack);
+	ret = nsim_dev_reload_create(nsim_dev, extack);
+	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
+	return ret;
 }
 
 static int nsim_dev_info_get(struct devlink *devlink,
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 1c20bcbd9d91..793c86dc5a9c 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -362,6 +362,7 @@ struct nsim_bus_dev {
 	struct nsim_vf_config *vfconfigs;
 	/* Lock for devlink->reload_enabled in netdevsim module */
 	struct mutex nsim_bus_reload_lock;
+	bool in_reload;
 	bool init;
 };
 


Thanks
Jakub Kicinski Aug. 5, 2021, 2:23 p.m. UTC | #4
On Thu, 5 Aug 2021 16:51:31 +0300 Leon Romanovsky wrote:
> > > +	nsim_bus_dev = nsim_dev->nsim_bus_dev;
> > > +	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
> > > +		return -EOPNOTSUPP;  
> > 
> > Why not -EBUSY?  
> 
> This is what devlink_reload_disable() returns, so I kept same error.
> It is not important at all.
> 
> What about the following change on top of this patch?

LGTM, the only question is whether we should leave in_reload true 
if nsim_dev->fail_reload is set.

> @@ -889,17 +890,26 @@ static int nsim_dev_reload_up(struct devlink *devlink, enum devlink_reload_actio
>  			      struct netlink_ext_ack *extack)
>  {
>  	struct nsim_dev *nsim_dev = devlink_priv(devlink);
> +	struct nsim_bus_dev *nsim_bus_dev;
> +	int ret;
> +
> +	nsim_bus_dev = nsim_dev->nsim_bus_dev;
> +	mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
> +	nsim_bus_dev->in_reload = false;
>  
>  	if (nsim_dev->fail_reload) {
>  		/* For testing purposes, user set debugfs fail_reload
>  		 * value to true. Fail right away.
>  		 */
>  		NL_SET_ERR_MSG_MOD(extack, "User setup the reload to fail for testing purposes");
> +		mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
>  		return -EINVAL;
>  	}
>  
>  	*actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
> -	return nsim_dev_reload_create(nsim_dev, extack);
> +	ret = nsim_dev_reload_create(nsim_dev, extack);
> +	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
> +	return ret;
>  }
Leon Romanovsky Aug. 5, 2021, 2:33 p.m. UTC | #5
On Thu, Aug 05, 2021 at 07:23:42AM -0700, Jakub Kicinski wrote:
> On Thu, 5 Aug 2021 16:51:31 +0300 Leon Romanovsky wrote:
> > > > +	nsim_bus_dev = nsim_dev->nsim_bus_dev;
> > > > +	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
> > > > +		return -EOPNOTSUPP;  
> > > 
> > > Why not -EBUSY?  
> > 
> > This is what devlink_reload_disable() returns, so I kept same error.
> > It is not important at all.
> > 
> > What about the following change on top of this patch?
> 
> LGTM, the only question is whether we should leave in_reload true 
> if nsim_dev->fail_reload is set.

I don't think so, it will block add/delete ports.

> 
> > @@ -889,17 +890,26 @@ static int nsim_dev_reload_up(struct devlink *devlink, enum devlink_reload_actio
> >  			      struct netlink_ext_ack *extack)
> >  {
> >  	struct nsim_dev *nsim_dev = devlink_priv(devlink);
> > +	struct nsim_bus_dev *nsim_bus_dev;
> > +	int ret;
> > +
> > +	nsim_bus_dev = nsim_dev->nsim_bus_dev;
> > +	mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
> > +	nsim_bus_dev->in_reload = false;
> >  
> >  	if (nsim_dev->fail_reload) {
> >  		/* For testing purposes, user set debugfs fail_reload
> >  		 * value to true. Fail right away.
> >  		 */
> >  		NL_SET_ERR_MSG_MOD(extack, "User setup the reload to fail for testing purposes");
> > +		mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
> >  		return -EINVAL;
> >  	}
> >  
> >  	*actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
> > -	return nsim_dev_reload_create(nsim_dev, extack);
> > +	ret = nsim_dev_reload_create(nsim_dev, extack);
> > +	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
> > +	return ret;
> >  }
> 
>
Jakub Kicinski Aug. 5, 2021, 3:27 p.m. UTC | #6
On Thu, 5 Aug 2021 17:33:35 +0300 Leon Romanovsky wrote:
> On Thu, Aug 05, 2021 at 07:23:42AM -0700, Jakub Kicinski wrote:
> > > This is what devlink_reload_disable() returns, so I kept same error.
> > > It is not important at all.
> > > 
> > > What about the following change on top of this patch?  
> > 
> > LGTM, the only question is whether we should leave in_reload true 
> > if nsim_dev->fail_reload is set.  
> 
> I don't think so, it will block add/delete ports.

As it should, given add/delete ports takes the port_list_lock which is
destroyed by down but not (due to the forced failure) re-initialized by
up.

If we want to handle adding ports while down we can just bump port
count and return, although I don't think there's a practical need
to support that.
Leon Romanovsky Aug. 5, 2021, 5:35 p.m. UTC | #7
On Thu, Aug 05, 2021 at 08:27:56AM -0700, Jakub Kicinski wrote:
> On Thu, 5 Aug 2021 17:33:35 +0300 Leon Romanovsky wrote:
> > On Thu, Aug 05, 2021 at 07:23:42AM -0700, Jakub Kicinski wrote:
> > > > This is what devlink_reload_disable() returns, so I kept same error.
> > > > It is not important at all.
> > > > 
> > > > What about the following change on top of this patch?  
> > > 
> > > LGTM, the only question is whether we should leave in_reload true 
> > > if nsim_dev->fail_reload is set.  
> > 
> > I don't think so, it will block add/delete ports.
> 
> As it should, given add/delete ports takes the port_list_lock which is
> destroyed by down but not (due to the forced failure) re-initialized by
> up.
> 
> If we want to handle adding ports while down we can just bump port
> count and return, although I don't think there's a practical need
> to support that.

Sorry, but for me netdevsim looks like complete dumpster. It was
intended for fast prototyping, but ended to be huge pile of debugfs
entries and selftest to execute random flows.

Do you want me to move in_reload = false line to be after if (nsim_dev->fail_reload)
check?

Thanks
Leon Romanovsky Aug. 5, 2021, 6:02 p.m. UTC | #8
On Thu, Aug 05, 2021 at 08:35:59PM +0300, Leon Romanovsky wrote:
> On Thu, Aug 05, 2021 at 08:27:56AM -0700, Jakub Kicinski wrote:
> > On Thu, 5 Aug 2021 17:33:35 +0300 Leon Romanovsky wrote:
> > > On Thu, Aug 05, 2021 at 07:23:42AM -0700, Jakub Kicinski wrote:
> > > > > This is what devlink_reload_disable() returns, so I kept same error.
> > > > > It is not important at all.
> > > > > 
> > > > > What about the following change on top of this patch?  
> > > > 
> > > > LGTM, the only question is whether we should leave in_reload true 
> > > > if nsim_dev->fail_reload is set.  
> > > 
> > > I don't think so, it will block add/delete ports.
> > 
> > As it should, given add/delete ports takes the port_list_lock which is
> > destroyed by down but not (due to the forced failure) re-initialized by
> > up.
> > 
> > If we want to handle adding ports while down we can just bump port
> > count and return, although I don't think there's a practical need
> > to support that.
> 
> Sorry, but for me netdevsim looks like complete dumpster. It was
> intended for fast prototyping, but ended to be huge pile of debugfs
> entries and selftest to execute random flows.
> 
> Do you want me to move in_reload = false line to be after if (nsim_dev->fail_reload)
> check?

BTW, the current implementation where in_reload before if, actually
preserves same behaviour as was with devlink_reload_enable() implementation.

Thanks

> 
> Thanks
Jakub Kicinski Aug. 5, 2021, 7:12 p.m. UTC | #9
On Thu, 5 Aug 2021 21:02:23 +0300 Leon Romanovsky wrote:
> > > As it should, given add/delete ports takes the port_list_lock which is
> > > destroyed by down but not (due to the forced failure) re-initialized by
> > > up.
> > > 
> > > If we want to handle adding ports while down we can just bump port
> > > count and return, although I don't think there's a practical need
> > > to support that.  
> > 
> > Sorry, but for me netdevsim looks like complete dumpster. 

I worry that netdevsim's gone unwieldy as a reflection of the quality of
the devlink APIs that got added, not by itself :/

> > It was intended for fast prototyping, but ended to be huge pile of
> > debugfs entries and selftest to execute random flows.

It's for selftests, IDK what fast prototyping is in terms of driver
APIs. Fast prototyping makes me think of the "it works" attitude which
is not sufficiently high bar for core APIs IMO, I'm sure you'll agree.

netdevsim was written specifically to be able to exercise HW APIs which
are implemented by small fraction of drivers. Especially offload APIs
as those can easily be broken by people changing the SW implementation
without capable HW at hand.

BTW I wonder if there is a term in human science of situation like when
a recent contributor tells the guy who wrote the code what the code was
intended for :)

> > Do you want me to move in_reload = false line to be after if (nsim_dev->fail_reload)
> > check?  
> 
> BTW, the current implementation where in_reload before if, actually
> preserves same behaviour as was with devlink_reload_enable() implementation.

Right, but I think as you rightly pointed out the current protection
of reload is broken. I'm not saying you must make it perfect or else..
just pointing out a gap you could address if you so choose.
Leon Romanovsky Aug. 6, 2021, 11:19 a.m. UTC | #10
On Thu, Aug 05, 2021 at 12:12:03PM -0700, Jakub Kicinski wrote:
> On Thu, 5 Aug 2021 21:02:23 +0300 Leon Romanovsky wrote:
> > > > As it should, given add/delete ports takes the port_list_lock which is
> > > > destroyed by down but not (due to the forced failure) re-initialized by
> > > > up.
> > > > 
> > > > If we want to handle adding ports while down we can just bump port
> > > > count and return, although I don't think there's a practical need
> > > > to support that.  
> > > 
> > > Sorry, but for me netdevsim looks like complete dumpster. 
> 
> I worry that netdevsim's gone unwieldy as a reflection of the quality of
> the devlink APIs that got added, not by itself :/
> 
> > > It was intended for fast prototyping, but ended to be huge pile of
> > > debugfs entries and selftest to execute random flows.
> 
> It's for selftests, IDK what fast prototyping is in terms of driver
> APIs. Fast prototyping makes me think of the "it works" attitude which
> is not sufficiently high bar for core APIs IMO, I'm sure you'll agree.
> 
> netdevsim was written specifically to be able to exercise HW APIs which
> are implemented by small fraction of drivers. Especially offload APIs
> as those can easily be broken by people changing the SW implementation
> without capable HW at hand.
> 
> BTW I wonder if there is a term in human science of situation like when
> a recent contributor tells the guy who wrote the code what the code was
> intended for :)

"Teaching grandmother to suck eggs" ? :)

> 
> > > Do you want me to move in_reload = false line to be after if (nsim_dev->fail_reload)
> > > check?  
> > 
> > BTW, the current implementation where in_reload before if, actually
> > preserves same behaviour as was with devlink_reload_enable() implementation.
> 
> Right, but I think as you rightly pointed out the current protection
> of reload is broken. I'm not saying you must make it perfect or else..
> just pointing out a gap you could address if you so choose.

I don't know, netdevsim needs some dedicated cleanup.

Thanks
diff mbox series

Patch

diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
index ff01e5bdc72e..a29ec264119d 100644
--- a/drivers/net/netdevsim/bus.c
+++ b/drivers/net/netdevsim/bus.c
@@ -183,8 +183,6 @@  new_port_store(struct device *dev, struct device_attribute *attr,
 	       const char *buf, size_t count)
 {
 	struct nsim_bus_dev *nsim_bus_dev = to_nsim_bus_dev(dev);
-	struct nsim_dev *nsim_dev = dev_get_drvdata(dev);
-	struct devlink *devlink;
 	unsigned int port_index;
 	int ret;
 
@@ -195,12 +193,10 @@  new_port_store(struct device *dev, struct device_attribute *attr,
 	if (ret)
 		return ret;
 
-	devlink = priv_to_devlink(nsim_dev);
+	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
+		return -EBUSY;
 
-	mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
-	devlink_reload_disable(devlink);
 	ret = nsim_dev_port_add(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
-	devlink_reload_enable(devlink);
 	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 	return ret ? ret : count;
 }
@@ -212,8 +208,6 @@  del_port_store(struct device *dev, struct device_attribute *attr,
 	       const char *buf, size_t count)
 {
 	struct nsim_bus_dev *nsim_bus_dev = to_nsim_bus_dev(dev);
-	struct nsim_dev *nsim_dev = dev_get_drvdata(dev);
-	struct devlink *devlink;
 	unsigned int port_index;
 	int ret;
 
@@ -224,12 +218,10 @@  del_port_store(struct device *dev, struct device_attribute *attr,
 	if (ret)
 		return ret;
 
-	devlink = priv_to_devlink(nsim_dev);
+	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
+		return -EBUSY;
 
-	mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
-	devlink_reload_disable(devlink);
 	ret = nsim_dev_port_del(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
-	devlink_reload_enable(devlink);
 	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 	return ret ? ret : count;
 }
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index d538a39d4225..ff5714209b86 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -864,16 +864,23 @@  static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change,
 				struct netlink_ext_ack *extack)
 {
 	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_bus_dev *nsim_bus_dev;
+
+	nsim_bus_dev = nsim_dev->nsim_bus_dev;
+	if (!mutex_trylock(&nsim_bus_dev->nsim_bus_reload_lock))
+		return -EOPNOTSUPP;
 
 	if (nsim_dev->dont_allow_reload) {
 		/* For testing purposes, user set debugfs dont_allow_reload
 		 * value to true. So forbid it.
 		 */
 		NL_SET_ERR_MSG_MOD(extack, "User forbid the reload for testing purposes");
+		mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 		return -EOPNOTSUPP;
 	}
 
 	nsim_dev_reload_destroy(nsim_dev);
+	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 	return 0;
 }