Message ID | 20240703145159.80128-1-aha310510@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net] team: Fix ABBA deadlock caused by race in team_del_slave |

Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/apply | fail | Patch does not apply to net-0 |

On Wed, Jul 03, 2024 at 11:51:59PM +0900, Jeongjun Park wrote:
>        CPU0                    CPU1
>        ----                    ----
>   lock(&rdev->wiphy.mtx);
>                                lock(team->team_lock_key#4);
>                                lock(&rdev->wiphy.mtx);
>   lock(team->team_lock_key#4);
>
> Deadlock occurs due to the above scenario. Therefore,
> modify the code as shown in the patch below to prevent deadlock.
>
> Regards,
> Jeongjun Park.

The commit message should contain the patch description only (without
salutations, etc.).

>
> Reported-and-tested-by: syzbot+705c61d60b091ef42c04@syzkaller.appspotmail.com
> Fixes: 61dc3461b954 ("team: convert overall spinlock to mutex")
> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> ---
>  drivers/net/team/team_core.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> index ab1935a4aa2c..3ac82df876b0 100644
> --- a/drivers/net/team/team_core.c
> +++ b/drivers/net/team/team_core.c
> @@ -1970,11 +1970,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
>  			  struct netlink_ext_ack *extack)
>  {
>  	struct team *team = netdev_priv(dev);
> -	int err;
> +	int err, locked;
>
> -	mutex_lock(&team->lock);
> +	locked = mutex_trylock(&team->lock);
>  	err = team_port_add(team, port_dev, extack);
> -	mutex_unlock(&team->lock);
> +	if (locked)
> +		mutex_unlock(&team->lock);

This is not correct usage of 'mutex_trylock()' API. In such a case you
could as well remove the lock completely from that part of code.
If "mutex_trylock()" returns false it means the mutex cannot be taken
(because it was already taken by other thread), so you should not modify
the resources that were expected to be protected by the mutex.
In other words, there is a risk of modifying resources using
"team_port_add()" by several threads at a time.

>
>  	if (!err)
>  		netdev_change_features(dev);
> @@ -1985,11 +1986,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
>  static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
>  {
>  	struct team *team = netdev_priv(dev);
> -	int err;
> +	int err, locked;
>
> -	mutex_lock(&team->lock);
> +	locked = mutex_trylock(&team->lock);
>  	err = team_port_del(team, port_dev);
> -	mutex_unlock(&team->lock);
> +	if (locked)
> +		mutex_unlock(&team->lock);

The same story as in case of "team_add_slave()".

>
>  	if (err)
>  		return err;
> --
>

The patch does not seem to be a correct solution to remove a deadlock.
Most probably a synchronization design needs an inspection.
If you really want to use "mutex_trylock()" API, please consider several
attempts of taking the mutex, but never modify the protected resources when
the mutex is not taken successfully.

Thanks,
Michal
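
The rule in the review above (retry the trylock if you must, but never touch the protected data without holding the lock) can be sketched in user space. This is a minimal illustration using POSIX threads rather than the kernel mutex API; `struct guarded_counter` and `guarded_counter_add()` are hypothetical names invented for the example, and nothing here is taken from the team driver.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical shared state: "value" may only be changed with "lock" held. */
struct guarded_counter {
	pthread_mutex_t lock;
	int value;
};

/*
 * Try to take the lock up to max_attempts times. The protected field is
 * modified only on a successful trylock; if every attempt fails, return
 * -EBUSY and leave the state untouched.
 */
static int guarded_counter_add(struct guarded_counter *c, int delta,
			       int max_attempts)
{
	assert(max_attempts > 0);

	for (int i = 0; i < max_attempts; i++) {
		if (pthread_mutex_trylock(&c->lock) == 0) {
			c->value += delta;	/* safe: the lock is held */
			pthread_mutex_unlock(&c->lock);
			return 0;
		}
		sched_yield();	/* let the current holder make progress */
	}
	return -EBUSY;
}
```

Note the contrast with the reviewed hunk: there the call to the modifying function happens whether or not the trylock succeeded, whereas here a failed trylock can only lead to another attempt or to an error return.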

>
> On Wed, Jul 03, 2024 at 11:51:59PM +0900, Jeongjun Park wrote:
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&rdev->wiphy.mtx);
> >                                lock(team->team_lock_key#4);
> >                                lock(&rdev->wiphy.mtx);
> >   lock(team->team_lock_key#4);
> >
> > Deadlock occurs due to the above scenario. Therefore,
> > modify the code as shown in the patch below to prevent deadlock.
> >
> > Regards,
> > Jeongjun Park.
>
> The commit message should contain the patch description only (without
> salutations, etc.).
>
> >
> > Reported-and-tested-by: syzbot+705c61d60b091ef42c04@syzkaller.appspotmail.com
> > Fixes: 61dc3461b954 ("team: convert overall spinlock to mutex")
> > Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> > ---
> >  drivers/net/team/team_core.c | 14 ++++++++------
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> > index ab1935a4aa2c..3ac82df876b0 100644
> > --- a/drivers/net/team/team_core.c
> > +++ b/drivers/net/team/team_core.c
> > @@ -1970,11 +1970,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> >  			  struct netlink_ext_ack *extack)
> >  {
> >  	struct team *team = netdev_priv(dev);
> > -	int err;
> > +	int err, locked;
> >
> > -	mutex_lock(&team->lock);
> > +	locked = mutex_trylock(&team->lock);
> >  	err = team_port_add(team, port_dev, extack);
> > -	mutex_unlock(&team->lock);
> > +	if (locked)
> > +		mutex_unlock(&team->lock);
>
> This is not correct usage of 'mutex_trylock()' API. In such a case you
> could as well remove the lock completely from that part of code.
> If "mutex_trylock()" returns false it means the mutex cannot be taken
> (because it was already taken by other thread), so you should not modify
> the resources that were expected to be protected by the mutex.
> In other words, there is a risk of modifying resources using
> "team_port_add()" by several threads at a time.
>
> >
> >  	if (!err)
> >  		netdev_change_features(dev);
> > @@ -1985,11 +1986,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> >  static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
> >  {
> >  	struct team *team = netdev_priv(dev);
> > -	int err;
> > +	int err, locked;
> >
> > -	mutex_lock(&team->lock);
> > +	locked = mutex_trylock(&team->lock);
> >  	err = team_port_del(team, port_dev);
> > -	mutex_unlock(&team->lock);
> > +	if (locked)
> > +		mutex_unlock(&team->lock);
>
> The same story as in case of "team_add_slave()".
>
> >
> >  	if (err)
> >  		return err;
> > --
> >
>
> The patch does not seem to be a correct solution to remove a deadlock.
> Most probably a synchronization design needs an inspection.
> If you really want to use "mutex_trylock()" API, please consider several
> attempts of taking the mutex, but never modify the protected resources when
> the mutex is not taken successfully.
>

Thanks for your comment. I rewrote the patch based on those comments.
This time, we modified it to return an error so that resources are not
modified when a race situation occurs. We would appreciate your
feedback on what this patch would be like.

> Thanks,
> Michal
>
>

Regards,
Jeongjun Park

---
 drivers/net/team/team_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..43d7c73b25aa 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -1972,7 +1972,8 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
 	struct team *team = netdev_priv(dev);
 	int err;
 
-	mutex_lock(&team->lock);
+	if (!mutex_trylock(&team->lock))
+		return -EBUSY;
 	err = team_port_add(team, port_dev, extack);
 	mutex_unlock(&team->lock);
 
@@ -1987,7 +1988,8 @@ static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
 	struct team *team = netdev_priv(dev);
 	int err;
 
-	mutex_lock(&team->lock);
+	if (!mutex_trylock(&team->lock))
+		return -EBUSY;
 	err = team_port_del(team, port_dev);
 	mutex_unlock(&team->lock);
 
--

On Wed, Jul 3, 2024 at 6:02 PM Jeongjun Park <aha310510@gmail.com> wrote:
>
> >
> > On Wed, Jul 03, 2024 at 11:51:59PM +0900, Jeongjun Park wrote:
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&rdev->wiphy.mtx);
> > >                                lock(team->team_lock_key#4);
> > >                                lock(&rdev->wiphy.mtx);
> > >   lock(team->team_lock_key#4);
> > >
> > > Deadlock occurs due to the above scenario. Therefore,
> > > modify the code as shown in the patch below to prevent deadlock.
> > >
> > > Regards,
> > > Jeongjun Park.
> >
> > The commit message should contain the patch description only (without
> > salutations, etc.).
> >
> > >
> > > Reported-and-tested-by: syzbot+705c61d60b091ef42c04@syzkaller.appspotmail.com
> > > Fixes: 61dc3461b954 ("team: convert overall spinlock to mutex")
> > > Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> > > ---
> > >  drivers/net/team/team_core.c | 14 ++++++++------
> > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> > > index ab1935a4aa2c..3ac82df876b0 100644
> > > --- a/drivers/net/team/team_core.c
> > > +++ b/drivers/net/team/team_core.c
> > > @@ -1970,11 +1970,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> > >  			  struct netlink_ext_ack *extack)
> > >  {
> > >  	struct team *team = netdev_priv(dev);
> > > -	int err;
> > > +	int err, locked;
> > >
> > > -	mutex_lock(&team->lock);
> > > +	locked = mutex_trylock(&team->lock);
> > >  	err = team_port_add(team, port_dev, extack);
> > > -	mutex_unlock(&team->lock);
> > > +	if (locked)
> > > +		mutex_unlock(&team->lock);
> >
> > This is not correct usage of 'mutex_trylock()' API. In such a case you
> > could as well remove the lock completely from that part of code.
> > If "mutex_trylock()" returns false it means the mutex cannot be taken
> > (because it was already taken by other thread), so you should not modify
> > the resources that were expected to be protected by the mutex.
> > In other words, there is a risk of modifying resources using
> > "team_port_add()" by several threads at a time.
> >
> > >
> > >  	if (!err)
> > >  		netdev_change_features(dev);
> > > @@ -1985,11 +1986,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> > >  static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
> > >  {
> > >  	struct team *team = netdev_priv(dev);
> > > -	int err;
> > > +	int err, locked;
> > >
> > > -	mutex_lock(&team->lock);
> > > +	locked = mutex_trylock(&team->lock);
> > >  	err = team_port_del(team, port_dev);
> > > -	mutex_unlock(&team->lock);
> > > +	if (locked)
> > > +		mutex_unlock(&team->lock);
> >
> > The same story as in case of "team_add_slave()".
> >
> > >
> > >  	if (err)
> > >  		return err;
> > > --
> > >
> >
> > The patch does not seem to be a correct solution to remove a deadlock.
> > Most probably a synchronization design needs an inspection.
> > If you really want to use "mutex_trylock()" API, please consider several
> > attempts of taking the mutex, but never modify the protected resources when
> > the mutex is not taken successfully.
> >
>
> Thanks for your comment. I rewrote the patch based on those comments.
> This time, we modified it to return an error so that resources are not
> modified when a race situation occurs. We would appreciate your
> feedback on what this patch would be like.
>
> > Thanks,
> > Michal
> >
> >
>
> Regards,
> Jeongjun Park
>
> ---
>  drivers/net/team/team_core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> index ab1935a4aa2c..43d7c73b25aa 100644
> --- a/drivers/net/team/team_core.c
> +++ b/drivers/net/team/team_core.c
> @@ -1972,7 +1972,8 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
>  	struct team *team = netdev_priv(dev);
>  	int err;
>
> -	mutex_lock(&team->lock);
> +	if (!mutex_trylock(&team->lock))
> +		return -EBUSY;
>  	err = team_port_add(team, port_dev, extack);
>  	mutex_unlock(&team->lock);
>
> @@ -1987,7 +1988,8 @@ static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
>  	struct team *team = netdev_priv(dev);
>  	int err;
>
> -	mutex_lock(&team->lock);
> +	if (!mutex_trylock(&team->lock))
> +		return -EBUSY;
>  	err = team_port_del(team, port_dev);
>  	mutex_unlock(&team->lock);
>
> --

Failing team_del_slave() is not an option. It will add various issues.

> >
> > Thanks for your comment. I rewrote the patch based on those comments.
> > This time, we modified it to return an error so that resources are not
> > modified when a race situation occurs. We would appreciate your
> > feedback on what this patch would be like.
> >
> > > Thanks,
> > > Michal
> > >
> > >
> >
> > Regards,
> > Jeongjun Park
> >
> > ---
> >  drivers/net/team/team_core.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> > index ab1935a4aa2c..43d7c73b25aa 100644
> > --- a/drivers/net/team/team_core.c
> > +++ b/drivers/net/team/team_core.c
> > @@ -1972,7 +1972,8 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
> >  	struct team *team = netdev_priv(dev);
> >  	int err;
> >
> > -	mutex_lock(&team->lock);
> > +	if (!mutex_trylock(&team->lock))
> > +		return -EBUSY;
> >  	err = team_port_add(team, port_dev, extack);
> >  	mutex_unlock(&team->lock);
> >
> > @@ -1987,7 +1988,8 @@ static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
> >  	struct team *team = netdev_priv(dev);
> >  	int err;
> >
> > -	mutex_lock(&team->lock);
> > +	if (!mutex_trylock(&team->lock))
> > +		return -EBUSY;
> >  	err = team_port_del(team, port_dev);
> >  	mutex_unlock(&team->lock);
> >
> > --
>
> Failing team_del_slave() is not an option. It will add various issues.

Thank you for comment. So, how about briefly releasing the lock before
calling dev_open() in team_port_add() and then locking it again?
dev_open() does not use &team, so disabling it briefly will not cause
any major problems.

Regards,
Jeongjun Park

---
 drivers/net/team/team_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..245566a1875d 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -1213,7 +1213,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
 		goto err_port_enter;
 	}
 
+	mutex_unlock(&team->lock);
 	err = dev_open(port_dev, extack);
+	mutex_lock(&team->lock);
 	if (err) {
 		netdev_dbg(dev, "Device %s opening failed\n",
 			   portname);
--
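
Dropping a lock around a call and retaking it afterwards opens a window in which other threads can change the protected state, so code using this pattern usually re-checks its assumptions after reacquiring. The sketch below shows that generic pattern in user space with POSIX threads. It makes no claim about whether the proposal above is safe for `team_port_add()`; `struct registry`, `registry_add_member()`, and the generation counter are hypothetical constructs invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state: both fields are protected by "lock". */
struct registry {
	pthread_mutex_t lock;
	unsigned int generation;	/* bumped on every change */
	int nr_members;
};

/* A step that must run without the lock held (stand-in for a slow call). */
typedef int (*slow_step_fn)(void *ctx);

static int registry_add_member(struct registry *r, slow_step_fn slow_step,
			       void *ctx)
{
	unsigned int seen;
	int err;

	assert(slow_step != NULL);

	pthread_mutex_lock(&r->lock);
	seen = r->generation;		/* remember what we validated against */
	pthread_mutex_unlock(&r->lock);	/* drop the lock around the slow step */

	err = slow_step(ctx);

	pthread_mutex_lock(&r->lock);
	if (!err && r->generation != seen)
		err = -EAGAIN;		/* state changed while unlocked */
	if (!err) {
		r->nr_members++;
		r->generation++;
	}
	pthread_mutex_unlock(&r->lock);
	return err;
}

/* Slow step that leaves the registry alone. */
static int quiet_step(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Slow step that simulates a concurrent change made during the window. */
static int interfering_step(void *ctx)
{
	struct registry *r = ctx;

	pthread_mutex_lock(&r->lock);
	r->generation++;
	pthread_mutex_unlock(&r->lock);
	return 0;
}
```

With `quiet_step` the member is added; with `interfering_step` the function detects that the state moved while the lock was released and returns `-EAGAIN` without modifying the member count.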

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..3ac82df876b0 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -1970,11 +1970,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
 			  struct netlink_ext_ack *extack)
 {
 	struct team *team = netdev_priv(dev);
-	int err;
+	int err, locked;
 
-	mutex_lock(&team->lock);
+	locked = mutex_trylock(&team->lock);
 	err = team_port_add(team, port_dev, extack);
-	mutex_unlock(&team->lock);
+	if (locked)
+		mutex_unlock(&team->lock);
 
 	if (!err)
 		netdev_change_features(dev);
@@ -1985,11 +1986,12 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev,
 static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
 {
 	struct team *team = netdev_priv(dev);
-	int err;
+	int err, locked;
 
-	mutex_lock(&team->lock);
+	locked = mutex_trylock(&team->lock);
 	err = team_port_del(team, port_dev);
-	mutex_unlock(&team->lock);
+	if (locked)
+		mutex_unlock(&team->lock);
 
 	if (err)
 		return err;

       CPU0                    CPU1
       ----                    ----
  lock(&rdev->wiphy.mtx);
                               lock(team->team_lock_key#4);
                               lock(&rdev->wiphy.mtx);
  lock(team->team_lock_key#4);

Deadlock occurs due to the above scenario. Therefore,
modify the code as shown in the patch below to prevent deadlock.

Regards,
Jeongjun Park.

Reported-and-tested-by: syzbot+705c61d60b091ef42c04@syzkaller.appspotmail.com
Fixes: 61dc3461b954 ("team: convert overall spinlock to mutex")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
 drivers/net/team/team_core.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--
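
The scenario in the commit message is an ABBA deadlock: one CPU holds the first lock and waits for the second while the other CPU holds the second and waits for the first. A common general remedy is to make every path take the two locks in the same order. The following user-space sketch with POSIX threads illustrates that idea only; it is not the fix adopted for the team driver, and `lock_pair()`, `unlock_pair()`, and the worker functions are hypothetical names invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define ITERATIONS 100000

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;	/* changed only while both locks are held */

/*
 * Take two distinct mutexes in one global order (lowest address first),
 * regardless of the order in which the caller names them. Because every
 * caller ends up with the same order, the ABBA pattern cannot form.
 */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
	assert(x != y);
	if ((uintptr_t)x > (uintptr_t)y) {
		pthread_mutex_t *tmp = x;
		x = y;
		y = tmp;
	}
	pthread_mutex_lock(x);
	pthread_mutex_lock(y);
}

static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
	pthread_mutex_unlock(x);
	pthread_mutex_unlock(y);
}

/* Names the locks as (a, b), like CPU0 in the report. */
static void *worker_ab(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		lock_pair(&lock_a, &lock_b);
		shared_counter++;
		unlock_pair(&lock_a, &lock_b);
	}
	return NULL;
}

/* Names the locks as (b, a), like CPU1 in the report. */
static void *worker_ba(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		lock_pair(&lock_b, &lock_a);
		shared_counter++;
		unlock_pair(&lock_b, &lock_a);
	}
	return NULL;
}

/* Run both workers concurrently; returns the final counter or -1 on error. */
static long run_both_workers(void)
{
	pthread_t t1, t2;

	shared_counter = 0;
	if (pthread_create(&t1, NULL, worker_ab, NULL) != 0)
		return -1;
	if (pthread_create(&t2, NULL, worker_ba, NULL) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return shared_counter;
}
```

If `lock_pair()` instead locked its arguments in the order given, the two workers could each grab one mutex and wait forever for the other, which is the situation the report describes.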