Message ID | 1465042524-25852-4-git-send-email-leon@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote: > Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") > Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never got reports on this ABBA deadlock happens in real life - any idea, did you see the deadlock in action, ever? We did had lockdep warnings and thanks you Erez for finally getting there to fix after me... I will look on your patch. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 7, 2016 at 10:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote: > On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote: > >> Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") >> Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") > > These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never > got reports on this ABBA deadlock happens in real life - any idea, did > you see the deadlock in Sure, it is from the real life, you can see the stack that i added to the commit message. If you want to simulate that just add sleep in the remove_one while trying to change mode/add-child etc. > action, ever? We did had lockdep warnings and thanks you Erez for welcome. > finally getting there to fix after me... I will look on your patch. > > Or. > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 7, 2016 at 1:45 PM, Erez Shitrit <erezsh@dev.mellanox.co.il> wrote: > On Tue, Jun 7, 2016 at 10:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote: >> On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote: >> >>> Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") >>> Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") >> >> These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never >> got reports on this ABBA deadlock happens in real life - any idea, did >> you see the deadlock in > > Sure, it is from the real life, you can see the stack that i added to > the commit message. yes, but the bug you're pointing on is out there for whole four years and no one complained including you, can you explain that? > If you want to simulate that just add sleep in the remove_one while > trying to change mode/add-child etc. so why need to simulate? is that impossible to get it on non simulated env? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 7, 2016 at 11:22 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: > On Tue, Jun 7, 2016 at 1:45 PM, Erez Shitrit <erezsh@dev.mellanox.co.il> wrote: >> On Tue, Jun 7, 2016 at 10:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote: >>> On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote: >>> >>>> Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") >>>> Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") >>> >>> These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never >>> got reports on this ABBA deadlock happens in real life - any idea, did >>> you see the deadlock in >> >> Sure, it is from the real life, you can see the stack that i added to >> the commit message. > > yes, but the bug you're pointing on is out there for whole four years > and no one complained including you, can you explain that? Not really, it is a race, not easy to see it, but it is there, waiting.. > >> If you want to simulate that just add sleep in the remove_one while >> trying to change mode/add-child etc. > > so why need to simulate? is that impossible to get it on non simulated env? It happened in our regression system. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 8, 2016 at 10:01 AM, Erez Shitrit <erezsh@dev.mellanox.co.il> wrote: > On Tue, Jun 7, 2016 at 11:22 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: >> yes, but the bug you're pointing on is out there for whole four years >> and no one complained including you, can you explain that? > Not really, it is a race, not easy to see it, but it is there, waiting.. yes, seems this is the case, >> so why need to simulate? is that impossible to get it on non simulated env? > It happened in our regression system. so after four years the deadlock happened once in the regression system and no one else complained, that sounds like hard to reproduce one... good catch. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index bab7db6..4f7d9b4 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -94,6 +94,7 @@ enum { IPOIB_NEIGH_TBL_FLUSH = 12, IPOIB_FLAG_DEV_ADDR_SET = 13, IPOIB_FLAG_DEV_ADDR_CTRL = 14, + IPOIB_FLAG_GOING_DOWN = 15, IPOIB_MAX_BACKOFF_SECONDS = 16, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index b2f4283..951d9ab 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -1486,6 +1486,10 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr, { struct net_device *dev = to_net_dev(d); int ret; + struct ipoib_dev_priv *priv = netdev_priv(dev); + + if (test_bit(IPOIB_FLAG_GOING_DOWN, &priv->flags)) + return -EPERM; if (!rtnl_trylock()) return restart_syscall(); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index c558c32..1cd569f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -2143,6 +2143,9 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data) ib_unregister_event_handler(&priv->event_handler); flush_workqueue(ipoib_workqueue); + /* mark interface in the middle of destruction */ + set_bit(IPOIB_FLAG_GOING_DOWN, &priv->flags); + rtnl_lock(); dev_change_flags(priv->dev, priv->dev->flags & ~IFF_UP); rtnl_unlock(); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 64a3559..a2f9f29 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -131,6 +131,9 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) ppriv = netdev_priv(pdev); + if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags)) + return -EPERM; + snprintf(intf_name, sizeof intf_name, "%s.%04x", ppriv->dev->name, pkey); priv = ipoib_intf_alloc(intf_name); @@ -183,6 +186,9 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey) ppriv = netdev_priv(pdev); + if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags)) + return -EPERM; + if (!rtnl_trylock()) return restart_syscall();