diff mbox

[rdma-rc,3/7] IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions

Message ID 1465042524-25852-4-git-send-email-leon@kernel.org (mailing list archive)
State Accepted
Headers show

Commit Message

Leon Romanovsky June 4, 2016, 12:15 p.m. UTC
From: Erez Shitrit <erezsh@mellanox.com>

In ipoib_remove_one the driver holds the rtnl_lock and tries to do some
operation like dev_change_flags or unregister_netdev, while sysfs
callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the
rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not
free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to
free its sysfs directory, and we will get  a->b, b->a deadlock.

    Trace like the following:

        schedule+0x37/0x80
        schedule_preempt_disabled+0xe/0x10
        __mutex_lock_slowpath+0xb5/0x120
        mutex_lock+0x23/0x40
        rtnl_lock+0x15/0x20
        netdev_run_todo+0x17c/0x320
        rtnl_unlock+0xe/0x10
        ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib]
        delete_child+0x54/0x80 [ib_ipoib]
        dev_attr_store+0x18/0x30
        sysfs_kf_write+0x37/0x40
        mutex_lock+0x16/0x40
        SyS_write+0x55/0xc0
        entry_SYSCALL_64_fastpath+0x16/0x75
    And
        schedule+0x37/0x80
        __kernfs_remove+0x1a8/0x260
        ? wake_atomic_t_function+0x60/0x60
        kernfs_remove+0x25/0x40
        sysfs_remove_dir+0x50/0x80
        kobject_del+0x18/0x50
        device_del+0x19f/0x260
        netdev_unregister_kobject+0x6a/0x80
        rollback_registered_many+0x1fd/0x340
        rollback_registered+0x3c/0x70
        unregister_netdevice_queue+0x55/0xc0
        unregister_netdev+0x20/0x30
        ipoib_remove_one+0x114/0x1b0 [ib_ipoib]
        ib_unregister_client+0x4a/0x170 [ib_core]
        ? find_module_all+0x71/0xa0
        ipoib_cleanup_module+0x10/0x94 [ib_ipoib]
        SyS_delete_module+0x1b5/0x210
        entry_SYSCALL_64_fastpath+0x16/0x75

The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to
get out from the sysfs function.

Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      | 1 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   | 4 ++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 3 +++
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 6 ++++++
 4 files changed, 14 insertions(+)

Comments

Or Gerlitz June 7, 2016, 7:31 a.m. UTC | #1
On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote:

> Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
> Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support")

These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never
got reports on this ABBA deadlock happens in real life - any idea, did
you see the deadlock in
action, ever? We did had lockdep warnings and thanks you Erez for
finally getting there to fix after me... I will look on your patch.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Erez Shitrit June 7, 2016, 10:45 a.m. UTC | #2
On Tue, Jun 7, 2016 at 10:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote:
>
>> Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
>> Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support")
>
> These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never
> got reports on this ABBA deadlock happens in real life - any idea, did
> you see the deadlock in

Sure, it is from the real life, you can see the stack that i added to
the commit message.
If you want to simulate that just add sleep in the remove_one while
trying to change mode/add-child etc.

> action, ever? We did had lockdep warnings and thanks you Erez for

welcome.

> finally getting there to fix after me... I will look on your patch.

>
> Or.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz June 7, 2016, 8:22 p.m. UTC | #3
On Tue, Jun 7, 2016 at 1:45 PM, Erez Shitrit <erezsh@dev.mellanox.co.il> wrote:
> On Tue, Jun 7, 2016 at 10:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote:
>>
>>> Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
>>> Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support")
>>
>> These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never
>> got reports on this ABBA deadlock happens in real life - any idea, did
>> you see the deadlock in
>
> Sure, it is from the real life, you can see the stack that i added to
> the commit message.

yes, but the bug you're pointing on is out there for whole four years
and no one complained including you, can you explain that?

> If you want to simulate that just add sleep in the remove_one while
> trying to change mode/add-child etc.

so why need to simulate? is that impossible to get it on non simulated env?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Erez Shitrit June 8, 2016, 7:01 a.m. UTC | #4
On Tue, Jun 7, 2016 at 11:22 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Jun 7, 2016 at 1:45 PM, Erez Shitrit <erezsh@dev.mellanox.co.il> wrote:
>> On Tue, Jun 7, 2016 at 10:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote:
>>>
>>>> Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
>>>> Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support")
>>>
>>> These two commits go back to 3.7-rc1 / Oct 2012 -- so I wonder how come we never
>>> got reports on this ABBA deadlock happens in real life - any idea, did
>>> you see the deadlock in
>>
>> Sure, it is from the real life, you can see the stack that i added to
>> the commit message.
>
> yes, but the bug you're pointing on is out there for whole four years
> and no one complained including you, can you explain that?

Not really, it is a race, not easy to see it, but it is there, waiting..

>
>> If you want to simulate that just add sleep in the remove_one while
>> trying to change mode/add-child etc.
>
> so why need to simulate? is that impossible to get it on non simulated env?

It happened in our regression system.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz June 8, 2016, 7:29 a.m. UTC | #5
On Wed, Jun 8, 2016 at 10:01 AM, Erez Shitrit <erezsh@dev.mellanox.co.il> wrote:
> On Tue, Jun 7, 2016 at 11:22 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:

>> yes, but the bug you're pointing on is out there for whole four years
>> and no one complained including you, can you explain that?

> Not really, it is a race, not easy to see it, but it is there, waiting..

yes, seems this is the case,

>> so why need to simulate? is that impossible to get it on non simulated env?

> It happened in our regression system.

so after four years the deadlock happened once in the regression
system and no one
else complained, that sounds like hard to reproduce one... good catch.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index bab7db6..4f7d9b4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -94,6 +94,7 @@  enum {
 	IPOIB_NEIGH_TBL_FLUSH	  = 12,
 	IPOIB_FLAG_DEV_ADDR_SET	  = 13,
 	IPOIB_FLAG_DEV_ADDR_CTRL  = 14,
+	IPOIB_FLAG_GOING_DOWN	  = 15,
 
 	IPOIB_MAX_BACKOFF_SECONDS = 16,
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index b2f4283..951d9ab 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1486,6 +1486,10 @@  static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 {
 	struct net_device *dev = to_net_dev(d);
 	int ret;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	if (test_bit(IPOIB_FLAG_GOING_DOWN, &priv->flags))
+		return -EPERM;
 
 	if (!rtnl_trylock())
 		return restart_syscall();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index c558c32..1cd569f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2143,6 +2143,9 @@  static void ipoib_remove_one(struct ib_device *device, void *client_data)
 		ib_unregister_event_handler(&priv->event_handler);
 		flush_workqueue(ipoib_workqueue);
 
+		/* mark interface in the middle of destruction */
+		set_bit(IPOIB_FLAG_GOING_DOWN, &priv->flags);
+
 		rtnl_lock();
 		dev_change_flags(priv->dev, priv->dev->flags & ~IFF_UP);
 		rtnl_unlock();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 64a3559..a2f9f29 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -131,6 +131,9 @@  int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 
 	ppriv = netdev_priv(pdev);
 
+	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
+		return -EPERM;
+
 	snprintf(intf_name, sizeof intf_name, "%s.%04x",
 		 ppriv->dev->name, pkey);
 	priv = ipoib_intf_alloc(intf_name);
@@ -183,6 +186,9 @@  int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 
 	ppriv = netdev_priv(pdev);
 
+	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
+		return -EPERM;
+
 	if (!rtnl_trylock())
 		return restart_syscall();