diff mbox series

[for-next,v4] IB/usnic: fix deadlock

Message ID 1549733310-14966-1-git-send-email-pkaustub@cisco.com (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Headers show
Series [for-next,v4] IB/usnic: fix deadlock | expand

Commit Message

Parvi Kaustubhi (pkaustub) Feb. 9, 2019, 5:28 p.m. UTC
There is a dead lock in usnic ib_register and netdev_notify
path.

	usnic_ib_discover_pf()
	| mutex_lock(&usnic_ib_ibdev_list_lock);
	 | usnic_ib_device_add();
	  | ib_register_device()
	   | usnic_ib_query_port()
	    | mutex_lock(&us_ibdev->usdev_lock);
	     | ib_get_eth_speed()
	      | rtnl_lock()

order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock

	rtnl_lock()
	 | usnic_ib_netdevice_event()
	  | mutex_lock(&usnic_ib_ibdev_list_lock);

order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock

Solution is to use ib_device_get_by_netdev() to lookup ib_dev while
handling netdev/ inet events.

Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com>
---
Changelog:

v3->v4
* Added missing ib_device_put*  calls.
* Fixed wonky control flow

v2->v3:
* Jason: drivers should not hold any locks while calling ib_unregister_device()
* Jason: use https://github.com/jgunthorpe/linux/commits/device_locking_cleanup

v1->v2:
* Have notifier blocks in usnic_ib_dev instead of using workqueue to defer
event handling.
---
 drivers/infiniband/hw/usnic/usnic_ib_main.c | 30 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 16 deletions(-)

Comments

Jason Gunthorpe Feb. 19, 2019, 4:38 a.m. UTC | #1
On Sat, Feb 09, 2019 at 09:28:30AM -0800, Parvi Kaustubhi wrote:
> There is a dead lock in usnic ib_register and netdev_notify
> path.
> 
> 	usnic_ib_discover_pf()
> 	| mutex_lock(&usnic_ib_ibdev_list_lock);
> 	 | usnic_ib_device_add();
> 	  | ib_register_device()
> 	   | usnic_ib_query_port()
> 	    | mutex_lock(&us_ibdev->usdev_lock);
> 	     | ib_get_eth_speed()
> 	      | rtnl_lock()
> 
> order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock
> 
> 	rtnl_lock()
> 	 | usnic_ib_netdevice_event()
> 	  | mutex_lock(&usnic_ib_ibdev_list_lock);
> 
> order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock
> 
> Solution is to use ib_device_get_by_netdev() to lookup ib_dev while
> handling netdev/ inet events.
> 
> Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
> Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
> Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com>
> Changelog:
> 
> v3->v4
> * Added missing ib_device_put*  calls.
> * Fixed wonky control flow
> 
> v2->v3:
> * Jason: drivers should not hold any locks while calling ib_unregister_device()
> * Jason: use https://github.com/jgunthorpe/linux/commits/device_locking_cleanup
> 
> v1->v2:
> * Have notifier blocks in usnic_ib_dev instead of using workqueue to defer
> event handling.
>  drivers/infiniband/hw/usnic/usnic_ib_main.c | 30 ++++++++++++++---------------
>  1 file changed, 14 insertions(+), 16 deletions(-)

I added the hunks to use setnedev from my as-yet-unsent patch and
applies this to for-next. I assume you tested on top of that. You
should probably test the final result.

Thanks,
Jason
Parvi Kaustubhi (pkaustub) Feb. 19, 2019, 10:34 p.m. UTC | #2
Yes, I was testing on top of commits in https://github.com/jgunthorpe/linux/commits/device_locking_cleanup
I have tested the latest (with the updated patch) on branch wip/jgg-for-next

Thanks,
Parvi

> On Feb 18, 2019, at 8:38 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Sat, Feb 09, 2019 at 09:28:30AM -0800, Parvi Kaustubhi wrote:
>> There is a dead lock in usnic ib_register and netdev_notify
>> path.
>> 
>> 	usnic_ib_discover_pf()
>> 	| mutex_lock(&usnic_ib_ibdev_list_lock);
>> 	 | usnic_ib_device_add();
>> 	  | ib_register_device()
>> 	   | usnic_ib_query_port()
>> 	    | mutex_lock(&us_ibdev->usdev_lock);
>> 	     | ib_get_eth_speed()
>> 	      | rtnl_lock()
>> 
>> order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock
>> 
>> 	rtnl_lock()
>> 	 | usnic_ib_netdevice_event()
>> 	  | mutex_lock(&usnic_ib_ibdev_list_lock);
>> 
>> order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock
>> 
>> Solution is to use ib_device_get_by_netdev() to lookup ib_dev while
>> handling netdev/ inet events.
>> 
>> Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
>> Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
>> Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com>
>> Changelog:
>> 
>> v3->v4
>> * Added missing ib_device_put*  calls.
>> * Fixed wonky control flow
>> 
>> v2->v3:
>> * Jason: drivers should not hold any locks while calling ib_unregister_device()
>> * Jason: use https://github.com/jgunthorpe/linux/commits/device_locking_cleanup
>> 
>> v1->v2:
>> * Have notifier blocks in usnic_ib_dev instead of using workqueue to defer
>> event handling.
>> drivers/infiniband/hw/usnic/usnic_ib_main.c | 30 ++++++++++++++---------------
>> 1 file changed, 14 insertions(+), 16 deletions(-)
> 
> I added the hunks to use setnedev from my as-yet-unsent patch and
> applies this to for-next. I assume you tested on top of that. You
> should probably test the final result.
> 
> Thanks,
> Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 9529a08..d0ddbec 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -216,18 +216,17 @@  static int usnic_ib_netdevice_event(struct notifier_block *notifier,
 					unsigned long event, void *ptr)
 {
 	struct usnic_ib_dev *us_ibdev;
+	struct ib_device *ibdev;
 
 	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
 
-	mutex_lock(&usnic_ib_ibdev_list_lock);
-	list_for_each_entry(us_ibdev, &usnic_ib_ibdev_list, ib_dev_link) {
-		if (us_ibdev->netdev == netdev) {
-			usnic_ib_handle_usdev_event(us_ibdev, event);
-			break;
-		}
-	}
-	mutex_unlock(&usnic_ib_ibdev_list_lock);
+	ibdev = ib_device_get_by_netdev(netdev, RDMA_DRIVER_USNIC);
+	if (!ibdev)
+		return NOTIFY_DONE;
 
+	us_ibdev = container_of(ibdev, struct usnic_ib_dev, ib_dev);
+	usnic_ib_handle_usdev_event(us_ibdev, event);
+	ib_device_put(ibdev);
 	return NOTIFY_DONE;
 }
 
@@ -282,16 +281,15 @@  static int usnic_ib_inetaddr_event(struct notifier_block *notifier,
 	struct usnic_ib_dev *us_ibdev;
 	struct in_ifaddr *ifa = ptr;
 	struct net_device *netdev = ifa->ifa_dev->dev;
+	struct ib_device *ibdev;
 
-	mutex_lock(&usnic_ib_ibdev_list_lock);
-	list_for_each_entry(us_ibdev, &usnic_ib_ibdev_list, ib_dev_link) {
-		if (us_ibdev->netdev == netdev) {
-			usnic_ib_handle_inet_event(us_ibdev, event, ptr);
-			break;
-		}
-	}
-	mutex_unlock(&usnic_ib_ibdev_list_lock);
+	ibdev = ib_device_get_by_netdev(netdev, RDMA_DRIVER_USNIC);
+	if (!ibdev)
+		return NOTIFY_DONE;
 
+	us_ibdev = container_of(ibdev, struct usnic_ib_dev, ib_dev);
+	usnic_ib_handle_inet_event(us_ibdev, event, ptr);
+	ib_device_put(ibdev);
 	return NOTIFY_DONE;
 }
 static struct notifier_block usnic_ib_inetaddr_notifier = {