diff mbox series

[for-next,2/4] RDMA/bnxt_re: Refactor rdev init/uninit path and simplifying rtnl_lock usage

Message ID 1544122186-7610-3-git-send-email-selvin.xavier@broadcom.com (mailing list archive)
State Superseded
Delegated to: Jason Gunthorpe
Headers show
Series RDMA/bnxt_re: driver update | expand

Commit Message

Selvin Xavier Dec. 6, 2018, 6:49 p.m. UTC
From: Somnath Kotur <somnath.kotur@broadcom.com>

 - Add more flags for better granularity in rdev init/uninit path.
 - bnxt_re_ib_reg() handles two main functionalities - initializing
   the device and registering with the IB stack.  Split it into 2
   functions i.e. bnxt_re_dev_init() and bnxt_re_ib_init()  to account
   for the same thereby improve modularity. Do the same for
   bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
   and  bnxt_re_ib_uninit().
 - Since bnxt_re_dev_init() part of bnxt_re_ib_reg()
   takes rtnl_lock inside for the entire function, so it is better to
   invoke it from bnxt_netdev_event() itself where the rtnl_lock is held
   already for  NETDEV_EVENT.

 - Ensure bnxt_re_dev_uninit() is always invoked with rtnl_lock held
   as is necessitated by L2 driver when invoking it's APIs.

 - Simplify the code by combining the different steps to add and
   remove the device into two functions.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h |  16 ++-
 drivers/infiniband/hw/bnxt_re/main.c    | 204 ++++++++++++++++++--------------
 2 files changed, 128 insertions(+), 92 deletions(-)

Comments

Jason Gunthorpe Dec. 6, 2018, 6:57 p.m. UTC | #1
On Thu, Dec 06, 2018 at 10:49:44AM -0800, Selvin Xavier wrote:
> From: Somnath Kotur <somnath.kotur@broadcom.com>
> 
>  - Add more flags for better granularity in rdev init/uninit path.
>  - bnxt_re_ib_reg() handles two main functionalities - initializing
>    the device and registering with the IB stack.  Split it into 2
>    functions i.e. bnxt_re_dev_init() and bnxt_re_ib_init()  to account
>    for the same thereby improve modularity. Do the same for
>    bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
>    and  bnxt_re_ib_uninit().
>  - Since bnxt_re_dev_init() part of bnxt_re_ib_reg()
>    takes rtnl_lock inside for the entire function, so it is better to
>    invoke it from bnxt_netdev_event() itself where the rtnl_lock is held
>    already for  NETDEV_EVENT.

Drivers cannot hold rtnl_lock while calling ib_unregister_device() (or
really any lock, unless done carefully). Looks like bnxt does this
already and prehaps this patch makes it worse?

Jason
Selvin Xavier Dec. 7, 2018, 6:13 a.m. UTC | #2
On Fri, Dec 7, 2018 at 12:27 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Thu, Dec 06, 2018 at 10:49:44AM -0800, Selvin Xavier wrote:
> > From: Somnath Kotur <somnath.kotur@broadcom.com>
> >
> >  - Add more flags for better granularity in rdev init/uninit path.
> >  - bnxt_re_ib_reg() handles two main functionalities - initializing
> >    the device and registering with the IB stack.  Split it into 2
> >    functions i.e. bnxt_re_dev_init() and bnxt_re_ib_init()  to account
> >    for the same thereby improve modularity. Do the same for
> >    bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
> >    and  bnxt_re_ib_uninit().
> >  - Since bnxt_re_dev_init() part of bnxt_re_ib_reg()
> >    takes rtnl_lock inside for the entire function, so it is better to
> >    invoke it from bnxt_netdev_event() itself where the rtnl_lock is held
> >    already for  NETDEV_EVENT.
>
> Drivers cannot hold rtnl_lock while calling ib_unregister_device() (or
> really any lock, unless done carefully). Looks like bnxt does this
> already and prehaps this patch makes it worse?
>
Not really. You are right that bnxt is already calling
ib_unregister_device from rtnl_lock.
Current code has three instances of calling ib_unregister_device from
rtnl_lock context,
We  are removing two of them in this patch.

Un-int routine is split into two. bnxt_re_ib_uninit and bnxt_re_remove_device .
bnxt_re_ib_uninit  calls ib_unregister_device and
bnxt_re_remove_device destroys the resources from HW.

In this patch, we call bnxt_re_ib_uninit outside rtnl_lock and
bnxt_re_remove_device with rtnl_lock
 (except in bnxt_re_shutdown which i will explain later).

In case of mod_exit, bnxt_re_ib_uninit  is invoked first outside the rtnl_lock.
After this, rtnl_lock is acquired and bnxt_re_remove_device is invoked.

In the netdev_event to unregister the device,  we handle the uninit in
two stages now

>>         case NETDEV_UNREGISTER:
>> @@ -1555,9 +1581,14 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
>>                  */
>>                 if (atomic_read(&rdev->sched_count) > 0)
>>                         goto exit;
>> -               bnxt_re_ib_unreg(rdev);
>> -               bnxt_re_remove_one(rdev);
>> -               bnxt_re_dev_unreg(rdev);
>> +               /* Schedule work for unregistering from IB stack */
>> +               if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
>> +                       sch_work = true;
>> +                       break;
>> +               }

if the IB device is registered, we schedule a work to invoke
ib_unregister_device and return notifier.
So bnxt_re_ib_uninit is invoked from the bnxt_re_task.

>> +
>> +               /* IB device is removed now. Destroy the device */
>> +               bnxt_re_remove_device(rdev);
>>                 break;

Since the netdev reference is not released by bnxt_re driver (ref is
released in bnxt_re_remove_device),
the network stack will invoke all notifiers again till all the
references are released.
During the second or later invocation of the netdev notifier, we
proceed with the bnxt_re_remove_device.

bnxt_re_shutdown is a call back from the L2 driver. L2 driver acquires
the rtnl_lock
and invokes this hook.  Current Interface doesn't allow us to return a
failure and allow L2
driver to call back again. We will come up with a solution for this
and will fix it in another patch.
Jason Gunthorpe Dec. 7, 2018, 9:13 p.m. UTC | #3
On Fri, Dec 07, 2018 at 11:43:50AM +0530, Selvin Xavier wrote:
> On Fri, Dec 7, 2018 at 12:27 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> > On Thu, Dec 06, 2018 at 10:49:44AM -0800, Selvin Xavier wrote:
> > > From: Somnath Kotur <somnath.kotur@broadcom.com>
> > >
> > >  - Add more flags for better granularity in rdev init/uninit path.
> > >  - bnxt_re_ib_reg() handles two main functionalities - initializing
> > >    the device and registering with the IB stack.  Split it into 2
> > >    functions i.e. bnxt_re_dev_init() and bnxt_re_ib_init()  to account
> > >    for the same thereby improve modularity. Do the same for
> > >    bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
> > >    and  bnxt_re_ib_uninit().
> > >  - Since bnxt_re_dev_init() part of bnxt_re_ib_reg()
> > >    takes rtnl_lock inside for the entire function, so it is better to
> > >    invoke it from bnxt_netdev_event() itself where the rtnl_lock is held
> > >    already for  NETDEV_EVENT.
> >
> > Drivers cannot hold rtnl_lock while calling ib_unregister_device() (or
> > really any lock, unless done carefully). Looks like bnxt does this
> > already and prehaps this patch makes it worse?
> >
> Not really. You are right that bnxt is already calling
> ib_unregister_device from rtnl_lock.
> Current code has three instances of calling ib_unregister_device from
> rtnl_lock context,
> We  are removing two of them in this patch.
> 
> Un-int routine is split into two. bnxt_re_ib_uninit and bnxt_re_remove_device .
> bnxt_re_ib_uninit  calls ib_unregister_device and
> bnxt_re_remove_device destroys the resources from HW.
> 
> In this patch, we call bnxt_re_ib_uninit outside rtnl_lock and
> bnxt_re_remove_device with rtnl_lock
>  (except in bnxt_re_shutdown which i will explain later).
> 
> In case of mod_exit, bnxt_re_ib_uninit  is invoked first outside the rtnl_lock.
> After this, rtnl_lock is acquired and bnxt_re_remove_device is invoked.

Yuk.

I'm looking at this driver and I see the same basic racy mess I see in
rxe regarding liftime and locking. Look at the patch I just sent to
Steve and consider using its new functions in this driver too instead
of the bnxt_re_dev_list, and other related buggyness.

Obviously the bnxt_re_shutdown problem has to be fixed too..

> >> @@ -1555,9 +1581,14 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
> >>                  */
> >>                 if (atomic_read(&rdev->sched_count) > 0)
> >>                         goto exit;
> >> -               bnxt_re_ib_unreg(rdev);
> >> -               bnxt_re_remove_one(rdev);
> >> -               bnxt_re_dev_unreg(rdev);
> >> +               /* Schedule work for unregistering from IB stack */
> >> +               if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
> >> +                       sch_work = true;
> >> +                       break;
> >> +               }
> 
> if the IB device is registered, we schedule a work to invoke
> ib_unregister_device and return notifier.
> So bnxt_re_ib_uninit is invoked from the bnxt_re_task.

BNXT_RE_FLAG_IBDEV_REGISTERED is also all racy and wrong.

> Since the netdev reference is not released by bnxt_re driver (ref is
> released in bnxt_re_remove_device),
> the network stack will invoke all notifiers again till all the
> references are released.
> During the second or later invocation of the netdev notifier, we
> proceed with the bnxt_re_remove_device.

That is also racy. unregister_netdevice_notifier() could have
happened between the first and second call, so now it leaks.

If ib_unregister happens the driver must do the remove_device stuff
immediately.

So, I think you need to use that stuff I sent to Steve. This patch
might be an improvement if the above is fixed, maybe, it isn't
clear. (and Steve, now that I see the same mess in bnxt, I'm more than
convinced that is the right approach to your problem too)

However, I'd rather see this as part of a series that comprehensively
fixes all of this.

Jason
Selvin Xavier Dec. 10, 2018, 5:36 a.m. UTC | #4
On Sat, Dec 8, 2018 at 2:43 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Fri, Dec 07, 2018 at 11:43:50AM +0530, Selvin Xavier wrote:
> > On Fri, Dec 7, 2018 at 12:27 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > >
> > > On Thu, Dec 06, 2018 at 10:49:44AM -0800, Selvin Xavier wrote:
> > > > From: Somnath Kotur <somnath.kotur@broadcom.com>
> > > >
> > > >  - Add more flags for better granularity in rdev init/uninit path.
> > > >  - bnxt_re_ib_reg() handles two main functionalities - initializing
> > > >    the device and registering with the IB stack.  Split it into 2
> > > >    functions i.e. bnxt_re_dev_init() and bnxt_re_ib_init()  to account
> > > >    for the same thereby improve modularity. Do the same for
> > > >    bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
> > > >    and  bnxt_re_ib_uninit().
> > > >  - Since bnxt_re_dev_init() part of bnxt_re_ib_reg()
> > > >    takes rtnl_lock inside for the entire function, so it is better to
> > > >    invoke it from bnxt_netdev_event() itself where the rtnl_lock is held
> > > >    already for  NETDEV_EVENT.
> > >
> > > Drivers cannot hold rtnl_lock while calling ib_unregister_device() (or
> > > really any lock, unless done carefully). Looks like bnxt does this
> > > already and prehaps this patch makes it worse?
> > >
> > Not really. You are right that bnxt is already calling
> > ib_unregister_device from rtnl_lock.
> > Current code has three instances of calling ib_unregister_device from
> > rtnl_lock context,
> > We  are removing two of them in this patch.
> >
> > Un-int routine is split into two. bnxt_re_ib_uninit and bnxt_re_remove_device .
> > bnxt_re_ib_uninit  calls ib_unregister_device and
> > bnxt_re_remove_device destroys the resources from HW.
> >
> > In this patch, we call bnxt_re_ib_uninit outside rtnl_lock and
> > bnxt_re_remove_device with rtnl_lock
> >  (except in bnxt_re_shutdown which i will explain later).
> >
> > In case of mod_exit, bnxt_re_ib_uninit  is invoked first outside the rtnl_lock.
> > After this, rtnl_lock is acquired and bnxt_re_remove_device is invoked.
>
> Yuk.
>
> I'm looking at this driver and I see the same basic racy mess I see in
> rxe regarding liftime and locking. Look at the patch I just sent to
> Steve and consider using its new functions in this driver too instead
> of the bnxt_re_dev_list, and other related buggyness.
>
> Obviously the bnxt_re_shutdown problem has to be fixed too..
>
> > >> @@ -1555,9 +1581,14 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
> > >>                  */
> > >>                 if (atomic_read(&rdev->sched_count) > 0)
> > >>                         goto exit;
> > >> -               bnxt_re_ib_unreg(rdev);
> > >> -               bnxt_re_remove_one(rdev);
> > >> -               bnxt_re_dev_unreg(rdev);
> > >> +               /* Schedule work for unregistering from IB stack */
> > >> +               if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
> > >> +                       sch_work = true;
> > >> +                       break;
> > >> +               }
> >
> > if the IB device is registered, we schedule a work to invoke
> > ib_unregister_device and return notifier.
> > So bnxt_re_ib_uninit is invoked from the bnxt_re_task.
>
> BNXT_RE_FLAG_IBDEV_REGISTERED is also all racy and wrong.
>
> > Since the netdev reference is not released by bnxt_re driver (ref is
> > released in bnxt_re_remove_device),
> > the network stack will invoke all notifiers again till all the
> > references are released.
> > During the second or later invocation of the netdev notifier, we
> > proceed with the bnxt_re_remove_device.
>
> That is also racy. unregister_netdevice_notifier() could have
> happened between the first and second call, so now it leaks.
>
> If ib_unregister happens the driver must do the remove_device stuff
> immediately.
>
> So, I think you need to use that stuff I sent to Steve. This patch
> might be an improvement if the above is fixed, maybe, it isn't
> clear. (and Steve, now that I see the same mess in bnxt, I'm more than
> convinced that is the right approach to your problem too)
>
> However, I'd rather see this as part of a series that comprehensively
> fixes all of this.
This patch is definitely an improvement on the existing driver issues.
But i am okay to rework with your patch. I will take your patch, test
it  and get back.
 Was your patch cut against for-next branch? it was not applying
cleanly on for-next.

>
> Jason
Jason Gunthorpe Dec. 10, 2018, 9 p.m. UTC | #5
On Mon, Dec 10, 2018 at 11:06:29AM +0530, Selvin Xavier wrote:
> > If ib_unregister happens the driver must do the remove_device stuff
> > immediately.
> >
> > So, I think you need to use that stuff I sent to Steve. This patch
> > might be an improvement if the above is fixed, maybe, it isn't
> > clear. (and Steve, now that I see the same mess in bnxt, I'm more than
> > convinced that is the right approach to your problem too)
> >
> > However, I'd rather see this as part of a series that comprehensively
> > fixes all of this.
>
> This patch is definitely an improvement on the existing driver
> issues.

I'm happier if we fix all the bug we can see when we start looking at
an issue, rather than just the ones testing hits or something...

> But i am okay to rework with your patch. I will take your patch,
> test it and get back.  Was your patch cut against for-next branch?
> it was not applying cleanly on for-next.

Yes.. It applies on top of e7521d82b33593c9b3ffd1e49a7ea2999ddc2285

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index d2fa2a6b..d8c218c 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -120,11 +120,17 @@  struct bnxt_re_dev {
 #define BNXT_RE_FLAG_IBDEV_REGISTERED		1
 #define BNXT_RE_FLAG_GOT_MSIX			2
 #define BNXT_RE_FLAG_HAVE_L2_REF		3
-#define BNXT_RE_FLAG_RCFW_CHANNEL_EN		4
-#define BNXT_RE_FLAG_QOS_WORK_REG		5
-#define BNXT_RE_FLAG_RESOURCES_ALLOCATED	7
-#define BNXT_RE_FLAG_RESOURCES_INITIALIZED	8
-#define BNXT_RE_FLAG_ISSUE_ROCE_STATS          29
+#define BNXT_RE_FLAG_ALLOC_RCFW			4
+#define BNXT_RE_FLAG_NET_RING_ALLOC		5
+#define BNXT_RE_FLAG_RCFW_CHANNEL_EN		6
+#define BNXT_RE_FLAG_ALLOC_CTX			7
+#define BNXT_RE_FLAG_STATS_CTX_ALLOC		8
+#define BNXT_RE_FLAG_STATS_CTX2_ALLOC		9
+#define BNXT_RE_FLAG_RCFW_CHANNEL_INIT		10
+#define BNXT_RE_FLAG_QOS_WORK_REG		11
+#define BNXT_RE_FLAG_RESOURCES_ALLOCATED	12
+#define BNXT_RE_FLAG_RESOURCES_INITIALIZED	13
+#define BNXT_RE_FLAG_ISSUE_ROCE_STATS		29
 	struct net_device		*netdev;
 	unsigned int			version, major, minor;
 	struct bnxt_en_dev		*en_dev;
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 77f095e..bf533a5 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -78,7 +78,8 @@  static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
 /* Mutex to protect the list of bnxt_re devices added */
 static DEFINE_MUTEX(bnxt_re_dev_lock);
 static struct workqueue_struct *bnxt_re_wq;
-static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev);
+static void bnxt_re_remove_device(struct bnxt_re_dev *rdev);
+static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev);
 
 /* SR-IOV helper functions */
 
@@ -182,7 +183,9 @@  static void bnxt_re_shutdown(void *p)
 	if (!rdev)
 		return;
 
-	bnxt_re_ib_unreg(rdev);
+	bnxt_re_ib_uninit(rdev);
+	/* rtnl_lock held by L2 before coming here */
+	bnxt_re_remove_device(rdev);
 }
 
 static void bnxt_re_stop_irq(void *handle)
@@ -563,11 +566,6 @@  static const struct attribute_group bnxt_re_dev_attr_group = {
 	.attrs = bnxt_re_attributes,
 };
 
-static void bnxt_re_unregister_ib(struct bnxt_re_dev *rdev)
-{
-	ib_unregister_device(&rdev->ibdev);
-}
-
 static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
 {
 	struct ib_device *ibdev = &rdev->ibdev;
@@ -1203,14 +1201,41 @@  static int bnxt_re_setup_qos(struct bnxt_re_dev *rdev)
 	return 0;
 }
 
-static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev)
+static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev)
 {
-	int rc;
 
-	if (test_and_clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
-		/* Cleanup ib dev */
-		bnxt_re_unregister_ib(rdev);
+	/* Cleanup ib dev */
+	if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
+		ib_unregister_device(&rdev->ibdev);
+		clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
 	}
+}
+
+int bnxt_re_ib_init(struct bnxt_re_dev *rdev)
+{
+	int rc = 0;
+
+	/* Register ib dev */
+	rc = bnxt_re_register_ib(rdev);
+	if (rc) {
+		pr_err("Failed to register with IB: %#x\n", rc);
+		return rc;
+	}
+	set_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
+	dev_info(rdev_to_dev(rdev), "Device registered successfully");
+	ib_get_eth_speed(&rdev->ibdev, 1, &rdev->active_speed,
+			 &rdev->active_width);
+	set_bit(BNXT_RE_FLAG_ISSUE_ROCE_STATS, &rdev->flags);
+	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
+	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_GID_CHANGE);
+
+	return rc;
+}
+
+static void bnxt_re_dev_uninit(struct bnxt_re_dev *rdev)
+{
+	int rc;
+
 	if (test_and_clear_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags))
 		cancel_delayed_work_sync(&rdev->worker);
 
@@ -1220,17 +1245,22 @@  static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev)
 	if (test_and_clear_bit(BNXT_RE_FLAG_RESOURCES_ALLOCATED, &rdev->flags))
 		bnxt_re_free_res(rdev);
 
-	if (test_and_clear_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags)) {
+	if (test_and_clear_bit(BNXT_RE_FLAG_RCFW_CHANNEL_INIT, &rdev->flags)) {
 		rc = bnxt_qplib_deinit_rcfw(&rdev->rcfw);
 		if (rc)
 			dev_warn(rdev_to_dev(rdev),
 				 "Failed to deinitialize RCFW: %#x", rc);
+	}
+	if (test_and_clear_bit(BNXT_RE_FLAG_STATS_CTX_ALLOC, &rdev->flags))
 		bnxt_re_net_stats_ctx_free(rdev, rdev->qplib_ctx.stats.fw_id);
+	if (test_and_clear_bit(BNXT_RE_FLAG_ALLOC_CTX, &rdev->flags))
 		bnxt_qplib_free_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx);
+	if (test_and_clear_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags))
 		bnxt_qplib_disable_rcfw_channel(&rdev->rcfw);
+	if (test_and_clear_bit(BNXT_RE_FLAG_NET_RING_ALLOC, &rdev->flags))
 		bnxt_re_net_ring_free(rdev, rdev->rcfw.creq_ring_id);
+	if (test_and_clear_bit(BNXT_RE_FLAG_ALLOC_RCFW, &rdev->flags))
 		bnxt_qplib_free_rcfw_channel(&rdev->rcfw);
-	}
 	if (test_and_clear_bit(BNXT_RE_FLAG_GOT_MSIX, &rdev->flags)) {
 		rc = bnxt_re_free_msix(rdev);
 		if (rc)
@@ -1255,20 +1285,14 @@  static void bnxt_re_worker(struct work_struct *work)
 	schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
 }
 
-static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
+static int bnxt_re_dev_init(struct bnxt_re_dev *rdev)
 {
-	int rc;
-
-	bool locked;
+	int rc = 0;
 
-	/* Acquire rtnl lock through out this function */
-	rtnl_lock();
-	locked = true;
 
 	/* Registered a new RoCE device instance to netdev */
 	rc = bnxt_re_register_netdev(rdev);
 	if (rc) {
-		rtnl_unlock();
 		pr_err("Failed to register with netedev: %#x\n", rc);
 		return -EINVAL;
 	}
@@ -1294,6 +1318,9 @@  static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		pr_err("Failed to allocate RCFW Channel: %#x\n", rc);
 		goto fail;
 	}
+
+	set_bit(BNXT_RE_FLAG_ALLOC_RCFW, &rdev->flags);
+
 	rc = bnxt_re_net_ring_alloc
 			(rdev, rdev->rcfw.creq.pbl[PBL_LVL_0].pg_map_arr,
 			 rdev->rcfw.creq.pbl[rdev->rcfw.creq.level].pg_count,
@@ -1302,8 +1329,10 @@  static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 			 &rdev->rcfw.creq_ring_id);
 	if (rc) {
 		pr_err("Failed to allocate CREQ: %#x\n", rc);
-		goto free_rcfw;
+		goto fail;
 	}
+	set_bit(BNXT_RE_FLAG_NET_RING_ALLOC, &rdev->flags);
+
 	rc = bnxt_qplib_enable_rcfw_channel
 				(rdev->en_dev->pdev, &rdev->rcfw,
 				 rdev->msix_entries[BNXT_RE_AEQ_IDX].vector,
@@ -1311,36 +1340,43 @@  static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 				 rdev->is_virtfn, &bnxt_re_aeq_handler);
 	if (rc) {
 		pr_err("Failed to enable RCFW channel: %#x\n", rc);
-		goto free_ring;
+		goto fail;
 	}
 
+	set_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags);
+
 	rc = bnxt_qplib_get_dev_attr(&rdev->rcfw, &rdev->dev_attr,
 				     rdev->is_virtfn);
 	if (rc)
-		goto disable_rcfw;
+		goto fail;
 	if (!rdev->is_virtfn)
 		bnxt_re_set_resource_limits(rdev);
 
 	rc = bnxt_qplib_alloc_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx, 0);
 	if (rc) {
 		pr_err("Failed to allocate QPLIB context: %#x\n", rc);
-		goto disable_rcfw;
+		goto fail;
 	}
+
+	set_bit(BNXT_RE_FLAG_ALLOC_CTX, &rdev->flags);
+
 	rc = bnxt_re_net_stats_ctx_alloc(rdev,
 					 rdev->qplib_ctx.stats.dma_map,
 					 &rdev->qplib_ctx.stats.fw_id);
 	if (rc) {
 		pr_err("Failed to allocate stats context: %#x\n", rc);
-		goto free_ctx;
+		goto fail;
 	}
 
+	set_bit(BNXT_RE_FLAG_STATS_CTX_ALLOC, &rdev->flags);
+
 	rc = bnxt_qplib_init_rcfw(&rdev->rcfw, &rdev->qplib_ctx,
 				  rdev->is_virtfn);
 	if (rc) {
 		pr_err("Failed to initialize RCFW: %#x\n", rc);
-		goto free_sctx;
+		goto fail;
 	}
-	set_bit(BNXT_RE_FLAG_RCFW_CHANNEL_EN, &rdev->flags);
+	set_bit(BNXT_RE_FLAG_RCFW_CHANNEL_INIT, &rdev->flags);
 
 	/* Resources based on the 'new' device caps */
 	rc = bnxt_re_alloc_res(rdev);
@@ -1367,40 +1403,10 @@  static int bnxt_re_ib_reg(struct bnxt_re_dev *rdev)
 		schedule_delayed_work(&rdev->worker, msecs_to_jiffies(30000));
 	}
 
-	rtnl_unlock();
-	locked = false;
-
-	/* Register ib dev */
-	rc = bnxt_re_register_ib(rdev);
-	if (rc) {
-		pr_err("Failed to register with IB: %#x\n", rc);
-		goto fail;
-	}
-	set_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
-	dev_info(rdev_to_dev(rdev), "Device registered successfully");
-	ib_get_eth_speed(&rdev->ibdev, 1, &rdev->active_speed,
-			 &rdev->active_width);
-	set_bit(BNXT_RE_FLAG_ISSUE_ROCE_STATS, &rdev->flags);
-	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_PORT_ACTIVE);
-	bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1, IB_EVENT_GID_CHANGE);
+	return rc;
 
-	return 0;
-free_sctx:
-	bnxt_re_net_stats_ctx_free(rdev, rdev->qplib_ctx.stats.fw_id);
-free_ctx:
-	bnxt_qplib_free_ctx(rdev->en_dev->pdev, &rdev->qplib_ctx);
-disable_rcfw:
-	bnxt_qplib_disable_rcfw_channel(&rdev->rcfw);
-free_ring:
-	bnxt_re_net_ring_free(rdev, rdev->rcfw.creq_ring_id);
-free_rcfw:
-	bnxt_qplib_free_rcfw_channel(&rdev->rcfw);
 fail:
-	if (!locked)
-		rtnl_lock();
-	bnxt_re_ib_unreg(rdev);
-	rtnl_unlock();
-
+	bnxt_re_dev_uninit(rdev);
 	return rc;
 }
 
@@ -1440,9 +1446,35 @@  static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, struct net_device *netdev)
 	return rc;
 }
 
-static void bnxt_re_remove_one(struct bnxt_re_dev *rdev)
+static void bnxt_re_remove_device(struct bnxt_re_dev *rdev)
 {
+	bnxt_re_dev_uninit(rdev);
 	pci_dev_put(rdev->en_dev->pdev);
+	bnxt_re_dev_unreg(rdev);
+}
+
+static int bnxt_re_add_device(struct bnxt_re_dev **rdev,
+			      struct net_device *netdev)
+{
+	int rc;
+
+	rc = bnxt_re_dev_reg(rdev, netdev);
+	if (rc == -ENODEV)
+		return rc;
+	if (rc) {
+		pr_err("Failed to register with the device %s: %#x\n",
+		       netdev->name, rc);
+		return rc;
+	}
+
+	pci_dev_get((*rdev)->en_dev->pdev);
+	rc = bnxt_re_dev_init(*rdev);
+	if (rc) {
+		pci_dev_put((*rdev)->en_dev->pdev);
+		bnxt_re_dev_unreg(*rdev);
+	}
+
+	return rc;
 }
 
 /* Handle all deferred netevents tasks */
@@ -1457,19 +1489,23 @@  static void bnxt_re_task(struct work_struct *work)
 
 	if (re_work->event != NETDEV_REGISTER &&
 	    !test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
-		return;
+		goto done;
 
 	switch (re_work->event) {
 	case NETDEV_REGISTER:
-		rc = bnxt_re_ib_reg(rdev);
+		rc = bnxt_re_ib_init(rdev);
 		if (rc) {
 			dev_err(rdev_to_dev(rdev),
 				"Failed to register with IB: %#x", rc);
-			bnxt_re_remove_one(rdev);
-			bnxt_re_dev_unreg(rdev);
+			rtnl_lock();
+			bnxt_re_remove_device(rdev);
+			rtnl_unlock();
 			goto exit;
 		}
 		break;
+	case NETDEV_UNREGISTER:
+		bnxt_re_ib_uninit(rdev);
+		break;
 	case NETDEV_UP:
 		bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1,
 				       IB_EVENT_PORT_ACTIVE);
@@ -1489,17 +1525,14 @@  static void bnxt_re_task(struct work_struct *work)
 	default:
 		break;
 	}
+
+done:
 	smp_mb__before_atomic();
 	atomic_dec(&rdev->sched_count);
 exit:
 	kfree(re_work);
 }
 
-static void bnxt_re_init_one(struct bnxt_re_dev *rdev)
-{
-	pci_dev_get(rdev->en_dev->pdev);
-}
-
 /*
  * "Notifier chain callback can be invoked for the same chain from
  * different CPUs at the same time".
@@ -1537,16 +1570,9 @@  static int bnxt_re_netdev_event(struct notifier_block *notifier,
 	case NETDEV_REGISTER:
 		if (rdev)
 			break;
-		rc = bnxt_re_dev_reg(&rdev, real_dev);
-		if (rc == -ENODEV)
-			break;
-		if (rc) {
-			pr_err("Failed to register with the device %s: %#x\n",
-			       real_dev->name, rc);
-			break;
-		}
-		bnxt_re_init_one(rdev);
-		sch_work = true;
+		rc = bnxt_re_add_device(&rdev, real_dev);
+		if (!rc)
+			sch_work = true;
 		break;
 
 	case NETDEV_UNREGISTER:
@@ -1555,9 +1581,14 @@  static int bnxt_re_netdev_event(struct notifier_block *notifier,
 		 */
 		if (atomic_read(&rdev->sched_count) > 0)
 			goto exit;
-		bnxt_re_ib_unreg(rdev);
-		bnxt_re_remove_one(rdev);
-		bnxt_re_dev_unreg(rdev);
+		/* Schedule work for unregistering from IB stack */
+		if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
+			sch_work = true;
+			break;
+		}
+
+		/* IB device is removed now. Destroy the device */
+		bnxt_re_remove_device(rdev);
 		break;
 
 	default:
@@ -1634,12 +1665,11 @@  static void __exit bnxt_re_mod_exit(void)
 		 */
 		flush_workqueue(bnxt_re_wq);
 		bnxt_re_dev_stop(rdev);
+		bnxt_re_ib_uninit(rdev);
 		/* Acquire the rtnl_lock as the L2 resources are freed here */
 		rtnl_lock();
-		bnxt_re_ib_unreg(rdev);
+		bnxt_re_remove_device(rdev);
 		rtnl_unlock();
-		bnxt_re_remove_one(rdev);
-		bnxt_re_dev_unreg(rdev);
 	}
 	unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
 	if (bnxt_re_wq)