
[for-next,5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API

Message ID: 1574671174-5064-6-git-send-email-selvin.xavier@broadcom.com
State: Changes Requested
Series: RDMA/bnxt_re driver update

Commit Message

Selvin Xavier Nov. 25, 2019, 8:39 a.m. UTC
Use the new unregistration APIs provided by the core. Provide the
dealloc_driver hook for the core to call back at the time of device
unregistration. Also, simplify bnxt_re_from_netdev by retrieving the
netdev information from the core API ib_device_get_by_netdev.

bnxt_re VF resources are created by the corresponding PF driver.
During driver unload, if the PF gets destroyed first, the driver need
not send any commands down to the FW. So add a state flag (res_deinit)
to the per-device structure to detect driver removal and avoid VF
destroy failures caused by it.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/main.c       | 100 ++++++++++++-----------------
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |   7 ++
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |   1 +
 3 files changed, 48 insertions(+), 60 deletions(-)

Comments

Jason Gunthorpe Jan. 3, 2020, 7:44 p.m. UTC | #1
On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:

>  static void __exit bnxt_re_mod_exit(void)
>  {
> -	struct bnxt_re_dev *rdev, *next;
> -	LIST_HEAD(to_be_deleted);
> +	struct bnxt_re_dev *rdev;
>  
> +	flush_workqueue(bnxt_re_wq);
>  	mutex_lock(&bnxt_re_dev_lock);
> -	/* Free all adapter allocated resources */
> -	if (!list_empty(&bnxt_re_dev_list))
> -		list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> -	mutex_unlock(&bnxt_re_dev_lock);
> -       /*
> -	* Cleanup the devices in reverse order so that the VF device
> -	* cleanup is done before PF cleanup
> -	*/
> -	list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> -		dev_info(rdev_to_dev(rdev), "Unregistering Device");
> +	list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
>  		/*
> -		 * Flush out any scheduled tasks before destroying the
> -		 * resources
> +		 * Set unreg flag to avoid VF resource cleanup
> +		 * in module unload path. This is required because
> +		 * dealloc_driver for VF can come after PF cleaning
> +		 * the VF resources.
>  		 */
> -		flush_workqueue(bnxt_re_wq);
> -		bnxt_re_dev_stop(rdev);
> -		bnxt_re_ib_uninit(rdev);
> -		/* Acquire the rtnl_lock as the L2 resources are freed here */
> -		rtnl_lock();
> -		bnxt_re_remove_device(rdev);
> -		rtnl_unlock();
> +		if (rdev->is_virtfn)
> +			rdev->rcfw.res_deinit = true;
>  	}
> +	mutex_unlock(&bnxt_re_dev_lock);

This is super ugly. This driver already has bugs if it has a
dependency on driver unbinding order, as drivers can become unbound
from userspace using sysfs or hot unplug in any ordering.

If the VF driver somehow depends on the PF driver then destruction of
the PF must synchronize and fence the VFs during its own shutdown.

But this seems very strange: how can it work if the VF is in a VM
or something and the PF driver is unplugged?

Jason
Selvin Xavier Feb. 18, 2020, 11:49 a.m. UTC | #2
On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
>
> >  static void __exit bnxt_re_mod_exit(void)
> >  {
> > -     struct bnxt_re_dev *rdev, *next;
> > -     LIST_HEAD(to_be_deleted);
> > +     struct bnxt_re_dev *rdev;
> >
> > +     flush_workqueue(bnxt_re_wq);
> >       mutex_lock(&bnxt_re_dev_lock);
> > -     /* Free all adapter allocated resources */
> > -     if (!list_empty(&bnxt_re_dev_list))
> > -             list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > -     mutex_unlock(&bnxt_re_dev_lock);
> > -       /*
> > -     * Cleanup the devices in reverse order so that the VF device
> > -     * cleanup is done before PF cleanup
> > -     */
> > -     list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > -             dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > +     list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> >               /*
> > -              * Flush out any scheduled tasks before destroying the
> > -              * resources
> > +              * Set unreg flag to avoid VF resource cleanup
> > +              * in module unload path. This is required because
> > +              * dealloc_driver for VF can come after PF cleaning
> > +              * the VF resources.
> >                */
> > -             flush_workqueue(bnxt_re_wq);
> > -             bnxt_re_dev_stop(rdev);
> > -             bnxt_re_ib_uninit(rdev);
> > -             /* Acquire the rtnl_lock as the L2 resources are freed here */
> > -             rtnl_lock();
> > -             bnxt_re_remove_device(rdev);
> > -             rtnl_unlock();
> > +             if (rdev->is_virtfn)
> > +                     rdev->rcfw.res_deinit = true;
> >       }
> > +     mutex_unlock(&bnxt_re_dev_lock);
>
> This is super ugly. This driver already has bugs if it has a
> dependency on driver unbinding order, as drivers can become unbound
> from userspace using sysfs or hot unplug in any ordering.
>
The dependency is from the HW side, not from the driver side. In some
HW versions, the RoCE PF driver is allowed to allocate the host memory
for the VFs, and the dependency is due to this.
> If the VF driver somehow depends on the PF driver then destruction of
> the PF must synchronize and fence the VFs during its own shutdown.
>
Do you suggest that the synchronization should be done in the stack
before invoking dealloc_driver for the VF?

> But this seems very strange: how can it work if the VF is in a VM
> or something and the PF driver is unplugged?
This code is not handling the case where the VF is attached to a VM.
The first command to the HW after the removal of the PF will fail with
a timeout. The driver will stop sending commands to the HW once it sees
this timeout. VF driver removal will proceed with cleaning up the host
resources without sending any commands to the FW, and exit the removal
process.

On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
this patch, then when VF removal is initiated the first command will
time out and the driver will stop sending any more commands to the FW
and proceed with removal. All VFs will exit the same way; it's just
that each function will wait for one command to time out. Setting
rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.
> Jason
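
For illustration, a minimal sketch of the short-circuit described
above -- stop talking to the FW once a single command has timed out.
The flag name fw_timed_out is hypothetical (the patch itself uses
res_deinit for the module-unload case); only __send_message and its
arguments are taken from the driver:

static int bnxt_re_rcfw_send(struct bnxt_qplib_rcfw *rcfw,
			     struct cmdq_base *req, void *resp,
			     void *sb, u8 is_block)
{
	int rc;

	/* A previous command already timed out, so the FW (or the PF
	 * servicing it) is gone.  Skip the FW and let the caller tear
	 * down host-side state only.
	 */
	if (rcfw->fw_timed_out)		/* hypothetical flag */
		return 0;

	rc = __send_message(rcfw, req, resp, sb, is_block);
	if (rc == -ETIMEDOUT)
		rcfw->fw_timed_out = true;	/* fence later commands */

	return rc;
}

With this shape, only the first command after the PF disappears pays
the timeout; every later command returns immediately.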
Jason Gunthorpe Feb. 19, 2020, 12:15 a.m. UTC | #3
On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> >
> > >  static void __exit bnxt_re_mod_exit(void)
> > >  {
> > > -     struct bnxt_re_dev *rdev, *next;
> > > -     LIST_HEAD(to_be_deleted);
> > > +     struct bnxt_re_dev *rdev;
> > >
> > > +     flush_workqueue(bnxt_re_wq);
> > >       mutex_lock(&bnxt_re_dev_lock);
> > > -     /* Free all adapter allocated resources */
> > > -     if (!list_empty(&bnxt_re_dev_list))
> > > -             list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > -     mutex_unlock(&bnxt_re_dev_lock);
> > > -       /*
> > > -     * Cleanup the devices in reverse order so that the VF device
> > > -     * cleanup is done before PF cleanup
> > > -     */
> > > -     list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > -             dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > +     list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > >               /*
> > > -              * Flush out any scheduled tasks before destroying the
> > > -              * resources
> > > +              * Set unreg flag to avoid VF resource cleanup
> > > +              * in module unload path. This is required because
> > > +              * dealloc_driver for VF can come after PF cleaning
> > > +              * the VF resources.
> > >                */
> > > -             flush_workqueue(bnxt_re_wq);
> > > -             bnxt_re_dev_stop(rdev);
> > > -             bnxt_re_ib_uninit(rdev);
> > > -             /* Acquire the rtnl_lock as the L2 resources are freed here */
> > > -             rtnl_lock();
> > > -             bnxt_re_remove_device(rdev);
> > > -             rtnl_unlock();
> > > +             if (rdev->is_virtfn)
> > > +                     rdev->rcfw.res_deinit = true;
> > >       }
> > > +     mutex_unlock(&bnxt_re_dev_lock);
> >
> > This is super ugly. This driver already has bugs if it has a
> > dependency on driver unbinding order, as drivers can become unbound
> > from userspace using sysfs or hot unplug in any ordering.
> >
> The dependency is from the HW side, not from the driver side. In some
> HW versions, the RoCE PF driver is allowed to allocate the host memory
> for the VFs, and the dependency is due to this.
> > If the VF driver somehow depends on the PF driver then destruction of
> > the PF must synchronize and fence the VFs during its own shutdown.
>
> Do you suggest that the synchronization should be done in the stack
> before invoking dealloc_driver for the VF?

'in the stack'? This is a driver problem.. You can't assume ordering
of driver detaches in Linux, and drivers should really not be
co-ordinating across their instances.

> > But this seems very strange: how can it work if the VF is in a VM
> > or something and the PF driver is unplugged?

> This code is not handling the case where the VF is attached to a VM.
> The first command to the HW after the removal of the PF will fail with
> a timeout. The driver will stop sending commands to the HW once it sees
> this timeout. VF driver removal will proceed with cleaning up the host
> resources without sending any commands to the FW, and exit the removal
> process.

So why not use this for the host case as well? The timeout is too
long?

> On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
> this patch, then when VF removal is initiated the first command will
> time out and the driver will stop sending any more commands to the FW
> and proceed with removal. All VFs will exit the same way; it's just
> that each function will wait for one command to time out. Setting
> rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.

The issue is that pci_driver_unregister undoes the driver binds in
FIFO not LIFO order?

What happens when the VF binds after the PF?

Jason
Selvin Xavier Feb. 19, 2020, 9:27 a.m. UTC | #4
On Wed, Feb 19, 2020 at 5:45 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> > On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> > >
> > > >  static void __exit bnxt_re_mod_exit(void)
> > > >  {
> > > > -     struct bnxt_re_dev *rdev, *next;
> > > > -     LIST_HEAD(to_be_deleted);
> > > > +     struct bnxt_re_dev *rdev;
> > > >
> > > > +     flush_workqueue(bnxt_re_wq);
> > > >       mutex_lock(&bnxt_re_dev_lock);
> > > > -     /* Free all adapter allocated resources */
> > > > -     if (!list_empty(&bnxt_re_dev_list))
> > > > -             list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > > -     mutex_unlock(&bnxt_re_dev_lock);
> > > > -       /*
> > > > -     * Cleanup the devices in reverse order so that the VF device
> > > > -     * cleanup is done before PF cleanup
> > > > -     */
> > > > -     list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > > -             dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > > +     list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > > >               /*
> > > > -              * Flush out any scheduled tasks before destroying the
> > > > -              * resources
> > > > +              * Set unreg flag to avoid VF resource cleanup
> > > > +              * in module unload path. This is required because
> > > > +              * dealloc_driver for VF can come after PF cleaning
> > > > +              * the VF resources.
> > > >                */
> > > > -             flush_workqueue(bnxt_re_wq);
> > > > -             bnxt_re_dev_stop(rdev);
> > > > -             bnxt_re_ib_uninit(rdev);
> > > > -             /* Acquire the rtnl_lock as the L2 resources are freed here */
> > > > -             rtnl_lock();
> > > > -             bnxt_re_remove_device(rdev);
> > > > -             rtnl_unlock();
> > > > +             if (rdev->is_virtfn)
> > > > +                     rdev->rcfw.res_deinit = true;
> > > >       }
> > > > +     mutex_unlock(&bnxt_re_dev_lock);
> > >
> > > This is super ugly. This driver already has bugs if it has a
> > > dependency on driver unbinding order, as drivers can become unbound
> > > from userspace using sysfs or hot unplug in any ordering.
> > >
> > The dependency is from the HW side, not from the driver side. In some
> > HW versions, the RoCE PF driver is allowed to allocate the host memory
> > for the VFs, and the dependency is due to this.
> > > If the VF driver somehow depends on the PF driver then destruction of
> > > the PF must synchronize and fence the VFs during its own shutdown.
> >
> > Do you suggest that the synchronization should be done in the stack
> > before invoking dealloc_driver for the VF?
>
> 'in the stack'? This is a driver problem.. You can't assume ordering
> of driver detaches in Linux, and drivers should really not be
> co-ordinating across their instances.
>
> > > But this seems very strange: how can it work if the VF is in a VM
> > > or something and the PF driver is unplugged?
>
> > This code is not handling the case where the VF is attached to a VM.
> > The first command to the HW after the removal of the PF will fail with
> > a timeout. The driver will stop sending commands to the HW once it sees
> > this timeout. VF driver removal will proceed with cleaning up the host
> > resources without sending any commands to the FW, and exit the removal
> > process.
>
> So why not use this for the host case as well? The timeout is too
> long?
Yeah. The timeout for the first command is 20 sec now. Maybe I can use
a smaller timeout in the unreg path.
>
> > On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
> > this patch, then when VF removal is initiated the first command will
> > time out and the driver will stop sending any more commands to the FW
> > and proceed with removal. All VFs will exit the same way; it's just
> > that each function will wait for one command to time out. Setting
> > rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.
>
> The issue is that pci_driver_unregister undoes the driver binds in
> FIFO not LIFO order?
>
> What happens when the VF binds after the PF?

This is not dependent on PCI driver unregister. This particular issue
happens when only the bnxt_re driver is unloaded and the new
ib_unregister_driver is invoked by the bnxt_re driver in the mod_exit
hook. dealloc_driver for each IB device is called mostly in FIFO order
(using xa_for_each). So since the PF ib device was added first, it
gets removed first, and then the VF is removed.

After this discussion, I now feel it's better to remove the hack and
allow the commands to fail with a timeout and exit. The issue is seen
only when users try to unload bnxt_re alone, without destroying the
VFs. The chances of this type of usage are slim. Anyway, it's not a
fatal error if a user tries this sequence; just that the rmmod will
take some time to exit.

Shall I repost with this hack removed?
>
> Jason
Jason Gunthorpe Feb. 19, 2020, 1:15 p.m. UTC | #5
On Wed, Feb 19, 2020 at 02:57:45PM +0530, Selvin Xavier wrote:
> > > On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
> > > this patch, then when VF removal is initiated the first command will
> > > time out and the driver will stop sending any more commands to the FW
> > > and proceed with removal. All VFs will exit the same way; it's just
> > > that each function will wait for one command to time out. Setting
> > > rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.
> >
> > The issue is that pci_driver_unregister undoes the driver binds in
> > FIFO not LIFO order?
> >
> > What happens when the VF binds after the PF?
> 
> This is not dependent on PCI driver unregister. This particular issue
> happens when only the bnxt_re driver is unloaded and the new
> ib_unregister_driver is invoked by the bnxt_re driver in the mod_exit
> hook.

Oh, now I remember, this driver sits on top of netdev using the
notification stuff so things are a little weird.

> dealloc_driver for each IB device is called mostly in FIFO order
> (using xa_for_each). So since the PF ib device was added first, it
> gets removed first, and then the VF is removed.

I see.. You could probably arrange to remove your VFs first, then call
ib_unregister_driver() to finish it up and serialize away any
races. That would be reasonably clean and much better than hacking
special stuff in.

> Shall I repost with this hack removed?

Please

Jason
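
For illustration, a minimal sketch of the ordering suggested here --
queue the VF unregistrations first, then let ib_unregister_driver()
finish off the remaining (PF) devices. The locking and the exact
teardown order are assumptions for this sketch, not code from the
series; the symbols used are the ones visible in the patch below:

static void __exit bnxt_re_mod_exit(void)
{
	struct bnxt_re_dev *rdev;

	flush_workqueue(bnxt_re_wq);

	/* Queue every VF for unregistration before any PF ... */
	mutex_lock(&bnxt_re_dev_lock);
	list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
		if (rdev->is_virtfn)
			ib_unregister_device_queued(&rdev->ibdev);
	}
	mutex_unlock(&bnxt_re_dev_lock);

	/* ... then unregister the rest (the PFs).  Drivers that use
	 * ib_unregister_device_queued() must call ib_unregister_driver()
	 * before module unload anyway, and it waits for the scheduled
	 * unregistrations to complete.
	 */
	ib_unregister_driver(RDMA_DRIVER_BNXT_RE);

	unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
	if (bnxt_re_wq)
		destroy_workqueue(bnxt_re_wq);
}

This keeps the VF-before-PF ordering without per-command flags, and it
serializes the teardown through the core's own unregistration machinery.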

Patch

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 0cf38a4..9ea4ce7 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -79,7 +79,7 @@  static struct list_head bnxt_re_dev_list = LIST_HEAD_INIT(bnxt_re_dev_list);
 static DEFINE_MUTEX(bnxt_re_dev_lock);
 static struct workqueue_struct *bnxt_re_wq;
 static void bnxt_re_remove_device(struct bnxt_re_dev *rdev);
-static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev);
+static void bnxt_re_dealloc_driver(struct ib_device *ib_dev);
 
 static void bnxt_re_destroy_chip_ctx(struct bnxt_re_dev *rdev)
 {
@@ -223,9 +223,7 @@  static void bnxt_re_shutdown(void *p)
 	if (!rdev)
 		return;
 
-	bnxt_re_ib_uninit(rdev);
-	/* rtnl_lock held by L2 before coming here */
-	bnxt_re_remove_device(rdev);
+	ib_unregister_device_queued(&rdev->ibdev);
 }
 
 static void bnxt_re_stop_irq(void *handle)
@@ -527,17 +525,12 @@  static bool is_bnxt_re_dev(struct net_device *netdev)
 
 static struct bnxt_re_dev *bnxt_re_from_netdev(struct net_device *netdev)
 {
-	struct bnxt_re_dev *rdev;
+	struct ib_device *ibdev =
+		ib_device_get_by_netdev(netdev, RDMA_DRIVER_BNXT_RE);
+	if (!ibdev)
+		return NULL;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(rdev, &bnxt_re_dev_list, list) {
-		if (rdev->netdev == netdev) {
-			rcu_read_unlock();
-			return rdev;
-		}
-	}
-	rcu_read_unlock();
-	return NULL;
+	return container_of(ibdev, struct bnxt_re_dev, ibdev);
 }
 
 static void bnxt_re_dev_unprobe(struct net_device *netdev,
@@ -611,11 +604,6 @@  static const struct attribute_group bnxt_re_dev_attr_group = {
 	.attrs = bnxt_re_attributes,
 };
 
-static void bnxt_re_unregister_ib(struct bnxt_re_dev *rdev)
-{
-	ib_unregister_device(&rdev->ibdev);
-}
-
 static const struct ib_device_ops bnxt_re_dev_ops = {
 	.owner = THIS_MODULE,
 	.driver_id = RDMA_DRIVER_BNXT_RE,
@@ -630,6 +618,7 @@  static const struct ib_device_ops bnxt_re_dev_ops = {
 	.create_cq = bnxt_re_create_cq,
 	.create_qp = bnxt_re_create_qp,
 	.create_srq = bnxt_re_create_srq,
+	.dealloc_driver = bnxt_re_dealloc_driver,
 	.dealloc_pd = bnxt_re_dealloc_pd,
 	.dealloc_ucontext = bnxt_re_dealloc_ucontext,
 	.del_gid = bnxt_re_del_gid,
@@ -726,15 +715,11 @@  static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
 {
 	dev_put(rdev->netdev);
 	rdev->netdev = NULL;
-
 	mutex_lock(&bnxt_re_dev_lock);
 	list_del_rcu(&rdev->list);
 	mutex_unlock(&bnxt_re_dev_lock);
 
 	synchronize_rcu();
-
-	ib_dealloc_device(&rdev->ibdev);
-	/* rdev is gone */
 }
 
 static struct bnxt_re_dev *bnxt_re_dev_add(struct net_device *netdev,
@@ -767,6 +752,7 @@  static struct bnxt_re_dev *bnxt_re_dev_add(struct net_device *netdev,
 	mutex_lock(&bnxt_re_dev_lock);
 	list_add_tail_rcu(&rdev->list, &bnxt_re_dev_list);
 	mutex_unlock(&bnxt_re_dev_lock);
+
 	return rdev;
 }
 
@@ -1300,15 +1286,6 @@  static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev)
 		le16_to_cpu(resp.hwrm_intf_patch);
 }
 
-static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev)
-{
-	/* Cleanup ib dev */
-	if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
-		ib_unregister_device(&rdev->ibdev);
-		clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
-	}
-}
-
 int bnxt_re_ib_init(struct bnxt_re_dev *rdev)
 {
 	int rc = 0;
@@ -1335,10 +1312,6 @@  static void bnxt_re_dev_uninit(struct bnxt_re_dev *rdev)
 	u8 type;
 	int rc;
 
-	if (test_and_clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
-		/* Cleanup ib dev */
-		bnxt_re_unregister_ib(rdev);
-	}
 	if (test_and_clear_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags))
 		cancel_delayed_work_sync(&rdev->worker);
 
@@ -1595,6 +1568,19 @@  static int bnxt_re_add_device(struct bnxt_re_dev **rdev,
 	return rc;
 }
 
+static void bnxt_re_dealloc_driver(struct ib_device *ib_dev)
+{
+	struct bnxt_re_dev *rdev =
+		container_of(ib_dev, struct bnxt_re_dev, ibdev);
+
+	clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags);
+	dev_info(rdev_to_dev(rdev), "Unregistering Device");
+
+	rtnl_lock();
+	bnxt_re_remove_device(rdev);
+	rtnl_unlock();
+}
+
 /* Handle all deferred netevents tasks */
 static void bnxt_re_task(struct work_struct *work)
 {
@@ -1669,6 +1655,7 @@  static int bnxt_re_netdev_event(struct notifier_block *notifier,
 	struct bnxt_re_dev *rdev;
 	int rc = 0;
 	bool sch_work = false;
+	bool release = true;
 
 	real_dev = rdma_vlan_dev_real_dev(netdev);
 	if (!real_dev)
@@ -1676,7 +1663,8 @@  static int bnxt_re_netdev_event(struct notifier_block *notifier,
 
 	rdev = bnxt_re_from_netdev(real_dev);
 	if (!rdev && event != NETDEV_REGISTER)
-		goto exit;
+		return NOTIFY_OK;
+
 	if (real_dev != netdev)
 		goto exit;
 
@@ -1687,6 +1675,7 @@  static int bnxt_re_netdev_event(struct notifier_block *notifier,
 		rc = bnxt_re_add_device(&rdev, real_dev);
 		if (!rc)
 			sch_work = true;
+		release = false;
 		break;
 
 	case NETDEV_UNREGISTER:
@@ -1695,8 +1684,7 @@  static int bnxt_re_netdev_event(struct notifier_block *notifier,
 		 */
 		if (atomic_read(&rdev->sched_count) > 0)
 			goto exit;
-		bnxt_re_ib_uninit(rdev);
-		bnxt_re_remove_device(rdev);
+		ib_unregister_device_queued(&rdev->ibdev);
 		break;
 
 	default:
@@ -1718,6 +1706,8 @@  static int bnxt_re_netdev_event(struct notifier_block *notifier,
 	}
 
 exit:
+	if (release)
+		ib_device_put(&rdev->ibdev);
 	return NOTIFY_DONE;
 }
 
@@ -1753,32 +1743,22 @@  static int __init bnxt_re_mod_init(void)
 
 static void __exit bnxt_re_mod_exit(void)
 {
-	struct bnxt_re_dev *rdev, *next;
-	LIST_HEAD(to_be_deleted);
+	struct bnxt_re_dev *rdev;
 
+	flush_workqueue(bnxt_re_wq);
 	mutex_lock(&bnxt_re_dev_lock);
-	/* Free all adapter allocated resources */
-	if (!list_empty(&bnxt_re_dev_list))
-		list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
-	mutex_unlock(&bnxt_re_dev_lock);
-       /*
-	* Cleanup the devices in reverse order so that the VF device
-	* cleanup is done before PF cleanup
-	*/
-	list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
-		dev_info(rdev_to_dev(rdev), "Unregistering Device");
+	list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
 		/*
-		 * Flush out any scheduled tasks before destroying the
-		 * resources
+		 * Set unreg flag to avoid VF resource cleanup
+		 * in module unload path. This is required because
+		 * dealloc_driver for VF can come after PF cleaning
+		 * the VF resources.
 		 */
-		flush_workqueue(bnxt_re_wq);
-		bnxt_re_dev_stop(rdev);
-		bnxt_re_ib_uninit(rdev);
-		/* Acquire the rtnl_lock as the L2 resources are freed here */
-		rtnl_lock();
-		bnxt_re_remove_device(rdev);
-		rtnl_unlock();
+		if (rdev->is_virtfn)
+			rdev->rcfw.res_deinit = true;
 	}
+	mutex_unlock(&bnxt_re_dev_lock);
+	ib_unregister_driver(RDMA_DRIVER_BNXT_RE);
 	unregister_netdevice_notifier(&bnxt_re_netdev_notifier);
 	if (bnxt_re_wq)
 		destroy_workqueue(bnxt_re_wq);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 5cdfa84..2304f97 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -211,6 +211,13 @@  int bnxt_qplib_rcfw_send_message(struct bnxt_qplib_rcfw *rcfw,
 	u8 opcode, retry_cnt = 0xFF;
 	int rc = 0;
 
+	/* Check if the command needs to be sent down to the HW.
+	 * Return success if the resources are de-initialized.
+	 */
+
+	if (rcfw->res_deinit)
+		return 0;
+
 	do {
 		opcode = req->opcode;
 		rc = __send_message(rcfw, req, resp, sb, is_block);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
index dfeadc1..c2b930e 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
@@ -265,6 +265,7 @@  struct bnxt_qplib_rcfw {
 	u64 oos_prev;
 	u32 init_oos_stats;
 	u32 cmdq_depth;
+	bool res_deinit;
 };
 
 void bnxt_qplib_free_rcfw_channel(struct bnxt_qplib_rcfw *rcfw);