diff mbox

[for-next,V2,4/4] IB/core: RoCE GID management separate cleanup and release

Message ID 1438881240-22790-5-git-send-email-matanb@mellanox.com (mailing list archive)
State Accepted
Headers show

Commit Message

Matan Barak Aug. 6, 2015, 5:14 p.m. UTC
Separate the cleanup and release IB cache functions.
The cleanup function delete all GIDs and unregister the event channel,
while the release function frees all memory. The cleanup function is
called after all clients were removed (in the device un-registration
process).

The release function is called after the device was put.
This is in order to avoid use-after-free errors if ther vendor
driver's teardown code uses IB cache.

Squash-into: 'IB/core: Add RoCE GID table management'
Signed-off-by: Matan Barak <matanb@mellanox.com>
---
 drivers/infiniband/core/cache.c     | 101 +++++++++++++++++++++++++-----------
 drivers/infiniband/core/core_priv.h |   1 +
 drivers/infiniband/core/device.c    |  17 +++---
 3 files changed, 81 insertions(+), 38 deletions(-)

Comments

Jason Gunthorpe Aug. 11, 2015, 6:02 a.m. UTC | #1
On Thu, Aug 06, 2015 at 08:14:00PM +0300, Matan Barak wrote:
> Separate the cleanup and release IB cache functions.
> The cleanup function delete all GIDs and unregister the event channel,
> while the release function frees all memory. The cleanup function is
> called after all clients were removed (in the device un-registration
> process).

I'm finding this patches ontop of patches rebase stuff to be hard to
follow, but this looks like the right general idea. Hopefully Doug can
check the net result carefully..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Aug. 15, 2015, 2:14 p.m. UTC | #2
On 08/06/2015 01:14 PM, Matan Barak wrote:
> Separate the cleanup and release IB cache functions.
> The cleanup function delete all GIDs and unregister the event channel,
> while the release function frees all memory. The cleanup function is
> called after all clients were removed (in the device un-registration
> process).
> 
> The release function is called after the device was put.
> This is in order to avoid use-after-free errors if ther vendor
> driver's teardown code uses IB cache.
> 
> Squash-into: 'IB/core: Add RoCE GID table management'
> Signed-off-by: Matan Barak <matanb@mellanox.com>

I had originally used the v1 of this patch instead of v2.  In order to
make sure I didn't miss anything, I hand compared the final results of
the v1 patch to what this patch would have created if I had used it.  I
found a few things worth nothing, comments inline below.

> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>  					  (rdma_end_port(device) -
>  					   rdma_start_port(device) + 1),
>  					  GFP_KERNEL);
> -	err = gid_table_setup_one(device);
> -
> -	if (!device->cache.pkey_cache || !device->cache.gid_cache ||
> +	if (!device->cache.pkey_cache ||
>  	    !device->cache.lmc_cache) {
>  		printk(KERN_WARNING "Couldn't allocate cache "
>  		       "for %s\n", device->name);
> -		err = -ENOMEM;
> -		goto err;
> +		/* pkey_cache is freed here because we later access it's
> +		 * elements.
> +		 */
> +		kfree(device->cache.pkey_cache);
> +		device->cache.pkey_cache = NULL;
> +		return -ENOMEM;

This function is unnecessarily convoluted IMO.  A simple change to using
kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
result, I fixed this function up to be different than either patch.
Please look over my results and make sure you agree with my changes.

>  	}
>  
> -	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
> +	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
>  		device->cache.pkey_cache[p] = NULL;

For instance this goes away entirely by using kzalloc().

The rest looked ok.  I have to fixup the ordering changes between sysfs
and cache to make the v1 patch match the v2 patch.
Matan Barak Aug. 16, 2015, 8:19 a.m. UTC | #3
On 8/15/2015 5:14 PM, Doug Ledford wrote:
> On 08/06/2015 01:14 PM, Matan Barak wrote:
>> Separate the cleanup and release IB cache functions.
>> The cleanup function delete all GIDs and unregister the event channel,
>> while the release function frees all memory. The cleanup function is
>> called after all clients were removed (in the device un-registration
>> process).
>>
>> The release function is called after the device was put.
>> This is in order to avoid use-after-free errors if ther vendor
>> driver's teardown code uses IB cache.
>>
>> Squash-into: 'IB/core: Add RoCE GID table management'
>> Signed-off-by: Matan Barak <matanb@mellanox.com>
>
> I had originally used the v1 of this patch instead of v2.  In order to
> make sure I didn't miss anything, I hand compared the final results of
> the v1 patch to what this patch would have created if I had used it.  I
> found a few things worth nothing, comments inline below.
>
>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>   					  (rdma_end_port(device) -
>>   					   rdma_start_port(device) + 1),
>>   					  GFP_KERNEL);
>> -	err = gid_table_setup_one(device);
>> -
>> -	if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>> +	if (!device->cache.pkey_cache ||
>>   	    !device->cache.lmc_cache) {
>>   		printk(KERN_WARNING "Couldn't allocate cache "
>>   		       "for %s\n", device->name);
>> -		err = -ENOMEM;
>> -		goto err;
>> +		/* pkey_cache is freed here because we later access it's
>> +		 * elements.
>> +		 */
>> +		kfree(device->cache.pkey_cache);
>> +		device->cache.pkey_cache = NULL;
>> +		return -ENOMEM;
>
> This function is unnecessarily convoluted IMO.  A simple change to using
> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
> result, I fixed this function up to be different than either patch.
> Please look over my results and make sure you agree with my changes.
>
>>   	}
>>
>> -	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
>> +	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
>>   		device->cache.pkey_cache[p] = NULL;
>
> For instance this goes away entirely by using kzalloc().
>
> The rest looked ok.  I have to fixup the ordering changes between sysfs
> and cache to make the v1 patch match the v2 patch.
>
>

Thanks for the squashing and re-ordering these patches.
I think you're missing if (device->cache.pkey_cache) over the for pkey 
free loop in ib_cache_release_one. Otherwise, the allocation of 
device->cache.pkey_cache might has failed and you dereference an invalid 
address device->cache.pkey_cache[p].

In addition, ib_device_release calls ib_cache_release_one with device 
instead of dev (which has the wrong type).
ib_register_device calls ib_cache_clean_one that I can't really find :)


Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Aug. 16, 2015, 4:24 p.m. UTC | #4
On 08/16/2015 04:19 AM, Matan Barak wrote:
> 
> 
> On 8/15/2015 5:14 PM, Doug Ledford wrote:
>> On 08/06/2015 01:14 PM, Matan Barak wrote:
>>> Separate the cleanup and release IB cache functions.
>>> The cleanup function delete all GIDs and unregister the event channel,
>>> while the release function frees all memory. The cleanup function is
>>> called after all clients were removed (in the device un-registration
>>> process).
>>>
>>> The release function is called after the device was put.
>>> This is in order to avoid use-after-free errors if ther vendor
>>> driver's teardown code uses IB cache.
>>>
>>> Squash-into: 'IB/core: Add RoCE GID table management'
>>> Signed-off-by: Matan Barak <matanb@mellanox.com>
>>
>> I had originally used the v1 of this patch instead of v2.  In order to
>> make sure I didn't miss anything, I hand compared the final results of
>> the v1 patch to what this patch would have created if I had used it.  I
>> found a few things worth nothing, comments inline below.
>>
>>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>>                         (rdma_end_port(device) -
>>>                          rdma_start_port(device) + 1),
>>>                         GFP_KERNEL);
>>> -    err = gid_table_setup_one(device);
>>> -
>>> -    if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>>> +    if (!device->cache.pkey_cache ||
>>>           !device->cache.lmc_cache) {
>>>           printk(KERN_WARNING "Couldn't allocate cache "
>>>                  "for %s\n", device->name);
>>> -        err = -ENOMEM;
>>> -        goto err;
>>> +        /* pkey_cache is freed here because we later access it's
>>> +         * elements.
>>> +         */
>>> +        kfree(device->cache.pkey_cache);
>>> +        device->cache.pkey_cache = NULL;
>>> +        return -ENOMEM;
>>
>> This function is unnecessarily convoluted IMO.  A simple change to using
>> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
>> result, I fixed this function up to be different than either patch.
>> Please look over my results and make sure you agree with my changes.
>>
>>>       }
>>>
>>> -    for (p = 0; p <= rdma_end_port(device) -
>>> rdma_start_port(device); ++p) {
>>> +    for (p = 0; p <= rdma_end_port(device) -
>>> rdma_start_port(device); ++p)
>>>           device->cache.pkey_cache[p] = NULL;
>>
>> For instance this goes away entirely by using kzalloc().
>>
>> The rest looked ok.  I have to fixup the ordering changes between sysfs
>> and cache to make the v1 patch match the v2 patch.
>>
>>
> 
> Thanks for the squashing and re-ordering these patches.
> I think you're missing if (device->cache.pkey_cache) over the for pkey
> free loop in ib_cache_release_one.

I think you're right, but I think that particular issue exists in both
versions of the code (mine and yours).  In particular, a failure during
ib_register_device should result in a call to ib_dealloc_device which
will result in a call to ib_cache_release_one, which will happen whether
ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
must check pkey_cache before releasing the ports because it can't assume
the original setup succeeded.

> Otherwise, the allocation of
> device->cache.pkey_cache might has failed and you dereference an invalid
> address device->cache.pkey_cache[p].
> 
> In addition, ib_device_release calls ib_cache_release_one with device
> instead of dev (which has the wrong type).
> ib_register_device calls ib_cache_clean_one that I can't really find :)

Both of these turned up in compile tests.  I've since resolved them.
Hand merge issues ;-)
Doug Ledford Aug. 16, 2015, 4:48 p.m. UTC | #5
On 08/16/2015 12:24 PM, Doug Ledford wrote:
> On 08/16/2015 04:19 AM, Matan Barak wrote:
>>
>>
>> On 8/15/2015 5:14 PM, Doug Ledford wrote:
>>> On 08/06/2015 01:14 PM, Matan Barak wrote:
>>>> Separate the cleanup and release IB cache functions.
>>>> The cleanup function delete all GIDs and unregister the event channel,
>>>> while the release function frees all memory. The cleanup function is
>>>> called after all clients were removed (in the device un-registration
>>>> process).
>>>>
>>>> The release function is called after the device was put.
>>>> This is in order to avoid use-after-free errors if ther vendor
>>>> driver's teardown code uses IB cache.
>>>>
>>>> Squash-into: 'IB/core: Add RoCE GID table management'
>>>> Signed-off-by: Matan Barak <matanb@mellanox.com>
>>>
>>> I had originally used the v1 of this patch instead of v2.  In order to
>>> make sure I didn't miss anything, I hand compared the final results of
>>> the v1 patch to what this patch would have created if I had used it.  I
>>> found a few things worth nothing, comments inline below.
>>>
>>>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>>>                         (rdma_end_port(device) -
>>>>                          rdma_start_port(device) + 1),
>>>>                         GFP_KERNEL);
>>>> -    err = gid_table_setup_one(device);
>>>> -
>>>> -    if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>>>> +    if (!device->cache.pkey_cache ||
>>>>           !device->cache.lmc_cache) {
>>>>           printk(KERN_WARNING "Couldn't allocate cache "
>>>>                  "for %s\n", device->name);
>>>> -        err = -ENOMEM;
>>>> -        goto err;
>>>> +        /* pkey_cache is freed here because we later access it's
>>>> +         * elements.
>>>> +         */
>>>> +        kfree(device->cache.pkey_cache);
>>>> +        device->cache.pkey_cache = NULL;
>>>> +        return -ENOMEM;
>>>
>>> This function is unnecessarily convoluted IMO.  A simple change to using
>>> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
>>> result, I fixed this function up to be different than either patch.
>>> Please look over my results and make sure you agree with my changes.
>>>
>>>>       }
>>>>
>>>> -    for (p = 0; p <= rdma_end_port(device) -
>>>> rdma_start_port(device); ++p) {
>>>> +    for (p = 0; p <= rdma_end_port(device) -
>>>> rdma_start_port(device); ++p)
>>>>           device->cache.pkey_cache[p] = NULL;
>>>
>>> For instance this goes away entirely by using kzalloc().
>>>
>>> The rest looked ok.  I have to fixup the ordering changes between sysfs
>>> and cache to make the v1 patch match the v2 patch.
>>>
>>>
>>
>> Thanks for the squashing and re-ordering these patches.
>> I think you're missing if (device->cache.pkey_cache) over the for pkey
>> free loop in ib_cache_release_one.
> 
> I think you're right, but I think that particular issue exists in both
> versions of the code (mine and yours).  In particular, a failure during
> ib_register_device should result in a call to ib_dealloc_device which
> will result in a call to ib_cache_release_one, which will happen whether
> ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
> must check pkey_cache before releasing the ports because it can't assume
> the original setup succeeded.
> 
>> Otherwise, the allocation of
>> device->cache.pkey_cache might has failed and you dereference an invalid
>> address device->cache.pkey_cache[p].
>>
>> In addition, ib_device_release calls ib_cache_release_one with device
>> instead of dev (which has the wrong type).
>> ib_register_device calls ib_cache_clean_one that I can't really find :)
> 
> Both of these turned up in compile tests.  I've since resolved them.
> Hand merge issues ;-)
> 

I've pushed what I think is a fixed version to github.  Please feel free
to review.
Matan Barak Aug. 19, 2015, 1:16 p.m. UTC | #6
On Sun, Aug 16, 2015 at 7:48 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 08/16/2015 12:24 PM, Doug Ledford wrote:
>> On 08/16/2015 04:19 AM, Matan Barak wrote:
>>>
>>>
>>> On 8/15/2015 5:14 PM, Doug Ledford wrote:
>>>> On 08/06/2015 01:14 PM, Matan Barak wrote:
>>>>> Separate the cleanup and release IB cache functions.
>>>>> The cleanup function delete all GIDs and unregister the event channel,
>>>>> while the release function frees all memory. The cleanup function is
>>>>> called after all clients were removed (in the device un-registration
>>>>> process).
>>>>>
>>>>> The release function is called after the device was put.
>>>>> This is in order to avoid use-after-free errors if ther vendor
>>>>> driver's teardown code uses IB cache.
>>>>>
>>>>> Squash-into: 'IB/core: Add RoCE GID table management'
>>>>> Signed-off-by: Matan Barak <matanb@mellanox.com>
>>>>
>>>> I had originally used the v1 of this patch instead of v2.  In order to
>>>> make sure I didn't miss anything, I hand compared the final results of
>>>> the v1 patch to what this patch would have created if I had used it.  I
>>>> found a few things worth nothing, comments inline below.
>>>>
>>>>> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
>>>>>                         (rdma_end_port(device) -
>>>>>                          rdma_start_port(device) + 1),
>>>>>                         GFP_KERNEL);
>>>>> -    err = gid_table_setup_one(device);
>>>>> -
>>>>> -    if (!device->cache.pkey_cache || !device->cache.gid_cache ||
>>>>> +    if (!device->cache.pkey_cache ||
>>>>>           !device->cache.lmc_cache) {
>>>>>           printk(KERN_WARNING "Couldn't allocate cache "
>>>>>                  "for %s\n", device->name);
>>>>> -        err = -ENOMEM;
>>>>> -        goto err;
>>>>> +        /* pkey_cache is freed here because we later access it's
>>>>> +         * elements.
>>>>> +         */
>>>>> +        kfree(device->cache.pkey_cache);
>>>>> +        device->cache.pkey_cache = NULL;
>>>>> +        return -ENOMEM;
>>>>
>>>> This function is unnecessarily convoluted IMO.  A simple change to using
>>>> kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
>>>> result, I fixed this function up to be different than either patch.
>>>> Please look over my results and make sure you agree with my changes.
>>>>
>>>>>       }
>>>>>
>>>>> -    for (p = 0; p <= rdma_end_port(device) -
>>>>> rdma_start_port(device); ++p) {
>>>>> +    for (p = 0; p <= rdma_end_port(device) -
>>>>> rdma_start_port(device); ++p)
>>>>>           device->cache.pkey_cache[p] = NULL;
>>>>
>>>> For instance this goes away entirely by using kzalloc().
>>>>
>>>> The rest looked ok.  I have to fixup the ordering changes between sysfs
>>>> and cache to make the v1 patch match the v2 patch.
>>>>
>>>>
>>>
>>> Thanks for the squashing and re-ordering these patches.
>>> I think you're missing if (device->cache.pkey_cache) over the for pkey
>>> free loop in ib_cache_release_one.
>>
>> I think you're right, but I think that particular issue exists in both
>> versions of the code (mine and yours).  In particular, a failure during
>> ib_register_device should result in a call to ib_dealloc_device which
>> will result in a call to ib_cache_release_one, which will happen whether
>> ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
>> must check pkey_cache before releasing the ports because it can't assume
>> the original setup succeeded.
>>
>>> Otherwise, the allocation of
>>> device->cache.pkey_cache might has failed and you dereference an invalid
>>> address device->cache.pkey_cache[p].
>>>
>>> In addition, ib_device_release calls ib_cache_release_one with device
>>> instead of dev (which has the wrong type).
>>> ib_register_device calls ib_cache_clean_one that I can't really find :)
>>
>> Both of these turned up in compile tests.  I've since resolved them.
>> Hand merge issues ;-)
>>
>
> I've pushed what I think is a fixed version to github.  Please feel free
> to review.
>
> --
> Doug Ledford <dledford@redhat.com>
>               GPG KeyID: 0E572FDD
>
>

I think that after applying Dan Carpenter's "[patch] IB/core: missing
curly braces in ib_find_gid()" it's fine :-)
Thanks.

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 21cf94b..ba3720c 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -465,8 +465,16 @@  err_free_table:
 	return NULL;
 }
 
-static void free_gid_table(struct ib_device *ib_dev, u8 port,
-			   struct ib_gid_table *table)
+static void release_gid_table(struct ib_gid_table *table)
+{
+	if (table) {
+		kfree(table->data_vec);
+		kfree(table);
+	}
+}
+
+static void cleanup_gid_table_port(struct ib_device *ib_dev, u8 port,
+				   struct ib_gid_table *table)
 {
 	int i;
 
@@ -480,8 +488,6 @@  static void free_gid_table(struct ib_device *ib_dev, u8 port,
 				table->data_vec[i].props &
 				GID_ATTR_FIND_MASK_DEFAULT);
 	}
-	kfree(table->data_vec);
-	kfree(table);
 }
 
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
@@ -583,15 +589,17 @@  static int _gid_table_setup_one(struct ib_device *ib_dev)
 	return 0;
 
 rollback_table_setup:
-	for (port = 0; port < ib_dev->phys_port_cnt; port++)
-		free_gid_table(ib_dev, port + rdma_start_port(ib_dev),
-			       table[port]);
+	for (port = 0; port < ib_dev->phys_port_cnt; port++) {
+		cleanup_gid_table_port(ib_dev, port + rdma_start_port(ib_dev),
+				       table[port]);
+		release_gid_table(table[port]);
+	}
 
 	kfree(table);
 	return err;
 }
 
-static void gid_table_cleanup_one(struct ib_device *ib_dev)
+static void gid_table_release_one(struct ib_device *ib_dev)
 {
 	struct ib_gid_table **table = ib_dev->cache.gid_cache;
 	u8 port;
@@ -600,10 +608,23 @@  static void gid_table_cleanup_one(struct ib_device *ib_dev)
 		return;
 
 	for (port = 0; port < ib_dev->phys_port_cnt; port++)
-		free_gid_table(ib_dev, port + rdma_start_port(ib_dev),
-			       table[port]);
+		release_gid_table(table[port]);
 
 	kfree(table);
+	ib_dev->cache.gid_cache = NULL;
+}
+
+static void gid_table_cleanup_one(struct ib_device *ib_dev)
+{
+	struct ib_gid_table **table = ib_dev->cache.gid_cache;
+	u8 port;
+
+	if (!table)
+		return;
+
+	for (port = 0; port < ib_dev->phys_port_cnt; port++)
+		cleanup_gid_table_port(ib_dev, port + rdma_start_port(ib_dev),
+				       table[port]);
 }
 
 static int gid_table_setup_one(struct ib_device *ib_dev)
@@ -903,20 +924,28 @@  int ib_cache_setup_one(struct ib_device *device)
 					  (rdma_end_port(device) -
 					   rdma_start_port(device) + 1),
 					  GFP_KERNEL);
-	err = gid_table_setup_one(device);
-
-	if (!device->cache.pkey_cache || !device->cache.gid_cache ||
+	if (!device->cache.pkey_cache ||
 	    !device->cache.lmc_cache) {
 		printk(KERN_WARNING "Couldn't allocate cache "
 		       "for %s\n", device->name);
-		err = -ENOMEM;
-		goto err;
+		/* pkey_cache is freed here because we later access it's
+		 * elements.
+		 */
+		kfree(device->cache.pkey_cache);
+		device->cache.pkey_cache = NULL;
+		return -ENOMEM;
 	}
 
-	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
+	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
 		device->cache.pkey_cache[p] = NULL;
+
+	err = gid_table_setup_one(device);
+	if (err)
+		/* Allocated memory will be cleaned in the release funciton */
+		return err;
+
+	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
 		ib_cache_update(device, p + rdma_start_port(device));
-	}
 
 	INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
 			      device, ib_cache_event);
@@ -927,32 +956,44 @@  int ib_cache_setup_one(struct ib_device *device)
 	return 0;
 
 err_cache:
-	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
-		kfree(device->cache.pkey_cache[p]);
-
-err:
-	kfree(device->cache.pkey_cache);
 	gid_table_cleanup_one(device);
-	kfree(device->cache.lmc_cache);
 
 	return err;
 }
 
-void ib_cache_cleanup_one(struct ib_device *device)
+void ib_cache_release_one(struct ib_device *device)
 {
 	int p;
 
-	ib_unregister_event_handler(&device->cache.event_handler);
-	flush_workqueue(ib_wq);
-
-	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
-		kfree(device->cache.pkey_cache[p]);
+	/* The release function frees all the cache elements.
+	 * This function should be called as part of freeing
+	 * all the device's resources when the cache could no
+	 * longer be accessed.
+	 */
+	if (device->cache.pkey_cache) {
+		for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
+			kfree(device->cache.pkey_cache[p]);
+	}
 
+	gid_table_release_one(device);
 	kfree(device->cache.pkey_cache);
-	gid_table_cleanup_one(device);
 	kfree(device->cache.lmc_cache);
 }
 
+void ib_cache_cleanup_one(struct ib_device *device)
+{
+	/* The cleanup function unregisters the event handler,
+	 * waits for all in-progress workqueue elements and cleans
+	 * up the GID cache. This function should be called after
+	 * the device was removed from the devices list and all
+	 * clients were removed, so the cache exists but is
+	 * non-functional and shouldn't be updated anymore.
+	 */
+	ib_unregister_event_handler(&device->cache.event_handler);
+	flush_workqueue(ib_wq);
+	gid_table_cleanup_one(device);
+}
+
 void __init ib_cache_setup(void)
 {
 	roce_gid_mgmt_init();
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6c4880e..0ebcd29 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -95,5 +95,6 @@  int roce_rescan_device(struct ib_device *ib_dev);
 
 int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
+void ib_cache_release_one(struct ib_device *device);
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 25c9301..853e3d2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -165,6 +165,7 @@  static void ib_device_release(struct device *device)
 {
 	struct ib_device *dev = dev_get_drvdata(device);
 
+	ib_cache_release_one(dev);
 	kfree(dev->port_immutable);
 	kfree(dev);
 }
@@ -335,8 +336,15 @@  int ib_register_device(struct ib_device *device,
 		goto out;
 	}
 
+	ret = ib_cache_setup_one(device);
+	if (ret) {
+		printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
+		goto out;
+	}
+
 	ret = ib_device_register_sysfs(device, port_callback);
 	if (ret) {
+		ib_cache_cleanup_one(device);
 		printk(KERN_WARNING "Couldn't register device %s with driver model\n",
 		       device->name);
 		goto out;
@@ -344,14 +352,6 @@  int ib_register_device(struct ib_device *device,
 
 	device->reg_state = IB_DEV_REGISTERED;
 
-	ret = ib_cache_setup_one(device);
-	if (ret) {
-		printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
-		ib_device_unregister_sysfs(device);
-		kfree(device->port_immutable);
-		goto out;
-	}
-
 	{
 		struct ib_client *client;
 
@@ -395,6 +395,7 @@  void ib_unregister_device(struct ib_device *device)
 	mutex_unlock(&device_mutex);
 
 	ib_device_unregister_sysfs(device);
+	ib_cache_cleanup_one(device);
 
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)