diff mbox

[for-next,V1,2/3] IB/core: RoCE GID management separate cleanup and release

Message ID 20150804212334.GB10934@obsidianresearch.com (mailing list archive)
State Accepted
Headers show

Commit Message

Jason Gunthorpe Aug. 4, 2015, 9:23 p.m. UTC
On Tue, Aug 04, 2015 at 09:55:03PM +0300, Matan Barak wrote:

> If it fails in ib_device_register_sysfs, the device release function
> isn't called and the device pointer isn't freed.

Ugh, yes, the abuse in ib_dealloc_device needs to go too.

Attached is a compile tested patch that fixes that up. Feel free to
fix it up as necessary. It should be sequenced before your 'Add RoCE
GID table management'.. It would probably be helpful for doug to
resend that one patch adjusted.

> If ib_cache_setup_one fails, we call ib_device_unregister_sysfs and
> the memory will be freed.

?? ib_device_unregister_sysfs doesn't free memory..

At the driver the sequence should be:
 ib_alloc_device
 ib_register_device
 ib_unregister_device
 ib_dealloc_device

The issue I'm looking (which I suspected before, but just went and
confirmed) at is that sysfs's show_port_gid calls ib_query_gid which
your series now makes use the cache, so sysfs should ideally be shut
off before the cache stuff.

.. and we must setup the cache before setting up sysfs, otherwise
there is a race where userspace can trigger a cache call before
setup.. (null deref?)

> ib_device_unregister_sysfs (otherwise I'll have use-after-free).
> What do you think?

I'm still not sure what you are seeing..

You might also want the cache code to have an entry from
ib_alloc_device to setup locks and other no-fail initalization
things. AFIAK there isn't a strong reason to do this other than to
keep with the basic idiom.

Jason

From 32bafdef61a9e571ef1457f7f1966a7372d1b8d7 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Tue, 4 Aug 2015 15:11:41 -0600
Subject: [PATCH] IB/core: Make ib_alloc_device initialize the kobject

This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.

Consequently ib_device_release is now guaranteed to be called in
all situations and we can't duplicate its kfrees on error paths.
Choose to just drop kfree(port_immutable)

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/infiniband/core/core_priv.h |  3 --
 drivers/infiniband/core/device.c    | 92 ++++++++++++++++++++++++-------------
 drivers/infiniband/core/sysfs.c     | 51 ++------------------
 3 files changed, 65 insertions(+), 81 deletions(-)

Comments

Doug Ledford Aug. 14, 2015, 9:49 p.m. UTC | #1
On 08/04/2015 05:23 PM, Jason Gunthorpe wrote:
> On Tue, Aug 04, 2015 at 09:55:03PM +0300, Matan Barak wrote:
> 
>> If it fails in ib_device_register_sysfs, the device release function
>> isn't called and the device pointer isn't freed.
> 
> Ugh, yes, the abuse in ib_dealloc_device needs to go too.
> 
> Attached is a compile tested patch that fixes that up. Feel free to
> fix it up as necessary. It should be sequenced before your 'Add RoCE
> GID table management'.. It would probably be helpful for doug to
> resend that one patch adjusted.
> 
>> If ib_cache_setup_one fails, we call ib_device_unregister_sysfs and
>> the memory will be freed.
> 
> ?? ib_device_unregister_sysfs doesn't free memory..
> 
> At the driver the sequence should be:
>  ib_alloc_device
>  ib_register_device
>  ib_unregister_device
>  ib_dealloc_device
> 
> The issue I'm looking (which I suspected before, but just went and
> confirmed) at is that sysfs's show_port_gid calls ib_query_gid which
> your series now makes use the cache, so sysfs should ideally be shut
> off before the cache stuff.
> 
> .. and we must setup the cache before setting up sysfs, otherwise
> there is a race where userspace can trigger a cache call before
> setup.. (null deref?)
> 
>> ib_device_unregister_sysfs (otherwise I'll have use-after-free).
>> What do you think?
> 
> I'm still not sure what you are seeing..
> 
> You might also want the cache code to have an entry from
> ib_alloc_device to setup locks and other no-fail initalization
> things. AFIAK there isn't a strong reason to do this other than to
> keep with the basic idiom.
> 
> Jason
> 
> From 32bafdef61a9e571ef1457f7f1966a7372d1b8d7 Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> Date: Tue, 4 Aug 2015 15:11:41 -0600
> Subject: [PATCH] IB/core: Make ib_alloc_device initialize the kobject
> 
> This gets rid of the weird in-between state where struct ib_device
> was allocated but the kobject didn't work.
> 
> Consequently ib_device_release is now guaranteed to be called in
> all situations and we can't duplicate its kfrees on error paths.
> Choose to just drop kfree(port_immutable)
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

I reworded the commit message some, but the patch was good and makes
sense.  I've inserted it into the GID series and, as expected, it caused
failures in the original "IB/core: Add RoCE GID table management" patch.
 However, the fixups for those failures seem pretty obvious to me, so I
made them myself.  When I finally push this, it will be in this branch
on my github repo:

v4.2-rc6-based/gid-table-v7

You and Matan might want to double check that I fixed things up properly
and didn't miss something.
diff mbox

Patch

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 87d1936f5c1c..950583a62e3b 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -43,9 +43,6 @@  int  ib_device_register_sysfs(struct ib_device *device,
 						   u8, struct kobject *));
 void ib_device_unregister_sysfs(struct ib_device *device);
 
-int  ib_sysfs_setup(void);
-void ib_sysfs_cleanup(void);
-
 int  ib_cache_setup(void);
 void ib_cache_cleanup(void);
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 9567756ca4f9..0a2c28610026 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -152,6 +152,35 @@  static int alloc_name(char *name)
 	return 0;
 }
 
+static void ib_device_release(struct device *device)
+{
+	struct ib_device *dev = container_of(device, struct ib_device, dev);
+
+	kfree(dev->port_immutable);
+	kfree(dev);
+}
+
+static int ib_device_uevent(struct device *device,
+			    struct kobj_uevent_env *env)
+{
+	struct ib_device *dev = container_of(device, struct ib_device, dev);
+
+	if (add_uevent_var(env, "NAME=%s", dev->name))
+		return -ENOMEM;
+
+	/*
+	 * It would be nice to pass the node GUID with the event...
+	 */
+
+	return 0;
+}
+
+static struct class ib_class = {
+	.name    = "infiniband",
+	.dev_release = ib_device_release,
+	.dev_uevent = ib_device_uevent,
+};
+
 /**
  * ib_alloc_device - allocate an IB device struct
  * @size:size of structure to allocate
@@ -164,9 +193,27 @@  static int alloc_name(char *name)
  */
 struct ib_device *ib_alloc_device(size_t size)
 {
-	BUG_ON(size < sizeof (struct ib_device));
+	struct ib_device *device;
+
+	if (WARN_ON(size < sizeof(struct ib_device)))
+		return NULL;
+
+	device = kzalloc(size, GFP_KERNEL);
+	if (!device)
+		return NULL;
+
+	device->dev.class = &ib_class;
+	device_initialize(&device->dev);
+
+	dev_set_drvdata(&device->dev, device);
+
+	INIT_LIST_HEAD(&device->event_handler_list);
+	spin_lock_init(&device->event_handler_lock);
+	spin_lock_init(&device->client_data_lock);
+	INIT_LIST_HEAD(&device->client_data_list);
+	INIT_LIST_HEAD(&device->port_list);
 
-	return kzalloc(size, GFP_KERNEL);
+	return device;
 }
 EXPORT_SYMBOL(ib_alloc_device);
 
@@ -178,13 +225,8 @@  EXPORT_SYMBOL(ib_alloc_device);
  */
 void ib_dealloc_device(struct ib_device *device)
 {
-	if (device->reg_state == IB_DEV_UNINITIALIZED) {
-		kfree(device);
-		return;
-	}
-
-	BUG_ON(device->reg_state != IB_DEV_UNREGISTERED);
-
+	WARN_ON(device->reg_state != IB_DEV_UNREGISTERED &&
+		device->reg_state != IB_DEV_UNINITIALIZED);
 	kobject_put(&device->dev.kobj);
 }
 EXPORT_SYMBOL(ib_dealloc_device);
@@ -219,7 +261,7 @@  static int verify_immutable(const struct ib_device *dev, u8 port)
 
 static int read_port_immutable(struct ib_device *device)
 {
-	int ret = -ENOMEM;
+	int ret;
 	u8 start_port = rdma_start_port(device);
 	u8 end_port = rdma_end_port(device);
 	u8 port;
@@ -235,26 +277,18 @@  static int read_port_immutable(struct ib_device *device)
 					 * (end_port + 1),
 					 GFP_KERNEL);
 	if (!device->port_immutable)
-		goto err;
+		return -ENOMEM;
 
 	for (port = start_port; port <= end_port; ++port) {
 		ret = device->get_port_immutable(device, port,
 						 &device->port_immutable[port]);
 		if (ret)
-			goto err;
+			return ret;
 
-		if (verify_immutable(device, port)) {
-			ret = -EINVAL;
-			goto err;
-		}
+		if (verify_immutable(device, port))
+			return -EINVAL;
 	}
-
-	ret = 0;
-	goto out;
-err:
-	kfree(device->port_immutable);
-out:
-	return ret;
+	return 0;
 }
 
 /**
@@ -285,11 +319,6 @@  int ib_register_device(struct ib_device *device,
 		goto out;
 	}
 
-	INIT_LIST_HEAD(&device->event_handler_list);
-	INIT_LIST_HEAD(&device->client_data_list);
-	spin_lock_init(&device->event_handler_lock);
-	spin_lock_init(&device->client_data_lock);
-
 	ret = read_port_immutable(device);
 	if (ret) {
 		printk(KERN_WARNING "Couldn't create per port immutable data %s\n",
@@ -301,7 +330,6 @@  int ib_register_device(struct ib_device *device,
 	if (ret) {
 		printk(KERN_WARNING "Couldn't register device %s with driver model\n",
 		       device->name);
-		kfree(device->port_immutable);
 		goto out;
 	}
 
@@ -737,7 +765,7 @@  static int __init ib_core_init(void)
 	if (!ib_wq)
 		return -ENOMEM;
 
-	ret = ib_sysfs_setup();
+	ret = class_register(&ib_class);
 	if (ret) {
 		printk(KERN_WARNING "Couldn't create InfiniBand device class\n");
 		goto err;
@@ -761,7 +789,7 @@  err_nl:
 	ibnl_cleanup();
 
 err_sysfs:
-	ib_sysfs_cleanup();
+	class_unregister(&ib_class);
 
 err:
 	destroy_workqueue(ib_wq);
@@ -772,7 +800,7 @@  static void __exit ib_core_cleanup(void)
 {
 	ib_cache_cleanup();
 	ibnl_cleanup();
-	ib_sysfs_cleanup();
+	class_unregister(&ib_class);
 	/* Make sure that any pending umem accounting work is done. */
 	destroy_workqueue(ib_wq);
 }
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 0b84a9cdfe5b..34cdd74b0a17 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -457,29 +457,6 @@  static struct kobj_type port_type = {
 	.default_attrs = port_default_attrs
 };
 
-static void ib_device_release(struct device *device)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-
-	kfree(dev->port_immutable);
-	kfree(dev);
-}
-
-static int ib_device_uevent(struct device *device,
-			    struct kobj_uevent_env *env)
-{
-	struct ib_device *dev = container_of(device, struct ib_device, dev);
-
-	if (add_uevent_var(env, "NAME=%s", dev->name))
-		return -ENOMEM;
-
-	/*
-	 * It would be nice to pass the node GUID with the event...
-	 */
-
-	return 0;
-}
-
 static struct attribute **
 alloc_group_attrs(ssize_t (*show)(struct ib_port *,
 				  struct port_attribute *, char *buf),
@@ -702,12 +679,6 @@  static struct device_attribute *ib_class_attributes[] = {
 	&dev_attr_node_desc
 };
 
-static struct class ib_class = {
-	.name    = "infiniband",
-	.dev_release = ib_device_release,
-	.dev_uevent = ib_device_uevent,
-};
-
 /* Show a given an attribute in the statistics group */
 static ssize_t show_protocol_stat(const struct device *device,
 			    struct device_attribute *attr, char *buf,
@@ -846,14 +817,12 @@  int ib_device_register_sysfs(struct ib_device *device,
 	int ret;
 	int i;
 
-	class_dev->class      = &ib_class;
-	class_dev->parent     = device->dma_device;
-	dev_set_name(class_dev, "%s", device->name);
-	dev_set_drvdata(class_dev, device);
-
-	INIT_LIST_HEAD(&device->port_list);
+	device->dev.parent = device->dma_device;
+	ret = dev_set_name(class_dev, "%s", device->name);
+	if (ret)
+		return ret;
 
-	ret = device_register(class_dev);
+	ret = device_add(class_dev);
 	if (ret)
 		goto err;
 
@@ -916,13 +885,3 @@  void ib_device_unregister_sysfs(struct ib_device *device)
 
 	device_unregister(&device->dev);
 }
-
-int ib_sysfs_setup(void)
-{
-	return class_register(&ib_class);
-}
-
-void ib_sysfs_cleanup(void)
-{
-	class_unregister(&ib_class);
-}