
[1/2] block: genhd: add device_add_disk_with_groups

Message ID 20170928193637.24707-1-mwilck@suse.com
State New, archived

Commit Message

Martin Wilck Sept. 28, 2017, 7:36 p.m. UTC
In the NVME subsystem, we're seeing a race condition with udev where
device_add_disk() is called (which triggers an "add" uevent), and a
sysfs attribute group is added to the disk device afterwards.
If udev rules access these attributes before they are created,
udev processing of the device is incomplete; in particular, device
WWIDs may not be determined correctly.

To fix this, this patch introduces a new function
device_add_disk_with_groups(), which takes a list of attribute groups
and adds them to the device before sending out uevents.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/genhd.c         | 17 ++++++++++++-----
 include/linux/genhd.h |  8 +++++++-
 2 files changed, 19 insertions(+), 6 deletions(-)
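A minimal userspace sketch of the ordering described in the commit message. Everything here is hypothetical: toy_device, toy_group, toy_create_groups() and toy_add_disk_with_groups() are illustrative stand-ins, not kernel APIs, and the "udev listener" is just a function called synchronously. It models two properties the patch relies on: attribute groups arrive as a NULL-terminated array (per the patch's kernel-doc), and all of them are created before the "add" event is delivered. The all-or-nothing rollback follows my understanding of the kernel's sysfs_create_groups(); as in the patch, a failure only produces a warning and the event is still sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ATTRS 8

/* Hypothetical stand-in for struct attribute_group. */
struct toy_group {
	const char *name;
	const char *const *attrs;	/* NULL-terminated attribute names */
};

/* Hypothetical stand-in for the disk's sysfs directory. */
struct toy_device {
	const char *attrs[TOY_MAX_ATTRS];
	int nattrs;
	bool listener_saw_wwid;		/* what the udev-like listener observed */
};

bool toy_has_attr(const struct toy_device *dev, const char *name)
{
	for (int i = 0; i < dev->nattrs; i++)
		if (strcmp(dev->attrs[i], name) == 0)
			return true;
	return false;
}

/* Walk a NULL-terminated group array. On any failure, drop every
 * attribute added by this call (all-or-nothing) and report an error. */
int toy_create_groups(struct toy_device *dev,
		      const struct toy_group *const *groups)
{
	int before = dev->nattrs;

	if (groups == NULL)
		return 0;	/* nothing to add is not an error */
	for (int g = 0; groups[g] != NULL; g++) {
		for (int a = 0; groups[g]->attrs[a] != NULL; a++) {
			if (dev->nattrs == TOY_MAX_ATTRS) {
				dev->nattrs = before;	/* roll back */
				return -1;
			}
			dev->attrs[dev->nattrs++] = groups[g]->attrs[a];
		}
	}
	return 0;
}

/* udev-like listener, invoked synchronously for the "add" event. */
void toy_send_add_uevent(struct toy_device *dev)
{
	dev->listener_saw_wwid = toy_has_attr(dev, "wwid");
}

/* The patched ordering: create groups first, then announce the device. */
void toy_add_disk_with_groups(struct toy_device *dev,
			      const struct toy_group *const *groups)
{
	assert(dev != NULL);
	if (toy_create_groups(dev, groups))
		fprintf(stderr, "failed to add attribute groups\n");
	toy_send_add_uevent(dev);
}
```

A caller would build a NULL-terminated array such as { &ns_group, NULL } and pass it to toy_add_disk_with_groups(); the listener then finds "wwid" already present. Passing NULL corresponds to the plain device_add_disk() path, where no extra groups are created.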

Comments

Schremmer, Steven Sept. 29, 2017, 7:27 p.m. UTC | #1
> From: Linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On Behalf Of Martin Wilck
> Sent: Thursday, September 28, 2017 2:37 PM
> To: Jens Axboe <axboe@kernel.dk>; Christoph Hellwig <hch@lst.de>; Johannes Thumshirn <jthumshirn@suse.de>
> Cc: linux-block@vger.kernel.org; Martin Wilck <mwilck@suse.de>; linux-kernel@vger.kernel.org; linux-nvme@lists.infradead.org;
> Hannes Reinecke <hare@suse.de>
> Subject: [PATCH 1/2] block: genhd: add device_add_disk_with_groups
> 

Tested-by: Steve Schremmer <steve.schremmer@netapp.com>
Keith Busch Sept. 29, 2017, 10:59 p.m. UTC | #2
On Thu, Sep 28, 2017 at 09:36:36PM +0200, Martin Wilck wrote:
> In the NVME subsystem, we're seeing a race condition with udev where
> device_add_disk() is called (which triggers an "add" uevent), and a
> sysfs attribute group is added to the disk device afterwards.
> If udev rules access these attributes before they are created,
> udev processing of the device is incomplete; in particular, device
> WWIDs may not be determined correctly.
> 
> To fix this, this patch introduces a new function
> device_add_disk_with_groups(), which takes a list of attribute groups
> and adds them to the device before sending out uevents.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>

Is NVMe the only one having this problem? Was putting our attributes in
the disk's kobj a bad choice?

Anyway, looks fine to me.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Christoph Hellwig Oct. 1, 2017, 8 a.m. UTC | #3
While this looks okay-ish to me, I really don't want people confused
with three variants of add_disk; we really need to consolidate
our helpers there a bit.
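To make the "three variants" concrete: with this patch's header change, add_disk() and device_add_disk() become thin inline wrappers that funnel into device_add_disk_with_groups(). The userspace sketch below models only that layering; the toy_* names and the call log are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in <linux/genhd.h>. */
struct toy_parent { const char *name; };
struct toy_gendisk { int id; };
struct toy_attr_group;	/* opaque, only passed by pointer */

/* Records the most recent call so the layering can be observed. */
struct toy_call_log {
	const struct toy_parent *parent;
	const struct toy_gendisk *disk;
	const struct toy_attr_group *const *groups;
	int calls;
};

static struct toy_call_log toy_log;

/* The single out-of-line implementation that every variant reaches. */
void toy_device_add_disk_with_groups(const struct toy_parent *parent,
				     const struct toy_gendisk *disk,
				     const struct toy_attr_group *const *groups)
{
	assert(disk != NULL);	/* a disk is mandatory; parent and groups are not */
	toy_log.parent = parent;
	toy_log.disk = disk;
	toy_log.groups = groups;
	toy_log.calls++;
}

/* Like the patch's inline device_add_disk(): no extra groups. */
static inline void toy_device_add_disk(const struct toy_parent *parent,
				       const struct toy_gendisk *disk)
{
	toy_device_add_disk_with_groups(parent, disk, NULL);
}

/* Like add_disk(): no parent device either. */
static inline void toy_add_disk(const struct toy_gendisk *disk)
{
	toy_device_add_disk(NULL, disk);
}
```

Because the wrappers are static inline, existing callers keep compiling unchanged and only the exported symbol changes; the cost, as noted above, is one more public name for readers to keep apart.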
Sagi Grimberg Oct. 2, 2017, 10:46 p.m. UTC | #4
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Martin Wilck Oct. 4, 2017, 10:33 a.m. UTC | #5
On Sun, 2017-10-01 at 10:00 +0200, Christoph Hellwig wrote:
> While this looks okay-ish to me, I really don't want people confused
> with three variants of add_disk; we really need to consolidate
> our helpers there a bit.
> 

Can you give me a hint as to what you'd like to see?

Martin
Martin Wilck Oct. 4, 2017, 10:46 a.m. UTC | #6
On Fri, 2017-09-29 at 16:59 -0600, Keith Busch wrote:
> On Thu, Sep 28, 2017 at 09:36:36PM +0200, Martin Wilck wrote:
> > In the NVME subsystem, we're seeing a race condition with udev where
> > device_add_disk() is called (which triggers an "add" uevent), and a
> > sysfs attribute group is added to the disk device afterwards.
> > If udev rules access these attributes before they are created,
> > udev processing of the device is incomplete; in particular, device
> > WWIDs may not be determined correctly.
> > 
> > To fix this, this patch introduces a new function
> > device_add_disk_with_groups(), which takes a list of attribute groups
> > and adds them to the device before sending out uevents.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> 
> Is NVMe the only one having this problem?

There are other devices that follow the same programming pattern
(device_add_disk followed by sysfs_create_group), but I haven't tested
them all, nor reviewed whether these devices need the disk's sysfs
attributes for udev processing. If they don't, the problem will just go
unnoticed.

SCSI obviously takes a very different approach to sysfs layout.

> Was putting our attributes in the disk's kobj a bad choice?

Well, it would make sense to separate the block (disk) device from the
device representing the NVMe subsys/namespace in sysfs. But I guess
it's too late for that now. Actually, the first attempt I made to solve
this problem was exactly that, and it proved to "work", too, but only
at the cost of changing the path of the NVMe block device in sysfs,
which I considered a no-go. Thus I came up with the approach I posted.

Regards,
Martin

Patch

diff --git a/block/genhd.c b/block/genhd.c
index dd305c65ffb0..1900682a221e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -552,7 +552,8 @@  static int exact_lock(dev_t devt, void *data)
 	return 0;
 }
 
-static void register_disk(struct device *parent, struct gendisk *disk)
+static void register_disk(struct device *parent, struct gendisk *disk,
+			  const struct attribute_group **groups)
 {
 	struct device *ddev = disk_to_dev(disk);
 	struct block_device *bdev;
@@ -578,6 +579,9 @@  static void register_disk(struct device *parent, struct gendisk *disk)
 		}
 	}
 
+	if (groups != NULL && sysfs_create_groups(&ddev->kobj, groups))
+		dev_warn(ddev, "failed to add attribute groups");
+
 	/*
 	 * avoid probable deadlock caused by allocating memory with
 	 * GFP_KERNEL in runtime_resume callback of its all ancestor
@@ -619,16 +623,19 @@  static void register_disk(struct device *parent, struct gendisk *disk)
 }
 
 /**
- * device_add_disk - add partitioning information to kernel list
+ * device_add_disk_with_groups - add partitioning information to kernel list
  * @parent: parent device for the disk
  * @disk: per-device partitioning information
+ * @groups: NULL-terminated array of attribute groups
  *
  * This function registers the partitioning information in @disk
  * with the kernel.
  *
  * FIXME: error handling
  */
-void device_add_disk(struct device *parent, struct gendisk *disk)
+void device_add_disk_with_groups(struct device *parent,
+				struct gendisk *disk,
+				const struct attribute_group **groups)
 {
 	struct backing_dev_info *bdi;
 	dev_t devt;
@@ -664,7 +671,7 @@  void device_add_disk(struct device *parent, struct gendisk *disk)
 
 	blk_register_region(disk_devt(disk), disk->minors, NULL,
 			    exact_match, exact_lock, disk);
-	register_disk(parent, disk);
+	register_disk(parent, disk, groups);
 	blk_register_queue(disk);
 
 	/*
@@ -680,7 +687,7 @@  void device_add_disk(struct device *parent, struct gendisk *disk)
 	disk_add_events(disk);
 	blk_integrity_add(disk);
 }
-EXPORT_SYMBOL(device_add_disk);
+EXPORT_SYMBOL(device_add_disk_with_groups);
 
 void del_gendisk(struct gendisk *disk)
 {
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ea652bfcd675..3404d92d5063 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -386,7 +386,13 @@  static inline void free_part_info(struct hd_struct *part)
 extern void part_round_stats(struct request_queue *q, int cpu, struct hd_struct *part);
 
 /* block/genhd.c */
-extern void device_add_disk(struct device *parent, struct gendisk *disk);
+extern void device_add_disk_with_groups(struct device *parent,
+					struct gendisk *disk,
+					const struct attribute_group **groups);
+static inline void device_add_disk(struct device *parent, struct gendisk *disk)
+{
+	device_add_disk_with_groups(parent, disk, NULL);
+}
 static inline void add_disk(struct gendisk *disk)
 {
 	device_add_disk(NULL, disk);