
block: emit disk ro uevent in device_add_disk()

Message ID 20220303175219.272938-1-ushankar@purestorage.com (mailing list archive)
State New, archived

Commit Message

Uday Shankar March 3, 2022, 5:52 p.m. UTC
Userspace learns of disk ro state via the change event emitted by
set_disk_ro_uevent. This function has cyclic dependency with
device_add_disk: the latter performs kobject initialization that is
necessary for uevents to go through, but we want to set up properties
like ro state before exposing the disk to userspace via device_add_disk.

The usual workaround is to call set_disk_ro both before and after
device_add_disk; the purpose of the "after" call is just to emit the
uevent. Moreover, because set_disk_ro only emits a uevent when the ro
state changes, set_disk_ro needs to be called twice in the "after"
position to ensure that the ro state flips. See drivers/scsi/sd.c for an
example of this pattern.

The nvme driver does not implement this pattern. It only calls
set_disk_ro before device_add_disk, and so the ro uevent is never
emitted. This breaks applications such as dm-multipath. To avoid
introducing the messy pattern above into the nvme driver, emit the disk
ro uevent immediately after announcing addition of the disk.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 block/genhd.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

Comments

Christoph Hellwig March 4, 2022, 4:08 p.m. UTC | #1
On Thu, Mar 03, 2022 at 10:52:20AM -0700, Uday Shankar wrote:
> Userspace learns of disk ro state via the change event emitted by
> set_disk_ro_uevent. This function has cyclic dependency with
> device_add_disk: the latter performs kobject initialization that is
> necessary for uevents to go through, but we want to set up properties
> like ro state before exposing the disk to userspace via device_add_disk.
> 
> The usual workaround is to call set_disk_ro both before and after
> device_add_disk; the purpose of the "after" call is just to emit the
> uevent. Moreover, because set_disk_ro only emits a uevent when the ro
> state changes, set_disk_ro needs to be called twice in the "after"
> position to ensure that the ro state flips. See drivers/scsi/sd.c for an
> example of this pattern.

I don't see any such pattern there.  I also don't see what the point
is.  KOBJ_CHANGE uevents tell about a change in device state.  But
if a device is marked read-only before disk_add that read-only
state is already visible by the time the device is added and thus
shows up in sysfs, and we do not need an extra notification.

Hannes Reinecke March 7, 2022, 6:39 a.m. UTC | #2
On 3/3/22 18:52, Uday Shankar wrote:
> Userspace learns of disk ro state via the change event emitted by
> set_disk_ro_uevent. This function has cyclic dependency with
> device_add_disk: the latter performs kobject initialization that is
> necessary for uevents to go through, but we want to set up properties
> like ro state before exposing the disk to userspace via device_add_disk.
> 
> The usual workaround is to call set_disk_ro both before and after
> device_add_disk; the purpose of the "after" call is just to emit the
> uevent. Moreover, because set_disk_ro only emits a uevent when the ro
> state changes, set_disk_ro needs to be called twice in the "after"
> position to ensure that the ro state flips. See drivers/scsi/sd.c for an
> example of this pattern.
> 
> The nvme driver does not implement this pattern. It only calls
> set_disk_ro before device_add_disk, and so the ro uevent is never
> emitted. This breaks applications such as dm-multipath. To avoid
> introducing the messy pattern above into the nvme driver, emit the disk
> ro uevent immediately after announcing addition of the disk.
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> ---
>   block/genhd.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 11c761afd64f..89a110f0b002 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -394,6 +394,16 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
>   	return 0;
>   }
>   
> +static void set_disk_ro_uevent(struct gendisk *gd, int ro)
> +{
> +	char event[] = "DISK_RO=1";
> +	char *envp[] = { event, NULL };
> +
> +	if (!ro)
> +		event[8] = '0';
> +	kobject_uevent_env(&disk_to_dev(gd)->kobj, KOBJ_CHANGE, envp);
> +}
> +
>   /**
>    * device_add_disk - add disk information to kernel list
>    * @parent: parent device for the disk
> @@ -522,6 +532,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
>   		 */
>   		dev_set_uevent_suppress(ddev, 0);
>   		disk_uevent(disk, KOBJ_ADD);
> +		set_disk_ro_uevent(disk, get_disk_ro(disk));
>   	}
>   
>   	disk_update_readahead(disk);
> @@ -1419,16 +1430,6 @@ void blk_cleanup_disk(struct gendisk *disk)
>   }
>   EXPORT_SYMBOL(blk_cleanup_disk);
>   
> -static void set_disk_ro_uevent(struct gendisk *gd, int ro)
> -{
> -	char event[] = "DISK_RO=1";
> -	char *envp[] = { event, NULL };
> -
> -	if (!ro)
> -		event[8] = '0';
> -	kobject_uevent_env(&disk_to_dev(gd)->kobj, KOBJ_CHANGE, envp);
> -}
> -
>   /**
>    * set_disk_ro - set a gendisk read-only
>    * @disk:	gendisk to operate on

How very odd.

Why not add the 'DISK_RO=1' setting directly to the 'add' event?
That would be the logical thing to do, no?

Cheers,

Hannes

Uday Shankar March 7, 2022, 8:54 p.m. UTC | #3
On Mon, Mar 07, 2022 at 07:39:39AM +0100, Hannes Reinecke wrote:
> Why not add the 'DISK_RO=1' setting directly to the 'add' event?
> That would be the logical thing to do, no?
I agree, and initially had a patch that did just this. However, for SCSI
disks the DISK_RO property is only ever announced via change uevents,
and applications such as dm-multipath may not pick up on DISK_RO if it
shows up in an add uevent instead. This patch maintains compatibility
with SCSI in that sense. 

Christoph Hellwig wrote:
> I don't see any such pattern there.
Note how sd_revalidate_disk (which does readonly setting) is called both
before and after device_add_disk. Note also how set_disk_ro is called
twice in sd_read_write_protect_flag, to ensure that the ro state flips
(at least in the case where the ro state should be 1). The only
reasoning I can think of for this pattern is the one I mentioned.
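
To make the argument above concrete, here is a rough userspace model of the
ordering problem. It is only a sketch: struct model_disk and the model_*
helpers are hypothetical stand-ins, not kernel code. It assumes, as described
in this thread, that set_disk_ro only emits a uevent when the ro state
changes and that uevents cannot go through before device_add_disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a gendisk; not the kernel type. */
struct model_disk {
	bool read_only;   /* models the disk's ro state */
	bool added;       /* true once the modeled device_add_disk() has run */
	int ro_uevents;   /* DISK_RO change events that reached "userspace" */
};

/* Models set_disk_ro(): an event is produced only when the state flips. */
static void model_set_disk_ro(struct model_disk *d, bool ro)
{
	assert(d != NULL);
	if (d->read_only == ro)
		return;           /* state unchanged, so no uevent */
	d->read_only = ro;
	if (d->added)         /* assumption: events before add are lost */
		d->ro_uevents++;
}

/* Models device_add_disk(): from here on, uevents can be delivered. */
static void model_device_add_disk(struct model_disk *d)
{
	assert(d != NULL);
	d->added = true;
}

/* Set ro only before the add: the ordering attributed to nvme above. */
static int ro_events_set_before_add_only(void)
{
	struct model_disk d = { 0 };

	model_set_disk_ro(&d, true);
	model_device_add_disk(&d);
	return d.ro_uevents;
}

/* One extra call after the add does not help: the state does not flip. */
static int ro_events_single_call_after_add(void)
{
	struct model_disk d = { 0 };

	model_set_disk_ro(&d, true);
	model_device_add_disk(&d);
	model_set_disk_ro(&d, true);
	return d.ro_uevents;
}

/* Two calls after the add force a flip: the pattern attributed to sd. */
static int ro_events_flip_after_add(void)
{
	struct model_disk d = { 0 };

	model_set_disk_ro(&d, true);
	model_device_add_disk(&d);
	model_set_disk_ro(&d, false);
	model_set_disk_ro(&d, true);
	return d.ro_uevents;
}
```

In this model the first two orderings deliver no ro event at all, while the
flip delivers two (one for each state change), the last of which reports the
read-only state.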

> I also don't see what the point is.  KOBJ_CHANGE uevents tell about a
> change in device state.  But if a device is marked read-only before
> disk_add that read-only state is already visible by the time the
> device is added and thus shows up in sysfs, and we do not need an
> extra notification.
You are suggesting that I should patch the applications I care about to
pick up the ro state from sysfs instead of waiting for a change uevent,
correct?

Thanks,
Uday

Hannes Reinecke March 8, 2022, 6:42 a.m. UTC | #4
On 3/7/22 21:54, Uday Shankar wrote:
> On Mon, Mar 07, 2022 at 07:39:39AM +0100, Hannes Reinecke wrote:
>> Why not add the 'DISK_RO=1' setting directly to the 'add' event?
>> That would be the logical thing to do, no?
> I agree, and initially had a patch that did just this. However, for SCSI
> disks the DISK_RO property is only ever announced via change uevents,
> and applications such as dm-multipath may not pick up on DISK_RO if it
> shows up in an add uevent instead. This patch maintains compatibility
> with SCSI in that sense.
> 

Most rules relating to storage devices test for both, 'add' _and_ 
'change' as DM devices are only usable after a 'change' event.
In particular multipath has been coded with that in mind, so I don't see 
any issues with just adding the RO setting to the 'add' event.
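
For illustration only, a consumer that treats 'add' and 'change' alike could
look roughly like the sketch below. It assumes the uevent payload is handed
over as a buffer of NUL-separated "KEY=value" strings; uevent_get and
uevent_disk_ro are hypothetical helper names, and DISK_RO appearing in an
'add' event is the proposal under discussion here, not something the posted
patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up KEY in a buffer of NUL-separated "KEY=value" entries.
 * The buffer must end with a NUL byte. Returns a pointer to the value
 * (itself NUL-terminated inside the buffer), or NULL if KEY is absent.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	const char *p = buf;
	const char *end = buf + len;

	assert(len > 0 && buf[len - 1] == '\0');
	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));
		size_t n = (size_t)(nul - p);   /* nul is non-NULL, see assert */

		if (n > klen && p[klen] == '=' && memcmp(p, key, klen) == 0)
			return p + klen + 1;
		p = nul + 1;
	}
	return NULL;
}

/*
 * Returns 1 if the event reports a read-only disk, 0 if it reports a
 * writable one, and -1 if the event carries no DISK_RO property or is
 * neither an 'add' nor a 'change' event.
 */
static int uevent_disk_ro(const char *buf, size_t len)
{
	const char *action = uevent_get(buf, len, "ACTION");
	const char *ro = uevent_get(buf, len, "DISK_RO");

	if (!action || !ro)
		return -1;
	if (strcmp(action, "add") != 0 && strcmp(action, "change") != 0)
		return -1;
	return strcmp(ro, "1") == 0;
}
```

A handler written this way does not care which of the two events carries the
property, which is the behaviour being assumed of existing storage rules.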

Cheers,

Hannes

Christoph Hellwig March 8, 2022, 6:47 a.m. UTC | #5
On Tue, Mar 08, 2022 at 07:42:40AM +0100, Hannes Reinecke wrote:
> Most rules relating to storage devices test for both, 'add' _and_ 'change'
> as DM devices are only usable after a 'change' event.
> In particular multipath has been coded with that in mind, so I don't see any
> issues with just adding the RO setting to the 'add' event.

We don't even need that.  An application needs to look at the initial
device state at add time.  We can't add extra arguments to the add
event for every bit of state.
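
As a sketch of what looking at the initial state could mean for an
application: on seeing the add event it can read the disk's "ro" attribute
from sysfs (for example /sys/block/nvme0n1/ro, where the device name is only
an example). read_ro_attr below is a hypothetical helper, not an existing
API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a sysfs-style "ro" attribute file.
 * Returns 1 if the disk is read-only, 0 if it is writable, and -1 if the
 * file cannot be opened or does not start with an integer.
 */
static int read_ro_attr(const char *path)
{
	FILE *f;
	int ro;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &ro) != 1)
		ro = -1;              /* unexpected contents */
	else
		ro = ro ? 1 : 0;      /* normalize any nonzero value to 1 */
	fclose(f);
	return ro;
}
```

Called from an add-event handler as read_ro_attr("/sys/block/nvme0n1/ro"),
this gives the ro state without waiting for any further change event.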

And in this case - if you're using dm-multipath on nvme you're already
doing something horribly wrong, and no amount of tweaking uevents is
going to fix that.

Patch

diff --git a/block/genhd.c b/block/genhd.c
index 11c761afd64f..89a110f0b002 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -394,6 +394,16 @@  int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 	return 0;
 }
 
+static void set_disk_ro_uevent(struct gendisk *gd, int ro)
+{
+	char event[] = "DISK_RO=1";
+	char *envp[] = { event, NULL };
+
+	if (!ro)
+		event[8] = '0';
+	kobject_uevent_env(&disk_to_dev(gd)->kobj, KOBJ_CHANGE, envp);
+}
+
 /**
  * device_add_disk - add disk information to kernel list
  * @parent: parent device for the disk
@@ -522,6 +532,7 @@  int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 		 */
 		dev_set_uevent_suppress(ddev, 0);
 		disk_uevent(disk, KOBJ_ADD);
+		set_disk_ro_uevent(disk, get_disk_ro(disk));
 	}
 
 	disk_update_readahead(disk);
@@ -1419,16 +1430,6 @@  void blk_cleanup_disk(struct gendisk *disk)
 }
 EXPORT_SYMBOL(blk_cleanup_disk);
 
-static void set_disk_ro_uevent(struct gendisk *gd, int ro)
-{
-	char event[] = "DISK_RO=1";
-	char *envp[] = { event, NULL };
-
-	if (!ro)
-		event[8] = '0';
-	kobject_uevent_env(&disk_to_dev(gd)->kobj, KOBJ_CHANGE, envp);
-}
-
 /**
  * set_disk_ro - set a gendisk read-only
  * @disk:	gendisk to operate on