diff mbox series

scsi: sd: block: Handle cases where devices come online read-only

Message ID 20190208233831.31377-1-martin.petersen@oracle.com (mailing list archive)
State Superseded
Headers show
Series scsi: sd: block: Handle cases where devices come online read-only | expand

Commit Message

Martin K. Petersen Feb. 8, 2019, 11:38 p.m. UTC
Some devices come online in write protected state and switch to
read-write once they are ready to process I/O requests. These devices
broke with commit 20bd1d026aac ("scsi: sd: Keep disk read-only when
re-reading partition") because we have no way to distinguish between a
user decision to set a block_device read-only and the disk being write
protected as a result of the hardware state.

To overcome this we add a third state to the gendisk read-only
policy. This flag is exlusively used when the user forces a struct
block_device read-only via BLKROSET. We currently don't allow
switching ro state in sysfs so the ioctl is the only entry point for
this new state.

In set_disk_ro() we check whether the user override flag is in effect
for a disk before changing read-only state based on the device
settings. This means that devices that have a delay before going
read-write will now be able to clear the read-only state. And devices
where the admin or udev has forced the disk read-only will not cause
the gendisk policy to reflect the mode reported by the device.

Cc: Jeremy Cline <jeremy@jcline.org>
Cc: Oleksii Kurochko <olkuroch@cisco.com>
Cc: stable@vger.kernel.org # v4.16+
Reported-by: Oleksii Kurochko <olkuroch@cisco.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
Fixes: 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading partition")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

I have verified that get_disk_ro() and bdev_read_only() callers all
handle the additional value correctly. Same is true for "ro" in
sysfs.

Note that per-partition ro settings are lost on revalidate. This has
been broken for at least a decade and it will require major surgery to
fix. To my knowledge nobody has complained about being unable to make
partition read-only settings stick through a revalidate. So hopefully
this patch will suffice as a simple fix for stable.
---
 block/genhd.c         | 13 ++++++++++++-
 block/ioctl.c         |  3 ++-
 drivers/scsi/sd.c     |  4 +---
 include/linux/genhd.h |  6 ++++++
 4 files changed, 21 insertions(+), 5 deletions(-)

Comments

Jeremy Cline Feb. 11, 2019, 3:50 p.m. UTC | #1
On 2/8/19 6:38 PM, Martin K. Petersen wrote:
> Some devices come online in write protected state and switch to
> read-write once they are ready to process I/O requests. These devices
> broke with commit 20bd1d026aac ("scsi: sd: Keep disk read-only when
> re-reading partition") because we have no way to distinguish between a
> user decision to set a block_device read-only and the disk being write
> protected as a result of the hardware state.
> 
> To overcome this we add a third state to the gendisk read-only
> policy. This flag is exlusively used when the user forces a struct
> block_device read-only via BLKROSET. We currently don't allow
> switching ro state in sysfs so the ioctl is the only entry point for
> this new state.
> 
> In set_disk_ro() we check whether the user override flag is in effect
> for a disk before changing read-only state based on the device
> settings. This means that devices that have a delay before going
> read-write will now be able to clear the read-only state. And devices
> where the admin or udev has forced the disk read-only will not cause
> the gendisk policy to reflect the mode reported by the device.
> 
> Cc: Jeremy Cline <jeremy@jcline.org>
> Cc: Oleksii Kurochko <olkuroch@cisco.com>
> Cc: stable@vger.kernel.org # v4.16+
> Reported-by: Oleksii Kurochko <olkuroch@cisco.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
> Fixes: 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading partition")
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> ---
> 
> I have verified that get_disk_ro() and bdev_read_only() callers all
> handle the additional value correctly. Same is true for "ro" in
> sysfs.
> 
> Note that per-partition ro settings are lost on revalidate. This has
> been broken for at least a decade and it will require major surgery to
> fix. To my knowledge nobody has complained about being unable to make
> partition read-only settings stick through a revalidate. So hopefully
> this patch will suffice as a simple fix for stable.
> ---

Oof, my apologies for this regression. This looks like nice, tidy way to
fix it.

>   block/genhd.c         | 13 ++++++++++++-
>   block/ioctl.c         |  3 ++-
>   drivers/scsi/sd.c     |  4 +---
>   include/linux/genhd.h |  6 ++++++
>   4 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 1dd8fd6613b8..e29805bfa989 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1549,11 +1549,22 @@ void set_disk_ro(struct gendisk *disk, int flag)
>   	struct disk_part_iter piter;
>   	struct hd_struct *part;
>   
> +	/*
> +	 * If the user has forced disk read-only with BLKROSET, ignore
> +	 * any device state change requested by the driver.
> +	 */
> +	if (disk->part0.policy == DISK_POLICY_USER_WRITE_PROTECT)
> +		return;

I noticed drivers/s390/block/dasd_ioctl.c calls set_disk_ro() to set the
policy, where-as the policy is set with set_device_ro() in the generic
ioctl.

It's not setting the policy to DISK_POLICY_USER_WRITE_PROTECT so I think
it would only be a problem if the user set it to 2 instead of 1 assuming
any truthy value is acceptable. Then the user wouldn't be able to mark
the disk as writable again since this would be true. Perhaps it's a
somewhat far-fetched scenario.

>   	if (disk->part0.policy != flag) {
>   		set_disk_ro_uevent(disk, flag);
>   		disk->part0.policy = flag;
>   	}
> -
> +	/*
> +	 * If set_disk_ro() is called from revalidate, all partitions
> +	 * have already been dropped at this point and thus any
> +	 * per-partition user setting lost. Each partition will
> +	 * inherit part0 policy when subsequently re-added.
> +	 */
>   	disk_part_iter_init(&piter, disk, DISK_PITER_INCL_EMPTY);
>   	while ((part = disk_part_iter_next(&piter)))
>   		part->policy = flag;
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 4825c78a6baa..16c42e1b18c8 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -451,7 +451,8 @@ static int blkdev_roset(struct block_device *bdev, fmode_t mode,
>   		return ret;
>   	if (get_user(n, (int __user *)arg))
>   		return -EFAULT;
> -	set_device_ro(bdev, n);
> +	set_device_ro(bdev, n ? DISK_POLICY_USER_WRITE_PROTECT :
> +		      DISK_POLICY_WRITABLE);
>   	return 0;
>   }
>   
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index b2da8a00ec33..9aa409b38765 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2591,10 +2591,8 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer)
>   	int res;
>   	struct scsi_device *sdp = sdkp->device;
>   	struct scsi_mode_data data;
> -	int disk_ro = get_disk_ro(sdkp->disk);
>   	int old_wp = sdkp->write_prot;
>   
> -	set_disk_ro(sdkp->disk, 0);
>   	if (sdp->skip_ms_page_3f) {
>   		sd_first_printk(KERN_NOTICE, sdkp, "Assuming Write Enabled\n");
>   		return;
> @@ -2632,7 +2630,7 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer)
>   			  "Test WP failed, assume Write Enabled\n");
>   	} else {
>   		sdkp->write_prot = ((data.device_specific & 0x80) != 0);
> -		set_disk_ro(sdkp->disk, sdkp->write_prot || disk_ro);
> +		set_disk_ro(sdkp->disk, sdkp->write_prot);
>   		if (sdkp->first_scan || old_wp != sdkp->write_prot) {
>   			sd_printk(KERN_NOTICE, sdkp, "Write Protect is %s\n",
>   				  sdkp->write_prot ? "on" : "off");
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 06c0fd594097..2bef434d4dff 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -150,6 +150,12 @@ enum {
>   	DISK_EVENT_EJECT_REQUEST		= 1 << 1, /* eject requested */
>   };
>   
> +enum {
> +	DISK_POLICY_WRITABLE			= 0, /* Default */
> +	DISK_POLICY_DEVICE_WRITE_PROTECT	= 1, /* Set by device driver */
> +	DISK_POLICY_USER_WRITE_PROTECT		= 2, /* Set via BLKROSET */
> +};
> +
>   struct disk_part_tbl {
>   	struct rcu_head rcu_head;
>   	int len;
>
Christoph Hellwig Feb. 12, 2019, 8:03 a.m. UTC | #2
On Fri, Feb 08, 2019 at 06:38:31PM -0500, Martin K. Petersen wrote:
> Some devices come online in write protected state and switch to
> read-write once they are ready to process I/O requests.

That is really weird.  What kind of devices are these?

> Note that per-partition ro settings are lost on revalidate. This has
> been broken for at least a decade and it will require major surgery to
> fix. To my knowledge nobody has complained about being unable to make
> partition read-only settings stick through a revalidate. So hopefully
> this patch will suffice as a simple fix for stable.

Should we warn when we lost these settings on a revalidate?

I have to say I don't like the tristate too much - it seems to allow
setting a hardware write protected device writable again by user
interfaction, right?

Should we just have a hardware and a user policy field that are separate
instead?
Hannes Reinecke Feb. 12, 2019, 8:08 a.m. UTC | #3
On 2/12/19 9:03 AM, Christoph Hellwig wrote:
> On Fri, Feb 08, 2019 at 06:38:31PM -0500, Martin K. Petersen wrote:
>> Some devices come online in write protected state and switch to
>> read-write once they are ready to process I/O requests.
> 
> That is really weird.  What kind of devices are these?
> 
No, not really.
Some arrays (HP IIRC) use a similar mechanism for volume copy or snapshots.

>> Note that per-partition ro settings are lost on revalidate. This has
>> been broken for at least a decade and it will require major surgery to
>> fix. To my knowledge nobody has complained about being unable to make
>> partition read-only settings stick through a revalidate. So hopefully
>> this patch will suffice as a simple fix for stable.
> 
> Should we warn when we lost these settings on a revalidate?
> 
> I have to say I don't like the tristate too much - it seems to allow
> setting a hardware write protected device writable again by user
> interfaction, right?
> 
> Should we just have a hardware and a user policy field that are separate
> instead?
> 
Problem is how the mechanism should work.
Thing is, once a device goes read-only we pretty much stop accessing it 
(as it will send any filesystem down onto the recovery path).
And once we stop sending I/O to it we'll lose the ability to figure out 
that the device switched back to R/W mode.

(Always assuming that we'll be getting a sense code in the first place).

But overall I have to agree with Christoph.
Read-only devices and the handling of which is pretty much 
array-specific, so I doubt we can generalize much here.

Cheers,

Hannes
Martin K. Petersen Feb. 12, 2019, 4:26 p.m. UTC | #4
Jeremy,

> I noticed drivers/s390/block/dasd_ioctl.c calls set_disk_ro() to set the
> policy, where-as the policy is set with set_device_ro() in the generic
> ioctl.

There's a subtle distinction here:

 - set_disk_ro() sets the policy for a whole disk

 - set_device_ro() sets the policy for a block_device (typically
   partition)

> It's not setting the policy to DISK_POLICY_USER_WRITE_PROTECT so I
> think it would only be a problem if the user set it to 2 instead of 1
> assuming any truthy value is acceptable. Then the user wouldn't be
> able to mark the disk as writable again since this would be
> true. Perhaps it's a somewhat far-fetched scenario.

OK, I missed that particular entry point. Will fix.
Bart Van Assche Feb. 12, 2019, 4:27 p.m. UTC | #5
On Fri, 2019-02-08 at 18:38 -0500, Martin K. Petersen wrote:
> diff --git a/block/genhd.c b/block/genhd.c
> index 1dd8fd6613b8..e29805bfa989 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1549,11 +1549,22 @@ void set_disk_ro(struct gendisk *disk, int flag)

Hi Martin,

Please change the data type of flag from 'int' into enum disk_policy. Please
also change the return type of get_disk_ro() and please also update the data
type of the policy member in struct gendisk.

Thanks,

Bart.
Martin K. Petersen Feb. 12, 2019, 4:47 p.m. UTC | #6
Christoph,

>> Some devices come online in write protected state and switch to
>> read-write once they are ready to process I/O requests.
>
> That is really weird.  What kind of devices are these?

There are several. I have many bug reports, ranging from USB sticks to
arrays.

>> Note that per-partition ro settings are lost on revalidate. This has
>> been broken for at least a decade and it will require major surgery to
>> fix. To my knowledge nobody has complained about being unable to make
>> partition read-only settings stick through a revalidate. So hopefully
>> this patch will suffice as a simple fix for stable.
>
> Should we warn when we lost these settings on a revalidate?

Would be nice. But as I wrote, it's going to require major surgery to
the gendisk code to handle this particular scenario correctly since we
currently do not keep any partition state between revalidate calls.

> I have to say I don't like the tristate too much - it seems to allow
> setting a hardware write protected device writable again by user
> interfaction, right?

The original policy was that the user policy was ineffective since the
device setting always won.

Jeremy's patch allowed the user setting to override the device setting
but broke the case where devices subsequently change state at runtime.

My workaround is that the user ioctl ro state, if set, overrides
whatever the device reports. And once the user clears the flag, the
current device setting will take effect on revalidate.

> Should we just have a hardware and a user policy field that are
> separate instead?

All this needs to be completely rewritten. However, my attempt at fixing
it up properly got pretty convoluted and thus unsuitable for stable.

The intent with this patch was merely as a workaround for people stuck
with write-protected drives after boot. The tristate wasn't my first
choice, but it turned out to be the path of least resistance for a
stable fix.
Martin K. Petersen Feb. 12, 2019, 4:50 p.m. UTC | #7
Hannes,

> And once we stop sending I/O to it we'll lose the ability to figure
> out that the device switched back to R/W mode.
>
> (Always assuming that we'll be getting a sense code in the first
> place).

FWIW, I did get correct sense on all the drives I tested and verified
that the code does the right thing.

But obviously I have my doubts about $RANDOM_USB_GIZMO.
diff mbox series

Patch

diff --git a/block/genhd.c b/block/genhd.c
index 1dd8fd6613b8..e29805bfa989 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1549,11 +1549,22 @@  void set_disk_ro(struct gendisk *disk, int flag)
 	struct disk_part_iter piter;
 	struct hd_struct *part;
 
+	/*
+	 * If the user has forced disk read-only with BLKROSET, ignore
+	 * any device state change requested by the driver.
+	 */
+	if (disk->part0.policy == DISK_POLICY_USER_WRITE_PROTECT)
+		return;
 	if (disk->part0.policy != flag) {
 		set_disk_ro_uevent(disk, flag);
 		disk->part0.policy = flag;
 	}
-
+	/*
+	 * If set_disk_ro() is called from revalidate, all partitions
+	 * have already been dropped at this point and thus any
+	 * per-partition user setting lost. Each partition will
+	 * inherit part0 policy when subsequently re-added.
+	 */
 	disk_part_iter_init(&piter, disk, DISK_PITER_INCL_EMPTY);
 	while ((part = disk_part_iter_next(&piter)))
 		part->policy = flag;
diff --git a/block/ioctl.c b/block/ioctl.c
index 4825c78a6baa..16c42e1b18c8 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -451,7 +451,8 @@  static int blkdev_roset(struct block_device *bdev, fmode_t mode,
 		return ret;
 	if (get_user(n, (int __user *)arg))
 		return -EFAULT;
-	set_device_ro(bdev, n);
+	set_device_ro(bdev, n ? DISK_POLICY_USER_WRITE_PROTECT :
+		      DISK_POLICY_WRITABLE);
 	return 0;
 }
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b2da8a00ec33..9aa409b38765 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2591,10 +2591,8 @@  sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer)
 	int res;
 	struct scsi_device *sdp = sdkp->device;
 	struct scsi_mode_data data;
-	int disk_ro = get_disk_ro(sdkp->disk);
 	int old_wp = sdkp->write_prot;
 
-	set_disk_ro(sdkp->disk, 0);
 	if (sdp->skip_ms_page_3f) {
 		sd_first_printk(KERN_NOTICE, sdkp, "Assuming Write Enabled\n");
 		return;
@@ -2632,7 +2630,7 @@  sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer)
 			  "Test WP failed, assume Write Enabled\n");
 	} else {
 		sdkp->write_prot = ((data.device_specific & 0x80) != 0);
-		set_disk_ro(sdkp->disk, sdkp->write_prot || disk_ro);
+		set_disk_ro(sdkp->disk, sdkp->write_prot);
 		if (sdkp->first_scan || old_wp != sdkp->write_prot) {
 			sd_printk(KERN_NOTICE, sdkp, "Write Protect is %s\n",
 				  sdkp->write_prot ? "on" : "off");
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 06c0fd594097..2bef434d4dff 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -150,6 +150,12 @@  enum {
 	DISK_EVENT_EJECT_REQUEST		= 1 << 1, /* eject requested */
 };
 
+enum {
+	DISK_POLICY_WRITABLE			= 0, /* Default */
+	DISK_POLICY_DEVICE_WRITE_PROTECT	= 1, /* Set by device driver */
+	DISK_POLICY_USER_WRITE_PROTECT		= 2, /* Set via BLKROSET */
+};
+
 struct disk_part_tbl {
 	struct rcu_head rcu_head;
 	int len;