diff mbox series

[RFC,1/5] add tweakable bounds_check flag, now off by default

Message ID 20180806045115.7725-2-dgilbert@interlog.com (mailing list archive)
State Deferred
Headers show
Series sd: init_command() and sd_done() speed-ups | expand

Commit Message

Douglas Gilbert Aug. 6, 2018, 4:51 a.m. UTC
Add bounds_check "rw" attribute to the sd driver. It controls whether
each read/write operation submission does an "out of range" bounds check
and a LBA/number_of_blocks alignment bounds check. The mainline kernel
currently does both these checks. This patch changes that default
to bounds_check=false, that is: the two checks are not done.

SBC, SBC-2, SBC-3 and draft SBC-4 are require a device server (i.e. a
SCSI disk) to fail any media command in which the LBA+number_of_blocks
exceeds the capacity of the disk. So why should the sd driver also
check it, given the block layer generated the request and knows the
disk capacity and the disk itself also checks it?

The block layer does almost all its block handling in units of 512 byte
blocks whereas SCSI disks may have other logical block sizes (e.g. 4096
byte logical blocks). So when the block sizes are different it is
possible that there is an alignment issue. But if that occurs, it
would be a logic issue in the block layer. Note that the current
mainline code does this check even when it is not needed (e.g. when
the logical block size of the disk is 512 bytes).

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---

Should a mechanism be added so this could set/cleared by:
  - a LLD for all disks it controls
  - kernel boot time parameter?

 drivers/scsi/sd.c | 55 +++++++++++++++++++++++++++++++++++++----------
 drivers/scsi/sd.h |  1 +
 2 files changed, 45 insertions(+), 11 deletions(-)

Comments

Damien Le Moal Aug. 6, 2018, 5:41 a.m. UTC | #1
Douglas,

On 2018/08/06 13:51, Douglas Gilbert wrote:
> Add bounds_check "rw" attribute to the sd driver. It controls whether
> each read/write operation submission does an "out of range" bounds check
> and a LBA/number_of_blocks alignment bounds check. The mainline kernel
> currently does both these checks. This patch changes that default
> to bounds_check=false, that is: the two checks are not done.
> 
> SBC, SBC-2, SBC-3 and draft SBC-4 are require a device server (i.e. a

"are require" -> "require"

> SCSI disk) to fail any media command in which the LBA+number_of_blocks
> exceeds the capacity of the disk. So why should the sd driver also
> check it, given the block layer generated the request and knows the
> disk capacity and the disk itself also checks it?

What about SATA drives for non read/write commands when accessible max address
is set lower to a value lower then native max address ? Need to dig again in the
code and the specs, but I think to remember that Linux disk capacity is in that
case set to accessible max address, but some commands may address beyond that
value.
> The block layer does almost all its block handling in units of 512 byte
> blocks whereas SCSI disks may have other logical block sizes (e.g. 4096
> byte logical blocks). So when the block sizes are different it is
> possible that there is an alignment issue. But if that occurs, it
> would be a logic issue in the block layer. Note that the current
> mainline code does this check even when it is not needed (e.g. when
> the logical block size of the disk is 512 bytes).

Of note here is that there is a special case that is not being handled: ZBC
drives with 512B logical and 4K physical blocks will accept any 512B aligned
read commands (that is any read command), but will require 4K aligned write
commands for sequential write preferred zones (writes to conventional zones can
be 512B aligned). Since there is currently no differentiation between read and
write commands for the alignment check, much less based on the target zone type
of these commands, checks are incomplete for ZBC & ZAC disks in the generic
block layer. These checks could be added to SD though now that a bitmap is
attached to the device request queue to indicate if a zone is conventional or
sequential. For such test to be effective, sdkp->bounds_check would need to be
forced to 1 for host managed zoned disks.

> 
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
> 
> Should a mechanism be added so this could set/cleared by:
>   - a LLD for all disks it controls
>   - kernel boot time parameter?
> 
>  drivers/scsi/sd.c | 55 +++++++++++++++++++++++++++++++++++++----------
>  drivers/scsi/sd.h |  1 +
>  2 files changed, 45 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d1d08f039bdd..b17b8c66881d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -522,6 +522,32 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
>  }
>  static DEVICE_ATTR_RW(max_write_same_blocks);
>  
> +static ssize_t
> +bounds_check_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct scsi_disk *sdkp = to_scsi_disk(dev);
> +
> +	return sprintf(buf, "%u\n", sdkp->bounds_check);
> +}
> +
> +static ssize_t
> +bounds_check_store(struct device *dev, struct device_attribute *attr,
> +		   const char *buf, size_t count)
> +{
> +	struct scsi_disk *sdkp = to_scsi_disk(dev);
> +	int err, n;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EACCES;
> +
> +	err = kstrtoint(buf, 10, &n);
> +	if (!err)
> +		sdkp->bounds_check = !!n;
> +
> +	return err ? err : count;
> +}
> +static DEVICE_ATTR_RW(bounds_check);
> +
>  static struct attribute *sd_disk_attrs[] = {
>  	&dev_attr_cache_type.attr,
>  	&dev_attr_FUA.attr,
> @@ -535,6 +561,7 @@ static struct attribute *sd_disk_attrs[] = {
>  	&dev_attr_zeroing_mode.attr,
>  	&dev_attr_max_write_same_blocks.attr,
>  	&dev_attr_max_medium_access_timeouts.attr,
> +	&dev_attr_bounds_check.attr,
>  	NULL,
>  };
>  ATTRIBUTE_GROUPS(sd_disk);
> @@ -1088,7 +1115,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
>  	sector_t lba = sectors_to_logical(sdp, blk_rq_pos(rq));
>  	unsigned int nr_blocks = sectors_to_logical(sdp, blk_rq_sectors(rq));
> -	unsigned int mask = logical_to_sectors(sdp, 1) - 1;
>  	unsigned char protect, fua;
>  	bool write = rq_data_dir(rq) == WRITE;
>  	bool dif, dix;
> @@ -1106,17 +1132,24 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  		goto out;
>  	}
>  
> -	if (unlikely(blk_rq_pos(rq) + blk_rq_sectors(rq)
> -		     > logical_to_sectors(sdp, sdkp->capacity))) {
> -		scmd_printk(KERN_ERR, cmd, "access beyond end of device\n");
> -		ret = BLKPREP_KILL;
> -		goto out;
> -	}
> +	if (sdkp->bounds_check) {
> +		unsigned int mask = logical_to_sectors(sdp, 1) - 1;
>  
> -	if (unlikely((blk_rq_pos(rq) & mask) || (blk_rq_sectors(rq) & mask))) {
> -		scmd_printk(KERN_ERR, cmd, "request not aligned to the logical block size\n");
> -		ret = BLKPREP_KILL;
> -		goto out;
> +		if (unlikely(blk_rq_pos(rq) + blk_rq_sectors(rq)
> +			     > logical_to_sectors(sdp, sdkp->capacity))) {
> +			scmd_printk(KERN_ERR, cmd,
> +				    "access beyond end of device\n");
> +			ret = BLKPREP_KILL;
> +			goto out;
> +		}
> +
> +		if (unlikely((blk_rq_pos(rq) & mask) ||
> +			     (blk_rq_sectors(rq) & mask))) {
> +			scmd_printk(KERN_ERR, cmd,
> +			    "request not aligned to logical block size\n");
> +			ret = BLKPREP_KILL;
> +			goto out;
> +		}
>  	}
>  
>  	/*
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 392c7d078ae3..6f58d130fb75 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -117,6 +117,7 @@ struct scsi_disk {
>  	unsigned	urswrz : 1;
>  	unsigned	security : 1;
>  	unsigned	ignore_medium_access_errors : 1;
> +	unsigned        bounds_check : 1;
>  };
>  #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
>  
>
Douglas Gilbert Aug. 6, 2018, 4:12 p.m. UTC | #2
On 2018-08-06 01:41 AM, Damien Le Moal wrote:
> Douglas,
> 
> On 2018/08/06 13:51, Douglas Gilbert wrote:
>> Add bounds_check "rw" attribute to the sd driver. It controls whether
>> each read/write operation submission does an "out of range" bounds check
>> and a LBA/number_of_blocks alignment bounds check. The mainline kernel
>> currently does both these checks. This patch changes that default
>> to bounds_check=false, that is: the two checks are not done.
>>
>> SBC, SBC-2, SBC-3 and draft SBC-4 are require a device server (i.e. a
> 
> "are require" -> "require"

Yes.

>> SCSI disk) to fail any media command in which the LBA+number_of_blocks
>> exceeds the capacity of the disk. So why should the sd driver also
>> check it, given the block layer generated the request and knows the
>> disk capacity and the disk itself also checks it?
> 
> What about SATA drives for non read/write commands when accessible max address
> is set lower to a value lower then native max address ? Need to dig again in the
> code and the specs, but I think to remember that Linux disk capacity is in that
> case set to accessible max address, but some commands may address beyond that
> value.

I do not consider this patch complete. You make a good argument why SATA
disks should have the "out of range" test done.

The worst case is that the bounds_check is not done and the device silently
wraps the LBA then overwrites some innocent blocks.

So perhaps struct scsi_host_template and scsi_host need a bounds_check:1
flag so certain "SCSI" hosts (e.g. ATA and USB mass storage but not UAS(P))
could set it by default. So the SATA host (libata ?) could set the bit in
its template.

Would the block layer also want to tweak bounds_check?

>> The block layer does almost all its block handling in units of 512 byte
>> blocks whereas SCSI disks may have other logical block sizes (e.g. 4096
>> byte logical blocks). So when the block sizes are different it is
>> possible that there is an alignment issue. But if that occurs, it
>> would be a logic issue in the block layer. Note that the current
>> mainline code does this check even when it is not needed (e.g. when
>> the logical block size of the disk is 512 bytes).
> 
> Of note here is that there is a special case that is not being handled: ZBC
> drives with 512B logical and 4K physical blocks will accept any 512B aligned
> read commands (that is any read command), but will require 4K aligned write
> commands for sequential write preferred zones (writes to conventional zones can
> be 512B aligned). Since there is currently no differentiation between read and
> write commands for the alignment check, much less based on the target zone type
> of these commands, checks are incomplete for ZBC & ZAC disks in the generic
> block layer. These checks could be added to SD though now that a bitmap is
> attached to the device request queue to indicate if a zone is conventional or
> sequential. For such test to be effective, sdkp->bounds_check would need to be
> forced to 1 for host managed zoned disks.

Like the "out of range" case I guess that compliant ZBC & ZAC disks are required
to fail this case and send back an indicative error message. But yes, extra code
could be added to do that. If there are enough ZBC/ZAC special cases (like the
protect code) we could split the submission side into three fast paths on
entry to sd_init_command(): one for protect, one for ZBC/ZAC and one for the
rest (the most common case, I assume). That leaves open the question can a ZBC
disk also have protection information :-)

Doug Gilbert

>> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
>> ---
>>
>> Should a mechanism be added so this could set/cleared by:
>>    - a LLD for all disks it controls
>>    - kernel boot time parameter?
>>
>>   drivers/scsi/sd.c | 55 +++++++++++++++++++++++++++++++++++++----------
>>   drivers/scsi/sd.h |  1 +
>>   2 files changed, 45 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index d1d08f039bdd..b17b8c66881d 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -522,6 +522,32 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
>>   }
>>   static DEVICE_ATTR_RW(max_write_same_blocks);
>>   
>> +static ssize_t
>> +bounds_check_show(struct device *dev, struct device_attribute *attr, char *buf)
>> +{
>> +	struct scsi_disk *sdkp = to_scsi_disk(dev);
>> +
>> +	return sprintf(buf, "%u\n", sdkp->bounds_check);
>> +}
>> +
>> +static ssize_t
>> +bounds_check_store(struct device *dev, struct device_attribute *attr,
>> +		   const char *buf, size_t count)
>> +{
>> +	struct scsi_disk *sdkp = to_scsi_disk(dev);
>> +	int err, n;
>> +
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EACCES;
>> +
>> +	err = kstrtoint(buf, 10, &n);
>> +	if (!err)
>> +		sdkp->bounds_check = !!n;
>> +
>> +	return err ? err : count;
>> +}
>> +static DEVICE_ATTR_RW(bounds_check);
>> +
>>   static struct attribute *sd_disk_attrs[] = {
>>   	&dev_attr_cache_type.attr,
>>   	&dev_attr_FUA.attr,
>> @@ -535,6 +561,7 @@ static struct attribute *sd_disk_attrs[] = {
>>   	&dev_attr_zeroing_mode.attr,
>>   	&dev_attr_max_write_same_blocks.attr,
>>   	&dev_attr_max_medium_access_timeouts.attr,
>> +	&dev_attr_bounds_check.attr,
>>   	NULL,
>>   };
>>   ATTRIBUTE_GROUPS(sd_disk);
>> @@ -1088,7 +1115,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>>   	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
>>   	sector_t lba = sectors_to_logical(sdp, blk_rq_pos(rq));
>>   	unsigned int nr_blocks = sectors_to_logical(sdp, blk_rq_sectors(rq));
>> -	unsigned int mask = logical_to_sectors(sdp, 1) - 1;
>>   	unsigned char protect, fua;
>>   	bool write = rq_data_dir(rq) == WRITE;
>>   	bool dif, dix;
>> @@ -1106,17 +1132,24 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>>   		goto out;
>>   	}
>>   
>> -	if (unlikely(blk_rq_pos(rq) + blk_rq_sectors(rq)
>> -		     > logical_to_sectors(sdp, sdkp->capacity))) {
>> -		scmd_printk(KERN_ERR, cmd, "access beyond end of device\n");
>> -		ret = BLKPREP_KILL;
>> -		goto out;
>> -	}
>> +	if (sdkp->bounds_check) {
>> +		unsigned int mask = logical_to_sectors(sdp, 1) - 1;
>>   
>> -	if (unlikely((blk_rq_pos(rq) & mask) || (blk_rq_sectors(rq) & mask))) {
>> -		scmd_printk(KERN_ERR, cmd, "request not aligned to the logical block size\n");
>> -		ret = BLKPREP_KILL;
>> -		goto out;
>> +		if (unlikely(blk_rq_pos(rq) + blk_rq_sectors(rq)
>> +			     > logical_to_sectors(sdp, sdkp->capacity))) {
>> +			scmd_printk(KERN_ERR, cmd,
>> +				    "access beyond end of device\n");
>> +			ret = BLKPREP_KILL;
>> +			goto out;
>> +		}
>> +
>> +		if (unlikely((blk_rq_pos(rq) & mask) ||
>> +			     (blk_rq_sectors(rq) & mask))) {
>> +			scmd_printk(KERN_ERR, cmd,
>> +			    "request not aligned to logical block size\n");
>> +			ret = BLKPREP_KILL;
>> +			goto out;
>> +		}
>>   	}
>>   
>>   	/*
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 392c7d078ae3..6f58d130fb75 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -117,6 +117,7 @@ struct scsi_disk {
>>   	unsigned	urswrz : 1;
>>   	unsigned	security : 1;
>>   	unsigned	ignore_medium_access_errors : 1;
>> +	unsigned        bounds_check : 1;
>>   };
>>   #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
>>   
>>
> 
>
Damien Le Moal Aug. 21, 2018, 3:37 p.m. UTC | #3
Douglas,

On 2018/08/06 9:12, Douglas Gilbert wrote:
> ...
> So perhaps struct scsi_host_template and scsi_host need a bounds_check:1
> flag so certain "SCSI" hosts (e.g. ATA and USB mass storage but not UAS(P))
> could set it by default. So the SATA host (libata ?) could set the bit in
> its template.
> 
> Would the block layer also want to tweak bounds_check?

Hmm... I do not think so. The block device capacity is set by the device driver
when the device queue is created and that value is used to check and validate
the bios/requests being built/issues by the device user. I think that check
needs to stay as it is since it is common to many different block devices that
may not represent a single HW device (e.g. device mappers, md, etc).

>>> The block layer does almost all its block handling in units of 512 byte
>>> blocks whereas SCSI disks may have other logical block sizes (e.g. 4096
>>> byte logical blocks). So when the block sizes are different it is
>>> possible that there is an alignment issue. But if that occurs, it
>>> would be a logic issue in the block layer. Note that the current
>>> mainline code does this check even when it is not needed (e.g. when
>>> the logical block size of the disk is 512 bytes).
>>
>> Of note here is that there is a special case that is not being handled: ZBC
>> drives with 512B logical and 4K physical blocks will accept any 512B aligned
>> read commands (that is any read command), but will require 4K aligned write
>> commands for sequential write preferred zones (writes to conventional zones can
>> be 512B aligned). Since there is currently no differentiation between read and
>> write commands for the alignment check, much less based on the target zone type
>> of these commands, checks are incomplete for ZBC & ZAC disks in the generic
>> block layer. These checks could be added to SD though now that a bitmap is
>> attached to the device request queue to indicate if a zone is conventional or
>> sequential. For such test to be effective, sdkp->bounds_check would need to be
>> forced to 1 for host managed zoned disks.
> 
> Like the "out of range" case I guess that compliant ZBC & ZAC disks are required
> to fail this case and send back an indicative error message. But yes, extra code
> could be added to do that. If there are enough ZBC/ZAC special cases (like the
> protect code) we could split the submission side into three fast paths on
> entry to sd_init_command(): one for protect, one for ZBC/ZAC and one for the
> rest (the most common case, I assume). That leaves open the question can a ZBC
> disk also have protection information :-)

I do not know of any such drive on the market. The standards certainly do not
prevent it, so that is something that we should have as a possibility.

Best regards.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d1d08f039bdd..b17b8c66881d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -522,6 +522,32 @@  max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(max_write_same_blocks);
 
+static ssize_t
+bounds_check_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+	return sprintf(buf, "%u\n", sdkp->bounds_check);
+}
+
+static ssize_t
+bounds_check_store(struct device *dev, struct device_attribute *attr,
+		   const char *buf, size_t count)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+	int err, n;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	err = kstrtoint(buf, 10, &n);
+	if (!err)
+		sdkp->bounds_check = !!n;
+
+	return err ? err : count;
+}
+static DEVICE_ATTR_RW(bounds_check);
+
 static struct attribute *sd_disk_attrs[] = {
 	&dev_attr_cache_type.attr,
 	&dev_attr_FUA.attr,
@@ -535,6 +561,7 @@  static struct attribute *sd_disk_attrs[] = {
 	&dev_attr_zeroing_mode.attr,
 	&dev_attr_max_write_same_blocks.attr,
 	&dev_attr_max_medium_access_timeouts.attr,
+	&dev_attr_bounds_check.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(sd_disk);
@@ -1088,7 +1115,6 @@  static int sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t lba = sectors_to_logical(sdp, blk_rq_pos(rq));
 	unsigned int nr_blocks = sectors_to_logical(sdp, blk_rq_sectors(rq));
-	unsigned int mask = logical_to_sectors(sdp, 1) - 1;
 	unsigned char protect, fua;
 	bool write = rq_data_dir(rq) == WRITE;
 	bool dif, dix;
@@ -1106,17 +1132,24 @@  static int sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 		goto out;
 	}
 
-	if (unlikely(blk_rq_pos(rq) + blk_rq_sectors(rq)
-		     > logical_to_sectors(sdp, sdkp->capacity))) {
-		scmd_printk(KERN_ERR, cmd, "access beyond end of device\n");
-		ret = BLKPREP_KILL;
-		goto out;
-	}
+	if (sdkp->bounds_check) {
+		unsigned int mask = logical_to_sectors(sdp, 1) - 1;
 
-	if (unlikely((blk_rq_pos(rq) & mask) || (blk_rq_sectors(rq) & mask))) {
-		scmd_printk(KERN_ERR, cmd, "request not aligned to the logical block size\n");
-		ret = BLKPREP_KILL;
-		goto out;
+		if (unlikely(blk_rq_pos(rq) + blk_rq_sectors(rq)
+			     > logical_to_sectors(sdp, sdkp->capacity))) {
+			scmd_printk(KERN_ERR, cmd,
+				    "access beyond end of device\n");
+			ret = BLKPREP_KILL;
+			goto out;
+		}
+
+		if (unlikely((blk_rq_pos(rq) & mask) ||
+			     (blk_rq_sectors(rq) & mask))) {
+			scmd_printk(KERN_ERR, cmd,
+			    "request not aligned to logical block size\n");
+			ret = BLKPREP_KILL;
+			goto out;
+		}
 	}
 
 	/*
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 392c7d078ae3..6f58d130fb75 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -117,6 +117,7 @@  struct scsi_disk {
 	unsigned	urswrz : 1;
 	unsigned	security : 1;
 	unsigned	ignore_medium_access_errors : 1;
+	unsigned        bounds_check : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)