nvme/scsi: Consider LBA format in IO splitting calculation
diff mbox

Message ID 20170425000243.2019-1-jonathan.derrick@intel.com
State New
Headers show

Commit Message

Derrick, Jonathan April 25, 2017, 12:02 a.m. UTC
The current command submission code uses a sector-based value when
considering the maximum number of blocks per command. With a
4k-formatted namespace and a command exceeding max hardware limits, this
calculation doesn't split IOs which should be split and fails in the
nvme layer. This patch fixes that calculation and enables IO splitting
in these circumstances.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/nvme/host/scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jens Axboe April 25, 2017, 4 a.m. UTC | #1
On Mon, Apr 24 2017, Jon Derrick wrote:
> The current command submission code uses a sector-based value when
> considering the maximum number of blocks per command. With a
> 4k-formatted namespace and a command exceeding max hardware limits, this
> calculation doesn't split IOs which should be split and fails in the
> nvme layer. This patch fixes that calculation and enables IO splitting
> in these circumstances.
> 
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> ---
>  drivers/nvme/host/scsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
> index f49ae27..988da61 100644
> --- a/drivers/nvme/host/scsi.c
> +++ b/drivers/nvme/host/scsi.c
> @@ -1609,7 +1609,7 @@ static int nvme_trans_do_nvme_io(struct nvme_ns *ns, struct sg_io_hdr *hdr,
>  	struct nvme_command c;
>  	u8 opcode = (is_write ? nvme_cmd_write : nvme_cmd_read);
>  	u16 control;
> -	u32 max_blocks = queue_max_hw_sectors(ns->queue);
> +	u32 max_blocks = queue_max_hw_sectors(ns->queue) >> (ns->lba_shift - 9);
>  
>  	num_cmds = nvme_trans_io_get_num_cmds(hdr, cdb_info, max_blocks);

Patch looks correct to me, as we always consider the hw sectors settings
in units of 512b blocks.

Reviewed-by: Jens Axboe <axboe@fb.com>
Damien Le Moal April 25, 2017, 4:27 a.m. UTC | #2
Jon,

On 4/25/17 13:00, Jens Axboe wrote:
> On Mon, Apr 24 2017, Jon Derrick wrote:
>> The current command submission code uses a sector-based value when
>> considering the maximum number of blocks per command. With a
>> 4k-formatted namespace and a command exceeding max hardware limits, this
>> calculation doesn't split IOs which should be split and fails in the
>> nvme layer. This patch fixes that calculation and enables IO splitting
>> in these circumstances.
>>
>> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
>> ---
>>  drivers/nvme/host/scsi.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
>> index f49ae27..988da61 100644
>> --- a/drivers/nvme/host/scsi.c
>> +++ b/drivers/nvme/host/scsi.c
>> @@ -1609,7 +1609,7 @@ static int nvme_trans_do_nvme_io(struct nvme_ns *ns, struct sg_io_hdr *hdr,
>>  	struct nvme_command c;
>>  	u8 opcode = (is_write ? nvme_cmd_write : nvme_cmd_read);
>>  	u16 control;
>> -	u32 max_blocks = queue_max_hw_sectors(ns->queue);
>> +	u32 max_blocks = queue_max_hw_sectors(ns->queue) >> (ns->lba_shift - 9);
>>  
>>  	num_cmds = nvme_trans_io_get_num_cmds(hdr, cdb_info, max_blocks);
> 
> Patch looks correct to me, as we always consider the hw sectors settings
> in units of 512b blocks.
> 
> Reviewed-by: Jens Axboe <axboe@fb.com>

May be replace 9 with SECTOR_SHIFT ?

Jens,

I just realized that this macro is defined in linux/device-mapper.h,
which does not seem like to best place to have it. Why not blkdev.h ?
Any particular reason ? This leads to some strange include dependencies,
like many nfs/blocklayout/ files including device-mapper.h just to get
that definition.

Best regards.
Jens Axboe April 25, 2017, 4:32 a.m. UTC | #3
On 04/24/2017 09:27 PM, Damien Le Moal wrote:
> Jon,
> 
> On 4/25/17 13:00, Jens Axboe wrote:
>> On Mon, Apr 24 2017, Jon Derrick wrote:
>>> The current command submission code uses a sector-based value when
>>> considering the maximum number of blocks per command. With a
>>> 4k-formatted namespace and a command exceeding max hardware limits, this
>>> calculation doesn't split IOs which should be split and fails in the
>>> nvme layer. This patch fixes that calculation and enables IO splitting
>>> in these circumstances.
>>>
>>> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
>>> ---
>>>  drivers/nvme/host/scsi.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
>>> index f49ae27..988da61 100644
>>> --- a/drivers/nvme/host/scsi.c
>>> +++ b/drivers/nvme/host/scsi.c
>>> @@ -1609,7 +1609,7 @@ static int nvme_trans_do_nvme_io(struct nvme_ns *ns, struct sg_io_hdr *hdr,
>>>  	struct nvme_command c;
>>>  	u8 opcode = (is_write ? nvme_cmd_write : nvme_cmd_read);
>>>  	u16 control;
>>> -	u32 max_blocks = queue_max_hw_sectors(ns->queue);
>>> +	u32 max_blocks = queue_max_hw_sectors(ns->queue) >> (ns->lba_shift - 9);
>>>  
>>>  	num_cmds = nvme_trans_io_get_num_cmds(hdr, cdb_info, max_blocks);
>>
>> Patch looks correct to me, as we always consider the hw sectors settings
>> in units of 512b blocks.
>>
>> Reviewed-by: Jens Axboe <axboe@fb.com>
> 
> May be replace 9 with SECTOR_SHIFT ?
> 
> Jens,
> 
> I just realized that this macro is defined in linux/device-mapper.h,
> which does not seem like to best place to have it. Why not blkdev.h ?
> Any particular reason ? This leads to some strange include dependencies,
> like many nfs/blocklayout/ files including device-mapper.h just to get
> that definition.

I'm fine with moving it and using it everywhere. Right now we don't in
the block core at all, 9 is always hard coded. While that change would
be fine, it should be done independently of this patch, which is a real
bug fix.
Damien Le Moal April 25, 2017, 4:43 a.m. UTC | #4
Jens,

On 4/25/17 13:32, Jens Axboe wrote:
>> I just realized that this macro is defined in linux/device-mapper.h,
>> which does not seem like to best place to have it. Why not blkdev.h ?
>> Any particular reason ? This leads to some strange include dependencies,
>> like many nfs/blocklayout/ files including device-mapper.h just to get
>> that definition.
> 
> I'm fine with moving it and using it everywhere. Right now we don't in
> the block core at all, 9 is always hard coded. While that change would
> be fine, it should be done independently of this patch, which is a real
> bug fix.

Thank you for the clarification. I will try to send something later.

Best regards.
Christoph Hellwig April 25, 2017, 5:57 p.m. UTC | #5
Applied to nvme-4.12.

Patch
diff mbox

diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index f49ae27..988da61 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -1609,7 +1609,7 @@  static int nvme_trans_do_nvme_io(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	struct nvme_command c;
 	u8 opcode = (is_write ? nvme_cmd_write : nvme_cmd_read);
 	u16 control;
-	u32 max_blocks = queue_max_hw_sectors(ns->queue);
+	u32 max_blocks = queue_max_hw_sectors(ns->queue) >> (ns->lba_shift - 9);
 
 	num_cmds = nvme_trans_io_get_num_cmds(hdr, cdb_info, max_blocks);