diff mbox

Should queue_max_hw_sectors() always be <= max_dev_sectors?

Message ID 202d350b-0e59-1a10-4b65-3563f0afa084@redhat.com
State New, archived
Headers show

Commit Message

Andy Grover Aug. 30, 2016, 4:54 p.m. UTC
Hi Martin, (you're the last person to touch this? :)

If I create a local SCSI device using LIO's loopback fabric, and backed 
by a file, /sys/block/sda/queue/max_hw_sectors_kb reports 32767. This 
reflects the loopback scsi_host_template reporting 65536 max_sectors, 
but that's the "controller" -- the actual device reports via the Block 
Limits VPD of supporting a max size of 16384 512-byte blocks.

Therefore, using max_hw_sectors_kb will give a wrong impression for what 
the max data size that can be issued to the device is -- 32767KiB 
instead of 8192KiB. Should max_hw_sectors_kb be reported as the lesser 
of max_hw_sectors and max_dev_sectors? Something like:


...because limits.max_hw_sectors internally seems to be just the 
controller limit, but what users of queue_max_hw_sectors (including 
sysfs) really want is the lesser of limits.max_hw_sectors and 
limits.max_dev_sectors.

BTW the docs say:

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

Thanks -- Regards -- Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jens Axboe Aug. 31, 2016, 2:11 a.m. UTC | #1
On 08/30/2016 10:54 AM, Andy Grover wrote:
> Hi Martin, (you're the last person to touch this? :)
>
> If I create a local SCSI device using LIO's loopback fabric, and backed
> by a file, /sys/block/sda/queue/max_hw_sectors_kb reports 32767. This
> reflects the loopback scsi_host_template reporting 65536 max_sectors,
> but that's the "controller" -- the actual device reports via the Block
> Limits VPD of supporting a max size of 16384 512-byte blocks.
>
> Therefore, using max_hw_sectors_kb will give a wrong impression for what
> the max data size that can be issued to the device is -- 32767KiB
> instead of 8192KiB. Should max_hw_sectors_kb be reported as the lesser
> of max_hw_sectors and max_dev_sectors? Something like:
>
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1196,7 +1196,7 @@ static inline unsigned int
> queue_max_sectors(struct request_queue *q)
>
>  static inline unsigned int queue_max_hw_sectors(struct request_queue *q)
>  {
> -       return q->limits.max_hw_sectors;
> +       return min_not_zero(q->limits.max_hw_sectors,
> q->limits.max_dev_sectors);
>  }
>
> ...because limits.max_hw_sectors internally seems to be just the
> controller limit, but what users of queue_max_hw_sectors (including
> sysfs) really want is the lesser of limits.max_hw_sectors and
> limits.max_dev_sectors.

It should be the maximum supported to the device, which is the minimum
of any limitations in that path. So I'd say your change is correct.
Martin K. Petersen Aug. 31, 2016, 2:15 a.m. UTC | #2
>>>>> "Andy" == Andy Grover <agrover@redhat.com> writes:

Hey Andy,

Andy> ...because limits.max_hw_sectors internally seems to be just the
Andy> controller limit, but what users of queue_max_hw_sectors
Andy> (including sysfs) really want is the lesser of
Andy> limits.max_hw_sectors and limits.max_dev_sectors.

max_dev_sectors is a limit reported by the target device for
READ/WRITE/VERIFY requests. Whereas max_hw_sectors describes the DMA
constraints of the controller.

It is a requirement for SG_IO, firmware updates, tape drives, etc. that
the controller limits are reported correctly since they do I/Os that are
much bigger than typical filesystem requests. When we tried to conflate
the two things broke in interesting ways.

I can't remember why the max_dev_sectors sysfs export patch didn't go
in. But instead (or in addition) we could entertain having a
max_rw_sectors_kb file that reflects the biggest READ/WRITE/VERIFY
request the device can consume if that's the information you're after?
Jens Axboe Aug. 31, 2016, 2:36 a.m. UTC | #3
On 08/30/2016 08:15 PM, Martin K. Petersen wrote:
>>>>>> "Andy" == Andy Grover <agrover@redhat.com> writes:
>
> Hey Andy,
>
> Andy> ...because limits.max_hw_sectors internally seems to be just the
> Andy> controller limit, but what users of queue_max_hw_sectors
> Andy> (including sysfs) really want is the lesser of
> Andy> limits.max_hw_sectors and limits.max_dev_sectors.
>
> max_dev_sectors is a limit reported by the target device for
> READ/WRITE/VERIFY requests. Whereas max_hw_sectors describes the DMA
> constraints of the controller.
>
> It is a requirement for SG_IO, firmware updates, tape drives, etc. that
> the controller limits are reported correctly since they do I/Os that are
> much bigger than typical filesystem requests. When we tried to conflate
> the two things broke in interesting ways.

That's crazy. If max_hw_sectors_kb is expected to be the DMA constraint,
then it makes no sense to name it 'sectors'. We currently have the one
exported constraints, and making it anything but the 'this is the
maximum IO size we can support' would be nonsensical.

> I can't remember why the max_dev_sectors sysfs export patch didn't go
> in. But instead (or in addition) we could entertain having a
> max_rw_sectors_kb file that reflects the biggest READ/WRITE/VERIFY
> request the device can consume if that's the information you're after?

Which, coincidentally, is what that file already is for most other
devices. It'd make more sense to bring SCSI into this realm, instead of
imposing this weird world view on others. We export max_segments and
max_segments_size, which are basically a window into what the DMA engine
can do.
Martin K. Petersen Aug. 31, 2016, 4:11 a.m. UTC | #4
>>>>> "Jens" == Jens Axboe <axboe@kernel.dk> writes:

Jens,

>> It is a requirement for SG_IO, firmware updates, tape drives,
>> etc. that the controller limits are reported correctly since they do
>> I/Os that are much bigger than typical filesystem requests. When we
>> tried to conflate the two things broke in interesting ways.

Jens> That's crazy. If max_hw_sectors_kb is expected to be the DMA
Jens> constraint, then it makes no sense to name it 'sectors'.

I don't believe I named it :)

It's the maximum DMA transfer size supported by the controller. Some
controllers have limitations above and beyond their scatter-gather
capabilities.

You could argue that they should use max_dev_sectors since the limit is
probably more of a firmware virtual disk thing than a DMA engine
constraint. But max_dev_sectors and the matching SCSI transfer size cap
are both very new to the game.

Jens> We currently have the one exported constraints, and making it
Jens> anything but the 'this is the maximum IO size we can support'
Jens> would be nonsensical.

That implies that the value is the same for all types of I/O which is
not the case. Hasn't been for a long time. See discard, write same,
etc. blk_rq_get_max_sectors() is passed a struct request because the
answer depends on what type of request and which operation it is.

With max_hw_sectors_kb we're not so lucky. We could assume read/write
but it would break existing apps that check it for SG_IO or to set tape
block size. And they do exist.

Jens> Which, coincidentally, is what that file already is for most other
Jens> devices.

Because none of the other protocols happen to have device limits.

Jens> It'd make more sense to bring SCSI into this realm, instead of
Jens> imposing this weird world view on others.

Hey, I didn't invent the SCSI device constraint. I think it sucks. But
it is a hard limit and we absolutely have to enforce it. And we can't
lower the hw limit willy-nilly because it breaks non-REQ_TYPE_FS
requests.

As a result, max_dev_sectors is only used by sd and it's only used to
gate max_sectors_kb for filesystem requests. I would have happily kept
it internal to SCSI but users need to be able to set max_sectors_kb
through sysfs. And consequently I need to be able to clamp on it there.

I'm happy to clean things up if you have a better idea how to enforce
the device limit without breaking existing apps. It's possible that with
the arbitrary bio size changes some of the original pain points have
gone away. Will take a look.

But implementation aesthetics aside, Andy's problem still
persists. Because at least in the context of SCSI the answer to his
question is "it depends" :(
diff mbox

Patch

--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1196,7 +1196,7 @@  static inline unsigned int 
queue_max_sectors(struct request_queue *q)

  static inline unsigned int queue_max_hw_sectors(struct request_queue *q)
  {
-       return q->limits.max_hw_sectors;
+       return min_not_zero(q->limits.max_hw_sectors, 
q->limits.max_dev_sectors);
  }