Message ID | 202d350b-0e59-1a10-4b65-3563f0afa084@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On 08/30/2016 10:54 AM, Andy Grover wrote: > Hi Martin, (you're the last person to touch this? :) > > If I create a local SCSI device using LIO's loopback fabric, and backed > by a file, /sys/block/sda/queue/max_hw_sectors_kb reports 32767. This > reflects the loopback scsi_host_template reporting 65536 max_sectors, > but that's the "controller" -- the actual device reports via the Block > Limits VPD of supporting a max size of 16384 512-byte blocks. > > Therefore, using max_hw_sectors_kb will give a wrong impression for what > the max data size that can be issued to the device is -- 32767KiB > instead of 8192KiB. Should max_hw_sectors_kb be reported as the lesser > of max_hw_sectors and max_dev_sectors? Something like: > > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1196,7 +1196,7 @@ static inline unsigned int > queue_max_sectors(struct request_queue *q) > > static inline unsigned int queue_max_hw_sectors(struct request_queue *q) > { > - return q->limits.max_hw_sectors; > + return min_not_zero(q->limits.max_hw_sectors, > q->limits.max_dev_sectors); > } > > ...because limits.max_hw_sectors internally seems to be just the > controller limit, but what users of queue_max_hw_sectors (including > sysfs) really want is the lesser of limits.max_hw_sectors and > limits.max_dev_sectors. It should be the maximum supported to the device, which is the minimum of any limitations in that path. So I'd say your change is correct.
>>>>> "Andy" == Andy Grover <agrover@redhat.com> writes:
Hey Andy,
Andy> ...because limits.max_hw_sectors internally seems to be just the
Andy> controller limit, but what users of queue_max_hw_sectors
Andy> (including sysfs) really want is the lesser of
Andy> limits.max_hw_sectors and limits.max_dev_sectors.
max_dev_sectors is a limit reported by the target device for
READ/WRITE/VERIFY requests. Whereas max_hw_sectors describes the DMA
constraints of the controller.
It is a requirement for SG_IO, firmware updates, tape drives, etc. that
the controller limits are reported correctly since they do I/Os that are
much bigger than typical filesystem requests. When we tried to conflate
the two things broke in interesting ways.
I can't remember why the max_dev_sectors sysfs export patch didn't go
in. But instead (or in addition) we could entertain having a
max_rw_sectors_kb file that reflects the biggest READ/WRITE/VERIFY
request the device can consume if that's the information you're after?
On 08/30/2016 08:15 PM, Martin K. Petersen wrote: >>>>>> "Andy" == Andy Grover <agrover@redhat.com> writes: > > Hey Andy, > > Andy> ...because limits.max_hw_sectors internally seems to be just the > Andy> controller limit, but what users of queue_max_hw_sectors > Andy> (including sysfs) really want is the lesser of > Andy> limits.max_hw_sectors and limits.max_dev_sectors. > > max_dev_sectors is a limit reported by the target device for > READ/WRITE/VERIFY requests. Whereas max_hw_sectors describes the DMA > constraints of the controller. > > It is a requirement for SG_IO, firmware updates, tape drives, etc. that > the controller limits are reported correctly since they do I/Os that are > much bigger than typical filesystem requests. When we tried to conflate > the two things broke in interesting ways. That's crazy. If max_hw_sectors_kb is expected to be the DMA constraint, then it makes no sense to name it 'sectors'. We currently have the one exported constraints, and making it anything but the 'this is the maximum IO size we can support' would be nonsensical. > I can't remember why the max_dev_sectors sysfs export patch didn't go > in. But instead (or in addition) we could entertain having a > max_rw_sectors_kb file that reflects the biggest READ/WRITE/VERIFY > request the device can consume if that's the information you're after? Which, coincidentally, is what that file already is for most other devices. It'd make more sense to bring SCSI into this realm, instead of imposing this weird world view on others. We export max_segments and max_segments_size, which are basically a window into what the DMA engine can do.
>>>>> "Jens" == Jens Axboe <axboe@kernel.dk> writes: Jens, >> It is a requirement for SG_IO, firmware updates, tape drives, >> etc. that the controller limits are reported correctly since they do >> I/Os that are much bigger than typical filesystem requests. When we >> tried to conflate the two things broke in interesting ways. Jens> That's crazy. If max_hw_sectors_kb is expected to be the DMA Jens> constraint, then it makes no sense to name it 'sectors'. I don't believe I named it :) It's the maximum DMA transfer size supported by the controller. Some controllers have limitations above and beyond their scatter-gather capabilities. You could argue that they should use max_dev_sectors since the limit is probably more of a firmware virtual disk thing than a DMA engine constraint. But max_dev_sectors and the matching SCSI transfer size cap are both very new to the game. Jens> We currently have the one exported constraints, and making it Jens> anything but the 'this is the maximum IO size we can support' Jens> would be nonsensical. That implies that the value is the same for all types of I/O which is not the case. Hasn't been for a long time. See discard, write same, etc. blk_rq_get_max_sectors() is passed a struct request because the answer depends on what type of request and which operation it is. With max_hw_sectors_kb we're not so lucky. We could assume read/write but it would break existing apps that check it for SG_IO or to set tape block size. And they do exist. Jens> Which, coincidentally, is what that file already is for most other Jens> devices. Because none of the other protocols happen to have device limits. Jens> It'd make more sense to bring SCSI into this realm, instead of Jens> imposing this weird world view on others. Hey, I didn't invent the SCSI device constraint. I think it sucks. But it is a hard limit and we absolutely have to enforce it. And we can't lower the hw limit willy-nilly because it breaks non-REQ_TYPE_FS requests. As a result, max_dev_sectors is only used by sd and it's only used to gate max_sectors_kb for filesystem requests. I would have happily kept it internal to SCSI but users need to be able to set max_sectors_kb through sysfs. And consequently I need to be able to clamp on it there. I'm happy to clean things up if you have a better idea how to enforce the device limit without breaking existing apps. It's possible that with the arbitrary bio size changes some of the original pain points have gone away. Will take a look. But implementation aesthetics aside, Andy's problem still persists. Because at least in the context of SCSI the answer to his question is "it depends" :(
--- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1196,7 +1196,7 @@ static inline unsigned int queue_max_sectors(struct request_queue *q) static inline unsigned int queue_max_hw_sectors(struct request_queue *q) { - return q->limits.max_hw_sectors; + return min_not_zero(q->limits.max_hw_sectors, q->limits.max_dev_sectors); }