[RFC] bdev: use bdev_io_min() for statx DIO min IO

Message ID 20240628212350.3577766-1-mcgrof@kernel.org (mailing list archive)
State New

Commit Message

Luis Chamberlain June 28, 2024, 9:23 p.m. UTC
We currently rely on the block device logical block size for the
offset alignment. While this *works* it doesn't work with performance
in mind. That's exactly what the minimum_io_size attribute is for.

This would for example enhance performance for DIO on 4k IU drives which
have for example an LBA format of 512 bytes for both HDDs and NVMe.
Another use case is to ensure that DIO will be used with 16k IOs on
existing market 16k IU drives with an LBA format of 4k or 512 bytes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig June 29, 2024, 6:25 a.m. UTC | #1
On Fri, Jun 28, 2024 at 02:23:50PM -0700, Luis Chamberlain wrote:
> We currently rely on the block device logical block size for the
> offset alignment. While this *works* it doesn't work with performance
> in mind. That's exactly what the minimum_io_size attribute is for.
> 
> This would for example enhance performance for DIO on 4k IU drives which
> have for example an LBA format of 512 bytes for both HDDs and NVMe.
> Another use case is to ensure that DIO will be used with 16k IOs on
> existing market 16k IU drives with an LBA format of 4k or 512 bytes.

The minimum_io_size clearly is the minimum I/O size, not the minimal
nice to have one.  Changing this will break existing setups.

Luis Chamberlain June 30, 2024, 3:24 a.m. UTC | #2
On Fri, Jun 28, 2024 at 11:25:34PM -0700, Christoph Hellwig wrote:
> On Fri, Jun 28, 2024 at 02:23:50PM -0700, Luis Chamberlain wrote:
> > We currently rely on the block device logical block size for the
> > offset alignment. While this *works* it doesn't work with performance
> > in mind. That's exactly what the minimum_io_size attribute is for.
> > 
> > This would for example enhance performance for DIO on 4k IU drives which
> > have for example an LBA format of 512 bytes for both HDDs and NVMe.
> > Another use case is to ensure that DIO will be used with 16k IOs on
> > existing market 16k IU drives with an LBA format of 4k or 512 bytes.
> 
> The minimum_io_size clearly is the minimum I/O size, not the minimal
> nice to have one. 

I may have misread the below documentation then, because it seems to
suggest this is a performance parameter, not a real minimum. Do we need
to update it?

What:           /sys/block/<disk>/queue/minimum_io_size
Date:           April 2009
Contact:        Martin K. Petersen <martin.petersen@oracle.com>
Description:
                [RO] Storage devices may report a granularity or preferred
                minimum I/O size which is the smallest request the device can
                perform without incurring a performance penalty.  For disk
                drives this is often the physical block size.  For RAID arrays
                it is often the stripe chunk size.  A properly aligned multiple
                of minimum_io_size is the preferred request size for workloads
                where a high number of I/O operations is desired.

If this is not the right place, do we need to use a new topology entry
for the IU? Today the NVMe drive uses it for the NPWG which these days
is the IU.

> Changing this will break existing setups.

My impression was that STATX_DIOALIGN was rather new, so given the
difficulties in getting DIO alignment right, this seemed like the right
place for any new enhancements.

Let me know if you have other suggestions.

  Luis

Christoph Hellwig June 30, 2024, 5:54 a.m. UTC | #3
On Sat, Jun 29, 2024 at 08:24:00PM -0700, Luis Chamberlain wrote:
> > The minimum_io_size clearly is the minimum I/O size, not the minimal
> > nice to have one. 
> 
> I may have misread the below documentation then, because it seems to
> suggest this is a performance parameter, not a real minimum. Do we need
> to update it?

queue_limits.min_io is correctly described and is a performance hint.
The statx dio_offset_align is the actual minimum I/O size and alignment and
not in any way related to the performance hint in minimum_io_size.

Luis Chamberlain June 30, 2024, 8:42 p.m. UTC | #4
On Sat, Jun 29, 2024 at 10:54:19PM -0700, Christoph Hellwig wrote:
> On Sat, Jun 29, 2024 at 08:24:00PM -0700, Luis Chamberlain wrote:
> > > The minimum_io_size clearly is the minimum I/O size, not the minimal
> > > nice to have one. 
> > 
> > I may have misread the below documentation then, because it seems to
> > suggest this is a performance parameter, not a real minimum. Do we need
> > to update it?
> 
> queue_limits.min_io is correctly described and is a performance hint.

OK, great!

> The statx dio_offset_align is the actual minimum I/O size and alignment and
> not in any way related to the performance hint in minimum_io_size.

Oh, darn, I just read again 825cf206ed510 ("statx: add direct I/O
alignment information") and the block layer change through commit
2d985f8c6b91b ("vfs: support STATX_DIOALIGN on block devices") and
nowhere do I see any mention of it being a min. Should we clarify
that?

And should we add a respective value for performance? I suspect
userspace will want to work with optimal values, not ones which could
for instance incur read-modify-write. Although we have BLKIOMIN to get
the optimal performance min IO and BLKIOOPT to get the optimal size it
is not terribly clear to me that users know they should prefer to align
to BLKIOMIN and use that for a DIO size for writes when possible.
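
As a rough illustration of the ioctls mentioned above, the sketch below
reads the logical block size together with the BLKIOMIN and BLKIOOPT hints
from a block device. The struct and function names are made up for this
example; only the ioctl request codes come from <linux/fs.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>    /* BLKSSZGET, BLKIOMIN, BLKIOOPT */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical container for the values discussed in this thread. */
struct blk_topology {
	int logical_block_size;   /* BLKSSZGET: hard minimum, in bytes */
	unsigned int io_min;      /* BLKIOMIN: performance hint, in bytes */
	unsigned int io_opt;      /* BLKIOOPT: performance hint, 0 if unset */
};

/* Returns 0 on success, -1 if the path cannot be opened or is not a
 * block device that answers these ioctls. */
int query_blk_topology(const char *dev_path, struct blk_topology *out)
{
	int fd, rc = 0;

	assert(out != NULL);

	fd = open(dev_path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (ioctl(fd, BLKSSZGET, &out->logical_block_size) < 0 ||
	    ioctl(fd, BLKIOMIN, &out->io_min) < 0 ||
	    ioctl(fd, BLKIOOPT, &out->io_opt) < 0)
		rc = -1;

	close(fd);
	return rc;
}
```

A caller would pass something like "/dev/nvme0n1" (an assumed device name)
and then prefer io_min-aligned writes while still treating
logical_block_size as the only hard requirement.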

  Luis

Dave Chinner June 30, 2024, 10:35 p.m. UTC | #5
On Sun, Jun 30, 2024 at 01:42:45PM -0700, Luis Chamberlain wrote:
> On Sat, Jun 29, 2024 at 10:54:19PM -0700, Christoph Hellwig wrote:
> > On Sat, Jun 29, 2024 at 08:24:00PM -0700, Luis Chamberlain wrote:
> > > > The minimum_io_size clearly is the minimum I/O size, not the minimal
> > > > nice to have one. 
> > > 
> > > I may have misread the below documentation then, because it seems to
> > > suggest this is a performance parameter, not a real minimum. Do we need
> > > to update it?
> > 
> > queue_limits.min_io is correctly described and is a performance hint.
> 
> OK, great!
> 
> > The statx dio_offset_align is the actual minimum I/O size and alignment and
> > not in any way related to the performance hint in minimum_io_size.
> 
> Oh, darn, I just read again 825cf206ed510 ("statx: add direct I/O
> alignment information") and the block layer change through commit
> 2d985f8c6b91b ("vfs: support STATX_DIOALIGN on block devices") and
> nowhere do I see any mention of it being a min. Should we clarify
> that?

dio_offset_align is an _alignment_ parameter, not a "size"
parameter. In no way does it define either the absolute or "best
performance" minimum IO size - it just defines the minimum valid
alignment for the file offset that the filesystem/device supports.

It is implied that an IO of dio_offset_align bytes in length will be
supported, because that is the minimum length IO that meets the
offset alignment requirements defined by dio_offset_align. 
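
To make the alignment semantics concrete, here is a small sketch of the
check an application might run before issuing a direct I/O request. The
helper name is hypothetical, and it assumes (per the statx(2) man page)
that the reported value applies to both the file offset and the I/O length,
and that 0 means direct I/O is not supported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a request at 'offset' of 'len' bytes satisfies the
 * alignment reported in dio_offset_align. This is a validity check only;
 * it says nothing about whether the request is efficient. */
bool dio_request_ok(uint64_t offset, uint64_t len, uint32_t offset_align)
{
	/* Precondition for this sketch: the byte range must not wrap. */
	assert(offset <= UINT64_MAX - len);

	if (offset_align == 0)   /* assumed: 0 means DIO is unsupported */
		return false;
	if (len == 0)
		return false;

	return offset % offset_align == 0 && len % offset_align == 0;
}
```

With offset_align of 512, a 512-byte request at offset 512 passes. The same
request fails the check if the reported value were 4096, which is the
behavior change the proposed patch would expose to such callers.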

> And should we add a respective value for performance?

We could, but we already have:

	__u32 stx_blksize;     /* Block size for filesystem I/O */

which is defined as:

	stx_blksize
	      The "preferred" block size for efficient filesystem
	      I/O.  (Writing to a file in smaller chunks may cause
	      an inefficient read-modify-rewrite.)

> I suspect
> userspace will want to work with optimal values, not ones which
> could for instance incur read-modify-write.

Yup, that's the very definition of what stx_blksize should contain.
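
A minimal sketch of how userspace might read both kinds of value in one
statx() call: stx_blksize as the preferred I/O size, and the STATX_DIOALIGN
fields as the hard alignment constraints. The dio_hints struct and
query_dio_hints() are hypothetical names, and the STATX_DIOALIGN parts are
compiled only when the installed headers define it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>       /* AT_FDCWD */
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>    /* statx(), struct statx */

/* Hypothetical summary of the fields discussed in this thread. */
struct dio_hints {
	unsigned int blksize;        /* stx_blksize: preferred I/O size */
	int have_dioalign;           /* 1 if the kernel filled the DIO fields */
	unsigned int mem_align;      /* stx_dio_mem_align */
	unsigned int offset_align;   /* stx_dio_offset_align */
};

/* Returns 0 on success, -1 if statx() fails. */
int query_dio_hints(const char *path, struct dio_hints *out)
{
	struct statx stx;
	unsigned int mask = STATX_BASIC_STATS;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));

#ifdef STATX_DIOALIGN
	mask |= STATX_DIOALIGN;
#endif
	if (statx(AT_FDCWD, path, 0, mask, &stx) != 0)
		return -1;

	out->blksize = stx.stx_blksize;

#ifdef STATX_DIOALIGN
	/* The kernel sets this bit only if it reported the DIO fields. */
	if (stx.stx_mask & STATX_DIOALIGN) {
		out->have_dioalign = 1;
		out->mem_align = stx.stx_dio_mem_align;
		out->offset_align = stx.stx_dio_offset_align;
	}
#endif
	return 0;
}
```

An application could then size its writes as multiples of blksize while
only rejecting requests that violate offset_align.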

-Dave.

Patch

diff --git a/block/bdev.c b/block/bdev.c
index 1b4af2cc3b1e..5d0874aa8661 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1282,7 +1282,7 @@  void bdev_statx(struct inode *backing_inode, struct kstat *stat,
 
 	if (request_mask & STATX_DIOALIGN) {
 		stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
-		stat->dio_offset_align = bdev_logical_block_size(bdev);
+		stat->dio_offset_align = (unsigned int) bdev_io_min(bdev);
 		stat->result_mask |= STATX_DIOALIGN;
 	}