Message ID | 20230929102726.2985188-3-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | block atomic writes | expand |
On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote: > We rely the block layer always being able to send a bio of size > atomic_write_unit_max without being required to split it due to request > queue or other bio limits. > > A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors, > and each vector is at worst case the device logical block size from > direct IO alignment requirement. A bio can have more than BIO_MAX_VECS if you use bio_init. > +static unsigned int blk_queue_max_guaranteed_bio_size_sectors( > + struct request_queue *q) > +{ > + struct queue_limits *limits = &q->limits; > + unsigned int max_segments = min_t(unsigned int, BIO_MAX_VECS, > + limits->max_segments); > + /* Limit according to dev sector size as we only support direct-io */ Who is "we", and how tells the caller to only ever use direct I/O? And how would a type of userspace I/O even matter for low-level block code. What if I wanted to use this for file system metadata?
On 09/11/2023 15:13, Christoph Hellwig wrote: > On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote: >> We rely the block layer always being able to send a bio of size >> atomic_write_unit_max without being required to split it due to request >> queue or other bio limits. >> >> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors, >> and each vector is at worst case the device logical block size from >> direct IO alignment requirement. > A bio can have more than BIO_MAX_VECS if you use bio_init. Right, FWIW we are only concerned with codepaths which use BIO_MAX_VECS, but I suppose that is not good enough as a guarantee. > >> +static unsigned int blk_queue_max_guaranteed_bio_size_sectors( >> + struct request_queue *q) >> +{ >> + struct queue_limits *limits = &q->limits; >> + unsigned int max_segments = min_t(unsigned int, BIO_MAX_VECS, >> + limits->max_segments); >> + /* Limit according to dev sector size as we only support direct-io */ > Who is "we", and how tells the caller to only ever use direct I/O? I think that this can be dropped as a comment. My earlier series used PAGE_SIZE and not sector size here, which I think was proper. > And how would a type of userspace I/O even matter for low-level > block code. It shouldn't do, but we still need to limit according to request queue limits. > What if I wanted to use this for file system metadata? > As mentioned, I think that the direct-IO comment can be dropped. Thanks, John
On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote: > We rely the block layer always being able to send a bio of size > atomic_write_unit_max without being required to split it due to request > queue or other bio limits. > > A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors, > and each vector is at worst case the device logical block size from > direct IO alignment requirement. Both unit_max and unit_min are applied to FS bio, which is built over single userspace buffer, so only the 1st and last vector can include partial page, and the other vectors should always cover whole page, then the minimal size could be: (max_segments - 2) * PAGE_SIZE + 2 * queue_logical_block_size(q) Thanks, Ming
On Mon, Dec 04, 2023 at 11:19:20AM +0800, Ming Lei wrote: > On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote: > > We rely the block layer always being able to send a bio of size > > atomic_write_unit_max without being required to split it due to request > > queue or other bio limits. > > > > A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors, > > and each vector is at worst case the device logical block size from > > direct IO alignment requirement. > > Both unit_max and unit_min are applied to FS bio, which is built over > single userspace buffer, so only the 1st and last vector can include Actually it isn't true for pwritev, and sorry for the noise. Thanks, Ming
On 04/12/2023 03:55, Ming Lei wrote: Hi Ming, > On Mon, Dec 04, 2023 at 11:19:20AM +0800, Ming Lei wrote: >> On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote: >>> We rely the block layer always being able to send a bio of size >>> atomic_write_unit_max without being required to split it due to request >>> queue or other bio limits. >>> >>> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors, >>> and each vector is at worst case the device logical block size from >>> direct IO alignment requirement. >> Both unit_max and unit_min are applied to FS bio, which is built over >> single userspace buffer, so only the 1st and last vector can include > Actually it isn't true for pwritev, and sorry for the noise. Yeah, I think that it should be: (max_segments - 2) * PAGE_SIZE And we need to enforce that any middle vectors are PAGE-aligned. Thanks, John
diff --git a/block/blk-settings.c b/block/blk-settings.c index d151be394c98..57d487a00c64 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -213,6 +213,18 @@ void blk_queue_atomic_write_boundary_bytes(struct request_queue *q, } EXPORT_SYMBOL(blk_queue_atomic_write_boundary_bytes); +static unsigned int blk_queue_max_guaranteed_bio_size_sectors( + struct request_queue *q) +{ + struct queue_limits *limits = &q->limits; + unsigned int max_segments = min_t(unsigned int, BIO_MAX_VECS, + limits->max_segments); + /* Limit according to dev sector size as we only support direct-io */ + unsigned int limit = max_segments * queue_logical_block_size(q); + + return rounddown_pow_of_two(limit >> SECTOR_SHIFT); +} + /** * blk_queue_atomic_write_unit_min_sectors - smallest unit that can be written * atomically to the device. @@ -223,8 +235,10 @@ void blk_queue_atomic_write_unit_min_sectors(struct request_queue *q, unsigned int sectors) { struct queue_limits *limits = &q->limits; + unsigned int guaranteed_sectors = + blk_queue_max_guaranteed_bio_size_sectors(q); - limits->atomic_write_unit_min_sectors = sectors; + limits->atomic_write_unit_min_sectors = min(guaranteed_sectors, sectors); } EXPORT_SYMBOL(blk_queue_atomic_write_unit_min_sectors); @@ -238,8 +252,10 @@ void blk_queue_atomic_write_unit_max_sectors(struct request_queue *q, unsigned int sectors) { struct queue_limits *limits = &q->limits; + unsigned int guaranteed_sectors = + blk_queue_max_guaranteed_bio_size_sectors(q); - limits->atomic_write_unit_max_sectors = sectors; + limits->atomic_write_unit_max_sectors = min(guaranteed_sectors, sectors); } EXPORT_SYMBOL(blk_queue_atomic_write_unit_max_sectors);
We rely the block layer always being able to send a bio of size atomic_write_unit_max without being required to split it due to request queue or other bio limits. A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors, and each vector is at worst case the device logical block size from direct IO alignment requirement. Signed-off-by: John Garry <john.g.garry@oracle.com> --- block/blk-settings.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)