Message ID: 20240124113841.31824-3-john.g.garry@oracle.com
State: New, archived
Series: block atomic writes
John Garry <john.g.garry@oracle.com> writes:

> We rely on the block layer always being able to send a bio of size
> atomic_write_unit_max without being required to split it due to request
> queue or other bio limits.
>
> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors on the
> relevant submission paths for atomic writes and each vector contains at
> least a PAGE_SIZE, apart from the first and last vectors.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/blk-settings.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 11c0361c2313..176f26374abc 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -108,18 +108,42 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
>  }
>  EXPORT_SYMBOL(blk_queue_bounce_limit);
>
> +
> +/*
> + * Returns max guaranteed sectors which we can fit in a bio. For convenience of
> + * users, rounddown_pow_of_two() the return value.
> + *
> + * We always assume that we can fit in at least PAGE_SIZE in a segment, apart
> + * from first and last segments.
> + */

It took some time to really understand what is special about the first
and the last vector. It looks like what we are discussing here is the
I/O covering a partial page, i.e. the starting offset and the end
boundary might not cover the whole page.

It still isn't very clear why we need to consider
queue_logical_block_size(q) and not PAGE_SIZE for those 2 vectors
(1. given that atomic writes' starting offset and length have alignment
restrictions? 2. So maybe there are devices with a starting offset
alignment as low as the sector size?)

But either way, my point is that it would be good to have a comment
above this function to help understand what is going on here.

> +static unsigned int blk_queue_max_guaranteed_bio_sectors(
> +					struct queue_limits *limits,
> +					struct request_queue *q)
> +{
> +	unsigned int max_segments = min(BIO_MAX_VECS, limits->max_segments);
> +	unsigned int length;
> +
> +	length = min(max_segments, 2) * queue_logical_block_size(q);
> +	if (max_segments > 2)
> +		length += (max_segments - 2) * PAGE_SIZE;
> +
> +	return rounddown_pow_of_two(length >> SECTOR_SHIFT);
> +}
> +
>  static void blk_atomic_writes_update_limits(struct request_queue *q)
>  {
>  	struct queue_limits *limits = &q->limits;
>  	unsigned int max_hw_sectors =
>  		rounddown_pow_of_two(limits->max_hw_sectors);
> +	unsigned int unit_limit = min(max_hw_sectors,
> +			blk_queue_max_guaranteed_bio_sectors(limits, q));
>
>  	limits->atomic_write_max_sectors =
>  		min(limits->atomic_write_hw_max_sectors, max_hw_sectors);
>  	limits->atomic_write_unit_min_sectors =
> -		min(limits->atomic_write_hw_unit_min_sectors, max_hw_sectors);
> +		min(limits->atomic_write_hw_unit_min_sectors, unit_limit);
>  	limits->atomic_write_unit_max_sectors =
> -		min(limits->atomic_write_hw_unit_max_sectors, max_hw_sectors);
> +		min(limits->atomic_write_hw_unit_max_sectors, unit_limit);
>  }
>
>  /**
> --
> 2.31.1
On 13/02/2024 04:33, Ritesh Harjani (IBM) wrote:
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 11c0361c2313..176f26374abc 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -108,18 +108,42 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
>>  }
>>  EXPORT_SYMBOL(blk_queue_bounce_limit);
>>
>> +
>> +/*
>> + * Returns max guaranteed sectors which we can fit in a bio. For convenience of
>> + * users, rounddown_pow_of_two() the return value.
>> + *
>> + * We always assume that we can fit in at least PAGE_SIZE in a segment, apart
>> + * from first and last segments.
>> + */
> It took some time to really understand what is special about the first
> and the last vector. It looks like what we are discussing here is the
> I/O covering a partial page, i.e. the starting offset and the end
> boundary might not cover the whole page.
>
> It still isn't very clear why we need to consider
> queue_logical_block_size(q) and not PAGE_SIZE for those 2 vectors
> (1. given that atomic writes' starting offset and length have alignment
> restrictions? 2. So maybe there are devices with a starting offset
> alignment as low as the sector size?)

We are using the direct IO alignment restriction, which is that the
iovecs need to be bdev logical block size aligned - please see
bdev_iter_is_aligned(). We also only support a single iovec currently.

As such, the middle bvecs will always contain at least PAGE_SIZE of
data, and the first/last must have at least LBS of data.

Note that we will want to support atomic writes for buffered IO in
future, but it would be sensible to keep this direct IO alignment
restriction there as well.

Let me know if this needs to be made clearer in the code/commit message.

Thanks,
John
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 11c0361c2313..176f26374abc 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -108,18 +108,42 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
 }
 EXPORT_SYMBOL(blk_queue_bounce_limit);
 
+
+/*
+ * Returns max guaranteed sectors which we can fit in a bio. For convenience of
+ * users, rounddown_pow_of_two() the return value.
+ *
+ * We always assume that we can fit in at least PAGE_SIZE in a segment, apart
+ * from first and last segments.
+ */
+static unsigned int blk_queue_max_guaranteed_bio_sectors(
+					struct queue_limits *limits,
+					struct request_queue *q)
+{
+	unsigned int max_segments = min(BIO_MAX_VECS, limits->max_segments);
+	unsigned int length;
+
+	length = min(max_segments, 2) * queue_logical_block_size(q);
+	if (max_segments > 2)
+		length += (max_segments - 2) * PAGE_SIZE;
+
+	return rounddown_pow_of_two(length >> SECTOR_SHIFT);
+}
+
 static void blk_atomic_writes_update_limits(struct request_queue *q)
 {
 	struct queue_limits *limits = &q->limits;
 	unsigned int max_hw_sectors =
 		rounddown_pow_of_two(limits->max_hw_sectors);
+	unsigned int unit_limit = min(max_hw_sectors,
+			blk_queue_max_guaranteed_bio_sectors(limits, q));
 
 	limits->atomic_write_max_sectors =
 		min(limits->atomic_write_hw_max_sectors, max_hw_sectors);
 	limits->atomic_write_unit_min_sectors =
-		min(limits->atomic_write_hw_unit_min_sectors, max_hw_sectors);
+		min(limits->atomic_write_hw_unit_min_sectors, unit_limit);
 	limits->atomic_write_unit_max_sectors =
-		min(limits->atomic_write_hw_unit_max_sectors, max_hw_sectors);
+		min(limits->atomic_write_hw_unit_max_sectors, unit_limit);
 }
 
 /**
We rely on the block layer always being able to send a bio of size
atomic_write_unit_max without being required to split it due to request
queue or other bio limits.

A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors on the
relevant submission paths for atomic writes and each vector contains at
least a PAGE_SIZE, apart from the first and last vectors.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)