diff mbox series

[v3,02/15] block: Limit atomic writes according to bio and queue limits

Message ID: 20240124113841.31824-3-john.g.garry@oracle.com
State: New, archived
Series: block atomic writes

Commit Message

John Garry Jan. 24, 2024, 11:38 a.m. UTC
We rely on the block layer always being able to send a bio of size
atomic_write_unit_max without being required to split it due to request
queue or other bio limits.

A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors on the
relevant submission paths for atomic writes, and each vector apart from
the first and last contains at least PAGE_SIZE of data.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

Comments

Ritesh Harjani (IBM) Feb. 13, 2024, 4:33 a.m. UTC | #1
John Garry <john.g.garry@oracle.com> writes:

> We rely on the block layer always being able to send a bio of size
> atomic_write_unit_max without being required to split it due to request
> queue or other bio limits.
>
> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors on the
> relevant submission paths for atomic writes, and each vector apart from
> the first and last contains at least PAGE_SIZE of data.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/blk-settings.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 11c0361c2313..176f26374abc 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -108,18 +108,42 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
>  }
>  EXPORT_SYMBOL(blk_queue_bounce_limit);
>  
> +
> +/*
> + * Returns max guaranteed sectors which we can fit in a bio. For convenience of
> + * users, rounddown_pow_of_two() the return value.
> + *
> + * We always assume that we can fit in at least PAGE_SIZE in a segment, apart
> + * from first and last segments.
> + */

It took some time to really understand what is special about the first
and the last vector. It looks like what we are discussing here is an
I/O covering a partial page, i.e. the starting offset and the end
boundary might not cover a whole page.

It still isn't very clear why we need to consider
queue_logical_block_size(q) rather than PAGE_SIZE for those 2 vectors
(1. given that atomic writes have starting offset and length alignment
restrictions?
2. or maybe there are devices whose starting offset alignment can be as
low as the sector size?)

But either way, my point is that it would be good to have a comment above
this function to help understand what is going on here.


> +static unsigned int blk_queue_max_guaranteed_bio_sectors(
> +					struct queue_limits *limits,
> +					struct request_queue *q)
> +{
> +	unsigned int max_segments = min(BIO_MAX_VECS, limits->max_segments);
> +	unsigned int length;
> +
> +	length = min(max_segments, 2) * queue_logical_block_size(q);
> +	if (max_segments > 2)
> +		length += (max_segments - 2) * PAGE_SIZE;
> +
> +	return rounddown_pow_of_two(length >> SECTOR_SHIFT);
> +}
> +
>  static void blk_atomic_writes_update_limits(struct request_queue *q)
>  {
>  	struct queue_limits *limits = &q->limits;
>  	unsigned int max_hw_sectors =
>  		rounddown_pow_of_two(limits->max_hw_sectors);
> +	unsigned int unit_limit = min(max_hw_sectors,
> +		blk_queue_max_guaranteed_bio_sectors(limits, q));
>  
>  	limits->atomic_write_max_sectors =
>  		min(limits->atomic_write_hw_max_sectors, max_hw_sectors);
>  	limits->atomic_write_unit_min_sectors =
> -		min(limits->atomic_write_hw_unit_min_sectors, max_hw_sectors);
> +		min(limits->atomic_write_hw_unit_min_sectors, unit_limit);
>  	limits->atomic_write_unit_max_sectors =
> -		min(limits->atomic_write_hw_unit_max_sectors, max_hw_sectors);
> +		min(limits->atomic_write_hw_unit_max_sectors, unit_limit);
>  }
>  
>  /**
> -- 
> 2.31.1
John Garry Feb. 13, 2024, 8:05 a.m. UTC | #2
On 13/02/2024 04:33, Ritesh Harjani (IBM) wrote:
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 11c0361c2313..176f26374abc 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -108,18 +108,42 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
>>   }
>>   EXPORT_SYMBOL(blk_queue_bounce_limit);
>>   
>> +
>> +/*
>> + * Returns max guaranteed sectors which we can fit in a bio. For convenience of
>> + * users, rounddown_pow_of_two() the return value.
>> + *
>> + * We always assume that we can fit in at least PAGE_SIZE in a segment, apart
>> + * from first and last segments.
>> + */
> It took sometime to really understand what is special about the first
> and the last vector. Looks like what we are discussing here is the
> I/O covering a partial page, i.e. the starting offset and the end
> boundary might not cover the whole page.
> 
> It still isn't very clear that why do we need to consider
> queue_logical_block_size(q) and not the PAGE_SIZE for those 2 vectors
> (1. given atomic writes starting offset and length has alignment
> restrictions?
We are using the direct IO alignment restriction, which is that the
iovecs need to be bdev logical block size aligned - please see
bdev_iter_is_aligned().
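
That alignment rule amounts to a mask check on each iovec's base and
length against the (power-of-two) logical block size. A hypothetical
standalone sketch - the helper name and values are illustrative, the
real kernel check is bdev_iter_is_aligned():

```c
#include <assert.h>

/*
 * Illustrative model only, not the kernel code: an iovec is acceptable
 * for bdev direct IO when both its base address and its length are
 * multiples of the bdev logical block size (always a power of two).
 */
static int iovec_is_lbs_aligned(unsigned long base, unsigned long len,
				unsigned int lbs)
{
	/* lbs is a power of two, so alignment is a simple mask test */
	return ((base | len) & (unsigned long)(lbs - 1)) == 0;
}
```

With a 512-byte logical block size, a page-aligned 4 KiB iovec passes,
while one starting at an odd address or with a 100-byte length fails.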

We also only support a single iovec currently. As such, the middle
bvecs will always contain at least PAGE_SIZE of data, and the first/last
must have at least LBS (logical block size) of data.
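
The resulting guarantee can be modelled in userspace. This is a sketch
with illustrative constants and names - the real code operates on
struct queue_limits, and MODEL_MAX_VECS stands in for the kernel's
BIO_MAX_VECS (256 in recent kernels):

```c
#include <assert.h>

#define SECTOR_SHIFT	9
#define MODEL_MAX_VECS	256	/* stands in for BIO_MAX_VECS */

static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Mirror of the patch's arithmetic: the first and last segments only
 * count for one logical block each, every middle segment for a full
 * page, and the result is rounded down to a power-of-two sector count.
 */
static unsigned int guaranteed_bio_sectors(unsigned int max_segments_limit,
					   unsigned int lbs,
					   unsigned int page_size)
{
	unsigned int max_segments = max_segments_limit < MODEL_MAX_VECS ?
				    max_segments_limit : MODEL_MAX_VECS;
	unsigned int length = (max_segments < 2 ? max_segments : 2) * lbs;

	if (max_segments > 2)
		length += (max_segments - 2) * page_size;

	return rounddown_pow2(length >> SECTOR_SHIFT);
}
```

For example, with max_segments = 128, a 512-byte logical block and
4 KiB pages: 2 * 512 + 126 * 4096 bytes is 1010 sectors, which rounds
down to 512 sectors (256 KiB).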

Note that we will want to support atomic writes for buffered IO in
future, but it would be sensible to keep this direct IO alignment
restriction there as well.

Let me know if this needs to be made clearer in the code/commit message.

Thanks,
John

Patch

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 11c0361c2313..176f26374abc 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -108,18 +108,42 @@  void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
 }
 EXPORT_SYMBOL(blk_queue_bounce_limit);
 
+
+/*
+ * Returns max guaranteed sectors which we can fit in a bio. For convenience of
+ * users, rounddown_pow_of_two() the return value.
+ *
+ * We always assume that we can fit in at least PAGE_SIZE in a segment, apart
+ * from first and last segments.
+ */
+static unsigned int blk_queue_max_guaranteed_bio_sectors(
+					struct queue_limits *limits,
+					struct request_queue *q)
+{
+	unsigned int max_segments = min(BIO_MAX_VECS, limits->max_segments);
+	unsigned int length;
+
+	length = min(max_segments, 2) * queue_logical_block_size(q);
+	if (max_segments > 2)
+		length += (max_segments - 2) * PAGE_SIZE;
+
+	return rounddown_pow_of_two(length >> SECTOR_SHIFT);
+}
+
 static void blk_atomic_writes_update_limits(struct request_queue *q)
 {
 	struct queue_limits *limits = &q->limits;
 	unsigned int max_hw_sectors =
 		rounddown_pow_of_two(limits->max_hw_sectors);
+	unsigned int unit_limit = min(max_hw_sectors,
+		blk_queue_max_guaranteed_bio_sectors(limits, q));
 
 	limits->atomic_write_max_sectors =
 		min(limits->atomic_write_hw_max_sectors, max_hw_sectors);
 	limits->atomic_write_unit_min_sectors =
-		min(limits->atomic_write_hw_unit_min_sectors, max_hw_sectors);
+		min(limits->atomic_write_hw_unit_min_sectors, unit_limit);
 	limits->atomic_write_unit_max_sectors =
-		min(limits->atomic_write_hw_unit_max_sectors, max_hw_sectors);
+		min(limits->atomic_write_hw_unit_max_sectors, unit_limit);
 }
 
 /**