[02/21] block: Limit atomic writes according to bio and queue limits

Message ID 20230929102726.2985188-3-john.g.garry@oracle.com (mailing list archive)
State New, archived
Series: block atomic writes

Commit Message

John Garry Sept. 29, 2023, 10:27 a.m. UTC
We rely on the block layer always being able to send a bio of size
atomic_write_unit_max without being required to split it due to request
queue or other bio limits.

A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors, and
in the worst case each vector covers only the device logical block size,
due to the direct I/O alignment requirement.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 block/blk-settings.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
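
As an illustration of the worst-case sizing described in the commit
message, a minimal user-space C sketch; BIO_MAX_VECS == 256 and the
example limits (128 segments, 512-byte logical blocks) are assumptions,
and none of this is part of the patch:

#include <stdio.h>

#define BIO_MAX_VECS	256U

int main(void)
{
	unsigned int max_segments = 128;	/* example limits->max_segments */
	unsigned int lbs = 512;			/* example logical block size */
	unsigned int vecs = max_segments < BIO_MAX_VECS ?
				max_segments : BIO_MAX_VECS;

	/* worst case: each vector covers a single logical block */
	/* 128 vectors of one 512-byte block each -> 65536 bytes (64 KiB) */
	printf("guaranteed bio size: %u bytes\n", vecs * lbs);

	/* the patch then converts this to sectors and rounds it down to a
	 * power of two */
	return 0;
}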

Comments

Christoph Hellwig Nov. 9, 2023, 3:13 p.m. UTC | #1
On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote:
> We rely the block layer always being able to send a bio of size
> atomic_write_unit_max without being required to split it due to request
> queue or other bio limits.
> 
> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors,
> and each vector is at worst case the device logical block size from
> direct IO alignment requirement.

A bio can have more than BIO_MAX_VECS if you use bio_init.

> +static unsigned int blk_queue_max_guaranteed_bio_size_sectors(
> +					struct request_queue *q)
> +{
> +	struct queue_limits *limits = &q->limits;
> +	unsigned int max_segments = min_t(unsigned int, BIO_MAX_VECS,
> +					limits->max_segments);
> +	/*  Limit according to dev sector size as we only support direct-io */

Who is "we", and how tells the caller to only ever use direct I/O?
And how would a type of userspace I/O even matter for low-level
block code.  What if I wanted to use this for file system metadata?
John Garry Nov. 9, 2023, 5:41 p.m. UTC | #2
On 09/11/2023 15:13, Christoph Hellwig wrote:
> On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote:
>> We rely the block layer always being able to send a bio of size
>> atomic_write_unit_max without being required to split it due to request
>> queue or other bio limits.
>>
>> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors,
>> and each vector is at worst case the device logical block size from
>> direct IO alignment requirement.
> A bio can have more than BIO_MAX_VECS if you use bio_init.

Right, FWIW we are only concerned with codepaths which use BIO_MAX_VECS, 
but I suppose that is not good enough as a guarantee.

> 
>> +static unsigned int blk_queue_max_guaranteed_bio_size_sectors(
>> +					struct request_queue *q)
>> +{
>> +	struct queue_limits *limits = &q->limits;
>> +	unsigned int max_segments = min_t(unsigned int, BIO_MAX_VECS,
>> +					limits->max_segments);
>> +	/*  Limit according to dev sector size as we only support direct-io */
> Who is "we", and how tells the caller to only ever use direct I/O?

I think that this can be dropped as a comment. My earlier series used 
PAGE_SIZE and not sector size here, which I think was proper.

> And how would a type of userspace I/O even matter for low-level
> block code.

It shouldn't do, but we still need to limit according to request queue 
limits.

>  What if I wanted to use this for file system metadata?
> 

As mentioned, I think that the direct-IO comment can be dropped.

Thanks,
John
Ming Lei Dec. 4, 2023, 3:19 a.m. UTC | #3
On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote:
> We rely the block layer always being able to send a bio of size
> atomic_write_unit_max without being required to split it due to request
> queue or other bio limits.
> 
> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors,
> and each vector is at worst case the device logical block size from
> direct IO alignment requirement.

Both unit_max and unit_min are applied to an FS bio, which is built over
a single userspace buffer, so only the 1st and last vectors can include a
partial page and the other vectors should always cover whole pages; the
minimal guaranteed size could then be:

	(max_segments - 2) * PAGE_SIZE + 2 * queue_logical_block_size(q)


Thanks,
Ming
Ming Lei Dec. 4, 2023, 3:55 a.m. UTC | #4
On Mon, Dec 04, 2023 at 11:19:20AM +0800, Ming Lei wrote:
> On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote:
> > We rely the block layer always being able to send a bio of size
> > atomic_write_unit_max without being required to split it due to request
> > queue or other bio limits.
> > 
> > A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors,
> > and each vector is at worst case the device logical block size from
> > direct IO alignment requirement.
> 
> Both unit_max and unit_min are applied to FS bio, which is built over
> single userspace buffer, so only the 1st and last vector can include

Actually it isn't true for pwritev, and sorry for the noise.

Thanks,
Ming
John Garry Dec. 4, 2023, 9:35 a.m. UTC | #5
On 04/12/2023 03:55, Ming Lei wrote:

Hi Ming,

> On Mon, Dec 04, 2023 at 11:19:20AM +0800, Ming Lei wrote:
>> On Fri, Sep 29, 2023 at 10:27:07AM +0000, John Garry wrote:
>>> We rely the block layer always being able to send a bio of size
>>> atomic_write_unit_max without being required to split it due to request
>>> queue or other bio limits.
>>>
>>> A bio may contain min(BIO_MAX_VECS, limits->max_segments) vectors,
>>> and each vector is at worst case the device logical block size from
>>> direct IO alignment requirement.
>> Both unit_max and unit_min are applied to FS bio, which is built over
>> single userspace buffer, so only the 1st and last vector can include
> Actually it isn't true for pwritev, and sorry for the noise.

Yeah, I think that it should be:

(max_segments - 2) * PAGE_SIZE

And we need to enforce that any middle vectors are PAGE-aligned.

Thanks,
John
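
A user-space sketch comparing the worst-case estimates discussed in this
thread; PAGE_SIZE == 4096 and the example limits (128 segments, 512-byte
logical blocks) are assumptions, not values taken from the thread:

#include <stdio.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE	4096U
#endif

int main(void)
{
	unsigned int max_segments = 128;
	unsigned int lbs = 512;		/* logical block size */

	/* patch as posted: every vector may hold a single logical block */
	unsigned int per_block = max_segments * lbs;

	/* single-buffer estimate: only the 1st and last vectors are partial
	 * pages (does not hold for pwritev, as noted above) */
	unsigned int single_buf = (max_segments - 2) * PAGE_SIZE + 2 * lbs;

	/* simplification with page-aligned middle vectors enforced */
	unsigned int middle_only = (max_segments - 2) * PAGE_SIZE;

	printf("%u vs %u vs %u bytes\n", per_block, single_buf, middle_only);
	return 0;
}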

Patch

diff --git a/block/blk-settings.c b/block/blk-settings.c
index d151be394c98..57d487a00c64 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -213,6 +213,18 @@  void blk_queue_atomic_write_boundary_bytes(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_atomic_write_boundary_bytes);
 
+static unsigned int blk_queue_max_guaranteed_bio_size_sectors(
+					struct request_queue *q)
+{
+	struct queue_limits *limits = &q->limits;
+	unsigned int max_segments = min_t(unsigned int, BIO_MAX_VECS,
+					limits->max_segments);
+	/*  Limit according to dev sector size as we only support direct-io */
+	unsigned int limit = max_segments * queue_logical_block_size(q);
+
+	return rounddown_pow_of_two(limit >> SECTOR_SHIFT);
+}
+
 /**
  * blk_queue_atomic_write_unit_min_sectors - smallest unit that can be written
  * atomically to the device.
@@ -223,8 +235,10 @@  void blk_queue_atomic_write_unit_min_sectors(struct request_queue *q,
 					     unsigned int sectors)
 {
 	struct queue_limits *limits = &q->limits;
+	unsigned int guaranteed_sectors =
+		blk_queue_max_guaranteed_bio_size_sectors(q);
 
-	limits->atomic_write_unit_min_sectors = sectors;
+	limits->atomic_write_unit_min_sectors = min(guaranteed_sectors, sectors);
 }
 EXPORT_SYMBOL(blk_queue_atomic_write_unit_min_sectors);
 
@@ -238,8 +252,10 @@  void blk_queue_atomic_write_unit_max_sectors(struct request_queue *q,
 					     unsigned int sectors)
 {
 	struct queue_limits *limits = &q->limits;
+	unsigned int guaranteed_sectors =
+		blk_queue_max_guaranteed_bio_size_sectors(q);
 
-	limits->atomic_write_unit_max_sectors = sectors;
+	limits->atomic_write_unit_max_sectors = min(guaranteed_sectors, sectors);
 }
 EXPORT_SYMBOL(blk_queue_atomic_write_unit_max_sectors);
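
For reference, a minimal user-space model of the clamping these setters
apply with this patch; the struct and helper names are made up for the
sketch, and BIO_MAX_VECS == 256 plus the example limits are assumptions:

#include <stdio.h>

#define BIO_MAX_VECS	256U
#define SECTOR_SHIFT	9U

/* stand-in for the few queue_limits fields the sketch needs */
struct fake_limits {
	unsigned int max_segments;
	unsigned int logical_block_size;
	unsigned int atomic_write_unit_max_sectors;
};

/* largest power of two <= n, for n >= 1 (mirrors rounddown_pow_of_two()) */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/* models blk_queue_max_guaranteed_bio_size_sectors() from the patch */
static unsigned int guaranteed_sectors(const struct fake_limits *l)
{
	unsigned int segs = l->max_segments < BIO_MAX_VECS ?
				l->max_segments : BIO_MAX_VECS;

	return rounddown_pow2((segs * l->logical_block_size) >> SECTOR_SHIFT);
}

/* models blk_queue_atomic_write_unit_max_sectors() after the patch */
static void set_atomic_write_unit_max(struct fake_limits *l, unsigned int sectors)
{
	unsigned int g = guaranteed_sectors(l);

	l->atomic_write_unit_max_sectors = sectors < g ? sectors : g;
}

int main(void)
{
	struct fake_limits l = {
		.max_segments = 128,
		.logical_block_size = 512,
	};

	/* a driver asks for 256 sectors, but only 128 are guaranteed unsplittable */
	set_atomic_write_unit_max(&l, 256);
	printf("atomic_write_unit_max_sectors = %u\n",
	       l.atomic_write_unit_max_sectors);	/* prints 128 */
	return 0;
}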