
block: try to make aligned bio in case of big chunk IO

Message ID 20231107100140.2084870-1-ming.lei@redhat.com (mailing list archive)
State New, archived
Series block: try to make aligned bio in case of big chunk IO

Commit Message

Ming Lei Nov. 7, 2023, 10:01 a.m. UTC
In case of big chunk sequential IO, bio's size is often not aligned with
this queue's max request size because of multipage bvec, then small sized
bio can be made by bio split, so try to align bio with max io size if
it isn't the last one.

Ed Tsai reported this way improves 64MB read/write by > 15%~25% in
Antutu V10 Storage Test.

Reported-by: Ed Tsai <ed.tsai@mediatek.com>
Closes: https://lore.kernel.org/linux-block/20231025092255.27930-1-ed.tsai@mediatek.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
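
For illustration only (hypothetical numbers, not part of the patch, and assuming a
power-of-two max_sectors limit), the alignment arithmetic in
bio_align_with_io_size() works out like this:

	/*
	 * max_bytes = queue_max_sectors(q) << 9;   e.g. 2048 << 9 = 1 MiB
	 * size      = bio->bi_iter.bi_size;        e.g. 1280 KiB once the bio is full
	 * trim      = size & (max_bytes - 1);      e.g. 1280 KiB & (1 MiB - 1) = 256 KiB
	 *
	 * bio_shrink(bio, trim) drops the trailing 256 KiB so the bio is exactly
	 * 1 MiB and is not split by the block layer; iov_iter_revert(iter, trim)
	 * hands those 256 KiB back to the iterator so that the next bio starts
	 * with them.
	 */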

Comments

Christoph Hellwig Nov. 8, 2023, 7:28 a.m. UTC | #1
On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> In case of big chunk sequential IO, bio's size is often not aligned with
> this queue's max request size because of multipage bvec, then small sized
> bio can be made by bio split, so try to align bio with max io size if
> it isn't the last one.

I have a hard time parsing this long sentence.

> +	/*
> +	 * If we still have data and bio is full, this bio size may not be
> +	 * aligned with max io size, small bio can be caused by split, try
> +	 * to avoid this situation by aligning bio with max io size.
> +	 *
> +	 * Big chunk of sequential IO workload can benefit from this way.
> +	 */
> +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> +			bio_op(bio) != REQ_OP_ZONE_APPEND) {
> +		unsigned trim = bio_align_with_io_size(bio);

Besides the fact that bi_bdev should always be set, this really does
things backwards.  Instead of rewinding things, which is really
expensive, don't add it in the first place.
Ed Tsai (蔡宗軒) Nov. 8, 2023, 7:36 a.m. UTC | #2
On Tue, 2023-11-07 at 18:01 +0800, Ming Lei wrote:
> In case of big chunk sequential IO, bio's size is often not aligned with
> this queue's max request size because of multipage bvec, then small sized
> bio can be made by bio split, so try to align bio with max io size if
> it isn't the last one.
> 
> Ed Tsai reported this way improves 64MB read/write by > 15%~25% in
> Antutu V10 Storage Test.
> 
> Reported-by: Ed Tsai <ed.tsai@mediatek.com>
> Closes: 
> https://lore.kernel.org/linux-block/20231025092255.27930-1-ed.tsai@mediatek.com/
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bio.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 816d412c06e9..749b6283dab9 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1294,6 +1294,47 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	return ret;
>  }
>  
> +/* should only be called before submission */
> +static void bio_shrink(struct bio *bio, unsigned bytes)
> +{
> +	unsigned int size = bio->bi_iter.bi_size;
> +	int idx;
> +
> +	if (bytes >= size)
> +		return;
> +
> +	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
> +
> +	idx = bio->bi_vcnt - 1;
> +	bio->bi_iter.bi_size -= bytes;
> +	while (bytes > 0) {
> +		struct bio_vec *bv = &bio->bi_io_vec[idx];
> +		unsigned int len = min_t(unsigned, bv->bv_len, bytes);
> +
> +		bytes -= len;
> +		bv->bv_len -= len;
> +		if (!bv->bv_len) {
> +			bio_release_page(bio, bv->bv_page);
> +			idx--;
> +		}
> +	}
> +	WARN_ON_ONCE(idx < 0);
> +	bio->bi_vcnt = idx + 1;
> +}
> +
> +static unsigned bio_align_with_io_size(struct bio *bio)
> +{
> +	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> +	unsigned int size = bio->bi_iter.bi_size;
> +	unsigned int trim = size & ((queue_max_sectors(q) << 9) - 1);
> +
> +	if (trim && trim != size) {
> +		bio_shrink(bio, trim);
> +		return trim;
> +	}
> +	return 0;
> +}
> +
>  /**
>   * bio_iov_iter_get_pages - add user or kernel pages to a bio
>   * @bio: bio to add pages to
> @@ -1333,6 +1374,22 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  		ret = __bio_iov_iter_get_pages(bio, iter);
>  	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
>  
> +
> +	/*
> +	 * aligned with max io size, small bio can be caused by split, try
> +	 * to avoid this situation by aligning bio with max io size.
> +	 *
> +	 * Big chunk of sequential IO workload can benefit from this way.
> +	 */
> +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> +			bio_op(bio) != REQ_OP_ZONE_APPEND) {
> +		unsigned trim = bio_align_with_io_size(bio);
> +
> +		if (trim)
> +			iov_iter_revert(iter, trim);
> +	}
> +
>  	return bio->bi_vcnt ? 0 : ret;
>  }
>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
> -- 
> 2.41.0

Thanks. This looks good to me.

Acked-by: Ed Tsai <ed.tsai@mediatek.com>

--
Best Regards,
Ed Tsai
Ming Lei Nov. 8, 2023, 7:58 a.m. UTC | #3
On Tue, Nov 07, 2023 at 11:28:30PM -0800, Christoph Hellwig wrote:
> On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> > In case of big chunk sequential IO, bio's size is often not aligned with
> > this queue's max request size because of multipage bvec, then small sized
> > bio can be made by bio split, so try to align bio with max io size if
> > it isn't the last one.
> 
> I have a hard time parsing this long sentence.

It covers the reasons (multipage bvec, bio split, big sequential IO) and the
solution (align the bio). Any suggestion to simplify it further?

> 
> > +	/*
> > +	 * If we still have data and bio is full, this bio size may not be
> > +	 * aligned with max io size, small bio can be caused by split, try
> > +	 * to avoid this situation by aligning bio with max io size.
> > +	 *
> > +	 * Big chunk of sequential IO workload can benefit from this way.
> > +	 */
> > +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> > +			bio_op(bio) != REQ_OP_ZONE_APPEND) {
> > +		unsigned trim = bio_align_with_io_size(bio);
> 
> Besides the fact that bi_bdev should always be set, this really does
> things backwards.

It looks like that is true after your patch that passes the bdev to
bio_alloc*(), but bio_add_page() is just fine without bio->bi_bdev.

Also, this check follows the existing check for aligning with the block size.

But it seems safe to remove the check.

> Instead of rewinding things, which is really
> expensive, don't add it in the first place.

The rewind may not be avoidable because iov_iter_extract_pages() can often
return fewer pages than requested. Also, the rewind is only done after the
bio becomes full (at least 1MB of data has been added) and there is still
data left, so it shouldn't be too expensive.

I will provide a 'size' hint to iov_iter_extract_pages() to try to make it
aligned from the beginning in the next version, but the rewind may not be
avoidable.
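
For illustration, one possible shape of that (untested and not the posted
patch; the helper name is made up, it assumes a power-of-two max_sectors, and
it gets a similar effect without changing iov_iter_extract_pages() itself) is
to cap the iterator to an aligned length before extraction and re-expand it
afterwards:

/*
 * Hypothetical sketch: hide the unaligned tail from the iterator so the bio
 * cannot grow past an aligned length, then give the tail back for the next
 * bio.  A revert may still be needed when the bio fills up early because of
 * segment limits.
 */
static int bio_iov_iter_get_pages_aligned(struct bio *bio, struct iov_iter *iter)
{
	unsigned int max_bytes =
		queue_max_sectors(bdev_get_queue(bio->bi_bdev)) << 9;
	size_t total = iov_iter_count(iter);
	size_t capped = total;
	int ret;

	/* Keep whole aligned chunks; the unaligned tail goes into the last bio. */
	if (total > max_bytes)
		capped = round_down(total, max_bytes);

	if (capped != total)
		iov_iter_truncate(iter, capped);

	ret = bio_iov_iter_get_pages(bio, iter);

	/* Restore the bytes that were hidden from the iterator. */
	if (capped != total)
		iov_iter_reexpand(iter, iov_iter_count(iter) + (total - capped));

	return ret;
}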


Thanks,
Ming
Ed Tsai (蔡宗軒) Nov. 8, 2023, 8:22 a.m. UTC | #4
On Tue, 2023-11-07 at 23:28 -0800, Christoph Hellwig wrote:
>  On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> > In case of big chunk sequential IO, bio's size is often not aligned with
> > this queue's max request size because of multipage bvec, then small sized
> > bio can be made by bio split, so try to align bio with max io size if
> > it isn't the last one.
> 
> I have a hard time parsing this long sentence.
> 
> > +	/*
> > +	 * If we still have data and bio is full, this bio size may not be
> > +	 * aligned with max io size, small bio can be caused by split, try
> > +	 * to avoid this situation by aligning bio with max io size.
> > +	 *
> > +	 * Big chunk of sequential IO workload can benefit from this way.
> > +	 */
> > +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> > +			bio_op(bio) != REQ_OP_ZONE_APPEND) {
> > +		unsigned trim = bio_align_with_io_size(bio);
> 
> Besides the fact that bi_bdev should always be set, this really does
> things backwards.  Instead of rewinding things, which is really
> expensive, don't add it in the first place.
> 

We cannot predict the number of pages that can be filled into a bio, as it
depends on the physical layout of the memory buffer. The only option is to
limit the bio to the queue limit.

Even if I fill the bio with a large number of pages, it will still be
split based on the queue limit at the block layer. Therefore, I believe it
is appropriate to limit the bio size in this case.

We can enforce this limitation before extracting the pages, which would
eliminate the need for iov_iter_revert.

--
Best Regards,
Ed Tsai
Ed Tsai (蔡宗軒) Nov. 8, 2023, 8:29 a.m. UTC | #5
On Wed, 2023-11-08 at 15:58 +0800, Ming Lei wrote:
>  On Tue, Nov 07, 2023 at 11:28:30PM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> > > In case of big chunk sequential IO, bio's size is often not aligned with
> > > this queue's max request size because of multipage bvec, then small sized
> > > bio can be made by bio split, so try to align bio with max io size if
> > > it isn't the last one.
> > 
> > I have a hard time parsing this long sentence.
> 
> It covers the reasons (multipage bvec, bio split, big sequential IO) and the
> solution (align the bio). Any suggestion to simplify it further?
> 
> > 
> > > +	/*
> > > +	 * If we still have data and bio is full, this bio size may not be
> > > +	 * aligned with max io size, small bio can be caused by split, try
> > > +	 * to avoid this situation by aligning bio with max io size.
> > > +	 *
> > > +	 * Big chunk of sequential IO workload can benefit from this way.
> > > +	 */
> > > +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> > > +			bio_op(bio) != REQ_OP_ZONE_APPEND) {
> > > +		unsigned trim = bio_align_with_io_size(bio);
> > 
> > Besides the fact that bi_bdev should always be set, this really does
> > things backwards.
> 
> It looks like that is true after your patch that passes the bdev to
> bio_alloc*(), but bio_add_page() is just fine without bio->bi_bdev.
> 
> Also, this check follows the existing check for aligning with the block size.
> 
> But it seems safe to remove the check.
> 
> > Instead of rewinding things, which is really
> > expensive, don't add it in the first place.
> 
> The rewind may not be avoidable because iov_iter_extract_pages() can often
> return fewer pages than requested. Also, the rewind is only done after the
> bio becomes full (at least 1MB of data has been added) and there is still
> data left, so it shouldn't be too expensive.
> 
> I will provide a 'size' hint to iov_iter_extract_pages() to try to make it
> aligned from the beginning in the next version, but the rewind may not be
> avoidable.
> 
> 
> Thanks,
> Ming

We cannot predict the number of pages that can be filled into a bio, as it
depends on the physical layout of the memory buffer. The only option is to
limit the bio to the queue limit.

Even if I fill the bio with a large number of pages, it will still be
split based on the queue limit at the block layer. Therefore, I believe it
is appropriate to limit the bio size in this case.

We can enforce this limitation before extracting the pages, which would
eliminate the need for iov_iter_revert.

Hi Christoph, at what stage do you think it would be better to impose the
limitation?

--
Best Regards,
Ed Tsai


Patch

diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..749b6283dab9 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1294,6 +1294,47 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return ret;
 }
 
+/* should only be called before submission */
+static void bio_shrink(struct bio *bio, unsigned bytes)
+{
+	unsigned int size = bio->bi_iter.bi_size;
+	int idx;
+
+	if (bytes >= size)
+		return;
+
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+	idx = bio->bi_vcnt - 1;
+	bio->bi_iter.bi_size -= bytes;
+	while (bytes > 0) {
+		struct bio_vec *bv = &bio->bi_io_vec[idx];
+		unsigned int len = min_t(unsigned, bv->bv_len, bytes);
+
+		bytes -= len;
+		bv->bv_len -= len;
+		if (!bv->bv_len) {
+			bio_release_page(bio, bv->bv_page);
+			idx--;
+		}
+	}
+	WARN_ON_ONCE(idx < 0);
+	bio->bi_vcnt = idx + 1;
+}
+
+static unsigned bio_align_with_io_size(struct bio *bio)
+{
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	unsigned int size = bio->bi_iter.bi_size;
+	unsigned int trim = size & ((queue_max_sectors(q) << 9) - 1);
+
+	if (trim && trim != size) {
+		bio_shrink(bio, trim);
+		return trim;
+	}
+	return 0;
+}
+
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1333,6 +1374,22 @@  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
+
+	/*
+	 * If we still have data and bio is full, this bio size may not be
+	 * aligned with max io size, small bio can be caused by split, try
+	 * to avoid this situation by aligning bio with max io size.
+	 *
+	 * Big chunk of sequential IO workload can benefit from this way.
+	 */
+	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
+			bio_op(bio) != REQ_OP_ZONE_APPEND) {
+		unsigned trim = bio_align_with_io_size(bio);
+
+		if (trim)
+			iov_iter_revert(iter, trim);
+	}
+
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
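
As a worked illustration of the bio_shrink() loop above (hypothetical bvec
layout, not taken from the patch or the report):

	/*
	 * Suppose the bio holds three bvecs of 512K, 512K and 256K, so
	 * bi_size = 1280K, and bio_align_with_io_size() asks to trim 256K
	 * against a 1 MiB max io size:
	 *
	 *   - the loop starts at the last bvec (256K), len = min(256K, 256K);
	 *   - bv_len drops to 0, bio_release_page() drops the page reference,
	 *     and idx goes from 2 to 1;
	 *   - bytes is now 0, so the loop stops; bi_vcnt becomes 2 and
	 *     bi_size becomes 1024K, i.e. one fully aligned 1 MiB bio.
	 *
	 * Had the trim been 128K, the last bvec would simply shrink to 128K
	 * and bi_vcnt would stay at 3.
	 */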