Message ID | 20231107100140.2084870-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | block: try to make aligned bio in case of big chunk IO |
On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> In case of big chunk sequential IO, bio's size is often not aligned with
> this queue's max request size because of multipage bvec, then small sized
> bio can be made by bio split, so try to align bio with max io size if
> it isn't the last one.

I have a hard time parsing this long sentence.

> +	/*
> +	 * If we still have data and bio is full, this bio size may not be
> +	 * aligned with max io size, small bio can be caused by split, try
> +	 * to avoid this situation by aligning bio with max io size.
> +	 *
> +	 * Big chunk of sequential IO workload can benefit from this way.
> +	 */
> +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> +	    bio_op(bio) != REQ_OP_ZONE_APPEND) {
> +		unsigned trim = bio_align_with_io_size(bio);

Besides the fact that bi_bdev should always be set, this really does
things backwards.  Instead of rewinding things, which is really
expensive, don't add it in the first place.
On Tue, 2023-11-07 at 18:01 +0800, Ming Lei wrote:
> In case of big chunk sequential IO, bio's size is often not aligned with
> this queue's max request size because of multipage bvec, then small sized
> bio can be made by bio split, so try to align bio with max io size if
> it isn't the last one.
>
> Ed Tsai reported this way improves 64MB read/write by > 15%~25% in
> Antutu V10 Storage Test.
>
> Reported-by: Ed Tsai <ed.tsai@mediatek.com>
> Closes: https://lore.kernel.org/linux-block/20231025092255.27930-1-ed.tsai@mediatek.com/
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bio.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
>
> diff --git a/block/bio.c b/block/bio.c
> index 816d412c06e9..749b6283dab9 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1294,6 +1294,47 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	return ret;
>  }
>
> +/* should only be called before submission */
> +static void bio_shrink(struct bio *bio, unsigned bytes)
> +{
> +	unsigned int size = bio->bi_iter.bi_size;
> +	int idx;
> +
> +	if (bytes >= size)
> +		return;
> +
> +	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
> +
> +	idx = bio->bi_vcnt - 1;
> +	bio->bi_iter.bi_size -= bytes;
> +	while (bytes > 0) {
> +		struct bio_vec *bv = &bio->bi_io_vec[idx];
> +		unsigned int len = min_t(unsigned, bv->bv_len, bytes);
> +
> +		bytes -= len;
> +		bv->bv_len -= len;
> +		if (!bv->bv_len) {
> +			bio_release_page(bio, bv->bv_page);
> +			idx--;
> +		}
> +	}
> +	WARN_ON_ONCE(idx < 0);
> +	bio->bi_vcnt = idx + 1;
> +}
> +
> +static unsigned bio_align_with_io_size(struct bio *bio)
> +{
> +	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> +	unsigned int size = bio->bi_iter.bi_size;
> +	unsigned int trim = size & ((queue_max_sectors(q) << 9) - 1);
> +
> +	if (trim && trim != size) {
> +		bio_shrink(bio, trim);
> +		return trim;
> +	}
> +	return 0;
> +}
> +
>  /**
>   * bio_iov_iter_get_pages - add user or kernel pages to a bio
>   * @bio: bio to add pages to
> @@ -1333,6 +1374,22 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  		ret = __bio_iov_iter_get_pages(bio, iter);
>  	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
>
> +
> +	/*
> +	 * If we still have data and bio is full, this bio size may not be
> +	 * aligned with max io size, small bio can be caused by split, try
> +	 * to avoid this situation by aligning bio with max io size.
> +	 *
> +	 * Big chunk of sequential IO workload can benefit from this way.
> +	 */
> +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> +	    bio_op(bio) != REQ_OP_ZONE_APPEND) {
> +		unsigned trim = bio_align_with_io_size(bio);
> +
> +		if (trim)
> +			iov_iter_revert(iter, trim);
> +	}
> +
>  	return bio->bi_vcnt ? 0 : ret;
>  }
>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
> --
> 2.41.0

Thanks. This looks good to me.

Acked-by: Ed Tsai <ed.tsai@mediatek.com>

--
Best Regards,
Ed Tsai
On Tue, Nov 07, 2023 at 11:28:30PM -0800, Christoph Hellwig wrote:
> On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> > In case of big chunk sequential IO, bio's size is often not aligned with
> > this queue's max request size because of multipage bvec, then small sized
> > bio can be made by bio split, so try to align bio with max io size if
> > it isn't the last one.
>
> I have a hard time parsing this long sentence.

It covers the reasons (multipage bvec, bio split, big sequential IO) and
the solution (align bio) - any suggestion to simplify it further?

> > +	/*
> > +	 * If we still have data and bio is full, this bio size may not be
> > +	 * aligned with max io size, small bio can be caused by split, try
> > +	 * to avoid this situation by aligning bio with max io size.
> > +	 *
> > +	 * Big chunk of sequential IO workload can benefit from this way.
> > +	 */
> > +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> > +	    bio_op(bio) != REQ_OP_ZONE_APPEND) {
> > +		unsigned trim = bio_align_with_io_size(bio);
>
> Besides the fact that bi_bdev should always be set, this really does
> things backwards.

It looks like bi_bdev is always set after your patch passing bdev to
bio_alloc*(), but bio_add_page() is just fine without bio->bi_bdev.

Also this way follows the check for aligning with block size.

But it seems safe to remove the check.

> Instead of rewinding things, which is really
> expensive, don't add it in the first place.

The rewind may not be avoidable because iov_iter_extract_pages() can
often return fewer pages than requested. Also the rewind is only done
after the bio becomes full (at least 1MB has been added) and there is
still data left, so it shouldn't be too expensive.

I will provide a 'size' hint to iov_iter_extract_pages() to try to make
it aligned from the beginning in the next version, but the rewind may
not be avoidable.

Thanks,
Ming
On Tue, 2023-11-07 at 23:28 -0800, Christoph Hellwig wrote:
> On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> > In case of big chunk sequential IO, bio's size is often not aligned with
> > this queue's max request size because of multipage bvec, then small sized
> > bio can be made by bio split, so try to align bio with max io size if
> > it isn't the last one.
>
> I have a hard time parsing this long sentence.
>
> > +	/*
> > +	 * If we still have data and bio is full, this bio size may not be
> > +	 * aligned with max io size, small bio can be caused by split, try
> > +	 * to avoid this situation by aligning bio with max io size.
> > +	 *
> > +	 * Big chunk of sequential IO workload can benefit from this way.
> > +	 */
> > +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> > +	    bio_op(bio) != REQ_OP_ZONE_APPEND) {
> > +		unsigned trim = bio_align_with_io_size(bio);
>
> Besides the fact that bi_bdev should always be set, this really does
> things backwards. Instead of rewinding things, which is really
> expensive, don't add it in the first place.

We cannot predict the number of pages that can be filled in a bio, as it
depends on the physical layout of the memory buffer. The only option is
to limit the bio to the queue limit.

Even if I fill the bio with a large number of pages, it will still be
split based on the queue limit at the block layer. Therefore, I believe
it is appropriate to limit the bio size in this case. We can enforce this
limitation before extracting the pages, which would eliminate the need
for iov_iter_revert.

--
Best Regards,
Ed Tsai
On Wed, 2023-11-08 at 15:58 +0800, Ming Lei wrote:
> On Tue, Nov 07, 2023 at 11:28:30PM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 07, 2023 at 06:01:40PM +0800, Ming Lei wrote:
> > > In case of big chunk sequential IO, bio's size is often not aligned with
> > > this queue's max request size because of multipage bvec, then small sized
> > > bio can be made by bio split, so try to align bio with max io size if
> > > it isn't the last one.
> >
> > I have a hard time parsing this long sentence.
>
> It covers the reasons (multipage bvec, bio split, big sequential IO) and
> the solution (align bio), or any suggestion to simplify it further?
>
> > > +	/*
> > > +	 * If we still have data and bio is full, this bio size may not be
> > > +	 * aligned with max io size, small bio can be caused by split, try
> > > +	 * to avoid this situation by aligning bio with max io size.
> > > +	 *
> > > +	 * Big chunk of sequential IO workload can benefit from this way.
> > > +	 */
> > > +	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
> > > +	    bio_op(bio) != REQ_OP_ZONE_APPEND) {
> > > +		unsigned trim = bio_align_with_io_size(bio);
> >
> > Besides the fact that bi_bdev should always be set, this really does
> > things backwards.
>
> Looks it is true after your patch passes bdev to bio_alloc*(), but
> bio_add_page() is just fine without bio->bi_bdev.
>
> Also this way follows the check for aligning with block size.
>
> But seems safe to remove the check.
>
> > Instead of rewinding things, which is really
> > expensive, don't add it in the first place.
>
> The rewind may not be avoidable because iov_iter_extract_pages() can
> often return fewer pages than requested. Also the rewind is only done
> after the bio becomes full (at least 1MB has been added) and there is
> still data left, so it shouldn't be too expensive.
>
> I will provide a 'size' hint to iov_iter_extract_pages() to try to make
> it aligned from the beginning in the next version, but the rewind may
> not be avoidable.
>
> Thanks,
> Ming

We cannot predict the number of pages that can be filled in a bio, as it
depends on the physical layout of the memory buffer. The only option is
to limit the bio to the queue limit.

Even if I fill the bio with a large number of pages, it will still be
split based on the queue limit at the block layer. Therefore, I believe
it is appropriate to limit the bio size in this case. We can enforce this
limitation before extracting the pages, which would eliminate the need
for iov_iter_revert.

Hi Christoph, which stage do you think would be better to impose the
limitation?

--
Best Regards,
Ed Tsai
diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..749b6283dab9 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1294,6 +1294,47 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return ret;
 }
 
+/* should only be called before submission */
+static void bio_shrink(struct bio *bio, unsigned bytes)
+{
+	unsigned int size = bio->bi_iter.bi_size;
+	int idx;
+
+	if (bytes >= size)
+		return;
+
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+	idx = bio->bi_vcnt - 1;
+	bio->bi_iter.bi_size -= bytes;
+	while (bytes > 0) {
+		struct bio_vec *bv = &bio->bi_io_vec[idx];
+		unsigned int len = min_t(unsigned, bv->bv_len, bytes);
+
+		bytes -= len;
+		bv->bv_len -= len;
+		if (!bv->bv_len) {
+			bio_release_page(bio, bv->bv_page);
+			idx--;
+		}
+	}
+	WARN_ON_ONCE(idx < 0);
+	bio->bi_vcnt = idx + 1;
+}
+
+static unsigned bio_align_with_io_size(struct bio *bio)
+{
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	unsigned int size = bio->bi_iter.bi_size;
+	unsigned int trim = size & ((queue_max_sectors(q) << 9) - 1);
+
+	if (trim && trim != size) {
+		bio_shrink(bio, trim);
+		return trim;
+	}
+	return 0;
+}
+
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1333,6 +1374,22 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
+
+	/*
+	 * If we still have data and bio is full, this bio size may not be
+	 * aligned with max io size, small bio can be caused by split, try
+	 * to avoid this situation by aligning bio with max io size.
+	 *
+	 * Big chunk of sequential IO workload can benefit from this way.
+	 */
+	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
+	    bio_op(bio) != REQ_OP_ZONE_APPEND) {
+		unsigned trim = bio_align_with_io_size(bio);
+
+		if (trim)
+			iov_iter_revert(iter, trim);
+	}
+
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
In case of big chunk sequential IO, bio's size is often not aligned with
this queue's max request size because of multipage bvec, then small sized
bio can be made by bio split, so try to align bio with max io size if
it isn't the last one.

Ed Tsai reported this way improves 64MB read/write by > 15%~25% in
Antutu V10 Storage Test.

Reported-by: Ed Tsai <ed.tsai@mediatek.com>
Closes: https://lore.kernel.org/linux-block/20231025092255.27930-1-ed.tsai@mediatek.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)