
block: Minimize the number of requests in Direct I/O when bio get pages.

Message ID: 20240313092346.1304889-1-zhengxu.zhang@unisoc.com (mailing list archive)
State: New, archived
Series: block: Minimize the number of requests in Direct I/O when bio get pages.

Commit Message

Zhengxu Zhang March 13, 2024, 9:23 a.m. UTC
We find that when direct I/O is decomposed into requests, some of the
requests often end up smaller than the maximum request size. This
increases the number of requests needed for a direct I/O and slows it
down.

When a bio obtains a set of pages, we let it determine, via
bio_export_pages(), how many pages it should take so that its size stays
divisible by the maximum request size, while still obtaining as many
pages as possible.

Of course, if the remaining pages amount to less than the maximum a bio
can take, they are still added to that bio (if it has enough free
vectors) or to the next one.
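
For illustration, a minimal user-space sketch of the intended alignment
arithmetic (not the kernel code itself; pages_to_add() is a hypothetical
name, and it assumes request_max_size is a power-of-two multiple of a
4 KiB page size):

#include <stdio.h>

#define PAGE_SHIFT 12

/*
 * Cap the number of pages added so that the bio size stays a multiple
 * of the maximum request size: first complete the partially-filled
 * request, then add whole requests only.
 */
static unsigned int pages_to_add(unsigned int bio_bytes,
				 unsigned int free_vecs,
				 unsigned int request_max_size)
{
	unsigned int max_pages = request_max_size >> PAGE_SHIFT;
	unsigned int tail = bio_bytes & (request_max_size - 1);
	unsigned int fill = (request_max_size - tail) >> PAGE_SHIFT;

	if (free_vecs <= fill)
		return free_vecs;
	/* complete the current request, then whole requests only */
	return fill + ((free_vecs - fill) & ~(max_pages - 1));
}

int main(void)
{
	/* 512 KiB max request, 4 KiB pages: 128 pages per request */
	printf("%u\n", pages_to_add(16 * 4096, 256, 512 << 10)); /* 112 + 128 = 240 */
	return 0;
}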

Device info: UFS, RAM 6GB, kernel 6.6: seq read improves ~5%
Device info: eMMC, RAM 4GB, kernel 5.15: seq read improves ~10%

1. Androidbench test results (UFS, RAM 6GB, kernel 6.6):

before:		run1	run2	run3	run4	run5	run6	avg
seq read	1812	1735	1879	1887	1817	1849	1829
seq write	1271	1294	1287	1305	1296	1296	1291
ram read	301	298	297	297	295	294	297
ram write	343	342	344	351	349	345	345
sql insert	2773	2779	2856	2801	2932	2913	2842
sql update	2819	2888	3009	2953	2961	2974	2934
sql delete	4117	4004	4121	4158	4088	4071	4093

after:		run1	run2	run3	run4	run5	run6	avg	delta
seq read	1923	1919	1946	1921	1953	1937	1933	+5.65%
seq write	1277	1301	1301	1296	1291	1299	1294	+0.21%
ram read	302	301	302	306	301	297	301	+1.52%
ram write	350	359	350	339	351	356	350	+1.49%
sql insert	2788	2935	3011	3022	3069	3037	2977	+4.74%
sql update	2991	3123	3125	3070	3087	3132	3088	+5.25%
sql delete	4157	4174	4173	4177	4297	4246	4204	+2.71%

seq read avg: 1933MB/s vs. 1829MB/s before (up about 5.65%)

2. Trace:
before:  lots of (2048+x)-sector bios; each produces 2+1 requests,
         and the extra "1" slows things down.

	block_bio_queue: 179,0 R 84601376 + 2064 [Thread-2]
	block_bio_queue: 179,0 R 84603440 + 2048 [Thread-2]
	block_bio_queue: 179,0 R 84605488 + 2072 [Thread-2]
	block_bio_queue: 179,0 R 84607560 + 2064 [Thread-2]
	block_bio_queue: 179,0 R 84609624 + 2064 [Thread-2]
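
For illustration, assuming 512-byte sectors and a queue maximum of 1024
sectors (512 KiB) per request, the greedy split behind the 2+1 pattern
above can be sketched as:

#include <stdio.h>

int main(void)
{
	unsigned int bio_sectors = 2064, max_sectors = 1024;

	/* greedy split: take the maximum request size while it fits */
	while (bio_sectors) {
		unsigned int chunk = bio_sectors < max_sectors ?
				     bio_sectors : max_sectors;
		printf("request: %u sectors\n", chunk);	/* 1024, 1024, 16 */
		bio_sectors -= chunk;
	}
	return 0;
}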

after:   the bio size is kept divisible by the maximum request size, so
         the extra "1" request is no longer produced; this removes one
         request per direct I/O and minimizes the number of bios as well.

	block_bio_queue: 179,0 R 69829456 + 2048 [Thread-3]
	block_bio_queue: 179,0 R 69831504 + 3072 [Thread-3]
	block_bio_queue: 179,0 R 69834576 + 4096 [Thread-3]
	block_bio_queue: 179,0 R 69838672 + 5120 [Thread-3]

	block_bio_queue: 8,0 WFS 153795416 + 5120 [Thread-11]
	block_bio_queue: 8,0 WFS 153800536 + 7168 [Thread-11]
	block_bio_queue: 8,0 WFS 153807704 + 9216 [Thread-11]
	block_bio_queue: 8,0 WFS 153816920 + 10240 [Thread-11]
	block_bio_queue: 8,0 WFS 153827160 + 8192 [Thread-11]
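
Every bio size in the traces above is a multiple of the 1024-sector
maximum; a trivial check, for illustration:

#include <assert.h>

int main(void)
{
	/* sizes from the "after" traces, in sectors */
	unsigned int sizes[] = { 2048, 3072, 4096, 5120,
				 7168, 9216, 10240, 8192 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		assert(sizes[i] % 1024 == 0);
	return 0;
}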

Signed-off-by: Zhengxu Zhang <Zhengxu.Zhang@unisoc.com>
---
 block/bio.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 71 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig March 13, 2024, 9:49 p.m. UTC | #1
> +		request_max_size = queue_max_bytes(bdev_get_queue(bio->bi_bdev));

Besides all the coding style problems this is simply not how the bio
interface works.  queue_max_bytes is a limit only available to the
splitting routines and not to the upper layers submitting I/O.

Please try to debug your scenarios a little more - just because a bio
gets split off, it should not just turn into a request of its own,
but be merged with the next bio due to the plug.  If that doesn't happen
we do have a problem somewhere.
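
For context, the plug batches submissions so that requests built from
consecutive bios can merge before dispatch; a minimal kernel-style
sketch (submit_batch_plugged() is a hypothetical helper, for
illustration only):

#include <linux/bio.h>
#include <linux/blkdev.h>

/* While the plug is held, requests from adjacent bios can be merged. */
static void submit_batch_plugged(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);
	for (i = 0; i < nr; i++)
		submit_bio(bios[i]);
	blk_finish_plug(&plug);	/* batched requests are issued here */
}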
Zhengxu Zhang (张政旭) March 14, 2024, 3:01 a.m. UTC | #2
Hi Christoph Hellwig:

>queue_max_bytes is a limit only available to the
>splitting routines and not to the upper layers submitting I/O.

Keeping “bio->bi_iter.bi_size % request_max_size == 0” where possible
is better for the final number of requests to be processed.

>Please try to debug your scenarious a little more - just because a bio
>get split off it should not just turn into a request of it's own,
>but be merged with the next bio due to the plug.  If that doesn't happen
>we do have a problem somewhere.

Without the patch, the request cannot merge with the next bio.

For example:
the first bio is 2088 * 512 bytes, split as 1024+1024+40 at addresses
138577200, 138578224, 138579248;
the second bio is 2064 * 512 bytes, split as 1024+1024+16 at addresses
138579288, 138580312, 138581336.
They are contiguous. However, when a bio is decomposed into requests,
the split takes the maximum request size whenever possible, so the
first request of the second bio is already at the maximum request size
and cannot be merged with the last request of the first bio.
Therefore, two additional requests have to be consumed.
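
For illustration, a user-space sketch of the size check that blocks this
merge (modeled on the limit the block layer enforces when back-merging a
bio into a request; can_back_merge() is a hypothetical name, with values
taken from the trace below):

#include <stdbool.h>
#include <stdio.h>

static bool can_back_merge(unsigned int rq_sectors, unsigned int bio_sectors,
			   unsigned int max_sectors)
{
	/* a merge must not push the request past the queue limit */
	return rq_sectors + bio_sectors <= max_sectors;
}

int main(void)
{
	/* request 2.1 already holds 1024 sectors, the queue maximum,
	 * so the 40-sector remainder of the first bio cannot join it */
	printf("%d\n", can_back_merge(1024, 40, 1024));	/* 0: no merge */
	printf("%d\n", can_back_merge(984, 40, 1024));	/* 1: would merge */
	return 0;
}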

This is the trace without the patch.
Request 1.3 does not merge with request 2.1 because request 2.1 is
full, and it cannot merge with request 2.3 either.
        Thread-6-11566   [007] .....  1982.474122:
f2fs_direct_IO_enter: dev = (254,48), ino = 6366 pos = 0 len =
33554432 ki_flags = 20000 ki_ioprio = 4004 rw = 0
        Thread-6-11566   [007] .....  1982.474843: block_bio_queue:
254,48 R 123268400 + 2088 [Thread-6]
        Thread-6-11566   [007] .....  1982.474854: block_bio_remap:
8,0 R 123268400 + 2088 <- (254,48) 123268400
        Thread-6-11566   [007] .....  1982.474854: block_bio_remap:
8,0 R 138577200 + 2088 <- (259,55) 123268400
        Thread-6-11566   [007] .....  1982.474855: block_bio_queue:
8,0 R 138577200 + 2088 [Thread-6]                 <<<<first bio
        Thread-6-11566   [007] .....  1982.474863: block_split: 8,0 R
138577200 / 138578224 [Thread-6]
        Thread-6-11566   [007] .....  1982.474869: block_getrq: 8,0 R
138577200 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.474876: block_split: 8,0 R
138578224 / 138579248 [Thread-6]
        Thread-6-11566   [007] .....  1982.474878: block_getrq: 8,0 R
138578224 + 1024 [Thread-6]
        Thread-6-11566   [007] ...1.  1982.474881: block_rq_insert:
8,0 R 524288 () 138577200 + 1024 [Thread-6]       <<<<<request1.1
        Thread-6-11566   [007] .....  1982.474897: block_rq_issue: 8,0
R 524288 () 138577200 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477396: block_getrq: 8,0 R
138579248 + 40 [Thread-6]
        Thread-6-11566   [007] ...1.  1982.477400: block_rq_insert:
8,0 R 524288 () 138578224 + 1024 [Thread-6]       <<<<<request1.2
        Thread-6-11566   [007] .....  1982.477409: block_rq_issue: 8,0
R 524288 () 138578224 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477515: block_bio_queue:
254,48 R 123270488 + 2064 [Thread-6]
        Thread-6-11566   [007] .....  1982.477520: block_bio_remap:
8,0 R 123270488 + 2064 <- (254,48) 123270488
        Thread-6-11566   [007] .....  1982.477521: block_bio_remap:
8,0 R 138579288 + 2064 <- (259,55) 123270488
        Thread-6-11566   [007] .....  1982.477522: block_bio_queue:
8,0 R 138579288 + 2064 [Thread-6]                 <<<<second bio
        Thread-6-11566   [007] .....  1982.477528: block_split: 8,0 R
138579288 / 138580312 [Thread-6]
        Thread-6-11566   [007] .....  1982.477530: block_getrq: 8,0 R
138579288 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477534: block_split: 8,0 R
138580312 / 138581336 [Thread-6]
        Thread-6-11566   [007] .....  1982.477536: block_getrq: 8,0 R
138580312 + 1024 [Thread-6]
        Thread-6-11566   [007] ...1.  1982.477538: block_rq_insert:
8,0 R 20480 () 138579248 + 40 [Thread-6]          <<<<<request1.3
        Thread-6-11566   [007] ...1.  1982.477538: block_rq_insert:
8,0 R 524288 () 138579288 + 1024 [Thread-6]       <<<<<request2.1
        Thread-6-11566   [007] .....  1982.477545: block_rq_issue: 8,0
R 20480 () 138579248 + 40 [Thread-6]
        Thread-6-11566   [007] .....  1982.477557: block_rq_issue: 8,0
R 524288 () 138579288 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477596: block_getrq: 8,0 R
138581336 + 16 [Thread-6]
        Thread-6-11566   [007] ...1.  1982.477597: block_rq_insert:
8,0 R 524288 () 138580312 + 1024 [Thread-6]       <<<<<request2.2
        Thread-6-11566   [007] .....  1982.477604: block_rq_issue: 8,0
R 524288 () 138580312 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477703: block_bio_queue:
254,48 R 123272552 + 2072 [Thread-6]
        Thread-6-11566   [007] .....  1982.477707: block_bio_remap:
8,0 R 123272552 + 2072 <- (254,48) 123272552
        Thread-6-11566   [007] .....  1982.477707: block_bio_remap:
8,0 R 138581352 + 2072 <- (259,55) 123272552
        Thread-6-11566   [007] .....  1982.477708: block_bio_queue:
8,0 R 138581352 + 2072 [Thread-6]
        Thread-6-11566   [007] .....  1982.477713: block_split: 8,0 R
138581352 / 138582376 [Thread-6]
        Thread-6-11566   [007] .....  1982.477715: block_getrq: 8,0 R
138581352 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477719: block_split: 8,0 R
138582376 / 138583400 [Thread-6]
        Thread-6-11566   [007] .....  1982.477721: block_getrq: 8,0 R
138582376 + 1024 [Thread-6]
        Thread-6-11566   [007] ...1.  1982.477722: block_rq_insert:
8,0 R 8192 () 138581336 + 16 [Thread-6]           <<<<request2.3
        Thread-6-11566   [007] ...1.  1982.477723: block_rq_insert:
8,0 R 524288 () 138581352 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477728: block_rq_issue: 8,0
R 8192 () 138581336 + 16 [Thread-6]
        Thread-6-11566   [007] .....  1982.477740: block_rq_issue: 8,0
R 524288 () 138581352 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477778: block_getrq: 8,0 R
138583400 + 24 [Thread-6]
        Thread-6-11566   [007] ...1.  1982.477779: block_rq_insert:
8,0 R 524288 () 138582376 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477784: block_rq_issue: 8,0
R 524288 () 138582376 + 1024 [Thread-6]
        Thread-6-11566   [007] .....  1982.477886: block_bio_queue:
254,48 R 123274624 + 2056 [Thread-6]
        Thread-6-11566   [007] .....  1982.477889: block_bio_remap:
8,0 R 123274624 + 2056 <- (254,48) 123274624

....
        Thread-6-11566   [007] ..s..  1982.478899: block_rq_complete:
8,0 R () 138579248 + 40 [0]                        <<<<request1.3
finish
.....
        Thread-6-11566   [007] ..s..  1982.478995: block_rq_complete:
8,0 R () 138577200 + 1024 [0]                       <<<<request 1.1
finish
...
        Thread-6-11566   [007] ..s..  1982.479603: block_rq_complete:
8,0 R () 138578224 + 1024 [0]                    <<<<<request1.2
finish
        Thread-6-11566   [007] ..s..  1982.479824: block_rq_complete:
8,0 R () 138581336 + 16 [0]       <<<<request 2.3 finish
....
....
        Thread-6-11566   [007] ..s..  1982.480173: block_rq_complete:
8,0 R () 138579288 + 1024 [0]     <<<<request 2.1 finish
....
        Thread-6-11566   [007] ..s..  1982.480739: block_rq_complete:
8,0 R () 138580312 + 1024 [0]     <<<<request 2.2 finish


Patch

diff --git a/block/bio.c b/block/bio.c
index b9642a41f286..f821a5fb72bd 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -903,6 +903,70 @@  static inline bool bio_full(struct bio *bio, unsigned len)
 	return false;
 }
 
+/**
+ * bio_export_pages - calculate how many pages this bio should take,
+ * in order to minimize the number of requests the bio is split into.
+ * @bio:        bio to check
+ * @page_total: total number of pages that need to be added. Pass
+ *              UINT_MAX if the total is not known.
+ *
+ * Return the number of pages that can be added to this bio.
+ * A return value of 0 means adding pages to this bio is not advised,
+ * as it would lead to more requests for the direct I/O.
+ */
+static unsigned int bio_export_pages(struct bio *bio, unsigned int page_total)
+{
+	unsigned int request_max_size;
+	unsigned int request_max_pages;
+	unsigned short bio_export_pages;
+	unsigned int last_request_size;
+	unsigned int nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
+
+	/*
+	 * If the number of pages to add is less than what the bio can
+	 * still hold, take them all directly.
+	 */
+	if (page_total < nr_pages)
+		return page_total;
+
+	if (bio->bi_bdev) {
+		request_max_size = queue_max_bytes(bdev_get_queue(bio->bi_bdev));
+		/* request_max_size >> PAGE_SHIFT is assumed to be a power of two */
+		request_max_pages = request_max_size >> PAGE_SHIFT;
+		last_request_size = bio->bi_iter.bi_size & (request_max_size - 1);
+
+		/*
+		 * When bio->bi_iter.bi_size % request_max_size == 0, check
+		 * whether we still need to add pages to this bio.
+		 *
+		 * If the maximum number of pages the bio can hold is less
+		 * than one complete request, we should still take pages.
+		 *
+		 * Otherwise, if the bio cannot fit one complete request's
+		 * worth of pages, we pessimistically assume the pages are
+		 * not physically contiguous and suggest adding none: they
+		 * would break the alignment and create a small request;
+		 * the next bio may do better.
+		 */
+		if (last_request_size == 0 && bio->bi_max_vecs > request_max_pages &&
+			nr_pages < request_max_pages)
+			return 0;
+
+		/*
+		 * Based on bio->bi_vcnt, determine the maximum number of
+		 * pages we can take while keeping bi_size % request_max_size
+		 * == 0; note the pages taken may span more than one request.
+		 */
+		bio_export_pages = (request_max_size - last_request_size) >> PAGE_SHIFT;
+		if (!bio_export_pages)
+			bio_export_pages = request_max_pages;
+		if (nr_pages > bio_export_pages)
+			nr_pages -= nr_pages & (request_max_pages - 1);
+	}
+
+	return nr_pages;
+}
+
 static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page)
 {
@@ -1228,16 +1292,16 @@  static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * __bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
+ * @nr_pages: the number of pages the bio wants to get.
  *
  * Extracts pages from *iter and appends them to @bio's bvec array.  The pages
  * will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
  * For a multi-segment *iter, this function only adds pages from the next
  * non-empty segment of the iov iterator.
  */
-static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter, unsigned short nr_pages)
 {
 	iov_iter_extraction_t extraction_flags = 0;
-	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
@@ -1329,6 +1393,7 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	int ret = 0;
+	unsigned short nr_pages;
 
 	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
 		return -EIO;
@@ -1342,7 +1407,10 @@  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	if (iov_iter_extract_will_pin(iter))
 		bio_set_flag(bio, BIO_PAGE_PINNED);
 	do {
-		ret = __bio_iov_iter_get_pages(bio, iter);
+		nr_pages = bio_export_pages(bio, iov_iter_count(iter));
+		if (!nr_pages)
+			break;
+		ret = __bio_iov_iter_get_pages(bio, iter, nr_pages);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	return bio->bi_vcnt ? 0 : ret;
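
For experimentation, a user-space harness mirroring the patch's
bio_export_pages() arithmetic (illustrative only: export_pages() and its
flattened parameters are invented names, the page_total shortcut is
omitted, and 4 KiB pages with a power-of-two request_max_size are
assumed):

#include <stdio.h>

#define PAGE_SHIFT 12

static unsigned int export_pages(unsigned int bi_size, unsigned int free_vecs,
				 unsigned int max_vecs,
				 unsigned int request_max_size)
{
	unsigned int request_max_pages = request_max_size >> PAGE_SHIFT;
	unsigned int last = bi_size & (request_max_size - 1);
	unsigned int fill, nr = free_vecs;

	/* aligned bio that cannot fit one whole request: add nothing */
	if (last == 0 && max_vecs > request_max_pages && nr < request_max_pages)
		return 0;

	fill = (request_max_size - last) >> PAGE_SHIFT;
	if (!fill)
		fill = request_max_pages;
	if (nr > fill)
		nr -= nr & (request_max_pages - 1);
	return nr;
}

int main(void)
{
	/* 512 KiB max request (128 pages), 256-vec bio */
	printf("%u\n", export_pages(0, 256, 256, 512 << 10));		/* 256 */
	printf("%u\n", export_pages(512 << 10, 100, 256, 512 << 10));	/* 0 */
	return 0;
}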