
[-next,RFC,2/6] block: refactor to split bio thoroughly

Message ID 20220329094048.2107094-3-yukuai3@huawei.com (mailing list archive)
State New, archived
Series: improve large random io for HDD

Commit Message

Yu Kuai March 29, 2022, 9:40 a.m. UTC
Currently, the split bio is handled first, and only then does splitting
of the original bio continue. This patch splits the original bio
completely up front, so that the number of tags needed is known in
advance.
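
In outline, the reworked __blk_queue_split() becomes (simplified from
the diff below; the discard/write-zeroes cases and the nr_segs
bookkeeping are elided):

	split = blk_queue_split_one(q, *bio);	/* peel off the first fragment */
	if (split) {
		/* split the remainder completely; every further fragment
		 * is submitted and counted, so bi_nr_split is known before
		 * the first tag is requested */
		split->bi_nr_split = blk_queue_split_all(q, *bio);
		(*bio)->bi_opf |= REQ_SPLIT;	/* don't split again on re-entry */
		submit_bio_noacct(*bio);
		*bio = split;
	}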

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bio.c               |  2 +
 block/blk-merge.c         | 90 ++++++++++++++++++++++++++++-----------
 block/blk-mq.c            |  7 ++-
 block/blk.h               |  3 +-
 include/linux/blk_types.h |  4 ++
 5 files changed, 77 insertions(+), 29 deletions(-)

Comments

Jens Axboe March 29, 2022, 12:46 p.m. UTC | #1
On 3/29/22 3:40 AM, Yu Kuai wrote:
> Currently, the split bio is handled first, and only then does splitting
> of the original bio continue. This patch splits the original bio
> completely up front, so that the number of tags needed is known in
> advance.

This makes the bio almost 10% bigger in size, which is NOT something
that is just done casually and without strong reasoning.

So please provide that; your commit message is completely missing
justification for why this change is useful or necessary. A good
commit message always explains WHY the change needs to be made;
yours simply states WHAT is being done. The latter can be gleaned
from the code change anyway.
Christoph Hellwig March 29, 2022, 1:32 p.m. UTC | #2
On Tue, Mar 29, 2022 at 05:40:44PM +0800, Yu Kuai wrote:
> Currently, the split bio is handled first, and only then does splitting
> of the original bio continue. This patch splits the original bio
> completely up front, so that the number of tags needed is known in
> advance.

How do you avoid the deadlock risk with all the split bios being
submitted together?

But more importantly why does your use case even have splits that get
submitted together?  Is this a case of Linus' stupidly low default
max_sectors when the hardware supports more, or is the hardware limited
to a low number of sectors per request?  Or do we hit another reason
for the split?
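
(The deadlock risk here is presumably the usual q->bio_split mempool
dependency; a sketch of the hazard, annotating the submission loop from
the patch below with the first/nr_split bookkeeping elided:)

	/* each iteration allocates a fragment from the q->bio_split mempool */
	while ((split = blk_queue_split_one(q, bio))) {
		/* with current->bio_list set, submit_bio_noacct() only
		 * queues the fragment; nothing is dispatched until the
		 * outer submission unwinds.  If the mempool runs dry, the
		 * next allocation sleeps waiting for a completion that
		 * can never come. */
		submit_bio_noacct(split);
	}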
Jens Axboe March 29, 2022, 2:35 p.m. UTC | #3
On 3/29/22 7:32 AM, Christoph Hellwig wrote:
> On Tue, Mar 29, 2022 at 05:40:44PM +0800, Yu Kuai wrote:
>> Currently, the split bio is handled first, and only then does splitting
>> of the original bio continue. This patch splits the original bio
>> completely up front, so that the number of tags needed is known in
>> advance.
> 
> How do you avoid the deadlock risk with all the split bios being
> submitted together?

That too

> But more importantly why does your use case even have splits that get
> submitted together?  Is this a case of Linus' stupidly low default
> max_sectors when the hardware supports more, or is the hardware limited
> to a low number of sectors per request?  Or do we hit another reason
> for the split?

See the posted use case, it's running 512kb ios on a 128kb device.
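
In other words, each 512kb bio is split into four 128kb requests, so a
single IO needs four tags up front.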
Christoph Hellwig March 29, 2022, 2:40 p.m. UTC | #4
On Tue, Mar 29, 2022 at 08:35:29AM -0600, Jens Axboe wrote:
> > But more importantly why does your use case even have splits that get
> > submitted together?  Is this a case of Linus' stupidly low default
> > max_sectors when the hardware supports more, or is the hardware limited
> > to a low number of sectors per request?  Or do we hit another reason
> > for the split?
> 
> See the posted use case, it's running 512kb ios on a 128kb device.

That is an awfully low limit these days.  I'm really not sure we should
optimize the block layer for that.
Jens Axboe March 29, 2022, 2:41 p.m. UTC | #5
On 3/29/22 8:40 AM, Christoph Hellwig wrote:
> On Tue, Mar 29, 2022 at 08:35:29AM -0600, Jens Axboe wrote:
>>> But more importantly why does your use case even have splits that get
>>> submitted together?  Is this a case of Linus' stupidly low default
>>> max_sectors when the hardware supports more, or is the hardware limited
>>> to a low number of sectors per request?  Or do we hit another reason
>>> for the split?
>>
>> See the posted use case, it's running 512kb ios on a 128kb device.
> 
> That is an awfully low limit these days.  I'm really not sure we should
> optimize the block layer for that.

That's exactly what my replies have been saying. I don't think this is
a relevant thing to optimize for.

Fixing fairness for wakeups seems useful, however.
Christoph Hellwig March 29, 2022, 2:42 p.m. UTC | #6
On Tue, Mar 29, 2022 at 08:41:57AM -0600, Jens Axboe wrote:
> Fixing fairness for wakeups seems useful, however.

Agreed.
Yu Kuai March 30, 2022, 1:35 a.m. UTC | #7
On 2022/03/29 20:46, Jens Axboe wrote:
> On 3/29/22 3:40 AM, Yu Kuai wrote:
>> Currently, the split bio is handled first, and only then does splitting
>> of the original bio continue. This patch splits the original bio
>> completely up front, so that the number of tags needed is known in
>> advance.
> 
> This makes the bio almost 10% bigger in size, which is NOT something
> that is just done casually and without strong reasoning.
Hi,

Thanks for taking the time on this patchset. You're right that this
approach is not appropriate.
> 
> So please provide that; your commit message is completely missing
> justification for why this change is useful or necessary. A good
> commit message always explains WHY the change needs to be made;
> yours simply states WHAT is being done. The latter can be gleaned
> from the code change anyway.
> 
Thanks for the guidance; I'll pay careful attention to that in the future.

For this patch, what I want to do is to learn how many tags the big IO
will need before getting the first tag, and use that information to
optimize the wake-up path. The problem with this patch is that adding
two fields to struct bio is a bad move.

I was thinking that, for segment-limit splits, I can get that
information by calculating bio segments / queue max segments.
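
Something like the following (a sketch only; blk_estimate_nr_split() is
a made-up helper here, and splits driven by max_sectors would still need
separate handling):

	static unsigned int blk_estimate_nr_split(struct request_queue *q,
						  struct bio *bio)
	{
		/* upper bound on fragments from the segment limit alone */
		return DIV_ROUND_UP(bio_segments(bio), queue_max_segments(q));
	}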

Thanks,
Kuai
Yu Kuai March 30, 2022, 1:54 a.m. UTC | #8
On 2022/03/29 22:41, Jens Axboe wrote:
> On 3/29/22 8:40 AM, Christoph Hellwig wrote:
>> On Tue, Mar 29, 2022 at 08:35:29AM -0600, Jens Axboe wrote:
>>>> But more importantly why does your use case even have splits that get
>>>> submitted together?  Is this a case of Linus' stupidly low default
>>>> max_sectors when the hardware supports more, or is the hardware limited
>>>> to a low number of sectors per request?  Or do we hit another reason
>>>> for the split?
>>>
>>> See the posted use case, it's running 512kb ios on a 128kb device.
Hi,

The problem was first found during a kernel upgrade (v3.10 to v4.18).
We maintain a set of IO performance test suites, and one of the tests
is fio random rw with a large bs. In that environment 'max_sectors_kb'
is 256, and the fio bs is 1M.
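
The job is roughly the following (the exact job file isn't posted here,
so the device name and iodepth are placeholders):

	fio --name=randrw --filename=/dev/sdX --direct=1 --ioengine=libaio \
	    --rw=randrw --bs=1M --iodepth=32 --runtime=60 --time_based
	# with /sys/block/sdX/queue/max_sectors_kb = 256, each 1 MB bio
	# splits into four 256 KB requests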
>>
>> That is an awfully low limit these days.  I'm really not sure we should
>> optimize the block layer for that.
> 
> That's exactly what my replies have been saying. I don't think this is
> a relevant thing to optimize for.

If large IOs being split and submitted together is not a common case
(probably not, since it's been a long time without complaints), I agree
that we should not optimize the block layer for that.

Thanks,
Kuai
> 
> Fixing fairness for wakeups seems useful, however.
>

Patch

diff --git a/block/bio.c b/block/bio.c
index cdd7b2915c53..ac7ce8b4ba42 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -258,6 +258,8 @@  void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
 	bio->bi_status = 0;
+	bio->bi_nr_segs = 0;
+	bio->bi_nr_split = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
 	bio->bi_iter.bi_idx = 0;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7771dacc99cb..340860746cac 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -309,44 +309,85 @@  static struct bio *blk_bio_segment_split(struct request_queue *q,
 	return bio_split(bio, sectors, GFP_NOIO, bs);
 }
 
-/**
- * __blk_queue_split - split a bio and submit the second half
- * @q:       [in] request_queue new bio is being queued at
- * @bio:     [in, out] bio to be split
- * @nr_segs: [out] number of segments in the first bio
- *
- * Split a bio into two bios, chain the two bios, submit the second half and
- * store a pointer to the first half in *@bio. If the second bio is still too
- * big it will be split by a recursive call to this function. Since this
- * function may allocate a new bio from q->bio_split, it is the responsibility
- * of the caller to ensure that q->bio_split is only released after processing
- * of the split bio has finished.
- */
-void __blk_queue_split(struct request_queue *q, struct bio **bio,
-		       unsigned int *nr_segs)
+static struct bio *blk_queue_split_one(struct request_queue *q, struct bio *bio)
 {
 	struct bio *split = NULL;
+	unsigned int nr_segs = 1;
 
-	switch (bio_op(*bio)) {
+	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
-		split = blk_bio_discard_split(q, *bio, &q->bio_split, nr_segs);
+		split = blk_bio_discard_split(q, bio, &q->bio_split, &nr_segs);
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split,
-				nr_segs);
+		split = blk_bio_write_zeroes_split(q, bio, &q->bio_split,
+				&nr_segs);
 		break;
 	default:
-		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
+		split = blk_bio_segment_split(q, bio, &q->bio_split, &nr_segs);
 		break;
 	}
 
 	if (split) {
 		/* there isn't chance to merge the splitted bio */
-		split->bi_opf |= REQ_NOMERGE;
+		split->bi_opf |= (REQ_NOMERGE | REQ_SPLIT);
+		split->bi_nr_segs = nr_segs;
+
+		bio_chain(split, bio);
+		trace_block_split(split, bio->bi_iter.bi_sector);
+	} else {
+		bio->bi_nr_segs = nr_segs;
+	}
+
+	return split;
+}
+
+static unsigned short blk_queue_split_all(struct request_queue *q,
+					  struct bio *bio)
+{
+	struct bio *split = NULL;
+	struct bio *first = NULL;
+	unsigned short nr_split = 1;
+	unsigned short total;
 
-		bio_chain(split, *bio);
-		trace_block_split(split, (*bio)->bi_iter.bi_sector);
+	if (!current->bio_list)
+		return 1;
+
+	while ((split = blk_queue_split_one(q, bio))) {
+		if (!first)
+			first = split;
+
+		nr_split++;
+		submit_bio_noacct(split);
+	}
+
+	total = nr_split;
+	while (first) {
+		first->bi_nr_split = --total;
+		first = first->bi_next;
+	}
+
+	return nr_split;
+}
+
+/**
+ * __blk_queue_split - split a bio, store the first and submit others
+ * @q:       [in] request_queue new bio is being queued at
+ * @bio:     [in, out] bio to be split
+ *
+ * Split a bio into several bios, chain all the bios, store a pointer to the
+ * first in *@bio, and submit others. Since this function may allocate a new
+ * bio from q->bio_split, it is the responsibility of the caller to ensure
+ * that q->bio_split is only released after processing of the split bio has
+ * finished.
+ */
+void __blk_queue_split(struct request_queue *q, struct bio **bio)
+{
+	struct bio *split = blk_queue_split_one(q, *bio);
+
+	if (split) {
+		split->bi_nr_split = blk_queue_split_all(q, *bio);
+		(*bio)->bi_opf |= REQ_SPLIT;
 		submit_bio_noacct(*bio);
 		*bio = split;
 	}
@@ -365,10 +406,9 @@  void __blk_queue_split(struct request_queue *q, struct bio **bio,
 void blk_queue_split(struct bio **bio)
 {
 	struct request_queue *q = bdev_get_queue((*bio)->bi_bdev);
-	unsigned int nr_segs;
 
 	if (blk_may_split(q, *bio))
-		__blk_queue_split(q, bio, &nr_segs);
+		__blk_queue_split(q, bio);
 }
 EXPORT_SYMBOL(blk_queue_split);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e6f24fa4a4c2..cad207d2079e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2817,8 +2817,11 @@  void blk_mq_submit_bio(struct bio *bio)
 	blk_status_t ret;
 
 	blk_queue_bounce(q, &bio);
-	if (blk_may_split(q, bio))
-		__blk_queue_split(q, &bio, &nr_segs);
+	if (blk_may_split(q, bio)) {
+		if (!(bio->bi_opf & REQ_SPLIT))
+			__blk_queue_split(q, &bio);
+		nr_segs = bio->bi_nr_segs;
+	}
 
 	if (!bio_integrity_prep(bio))
 		return;
diff --git a/block/blk.h b/block/blk.h
index 8ccbc6e07636..cd478187b525 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -303,8 +303,7 @@  static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 		bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
 }
 
-void __blk_queue_split(struct request_queue *q, struct bio **bio,
-			unsigned int *nr_segs);
+void __blk_queue_split(struct request_queue *q, struct bio **bio);
 int ll_back_merge_fn(struct request *req, struct bio *bio,
 		unsigned int nr_segs);
 bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dd0763a1c674..702f6b83dc88 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -250,6 +250,8 @@  struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
+	unsigned int		bi_nr_segs;
+	unsigned int		bi_nr_split;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 
@@ -416,6 +418,7 @@  enum req_flag_bits {
 	/* for driver use */
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
+	__REQ_SPLIT,		/* io is split */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -440,6 +443,7 @@  enum req_flag_bits {
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
+#define REQ_SPLIT		(1ULL << __REQ_SPLIT)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)