| Message ID | 20220329094048.2107094-3-yukuai3@huawei.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | improve large random io for HDD |
On 3/29/22 3:40 AM, Yu Kuai wrote:
> Currently, the splited bio is handled first, and then continue to split
> the original bio. This patch tries to split the original bio thoroughly,
> so that it can be known in advance how many tags will be needed.

This makes the bio almost 10% bigger in size, which is NOT something
that is just done casually and without strong reasoning.

So please provide that, your commit message is completely missing
justification for why this change is useful or necessary. A good
commit always explains WHY the change needs to be made, yours is
simply stating WHAT is being done. The latter can be gleaned from
the code change anyway.
On Tue, Mar 29, 2022 at 05:40:44PM +0800, Yu Kuai wrote:
> Currently, the splited bio is handled first, and then continue to split
> the original bio. This patch tries to split the original bio thoroughly,
> so that it can be known in advance how many tags will be needed.

How do you avoid the deadlock risk with all the split bios being
submitted together?

But more importantly why does your use case even have splits that get
submitted together? Is this a case of Linus' stupidly low default
max_sectors when the hardware supports more, or is the hardware limited
to a low number of sectors per request? Or do we hit another reason
for the split?
On 3/29/22 7:32 AM, Christoph Hellwig wrote:
> On Tue, Mar 29, 2022 at 05:40:44PM +0800, Yu Kuai wrote:
>> Currently, the splited bio is handled first, and then continue to split
>> the original bio. This patch tries to split the original bio thoroughly,
>> so that it can be known in advance how many tags will be needed.
>
> How do you avoid the deadlock risk with all the split bios being
> submitted together?

That too

> But more importantly why does your use case even have splits that get
> submitted together? Is this a case of Linus' stupidly low default
> max_sectors when the hardware supports more, or is the hardware limited
> to a low number of sectors per request? Or do we hit another reason
> for the split?

See the posted use case, it's running 512kb ios on a 128kb device.
On Tue, Mar 29, 2022 at 08:35:29AM -0600, Jens Axboe wrote:
> > But more importantly why does your use case even have splits that get
> > submitted together? Is this a case of Linus' stupidly low default
> > max_sectors when the hardware supports more, or is the hardware limited
> > to a low number of sectors per request? Or do we hit another reason
> > for the split?
>
> See the posted use case, it's running 512kb ios on a 128kb device.

That is an awfully low limit these days. I'm really not sure we should
optimize the block layer for that.
On 3/29/22 8:40 AM, Christoph Hellwig wrote:
> On Tue, Mar 29, 2022 at 08:35:29AM -0600, Jens Axboe wrote:
>>> But more importantly why does your use case even have splits that get
>>> submitted together? Is this a case of Linus' stupidly low default
>>> max_sectors when the hardware supports more, or is the hardware limited
>>> to a low number of sectors per request? Or do we hit another reason
>>> for the split?
>>
>> See the posted use case, it's running 512kb ios on a 128kb device.
>
> That is an awfully low limit these days. I'm really not sure we should
> optimize the block layer for that.

That's exactly what my replies have been saying. I don't think this is
a relevant thing to optimize for.

Fixing fairness for wakeups seems useful, however.
On Tue, Mar 29, 2022 at 08:41:57AM -0600, Jens Axboe wrote:
> Fixing fairness for wakeups seems useful, however.
Agreed.
On 2022/03/29 20:46, Jens Axboe wrote:
> On 3/29/22 3:40 AM, Yu Kuai wrote:
>> Currently, the splited bio is handled first, and then continue to split
>> the original bio. This patch tries to split the original bio thoroughly,
>> so that it can be known in advance how many tags will be needed.
>
> This makes the bio almost 10% bigger in size, which is NOT something
> that is just done casually and without strong reasoning.

Hi,

Thanks for taking the time on this patchset. You're right that this
approach is not appropriate.

> So please provide that, your commit message is completely missing
> justification for why this change is useful or necessary. A good
> commit always explains WHY the change needs to be made, yours is
> simply stating WHAT is being done. The latter can be gleaned from
> the code change anyway.

Thanks for the guidance, I'll pay attention to that carefully in the
future.

For this patch, what I want to do is to find out how many tags the big
io will need before it gets its first tag, and use that information to
optimize the wakeup path. The problem with this patch is that adding
two fields to struct bio is a bad move. I was thinking that, for the
segment split case, I can get that information by calculating bio
segments / queue max segments.

Thanks,
Kuai
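As a rough illustration of that alternative (this is not from the posted series; the helper name and shape below are made up), the split count could be estimated from the queue limits alone, before any tag is taken:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/kernel.h>

/*
 * Hypothetical helper: estimate how many requests a bio would be split
 * into, using only queue_max_sectors() and queue_max_segments().  The
 * real split path also honours virt boundaries and max segment size,
 * so this is only a rough approximation of the "bio segments / queue
 * max segments" idea mentioned above.
 */
static unsigned int bio_estimate_nr_split(struct request_queue *q,
                                          struct bio *bio)
{
        unsigned int by_sectors = DIV_ROUND_UP(bio_sectors(bio),
                                               queue_max_sectors(q));
        unsigned int by_segments = DIV_ROUND_UP(bio_segments(bio),
                                                queue_max_segments(q));

        /* whichever limit is hit first determines the number of splits */
        return max(by_sectors, by_segments);
}

An estimate like this avoids growing struct bio, at the cost of possibly over- or under-counting for bios that split on boundary rules rather than on plain size or segment count.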
On 2022/03/29 22:41, Jens Axboe wrote:
> On 3/29/22 8:40 AM, Christoph Hellwig wrote:
>> On Tue, Mar 29, 2022 at 08:35:29AM -0600, Jens Axboe wrote:
>>>> But more importantly why does your use case even have splits that get
>>>> submitted together? Is this a case of Linus' stupidly low default
>>>> max_sectors when the hardware supports more, or is the hardware limited
>>>> to a low number of sectors per request? Or do we hit another reason
>>>> for the split?
>>>
>>> See the posted use case, it's running 512kb ios on a 128kb device.

Hi,

The problem was first found during a kernel upgrade (v3.10 to v4.18).
We maintain a series of io performance test suites, and one of the
tests is fio random rw with a large bs. In that environment,
'max_sectors_kb' is 256kb and the fio bs is 1m.

>>
>> That is an awfully low limit these days. I'm really not sure we should
>> optimize the block layer for that.
>
> That's exactly what my replies have been saying. I don't think this is
> a relevant thing to optimize for.

If the case of large ios getting submitted together is not a common
one (probably not, since it has been a long time without complaints),
I agree that we should not optimize the block layer for it.

Thanks,
Kuai

>
> Fixing fairness for wakeups seems useful, however.
>
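For a sense of the numbers in this report (the values are taken from the mail above; the snippet itself is only illustrative arithmetic), a 1 MiB io against a queue with max_sectors_kb=256 is split into four requests, each of which needs its own tag:

#include <stdio.h>

int main(void)
{
        unsigned int bs_kb = 1024;              /* fio bs=1m */
        unsigned int max_sectors_kb = 256;      /* queue limit in the report */
        unsigned int nr_split = (bs_kb + max_sectors_kb - 1) / max_sectors_kb;

        /* prints: each 1024 KiB io is split into 4 requests */
        printf("each %u KiB io is split into %u requests\n", bs_kb, nr_split);
        return 0;
}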
diff --git a/block/bio.c b/block/bio.c
index cdd7b2915c53..ac7ce8b4ba42 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -258,6 +258,8 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
 	bio->bi_status = 0;
+	bio->bi_nr_segs = 0;
+	bio->bi_nr_split = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
 	bio->bi_iter.bi_idx = 0;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7771dacc99cb..340860746cac 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -309,44 +309,85 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	return bio_split(bio, sectors, GFP_NOIO, bs);
 }
 
-/**
- * __blk_queue_split - split a bio and submit the second half
- * @q: [in] request_queue new bio is being queued at
- * @bio: [in, out] bio to be split
- * @nr_segs: [out] number of segments in the first bio
- *
- * Split a bio into two bios, chain the two bios, submit the second half and
- * store a pointer to the first half in *@bio. If the second bio is still too
- * big it will be split by a recursive call to this function. Since this
- * function may allocate a new bio from q->bio_split, it is the responsibility
- * of the caller to ensure that q->bio_split is only released after processing
- * of the split bio has finished.
- */
-void __blk_queue_split(struct request_queue *q, struct bio **bio,
-		unsigned int *nr_segs)
+static struct bio *blk_queue_split_one(struct request_queue *q, struct bio *bio)
 {
 	struct bio *split = NULL;
+	unsigned int nr_segs = 1;
 
-	switch (bio_op(*bio)) {
+	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
-		split = blk_bio_discard_split(q, *bio, &q->bio_split, nr_segs);
+		split = blk_bio_discard_split(q, bio, &q->bio_split, &nr_segs);
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split,
-				nr_segs);
+		split = blk_bio_write_zeroes_split(q, bio, &q->bio_split,
+				&nr_segs);
 		break;
 	default:
-		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
+		split = blk_bio_segment_split(q, bio, &q->bio_split, &nr_segs);
 		break;
 	}
 
 	if (split) {
 		/* there isn't chance to merge the splitted bio */
-		split->bi_opf |= REQ_NOMERGE;
+		split->bi_opf |= (REQ_NOMERGE | REQ_SPLIT);
+		split->bi_nr_segs = nr_segs;
+
+		bio_chain(split, bio);
+		trace_block_split(split, bio->bi_iter.bi_sector);
+	} else {
+		bio->bi_nr_segs = nr_segs;
+	}
+
+	return split;
+}
+
+static unsigned short blk_queue_split_all(struct request_queue *q,
+		struct bio *bio)
+{
+	struct bio *split = NULL;
+	struct bio *first = NULL;
+	unsigned short nr_split = 1;
+	unsigned short total;
 
-		bio_chain(split, *bio);
-		trace_block_split(split, (*bio)->bi_iter.bi_sector);
+	if (!current->bio_list)
+		return 1;
+
+	while ((split = blk_queue_split_one(q, bio))) {
+		if (!first)
+			first = split;
+
+		nr_split++;
+		submit_bio_noacct(split);
+	}
+
+	total = nr_split;
+	while (first) {
+		first->bi_nr_split = --total;
+		first = first->bi_next;
+	}
+
+	return nr_split;
+}
+
+/**
+ * __blk_queue_split - split a bio, store the first and submit others
+ * @q: [in] request_queue new bio is being queued at
+ * @bio: [in, out] bio to be split
+ *
+ * Split a bio into several bios, chain all the bios, store a pointer to the
+ * first in *@bio, and submit others. Since this function may allocate a new
+ * bio from q->bio_split, it is the responsibility of the caller to ensure
+ * that q->bio_split is only released after processing of the split bio has
+ * finished.
+ */
+void __blk_queue_split(struct request_queue *q, struct bio **bio)
+{
+	struct bio *split = blk_queue_split_one(q, *bio);
+
+	if (split) {
+		split->bi_nr_split = blk_queue_split_all(q, *bio);
+		(*bio)->bi_opf |= REQ_SPLIT;
 		submit_bio_noacct(*bio);
 		*bio = split;
 	}
@@ -365,10 +406,9 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
 void blk_queue_split(struct bio **bio)
 {
 	struct request_queue *q = bdev_get_queue((*bio)->bi_bdev);
-	unsigned int nr_segs;
 
 	if (blk_may_split(q, *bio))
-		__blk_queue_split(q, bio, &nr_segs);
+		__blk_queue_split(q, bio);
 }
 EXPORT_SYMBOL(blk_queue_split);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e6f24fa4a4c2..cad207d2079e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2817,8 +2817,11 @@ void blk_mq_submit_bio(struct bio *bio)
 	blk_status_t ret;
 
 	blk_queue_bounce(q, &bio);
-	if (blk_may_split(q, bio))
-		__blk_queue_split(q, &bio, &nr_segs);
+	if (blk_may_split(q, bio)) {
+		if (!(bio->bi_opf & REQ_SPLIT))
+			__blk_queue_split(q, &bio);
+		nr_segs = bio->bi_nr_segs;
+	}
 
 	if (!bio_integrity_prep(bio))
 		return;
diff --git a/block/blk.h b/block/blk.h
index 8ccbc6e07636..cd478187b525 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -303,8 +303,7 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 		bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset >
 			PAGE_SIZE;
 }
-void __blk_queue_split(struct request_queue *q, struct bio **bio,
-		unsigned int *nr_segs);
+void __blk_queue_split(struct request_queue *q, struct bio **bio);
 int ll_back_merge_fn(struct request *req, struct bio *bio,
 		unsigned int nr_segs);
 bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dd0763a1c674..702f6b83dc88 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -250,6 +250,8 @@ struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
+	unsigned int		bi_nr_segs;
+	unsigned int		bi_nr_split;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 
@@ -416,6 +418,7 @@ enum req_flag_bits {
 	/* for driver use */
 	__REQ_DRV,
 	__REQ_SWAP,		/* swapping request. */
+	__REQ_SPLIT,		/* io is split */
 
 	__REQ_NR_BITS,		/* stops here */
 };
@@ -440,6 +443,7 @@ enum req_flag_bits {
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
+#define REQ_SPLIT		(1ULL << __REQ_SPLIT)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
Currently, the split bio is handled first, and then splitting of the
original bio continues. This patch splits the original bio thoroughly up
front, so that it can be known in advance how many tags will be needed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bio.c               |  2 +
 block/blk-merge.c         | 90 ++++++++++++++++++++++++++++-----------
 block/blk-mq.c            |  7 ++-
 block/blk.h               |  3 +-
 include/linux/blk_types.h |  4 ++
 5 files changed, 77 insertions(+), 29 deletions(-)