[RFC] block: fix bio merge checks when virt_boundary is set

Message ID CACVXFVO37O2Yp60E82U_YWCe2yUqsEn1ojMb6kpTDmhBk94dQA@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Ming Lei March 30, 2016, 1:07 p.m. UTC
On Fri, Mar 18, 2016 at 10:59 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Fri, Mar 18, 2016 at 12:39 AM, Keith Busch <keith.busch@intel.com> wrote:
>> On Thu, Mar 17, 2016 at 12:20:28PM +0100, Vitaly Kuznetsov wrote:
>>> Keith Busch <keith.busch@intel.com> writes:
>>> > been combined. In any case, I think you can get what you're after just
>>> > by moving the gap check after BIOVEC_PHYS_MERGEABLE. Does the following
>>> > look ok to you?
>>> >
>>>
>>> Thanks, it does.
>>
>> Cool, thanks for confirming.
>>
>>> Will you send it or would you like me to do that with your Suggested-by?
>>
>> I'm not confident yet this doesn't break anything, particularly since
>> we moved the gap check after the length check. Just wanted to confirm
>> the concept addressed your concern, but still need to take a closer look
>> and test before submitting.
>
> IMO, the change to blk_bio_segment_split() is correct, because this is
> actually an sg gap, and the check should be done between segments rather
> than between bvecs. So it is reasonable to move the check to just before a
> new segment is populated.

Thinking about the 1st part of the change further: it looks correct in
concept, but wrong given the current implementation. Because of bio/request
merging, blk_rq_map_sg() may in theory end a segment at any bvec, so I guess
that is why every bvec after the first needs the check to make sure there is
no sg gap. That looks like a very crazy limit, :-)

>
> But for the 2nd change in bio_will_gap(), which should fix Vitaly's problem, I
> am still not sure it is completely correct. bio_will_gap() is used to check
> whether two bios may be merged. Suppose two bios are physically contiguous:
> the last bvec of the 1st bio and the first bvec of the 2nd bio still might
> not end up in the same segment because of the segment size limit.

How about the attached patch?


>
> The root cause might be from blkdev_writepage(), and I guess these small
> bios are from there.
>
> thanks,
> Ming Lei

Comments

Vitaly Kuznetsov April 20, 2016, 1:48 p.m. UTC | #1
Ming Lei <tom.leiming@gmail.com> writes:

> How about the attached patch?
>

I just wanted to revive the discussion as the issue persists. I
re-tested your patch against 4.6-rc4 and it effectively solves the
issue.

pre-patch:
# time mkfs.ntfs /dev/sdb1
Cluster size has been automatically set to 4096 bytes.
Initializing device with zeroes: 100% - Done.
Creating NTFS volume structures.
mkntfs completed successfully. Have a nice day.

real	8m10.977s
user	0m0.115s
sys	0m12.672s

post-patch:
# time mkfs.ntfs /dev/sdb1
Cluster size has been automatically set to 4096 bytes.
Initializing device with zeroes: 100% - Done.
Creating NTFS volume structures.
mkntfs completed successfully. Have a nice day.

real	0m42.430s
user	0m0.171s
sys	0m7.675s

Will you send this patch? Please let me know if I can further
assist. Thanks!

Dexuan Cui Dec. 15, 2016, 2:03 p.m. UTC | #2
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Vitaly Kuznetsov
> Sent: Wednesday, April 20, 2016 21:48
> To: Ming Lei <tom.leiming@gmail.com>
> Cc: Keith Busch <keith.busch@intel.com>; linux-block@vger.kernel.org; Linux
> Kernel Mailing List <linux-kernel@vger.kernel.org>; Jens Axboe
> <axboe@kernel.dk>; Dan Williams <dan.j.williams@intel.com>; Martin K.
> Petersen <martin.petersen@oracle.com>; Sagi Grimberg
> <sagig@mellanox.com>; Mike Snitzer <snitzer@redhat.com>; KY Srinivasan
> <kys@microsoft.com>; Cathy Avery <cavery@redhat.com>
> Subject: Re: [PATCH RFC] block: fix bio merge checks when virt_boundary is set
> 
> Will you send this patch? Please let me know if I can further
> assist. Thanks!
> 
> --
>   Vitaly

Hi, I'm reviving the thread because I'm suffering from exactly the same issue.
This is the thread I created today: 
"Big I/O requests are split into small ones due to unaligned ext4 partition boundary?"
http://marc.info/?t=148180346100002&r=1&w=2

Ming's patch can fix this issue for me. 

Stable 4.4 and later are affected too.
I didn't check 4.3.x kernels, but for Linux guests on Hyper-V, any kernel with the
patch "storvsc: get rid of bounce buffer"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=81988a0e6b031bc80da15257201810ddcf989e64
should be affected.

Thanks,
-- Dexuan

Patch

From 5f60ae1d686f025445fdf09f546d4d055d255ce9 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Fri, 18 Mar 2016 12:41:53 +0800
Subject: [PATCH] block: loose check on sg gap

If the last bvec of the 1st bio and the 1st bvec of the next
bio are physically contiguous, and the latter can be merged
into the last segment of the 1st bio, we should consider that
they don't violate the sg gap (or virt boundary) limit.

Vitaly reported that lots of unmergeable small bios are observed
when running mkfs.ntfs on Hyper-V virtual storage, and performance
becomes quite low, so this patch is proposed to fix the
performance issue.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/linux/blkdev.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7e5d7e0..3962527 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1394,6 +1394,25 @@  static inline bool bvec_gap_to_prev(struct request_queue *q,
 	return __bvec_gap_to_prev(q, bprv, offset);
 }
 
+/*
+ * Check if the two bvecs from two bios can be merged to one segment.
+ * If yes, no need to check gap between the two bios since the 1st bio
+ * and the 1st bvec in the 2nd bio can be handled in one segment.
+ */
+static inline bool bios_segs_mergeable(struct request_queue *q,
+		struct bio *prev, struct bio_vec *prev_last_bv,
+		struct bio_vec *next_first_bv)
+{
+	if (!BIOVEC_PHYS_MERGEABLE(prev_last_bv, next_first_bv))
+		return false;
+	if (!BIOVEC_SEG_BOUNDARY(q, prev_last_bv, next_first_bv))
+		return false;
+	if (prev->bi_seg_back_size + next_first_bv->bv_len >
+			queue_max_segment_size(q))
+		return false;
+	return true;
+}
+
 static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
 			 struct bio *next)
 {
@@ -1403,7 +1422,8 @@  static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
 		bio_get_last_bvec(prev, &pb);
 		bio_get_first_bvec(next, &nb);
 
-		return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
+		if (!bios_segs_mergeable(q, prev, &pb, &nb))
+			return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
 	}
 
 	return false;
-- 
1.9.1