block: loose check on sg gap

Message ID 1481971751-4016-1-git-send-email-ming.lei@canonical.com (mailing list archive)
State New, archived

Commit Message

Ming Lei Dec. 17, 2016, 10:49 a.m. UTC
If the last bvec of the first bio and the first bvec of the next
bio are physically contiguous, and the latter can be merged into
the last segment of the first bio, the two bios should not be
considered to violate the sg gap (or virt boundary) limit.

Both Vitaly and Dexuan reported that lots of unmergeable small bios
are observed when running mkfs on Hyper-V virtual storage, and that
performance becomes quite low; this patch fixes that performance
issue.

The same issue should also exist on NVMe, since it sets a virt boundary too.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/linux/blkdev.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Jens Axboe Dec. 17, 2016, 4:49 p.m. UTC | #1
On 12/17/2016 03:49 AM, Ming Lei wrote:
> [...]

It looks pretty reasonable to me. I'll queue it up for some testing;
changes like this always make me a little nervous.
Ming Lei Dec. 20, 2016, 2:07 a.m. UTC | #2
On Sun, Dec 18, 2016 at 12:49 AM, Jens Axboe <axboe@fb.com> wrote:
> On 12/17/2016 03:49 AM, Ming Lei wrote:
>> [...]
>
> It looks pretty reasonable to me. I'll queue it up for some testing,
> changes like this always make me a little nervous.

Understood.

But given that it is still early in the 4.10 cycle, it seems fine to expose
it now; we should have enough time to fix any regressions that turn up.

BTW, it passes my xfstests (ext4) runs over SATA/NVMe.

Thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe Dec. 20, 2016, 2:31 a.m. UTC | #3
On 12/19/2016 07:07 PM, Ming Lei wrote:
> On Sun, Dec 18, 2016 at 12:49 AM, Jens Axboe <axboe@fb.com> wrote:
>> [...]
>
> But given it is still in early stage of 4.10 cycle, seems fine to expose
> it now, and we should have enough time to fix it if there might be
> regressions.
> 
> BTW, it passes my xfstest(ext4) over sata/NVMe.

It's been fine here in testing, too. I'm not worried about performance
regressions; those we can always fix. Merging makes me worry about
corruption, and those regressions are much worse.

Any reason we need to rush this? I'd be more comfortable pushing this to
4.11, unless there are strong reasons this should make 4.10.
Dexuan Cui Dec. 20, 2016, 3:41 a.m. UTC | #4
> From: Jens Axboe [mailto:axboe@fb.com]
> Sent: Tuesday, December 20, 2016 10:31
> Subject: Re: [PATCH] block: loose check on sg gap
> [...]
> Any reason we need to rush this? I'd be more comfortable pushing this to
> 4.11, unless there are strong reasons this should make 4.10.

Hi Jens,

As far as I know, the patch is important to popular Linux distros,
e.g. at least Ubuntu 14.04.5, 16.x and RHEL 7.3, when they run on
Hyper-V/Azure, because they can suffer from pretty bad throughput/latency
in some cases; e.g. mkfs.ext4 on a 100GB partition can take 8 minutes, but
with the patch it takes only 1 second.

Thanks,
-- Dexuan
Dexuan Cui Jan. 11, 2017, 5:10 a.m. UTC | #5
> From: Dexuan Cui
> Sent: Tuesday, December 20, 2016 11:41
> Subject: RE: [PATCH] block: loose check on sg gap
> [...]
> As far as I know, the patch is important to popular Linux distros,
> e.g. at least Ubuntu 14.04.5, 16.x and RHEL 7.3, when they run on
> Hyper-V/Azure, because they can suffer from a pretty bad throughput/latency
> in some cases, e.g. mkfs.ext4 for a 100GB partition can take 8 minutes, but
> with the patch, it only takes 1 second.

Hi Ming, Jens,
Did you find any issues in later testing with the patch?

Would it be possible to have it in 4.10, considering the above impact?

Is it on some temporary branch of linux-block.git? It doesn't look like it.

Thanks,
-- Dexuan
Ming Lei Jan. 12, 2017, 2:54 a.m. UTC | #6
On Wed, Jan 11, 2017 at 1:10 PM, Dexuan Cui <decui@microsoft.com> wrote:
>> [...]
>
> Hi Ming, Jens,
> Did you find any issue later when testing with the patch?
>
> May I know if it's possible to have it in 4.10 considering the above impact?
>
> Is it on some temporary branch of linux-block.git? Looks not.

Dexuan, Jens has said that this patch may land in v4.11, so let's just wait a
release and let it get exposure to more testing.

Thanks,
Ming
Dexuan Cui Jan. 12, 2017, 3:14 a.m. UTC | #7
> From: Ming Lei [mailto:ming.lei@canonical.com]
> Sent: Thursday, January 12, 2017 10:54
> Subject: Re: [PATCH] block: loose check on sg gap
> [...]
> Dexuan, Jens has said that this patch may land v4.11, so just wait a release
> and let it expose into more tests.

Thanks for the reply!

Sorry, I didn't mean to be pushy -- I just wanted to get a better idea of the
status of the patch, since I'm unfamiliar with the linux-block repo. :-)

BTW, I've been using the patch for ~1 month and haven't hit any issues.

Thanks,
-- Dexuan

Patch

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 286b2a264383..1ce26e771bcc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1608,6 +1608,25 @@  static inline bool bvec_gap_to_prev(struct request_queue *q,
 	return __bvec_gap_to_prev(q, bprv, offset);
 }
 
+/*
+ * Check if the two bvecs from two bios can be merged to one segment.
+ * If yes, no need to check gap between the two bios since the 1st bio
+ * and the 1st bvec in the 2nd bio can be handled in one segment.
+ */
+static inline bool bios_segs_mergeable(struct request_queue *q,
+		struct bio *prev, struct bio_vec *prev_last_bv,
+		struct bio_vec *next_first_bv)
+{
+	if (!BIOVEC_PHYS_MERGEABLE(prev_last_bv, next_first_bv))
+		return false;
+	if (!BIOVEC_SEG_BOUNDARY(q, prev_last_bv, next_first_bv))
+		return false;
+	if (prev->bi_seg_back_size + next_first_bv->bv_len >
+			queue_max_segment_size(q))
+		return false;
+	return true;
+}
+
 static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
 			 struct bio *next)
 {
@@ -1617,7 +1636,8 @@  static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
 		bio_get_last_bvec(prev, &pb);
 		bio_get_first_bvec(next, &nb);
 
-		return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
+		if (!bios_segs_mergeable(q, prev, &pb, &nb))
+			return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
 	}
 
 	return false;