diff mbox

[1/4] block: bio: introduce helpers to get the 1st and last bvec

Message ID 20160215174212.648098b0@tom-T450 (mailing list archive)
State New, archived
Headers show

Commit Message

Ming Lei Feb. 15, 2016, 9:42 a.m. UTC
Hi,

On Mon, 15 Feb 2016 10:19:49 +0200
Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:

> 
> > +/*
> > + * bio_get_last_bvec() is introduced to get the last bvec of one
> > + * bio for bio_will_gap().
> > + *
> > + * TODO: make it more efficient.
> > + */
> > +static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
> > +{
> > +	struct bvec_iter iter;
> > +
> > +	bio_for_each_segment(*bv, bio, iter)
> > +		if (bv->bv_len == iter.bi_size)
> > +			break;
> > +}
> 
> This helper is used for each req/bio once or more. I'd say

No, the helper is only used for the non-splitted BIO, and all
splitted BIO is marked as non-merge.

> it's critical to make it efficient and not settle for
> a quick bail for drivers that don't have a virt_boundary
> like you did in patch #2.

Cc Kent and Keith.

Follows another version which should be more efficient.
Kent and Keith, I appreciate much if you may give a review on it.



> 
> However, given that it's a regression bug fix I'm not sure it's the best
> idea to add logic here.

But the issue is obviously in bio_will_gap(), isn't it?

Simply reverting 52cc6eead9095(block: blk-merge: fast-clone bio when splitting rw bios)
still might cause performance regression too.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sagi Grimberg Feb. 15, 2016, 8:06 p.m. UTC | #1
> Cc Kent and Keith.
>
> Follows another version which should be more efficient.
> Kent and Keith, I appreciate much if you may give a review on it.
>
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 56d2db8..ef45fec 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -278,11 +278,21 @@ static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
>    */
>   static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
>   {
> -	struct bvec_iter iter;
> +	struct bvec_iter iter = bio->bi_iter;
> +	int idx;
> +
> +	bio_advance_iter(bio, &iter, iter.bi_size);
> +
> +	WARN_ON(!iter.bi_idx && !iter.bi_bvec_done);
> +
> +	if (!iter.bi_bvec_done)
> +		idx = iter.bi_idx - 1;
> +	else	/* in the middle of bvec */
> +		idx = iter.bi_idx;
>
> -	bio_for_each_segment(*bv, bio, iter)
> -		if (bv->bv_len == iter.bi_size)
> -			break;
> +	*bv = bio->bi_io_vec[idx];
> +	if (iter.bi_bvec_done)
> +		bv->bv_len = iter.bi_bvec_done;
>   }
>
>   /*
>

This looks good too.

>
>>
>> However, given that it's a regression bug fix I'm not sure it's the best
>> idea to add logic here.
>
> But the issue is obviously in bio_will_gap(), isn't it?
>
> Simply reverting 52cc6eead9095(block: blk-merge: fast-clone bio when splitting rw bios)
> still might cause performance regression too.

That's correct. I assume that the bio splitting code affects
specific I/O pattern (gappy), however bio_will_gap is also tested
for bio merges (even if the bios won't merge eventually). This means
that each merge check will invoke bio_advance_iter() which is something
I'd like to avoid...
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Feb. 16, 2016, 1:03 p.m. UTC | #2
On Tue, Feb 16, 2016 at 4:06 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>
>> Cc Kent and Keith.
>>
>> Follows another version which should be more efficient.
>> Kent and Keith, I appreciate much if you may give a review on it.
>>
>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>> index 56d2db8..ef45fec 100644
>> --- a/include/linux/bio.h
>> +++ b/include/linux/bio.h
>> @@ -278,11 +278,21 @@ static inline void bio_get_first_bvec(struct bio
>> *bio, struct bio_vec *bv)
>>    */
>>   static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec
>> *bv)
>>   {
>> -       struct bvec_iter iter;
>> +       struct bvec_iter iter = bio->bi_iter;
>> +       int idx;
>> +
>> +       bio_advance_iter(bio, &iter, iter.bi_size);
>> +
>> +       WARN_ON(!iter.bi_idx && !iter.bi_bvec_done);
>> +
>> +       if (!iter.bi_bvec_done)
>> +               idx = iter.bi_idx - 1;
>> +       else    /* in the middle of bvec */
>> +               idx = iter.bi_idx;
>>
>> -       bio_for_each_segment(*bv, bio, iter)
>> -               if (bv->bv_len == iter.bi_size)
>> -                       break;
>> +       *bv = bio->bi_io_vec[idx];
>> +       if (iter.bi_bvec_done)
>> +               bv->bv_len = iter.bi_bvec_done;
>>   }
>>
>>   /*
>>
>
> This looks good too.
>
>>
>>>
>>> However, given that it's a regression bug fix I'm not sure it's the best
>>> idea to add logic here.
>>
>>
>> But the issue is obviously in bio_will_gap(), isn't it?
>>
>> Simply reverting 52cc6eead9095(block: blk-merge: fast-clone bio when
>> splitting rw bios)
>> still might cause performance regression too.
>
>
> That's correct. I assume that the bio splitting code affects
> specific I/O pattern (gappy), however bio_will_gap is also tested

I don't understand why bio splitting affects specific I/O pattern, could you
explain a bit?

From commit b54ffb73c(block: remove bio_get_nr_vecs()), the upper
layer(fs, dm, dio,...) creates bio with its max size, and splitting should
be triggered easily.

> for bio merges (even if the bios won't merge eventually). This means

As I mentioned, bio_will_gap() is only called for non-splitted bio.

> that each merge check will invoke bio_advance_iter() which is something
> I'd like to avoid...

One idea is to use original way to compute the last bvec for non-cloned
bio, and use the approach in this patch for cloned bio(often splitted bio).
I will take this way in v1 if no one objects.

thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Elliott, Robert (Servers) Feb. 17, 2016, 3:08 a.m. UTC | #3
> -----Original Message-----
> From: linux-block-owner@vger.kernel.org [mailto:linux-block-
> owner@vger.kernel.org] On Behalf Of Ming Lei
> Sent: Monday, February 15, 2016 3:42 AM
> Subject: Re: [PATCH 1/4] block: bio: introduce helpers to get the 1st and
> last bvec
...
> diff --git a/include/linux/bio.h b/include/linux/bio.h
...
>  static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
>  {
> -	struct bvec_iter iter;
> +	struct bvec_iter iter = bio->bi_iter;
> +	int idx;
> +
> +	bio_advance_iter(bio, &iter, iter.bi_size);
> +
> +	WARN_ON(!iter.bi_idx && !iter.bi_bvec_done);

If this ever did trigger, I don't think you'd want it for every bio
with a problem.  WARN_ONCE would be safer.

---
Robert Elliott, HPE Persistent Memory

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Feb. 18, 2016, 4:24 a.m. UTC | #4
On Mon, Feb 15, 2016 at 05:42:12PM +0800, Ming Lei wrote:
> Cc Kent and Keith.
> 
> Follows another version which should be more efficient.
> Kent and Keith, I appreciate much if you may give a review on it.
> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 56d2db8..ef45fec 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -278,11 +278,21 @@ static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
>   */
>  static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
>  {
> -	struct bvec_iter iter;
> +	struct bvec_iter iter = bio->bi_iter;
> +	int idx;
> +
> +	bio_advance_iter(bio, &iter, iter.bi_size);
> +
> +	WARN_ON(!iter.bi_idx && !iter.bi_bvec_done);
> +
> +	if (!iter.bi_bvec_done)
> +		idx = iter.bi_idx - 1;
> +	else	/* in the middle of bvec */
> +		idx = iter.bi_idx;
>  
> -	bio_for_each_segment(*bv, bio, iter)
> -		if (bv->bv_len == iter.bi_size)
> -			break;
> +	*bv = bio->bi_io_vec[idx];
> +	if (iter.bi_bvec_done)
> +		bv->bv_len = iter.bi_bvec_done;
>  }

It can't be done correctly without a loop.

The reason is that if the bio was split in the middle of a segment, bv->bv_len
on the last biovec will be larger than what's actually used by the bio (it's
being shared between the two splits!).

You have to iterate over all the biovecs so that you can see where
bi_iter->bi_size ends.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Feb. 18, 2016, 6:16 a.m. UTC | #5
Hi Kent,

Thanks for your review.

On Thu, Feb 18, 2016 at 12:24 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Mon, Feb 15, 2016 at 05:42:12PM +0800, Ming Lei wrote:
>> Cc Kent and Keith.
>>
>> Follows another version which should be more efficient.
>> Kent and Keith, I appreciate much if you may give a review on it.
>>
>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>> index 56d2db8..ef45fec 100644
>> --- a/include/linux/bio.h
>> +++ b/include/linux/bio.h
>> @@ -278,11 +278,21 @@ static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
>>   */
>>  static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
>>  {
>> -     struct bvec_iter iter;
>> +     struct bvec_iter iter = bio->bi_iter;
>> +     int idx;
>> +
>> +     bio_advance_iter(bio, &iter, iter.bi_size);
>> +
>> +     WARN_ON(!iter.bi_idx && !iter.bi_bvec_done);
>> +
>> +     if (!iter.bi_bvec_done)
>> +             idx = iter.bi_idx - 1;
>> +     else    /* in the middle of bvec */
>> +             idx = iter.bi_idx;
>>
>> -     bio_for_each_segment(*bv, bio, iter)
>> -             if (bv->bv_len == iter.bi_size)
>> -                     break;
>> +     *bv = bio->bi_io_vec[idx];
>> +     if (iter.bi_bvec_done)
>> +             bv->bv_len = iter.bi_bvec_done;
>>  }
>
> It can't be done correctly without a loop.

As we discussed in gtalk, the only case this patch can't cope with
is that one single bvec doesn't use up the remained io vector,
but it can be handled by putting the following code at the
function entry:

if (!bio_multiple_segments(bio)) {
       *bv = bio_iovec(bio);
       return;
}

>
> The reason is that if the bio was split in the middle of a segment, bv->bv_len
> on the last biovec will be larger than what's actually used by the bio (it's
> being shared between the two splits!).

The last two lines in this helper should handle the situation.

>
> You have to iterate over all the biovecs so that you can see where
> bi_iter->bi_size ends.

I understand your concern is that this patch may not be much more
efficient than bio_for_each_segment().

IMO, one win of the patch is that 16bytes bvec copy is saved for all
vectors, and another 'win' is to just run bvec_iter_advance() once(
like move the outside for loop inside).

I will run some benchmark to see if there is any performance
difference between the two patches.

Thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei Feb. 19, 2016, 1:47 a.m. UTC | #6
Hi Guys,

On Thu, Feb 18, 2016 at 2:16 PM, Ming Lei <ming.lei@canonical.com> wrote:
> Hi Kent,
>
> Thanks for your review.
>
> On Thu, Feb 18, 2016 at 12:24 PM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
>> On Mon, Feb 15, 2016 at 05:42:12PM +0800, Ming Lei wrote:
>>> Cc Kent and Keith.
>>>
>>> Follows another version which should be more efficient.
>>> Kent and Keith, I appreciate much if you may give a review on it.
>>>
>>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>>> index 56d2db8..ef45fec 100644
>>> --- a/include/linux/bio.h
>>> +++ b/include/linux/bio.h
>>> @@ -278,11 +278,21 @@ static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
>>>   */
>>>  static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
>>>  {
>>> -     struct bvec_iter iter;
>>> +     struct bvec_iter iter = bio->bi_iter;
>>> +     int idx;
>>> +
>>> +     bio_advance_iter(bio, &iter, iter.bi_size);
>>> +
>>> +     WARN_ON(!iter.bi_idx && !iter.bi_bvec_done);
>>> +
>>> +     if (!iter.bi_bvec_done)
>>> +             idx = iter.bi_idx - 1;
>>> +     else    /* in the middle of bvec */
>>> +             idx = iter.bi_idx;
>>>
>>> -     bio_for_each_segment(*bv, bio, iter)
>>> -             if (bv->bv_len == iter.bi_size)
>>> -                     break;
>>> +     *bv = bio->bi_io_vec[idx];
>>> +     if (iter.bi_bvec_done)
>>> +             bv->bv_len = iter.bi_bvec_done;
>>>  }
>>
>> It can't be done correctly without a loop.
>
> As we discussed in gtalk, the only case this patch can't cope with
> is that one single bvec doesn't use up the remained io vector,
> but it can be handled by putting the following code at the
> function entry:
>
> if (!bio_multiple_segments(bio)) {
>        *bv = bio_iovec(bio);
>        return;
> }
>
>>
>> The reason is that if the bio was split in the middle of a segment, bv->bv_len
>> on the last biovec will be larger than what's actually used by the bio (it's
>> being shared between the two splits!).
>
> The last two lines in this helper should handle the situation.
>
>>
>> You have to iterate over all the biovecs so that you can see where
>> bi_iter->bi_size ends.
>
> I understand your concern is that this patch may not be much more
> efficient than bio_for_each_segment().
>
> IMO, one win of the patch is that 16bytes bvec copy is saved for all
> vectors, and another 'win' is to just run bvec_iter_advance() once(
> like move the outside for loop inside).
>
> I will run some benchmark to see if there is any performance
> difference between the two patches.

When I call bench_last_bvec()(see below) from null_queue_rq():
drivers/block/null_blk.c, IOPS with the 2nd patch in fio test(libaio,
randread, null_blk with default mod parameter) is better than the
1st one by > ~2%:

-------------------------
|BS      | IOPS          |
------------------------
|64K    |  +2%           |
-----------------------
|512K  |  +3%          |
------------------------

the 1st patch: use bio_for_each_segment()
the 2nd patch: use single bio_advance_iter()

static void bench_last_bvec(struct request *rq)
{
        static unsigned long total = 0;
        struct bio *bio;
        struct bio_vec bv = {0};

        __rq_for_each_bio(bio, rq) {
                bio_get_last_bvec(bio, &bv);
                total += bv.bv_len;
        }
}

Thanks
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 56d2db8..ef45fec 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -278,11 +278,21 @@  static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
  */
 static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
 {
-	struct bvec_iter iter;
+	struct bvec_iter iter = bio->bi_iter;
+	int idx;
+
+	bio_advance_iter(bio, &iter, iter.bi_size);
+
+	WARN_ON(!iter.bi_idx && !iter.bi_bvec_done);
+
+	if (!iter.bi_bvec_done)
+		idx = iter.bi_idx - 1;
+	else	/* in the middle of bvec */
+		idx = iter.bi_idx;
 
-	bio_for_each_segment(*bv, bio, iter)
-		if (bv->bv_len == iter.bi_size)
-			break;
+	*bv = bio->bi_io_vec[idx];
+	if (iter.bi_bvec_done)
+		bv->bv_len = iter.bi_bvec_done;
 }
 
 /*