diff mbox

block: Make __bio_clone_fast() copy bi_vcnt

Message ID 20180627201231.15641-1-bart.vanassche@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche June 27, 2018, 8:12 p.m. UTC
Although __bio_clone_fast() copies bi_io_vec, it does not copy bi_vcnt,
the number of elements in bi_io_vec[] that contains data. Copy bi_vcnt
such that code that needs this member behaves identically for original
and for cloned requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/bio.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ming Lei June 27, 2018, 11:50 p.m. UTC | #1
On Wed, Jun 27, 2018 at 01:12:31PM -0700, Bart Van Assche wrote:
> Although __bio_clone_fast() copies bi_io_vec, it does not copy bi_vcnt,
> the number of elements in bi_io_vec[] that contains data. Copy bi_vcnt
> such that code that needs this member behaves identically for original
> and for cloned requests.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/bio.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/bio.c b/block/bio.c
> index f7e3d88bd0b6..55f8e0dedd69 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -605,6 +605,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
>  	bio->bi_opf = bio_src->bi_opf;
>  	bio->bi_write_hint = bio_src->bi_write_hint;
>  	bio->bi_iter = bio_src->bi_iter;
> +	bio->bi_vcnt = bio_src->bi_vcnt;
>  	bio->bi_io_vec = bio_src->bi_io_vec;

No, don't do that.

For a cloned bio, both .bi_vcnt and .bi_io_vec can't be used directly,
and we have done huge such cleanup for supporting multipage bvec, that is
the great idea of immutable bvecs invented by Kent.

Thanks,
Ming
Bart Van Assche June 27, 2018, 11:59 p.m. UTC | #2
On 06/27/18 16:50, Ming Lei wrote:
> On Wed, Jun 27, 2018 at 01:12:31PM -0700, Bart Van Assche wrote:
>> Although __bio_clone_fast() copies bi_io_vec, it does not copy bi_vcnt,
>> the number of elements in bi_io_vec[] that contains data. Copy bi_vcnt
>> such that code that needs this member behaves identically for original
>> and for cloned requests.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Mike Snitzer <snitzer@redhat.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Hannes Reinecke <hare@suse.com>
>> Cc: Johannes Thumshirn <jthumshirn@suse.de>
>> ---
>>   block/bio.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index f7e3d88bd0b6..55f8e0dedd69 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -605,6 +605,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
>>   	bio->bi_opf = bio_src->bi_opf;
>>   	bio->bi_write_hint = bio_src->bi_write_hint;
>>   	bio->bi_iter = bio_src->bi_iter;
>> +	bio->bi_vcnt = bio_src->bi_vcnt;
>>   	bio->bi_io_vec = bio_src->bi_io_vec;
> 
> No, don't do that.

Why not? I think it's a huge booby trap that cloned bio's have a 
bi_io_vec but zero bi_vcnt.

> For a cloned bio, both .bi_vcnt and .bi_io_vec can't be used directly,
> and we have done huge such cleanup for supporting multipage bvec, that is
> the great idea of immutable bvecs invented by Kent.

My understanding is that the above change doesn't conflict at all with 
the immutable biovec work -- to the contrary.

Bart.
Ming Lei June 28, 2018, 12:30 a.m. UTC | #3
On Wed, Jun 27, 2018 at 04:59:41PM -0700, Bart Van Assche wrote:
> On 06/27/18 16:50, Ming Lei wrote:
> > On Wed, Jun 27, 2018 at 01:12:31PM -0700, Bart Van Assche wrote:
> > > Although __bio_clone_fast() copies bi_io_vec, it does not copy bi_vcnt,
> > > the number of elements in bi_io_vec[] that contains data. Copy bi_vcnt
> > > such that code that needs this member behaves identically for original
> > > and for cloned requests.
> > > 
> > > Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Mike Snitzer <snitzer@redhat.com>
> > > Cc: Ming Lei <ming.lei@redhat.com>
> > > Cc: Hannes Reinecke <hare@suse.com>
> > > Cc: Johannes Thumshirn <jthumshirn@suse.de>
> > > ---
> > >   block/bio.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/block/bio.c b/block/bio.c
> > > index f7e3d88bd0b6..55f8e0dedd69 100644
> > > --- a/block/bio.c
> > > +++ b/block/bio.c
> > > @@ -605,6 +605,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
> > >   	bio->bi_opf = bio_src->bi_opf;
> > >   	bio->bi_write_hint = bio_src->bi_write_hint;
> > >   	bio->bi_iter = bio_src->bi_iter;
> > > +	bio->bi_vcnt = bio_src->bi_vcnt;
> > >   	bio->bi_io_vec = bio_src->bi_io_vec;
> > 
> > No, don't do that.
> 
> Why not? I think it's a huge booby trap that cloned bio's have a bi_io_vec
> but zero bi_vcnt.

One core idea of immutable bvec is to use bio->bi_iter and the original
bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
needs to copy, but not see any reason why .bi_vcnt needs to do.

Do you have use cases on .bi_vcnt for cloned bio?

Thanks,
Ming
Bart Van Assche June 28, 2018, 3:21 p.m. UTC | #4
On 06/27/18 17:30, Ming Lei wrote:
> One core idea of immutable bvec is to use bio->bi_iter and the original
> bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
> needs to copy, but not see any reason why .bi_vcnt needs to do.
> 
> Do you have use cases on .bi_vcnt for cloned bio?

So far this is only a theoretical concern. There are many functions in 
the block layer that use .bi_vcnt, and it is a lot of work to figure out 
all the callers of all these functions.

Bart.
Mike Snitzer June 28, 2018, 3:32 p.m. UTC | #5
On Thu, Jun 28 2018 at 11:21am -0400,
Bart Van Assche <bart.vanassche@wdc.com> wrote:

> On 06/27/18 17:30, Ming Lei wrote:
> >One core idea of immutable bvec is to use bio->bi_iter and the original
> >bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
> >needs to copy, but not see any reason why .bi_vcnt needs to do.
> >
> >Do you have use cases on .bi_vcnt for cloned bio?
> 
> So far this is only a theoretical concern. There are many functions
> in the block layer that use .bi_vcnt, and it is a lot of work to
> figure out all the callers of all these functions.

No point wasting time with that.  I don't understand why Ming cares.
Your change is obviously correct.  The state should get transfered over
to reflect reality.

This patch doesn't harm anything, it just prevents some clone-specific
failure in the future.

Acked-by: Mike Snitzer <snitzer@redhat.com>
Jens Axboe June 28, 2018, 3:53 p.m. UTC | #6
On 6/27/18 2:12 PM, Bart Van Assche wrote:
> Although __bio_clone_fast() copies bi_io_vec, it does not copy bi_vcnt,
> the number of elements in bi_io_vec[] that contains data. Copy bi_vcnt
> such that code that needs this member behaves identically for original
> and for cloned requests.

Applied - it's correct for the current base.
Ming Lei June 28, 2018, 10:53 p.m. UTC | #7
On Thu, Jun 28, 2018 at 09:53:23AM -0600, Jens Axboe wrote:
> On 6/27/18 2:12 PM, Bart Van Assche wrote:
> > Although __bio_clone_fast() copies bi_io_vec, it does not copy bi_vcnt,
> > the number of elements in bi_io_vec[] that contains data. Copy bi_vcnt
> > such that code that needs this member behaves identically for original
> > and for cloned requests.
> 
> Applied - it's correct for the current base.
> 

Any users of cloned bio shouldn't use this bio's .bi_vcnt directly,
since this counter represents the original bio's actual io vector
number, nothing related with the cloned bio.

So I don't understand why we need to copy it.

Thanks,
Ming
Ming Lei June 28, 2018, 11:10 p.m. UTC | #8
On Thu, Jun 28, 2018 at 11:21 PM, Bart Van Assche
<bart.vanassche@wdc.com> wrote:
> On 06/27/18 17:30, Ming Lei wrote:
>>
>> One core idea of immutable bvec is to use bio->bi_iter and the original
>> bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
>> needs to copy, but not see any reason why .bi_vcnt needs to do.
>>
>> Do you have use cases on .bi_vcnt for cloned bio?
>
>
> So far this is only a theoretical concern. There are many functions in the
> block layer that use .bi_vcnt, and it is a lot of work to figure out all the
> callers of all these functions.

No, any functions using .bi_vcnt on a cloned-bio may be a bug, and we should
take a close look.

Thanks,
Ming Lei
Kent Overstreet June 28, 2018, 11:16 p.m. UTC | #9
On Fri, Jun 29, 2018 at 07:10:47AM +0800, Ming Lei wrote:
> On Thu, Jun 28, 2018 at 11:21 PM, Bart Van Assche
> <bart.vanassche@wdc.com> wrote:
> > On 06/27/18 17:30, Ming Lei wrote:
> >>
> >> One core idea of immutable bvec is to use bio->bi_iter and the original
> >> bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
> >> needs to copy, but not see any reason why .bi_vcnt needs to do.
> >>
> >> Do you have use cases on .bi_vcnt for cloned bio?
> >
> >
> > So far this is only a theoretical concern. There are many functions in the
> > block layer that use .bi_vcnt, and it is a lot of work to figure out all the
> > callers of all these functions.

Back when I implemented immutable biovecs I thoroughly audited all the bi_vcnt
uses and removed all of them that weren't by the code that owns/submits the bio.

Grepping around I see one or two suspicious uses.. blk-merge.c in particular

> No, any functions using .bi_vcnt on a cloned-bio may be a bug, and we should
> take a close look.

not just cloned bios, any code using bi_vcnt on a bio it didn't create is wrong.

so big nack to this patch (I wasn't ccd on it though and it doesn't seem to have
hit lkml, so I can't find the original patch...)
Bart Van Assche June 28, 2018, 11:54 p.m. UTC | #10
On 06/28/18 16:16, Kent Overstreet wrote:
> On Fri, Jun 29, 2018 at 07:10:47AM +0800, Ming Lei wrote:
>> On Thu, Jun 28, 2018 at 11:21 PM, Bart Van Assche
>> <bart.vanassche@wdc.com> wrote:
>>> On 06/27/18 17:30, Ming Lei wrote:
>>>>
>>>> One core idea of immutable bvec is to use bio->bi_iter and the original
>>>> bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
>>>> needs to copy, but not see any reason why .bi_vcnt needs to do.
>>>>
>>>> Do you have use cases on .bi_vcnt for cloned bio?
>>>
>>> So far this is only a theoretical concern. There are many functions in the
>>> block layer that use .bi_vcnt, and it is a lot of work to figure out all the
>>> callers of all these functions.
> 
> Back when I implemented immutable biovecs I thoroughly audited all the bi_vcnt
> uses and removed all of them that weren't by the code that owns/submits the bio.
> 
> Grepping around I see one or two suspicious uses.. blk-merge.c in particular
> 
>> No, any functions using .bi_vcnt on a cloned-bio may be a bug, and we should
>> take a close look.
> 
> not just cloned bios, any code using bi_vcnt on a bio it didn't create is wrong.
> 
> so big nack to this patch (I wasn't ccd on it though and it doesn't seem to have
> hit lkml, so I can't find the original patch...)

Hello Kent,

Thanks for chiming in. The linux-block mailing list is archived by 
multiple websites. The entire e-mail thread is available on e.g. 
https://www.mail-archive.com/linux-block@vger.kernel.org/msg23006.html.

I have a question for you: at least in kernel v4.17 bio_clone_bioset() 
copies bi_vcnt from the source to the destination bio. However,
__bio_clone_fast() doesn't copy bi_vcnt. Isn't that an inconsistency?

Thanks,

Bart.
Kent Overstreet June 29, 2018, 12:04 a.m. UTC | #11
On Thu, Jun 28, 2018 at 04:54:44PM -0700, Bart Van Assche wrote:
> On 06/28/18 16:16, Kent Overstreet wrote:
> > On Fri, Jun 29, 2018 at 07:10:47AM +0800, Ming Lei wrote:
> > > On Thu, Jun 28, 2018 at 11:21 PM, Bart Van Assche
> > > <bart.vanassche@wdc.com> wrote:
> > > > On 06/27/18 17:30, Ming Lei wrote:
> > > > > 
> > > > > One core idea of immutable bvec is to use bio->bi_iter and the original
> > > > > bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
> > > > > needs to copy, but not see any reason why .bi_vcnt needs to do.
> > > > > 
> > > > > Do you have use cases on .bi_vcnt for cloned bio?
> > > > 
> > > > So far this is only a theoretical concern. There are many functions in the
> > > > block layer that use .bi_vcnt, and it is a lot of work to figure out all the
> > > > callers of all these functions.
> > 
> > Back when I implemented immutable biovecs I thoroughly audited all the bi_vcnt
> > uses and removed all of them that weren't by the code that owns/submits the bio.
> > 
> > Grepping around I see one or two suspicious uses.. blk-merge.c in particular
> > 
> > > No, any functions using .bi_vcnt on a cloned-bio may be a bug, and we should
> > > take a close look.
> > 
> > not just cloned bios, any code using bi_vcnt on a bio it didn't create is wrong.
> > 
> > so big nack to this patch (I wasn't ccd on it though and it doesn't seem to have
> > hit lkml, so I can't find the original patch...)
> 
> Hello Kent,
> 
> Thanks for chiming in. The linux-block mailing list is archived by multiple
> websites. The entire e-mail thread is available on e.g.
> https://www.mail-archive.com/linux-block@vger.kernel.org/msg23006.html.
> 
> I have a question for you: at least in kernel v4.17 bio_clone_bioset()
> copies bi_vcnt from the source to the destination bio. However,
> __bio_clone_fast() doesn't copy bi_vcnt. Isn't that an inconsistency?

No - when you use bio_clone_bioset() you get a bio that you own and can do
whatever you want with, so it does make sense for it to initialize bi_vcnt.

e.g. you could use bio_clone_bioset() when you're going to be bouncing a bio,
iterating over each bvec and allocating a new page and copying data from the old
page to the new page.
Jens Axboe June 29, 2018, 2:18 a.m. UTC | #12
On 6/28/18 5:16 PM, Kent Overstreet wrote:
> On Fri, Jun 29, 2018 at 07:10:47AM +0800, Ming Lei wrote:
>> On Thu, Jun 28, 2018 at 11:21 PM, Bart Van Assche
>> <bart.vanassche@wdc.com> wrote:
>>> On 06/27/18 17:30, Ming Lei wrote:
>>>>
>>>> One core idea of immutable bvec is to use bio->bi_iter and the original
>>>> bvec table to iterate over anywhere in the bio. That is why .bi_io_vec
>>>> needs to copy, but not see any reason why .bi_vcnt needs to do.
>>>>
>>>> Do you have use cases on .bi_vcnt for cloned bio?
>>>
>>>
>>> So far this is only a theoretical concern. There are many functions in the
>>> block layer that use .bi_vcnt, and it is a lot of work to figure out all the
>>> callers of all these functions.
> 
> Back when I implemented immutable biovecs I thoroughly audited all the bi_vcnt
> uses and removed all of them that weren't by the code that owns/submits the bio.
> 
> Grepping around I see one or two suspicious uses.. blk-merge.c in particular
> 
>> No, any functions using .bi_vcnt on a cloned-bio may be a bug, and we should
>> take a close look.
> 
> not just cloned bios, any code using bi_vcnt on a bio it didn't create is wrong.
> 
> so big nack to this patch (I wasn't ccd on it though and it doesn't seem to have
> hit lkml, so I can't find the original patch...)

Yeah, you are both right, I was smoking crack.
Bart Van Assche June 29, 2018, 8 p.m. UTC | #13
On 06/28/18 17:04, Kent Overstreet wrote:
> On Thu, Jun 28, 2018 at 04:54:44PM -0700, Bart Van Assche wrote:
>> Thanks for chiming in. The linux-block mailing list is archived by multiple
>> websites. The entire e-mail thread is available on e.g.
>> https://www.mail-archive.com/linux-block@vger.kernel.org/msg23006.html.
>>
>> I have a question for you: at least in kernel v4.17 bio_clone_bioset()
>> copies bi_vcnt from the source to the destination bio. However,
>> __bio_clone_fast() doesn't copy bi_vcnt. Isn't that an inconsistency?
> 
> No - when you use bio_clone_bioset() you get a bio that you own and can do
> whatever you want with, so it does make sense for it to initialize bi_vcnt.
> 
> e.g. you could use bio_clone_bioset() when you're going to be bouncing a bio,
> iterating over each bvec and allocating a new page and copying data from the old
> page to the new page.

Hello Kent,

Have you considered to explain this in a comment above 
__bio_clone_fast() to avoid that the next person who reads that function 
gets confused?

Bart.
Kent Overstreet June 30, 2018, 11:38 p.m. UTC | #14
On Fri, Jun 29, 2018 at 01:00:33PM -0700, Bart Van Assche wrote:
> On 06/28/18 17:04, Kent Overstreet wrote:
> > On Thu, Jun 28, 2018 at 04:54:44PM -0700, Bart Van Assche wrote:
> > > Thanks for chiming in. The linux-block mailing list is archived by multiple
> > > websites. The entire e-mail thread is available on e.g.
> > > https://www.mail-archive.com/linux-block@vger.kernel.org/msg23006.html.
> > > 
> > > I have a question for you: at least in kernel v4.17 bio_clone_bioset()
> > > copies bi_vcnt from the source to the destination bio. However,
> > > __bio_clone_fast() doesn't copy bi_vcnt. Isn't that an inconsistency?
> > 
> > No - when you use bio_clone_bioset() you get a bio that you own and can do
> > whatever you want with, so it does make sense for it to initialize bi_vcnt.
> > 
> > e.g. you could use bio_clone_bioset() when you're going to be bouncing a bio,
> > iterating over each bvec and allocating a new page and copying data from the old
> > page to the new page.
> 
> Hello Kent,
> 
> Have you considered to explain this in a comment above __bio_clone_fast() to
> avoid that the next person who reads that function gets confused?

Well, there is a large comment in bio_clone_bioset() that you must have found...
diff mbox

Patch

diff --git a/block/bio.c b/block/bio.c
index f7e3d88bd0b6..55f8e0dedd69 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -605,6 +605,7 @@  void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_write_hint = bio_src->bi_write_hint;
 	bio->bi_iter = bio_src->bi_iter;
+	bio->bi_vcnt = bio_src->bi_vcnt;
 	bio->bi_io_vec = bio_src->bi_io_vec;
 
 	bio_clone_blkcg_association(bio, bio_src);