Message ID | 1554555800-14392-1-git-send-email-alex@zadara.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: preserve BIO_REFFED flag in bio_reset() | expand |
On 4/6/19 7:03 AM, Alex Lyakas wrote: > Commit dac56212e8127dbc0bff7be35c508bc280213309 titled > "bio: skip atomic inc/dec of ->bi_cnt for most use cases" > made __bi_cnt dependent on the new BIO_REFFED flag. > > bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag. > But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset() > needs to preserve this flag. Looks good to me, applied, thanks.
Hi Jens, Can we perhaps backport this to stable kernels as well? In particular, to long-term kernel 4.14. Thanks, Alex. -----Original Message----- From: Jens Axboe Sent: Saturday, April 06, 2019 9:58 PM To: Alex Lyakas ; linux-block@vger.kernel.org Subject: Re: [PATCH] block: preserve BIO_REFFED flag in bio_reset() On 4/6/19 7:03 AM, Alex Lyakas wrote: > Commit dac56212e8127dbc0bff7be35c508bc280213309 titled > "bio: skip atomic inc/dec of ->bi_cnt for most use cases" > made __bi_cnt dependent on the new BIO_REFFED flag. > > bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED > flag. > But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset() > needs to preserve this flag. Looks good to me, applied, thanks.
On Sat, Apr 06, 2019 at 12:58:54PM -0600, Jens Axboe wrote: > On 4/6/19 7:03 AM, Alex Lyakas wrote: > > Commit dac56212e8127dbc0bff7be35c508bc280213309 titled > > "bio: skip atomic inc/dec of ->bi_cnt for most use cases" > > made __bi_cnt dependent on the new BIO_REFFED flag. > > > > bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag. > > But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset() > > needs to preserve this flag. > > Looks good to me, applied, thanks. Uh-oh. While this fix isn't wrong per se I think it is confusing and set a dnagerous precedence. We have BIO_RESET_BITS to indicate which flags survive. So any flag that is supposed to survive the reset should be be added to that instead of specifically worked around.
Hi Christoph, I understand and agree with your concern. Looking at the code, bi_flags is 16-bits wide, and only top 3 bits are preserved by bio_reset(). These bits are used to indicate a bvec pool. So we don't have any free bits to move the BIO_REFFED to. Unless we make bi_flags wider. Bottom line, now if somebody calls bio_get() and then bio_reset(), then on next bio_put() bio will be freed immediately. But this is wrong, because eventually another bio_put() will be done, and this will result in double-free and kernel memory corruption. For example, bio_map_user_iov() performs an additional bio_get() before returning. If somebody calls bio_reset() in-between, then bio_unmap_user() will do double-free. So in general, I think this issue needs to be taken care of, even if not the way I suggested. Thanks, Alex. On Sun, Apr 7, 2019 at 10:56 AM Christoph Hellwig <hch@infradead.org> wrote: > > On Sat, Apr 06, 2019 at 12:58:54PM -0600, Jens Axboe wrote: > > On 4/6/19 7:03 AM, Alex Lyakas wrote: > > > Commit dac56212e8127dbc0bff7be35c508bc280213309 titled > > > "bio: skip atomic inc/dec of ->bi_cnt for most use cases" > > > made __bi_cnt dependent on the new BIO_REFFED flag. > > > > > > bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag. > > > But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset() > > > needs to preserve this flag. > > > > Looks good to me, applied, thanks. > > Uh-oh. While this fix isn't wrong per se I think it is confusing and > set a dnagerous precedence. > > We have BIO_RESET_BITS to indicate which flags survive. So any flag > that is supposed to survive the reset should be be added to that > instead of specifically worked around.
On 4/7/19 1:56 AM, Christoph Hellwig wrote: > On Sat, Apr 06, 2019 at 12:58:54PM -0600, Jens Axboe wrote: >> On 4/6/19 7:03 AM, Alex Lyakas wrote: >>> Commit dac56212e8127dbc0bff7be35c508bc280213309 titled >>> "bio: skip atomic inc/dec of ->bi_cnt for most use cases" >>> made __bi_cnt dependent on the new BIO_REFFED flag. >>> >>> bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag. >>> But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset() >>> needs to preserve this flag. >> >> Looks good to me, applied, thanks. > > Uh-oh. While this fix isn't wrong per se I think it is confusing and > set a dnagerous precedence. > > We have BIO_RESET_BITS to indicate which flags survive. So any flag > that is supposed to survive the reset should be be added to that > instead of specifically worked around. That's a good point. I've pulled the fix for now, Alex would be great if you could re-do with that in mind.
On Sun, Apr 07, 2019 at 05:35:45PM +0300, Alex Lyakas wrote: > Hi Christoph, > > I understand and agree with your concern. > > Looking at the code, bi_flags is 16-bits wide, and only top 3 bits are > preserved by bio_reset(). These bits are used to indicate a bvec pool. > So we don't have any free bits to move the BIO_REFFED to. Unless we > make bi_flags wider. I think we just need to move BIO_REFFED around. Something like the untested patch below should do the job, although blk_types.h could use some additional cleanup in this area.. diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index be418275763c..017a450dc8e0 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -224,13 +224,13 @@ enum { BIO_NULL_MAPPED, /* contains invalid user pages */ BIO_QUIET, /* Make BIO Quiet */ BIO_CHAIN, /* chained bio, ->bi_remaining in effect */ - BIO_REFFED, /* bio has elevated ->bi_cnt */ BIO_THROTTLED, /* This bio has already been subjected to * throttling rules. Don't do it again. */ BIO_TRACE_COMPLETION, /* bio_endio() should trace the final completion * of this bio. */ BIO_QUEUE_ENTERED, /* can use blk_queue_enter_live() */ BIO_TRACKED, /* set if bio goes through the rq_qos path */ + BIO_REFFED, /* bio has elevated ->bi_cnt */ BIO_FLAG_LAST }; @@ -257,9 +257,9 @@ enum { /* * Flags starting here get preserved by bio_reset() - this includes - * only BVEC_POOL_IDX() + * BVEC_POOL_IDX() */ -#define BIO_RESET_BITS BVEC_POOL_OFFSET +#define BIO_RESET_BITS BIO_REFFED typedef __u32 __bitwise blk_mq_req_flags_t;
On Sun, Apr 07, 2019 at 11:30:52PM -0700, Christoph Hellwig wrote: > On Sun, Apr 07, 2019 at 05:35:45PM +0300, Alex Lyakas wrote: > > Hi Christoph, > > > > I understand and agree with your concern. > > > > Looking at the code, bi_flags is 16-bits wide, and only top 3 bits are > > preserved by bio_reset(). These bits are used to indicate a bvec pool. > > So we don't have any free bits to move the BIO_REFFED to. Unless we > > make bi_flags wider. > > I think we just need to move BIO_REFFED around. Something like the > untested patch below should do the job, although blk_types.h could > use some additional cleanup in this area.. Any comments?
diff --git a/block/bio.c b/block/bio.c index b64cedc..96f8dca 100644 --- a/block/bio.c +++ b/block/bio.c @@ -301,11 +301,17 @@ void bio_init(struct bio *bio, struct bio_vec *table, void bio_reset(struct bio *bio) { unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS); + bool bio_reffed = bio_flagged(bio, BIO_REFFED); bio_uninit(bio); memset(bio, 0, BIO_RESET_BYTES); bio->bi_flags = flags; + + /* we are not resetting __bi_cnt, but it depends on correct BIO_REFFED */ + if (bio_reffed) + bio_set_flag(bio, BIO_REFFED); + atomic_set(&bio->__bi_remaining, 1); } EXPORT_SYMBOL(bio_reset);
Commit dac56212e8127dbc0bff7be35c508bc280213309 titled "bio: skip atomic inc/dec of ->bi_cnt for most use cases" made __bi_cnt dependent on the new BIO_REFFED flag. bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag. But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset() needs to preserve this flag. Signed-off-by: Alex Lyakas <alex@zadara.com> --- block/bio.c | 6 ++++++ 1 file changed, 6 insertions(+)