diff mbox series

block: preserve BIO_REFFED flag in bio_reset()

Message ID 1554555800-14392-1-git-send-email-alex@zadara.com (mailing list archive)
State New, archived
Headers show
Series block: preserve BIO_REFFED flag in bio_reset() | expand

Commit Message

Alex Lyakas April 6, 2019, 1:03 p.m. UTC
Commit dac56212e8127dbc0bff7be35c508bc280213309 titled
"bio: skip atomic inc/dec of ->bi_cnt for most use cases"
made __bi_cnt dependent on the new BIO_REFFED flag.

bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag.
But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset()
needs to preserve this flag.

Signed-off-by: Alex Lyakas <alex@zadara.com>
---
 block/bio.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Jens Axboe April 6, 2019, 6:58 p.m. UTC | #1
On 4/6/19 7:03 AM, Alex Lyakas wrote:
> Commit dac56212e8127dbc0bff7be35c508bc280213309 titled
> "bio: skip atomic inc/dec of ->bi_cnt for most use cases"
> made __bi_cnt dependent on the new BIO_REFFED flag.
> 
> bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag.
> But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset()
> needs to preserve this flag.

Looks good to me, applied, thanks.
Alex Lyakas April 7, 2019, 7:42 a.m. UTC | #2
Hi Jens,

Can we perhaps backport this to stable kernels as well? In particular, to 
long-term kernel 4.14.

Thanks,
Alex.


-----Original Message----- 
From: Jens Axboe
Sent: Saturday, April 06, 2019 9:58 PM
To: Alex Lyakas ; linux-block@vger.kernel.org
Subject: Re: [PATCH] block: preserve BIO_REFFED flag in bio_reset()

On 4/6/19 7:03 AM, Alex Lyakas wrote:
> Commit dac56212e8127dbc0bff7be35c508bc280213309 titled
> "bio: skip atomic inc/dec of ->bi_cnt for most use cases"
> made __bi_cnt dependent on the new BIO_REFFED flag.
>
> bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED 
> flag.
> But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset()
> needs to preserve this flag.

Looks good to me, applied, thanks.
Christoph Hellwig April 7, 2019, 7:56 a.m. UTC | #3
On Sat, Apr 06, 2019 at 12:58:54PM -0600, Jens Axboe wrote:
> On 4/6/19 7:03 AM, Alex Lyakas wrote:
> > Commit dac56212e8127dbc0bff7be35c508bc280213309 titled
> > "bio: skip atomic inc/dec of ->bi_cnt for most use cases"
> > made __bi_cnt dependent on the new BIO_REFFED flag.
> > 
> > bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag.
> > But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset()
> > needs to preserve this flag.
> 
> Looks good to me, applied, thanks.

Uh-oh.  While this fix isn't wrong per se I think it is confusing and
set a dnagerous precedence.

We have BIO_RESET_BITS to indicate which flags survive.  So any flag
that is supposed to survive the reset should be be added to that
instead of specifically worked around.
Alex Lyakas April 7, 2019, 2:35 p.m. UTC | #4
Hi Christoph,

I understand and agree with your concern.

Looking at the code, bi_flags is 16-bits wide, and only top 3 bits are
preserved by bio_reset(). These bits are used to indicate a bvec pool.
So we don't have any free bits to move the BIO_REFFED to. Unless we
make bi_flags wider.

Bottom line, now if somebody calls bio_get() and then bio_reset(),
then on next bio_put() bio will be freed immediately. But this is
wrong, because eventually another bio_put() will be done, and this
will result in double-free and kernel memory corruption.

For example, bio_map_user_iov() performs an additional bio_get()
before returning. If somebody calls bio_reset() in-between, then
bio_unmap_user() will do double-free.

So in general, I think this issue needs to be taken care of, even if
not the way I suggested.

Thanks,
Alex.



On Sun, Apr 7, 2019 at 10:56 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sat, Apr 06, 2019 at 12:58:54PM -0600, Jens Axboe wrote:
> > On 4/6/19 7:03 AM, Alex Lyakas wrote:
> > > Commit dac56212e8127dbc0bff7be35c508bc280213309 titled
> > > "bio: skip atomic inc/dec of ->bi_cnt for most use cases"
> > > made __bi_cnt dependent on the new BIO_REFFED flag.
> > >
> > > bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag.
> > > But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset()
> > > needs to preserve this flag.
> >
> > Looks good to me, applied, thanks.
>
> Uh-oh.  While this fix isn't wrong per se I think it is confusing and
> set a dnagerous precedence.
>
> We have BIO_RESET_BITS to indicate which flags survive.  So any flag
> that is supposed to survive the reset should be be added to that
> instead of specifically worked around.
Jens Axboe April 7, 2019, 7:38 p.m. UTC | #5
On 4/7/19 1:56 AM, Christoph Hellwig wrote:
> On Sat, Apr 06, 2019 at 12:58:54PM -0600, Jens Axboe wrote:
>> On 4/6/19 7:03 AM, Alex Lyakas wrote:
>>> Commit dac56212e8127dbc0bff7be35c508bc280213309 titled
>>> "bio: skip atomic inc/dec of ->bi_cnt for most use cases"
>>> made __bi_cnt dependent on the new BIO_REFFED flag.
>>>
>>> bio_reset() does not reset __bi_cnt, but it does reset the BIO_REFFED flag.
>>> But __bi_cnt depends now on the BIO_REFFED flag, so bio_reset()
>>> needs to preserve this flag.
>>
>> Looks good to me, applied, thanks.
> 
> Uh-oh.  While this fix isn't wrong per se I think it is confusing and
> set a dnagerous precedence.
> 
> We have BIO_RESET_BITS to indicate which flags survive.  So any flag
> that is supposed to survive the reset should be be added to that
> instead of specifically worked around.

That's a good point. I've pulled the fix for now, Alex would be great
if you could re-do with that in mind.
Christoph Hellwig April 8, 2019, 6:30 a.m. UTC | #6
On Sun, Apr 07, 2019 at 05:35:45PM +0300, Alex Lyakas wrote:
> Hi Christoph,
> 
> I understand and agree with your concern.
> 
> Looking at the code, bi_flags is 16-bits wide, and only top 3 bits are
> preserved by bio_reset(). These bits are used to indicate a bvec pool.
> So we don't have any free bits to move the BIO_REFFED to. Unless we
> make bi_flags wider.

I think we just need to move BIO_REFFED around.  Something like the
untested patch below should do the job, although blk_types.h could
use some additional cleanup in this area..

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index be418275763c..017a450dc8e0 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -224,13 +224,13 @@ enum {
 	BIO_NULL_MAPPED,	/* contains invalid user pages */
 	BIO_QUIET,		/* Make BIO Quiet */
 	BIO_CHAIN,		/* chained bio, ->bi_remaining in effect */
-	BIO_REFFED,		/* bio has elevated ->bi_cnt */
 	BIO_THROTTLED,		/* This bio has already been subjected to
 				 * throttling rules. Don't do it again. */
 	BIO_TRACE_COMPLETION,	/* bio_endio() should trace the final completion
 				 * of this bio. */
 	BIO_QUEUE_ENTERED,	/* can use blk_queue_enter_live() */
 	BIO_TRACKED,		/* set if bio goes through the rq_qos path */
+	BIO_REFFED,		/* bio has elevated ->bi_cnt */
 	BIO_FLAG_LAST
 };
 
@@ -257,9 +257,9 @@ enum {
 
 /*
  * Flags starting here get preserved by bio_reset() - this includes
- * only BVEC_POOL_IDX()
+ * BVEC_POOL_IDX()
  */
-#define BIO_RESET_BITS	BVEC_POOL_OFFSET
+#define BIO_RESET_BITS BIO_REFFED
 
 typedef __u32 __bitwise blk_mq_req_flags_t;
Christoph Hellwig May 13, 2019, 8:08 a.m. UTC | #7
On Sun, Apr 07, 2019 at 11:30:52PM -0700, Christoph Hellwig wrote:
> On Sun, Apr 07, 2019 at 05:35:45PM +0300, Alex Lyakas wrote:
> > Hi Christoph,
> > 
> > I understand and agree with your concern.
> > 
> > Looking at the code, bi_flags is 16-bits wide, and only top 3 bits are
> > preserved by bio_reset(). These bits are used to indicate a bvec pool.
> > So we don't have any free bits to move the BIO_REFFED to. Unless we
> > make bi_flags wider.
> 
> I think we just need to move BIO_REFFED around.  Something like the
> untested patch below should do the job, although blk_types.h could
> use some additional cleanup in this area..

Any comments?
diff mbox series

Patch

diff --git a/block/bio.c b/block/bio.c
index b64cedc..96f8dca 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -301,11 +301,17 @@  void bio_init(struct bio *bio, struct bio_vec *table,
 void bio_reset(struct bio *bio)
 {
 	unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
+	bool bio_reffed = bio_flagged(bio, BIO_REFFED);
 
 	bio_uninit(bio);
 
 	memset(bio, 0, BIO_RESET_BYTES);
 	bio->bi_flags = flags;
+
+	/* we are not resetting __bi_cnt, but it depends on correct BIO_REFFED */
+	if (bio_reffed)
+		bio_set_flag(bio, BIO_REFFED);
+
 	atomic_set(&bio->__bi_remaining, 1);
 }
 EXPORT_SYMBOL(bio_reset);