diff mbox series

[RFC] block: protect bi_status with spinlock

Message ID 20210329022337.3992955-1-yuyufen@huawei.com (mailing list archive)
State New, archived
Headers show
Series [RFC] block: protect bi_status with spinlock | expand

Commit Message

Yufen Yu March 29, 2021, 2:23 a.m. UTC
For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:

split bio(READ)          kworker

nvme_complete_rq
blk_update_request //split error=0
  bio_endio
    bio_integrity_endio
      queue_work(kintegrityd_wq, &bip->bip_work);

                         bio_integrity_verify_fn
                         bio_endio //split bio
                          __bio_chain_endio
                             if (!parent->bi_status)

                               <interrupt entry>
                               nvme_irq
                                 blk_update_request //parent error=7
                                 req_bio_endio
                                    bio->bi_status = 7 //parent bio
                               <interrupt exit>

                               parent->bi_status = 0
                        parent->bi_end_io() // return bi_status=0

The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.

In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/bio.c               | 3 +++
 block/blk-core.c          | 4 ++++
 include/linux/blk_types.h | 1 +
 3 files changed, 8 insertions(+)

Comments

Keith Busch March 29, 2021, 3:02 a.m. UTC | #1
On Sun, Mar 28, 2021 at 10:23:37PM -0400, Yufen Yu wrote:
>  static struct bio *__bio_chain_endio(struct bio *bio)
>  {
>  	struct bio *parent = bio->bi_private;
> +	unsigned long flags;
>  
> +	spin_lock_irqsave(&parent->bi_lock, flags);
>  	if (!parent->bi_status)
>  		parent->bi_status = bio->bi_status;
> +	spin_unlock_irqrestore(&parent->bi_lock, flags);


I don't see a spin_lock_init() on this new lock, though a spinlock seems
overkill here. If you need an atomic update, you could do:

	cmpxchg(&parent->bi_status, 0, bio->bi_status);

But I don't even think that's necessary. There really is no need to set
parent->bi_status if bio->bi_status is 0, so something like this should
be fine:

  	if (bio->bi_status && !parent->bi_status)
  		parent->bi_status = bio->bi_status;
Bart Van Assche March 29, 2021, 3:49 a.m. UTC | #2
On 3/28/21 8:02 PM, Keith Busch wrote:
> On Sun, Mar 28, 2021 at 10:23:37PM -0400, Yufen Yu wrote:
>>  static struct bio *__bio_chain_endio(struct bio *bio)
>>  {
>>  	struct bio *parent = bio->bi_private;
>> +	unsigned long flags;
>>  
>> +	spin_lock_irqsave(&parent->bi_lock, flags);
>>  	if (!parent->bi_status)
>>  		parent->bi_status = bio->bi_status;
>> +	spin_unlock_irqrestore(&parent->bi_lock, flags);
> 
> 
> I don't see a spin_lock_init() on this new lock, though a spinlock seems
> overkill here. If you need an atomic update, you could do:
> 
> 	cmpxchg(&parent->bi_status, 0, bio->bi_status);

Hmm ... isn't cmpxchg() significantly slower than a spinlock?

Thanks,

Bart.
Christoph Hellwig March 29, 2021, 5:27 a.m. UTC | #3
On Sun, Mar 28, 2021 at 08:49:29PM -0700, Bart Van Assche wrote:
> > I don't see a spin_lock_init() on this new lock, though a spinlock seems
> > overkill here. If you need an atomic update, you could do:
> > 
> > 	cmpxchg(&parent->bi_status, 0, bio->bi_status);
> 
> Hmm ... isn't cmpxchg() significantly slower than a spinlock?

That is (micro-)architecture-specific, but for common x86 CPU is
certainly is going to be at least as fast as the spinlock.
Christoph Hellwig March 29, 2021, 5:47 a.m. UTC | #4
On Mon, Mar 29, 2021 at 12:02:46PM +0900, Keith Busch wrote:
> On Sun, Mar 28, 2021 at 10:23:37PM -0400, Yufen Yu wrote:
> >  static struct bio *__bio_chain_endio(struct bio *bio)
> >  {
> >  	struct bio *parent = bio->bi_private;
> > +	unsigned long flags;
> >  
> > +	spin_lock_irqsave(&parent->bi_lock, flags);
> >  	if (!parent->bi_status)
> >  		parent->bi_status = bio->bi_status;
> > +	spin_unlock_irqrestore(&parent->bi_lock, flags);
> 
> 
> I don't see a spin_lock_init() on this new lock, though a spinlock seems
> overkill here. If you need an atomic update, you could do:
> 
> 	cmpxchg(&parent->bi_status, 0, bio->bi_status);
> 
> But I don't even think that's necessary. There really is no need to set
> parent->bi_status if bio->bi_status is 0, so something like this should
> be fine:
> 
>   	if (bio->bi_status && !parent->bi_status)
>   		parent->bi_status = bio->bi_status;

At very least we'd need READ_ONCE/WRITE_ONCE annotations, but yes,
those should be enough if every place that sets bi_status is careful.
Yufen Yu March 29, 2021, 8:08 a.m. UTC | #5
On 2021/3/29 11:02, Keith Busch wrote:
> On Sun, Mar 28, 2021 at 10:23:37PM -0400, Yufen Yu wrote:
>>   static struct bio *__bio_chain_endio(struct bio *bio)
>>   {
>>   	struct bio *parent = bio->bi_private;
>> +	unsigned long flags;
>>   
>> +	spin_lock_irqsave(&parent->bi_lock, flags);
>>   	if (!parent->bi_status)
>>   		parent->bi_status = bio->bi_status;
>> +	spin_unlock_irqrestore(&parent->bi_lock, flags);
> 
> 
> I don't see a spin_lock_init() on this new lock, though a spinlock seems
> overkill here. If you need an atomic update, you could do:
> 
> 	cmpxchg(&parent->bi_status, 0, bio->bi_status);
> 
> But I don't even think that's necessary. There really is no need to set
> parent->bi_status if bio->bi_status is 0, so something like this should
> be fine:
> 
>    	if (bio->bi_status && !parent->bi_status)
>    		parent->bi_status = bio->bi_status;
> .
>

Yeah, this is more smart. Thanks a lot for your suggestion.
I will send v2.

Thanks,
Yufen
diff mbox series

Patch

diff --git a/block/bio.c b/block/bio.c
index 26b7f721cda8..74863aaebf53 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -276,9 +276,12 @@  EXPORT_SYMBOL(bio_reset);
 static struct bio *__bio_chain_endio(struct bio *bio)
 {
 	struct bio *parent = bio->bi_private;
+	unsigned long flags;
 
+	spin_lock_irqsave(&parent->bi_lock, flags);
 	if (!parent->bi_status)
 		parent->bi_status = bio->bi_status;
+	spin_unlock_irqrestore(&parent->bi_lock, flags);
 	bio_put(bio);
 	return parent;
 }
diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff208497..362653e937ff 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -241,8 +241,12 @@  static void print_req_error(struct request *req, blk_status_t status,
 static void req_bio_endio(struct request *rq, struct bio *bio,
 			  unsigned int nbytes, blk_status_t error)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&bio->bi_lock, flags);
 	if (error)
 		bio->bi_status = error;
+	spin_unlock_irqrestore(&bio->bi_lock, flags);
 
 	if (unlikely(rq->rq_flags & RQF_QUIET))
 		bio_set_flag(bio, BIO_QUIET);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..c3d50ad99510 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -231,6 +231,7 @@  struct bio {
 	unsigned short		bi_ioprio;
 	unsigned short		bi_write_hint;
 	blk_status_t		bi_status;
+	spinlock_t		bi_lock;
 	atomic_t		__bi_remaining;
 
 	struct bvec_iter	bi_iter;