diff mbox

xen-blkfront: fix mq start/stop race

Message ID 1498095412-18731-1-git-send-email-junxiao.bi@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Junxiao Bi June 22, 2017, 1:36 a.m. UTC
When ring buf full, hw queue will be stopped. While blkif interrupt consume
request and make free space in ring buf, hw queue will be started again.
But since start queue is protected by spin lock while stop not, that will
cause a race.

interrupt:                                      process:
blkif_interrupt()                               blkif_queue_rq()
 kick_pending_request_queues_locked()
  blk_mq_start_stopped_hw_queues()
   clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
                                                 blk_mq_stop_hw_queue(hctx)
   blk_mq_run_hw_queue(hctx, async)

If ring buf is made empty in this case, interrupt will never come, then the
hw queue will be stopped forever, all processes waiting for the pending io
in the queue will hung.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 drivers/block/xen-blkfront.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Junxiao Bi June 23, 2017, 4:58 a.m. UTC | #1
Hi Boris & Juergen,

Could you help review this patch? This is a race and will cause the io hung.

Thanks,
Junxiao.

On 06/22/2017 09:36 AM, Junxiao Bi wrote:
> When ring buf full, hw queue will be stopped. While blkif interrupt consume
> request and make free space in ring buf, hw queue will be started again.
> But since start queue is protected by spin lock while stop not, that will
> cause a race.
> 
> interrupt:                                      process:
> blkif_interrupt()                               blkif_queue_rq()
>  kick_pending_request_queues_locked()
>   blk_mq_start_stopped_hw_queues()
>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>                                                  blk_mq_stop_hw_queue(hctx)
>    blk_mq_run_hw_queue(hctx, async)
> 
> If ring buf is made empty in this case, interrupt will never come, then the
> hw queue will be stopped forever, all processes waiting for the pending io
> in the queue will hung.
> 
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 8bb160cd00e1..4767b82b2cf6 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -912,8 +912,8 @@ out_err:
>  	return BLK_MQ_RQ_QUEUE_ERROR;
>  
>  out_busy:
> -	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>  	blk_mq_stop_hw_queue(hctx);
> +	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>  	return BLK_MQ_RQ_QUEUE_BUSY;
>  }
>  
>
Boris Ostrovsky June 23, 2017, 12:22 p.m. UTC | #2
On 06/23/2017 12:58 AM, Junxiao Bi wrote:
> Hi Boris & Juergen,
> 
> Could you help review this patch? This is a race and will cause the io hung.
> 
> Thanks,
> Junxiao.
> 
> On 06/22/2017 09:36 AM, Junxiao Bi wrote:
>> When ring buf full, hw queue will be stopped. While blkif interrupt consume
>> request and make free space in ring buf, hw queue will be started again.
>> But since start queue is protected by spin lock while stop not, that will
>> cause a race.
>>
>> interrupt:                                      process:
>> blkif_interrupt()                               blkif_queue_rq()
>>   kick_pending_request_queues_locked()
>>    blk_mq_start_stopped_hw_queues()
>>     clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>>                                                   blk_mq_stop_hw_queue(hctx)
>>     blk_mq_run_hw_queue(hctx, async)
>>
>> If ring buf is made empty in this case, interrupt will never come, then the
>> hw queue will be stopped forever, all processes waiting for the pending io
>> in the queue will hung.
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

but I think Roger (copied) needs to Ack it.

-boris


>> ---
>>   drivers/block/xen-blkfront.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 8bb160cd00e1..4767b82b2cf6 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -912,8 +912,8 @@ out_err:
>>   	return BLK_MQ_RQ_QUEUE_ERROR;
>>   
>>   out_busy:
>> -	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>>   	blk_mq_stop_hw_queue(hctx);
>> +	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>>   	return BLK_MQ_RQ_QUEUE_BUSY;
>>   }
>>   
>>
>
Roger Pau Monné June 23, 2017, 12:57 p.m. UTC | #3
On Thu, Jun 22, 2017 at 09:36:52AM +0800, Junxiao Bi wrote:
> When ring buf full, hw queue will be stopped. While blkif interrupt consume
> request and make free space in ring buf, hw queue will be started again.
> But since start queue is protected by spin lock while stop not, that will
> cause a race.
>
> interrupt:                                      process:
> blkif_interrupt()                               blkif_queue_rq()
>  kick_pending_request_queues_locked()
>   blk_mq_start_stopped_hw_queues()
>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>                                                  blk_mq_stop_hw_queue(hctx)
>    blk_mq_run_hw_queue(hctx, async)
> 
> If ring buf is made empty in this case, interrupt will never come, then the
> hw queue will be stopped forever, all processes waiting for the pending io
> in the queue will hung.
> 
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
Junxiao Bi July 19, 2017, 1:19 a.m. UTC | #4
Hi Roger,

On 06/23/2017 08:57 PM, Roger Pau Monné wrote:
> On Thu, Jun 22, 2017 at 09:36:52AM +0800, Junxiao Bi wrote:
>> When ring buf full, hw queue will be stopped. While blkif interrupt consume
>> request and make free space in ring buf, hw queue will be started again.
>> But since start queue is protected by spin lock while stop not, that will
>> cause a race.
>>
>> interrupt:                                      process:
>> blkif_interrupt()                               blkif_queue_rq()
>>  kick_pending_request_queues_locked()
>>   blk_mq_start_stopped_hw_queues()
>>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>>                                                  blk_mq_stop_hw_queue(hctx)
>>    blk_mq_run_hw_queue(hctx, async)
>>
>> If ring buf is made empty in this case, interrupt will never come, then the
>> hw queue will be stopped forever, all processes waiting for the pending io
>> in the queue will hung.
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Looks patch not in mainline. Can you please help merge it?

Thanks,
Junxiao.
> 
> Thanks, Roger.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
>
Roger Pau Monné July 19, 2017, 7:37 a.m. UTC | #5
On Wed, Jul 19, 2017 at 09:19:49AM +0800, Junxiao Bi wrote:
> Hi Roger,
> 
> On 06/23/2017 08:57 PM, Roger Pau Monné wrote:
> > On Thu, Jun 22, 2017 at 09:36:52AM +0800, Junxiao Bi wrote:
> >> When ring buf full, hw queue will be stopped. While blkif interrupt consume
> >> request and make free space in ring buf, hw queue will be started again.
> >> But since start queue is protected by spin lock while stop not, that will
> >> cause a race.
> >>
> >> interrupt:                                      process:
> >> blkif_interrupt()                               blkif_queue_rq()
> >>  kick_pending_request_queues_locked()
> >>   blk_mq_start_stopped_hw_queues()
> >>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
> >>                                                  blk_mq_stop_hw_queue(hctx)
> >>    blk_mq_run_hw_queue(hctx, async)
> >>
> >> If ring buf is made empty in this case, interrupt will never come, then the
> >> hw queue will be stopped forever, all processes waiting for the pending io
> >> in the queue will hung.
> >>
> >> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> >> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
> > 
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> Looks patch not in mainline. Can you please help merge it?

I'm afraid this needs to be done by Konrad or one of the Linux
maintainers, I don't have an account on kernel.org in order to send
pull requests to Jens.

Roger.
Junxiao Bi July 19, 2017, 7:51 a.m. UTC | #6
Hi Konrad,

On 07/19/2017 03:37 PM, Roger Pau Monné wrote:
> On Wed, Jul 19, 2017 at 09:19:49AM +0800, Junxiao Bi wrote:
>> Hi Roger,
>>
>> On 06/23/2017 08:57 PM, Roger Pau Monné wrote:
>>> On Thu, Jun 22, 2017 at 09:36:52AM +0800, Junxiao Bi wrote:
>>>> When ring buf full, hw queue will be stopped. While blkif interrupt consume
>>>> request and make free space in ring buf, hw queue will be started again.
>>>> But since start queue is protected by spin lock while stop not, that will
>>>> cause a race.
>>>>
>>>> interrupt:                                      process:
>>>> blkif_interrupt()                               blkif_queue_rq()
>>>>  kick_pending_request_queues_locked()
>>>>   blk_mq_start_stopped_hw_queues()
>>>>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>>>>                                                  blk_mq_stop_hw_queue(hctx)
>>>>    blk_mq_run_hw_queue(hctx, async)
>>>>
>>>> If ring buf is made empty in this case, interrupt will never come, then the
>>>> hw queue will be stopped forever, all processes waiting for the pending io
>>>> in the queue will hung.
>>>>
>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>
>>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>> Looks patch not in mainline. Can you please help merge it?
> 
> I'm afraid this needs to be done by Konrad or one of the Linux
> maintainers, I don't have an account on kernel.org in order to send
> pull requests to Jens.
Can you pls help merge it?

Thanks,
Junxiao.
> 
> Roger.
>
Konrad Rzeszutek Wilk July 19, 2017, 2:08 p.m. UTC | #7
On Wed, Jul 19, 2017 at 03:51:48PM +0800, Junxiao Bi wrote:
> Hi Konrad,
> 
> On 07/19/2017 03:37 PM, Roger Pau Monné wrote:
> > On Wed, Jul 19, 2017 at 09:19:49AM +0800, Junxiao Bi wrote:
> >> Hi Roger,
> >>
> >> On 06/23/2017 08:57 PM, Roger Pau Monné wrote:
> >>> On Thu, Jun 22, 2017 at 09:36:52AM +0800, Junxiao Bi wrote:
> >>>> When ring buf full, hw queue will be stopped. While blkif interrupt consume
> >>>> request and make free space in ring buf, hw queue will be started again.
> >>>> But since start queue is protected by spin lock while stop not, that will
> >>>> cause a race.
> >>>>
> >>>> interrupt:                                      process:
> >>>> blkif_interrupt()                               blkif_queue_rq()
> >>>>  kick_pending_request_queues_locked()
> >>>>   blk_mq_start_stopped_hw_queues()
> >>>>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
> >>>>                                                  blk_mq_stop_hw_queue(hctx)
> >>>>    blk_mq_run_hw_queue(hctx, async)
> >>>>
> >>>> If ring buf is made empty in this case, interrupt will never come, then the
> >>>> hw queue will be stopped forever, all processes waiting for the pending io
> >>>> in the queue will hung.
> >>>>
> >>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> >>>> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
> >>>
> >>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> >> Looks patch not in mainline. Can you please help merge it?
> > 
> > I'm afraid this needs to be done by Konrad or one of the Linux
> > maintainers, I don't have an account on kernel.org in order to send
> > pull requests to Jens.
> Can you pls help merge it?

Could you kindly repost it with the updated tags _and_ against Linus's latest
branch?

I get:
[konrad@char linux]$ git am -s < /tmp/a
Applying: xen-blkfront: fix mq start/stop race
error: patch failed: drivers/block/xen-blkfront.c:912
error: drivers/block/xen-blkfront.c: patch does not apply
Patch failed at 0001 xen-blkfront: fix mq start/stop race
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


> 
> Thanks,
> Junxiao.
> > 
> > Roger.
> > 
>
Junxiao Bi July 20, 2017, 1:29 a.m. UTC | #8
On 07/19/2017 10:08 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 19, 2017 at 03:51:48PM +0800, Junxiao Bi wrote:
>> Hi Konrad,
>>
>> On 07/19/2017 03:37 PM, Roger Pau Monné wrote:
>>> On Wed, Jul 19, 2017 at 09:19:49AM +0800, Junxiao Bi wrote:
>>>> Hi Roger,
>>>>
>>>> On 06/23/2017 08:57 PM, Roger Pau Monné wrote:
>>>>> On Thu, Jun 22, 2017 at 09:36:52AM +0800, Junxiao Bi wrote:
>>>>>> When ring buf full, hw queue will be stopped. While blkif interrupt consume
>>>>>> request and make free space in ring buf, hw queue will be started again.
>>>>>> But since start queue is protected by spin lock while stop not, that will
>>>>>> cause a race.
>>>>>>
>>>>>> interrupt:                                      process:
>>>>>> blkif_interrupt()                               blkif_queue_rq()
>>>>>>  kick_pending_request_queues_locked()
>>>>>>   blk_mq_start_stopped_hw_queues()
>>>>>>    clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
>>>>>>                                                  blk_mq_stop_hw_queue(hctx)
>>>>>>    blk_mq_run_hw_queue(hctx, async)
>>>>>>
>>>>>> If ring buf is made empty in this case, interrupt will never come, then the
>>>>>> hw queue will be stopped forever, all processes waiting for the pending io
>>>>>> in the queue will hung.
>>>>>>
>>>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>>>> Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>>>
>>>>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Looks patch not in mainline. Can you please help merge it?
>>>
>>> I'm afraid this needs to be done by Konrad or one of the Linux
>>> maintainers, I don't have an account on kernel.org in order to send
>>> pull requests to Jens.
>> Can you pls help merge it?
> 
> Could you kindly repost it with the updated tags _and_ against Linus's latest
> branch?
Sure, v2 sent. Please check.

Thanks,
Junxiao.
> 
> I get:
> [konrad@char linux]$ git am -s < /tmp/a
> Applying: xen-blkfront: fix mq start/stop race
> error: patch failed: drivers/block/xen-blkfront.c:912
> error: drivers/block/xen-blkfront.c: patch does not apply
> Patch failed at 0001 xen-blkfront: fix mq start/stop race
> The copy of the patch that failed is found in: .git/rebase-apply/patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> 
> 
>>
>> Thanks,
>> Junxiao.
>>>
>>> Roger.
>>>
>>
diff mbox

Patch

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8bb160cd00e1..4767b82b2cf6 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -912,8 +912,8 @@  out_err:
 	return BLK_MQ_RQ_QUEUE_ERROR;
 
 out_busy:
-	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 	blk_mq_stop_hw_queue(hctx);
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }