[-next] block/wbt: fix negative inflight counter when remove scsi device

Message ID 20211213040907.2669480-1-qiulaibin@huawei.com (mailing list archive)
State New, archived

Commit Message

QiuLaibin Dec. 13, 2021, 4:09 a.m. UTC
wbt is currently disabled by setting WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when the elevator is switched to bfq. When a
scsi device is removed, wbt is re-enabled by wbt_enable_default().
If the enable state flips between wbt_wait() and wbt_track() while a
write request is being submitted, the enabled check gives a false positive.

The following is the scenario that triggered the problem.

T1                          T2                           T3
                            elevator_switch_mq
                            bfq_init_queue
                            wbt_disable_default <= Set
                            rwb->enable_state (OFF)
submit_bio
blk_mq_make_request
rq_qos_throttle
<= rwb->enable_state (OFF)
                                                         scsi_remove_device
                                                         sd_remove
                                                         del_gendisk
                                                         blk_unregister_queue
                                                         elv_unregister_queue
                                                         wbt_enable_default
                                                         <= Set rwb->enable_state (ON)
rq_qos_track
<= rwb->enable_state (ON)
^^^^^^ this request is marked WBT_TRACKED without incrementing inflight, so
wbt_done() drops rqw->inflight to -1, which causes an IO hang.

Fix this by checking whether QUEUE_FLAG_REGISTERED is set, which
distinguishes the scsi device removal case.

Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
---
 block/blk-wbt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
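
The interleaving in the commit message can be replayed in a small userspace model. This is only a sketch: the struct and function names below (model_rwb, model_wbt_wait and so on) are hypothetical stand-ins for the kernel's wbt_wait(), wbt_track() and wbt_done(), and the three threads are flattened into one sequential replay of the order shown in the diagram.

```c
#include <stdbool.h>

/* Userspace model only: names echo the kernel's, but this is not kernel code. */
enum model_state { MODEL_OFF_DEFAULT, MODEL_ON_DEFAULT };

struct model_rwb {
	enum model_state enable_state;
	int inflight;		/* stands in for rqw->inflight */
};

struct model_rq {
	bool tracked;		/* stands in for the WBT_TRACKED flag */
};

/* Throttle step: the request is only counted when wbt is enabled. */
static void model_wbt_wait(struct model_rwb *rwb)
{
	if (rwb->enable_state == MODEL_ON_DEFAULT)
		rwb->inflight++;
}

/* Track step: the request is only marked when wbt is enabled. */
static void model_wbt_track(struct model_rwb *rwb, struct model_rq *rq)
{
	if (rwb->enable_state == MODEL_ON_DEFAULT)
		rq->tracked = true;
}

/* Completion: every tracked request gives back one inflight slot. */
static void model_wbt_done(struct model_rwb *rwb, struct model_rq *rq)
{
	if (rq->tracked)
		rwb->inflight--;
}

/* Replays the T1/T2/T3 order from the diagram and returns the final counter. */
static int replay_race(void)
{
	struct model_rwb rwb = { .enable_state = MODEL_ON_DEFAULT, .inflight = 0 };
	struct model_rq rq = { .tracked = false };

	rwb.enable_state = MODEL_OFF_DEFAULT;	/* T2: wbt_disable_default */
	model_wbt_wait(&rwb);			/* T1: throttle sees OFF, no count */
	rwb.enable_state = MODEL_ON_DEFAULT;	/* T3: wbt_enable_default */
	model_wbt_track(&rwb, &rq);		/* T1: track sees ON, marks request */
	model_wbt_done(&rwb, &rq);		/* T1: completion decrements */
	return rwb.inflight;
}
```

Calling replay_race() leaves the modeled counter below zero, which is the condition the commit message describes as leading to the hang.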

Comments

Christoph Hellwig Dec. 13, 2021, 5:16 p.m. UTC | #1
On Mon, Dec 13, 2021 at 12:09:07PM +0800, Laibin Qiu wrote:
> Submit_bio
>                                                          scsi_remove_device
>                                                          sd_remove
>                                                          del_gendisk
>                                                          blk_unregister_queue
>                                                          elv_unregister_queue
>                                                          wbt_enable_default
>                                                          <= Set rwb->enable_state (ON)
> q_qos_track
> <= rwb->enable_state (ON)
> ^^^^^^ this request will mark WBT_TRACKED without inflight add and will
> lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.
> 
> Fix this by judge whether QUEUE_FLAG_REGISTERED is marked to distinguish
> scsi remove scene.
> Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
> ---
>  block/blk-wbt.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index 3ed71b8da887..537f77bb1365 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -637,6 +637,10 @@ void wbt_enable_default(struct request_queue *q)
>  {
>  	struct rq_qos *rqos = wbt_rq_qos(q);
>  
> +	/* Queue not registered? Maybe shutting down... */
> +	if (!blk_queue_registered(q))
> +		return;

Wouldn't it make more sense to simply not call wbt_enable_default from
elv_unregister_queue?

Ming Lei Dec. 14, 2021, 1:13 a.m. UTC | #2
On Mon, Dec 13, 2021 at 09:16:51AM -0800, Christoph Hellwig wrote:
> On Mon, Dec 13, 2021 at 12:09:07PM +0800, Laibin Qiu wrote:
> > Submit_bio
> >                                                          scsi_remove_device
> >                                                          sd_remove
> >                                                          del_gendisk
> >                                                          blk_unregister_queue
> >                                                          elv_unregister_queue
> >                                                          wbt_enable_default
> >                                                          <= Set rwb->enable_state (ON)
> > q_qos_track
> > <= rwb->enable_state (ON)
> > ^^^^^^ this request will mark WBT_TRACKED without inflight add and will
> > lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.
> > 
> > Fix this by judge whether QUEUE_FLAG_REGISTERED is marked to distinguish
> > scsi remove scene.
> > Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
> > Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
> > ---
> >  block/blk-wbt.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> > index 3ed71b8da887..537f77bb1365 100644
> > --- a/block/blk-wbt.c
> > +++ b/block/blk-wbt.c
> > @@ -637,6 +637,10 @@ void wbt_enable_default(struct request_queue *q)
> >  {
> >  	struct rq_qos *rqos = wbt_rq_qos(q);
> >  
> > +	/* Queue not registered? Maybe shutting down... */
> > +	if (!blk_queue_registered(q))
> > +		return;
> 
> Wouldn't it make more sense to simply not call wbt_enable_default from
> elv_unregister_queue?

wbt_disable_default() is called in bfq_init_root_group(), so wbt_enable_default
should be moved to bfq_exit_queue()?


Thanks,
Ming

QiuLaibin Dec. 14, 2021, 4:25 a.m. UTC | #3
On 2021/12/14 1:16, Christoph Hellwig wrote:
> On Mon, Dec 13, 2021 at 12:09:07PM +0800, Laibin Qiu wrote:
>> Submit_bio
>>                                                           scsi_remove_device
>>                                                           sd_remove
>>                                                           del_gendisk
>>                                                           blk_unregister_queue
>>                                                           elv_unregister_queue
>>                                                           wbt_enable_default
>>                                                           <= Set rwb->enable_state (ON)
>> q_qos_track
>> <= rwb->enable_state (ON)
>> ^^^^^^ this request will mark WBT_TRACKED without inflight add and will
>> lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.
>>
>> Fix this by judge whether QUEUE_FLAG_REGISTERED is marked to distinguish
>> scsi remove scene.
>> Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
>> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
>> ---
>>   block/blk-wbt.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
>> index 3ed71b8da887..537f77bb1365 100644
>> --- a/block/blk-wbt.c
>> +++ b/block/blk-wbt.c
>> @@ -637,6 +637,10 @@ void wbt_enable_default(struct request_queue *q)
>>   {
>>   	struct rq_qos *rqos = wbt_rq_qos(q);
>>   
>> +	/* Queue not registered? Maybe shutting down... */
>> +	if (!blk_queue_registered(q))
>> +		return;
> 
> Wouldn't it make more sense to simply not call wbt_enable_default from
> elv_unregister_queue?

Following your suggestion, I will post a V2.
Please take another look.

Christoph Hellwig Dec. 14, 2021, 8:07 a.m. UTC | #4
On Tue, Dec 14, 2021 at 09:13:10AM +0800, Ming Lei wrote:
> > Wouldn't it make more sense to simply not call wbt_enable_default from
> > elv_unregister_queue?
> 
> wbt_disable_default() is called in bfq_init_root_group(), so wbt_enable_default

s/bfq_init_root_group/bfq_init_queue/

But yes, that sounds like an even better idea.  Or maybe even an
elevator feature flag.
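
The alternative discussed above, pairing the disable in bfq_init_queue() with an enable in bfq_exit_queue() rather than in elv_unregister_queue(), can be pictured with a small userspace model. This is a hypothetical sketch of the idea only: the alt_* names are invented for illustration and do not reflect how any later version of the patch was actually written.

```c
#include <stdbool.h>

/* Userspace model only; the alt_* names are hypothetical. */
enum alt_state { ALT_OFF_DEFAULT, ALT_ON_DEFAULT };

struct alt_queue {
	enum alt_state wbt_state;
	bool registered;
};

/* Scheduler init: bfq turns wbt off for the queue. */
static void alt_bfq_init_queue(struct alt_queue *q)
{
	q->wbt_state = ALT_OFF_DEFAULT;
}

/* Scheduler exit: the only place that turns wbt back on in this model. */
static void alt_bfq_exit_queue(struct alt_queue *q)
{
	q->wbt_state = ALT_ON_DEFAULT;
}

/* Queue unregistration no longer touches the wbt state at all. */
static void alt_elv_unregister_queue(struct alt_queue *q)
{
	q->registered = false;
}
```

In this model, device removal alone never flips the state back to ON while bfq is still attached, so a request in flight between the throttle and track steps sees a consistent state.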

Patch

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 3ed71b8da887..537f77bb1365 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -637,6 +637,10 @@  void wbt_enable_default(struct request_queue *q)
 {
 	struct rq_qos *rqos = wbt_rq_qos(q);
 
+	/* Queue not registered? Maybe shutting down... */
+	if (!blk_queue_registered(q))
+		return;
+
 	/* Throttling already enabled? */
 	if (rqos) {
 		if (RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
@@ -644,10 +648,6 @@  void wbt_enable_default(struct request_queue *q)
 		return;
 	}
 
-	/* Queue not registered? Maybe shutting down... */
-	if (!blk_queue_registered(q))
-		return;
-
 	if (queue_is_mq(q) && IS_ENABLED(CONFIG_BLK_WBT_MQ))
 		wbt_init(q);
 }
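
The effect of moving the registered check can be seen in a userspace model of the two orderings. This is a sketch under stated assumptions: the fix_* names are hypothetical, the line hidden between the two hunks is assumed to set the state to ON, wbt_init() is reduced to setting two fields, and the queue is assumed to be marked unregistered before wbt_enable_default() runs on the removal path.

```c
#include <stdbool.h>

/* Userspace model only; the fix_* names are hypothetical. */
enum fix_state { FIX_OFF_DEFAULT, FIX_ON_DEFAULT };

struct fix_queue {
	bool registered;	/* stands in for QUEUE_FLAG_REGISTERED */
	bool has_rqos;		/* stands in for wbt_rq_qos(q) != NULL */
	enum fix_state enable_state;
};

/* Pre-patch order: an existing rqos is re-enabled before the registered check. */
static void fix_enable_default_old(struct fix_queue *q)
{
	if (q->has_rqos) {
		if (q->enable_state == FIX_OFF_DEFAULT)
			q->enable_state = FIX_ON_DEFAULT;
		return;
	}
	if (!q->registered)
		return;
	/* Simplified stand-in for wbt_init(q). */
	q->has_rqos = true;
	q->enable_state = FIX_ON_DEFAULT;
}

/* Patched order: an unregistered queue is left untouched. */
static void fix_enable_default_patched(struct fix_queue *q)
{
	if (!q->registered)
		return;
	if (q->has_rqos) {
		if (q->enable_state == FIX_OFF_DEFAULT)
			q->enable_state = FIX_ON_DEFAULT;
		return;
	}
	/* Simplified stand-in for wbt_init(q). */
	q->has_rqos = true;
	q->enable_state = FIX_ON_DEFAULT;
}
```

With an unregistered queue whose state is OFF, the old ordering flips the state to ON while the patched ordering leaves it OFF; for a registered queue both behave the same.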