diff mbox series

[V4,3/7] blk-mq: quiesce queue before updating nr_hw_queues

Message ID 20190404084320.24681-4-ming.lei@redhat.com (mailing list archive)
State New, archived
Headers show
Series blk-mq: fix races related with freeing queue | expand

Commit Message

Ming Lei April 4, 2019, 8:43 a.m. UTC
Inside __blk_mq_update_nr_hw_queues(), only request queues are frozen
before updating nr_hw_queues.

However, even though blk_mq_freeze_queue() is returned, there might be
run queue activity not completed, then use-after-free may be triggered
on hctx and its fields.

Fix this issue by really quiescing queue via blk_mq_quiesce_queue() and
blk_sync_queue() for making sure no any run queue activity pending
before releasing hctx.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Cc: jianchao wang <jianchao.w.wang@oracle.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Bart Van Assche April 4, 2019, 3:57 p.m. UTC | #1
On Thu, 2019-04-04 at 16:43 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b512ba0cb359..41c12d9008b7 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3224,8 +3224,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>         if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
>                 return;
>  
> -       list_for_each_entry(q, &set->tag_list, tag_set_list)
> +       list_for_each_entry(q, &set->tag_list, tag_set_list) {
>                 blk_mq_freeze_queue(q);
> +               blk_mq_quiesce_queue(q);
> +               blk_sync_queue(q);
> +       }
>         /*
>          * Sync with blk_mq_queue_tag_busy_iter.
>          */
> @@ -3269,8 +3272,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>         list_for_each_entry(q, &set->tag_list, tag_set_list)
>                 blk_mq_elv_switch_back(&head, q);
>  
> -       list_for_each_entry(q, &set->tag_list, tag_set_list)
> +       list_for_each_entry(q, &set->tag_list, tag_set_list) {
> +               blk_mq_unquiesce_queue(q);
>                 blk_mq_unfreeze_queue(q);
> +       }
>  }
>  
>  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)

Are you sure this patch is sufficient? What prevents that blk_mq_run_hw_queues()
gets called after the blk_mq_quiesce() and blk_sync_queue() calls have finished
and before the queue is unfrozen?

Bart.
Ming Lei April 4, 2019, 8:55 p.m. UTC | #2
On Thu, Apr 4, 2019 at 11:57 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On Thu, 2019-04-04 at 16:43 +0800, Ming Lei wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index b512ba0cb359..41c12d9008b7 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -3224,8 +3224,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> >         if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
> >                 return;
> >
> > -       list_for_each_entry(q, &set->tag_list, tag_set_list)
> > +       list_for_each_entry(q, &set->tag_list, tag_set_list) {
> >                 blk_mq_freeze_queue(q);
> > +               blk_mq_quiesce_queue(q);
> > +               blk_sync_queue(q);
> > +       }
> >         /*
> >          * Sync with blk_mq_queue_tag_busy_iter.
> >          */
> > @@ -3269,8 +3272,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> >         list_for_each_entry(q, &set->tag_list, tag_set_list)
> >                 blk_mq_elv_switch_back(&head, q);
> >
> > -       list_for_each_entry(q, &set->tag_list, tag_set_list)
> > +       list_for_each_entry(q, &set->tag_list, tag_set_list) {
> > +               blk_mq_unquiesce_queue(q);
> >                 blk_mq_unfreeze_queue(q);
> > +       }
> >  }
> >
> >  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
>
> Are you sure this patch is sufficient? What prevents that blk_mq_run_hw_queues()
> gets called after the blk_mq_quiesce() and blk_sync_queue() calls have finished
> and before the queue is unfrozen?

blk_queue_quiesced(hctx->queue) is supposed to prevents blk_mq_run_hw_queues()
from being called after the queue is quiesced.

But there is another bug in blk_mq_quiesce() which shouldn't depend on
hctx_lock(hctx, &srcu_idx)
for this check, will fix blk_mq_quiesce() in V5.

Thanks,
Ming Lei
jianchao.wang April 8, 2019, 3:16 a.m. UTC | #3
Hi Ming

Why not add percpu_ref_tryget/put pair into run queue and requeue work,
or before queue the work ?

Then freezing queue could really implement to freeze the queue and there will be
no any queue activity after freeze, including run queue and requeue work.

Thanks
Jianchao

On 4/4/19 4:43 PM, Ming Lei wrote:
> Inside __blk_mq_update_nr_hw_queues(), only request queues are frozen
> before updating nr_hw_queues.
> 
> However, even though blk_mq_freeze_queue() is returned, there might be
> run queue activity not completed, then use-after-free may be triggered
> on hctx and its fields.
> 
> Fix this issue by really quiescing queue via blk_mq_quiesce_queue() and
> blk_sync_queue() for making sure no any run queue activity pending
> before releasing hctx.
> 
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: linux-scsi@vger.kernel.org,
> Cc: Martin K . Petersen <martin.petersen@oracle.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
> Cc: jianchao wang <jianchao.w.wang@oracle.com>
> Reported-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b512ba0cb359..41c12d9008b7 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3224,8 +3224,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  	if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
>  		return;
>  
> -	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>  		blk_mq_freeze_queue(q);
> +		blk_mq_quiesce_queue(q);
> +		blk_sync_queue(q);
> +	}
>  	/*
>  	 * Sync with blk_mq_queue_tag_busy_iter.
>  	 */
> @@ -3269,8 +3272,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_elv_switch_back(&head, q);
>  
> -	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +	list_for_each_entry(q, &set->tag_list, tag_set_list) {
> +		blk_mq_unquiesce_queue(q);
>  		blk_mq_unfreeze_queue(q);
> +	}
>  }
>  
>  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
>
Ming Lei April 8, 2019, 9:01 a.m. UTC | #4
On Mon, Apr 8, 2019 at 11:17 AM jianchao.wang
<jianchao.w.wang@oracle.com> wrote:
>
> Hi Ming
>
> Why not add percpu_ref_tryget/put pair into run queue and requeue work,
> or before queue the work ?

If following this direction, most of block layer API might need the pair.

Also Jens has complained in another thread, the pair may introduce
1.2% performance
loss, then we should avoid it in the fast path, such as blk_mq_run_hw_queue().

Given it is required that request queue is alive from kobject view
before calling almost
every block layer API, this lifetime issue should be addressed easily
by moving the hctx
free into queue's release handler.

>
> Then freezing queue could really implement to freeze the queue and there will be
> no any queue activity after freeze, including run queue and requeue work.

The queue activity is just block layer internal thing, not related
with driver, so not a big
deal by dealing with freeing hctx resources in release handler.

Thanks,
Ming Lei
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b512ba0cb359..41c12d9008b7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3224,8 +3224,11 @@  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
 		return;
 
-	list_for_each_entry(q, &set->tag_list, tag_set_list)
+	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_freeze_queue(q);
+		blk_mq_quiesce_queue(q);
+		blk_sync_queue(q);
+	}
 	/*
 	 * Sync with blk_mq_queue_tag_busy_iter.
 	 */
@@ -3269,8 +3272,10 @@  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_elv_switch_back(&head, q);
 
-	list_for_each_entry(q, &set->tag_list, tag_set_list)
+	list_for_each_entry(q, &set->tag_list, tag_set_list) {
+		blk_mq_unquiesce_queue(q);
 		blk_mq_unfreeze_queue(q);
+	}
 }
 
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)