Message ID | 20190404084320.24681-4-ming.lei@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | blk-mq: fix races related with freeing queue | expand |
On Thu, 2019-04-04 at 16:43 +0800, Ming Lei wrote: > diff --git a/block/blk-mq.c b/block/blk-mq.c > index b512ba0cb359..41c12d9008b7 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -3224,8 +3224,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues) > return; > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > blk_mq_freeze_queue(q); > + blk_mq_quiesce_queue(q); > + blk_sync_queue(q); > + } > /* > * Sync with blk_mq_queue_tag_busy_iter. > */ > @@ -3269,8 +3272,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > list_for_each_entry(q, &set->tag_list, tag_set_list) > blk_mq_elv_switch_back(&head, q); > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > + blk_mq_unquiesce_queue(q); > blk_mq_unfreeze_queue(q); > + } > } > > void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues) Are you sure this patch is sufficient? What prevents that blk_mq_run_hw_queues() gets called after the blk_mq_quiesce() and blk_sync_queue() calls have finished and before the queue is unfrozen? Bart.
On Thu, Apr 4, 2019 at 11:57 PM Bart Van Assche <bvanassche@acm.org> wrote: > > On Thu, 2019-04-04 at 16:43 +0800, Ming Lei wrote: > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index b512ba0cb359..41c12d9008b7 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -3224,8 +3224,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > > if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues) > > return; > > > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > > blk_mq_freeze_queue(q); > > + blk_mq_quiesce_queue(q); > > + blk_sync_queue(q); > > + } > > /* > > * Sync with blk_mq_queue_tag_busy_iter. > > */ > > @@ -3269,8 +3272,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > > list_for_each_entry(q, &set->tag_list, tag_set_list) > > blk_mq_elv_switch_back(&head, q); > > > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > > + blk_mq_unquiesce_queue(q); > > blk_mq_unfreeze_queue(q); > > + } > > } > > > > void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues) > > Are you sure this patch is sufficient? What prevents that blk_mq_run_hw_queues() > gets called after the blk_mq_quiesce() and blk_sync_queue() calls have finished > and before the queue is unfrozen? blk_queue_quiesced(hctx->queue) is supposed to prevents blk_mq_run_hw_queues() from being called after the queue is quiesced. But there is another bug in blk_mq_quiesce() which shouldn't depend on hctx_lock(hctx, &srcu_idx) for this check, will fix blk_mq_quiesce() in V5. Thanks, Ming Lei
Hi Ming Why not add percpu_ref_tryget/put pair into run queue and requeue work, or before queue the work ? Then freezing queue could really implement to freeze the queue and there will be no any queue activity after freeze, including run queue and requeue work. Thanks Jianchao On 4/4/19 4:43 PM, Ming Lei wrote: > Inside __blk_mq_update_nr_hw_queues(), only request queues are frozen > before updating nr_hw_queues. > > However, even though blk_mq_freeze_queue() is returned, there might be > run queue activity not completed, then use-after-free may be triggered > on hctx and its fields. > > Fix this issue by really quiescing queue via blk_mq_quiesce_queue() and > blk_sync_queue() for making sure no any run queue activity pending > before releasing hctx. > > Cc: Dongli Zhang <dongli.zhang@oracle.com> > Cc: James Smart <james.smart@broadcom.com> > Cc: Bart Van Assche <bart.vanassche@wdc.com> > Cc: linux-scsi@vger.kernel.org, > Cc: Martin K . Petersen <martin.petersen@oracle.com>, > Cc: Christoph Hellwig <hch@lst.de>, > Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, > Cc: jianchao wang <jianchao.w.wang@oracle.com> > Reported-by: Bart Van Assche <bvanassche@acm.org> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-mq.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index b512ba0cb359..41c12d9008b7 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -3224,8 +3224,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues) > return; > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > blk_mq_freeze_queue(q); > + blk_mq_quiesce_queue(q); > + blk_sync_queue(q); > + } > /* > * Sync with blk_mq_queue_tag_busy_iter. > */ > @@ -3269,8 +3272,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > list_for_each_entry(q, &set->tag_list, tag_set_list) > blk_mq_elv_switch_back(&head, q); > > - list_for_each_entry(q, &set->tag_list, tag_set_list) > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > + blk_mq_unquiesce_queue(q); > blk_mq_unfreeze_queue(q); > + } > } > > void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues) >
On Mon, Apr 8, 2019 at 11:17 AM jianchao.wang <jianchao.w.wang@oracle.com> wrote: > > Hi Ming > > Why not add percpu_ref_tryget/put pair into run queue and requeue work, > or before queue the work ? If following this direction, most of block layer API might need the pair. Also Jens has complained in another thread, the pair may introduce 1.2% performance loss, then we should avoid it in the fast path, such as blk_mq_run_hw_queue(). Given it is required that request queue is alive from kobject view before calling almost every block layer API, this lifetime issue should be addressed easily by moving the hctx free into queue's release handler. > > Then freezing queue could really implement to freeze the queue and there will be > no any queue activity after freeze, including run queue and requeue work. The queue activity is just block layer internal thing, not related with driver, so not a big deal by dealing with freeing hctx resources in release handler. Thanks, Ming Lei
diff --git a/block/blk-mq.c b/block/blk-mq.c index b512ba0cb359..41c12d9008b7 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3224,8 +3224,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues) return; - list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { blk_mq_freeze_queue(q); + blk_mq_quiesce_queue(q); + blk_sync_queue(q); + } /* * Sync with blk_mq_queue_tag_busy_iter. */ @@ -3269,8 +3272,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, list_for_each_entry(q, &set->tag_list, tag_set_list) blk_mq_elv_switch_back(&head, q); - list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { + blk_mq_unquiesce_queue(q); blk_mq_unfreeze_queue(q); + } } void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
Inside __blk_mq_update_nr_hw_queues(), only request queues are frozen before updating nr_hw_queues. However, even though blk_mq_freeze_queue() is returned, there might be run queue activity not completed, then use-after-free may be triggered on hctx and its fields. Fix this issue by really quiescing queue via blk_mq_quiesce_queue() and blk_sync_queue() for making sure no any run queue activity pending before releasing hctx. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Cc: jianchao wang <jianchao.w.wang@oracle.com> Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-mq.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)