Message ID | 20200914071903.65704-1-yang.yang@vivo.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-mq: fix hang issue in blk_queue_enter() | expand |
On 2020-09-14 00:19, Yang Yang wrote: > There is a race between blk_queue_enter() and block layer's runtime > suspend. > > CPU0 CPU1 > --------------------------------- ------------------------------- > blk_pre_runtime_suspend(q) { blk_queue_enter() { > /* q->rpm_status=RPM_ACTIVE */ > blk_set_pm_only(q) > /* q->pm_only=1 */ > blk_freeze_queue_start(q) > blk_mq_unfreeze_queue(q) > if (percpu_ref_tryget_live()) { > /* pm=0 && q->pm_only=1 */ > if (pm || !blk_queue_pm_only(q)) { > } else { > percpu_ref_put() > } > } > wait_event(q->mq_freeze_wq, > (!q->mq_freeze_depth && > /* q->rpm_status=RPM_ACTIVE > q->pm_only=1 */ > (pm || (blk_pm_request_resume(q), > !blk_queue_pm_only(q)))) || > blk_queue_dying(q)) > } > spin_lock_irq(&q->queue_lock) > q->rpm_status = RPM_SUSPENDING > spin_unlock_irq(&q->queue_lock) > } > > At this point blk_pm_request_resume() missed the chance to resume the > queue, so blk_queue_enter() may wait here forever. > The solution is to wake up the mq_freeze_wq after runtime suspend > completed, make blk_pm_request_resume() reexamine the q->rpm_status flag. > > Signed-off-by: Yang Yang <yang.yang@vivo.com> > --- > block/blk-pm.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/block/blk-pm.c b/block/blk-pm.c > index b85234d758f7..dec7d0aef606 100644 > --- a/block/blk-pm.c > +++ b/block/blk-pm.c > @@ -132,6 +132,8 @@ void blk_post_runtime_suspend(struct request_queue *q, int err) > > if (err) > blk_clear_pm_only(q); > + else > + wake_up_all(&q->mq_freeze_wq); > } > EXPORT_SYMBOL(blk_post_runtime_suspend); Please verify whether the following patch series also fixes the reported hang: https://lore.kernel.org/linux-block/20200906012219.17893-1-bvanassche@acm.org/T/#t Thanks, Bart.
diff --git a/block/blk-pm.c b/block/blk-pm.c index b85234d758f7..dec7d0aef606 100644 --- a/block/blk-pm.c +++ b/block/blk-pm.c @@ -132,6 +132,8 @@ void blk_post_runtime_suspend(struct request_queue *q, int err) if (err) blk_clear_pm_only(q); + else + wake_up_all(&q->mq_freeze_wq); } EXPORT_SYMBOL(blk_post_runtime_suspend);
There is a race between blk_queue_enter() and block layer's runtime suspend. CPU0 CPU1 --------------------------------- ------------------------------- blk_pre_runtime_suspend(q) { blk_queue_enter() { /* q->rpm_status=RPM_ACTIVE */ blk_set_pm_only(q) /* q->pm_only=1 */ blk_freeze_queue_start(q) blk_mq_unfreeze_queue(q) if (percpu_ref_tryget_live()) { /* pm=0 && q->pm_only=1 */ if (pm || !blk_queue_pm_only(q)) { } else { percpu_ref_put() } } wait_event(q->mq_freeze_wq, (!q->mq_freeze_depth && /* q->rpm_status=RPM_ACTIVE q->pm_only=1 */ (pm || (blk_pm_request_resume(q), !blk_queue_pm_only(q)))) || blk_queue_dying(q)) } spin_lock_irq(&q->queue_lock) q->rpm_status = RPM_SUSPENDING spin_unlock_irq(&q->queue_lock) } At this point blk_pm_request_resume() missed the chance to resume the queue, so blk_queue_enter() may wait here forever. The solution is to wake up the mq_freeze_wq after runtime suspend completed, make blk_pm_request_resume() reexamine the q->rpm_status flag. Signed-off-by: Yang Yang <yang.yang@vivo.com> --- block/blk-pm.c | 2 ++ 1 file changed, 2 insertions(+)