Message ID | 20180807174433.8374-14-ming.lei@redhat.com (mailing list archive)
---|---
State | New, archived
Series | SCSI: introduce per-host admin queue & enable runtime PM
On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> @@ -3772,6 +3764,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>  	if (!q->dev)
>  		return ret;
>
> +	mutex_lock(&q->pm_lock);
>  	spin_lock_irq(q->queue_lock);
>  	if (q->nr_pending) {
>  		ret = -EBUSY;
> @@ -3780,6 +3773,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>  		q->rpm_status = RPM_SUSPENDING;
>  	}

Hello Ming,

As far as I can see none of the patches in this series adds a call to
blk_pm_add_request() in the blk-mq code. Does that mean that q->nr_pending
will always be zero for blk-mq code with your approach and hence that runtime
suspend can get triggered while I/O is in progress, e.g. if blk_queue_enter()
is called concurrently with blk_pre_runtime_suspend()?

Thanks,

Bart.
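For context, the accounting Bart refers to lives only in the legacy elevator path; nothing equivalent is called from blk-mq request allocation as of this series. The pre-patch helper in block/elevator.c, reproduced here purely for illustration, is:

#ifdef CONFIG_PM
/*
 * Pre-patch block/elevator.c: every legacy request entering the elevator
 * bumps q->nr_pending and, if the device is suspended or suspending,
 * kicks a runtime resume. blk-mq never calls this.
 */
static void blk_pm_add_request(struct request_queue *q, struct request *rq)
{
	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
		pm_request_resume(q->dev);
}
#endif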
On Tue, Aug 07, 2018 at 07:54:44PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> > @@ -3772,6 +3764,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >  	if (!q->dev)
> >  		return ret;
> >
> > +	mutex_lock(&q->pm_lock);
> >  	spin_lock_irq(q->queue_lock);
> >  	if (q->nr_pending) {
> >  		ret = -EBUSY;
> > @@ -3780,6 +3773,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >  		q->rpm_status = RPM_SUSPENDING;
> >  	}
>
> Hello Ming,
>
> As far as I can see none of the patches in this series adds a call to
> blk_pm_add_request() in the blk-mq code. Does that mean that q->nr_pending
> will always be zero for blk-mq code with your approach and hence that runtime

The q->nr_pending counter is legacy-only, and I simply forgot to check
whether the blk-mq queue is idle in the next patch, but runtime PM still
works this way for blk-mq. :-)

> suspend can get triggered while I/O is in progress, e.g. if blk_queue_enter()
> is called concurrently with blk_pre_runtime_suspend()?

In this patchset, for blk-mq, runtime suspend is attempted when the
autosuspend period expires.

Yes, blk_queue_enter() can run concurrently with blk_pre_runtime_suspend():

1) if the queue isn't frozen, blk_pre_runtime_suspend() will wait for
completion of the incoming request

2) if the queue is frozen, blk_queue_enter() will try to resume the device
via blk_resume_queue(), and q->pm_lock is used to cover the two paths.

But I should have checked the in-flight request counter in
blk_pre_runtime_suspend() in the following way before freezing the queue;
I will add it in V2 if no one objects to this approach.

diff --git a/block/blk-core.c b/block/blk-core.c
index 26f9ceb85318..d1a5cd1da861 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3730,6 +3730,24 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 }
 EXPORT_SYMBOL(blk_pm_runtime_init);
 
+static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	unsigned long *cnt = priv;
+
+	(*cnt)++;
+}
+
+static bool blk_mq_pm_queue_idle(struct request_queue *q)
+{
+	unsigned long idle_cnt;
+
+	idle_cnt = 0;
+	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
+
+	return idle_cnt == 0;
+}
+
 /**
  * blk_pre_runtime_suspend - Pre runtime suspend check
  * @q: the queue of the device
@@ -3754,13 +3772,18 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
 int blk_pre_runtime_suspend(struct request_queue *q)
 {
 	int ret = 0;
+	bool mq_idle = false;
 
 	if (!q->dev)
 		return ret;
 
 	mutex_lock(&q->pm_lock);
+
+	if (q->mq_ops)
+		mq_idle = blk_mq_pm_queue_idle(q);
+
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
+	if (q->nr_pending || !mq_idle) {
 		ret = -EBUSY;
 		pm_runtime_mark_last_busy(q->dev);
 	} else {

Thanks,
Ming
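Folding this extra check into the freeze-based flow of the patch below, blk_pre_runtime_suspend() would end up roughly as follows. This is an editorial sketch that combines the two diffs in this thread, not V2 itself; mq_idle is initialized to true here so that the legacy path keeps relying on q->nr_pending alone:

int blk_pre_runtime_suspend(struct request_queue *q)
{
	int ret = 0;
	bool mq_idle = true;

	if (!q->dev)
		return ret;

	mutex_lock(&q->pm_lock);

	/* blk-mq: any request seen by the tag iterator means "not idle". */
	if (q->mq_ops)
		mq_idle = blk_mq_pm_queue_idle(q);

	spin_lock_irq(q->queue_lock);
	if (q->nr_pending || !mq_idle) {
		ret = -EBUSY;
		pm_runtime_mark_last_busy(q->dev);
	} else {
		q->rpm_status = RPM_SUSPENDING;
	}
	spin_unlock_irq(q->queue_lock);

	/* Only freeze the queue once suspend has been committed to. */
	if (!ret) {
		blk_freeze_queue(q);
		q->rpm_q_frozen = true;
	}
	mutex_unlock(&q->pm_lock);

	return ret;
}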
Hi Ming and Bart

Would you mind combining your solutions? ;)
It could be like this:

blk_pre_runtime_suspend
    if (q->mq_ops) {
        if (!blk_mq_pm_queue_idle(q)) {
            ret = -EBUSY;
            pm_runtime_mark_last_busy(q->dev);
        } else {
            blk_set_preempt_only(q);
            synchronize_rcu()
            if (!blk_mq_pm_queue_idle(q)) {
                blk_clear_preempt_only(q);
                ret = -EBUSY;
            } else {
                q->rpm_status = RPM_SUSPENDING;
            }
        }
    } else {
        spin_lock_irq(q->queue_lock);
        if (q->nr_pending) {
            ret = -EBUSY;
            pm_runtime_mark_last_busy(q->dev);
        } else {
            q->rpm_status = RPM_SUSPENDING;
        }
        spin_unlock_irq(q->queue_lock);
    }

blk_queue_enter
    blk_resume_queue(q);

    wait_event(q->mq_freeze_wq,
               atomic_read(&q->mq_freeze_depth) == 0 ||
               blk_queue_dying(q));

Thanks
Jianchao

On 08/08/2018 11:50 AM, Ming Lei wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 26f9ceb85318..d1a5cd1da861 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3730,6 +3730,24 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  }
>  EXPORT_SYMBOL(blk_pm_runtime_init);
>
> +static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
> +		struct request *rq, void *priv, bool reserved)
> +{
> +	unsigned long *cnt = priv;
> +
> +	(*cnt)++;
> +}
> +
> +static bool blk_mq_pm_queue_idle(struct request_queue *q)
> +{
> +	unsigned long idle_cnt;
> +
> +	idle_cnt = 0;
> +	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
> +
> +	return idle_cnt == 0;
> +}
> +
>  /**
>   * blk_pre_runtime_suspend - Pre runtime suspend check
>   * @q: the queue of the device
> @@ -3754,13 +3772,18 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>  int blk_pre_runtime_suspend(struct request_queue *q)
>  {
>  	int ret = 0;
> +	bool mq_idle = false;
>
>  	if (!q->dev)
>  		return ret;
>
>  	mutex_lock(&q->pm_lock);
> +
> +	if (q->mq_ops)
> +		mq_idle = blk_mq_pm_queue_idle(q);
> +
>  	spin_lock_irq(q->queue_lock);
> -	if (q->nr_pending) {
> +	if (q->nr_pending || !mq_idle) {
>  		ret = -EBUSY;
>  		pm_runtime_mark_last_busy(q->dev);
>  	} else {
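The blk_set_preempt_only()/synchronize_rcu() pair in this suggestion leans on the gate that already exists in blk_queue_enter(): once the PREEMPT_ONLY flag is globally visible, only BLK_MQ_REQ_PREEMPT submitters (e.g. RQF_PM requests) can still take a queue reference. A trimmed sketch of that v4.18-era gate, shown here for reference only:

int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;

	while (true) {
		bool success = false;

		rcu_read_lock();
		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
			/*
			 * Only preempt requests may pass while the queue is
			 * marked PREEMPT_ONLY; everybody else backs out.
			 */
			if (preempt || !blk_queue_preempt_only(q))
				success = true;
			else
				percpu_ref_put(&q->q_usage_counter);
		}
		rcu_read_unlock();

		if (success)
			return 0;

		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;

		/* ... otherwise wait on q->mq_freeze_wq and retry ... */
	}
}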
diff --git a/block/blk-core.c b/block/blk-core.c
index ea12e3fcfa11..7390149f4fd1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -884,6 +884,24 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
+#ifdef CONFIG_PM
+static void blk_resume_queue(struct request_queue *q)
+{
+	if (!q->dev)
+		return;
+
+	/* PM request needs to be dealt with out of band */
+	mutex_lock(&q->pm_lock);
+	if (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING)
+		pm_runtime_resume(q->dev);
+	mutex_unlock(&q->pm_lock);
+}
+#else
+static void blk_resume_queue(struct request_queue *q)
+{
+}
+#endif
+
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
@@ -907,6 +925,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
+		blk_resume_queue(q);
+
 		wait_event(q->mq_freeze_wq,
 			   atomic_read(&q->mq_freeze_depth) == 0 ||
 			   blk_queue_dying(q));
@@ -1684,7 +1704,7 @@ EXPORT_SYMBOL_GPL(part_round_stats);
 #ifdef CONFIG_PM
 static void blk_pm_put_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
+	if (rq->q->dev && !--rq->q->nr_pending)
 		pm_runtime_mark_last_busy(rq->q->dev);
 }
 #else
@@ -2702,30 +2722,6 @@ void blk_account_io_done(struct request *req, u64 now)
 	}
 }
 
-#ifdef CONFIG_PM
-/*
- * Don't process normal requests when queue is suspended
- * or in the process of suspending/resuming
- */
-static bool blk_pm_allow_request(struct request *rq)
-{
-	switch (rq->q->rpm_status) {
-	case RPM_RESUMING:
-	case RPM_SUSPENDING:
-		return rq->rq_flags & RQF_PM;
-	case RPM_SUSPENDED:
-		return false;
-	default:
-		return true;
-	}
-}
-#else
-static bool blk_pm_allow_request(struct request *rq)
-{
-	return true;
-}
-#endif
-
 void blk_account_io_start(struct request *rq, bool new_io)
 {
 	struct hd_struct *part;
@@ -2770,13 +2766,8 @@ static struct request *elv_next_request(struct request_queue *q)
 	WARN_ON_ONCE(q->mq_ops);
 
 	while (1) {
-		list_for_each_entry(rq, &q->queue_head, queuelist) {
-			if (blk_pm_allow_request(rq))
-				return rq;
-
-			if (rq->rq_flags & RQF_SOFTBARRIER)
-				break;
-		}
+		list_for_each_entry(rq, &q->queue_head, queuelist)
+			return rq;
 
 		/*
 		 * Flush request is running and flush request isn't queueable
@@ -3737,6 +3728,7 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 		return;
 	}
 
+	mutex_init(&q->pm_lock);
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;
 	pm_runtime_set_autosuspend_delay(q->dev, -1);
@@ -3772,6 +3764,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;
 
+	mutex_lock(&q->pm_lock);
 	spin_lock_irq(q->queue_lock);
 	if (q->nr_pending) {
 		ret = -EBUSY;
@@ -3780,6 +3773,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 		q->rpm_status = RPM_SUSPENDING;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!ret) {
+		blk_freeze_queue(q);
+		q->rpm_q_frozen = true;
+	}
+	mutex_unlock(&q->pm_lock);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_pre_runtime_suspend);
@@ -3854,16 +3854,22 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 	if (!q->dev)
 		return;
 
+	lockdep_assert_held(&q->pm_lock);
+
 	spin_lock_irq(q->queue_lock);
 	if (!err) {
 		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
 		pm_runtime_mark_last_busy(q->dev);
 		pm_request_autosuspend(q->dev);
 	} else {
 		q->rpm_status = RPM_SUSPENDED;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!err && q->rpm_q_frozen) {
+		blk_mq_unfreeze_queue(q);
+		q->rpm_q_frozen = false;
+	}
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
diff --git a/block/elevator.c b/block/elevator.c
index a34fecbe7e81..d389b942378b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -560,15 +560,14 @@ void elv_bio_merged(struct request_queue *q, struct request *rq,
 #ifdef CONFIG_PM
 static void blk_pm_requeue_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
+	if (rq->q->dev)
 		rq->q->nr_pending--;
 }
 
 static void blk_pm_add_request(struct request_queue *q, struct request *rq)
 {
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
-	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
+	if (q->dev)
+		q->nr_pending++;
 }
 #else
 static inline void blk_pm_requeue_request(struct request *rq) {}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c78602f1a425..0aee332fbb63 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -279,6 +279,10 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 	struct request_queue *q = sdev->host->admin_q;
+	bool pm_rq = rq_flags & RQF_PM;
+
+	if (!pm_rq)
+		scsi_autopm_get_device(sdev);
 
 	req = blk_get_request(q,
 			data_direction == DMA_TO_DEVICE ?
@@ -328,6 +332,9 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	atomic_dec(&sdev->nr_admin_pending);
 	wake_up_all(&sdev->admin_wq);
 
+	if (!pm_rq)
+		scsi_autopm_put_device(sdev);
+
 	return ret;
 }
 EXPORT_SYMBOL(__scsi_execute);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a9d371f55ca5..b3dcba83a8d7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -544,6 +544,8 @@ struct request_queue {
 	struct device		*dev;
 	int			rpm_status;
 	unsigned int		nr_pending;
+	bool			rpm_q_frozen;
+	struct mutex		pm_lock;
 #endif
 
 	/*
This patch simplifies runtime PM support by the following approach:

1) resume the device in blk_queue_enter() if it isn't active

2) freeze the queue in blk_pre_runtime_suspend()

3) unfreeze the queue in blk_post_runtime_resume()

4) remove the checks on RQF_PM because an out-of-band PM request is now
   required to resume the device

Then we can remove blk_pm_allow_request(), and more importantly this
approach can be applied to the blk-mq path too.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c        | 72 ++++++++++++++++++++++++++-----------------------
 block/elevator.c        |  7 +++--
 drivers/scsi/scsi_lib.c |  7 +++++
 include/linux/blkdev.h  |  2 ++
 4 files changed, 51 insertions(+), 37 deletions(-)
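For completeness, the pre/post hooks changed above are driven from a device driver's runtime-PM callbacks; with this patch, a successful blk_pre_runtime_suspend() is also what freezes the queue. A simplified sketch modeled on drivers/scsi/scsi_pm.c, not part of this patch:

static int sdev_runtime_suspend(struct device *dev)
{
	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
	struct scsi_device *sdev = to_scsi_device(dev);
	int err = 0;

	/* Fails with -EBUSY if requests are pending; otherwise freezes the queue. */
	err = blk_pre_runtime_suspend(sdev->request_queue);
	if (err)
		return err;
	if (pm && pm->runtime_suspend)
		err = pm->runtime_suspend(dev);
	/* Tell the block layer whether the device actually went down. */
	blk_post_runtime_suspend(sdev->request_queue, err);

	return err;
}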