diff mbox series

[RFC,13/14] block: simplify runtime PM support

Message ID 20180807174433.8374-14-ming.lei@redhat.com (mailing list archive)
State New, archived
Headers show
Series SCSI: introduce per-host admin queue & enable runtime PM | expand

Commit Message

Ming Lei Aug. 7, 2018, 5:44 p.m. UTC
This patch simplifies runtime PM support by the following approach:

1) resume device in blk_queue_enter() if this device isn't active

2) freeze queue in blk_pre_runtime_suspend()

3) unfreeze queue in blk_pre_runtime_resume()

4) remove checking on RQF_PM because now we require an out-of-band PM
request to resume the device

Then we can remove blk_pm_allow_request(), and more importantly this way
can be applied to blk-mq path too.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c        | 72 ++++++++++++++++++++++++++-----------------------
 block/elevator.c        |  7 +++--
 drivers/scsi/scsi_lib.c |  7 +++++
 include/linux/blkdev.h  |  2 ++
 4 files changed, 51 insertions(+), 37 deletions(-)

Comments

Bart Van Assche Aug. 7, 2018, 7:54 p.m. UTC | #1
On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> @@ -3772,6 +3764,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>         if (!q->dev)
>                 return ret;
>  
> +       mutex_lock(&q->pm_lock);
>         spin_lock_irq(q->queue_lock);
>         if (q->nr_pending) {
>                 ret = -EBUSY;
> @@ -3780,6 +3773,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>                 q->rpm_status = RPM_SUSPENDING;
>         }

Hello Ming,

As far as I can see none of the patches in this series adds a call to
blk_pm_add_request() in the blk-mq code. Does that mean that q->nr_pending
will always be zero for blk-mq code with your approach and hence that runtime
suspend can get triggered while I/O is in progress, e.g. if blk_queue_enter()
is called concurrently with blk_pre_runtime_suspend()?

Thanks,

Bart.
Ming Lei Aug. 8, 2018, 3:50 a.m. UTC | #2
On Tue, Aug 07, 2018 at 07:54:44PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 01:44 +0800, Ming Lei wrote:
> > @@ -3772,6 +3764,7 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >         if (!q->dev)
> >                 return ret;
> >  
> > +       mutex_lock(&q->pm_lock);
> >         spin_lock_irq(q->queue_lock);
> >         if (q->nr_pending) {
> >                 ret = -EBUSY;
> > @@ -3780,6 +3773,13 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >                 q->rpm_status = RPM_SUSPENDING;
> >         }
> 
> Hello Ming,
> 
> As far as I can see none of the patches in this series adds a call to
> blk_pm_add_request() in the blk-mq code. Does that mean that q->nr_pending
> will always be zero for blk-mq code with your approach and hence that runtime

The q->nr_pending counter is for the legacy path only, and I just forgot
to check whether the blk-mq queue is idle in the next patch, but runtime
PM still works this way for blk-mq, :-)

> suspend can get triggered while I/O is in progress, e.g. if blk_queue_enter()
> is called concurrently with blk_pre_runtime_suspend()?

In this patchset, for blk-mq, runtime suspend is tried when the auto_suspend
period has expired.

Yes, blk_queue_enter() can run concurrently with blk_pre_runtime_suspend().

	1) if queue isn't frozen, blk_pre_runtime_suspend() will wait for
	completion of the coming request

	2) if the queue is frozen, blk_queue_enter() will try to resume the device
	via blk_resume_queue(), and q->pm_lock is used for covering the two paths.

But I should have checked the in-flight request counter in blk_pre_runtime_suspend()
in the following way before freezing the queue; I will add it in V2 if no
one objects to this approach.

diff --git a/block/blk-core.c b/block/blk-core.c
index 26f9ceb85318..d1a5cd1da861 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3730,6 +3730,24 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 }
 EXPORT_SYMBOL(blk_pm_runtime_init);
 
+static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	unsigned long *cnt = priv;
+
+	(*cnt)++;
+}
+
+static bool blk_mq_pm_queue_idle(struct request_queue *q)
+{
+	unsigned long idle_cnt;
+
+	idle_cnt = 0;
+	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
+
+	return idle_cnt == 0;
+}
+
 /**
  * blk_pre_runtime_suspend - Pre runtime suspend check
  * @q: the queue of the device
@@ -3754,13 +3772,18 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
 int blk_pre_runtime_suspend(struct request_queue *q)
 {
 	int ret = 0;
+	bool mq_idle = false;
 
 	if (!q->dev)
 		return ret;
 
 	mutex_lock(&q->pm_lock);
+
+	if (q->mq_ops)
+		mq_idle = blk_mq_pm_queue_idle(q);
+
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
+	if (q->nr_pending || !mq_idle) {
 		ret = -EBUSY;
 		pm_runtime_mark_last_busy(q->dev);
 	} else {

Thanks,
Ming
jianchao.wang Aug. 8, 2018, 7:57 a.m. UTC | #3
Hi Ming and Bart

Would you mind combining your solutions together? ;)

It could be like this:

blk_pre_runtime_suspend


	if (q->mq_ops) {
		if (!blk_mq_pm_queue_idle(q)) {
 		    ret = -EBUSY;
  		    pm_runtime_mark_last_busy(q->dev);
                } else {
                    blk_set_preempt_only(q);
                    synchronize_rcu()
                    if (!blk_mq_pm_queue_idle(q)) {
                        blk_clear_preempt_only(q);
                        ret = -EBUSY;
                    } else {
                    q->rpm_status = RPM_SUSPENDING;
                    }
                }
        } else {
            spin_lock_irq(q->queue_lock);
            if (q->nr_pending) {
                ret = -EBUSY;
                pm_runtime_mark_last_busy(q->dev);
            } else {
                q->rpm_status = RPM_SUSPENDING;
            }
            spin_unlock_irq(q->queue_lock);
        }

blk_queue_enter
 
    blk_resume_queue(q);
    wait_event(q->mq_freeze_wq,
 	  atomic_read(&q->mq_freeze_depth) == 0 ||
 	  blk_queue_dying(q));


Thanks
Jianchao

On 08/08/2018 11:50 AM, Ming Lei wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 26f9ceb85318..d1a5cd1da861 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3730,6 +3730,24 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  }
>  EXPORT_SYMBOL(blk_pm_runtime_init);
>  
> +static void blk_mq_pm_check_idle(struct blk_mq_hw_ctx *hctx,
> +		struct request *rq, void *priv, bool reserved)
> +{
> +	unsigned long *cnt = priv;
> +
> +	(*cnt)++;
> +}
> +
> +static bool blk_mq_pm_queue_idle(struct request_queue *q)
> +{
> +	unsigned long idle_cnt;
> +
> +	idle_cnt = 0;
> +	blk_mq_queue_tag_busy_iter(q, blk_mq_pm_check_idle, &idle_cnt);
> +
> +	return idle_cnt == 0;
> +}
> +
>  /**
>   * blk_pre_runtime_suspend - Pre runtime suspend check
>   * @q: the queue of the device
> @@ -3754,13 +3772,18 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>  int blk_pre_runtime_suspend(struct request_queue *q)
>  {
>  	int ret = 0;
> +	bool mq_idle = false;
>  
>  	if (!q->dev)
>  		return ret;
>  
>  	mutex_lock(&q->pm_lock);
> +
> +	if (q->mq_ops)
> +		mq_idle = blk_mq_pm_queue_idle(q);
> +
>  	spin_lock_irq(q->queue_lock);
> -	if (q->nr_pending) {
> +	if (q->nr_pending || !mq_idle) {
>  		ret = -EBUSY;
>  		pm_runtime_mark_last_busy(q->dev);
>  	} else {
diff mbox series

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index ea12e3fcfa11..7390149f4fd1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -884,6 +884,24 @@  struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
+#ifdef CONFIG_PM
+static void blk_resume_queue(struct request_queue *q)
+{
+	if (!q->dev)
+		return;
+
+	/* PM request needs to be dealt with out of band */
+	mutex_lock(&q->pm_lock);
+	if (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING)
+		pm_runtime_resume(q->dev);
+	mutex_unlock(&q->pm_lock);
+}
+#else
+static void blk_resume_queue(struct request_queue *q)
+{
+}
+#endif
+
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
@@ -907,6 +925,8 @@  int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
+		blk_resume_queue(q);
+
 		wait_event(q->mq_freeze_wq,
 			   atomic_read(&q->mq_freeze_depth) == 0 ||
 			   blk_queue_dying(q));
@@ -1684,7 +1704,7 @@  EXPORT_SYMBOL_GPL(part_round_stats);
 #ifdef CONFIG_PM
 static void blk_pm_put_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
+	if (rq->q->dev && !--rq->q->nr_pending)
 		pm_runtime_mark_last_busy(rq->q->dev);
 }
 #else
@@ -2702,30 +2722,6 @@  void blk_account_io_done(struct request *req, u64 now)
 	}
 }
 
-#ifdef CONFIG_PM
-/*
- * Don't process normal requests when queue is suspended
- * or in the process of suspending/resuming
- */
-static bool blk_pm_allow_request(struct request *rq)
-{
-	switch (rq->q->rpm_status) {
-	case RPM_RESUMING:
-	case RPM_SUSPENDING:
-		return rq->rq_flags & RQF_PM;
-	case RPM_SUSPENDED:
-		return false;
-	default:
-		return true;
-	}
-}
-#else
-static bool blk_pm_allow_request(struct request *rq)
-{
-	return true;
-}
-#endif
-
 void blk_account_io_start(struct request *rq, bool new_io)
 {
 	struct hd_struct *part;
@@ -2770,13 +2766,8 @@  static struct request *elv_next_request(struct request_queue *q)
 	WARN_ON_ONCE(q->mq_ops);
 
 	while (1) {
-		list_for_each_entry(rq, &q->queue_head, queuelist) {
-			if (blk_pm_allow_request(rq))
-				return rq;
-
-			if (rq->rq_flags & RQF_SOFTBARRIER)
-				break;
-		}
+		list_for_each_entry(rq, &q->queue_head, queuelist)
+			return rq;
 
 		/*
 		 * Flush request is running and flush request isn't queueable
@@ -3737,6 +3728,7 @@  void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 		return;
 	}
 
+	mutex_init(&q->pm_lock);
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;
 	pm_runtime_set_autosuspend_delay(q->dev, -1);
@@ -3772,6 +3764,7 @@  int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;
 
+	mutex_lock(&q->pm_lock);
 	spin_lock_irq(q->queue_lock);
 	if (q->nr_pending) {
 		ret = -EBUSY;
@@ -3780,6 +3773,13 @@  int blk_pre_runtime_suspend(struct request_queue *q)
 		q->rpm_status = RPM_SUSPENDING;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!ret) {
+		blk_freeze_queue(q);
+		q->rpm_q_frozen = true;
+	}
+	mutex_unlock(&q->pm_lock);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_pre_runtime_suspend);
@@ -3854,16 +3854,22 @@  void blk_post_runtime_resume(struct request_queue *q, int err)
 	if (!q->dev)
 		return;
 
+	lockdep_assert_held(&q->pm_lock);
+
 	spin_lock_irq(q->queue_lock);
 	if (!err) {
 		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
 		pm_runtime_mark_last_busy(q->dev);
 		pm_request_autosuspend(q->dev);
 	} else {
 		q->rpm_status = RPM_SUSPENDED;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!err && q->rpm_q_frozen) {
+		blk_mq_unfreeze_queue(q);
+		q->rpm_q_frozen = false;
+	}
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
diff --git a/block/elevator.c b/block/elevator.c
index a34fecbe7e81..d389b942378b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -560,15 +560,14 @@  void elv_bio_merged(struct request_queue *q, struct request *rq,
 #ifdef CONFIG_PM
 static void blk_pm_requeue_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
+	if (rq->q->dev)
 		rq->q->nr_pending--;
 }
 
 static void blk_pm_add_request(struct request_queue *q, struct request *rq)
 {
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
-	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
+	if (q->dev)
+		q->nr_pending++;
 }
 #else
 static inline void blk_pm_requeue_request(struct request *rq) {}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c78602f1a425..0aee332fbb63 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -279,6 +279,10 @@  int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 	struct request_queue *q = sdev->host->admin_q;
+	bool pm_rq = rq_flags & RQF_PM;
+
+	if (!pm_rq)
+		scsi_autopm_get_device(sdev);
 
 	req = blk_get_request(q,
 			data_direction == DMA_TO_DEVICE ?
@@ -328,6 +332,9 @@  int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	atomic_dec(&sdev->nr_admin_pending);
 	wake_up_all(&sdev->admin_wq);
 
+	if (!pm_rq)
+		scsi_autopm_put_device(sdev);
+
 	return ret;
 }
 EXPORT_SYMBOL(__scsi_execute);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a9d371f55ca5..b3dcba83a8d7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -544,6 +544,8 @@  struct request_queue {
 	struct device		*dev;
 	int			rpm_status;
 	unsigned int		nr_pending;
+	bool			rpm_q_frozen;
+	struct mutex		pm_lock;
 #endif
 
 	/*