[2/2] nvme: reset request timeouts during fw activation

Message ID 20190522174812.5597-3-keith.busch@intel.com (mailing list archive)
State New, archived
Series Reset timeout for paused hardware

Commit Message

Keith Busch May 22, 2019, 5:48 p.m. UTC
The nvme controller may pause command processing during firmware
activation. The driver will quiesce queues during this time, but commands
dispatched prior to the notification will not be processed until the
hardware completes this activation.

We do not want those requests to time out while the hardware is in
this paused state as we don't expect those commands to complete during
this time, and that handling will interfere with the firmware activation
process.

In addition to quiescing the queues, halt timeout detection during the
paused state and reset the dispatched request deadlines when the hardware
exits that state. This action applies to IO and admin queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
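
The deadline reset described in the commit message can be pictured with a small user-space model. This is only a sketch: `struct model_rq`, `model_reset_deadlines()` and `model_expired()` are hypothetical names invented for illustration, not the kernel's request structure or the blk_mq_reset_rqs() helper used by this patch, whose implementation is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model; struct and function names are hypothetical. */
struct model_rq {
	unsigned long deadline;	/* tick at which the request times out */
	unsigned long timeout;	/* per-request timeout, in ticks */
	int in_flight;		/* nonzero once dispatched to hardware */
};

/* Restart the timeout window of every dispatched request from 'now'. */
static void model_reset_deadlines(struct model_rq *rqs, size_t n,
				  unsigned long now)
{
	assert(rqs || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (rqs[i].in_flight)
			rqs[i].deadline = now + rqs[i].timeout;
	}
}

/* Wrap-safe "is 'now' past the deadline?" test, in the spirit of time_after(). */
static int model_expired(const struct model_rq *rq, unsigned long now)
{
	return (long)(rq->deadline - now) < 0;
}
```

In this model, a request dispatched just before the pause would look expired once the hardware resumes. Resetting its deadline from the resume time gives it a full timeout window again, while requests that were never dispatched are left alone.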

Comments

Ming Lei May 23, 2019, 10:19 a.m. UTC | #1
On Wed, May 22, 2019 at 11:48:12AM -0600, Keith Busch wrote:
> The nvme controller may pause command processing during firmware
> activation. The driver will quiesce queues during this time, but commands
> dispatched prior to the notification will not be processed until the
> hardware completes this activation.
> 
> We do not want those requests to time out while the hardware is in
> this paused state as we don't expect those commands to complete during
> this time, and that handling will interfere with the firmware activation
> process.
> 
> In addition to quiescing the queues, halt timeout detection during the
> paused state and reset the dispatched request deadlines when the hardware
> exits that state. This action applies to IO and admin queues.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/nvme/host/core.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1b7c2afd84cb..37a9a66ada22 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -89,6 +89,7 @@ static dev_t nvme_chr_devt;
>  static struct class *nvme_class;
>  static struct class *nvme_subsys_class;
>  
> +static void nvme_reset_queues(struct nvme_ctrl *ctrl);
>  static int nvme_revalidate_disk(struct gendisk *disk);
>  static void nvme_put_subsystem(struct nvme_subsystem *subsys);
>  static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
> @@ -3605,6 +3606,11 @@ static void nvme_fw_act_work(struct work_struct *work)
>  				msecs_to_jiffies(admin_timeout * 1000);
>  
>  	nvme_stop_queues(ctrl);
> +	nvme_sync_queues(ctrl);
> +
> +	blk_mq_quiesce_queue(ctrl->admin_q);
> +	blk_sync_queue(ctrl->admin_q);

blk_sync_queue() may not halt timeout detection completely since the
timeout work may reset timer again.

Also reset still may come during activating FW, is that a problem?


Thanks,
Ming

Keith Busch May 23, 2019, 1:34 p.m. UTC | #2
On Thu, May 23, 2019 at 03:19:54AM -0700, Ming Lei wrote:
> On Wed, May 22, 2019 at 11:48:12AM -0600, Keith Busch wrote:
> > @@ -3605,6 +3606,11 @@ static void nvme_fw_act_work(struct work_struct *work)
> >  				msecs_to_jiffies(admin_timeout * 1000);
> >  
> >  	nvme_stop_queues(ctrl);
> > +	nvme_sync_queues(ctrl);
> > +
> > +	blk_mq_quiesce_queue(ctrl->admin_q);
> > +	blk_sync_queue(ctrl->admin_q);
> 
> blk_sync_queue() may not halt timeout detection completely since the
> timeout work may reset timer again.

Doh! Didn't hit that in testing, but point taken.
 
> Also reset still may come during activating FW, is that a problem?

IO timeout and user initiated resets should be avoided. A state machine
addition may be useful here.

Christoph Hellwig May 23, 2019, 2:07 p.m. UTC | #3
On Thu, May 23, 2019 at 07:34:29AM -0600, Keith Busch wrote:
> Doh! Didn't hit that in testing, but point taken.
>  
> > Also reset still may come during activating FW, is that a problem?
> 
> IO timeout and user initiated resets should be avoided. A state machine
> addition may be useful here.

Yep.  It almost sounds like we'd want a PAUSED state where resets just
keep returning RESET_TIMER without any other action.
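
The PAUSED idea above could look roughly like the following user-space model. It is a sketch under assumptions, not a proposed patch: `CTRL_PAUSED`, `struct model_ctrl`, `model_timeout()` and `model_try_reset()` are hypothetical names, and the two return values merely stand in for the block layer's BLK_EH_RESET_TIMER and BLK_EH_DONE.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; these names mirror, but are not, the kernel's. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_PAUSED /* hypothetical */ };

enum timeout_action { TIMEOUT_RESET_TIMER, TIMEOUT_DONE };

struct model_ctrl {
	enum ctrl_state state;
	unsigned int resets_scheduled;	/* counts escalations to a reset */
};

/* Called when a request's deadline expires. */
static enum timeout_action model_timeout(struct model_ctrl *ctrl)
{
	assert(ctrl);

	/* While paused, re-arm the timer and take no other action. */
	if (ctrl->state == CTRL_PAUSED)
		return TIMEOUT_RESET_TIMER;

	/* Otherwise escalate: schedule a reset to recover the request. */
	ctrl->state = CTRL_RESETTING;
	ctrl->resets_scheduled++;
	return TIMEOUT_DONE;
}

/* A user-initiated reset is only accepted from the LIVE state. */
static bool model_try_reset(struct model_ctrl *ctrl)
{
	assert(ctrl);
	if (ctrl->state != CTRL_LIVE)
		return false;
	ctrl->state = CTRL_RESETTING;
	ctrl->resets_scheduled++;
	return true;
}
```

Handling the pause in the state machine, rather than trying to stop the timeout work, would cover both concerns raised in this thread: an expired timer is simply re-armed, and a reset request arriving during firmware activation is refused instead of racing with it.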

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1b7c2afd84cb..37a9a66ada22 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -89,6 +89,7 @@  static dev_t nvme_chr_devt;
 static struct class *nvme_class;
 static struct class *nvme_subsys_class;
 
+static void nvme_reset_queues(struct nvme_ctrl *ctrl);
 static int nvme_revalidate_disk(struct gendisk *disk);
 static void nvme_put_subsystem(struct nvme_subsystem *subsys);
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
@@ -3605,6 +3606,11 @@  static void nvme_fw_act_work(struct work_struct *work)
 				msecs_to_jiffies(admin_timeout * 1000);
 
 	nvme_stop_queues(ctrl);
+	nvme_sync_queues(ctrl);
+
+	blk_mq_quiesce_queue(ctrl->admin_q);
+	blk_sync_queue(ctrl->admin_q);
+
 	while (nvme_ctrl_pp_status(ctrl)) {
 		if (time_after(jiffies, fw_act_timeout)) {
 			dev_warn(ctrl->device,
@@ -3618,7 +3624,12 @@  static void nvme_fw_act_work(struct work_struct *work)
 	if (ctrl->state != NVME_CTRL_LIVE)
 		return;
 
+	blk_mq_reset_rqs(ctrl->admin_q);
+	blk_mq_unquiesce_queue(ctrl->admin_q);
+
+	nvme_reset_queues(ctrl);
 	nvme_start_queues(ctrl);
+
 	/* read FW slot information to clear the AER */
 	nvme_get_fw_slot_info(ctrl);
 }
@@ -3901,6 +3912,15 @@  void nvme_start_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
+static void nvme_reset_queues(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_reset_rqs(ns->queue);
+	up_read(&ctrl->namespaces_rwsem);
+}
 
 void nvme_sync_queues(struct nvme_ctrl *ctrl)
 {