diff mbox

scsi_dh_alua: Add return value and check for alua_rtpg_queue() to avoid DM devices I/Os hang

Message ID 1479983105-7264-1-git-send-email-tang.junhui@zte.com.cn (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

tang.junhui@zte.com.cn Nov. 24, 2016, 10:25 a.m. UTC
From: "tang.junhui" <tang.junhui@zte.com.cn>

Activate_complete fn() must be called in alua_activate() if
alua_rtpg_queue() failed, otherwise, it would cause I/Os hang in DM
devices. So this patch add return value and check for alua_rtpg_queue().

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Comments

Bart Van Assche Dec. 1, 2016, 6:12 p.m. UTC | #1
On 11/24/2016 02:25 AM, tang.junhui@zte.com.cn wrote:
> Activate_complete fn() must be called in alua_activate() if
> alua_rtpg_queue() failed, otherwise, it would cause I/Os hang in DM
> devices. So this patch add return value and check for alua_rtpg_queue().

Hello Tang,

Please drop this patch. I think the alua_rtpg_queue() caller should 
ensure that pg != NULL.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Dec. 2, 2016, 2:58 a.m. UTC | #2
On 12/01/16 17:49, tang.junhui@zte.com.cn wrote:
>> Bart wrote:
>> Please drop this patch. I think the alua_rtpg_queue() caller should
>> ensure that pg != NULL.
>
> Failure may also be occurred in queue_delayed_work(),
> since it would cause serious problems,
> so I think we are worth checking for it.

Hello Tang,

Have you been able to trigger the condition explained in the patch 
description or is this only something you think that can happen based on 
your interpretation of the source code? My comments about the checks 
that have been added are:
* All alua_rtpg_queue() callers pass a non-NULL pointer as first 
argument which means that the return statement under "if (!pg)" in 
alua_rtpg_queue() is never executed.
* Even if queue_delayed_work() returns 0 the qdata work passed to 
alua_rtpg_queue() is still added to the pg->rtpg_list and hence will be 
executed once the delayed work is executed. So I think that the 
condition you described (fn() not called) cannot happen. From 
alua_rtpg_work():

	list_splice_init(&pg->rtpg_list, &qdata_list);
         [ ... ]
	list_for_each_entry_safe(qdata, tmp, &qdata_list, entry) {
		list_del(&qdata->entry);
		if (qdata->callback_fn)
			qdata->callback_fn(qdata->callback_data, err);
		kfree(qdata);
	}

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Dec. 2, 2016, 3:40 a.m. UTC | #3
On 12/01/16 19:21, tang.junhui@zte.com.cn wrote:
> Hello Bart,
>> * Even if queue_delayed_work() returns 0 the qdata work passed to
>> alua_rtpg_queue() is still added to the pg->rtpg_list and hence will be
>> executed once the delayed work is executed. So I think that the
>> condition you described (fn() not called) cannot happen.
>
> I find it by reading code.
>
> How did you think that it will be
> executed once the delayed work is executed?
> It is not re-queued to the pg->rtpg_work again.
> It is triggered by pgpath->activate_path.work in dm-mod,
> and maybe it would never run anymore.

Hello Tang,

Are you aware that if queue_delayed_work() returns 0 that that means 
that a work item has already been queued (pg->rtpg_work in this case)? 
 From kernel/workqueue.c:

/**
  * queue_delayed_work_on - queue work on specific CPU after delay
  * @cpu: CPU number to execute work on
  * @wq: workqueue to use
  * @dwork: work to queue
  * @delay: number of jiffies to wait before queueing
  *
  * Return: %false if @work was already on a queue, %true otherwise.  If
  * @delay is zero and @dwork is idle, it will be scheduled for immediate
  * execution.
  */

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 7bb2068..62075c7 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -113,7 +113,7 @@  struct alua_queue_data {
 #define ALUA_POLICY_SWITCH_ALL		1
 
 static void alua_rtpg_work(struct work_struct *work);
-static void alua_rtpg_queue(struct alua_port_group *pg,
+static int alua_rtpg_queue(struct alua_port_group *pg,
 			    struct scsi_device *sdev,
 			    struct alua_queue_data *qdata, bool force);
 static void alua_check(struct scsi_device *sdev, bool force);
@@ -862,7 +862,7 @@  static void alua_rtpg_work(struct work_struct *work)
 	kref_put(&pg->kref, release_port_group);
 }
 
-static void alua_rtpg_queue(struct alua_port_group *pg,
+static int alua_rtpg_queue(struct alua_port_group *pg,
 			    struct scsi_device *sdev,
 			    struct alua_queue_data *qdata, bool force)
 {
@@ -871,7 +871,7 @@  static void alua_rtpg_queue(struct alua_port_group *pg,
 	struct workqueue_struct *alua_wq = kaluad_wq;
 
 	if (!pg)
-		return;
+		return SCSI_DH_IO;
 
 	spin_lock_irqsave(&pg->lock, flags);
 	if (qdata) {
@@ -906,7 +906,10 @@  static void alua_rtpg_queue(struct alua_port_group *pg,
 		if (sdev)
 			scsi_device_put(sdev);
 		kref_put(&pg->kref, release_port_group);
+		return SCSI_DH_IO;
 	}
+
+	return SCSI_DH_OK;
 }
 
 /*
@@ -1007,11 +1010,12 @@  static int alua_activate(struct scsi_device *sdev,
 		mutex_unlock(&h->init_mutex);
 		goto out;
 	}
-	fn = NULL;
 	rcu_read_unlock();
 	mutex_unlock(&h->init_mutex);
 
-	alua_rtpg_queue(pg, sdev, qdata, true);
+	err = alua_rtpg_queue(pg, sdev, qdata, true);
+	if (!err)
+		fn = NULL;
 	kref_put(&pg->kref, release_port_group);
 out:
 	if (fn)