diff mbox series

[03/40] scsi: iscsi: remove unneeded task state check

Message ID 20210403232333.212927-4-michael.christie@oracle.com (mailing list archive)
State Changes Requested
Headers show
Series iscsi lock and refcount fix ups | expand

Commit Message

Mike Christie April 3, 2021, 11:22 p.m. UTC
The patch:

commit 5923d64b7ab6 ("scsi: libiscsi: Drop taskqueuelock")

added an extra task->state because for

commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix
list corruption regression")

we didn't know why we ended up with cmds on the list and thought it
might have been a bad target sending a response while we were still
sending the cmd. It turns out the bug was just a race in libiscsi/
libiscsi_tcp where

1. iscsi_tcp_r2t_rsp queues a r2t to tcp_task->r2tqueue.
2. iscsi_tcp_task_xmit runs iscsi_tcp_get_curr_r2t and sees we have a r2t.
It dequeues it and iscsi_tcp_task_xmit starts to process it.
3. iscsi_tcp_r2t_rsp runs iscsi_requeue_task and puts the task on the
requeue list.
4. iscsi_tcp_task_xmit sends the data for r2t. This is the final chunk of
data, so the cmd is done.
5. target sends the response.
6. On a different CPU from #3, iscsi_complete_task processes the response.
Since there was no common lock for the list, the lists/tasks pointers are
not fully in sync, so could end up with list corruption.

Since it was just a race on our side, this patch removes the extra check.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/libiscsi.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Lee Duncan April 5, 2021, 4:03 p.m. UTC | #1
On 4/3/21 4:22 PM, Mike Christie wrote:
> The patch:
> 
> commit 5923d64b7ab6 ("scsi: libiscsi: Drop taskqueuelock")
> 
> added an extra task->state because for
> 
> commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix
> list corruption regression")
> 
> we didn't know why we ended up with cmds on the list and thought it
> might have been a bad target sending a response while we were still
> sending the cmd. It turns out the bug was just a race in libiscsi/
> libiscsi_tcp where
> 
> 1. iscsi_tcp_r2t_rsp queues a r2t to tcp_task->r2tqueue.
> 2. iscsi_tcp_task_xmit runs iscsi_tcp_get_curr_r2t and sees we have a r2t.
> It dequeues it and iscsi_tcp_task_xmit starts to process it.
> 3. iscsi_tcp_r2t_rsp runs iscsi_requeue_task and puts the task on the
> requeue list.
> 4. iscsi_tcp_task_xmit sends the data for r2t. This is the final chunk of
> data, so the cmd is done.
> 5. target sends the response.
> 6. On a different CPU from #3, iscsi_complete_task processes the response.
> Since there was no common lock for the list, the lists/tasks pointers are
> not fully in sync, so could end up with list corruption.
> 
> Since it was just a race on our side, this patch removes the extra check.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/libiscsi.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 643edc4eb6fe..94cb9410230a 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -562,16 +562,19 @@ static bool cleanup_queued_task(struct iscsi_task *task)
>  	struct iscsi_conn *conn = task->conn;
>  	bool early_complete = false;
>  
> -	/* Bad target might have completed task while it was still running */
> +	/*
> +	 * We might have raced where we handled a R2T early and got a response
> +	 * but have not yet taken the task off the requeue list, then a TMF or
> +	 * recovery happened and so we can still see it here.
> +	 */
>  	if (task->state == ISCSI_TASK_COMPLETED)
>  		early_complete = true;
>  
>  	if (!list_empty(&task->running)) {
>  		list_del_init(&task->running);
>  		/*
> -		 * If it's on a list but still running, this could be from
> -		 * a bad target sending a rsp early, cleanup from a TMF, or
> -		 * session recovery.
> +		 * If it's on a list but still running this could be cleanup
> +		 * from a TMF or session recovery.
>  		 */
>  		if (task->state == ISCSI_TASK_RUNNING ||
>  		    task->state == ISCSI_TASK_COMPLETED)
> @@ -1470,7 +1473,7 @@ static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task,
>  	}
>  	/* regular RX path uses back_lock */
>  	spin_lock(&conn->session->back_lock);
> -	if (rc && task->state == ISCSI_TASK_RUNNING) {
> +	if (rc) {
>  		/*
>  		 * get an extra ref that is released next time we access it
>  		 * as conn->task above.
> 

Reviewed-by: Lee Duncan <lduncan@suse.com>
diff mbox series

Patch

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 643edc4eb6fe..94cb9410230a 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -562,16 +562,19 @@  static bool cleanup_queued_task(struct iscsi_task *task)
 	struct iscsi_conn *conn = task->conn;
 	bool early_complete = false;
 
-	/* Bad target might have completed task while it was still running */
+	/*
+	 * We might have raced where we handled a R2T early and got a response
+	 * but have not yet taken the task off the requeue list, then a TMF or
+	 * recovery happened and so we can still see it here.
+	 */
 	if (task->state == ISCSI_TASK_COMPLETED)
 		early_complete = true;
 
 	if (!list_empty(&task->running)) {
 		list_del_init(&task->running);
 		/*
-		 * If it's on a list but still running, this could be from
-		 * a bad target sending a rsp early, cleanup from a TMF, or
-		 * session recovery.
+		 * If it's on a list but still running this could be cleanup
+		 * from a TMF or session recovery.
 		 */
 		if (task->state == ISCSI_TASK_RUNNING ||
 		    task->state == ISCSI_TASK_COMPLETED)
@@ -1470,7 +1473,7 @@  static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task,
 	}
 	/* regular RX path uses back_lock */
 	spin_lock(&conn->session->back_lock);
-	if (rc && task->state == ISCSI_TASK_RUNNING) {
+	if (rc) {
 		/*
 		 * get an extra ref that is released next time we access it
 		 * as conn->task above.